MX2008008762A - Resampling and picture resizing operations for multi-resolution video coding and decoding - Google Patents

Resampling and picture resizing operations for multi-resolution video coding and decoding

Info

Publication number
MX2008008762A
Authority
MX
Mexico
Prior art keywords
sampling
sample
image
horizontal
video
Prior art date
Application number
MXMX/A/2008/008762A
Other languages
Spanish (es)
Inventor
Gary J. Sullivan
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation filed Critical Microsoft Corporation
Publication of MX2008008762A publication Critical patent/MX2008008762A/en

Abstract

Techniques and tools for high accuracy position calculation for picture resizing in applications such as spatially-scalable video coding and decoding are described. In one aspect, resampling of a video picture is performed according to a resampling scale factor. The resampling comprises computation of a sample value at a position i, j in a resampled array. The computation includes computing a derived horizontal or vertical sub-sample position x or y in a manner that involves approximating a value in part by multiplying a 2^n value by an inverse (approximate or exact) of the upsampling scale factor. The approximating can be a rounding or some other kind of approximating, such as a ceiling or floor function that approximates to a nearby integer. The sample value is interpolated using a filter.

Description

RESAMPLING AND PICTURE RESIZING OPERATIONS FOR MULTI-RESOLUTION VIDEO CODING AND DECODING

TECHNICAL FIELD

Tools for coding/decoding digital video are described.

BACKGROUND

With the increased popularity of DVDs, delivery of music over the Internet, and digital cameras, digital media have become commonplace. Engineers use a variety of techniques to process digital audio, video, and images efficiently while still maintaining quality. These techniques help one understand how audio, video, and image information is represented and processed in a computer.

I. Representation of Media Information in a Computer

A computer processes media information as a series of numbers representing that information. For example, a single number can represent the brightness intensity or the intensity of a color component such as red, green, or blue for each small elementary region of a picture, so that the digital representation of the picture consists of one or more arrays of such numbers. Each such number may be called a sample. For a color image, it is conventional to use more than one sample to represent the color of each elemental region, and typically three samples are used. The set of these samples for an elementary region may be referred to as a pixel, where the word "pixel" is a contraction referring to the concept of a "picture element". For example, one pixel may consist of three samples that represent the intensity of red, green, and blue light needed to represent the elementary region. Such a pixel type is referred to as an RGB pixel.

Several factors affect the quality of image information, including sample depth, resolution, and frame rate (for video). Sample depth is a property, normally measured in bits, that indicates the range of numbers that can be used to represent a sample. When more values are possible for the sample, quality can be higher because the number can capture more subtle variations in intensity and/or a greater range of values. Resolution generally refers to the number of samples over some duration of time (for audio) or space (for images or individual video pictures). Images with higher spatial resolution tend to look crisper than other images and contain more discernible useful details. Frame rate is a common term for temporal resolution for video. Video with a higher frame rate tends to mimic the smooth motion of natural objects better than other video, and can similarly be considered to contain more detail in the temporal dimension. For all of these factors, the tradeoff for high quality is the cost of storing and transmitting the information in terms of the bit rate necessary to represent the sample depth, resolution, and frame rate, as shown in Table 1.

Table 1. Bit rates for different quality levels of raw video.

Despite the high bit rate necessary for storing high-quality video (such as HDTV), companies and consumers increasingly depend on computers to create, distribute, and play back high-quality content.
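To make the bit rate arithmetic behind Table 1 concrete, here is a minimal sketch of how raw video bit rates follow from sample depth, resolution, and frame rate. The formats and the 4:2:0 chroma assumption are illustrative choices, not entries taken from the original table.

```c
#include <stdio.h>

/* Raw (uncompressed) bit rate = samples per frame x bits per sample
 * x frames per second. The 1.5 factor assumes 4:2:0 chroma sampling
 * (two chroma samples for every four luma samples). */
static double raw_bitrate_mbps(int width, int height, double fps,
                               int bits_per_sample) {
    double samples_per_frame = width * height * 1.5; /* 4:2:0 assumed */
    return samples_per_frame * bits_per_sample * fps / 1.0e6;
}

int main(void) {
    /* Illustrative format choices only. */
    printf("176x144   @ 15 fps: %7.1f Mbps\n", raw_bitrate_mbps(176, 144, 15.0, 8));
    printf("352x288   @ 30 fps: %7.1f Mbps\n", raw_bitrate_mbps(352, 288, 30.0, 8));
    printf("1920x1080 @ 30 fps: %7.1f Mbps\n", raw_bitrate_mbps(1920, 1080, 30.0, 8));
    return 0;
}
```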
For this reason, engineers use compression (also called coding or source coding) to reduce the bit rate of digital media. Compression decreases the cost of storing and transmitting the information by converting the information into a lower bit rate form. Compression can be lossless, in which the quality of the video does not suffer but decreases in bit rate are limited by the complexity of the video. Or, compression can be lossy, in which the quality of the video suffers but decreases in bit rate are more dramatic. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A "codec" is an encoder/decoder system.

In general, video compression techniques include "intra" compression and "inter" or predictive compression. For video pictures, intra compression techniques compress individual pictures. Inter compression techniques compress pictures with reference to preceding and/or following pictures.

II. Multi-Resolution Video and Spatial Scalability

Standard video encoders experience a dramatic degradation in performance when the target bit rate falls below a certain threshold. Quantization and other lossy processing stages introduce distortion. At low bit rates, high frequency information may be heavily distorted or completely lost. As a result, significant artifacts can arise and cause a substantial drop in the quality of the reconstructed video.
Although available bit rates increase as transmission and processing technology improves, maintaining high visual quality at constrained bit rates remains a primary goal of video codec design. Existing codecs use several methods to improve visual quality at constrained bit rates.

Multi-resolution coding allows encoding of video at different spatial resolutions. Reduced resolution video can be encoded at a substantially lower bit rate, at the expense of lost information. For example, a prior video encoder can downsample (using a downsampling filter) full-resolution video and encode it at a reduced resolution in the vertical and/or horizontal directions. Reducing the resolution in each direction by half reduces the dimensions of the encoded picture size by half. The encoder signals the reduced resolution coding to a decoder. The decoder receives information indicating reduced resolution coding and ascertains from the received information how the reduced resolution video should be upsampled (using an upsampling filter) to increase the picture size before display. However, the information that was lost when the encoder downsampled and encoded the video pictures is still missing from the upsampled pictures.

Spatially scalable video uses a multi-layer approach, allowing an encoder to reduce spatial resolution (and thus bit rate) in a base layer while retaining higher resolution information from the source video in one or more enhancement layers. For example, a base layer intra picture can be coded at a reduced resolution, while an accompanying enhancement layer intra picture can be coded at a higher resolution. Similarly, base layer predicted pictures can be accompanied by enhancement layer predicted pictures. A decoder can choose (based on bit rate limitations and/or other criteria) to decode only base layer pictures at the lower resolution to obtain lower resolution reconstructed pictures, or to decode base layer and enhancement layer pictures to obtain higher resolution reconstructed pictures. When the base layer is encoded at a lower resolution than the displayed picture (also referred to as downsampling), the encoded picture size is actually smaller than the displayed picture. The decoder performs calculations to resize the reconstructed picture and uses upsampling filters to produce interpolated sample values at appropriate positions in the reconstructed picture. However, previous codecs using spatially scalable video have suffered from inflexible upsampling filters and from inaccurate or expensive (in terms of computation time or bit rate) picture resizing techniques.
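As a concrete illustration of the downsampling step described above, the following sketch halves the resolution of a sample plane in each dimension with a simple 2x2 averaging filter. Practical encoders use longer filters with better anti-aliasing, so this is a minimal example under stated assumptions, not the filter of any particular codec.

```c
#include <stdint.h>

/* Downsample an 8-bit sample plane by 2:1 in each dimension using a
 * 2x2 box average. width and height are assumed even; out must hold
 * (width/2) * (height/2) samples. */
static void downsample2x(const uint8_t *in, int width, int height,
                         int stride, uint8_t *out) {
    for (int y = 0; y < height / 2; y++) {
        for (int x = 0; x < width / 2; x++) {
            int sum = in[2 * y * stride + 2 * x]
                    + in[2 * y * stride + 2 * x + 1]
                    + in[(2 * y + 1) * stride + 2 * x]
                    + in[(2 * y + 1) * stride + 2 * x + 1];
            out[y * (width / 2) + x] = (uint8_t)((sum + 2) >> 2); /* rounded /4 */
        }
    }
}
```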
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.

BRIEF DESCRIPTION OF THE INVENTION

This brief description is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This brief description is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In summary, the detailed description is directed to various techniques and tools for multi-resolution and layered, spatially scalable video coding and decoding. For example, the detailed description is directed to various techniques and tools for high accuracy position calculation for picture resizing in applications such as spatially scalable video coding and decoding. In one aspect, resampling of a video picture is performed according to a resampling scale factor. The resampling involves computation of a sample value at a position i, j in a resampled array. The computation includes computing a derived horizontal or vertical sub-sample position x or y in a manner that involves approximating a value in part by multiplying a 2^n value by an inverse (approximate or exact) of the upsampling scale factor (or dividing the 2^n value by the upsampling scale factor or an approximation of the upsampling scale factor). The exponent n may be a sum of two integers including an integer F that represents a number of bits in a fractional component. The approximating can be a rounding or some other kind of approximating, such as a ceiling or floor function that approximates to a nearby integer. The sample value is interpolated using a filter.

Some alternatives of the described techniques provide an altered sample position computation that, in one implementation, provides approximately one extra bit of precision in the computations without significant alteration of the position computation process or its complexity. Some additional alternatives of the described techniques relate to how the sample position computation operates with 4:2:2 and 4:4:4 sampling structures. These alternative techniques for such sampling structures lock the luma and chroma sample position calculations together whenever the resolution of the chroma and luma sampling grids is the same in a particular dimension.

The additional features and advantages will be made apparent from the following detailed description of various embodiments, which proceeds with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a suitable computing environment in conjunction with which several described embodiments may be implemented. Figure 2 is a block diagram of a generalized video encoder system in conjunction with which several described embodiments may be implemented. Figure 3 is a block diagram of a generalized video decoder system in conjunction with which several described embodiments may be implemented. Figure 4 is a diagram of a macroblock format used in several described embodiments. Figure 5A is a diagram of part
of an interlaced video frame, showing alternating lines of a top field and a bottom field. Figure 5B is a diagram of the interlaced video frame organized for encoding/decoding as a frame, and Figure 5C is a diagram of the interlaced video frame organized for encoding/decoding as fields. Figure 5D shows six illustrative spatial alignments of 4:2:0 chroma sample locations relative to luma sample locations for each field of a video frame. Figure 6 is a flow chart showing a generalized technique for multi-resolution encoding of video. Figure 7 is a flow chart showing a generalized technique for multi-resolution decoding of video. Figure 8 is a flow chart showing a technique for multi-resolution encoding of intra pictures and inter-picture predicted pictures. Figure 9 is a flow chart showing a technique for multi-resolution decoding of intra pictures and inter-picture predicted pictures. Figure 10 is a flow chart showing a technique for encoding layers of a bitstream to allow decoding of video at different resolutions.
Figure 11 is a flow chart showing a technique for decoding layers of a bitstream to allow decoding of video at different resolutions. Figures 12 and 13 are code diagrams showing pseudo-code for an illustrative multi-stage position calculation technique. Figure 14 is a code diagram showing pseudo-code for an illustrative incremental position calculation technique.
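The pseudo-code of Figures 12 through 14 is not reproduced in this text. Purely as a rough illustration of the kind of position calculation summarized above (scaling an approximate inverse of the upsampling factor by 2^n, multiplying, and rounding), consider the following sketch. The function name, the precision choice n = 16 + F, and the rounding offsets are illustrative assumptions, not the patent's normative procedure.

```c
#include <stdint.h>

/* Sketch of a derived sub-sample position computation: for output
 * position i, find the corresponding position in the lower-resolution
 * array with F fractional bits of precision. Rather than dividing by
 * the upsampling scale factor for every sample, a rounded inverse
 * scale is precomputed as a 2^n-scaled integer and applied by a
 * multiply and shift. */
#define F 4  /* fractional bits of the sub-sample position (assumed) */

static int32_t subsample_position(int i, int new_size, int orig_size) {
    int n = 16 + F;  /* precision of the scaled inverse (assumed) */
    /* Approximate (orig_size / new_size) * 2^n with rounding. */
    int64_t inv_scale = (((int64_t)orig_size << n) + new_size / 2) / new_size;
    /* Position with F fractional bits; the offset rounds the shift. */
    return (int32_t)(((int64_t)i * inv_scale + (1 << (n - F - 1))) >> (n - F));
}
```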
DETAILED DESCRIPTION

The described embodiments are directed to techniques and tools for multi-resolution and layered, spatially scalable video coding and decoding. The various techniques and tools described herein may be used independently. Some of the techniques and tools may be used in combination (e.g., in different phases of a combined encoding and/or decoding process). Various techniques are described below with reference to flow charts of processing acts. The various processing acts shown in the flow charts may be consolidated into fewer acts or separated into more acts. For the sake of simplicity, the relation of acts shown in a particular flow chart to acts described elsewhere is often not shown. In many cases, the acts in a flow chart can be reordered. Much of the detailed description addresses representing, coding, and decoding video information. The techniques and tools described herein for representing, coding, and decoding video information may also be applied to audio information, still image information, or other media information.
I. Computing Environment

Figure 1 illustrates a generalized example of a suitable computing environment 100 in which several of the described embodiments may be implemented. The computing environment 100 is not intended to suggest any limitation as to scope of use or functionality, as the techniques and tools may be implemented in diverse general-purpose or special-purpose computing environments.

With reference to Figure 1, the computing environment 100 includes at least one processing unit 110 and memory 120. In Figure 1, this most basic configuration 130 is included within a dashed line. The processing unit 110 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory 120 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory 120 stores software 180 implementing a video encoder or decoder with one or more of the described techniques and tools.

A computing environment may have additional features. For example, the computing environment 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 100, and coordinates activities of the components of the computing environment 100.

The storage 140 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, flash memory, or any other medium which can be used to store information and which can be accessed within the computing environment 100. The storage 140 stores instructions for the software 180 implementing the video encoder or decoder.

The input device(s) 150 may be a touch input device such as a keyboard, mouse, pen, touch screen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 100. For audio or video encoding, the input device(s) 150 may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment 100. The output device(s) 160 may be a display, printer, speaker, CD- or DVD-writer, or another device that provides output from the computing environment 100.

The communication connection(s) 170 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

The techniques and tools can be described in the general context of computer-readable media.
Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 100, computer-readable media include memory 120, storage 140, communication media, and combinations of any of the above.

The techniques and tools can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on one or more target real processors or virtual processors. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

For the sake of presentation, the detailed description uses terms like "encode", "decode", and "choose" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
II. Illustrative Video Encoder and Decoder

Figure 2 is a block diagram of an illustrative video encoder 200 in conjunction with which some described embodiments may be implemented. Figure 3 is a block diagram of a generalized video decoder 300 in conjunction with which some described embodiments may be implemented.

The relationships shown between modules within the encoder 200 and decoder 300 indicate general flows of information in the encoder and decoder; other relationships are not shown for the sake of simplicity. In particular, Figures 2 and 3 usually do not show side information indicating the encoder settings, modes, tables, etc. used for a video sequence, picture, slice, macroblock, block, etc. Such side information is sent in the output bitstream, typically after entropy encoding of the side information. The format of the output bitstream can vary depending on implementation.

The encoder 200 and decoder 300 process video pictures, which may be video frames, video fields, or combinations of frames and fields. The bitstream syntax and semantics at the picture and macroblock levels may depend on whether frames or fields are used. There may be changes to macroblock organization and overall timing as well. The encoder 200 and decoder 300 are block-based and use a 4:2:0 macroblock format for frames, with each macroblock including four 8x8 luminance blocks (at times treated as one 16x16 macroblock) and two 8x8 chrominance blocks. For fields, the same or a different macroblock organization and format may be used. The 8x8 blocks may be further sub-divided at different stages, e.g., at the frequency transform and entropy encoding stages. Illustrative video frame organizations are described in more detail below. Alternatively, the encoder 200 and decoder 300 are object-based, use a different macroblock or block format, or perform operations on sets of samples of different size or configuration than 8x8 blocks and 16x16 macroblocks.

Depending on the implementation and the type of compression desired, modules of the encoder or decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders or decoders with different modules and/or other configurations of modules perform one or more of the described techniques.
A. Video Frame Organizations

In some implementations, the encoder 200 and decoder 300 process video frames organized as follows. A frame contains lines of spatial information of a video signal. For progressive scan video, these lines contain samples representing a snapshot of scene content sampled at some instant of time and covering the entire scene from the top to the bottom of the frame. A progressive video frame is divided into macroblocks such as the macroblock 400 shown in Figure 4. The macroblock 400 includes four 8x8 luminance blocks (Y1 through Y4) and two 8x8 chrominance blocks that are co-located with the four luminance blocks but half resolution horizontally and vertically, following the conventional 4:2:0 macroblock format. The 8x8 blocks may be further sub-divided at different stages, e.g., at the frequency transform (e.g., 8x4, 4x8, or 4x4 DCTs) and entropy encoding stages. A progressive I-frame is an intra-coded progressive video frame, where the term "intra" refers to coding methods that do not involve prediction from the content of other previously decoded pictures. A progressive P-frame is a progressive video frame coded using prediction from one or more other pictures at time instances that differ temporally from that of the current picture (sometimes referred to in some contexts as forward prediction), and a progressive B-frame is a progressive video frame coded using inter-picture prediction involving a (possibly weighted) averaging of multiple prediction values in some regions (sometimes referred to as bi-predictive or bi-directional prediction). Progressive P- and B-frames may include intra-coded macroblocks as well as various types of inter-picture predicted macroblocks.

Interlaced video frame scanning consists of an alternating series of two types of scans of a scene - one, called the top field, comprising the even lines (lines numbered 0, 2, 4, etc.) of a frame, and the other, called the bottom field, comprising the odd lines (lines numbered 1, 3, 5, etc.) of the frame. The two fields typically represent two different snapshot time instants. Figure 5A shows part of an interlaced video frame 500, including the alternating lines of the top field and bottom field at the upper left part of the interlaced video frame 500.

Figure 5B shows the interlaced video frame 500 of Figure 5A organized for encoding/decoding as a frame 530. The interlaced video frame 500 has been divided into macroblocks or other such regions such as the macroblocks 531 and 532, which use a 4:2:0 format as shown in Figure 4. In the luminance plane, each macroblock 531, 532 includes 8 lines from the top field alternating with 8 lines from the bottom field for 16 lines total, and each line is 16 samples long. (The actual organization of the picture into macroblocks or other such regions and the placement of luminance blocks and chrominance blocks within the macroblocks 531, 532 are not shown, and in fact may vary for different encoding decisions and for different video coding designs.) Within a given macroblock, the top-field information and bottom-field information may be coded jointly or separately at any of various phases. An interlaced I-frame is an intra-coded interlaced video frame containing two fields, where each macroblock includes information for one or both fields.
An interlaced P-frame is an interlaced video frame containing two fields that are coded using inter-picture prediction, where each macroblock includes information for one or both fields, as is an interlaced B-frame. Interlaced P- and B-frames may include intra-coded macroblocks as well as various types of inter-picture predicted macroblocks.

Figure 5C shows the interlaced video frame 500 of Figure 5A organized for encoding/decoding as fields 560. Each of the two fields of the interlaced video frame 500 is divided into macroblocks. The top field is divided into macroblocks such as the macroblock 561, and the bottom field is divided into macroblocks such as the macroblock 562. (Again, the macroblocks use a 4:2:0 format as shown in Figure 4, and the organization of the picture into macroblocks or other such regions and the placement of luminance blocks and chrominance blocks within the macroblocks are not shown and may vary.) In the luminance plane, the macroblock 561 includes 16 lines from the top field and the macroblock 562 includes 16 lines from the bottom field, and each line is 16 samples long. An interlaced I-field is a single, separately represented field of an interlaced video frame. An interlaced P-field is a single, separately represented field of an interlaced video frame coded using inter-picture prediction, as is an interlaced B-field. Interlaced P- and B-fields may include intra-coded macroblocks as well as different types of inter-picture predicted macroblocks.

Interlaced video frames organized for encoding/decoding as fields may include various combinations of different field types. For example, such a frame may have the same field type (I-field, P-field, or B-field) in both the top and bottom fields, or different field types in each field.

The term picture generally refers to source, coded, or reconstructed image data. For progressive scan video, a picture is typically a progressive video frame. For interlaced video, a picture may refer to an interlaced video frame, the top field of a frame, or the bottom field of a frame, depending on context. Figure 5D shows six illustrative spatial alignments of 4:2:0 chroma sample locations relative to luma sample locations for each field of a video frame.

Alternatively, the encoder 200 and decoder 300 are object-based, or use a different macroblock format (e.g., 4:2:2 or 4:4:4) or block format, or perform operations on sets of samples of different size or configuration than 8x8 blocks and 16x16 macroblocks.
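For reference, the 4:2:0 macroblock layout of Figure 4 can be captured in a small data structure. This is a descriptive sketch only, not a layout mandated by any bitstream format.

```c
#include <stdint.h>

/* One 4:2:0 macroblock as described for Figure 4: a 16x16 luma region
 * handled as four 8x8 luminance blocks (Y1..Y4), plus one 8x8 block per
 * chrominance channel covering the same area at half resolution in both
 * the horizontal and vertical dimensions. */
typedef struct {
    uint8_t y[4][8][8];  /* Y1..Y4: four 8x8 luminance blocks */
    uint8_t cb[8][8];    /* 8x8 chrominance block (Cb) */
    uint8_t cr[8][8];    /* 8x8 chrominance block (Cr) */
} Macroblock420;
```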
B. Video Encoder

Figure 2 is a block diagram of an illustrative video encoder system 200. The encoder system 200 receives a sequence of video pictures including a current picture 205 (e.g., progressive video frame, interlaced video frame, or field of an interlaced video frame) and produces compressed video information 295 as output. Particular embodiments of video encoders typically use a variation or supplemented version of the illustrative encoder 200.

The encoder system 200 uses encoding processes for intra-coded (intra) pictures (I-pictures) and inter-picture predicted (inter) pictures (P- or B-pictures). For the sake of presentation, Figure 2 shows a path for I-pictures through the encoder system 200 and a path for inter-picture predicted pictures. Many of the components of the encoder system 200 are used for compressing both I-pictures and inter-picture predicted pictures. The exact operations performed by those components may vary depending on the type of information being compressed.

An inter-picture predicted picture is represented in terms of a prediction (or difference) from one or more other pictures (which are typically referred to as reference pictures). A prediction residual is the difference between what was predicted and the original picture. In contrast, an I-picture is compressed without reference to other pictures. I-pictures may use spatial prediction or frequency-domain prediction (i.e., intra-picture prediction) to predict some portions of the I-picture using data from other portions of the same I-picture. However, for the sake of brevity, such I-pictures are not referred to in this description as "predicted" pictures, so that the phrase "predicted picture" can be understood to be an inter-picture predicted picture (e.g., a P- or B-picture).

If the current picture 205 is a predicted picture, a motion estimator 210 estimates motion of macroblocks or other sets of samples of the current picture 205 with respect to one or more reference pictures, for example, the reconstructed previous picture 225 buffered in the picture store 220. The motion estimator 210 may estimate motion with respect to one or more temporally previous reference pictures and one or more temporally future reference pictures (e.g., in the case of a bi-predictive picture). Accordingly, the encoder system 200 may use the separate stores 220 and 222 for multiple reference pictures. The motion estimator 210 may estimate motion in full-sample, half-sample, quarter-sample, or other increments, and may switch the resolution of the motion estimation on a picture-by-picture basis or other basis. The motion estimator 210 (and compensator 230) also may switch between types of reference picture sample interpolation (e.g., between cubic convolution interpolation and bilinear interpolation) on a per-frame or other basis. The resolution of the motion estimation may be the same or different horizontally and vertically. The motion estimator 210 outputs, as side information, motion information 215 such as differential motion vector information.
The encoder 200 encodes the motion information 215 by, for example, computing one or more predictors for motion vectors, computing differences between the motion vectors and the predictors, and entropy coding the differences. To reconstruct a motion vector, a motion compensator 230 combines a predictor with motion vector difference information. The motion compensator 230 applies the reconstructed motion vector to the reconstructed picture(s) 225 to form a motion-compensated prediction 235. The prediction is rarely perfect, however, and the difference between the motion-compensated prediction 235 and the original current picture 205 is the prediction residual 245. During later reconstruction of the picture, an approximation of the prediction residual 245 will be added to the motion-compensated prediction 235 to obtain a reconstructed picture that is closer to the original current picture 205 than the motion-compensated prediction 235 alone. In lossy compression, however, some information is still lost from the original current picture. Alternatively, a motion estimator and motion compensator apply another type of motion estimation/compensation.

A frequency transformer 260 converts the spatial domain video information into frequency domain (i.e., spectral) data. For block-based video pictures, the frequency transformer 260 typically applies a discrete cosine transform (DCT), a variant of DCT, or some other block transform to blocks of the sample data or prediction residual data, producing blocks of frequency-domain transform coefficients. Alternatively, the frequency transformer 260 applies another type of frequency transform such as a Fourier transform, or uses wavelet or sub-band analysis. The frequency transformer 260 may apply an 8x8, 8x4, 4x8, 4x4, or other size frequency transform.

A quantizer 270 then quantizes the blocks of frequency-domain transform coefficients. The quantizer applies scalar quantization to the transform coefficients according to a quantization step size that varies on a picture-by-picture basis, a macroblock basis, or some other basis, where the quantization step size is a control parameter that governs the uniform spacing between the discrete representable reconstruction points of the decoder's inverse quantization process, which may be duplicated in an encoder inverse quantizer process 276.
Alternatively, the quantizer applies another type of quantization to the frequency-domain transform coefficients, e.g., a scalar quantizer with non-uniform reconstruction points, a vector quantizer, or non-adaptive quantization, or directly quantizes spatial domain data in an encoder system that does not use frequency transforms. In addition to adaptive quantization, the encoder 200 may use frame dropping, adaptive filtering, or other techniques for rate control.

When a reconstructed current picture is needed for subsequent motion estimation/compensation, an inverse quantizer 276 performs inverse quantization on the quantized frequency-domain transform coefficients. An inverse frequency transformer 266 then performs the inverse of the operations of the frequency transformer 260, producing a reconstructed prediction residual approximation (for a predicted picture) or a reconstructed I-picture approximation. If the current picture 205 was an I-picture, the reconstructed I-picture approximation is taken as the reconstructed current picture approximation (not shown). If the current picture 205 was a predicted picture, the reconstructed prediction residual approximation is added to the motion-compensated prediction 235 to form the reconstructed current picture approximation. One or more picture stores 220, 222 buffer the reconstructed current picture approximation for use as a reference picture in subsequent motion-compensated prediction. The encoder may apply a deblocking filter or other picture refining process to the reconstructed frame to adaptively smooth discontinuities and remove other artifacts from the picture prior to storing the picture approximation in one or more of the picture stores 220, 222.

The entropy coder 280 compresses the output of the quantizer 270 as well as certain side information (e.g., motion information 215, quantization step size). Typical entropy coding techniques include arithmetic coding, differential coding, Huffman coding, run-length coding, Lempel-Ziv coding, dictionary coding, and combinations of the above. The entropy coder 280 typically uses different coding techniques for different kinds of information (e.g., low-frequency coefficients, high-frequency coefficients, zero-frequency coefficients, different kinds of side information), and can choose from among multiple code tables within a particular coding technique.

The entropy coder 280 provides compressed video information 295 to the multiplexer ["MUX"] 290. The MUX 290 may include a buffer, and a buffer fullness level indicator may be fed back to bit rate adaptive modules for rate control. Before or after the MUX 290, the compressed video information 295 can be channel coded for transmission over the network. The channel coding can apply error detection and correction data to the compressed video information 295.
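Before moving on to the decoder, here is a minimal sketch of the uniform scalar quantization performed by a quantizer such as 270 and the matching inverse quantization duplicated in the inverse quantizer 276. The round-to-nearest rule shown is an assumption for illustration; real codecs tune rounding offsets and dead zones.

```c
#include <stdlib.h>

/* Uniform scalar quantization of a transform coefficient: map the
 * coefficient to the nearest multiple of the quantization step size. */
static int quantize(int coeff, int step) {
    int sign = coeff < 0 ? -1 : 1;
    return sign * ((abs(coeff) + step / 2) / step); /* round to nearest */
}

/* Inverse quantization: reconstruct at uniformly spaced points, as done
 * in the decoder and in the encoder's duplicated inverse quantizer. */
static int dequantize(int level, int step) {
    return level * step;
}
```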
C. Video Decoder

Figure 3 is a block diagram of an illustrative video decoder system 300. The decoder system 300 receives information 395 for a compressed sequence of video pictures and produces output including a reconstructed picture 305 (e.g., progressive video frame, interlaced video frame, or field of an interlaced video frame). Particular embodiments of video decoders typically use a variation or supplemented version of the generalized decoder 300.

The decoder system 300 decompresses predicted pictures and I-pictures. For the sake of presentation, Figure 3 shows a path for I-pictures through the decoder system 300 and a path for predicted pictures. Many of the components of the decoder system 300 are used for decompressing both I-pictures and predicted pictures. The exact operations performed by those components may vary depending on the type of information being decompressed.

A DEMUX 390 receives the information 395 for the compressed video sequence and makes the received information available to the entropy decoder 380. The DEMUX 390 may include a jitter buffer and other buffers as well. Before or within the DEMUX 390, the compressed video information can be channel decoded and processed for error detection and correction.

The entropy decoder 380 entropy decodes entropy-coded quantized data as well as entropy-coded side information (e.g., motion information 315, quantization step size), typically applying the inverse of the entropy encoding performed in the encoder. Entropy decoding techniques include arithmetic decoding, differential decoding, Huffman decoding, run-length decoding, Lempel-Ziv decoding, dictionary decoding, and combinations of the above. The entropy decoder 380 typically uses different decoding techniques for different kinds of information (e.g., low-frequency coefficients, high-frequency coefficients, zero-frequency coefficients, different kinds of side information), and can choose from among multiple code tables within a particular decoding technique.

The decoder 300 decodes the motion information 315 by, for example, computing one or more predictors for motion vectors, entropy decoding motion vector differences (at the entropy decoder 380), and combining the decoded motion vector differences with the predictors to reconstruct motion vectors.
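The motion vector reconstruction just described (predictor plus decoded difference) can be sketched as follows. The component-wise median of neighboring vectors shown here is one common predictor choice, assumed for illustration since the text does not fix a particular predictor.

```c
/* Component-wise median of three candidate predictors, e.g. from the
 * left, top, and top-right neighboring blocks. */
static int median3(int a, int b, int c) {
    if (a > b) { int t = a; a = b; b = t; }  /* ensure a <= b */
    return (c < a) ? a : (c > b) ? b : c;
}

/* Reconstruct one motion vector component: predictor + decoded
 * motion vector difference. */
static int reconstruct_mv(int mv_left, int mv_top, int mv_topright,
                          int decoded_diff) {
    return median3(mv_left, mv_top, mv_topright) + decoded_diff;
}
```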
A motion compensator 330 applies the motion information 315 to one or more reference pictures 325 to form a prediction 335 of the picture 305 being reconstructed. For example, the motion compensator 330 uses one or more macroblock motion vectors to find blocks of samples or to interpolate fractional positions between samples in the reference picture(s) 325. One or more picture stores (e.g., picture stores 320, 322) store previously reconstructed pictures for use as reference pictures. Typically, B-pictures have more than one reference picture (e.g., at least one temporally previous reference picture and at least one temporally future reference picture). Accordingly, the decoder system 300 may use separate picture stores 320 and 322 for multiple reference pictures. The motion compensator 330 may compensate for motion in full-sample, 1/2-sample, 1/4-sample, or other increments, and may switch the resolution of the motion compensation on a picture-by-picture basis or other basis. The motion compensator 330 also may switch between types of reference picture sample interpolation (e.g., between cubic convolution interpolation and bilinear interpolation) on a per-frame or other basis. The resolution of the motion compensation may be the same or different horizontally and vertically. Alternatively, a motion compensator applies another type of motion compensation. The prediction by the motion compensator is rarely perfect, so the decoder 300 also reconstructs prediction residuals.

An inverse quantizer 370 inverse quantizes entropy-decoded data. Typically, the inverse quantizer applies uniform scalar inverse quantization to the entropy-decoded data with a reconstruction step size that varies on a picture-by-picture basis, a macroblock basis, or some other basis. Alternatively, the inverse quantizer applies another type of inverse quantization to the data, e.g., a non-uniform, vector, or non-adaptive inverse quantization, or directly inverse quantizes spatial domain data in a decoder system that does not use inverse frequency transforms.

An inverse frequency transformer 360 converts the inverse quantized frequency-domain transform coefficients into spatial domain video information. For block-based video pictures, the inverse frequency transformer 360 applies an inverse DCT ["IDCT"], a variant of IDCT, or some other inverse block transform to blocks of the frequency transform coefficients, producing sample data or inter-picture prediction residual data for I-pictures or predicted pictures, respectively. Alternatively, the inverse frequency transformer 360 applies another type of inverse frequency transform such as an inverse Fourier transform, or uses wavelet or sub-band synthesis. The inverse frequency transformer 360 may apply an 8x8, 8x4, 4x8, 4x4, or other size inverse frequency transform.

For a predicted picture, the decoder 300 combines the reconstructed prediction residual 345 with the motion-compensated prediction 335 to form the reconstructed picture 305. When the decoder needs a reconstructed picture 305 for subsequent motion compensation, one or more picture stores (e.g., picture store 320) buffer the reconstructed picture 305 for use in predicting the next picture.
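As an illustration of interpolating the fractional positions between reference samples mentioned above for the motion compensator 330, here is a bilinear interpolation sketch at quarter-sample resolution. Bilinear is the simpler of the two interpolation types named in the text; the fixed-point details and quarter-sample precision are assumptions for illustration.

```c
#include <stdint.h>

/* Bilinear interpolation of a reference sample at fractional position
 * (x + fx/4, y + fy/4), with fx, fy in 0..3 for quarter-sample
 * resolution. The four weights are (4-fx)(4-fy), fx(4-fy), (4-fx)fy,
 * and fx*fy; they sum to 16, so +8 rounds the final /16. The caller
 * must keep (x+1, y+1) inside the reference picture. */
static uint8_t interp_bilinear(const uint8_t *ref, int stride,
                               int x, int y, int fx, int fy) {
    const uint8_t *p = ref + y * stride + x;
    int a = p[0], b = p[1], c = p[stride], d = p[stride + 1];
    int v = (4 - fx) * (4 - fy) * a + fx * (4 - fy) * b
          + (4 - fx) * fy * c + fx * fy * d;
    return (uint8_t)((v + 8) >> 4);
}
```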
In some embodiments, the decoder 300 applies a deblocking filter or other picture refining process to the reconstructed picture to adaptively smooth discontinuities and remove other artifacts from the picture prior to storing the reconstructed picture 305 in one or more of the picture stores (e.g., picture store 320), or prior to displaying the decoded picture during decoded video play-out.

III. General Overview of Multi-Resolution Encoding and Decoding

Video can be encoded (and decoded) at different resolutions. For the purposes of this description, multi-resolution encoding and decoding may be performed as frame-based coding and decoding (e.g., reference picture resampling) or layered (sometimes referred to as spatially scalable) coding and decoding. Multi-resolution encoding and decoding may also involve interlaced video and field-based coding and decoding, and switching between frame-based and field-based coding and decoding on a resolution-specific basis or on some other basis. However, progressive video frame coding is discussed in this overview for the sake of simplicity in describing the concept.
A. Frame-based Multi-Resolution Encoding and Decoding

In frame-based multi-resolution coding, an encoder encodes input pictures at different resolutions. The encoder chooses the spatial resolution for pictures on a picture-by-picture basis or on some other basis. For example, in reference picture resampling, a reference picture can be resampled if it was encoded at a resolution different from that of the picture being encoded. The term resampling is used to describe increasing (upsampling) or decreasing (downsampling) the number of samples used to represent a picture area or some other section of a sampled signal. The number of samples per unit area or per signal section is referred to as the sampling resolution.

The spatial resolution can be chosen based on, for example, a decrease/increase in available bit rate, decrease/increase in quantization step size, decrease/increase in the amount of motion in the input video content, other properties of the video content (e.g., the presence of strong edges, text, or other content that may be significantly distorted at lower resolutions), or some other basis. The spatial resolution can be varied in the vertical, horizontal, or both vertical and horizontal dimensions. The horizontal resolution may be the same as or different from the vertical resolution. A decoder decodes the encoded frames using complementary techniques.

Once the encoder has chosen a spatial resolution for a current picture or area within a current picture, the encoder resamples the original picture to the desired resolution before coding it. The encoder can then signal the choice of spatial resolution to the decoder.

Figure 6 shows a technique (600) for frame-based multi-resolution encoding of pictures. An encoder, such as the encoder 200 in Figure 2, sets a resolution (610) for a picture. For example, the encoder considers the criteria listed above or other criteria. The encoder then encodes the picture (620) at that resolution. If the encoding of all pictures to be encoded is done (630), the encoder exits. If not, the encoder sets a resolution (610) for the next picture and continues encoding. Alternatively, the encoder sets resolutions at some level other than picture level, such as setting the resolution differently for different parts of a picture, or making a resolution selection for a group or sequence of pictures.

The encoder can encode predicted pictures as well as intra pictures. Figure 8 shows a technique (800) for frame-based multi-resolution encoding of intra pictures and inter-picture predicted pictures. First, the encoder checks at 810 whether the current picture to be encoded is an intra picture or a predicted picture. If the current picture is an intra picture, the encoder sets the resolution for the current picture at 820. If the picture is a predicted picture, the encoder sets the resolution for the reference picture at 830 before setting the resolution for the current picture. After setting the resolution for the current picture, the encoder encodes the current picture (840) at that resolution. Setting the resolution for a picture (whether a current source picture or a stored reference picture) may involve resampling the picture to match the selected resolution, and may involve encoding a signal to indicate the selected resolution to the decoder. If the encoding of all pictures to be encoded is done (850), the encoder exits. If not, the encoder continues encoding additional pictures. Alternatively, the encoder treats predicted pictures in a different way.
A decoder decodes the encoded picture and, if necessary, resamples the picture before display. Like the resolution of the encoded picture, the resolution of the decoded picture can be adjusted in many different ways. For example, the resolution of the decoded picture can be adjusted to fit the resolution of an output display device or a region of an output display device (e.g., for "picture-in-picture" or PC desktop window display).

Figure 7 shows a technique (700) for frame-based multi-resolution decoding of pictures. A decoder, such as the decoder 300 in Figure 3, sets a resolution (at 710) for a picture. For example, the decoder gets resolution information signaled by the encoder. The decoder then decodes the picture (720) at that resolution. If the decoding of all pictures to be decoded is done (730), the decoder exits. If not, the decoder sets a resolution (710) for the next picture and continues decoding. Alternatively, the decoder sets resolutions at some level other than picture level.

The decoder can decode predicted pictures as well as intra pictures. Figure 9 shows a technique (900) for frame-based multi-resolution decoding of intra pictures and predicted pictures. First, the decoder checks whether the current frame to be decoded is an intra picture or a predicted picture (910). If the current picture is an intra picture, the decoder sets the resolution for the current picture (920). If the picture is a predicted picture, the decoder sets the resolution for the reference picture (930) before setting a resolution for the current picture (920). Setting the resolution for the reference picture may involve resampling the stored reference picture to match the selected resolution. After setting the resolution for the current picture (920), the decoder decodes the current picture (940) at that resolution. If the decoding of all pictures to be decoded is done (950), the decoder exits. If not, the decoder continues decoding.

The decoder typically decodes pictures at the same resolutions used in the encoder. Alternatively, the decoder decodes pictures at different resolutions, such as when the resolutions available to the decoder are not exactly the same as those used in the encoder.
B. Layered Multi-Resolution Encoding and Decoding

In layered multi-resolution coding, an encoder encodes video in layers, with each layer having information for decoding the video at a different resolution.
In this way, the encoder encodes at least some individual pictures in the video at more than one resolution. A decoder can then decode the video at one or more resolutions by processing different combinations of layers. For example, a first layer (sometimes referred to as a base layer) contains information for decoding the video at a lower resolution, while one or more other layers (sometimes referred to as enhancement layers) contain information for decoding the video at higher resolutions.

The base layer may be designed to itself be an independently decodable bitstream. Thus, in such a design, a decoder that decodes only the base layer will produce valid decoded video at the lower resolution of the base layer. Proper decoding of higher-resolution pictures using an enhancement layer may also require decoding some or all of the encoded base layer data and possibly one or more other enhancement layers. A decoder that decodes the base layer and one or more additional higher-resolution layers will be able to produce higher resolution content than a decoder that decodes only the base layer. Two, three, or more layers may be used to allow for two, three, or more different resolutions. Alternatively, a higher-resolution layer may itself also be an independently decodable bitstream. (Such a design is often referred to as simulcast multi-resolution encoding.)

Figure 10 shows a technique (1000) for encoding layers of a bitstream to allow decoding at different resolutions. An encoder, such as the encoder 200 in Figure 2, takes full-resolution video information as input (1010). The encoder downsamples the full-resolution video information (1020) and encodes the base layer using the downsampled information (1030). The encoder encodes one or more higher-resolution layers using the base layer and the higher-resolution video information (1040). A higher-resolution layer can be a layer that allows decoding at full resolution, or a layer that allows decoding at some intermediate resolution. The encoder then outputs a layered bitstream comprising two or more of the encoded layers. Alternatively, the encoding of the higher-resolution layer (1040) may avoid using base layer information, thus allowing independent decoding of the higher-resolution layer data, for a simulcast multi-resolution encoding approach.

The encoder can perform multi-resolution layered encoding in several ways that follow the basic outline shown in Figure 10. For more information, see, for example, U.S. Patent
No. 6,510,177 or the MPEG-2 standard or other video standards.
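Following the basic outline of Figure 10 (and, for decoding, Figure 11 discussed next), here is a skeletal sketch of a two-layer flow. Every type and function in it (Picture, downsample2x, encode_base_layer, etc.) is a hypothetical placeholder, declared but deliberately left unimplemented, so the sketch shows only the order of operations described in the text.

```c
/* Skeleton of two-layer spatially scalable coding per Figures 10-11.
 * All types and stage functions below are hypothetical placeholders,
 * declared so the flow type-checks; they stand in for the stages
 * described in the text. Numbers in comments refer to flow chart steps. */
typedef struct Picture Picture;
typedef struct Bitstream Bitstream;

Picture   *downsample2x(const Picture *p);
Picture   *upsample2x(const Picture *p);
Bitstream *encode_base_layer(const Picture *low);
Bitstream *encode_enhancement_layer(const Picture *full, const Picture *pred);
Picture   *reconstruct_base(const Bitstream *base);
Bitstream *mux_layers(Bitstream *base, Bitstream *enh);
Picture   *decode_base_layer(const Bitstream *bs);
Picture   *decode_enhancement_residual(const Bitstream *bs);
Picture   *add_pictures(const Picture *pred, const Picture *residual);

Bitstream *encode_layers(const Picture *full_res) {
    Picture   *low  = downsample2x(full_res);             /* 1020 */
    Bitstream *base = encode_base_layer(low);             /* 1030 */
    /* The enhancement layer codes the difference between the source
     * and the upsampled base-layer reconstruction.          1040 */
    Picture   *pred = upsample2x(reconstruct_base(base));
    Bitstream *enh  = encode_enhancement_layer(full_res, pred);
    return mux_layers(base, enh);                 /* layered bitstream */
}

Picture *decode_layers(const Bitstream *bs, int want_high_res) {
    Picture *low = decode_base_layer(bs);                 /* 1120 */
    if (!want_high_res)
        return low;                          /* base layer only */
    Picture *pred = upsample2x(low);                      /* 1130 */
    Picture *res  = decode_enhancement_residual(bs);      /* 1140 */
    return add_pictures(pred, res);                       /* 1150 */
}
```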
Figure 11 shows a technique (1100) for decoding layers of a bitstream to allow decoding of video at different resolutions. A decoder, such as the decoder 300 in Figure 3, takes a layered bitstream as input (1110). The layers include a lower-resolution layer (base layer) and one or more layers comprising higher-resolution information. The higher-resolution layers need not themselves contain independently decodable pictures; typically, higher-resolution layers include residual information describing differences between higher- and lower-resolution versions of the pictures. The decoder decodes the base layer (1120) and, if higher-resolution decoding is desired, the decoder upsamples the decoded base layer pictures (1130) to the desired resolution. The decoder decodes one or more higher-resolution layers (1140) and combines the decoded higher-resolution information with the decoded, upsampled base layer pictures to form higher-resolution pictures (1150). Depending on the desired resolution level, the higher-resolution pictures may be full-resolution pictures or intermediate-resolution pictures. For more information, see, for example, U.S. Patent No. 6,510,177, or the MPEG-2 standard or other video standards.

The decoder typically decodes pictures at one of the resolutions used in the encoder. Alternatively, the resolutions available to the decoder are not exactly the same as those used in the encoder.

IV. Resampling Filters for Scalable Video Coding and Decoding

This section describes techniques and tools for scalable video coding and decoding. Although some of the described techniques and tools are described in a layered (or spatially scalable) context, some of the described techniques and tools can also be used in a frame-based (or reference picture resampling) context, or in some other context that involves resampling filters. Furthermore, although some of the described techniques and tools are described in the context of resampling pictures, some of the described techniques and tools can be used for resampling residual or difference signals that result from prediction of higher-resolution signals.

Scalable video coding (SVC) is a type of digital video coding that allows a subset of a larger bitstream to be decoded to produce decoded pictures with a quality that is acceptable for some applications (although such picture quality would be lower than the quality produced by decoding the full higher bit rate bitstream). One well-known type of SVC is referred to as spatial scalability, or resolution scalability. In a spatial SVC design, the encoding process (or a pre-processing function to be performed prior to the encoding process, depending on the exact definition of the scope of the encoding process) typically includes downsampling the video to a lower resolution and encoding that lower-resolution video to enable a lower-resolution decoding process, and upsampling of the lower-resolution decoded pictures for use as a prediction of the values of the samples in the pictures of the higher-resolution video. The decoding process for the higher-resolution video then includes decoding the lower-resolution video (or some part of it) and using the upsampled video as a prediction of the values of the samples in the pictures of the higher-resolution video. Such designs require the use of resampling filters. In particular, codec designs include the use of upsampling filters in both decoders and encoders, and the use of downsampling filters in encoders or
encoding pre-processors. Here we focus specifically on the upsampling filters used in such designs. Typically, the upsampling process is designed to be identical in encoders and decoders, in order to prevent a phenomenon known as drift, which is an accumulation of error caused by the use of differing predictions of the same signal during encoding and decoding.

One drawback of some spatial SVC designs is the use of low-quality filters (e.g., two-tap bilinear filters) in the decoding process. The use of higher-quality filters would be beneficial to video quality. Spatial SVC may include resampling filters that allow a high degree of flexibility in the resampling ratio of the filter. However, this may require a large number of particular filter designs to be developed for each different "phase" of such a filter, and the "tap" values of these filters may need to be stored in encoder and decoder implementations.

Furthermore, it may be beneficial to video quality to allow an encoder to control the degree of blurriness of the resampling filters used for spatial SVC. Thus, for each resampling "phase" to be designed for upsampling or downsampling, it may be beneficial to have several different filters to choose from, depending on the desired degree of blurriness to be introduced in the process. The selection of the degree of blurriness to be applied during upsampling may be sent from the encoder to the decoder as information conveyed for use in the decoding process. This extra flexibility further complicates the design, since it greatly increases the number of necessary tap values that may need to be stored in an encoder or decoder.

A unified design can be used to specify a variety of resampling filters with various phases and various degrees of blurriness. One possible solution is the use of the Mitchell-Netravali filter design method. The direct application of the Mitchell-Netravali filter design method to these problems may require excessive computational resources, in the form of an excessive dynamic range of possible values for the quantities to be computed in the encoder or decoder. For example, one such design may require the use of 45-bit arithmetic processing, rather than the 16-bit or 32-bit processing elements ordinarily found in CPUs and general-purpose DSPs. To address this problem, we provide some design refinements.

A typical SVC design requires a normative upsampling filter for spatial scalability. To support arbitrary resampling ratios (a feature known as extended spatial scalability), an upsampling filter design is described that incorporates a great deal of flexibility with respect to resampling ratios. Another key aspect is the relative alignment of luma and chroma. Since there is a variety of alignment structures in use (see, for example, H.261/MPEG-1 versus MPEG-2 alignment for 4:2:0 chroma, and H.264/MPEG-4 AVC), the described techniques and tools support a flexible variety of alignments, with an easy way for the encoder to indicate to the decoder how to apply the filtering appropriately.

The described techniques and tools comprise upsampling filters capable of high-quality upsampling with good anti-aliasing.
In particular, the techniques and tools described have quality beyond that provided by prior bilinear filter designs for spatial scalability. The techniques and tools described include high-quality up-sampling filters that are visually pleasing as well as providing good signal-processing frequency behavior. The techniques and tools described comprise a filter design that is simple to specify and does not require large memory storage tables to hold tap values, and the filtering operations themselves are computationally simple to perform. For example, the techniques and tools described have a filter that is not excessively long and does not require excessive mathematical precision or overly complex mathematical functions.

This section describes designs that have one or more of the following characteristics:
- flexibility of luma/chroma phase alignment;
- re-sampling ratio flexibility;
- frequency characteristic flexibility;
- high visual quality;
- neither too few nor too many filter taps (for example, between 4 and 6);
- simple to specify;
- simple to operate (for example, using practical word length arithmetic).

A. Mitchell-Netravali Up-sampling Filters

The techniques and tools described take a separable filtering approach; therefore, the following discussion focuses primarily on the processing of a one-dimensional signal, as the two-dimensional case is a simple separable application of the one-dimensional case. First, a family of filters with two parameters is proposed, based on the conceptually continuous impulse response h(x) given by

h(x) = ((12 - 9b - 6c)*|x|^3 + (-18 + 12b + 6c)*|x|^2 + (6 - 2b)) / 6, for |x| < 1
h(x) = ((-b - 6c)*|x|^3 + (6b + 30c)*|x|^2 + (-12b - 48c)*|x| + (8b + 24c)) / 6, for 1 <= |x| < 2
h(x) = 0, otherwise    (1)

where b and c are the two parameters. For a relative phase offset position 0 <= x <= 1, this kernel produces a four-tap finite impulse response (FIR) filter with tap values given by the following matrix equation:

[t-1]         [ -b - 6c           3b + 12c         -(3b + 6c)   b      ] [x^3]
[t0 ]  = 1/6  [ 12 - 9b - 6c      -18 + 12b + 6c    0           6 - 2b ] [x^2]
[t1 ]         [ -(12 - 9b - 6c)   18 - 15b - 12c    3b + 6c     b      ] [x  ]
[t2 ]         [ b + 6c            -6c               0           0      ] [1  ]    (2)

Actually, it is sufficient to consider only the range of x from 0 to 1/2, since the FIR filter kernel for x is simply the FIR filter kernel for 1-x in reverse order. This design has a number of interesting and useful properties. Here are some of them:
- No trigonometric functions, transcendental functions, or irrational number processing is needed to compute the filter tap values. In fact, the tap values for such a filter can be computed directly with only a few simple operations. They do not need to be stored for the various possible values of the parameters and phases to be used; they can simply be computed when needed. (Thus, to standardize the use of such filters, only a few formulas are needed; no large tables of numbers or standardized attempts to approximate functions such as cosines or Bessel functions are needed.)
- The resulting filter has 4 taps. This is a very practical number.
- The filter has only a single side lobe on each side of the main lobe. Thus it will not produce excessive ringing artifacts.
- The filter has a smooth impulse response: its value and its first derivative are both continuous.
- It has a unity-gain DC response, which means that there is no overall brightness amplification or attenuation in the information being up-sampled.
- Members of this filter family include relatively good approximations of well-known good filters such as the "Lanczos-2" design and the "Catmull-Rom" design.
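To make the computational simplicity concrete, the following C sketch (illustrative only, not part of the patent text; the function name is hypothetical) evaluates the four tap values of Equation (2) directly from b, c, and the fractional phase x:

#include <stdio.h>

/* Evaluates the four FIR tap values of Equation (2): t[0..3] hold
   t-1, t0, t1, t2 for parameters b, c and fractional phase x in [0, 1]. */
static void mn_taps(double b, double c, double x, double t[4])
{
    const double x2 = x * x, x3 = x2 * x;
    t[0] = (-(b + 6*c)*x3 + (3*b + 12*c)*x2 - (3*b + 6*c)*x + b) / 6.0;
    t[1] = ((12 - 9*b - 6*c)*x3 + (-18 + 12*b + 6*c)*x2 + (6 - 2*b)) / 6.0;
    t[2] = (-(12 - 9*b - 6*c)*x3 + (18 - 15*b - 12*c)*x2 + (3*b + 6*c)*x + b) / 6.0;
    t[3] = ((b + 6*c)*x3 - 6*c*x2) / 6.0;
}

int main(void)
{
    double t[4];
    mn_taps(1.0/3, 1.0/3, 0.5, t);   /* the parameter point Mitchell and Netravali recommended */
    printf("%f %f %f %f (sum = %f)\n", t[0], t[1], t[2], t[3],
           t[0] + t[1] + t[2] + t[3]);   /* sum is 1: unity DC gain */
    return 0;
}

For any b, c and phase x the four taps sum to exactly 1, reflecting the unity-gain DC response noted above.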
In addition, the techniques and tools described include a particular relationship between the two parameters for the selection of visually pleasing filters. That relationship can be expressed as follows:

c = (1 - b) / 2    (3)

This reduces the degrees of freedom to a single bandwidth control parameter b. This parameter controls the degree of extra blurriness introduced by the filter. It should be noted that the member of this family associated with the value b = 0 is the excellent and well-known Catmull-Rom up-sampling filter (also known as the Keys "cubic convolution" interpolation filter). The Catmull-Rom up-sampling filter has a number of good properties by itself, in addition to the advantages found for all members of the Mitchell-Netravali filter family. It is an "interpolating" filter: that is, for phase values of x = 0 and x = 1, the filter has a single non-zero tap equal to 1. In other words, an up-sampled signal passes exactly through the values of the input samples at the edges of each segment of the up-sampled curve. If the input samples form a parabola (or a straight line, or a static value), the output points will lie exactly on the parabolic curve (or straight line, or static value). In fact, in some ways the Catmull-Rom up-sampler may be considered the best up-sampling filter of this length for these reasons, although introducing some extra blurriness (by increasing b) can sometimes be visually more pleasing. Introducing some extra blurriness can also help mask some low bit rate compression artifacts, and can thus act more like a Wiener filter estimate (a well-known filter used for noise filtering) of the true up-sampled picture.

Simple substitution of Equation (3) into Equation (2) results in the following tap values:

t-1 = ((2b - 3)*x^3 + (6 - 3b)*x^2 - 3x + b) / 6
t0  = ((9 - 6b)*x^3 + (9b - 15)*x^2 + (6 - 2b)) / 6
t1  = ((6b - 9)*x^3 + (12 - 9b)*x^2 + 3x + b) / 6
t2  = ((3 - 2b)*x^3 + (3b - 3)*x^2) / 6    (4)

It has been reported that, based on subjective tests with 9 expert observers and more than 500 samples, a useful range is 0 <= b <= 5/3:
- 0 <= b <= 1/2 was classified as visually "satisfactory", with b = 1/3 reported as visually pleasing;
- b > 1/2 was classified as "blurry", with b = 3/2 reported as excessively blurry.

B. Integerization of the Bandwidth Control Parameter

The division by 6 in Equation (4) may not be desirable. It may in turn be desirable to integerize the bandwidth control parameter and the filter tap values, since infinite precision is impractical as part of a decoder design. Consider a substitution that uses a new integer-valued variable a defined as follows:

a = (b / 6) * 2^S    (5)

where S is an integer shift factor and a is an unsigned integer that acts as an integer bandwidth control parameter. The parameter a can be encoded as a syntax element by the encoder at the video sequence level in a bit stream. For example, the parameter a can be encoded explicitly with a variable length or fixed length code, encoded jointly with other information, or signaled implicitly. Alternatively, the parameter a is signaled at some other level in a bit stream. The integerization results in scaled tap values of

t-1 = (2a - 2^(S-1))*x^3 + (2^S - 3a)*x^2 - 2^(S-1)*x + a
t0  = (3*2^(S-1) - 6a)*x^3 + (9a - 5*2^(S-1))*x^2 + (2^S - 2a)
t1  = (6a - 3*2^(S-1))*x^3 + (2^(S+1) - 9a)*x^2 + 2^(S-1)*x + a
t2  = (2^(S-1) - 2a)*x^3 + (3a - 2^(S-1))*x^2    (6)

The result will then need to be scaled down by S positions in binary arithmetic processing. If a has a range from 0 to M, then b has a range from 0 to 6*M/2^S. Some possible useful choices for M include the following:
- M = 2^(S-2) - 1, which results in a range of b from 0 to 3/2 - 6/2^S;
- M = Ceil(2^S / 6), where Ceil() returns the smallest integer greater than or equal to its argument, which results in a range of b from 0 to slightly more than 1;
- M = 2^(S-3) - 1, which results in an approximate range of b from 0 to 3/4 - 6/2^S.
These choices for M are large enough to cover most useful cases, with the first choice (M = 2^(S-2) - 1) being the largest of the three. A useful range for S is between 6 and 8. For example, consider S = 7 and M = 2^(S-2) - 1, that is, M = 31. Alternatively, other values of M and S can be used.
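As a small worked sketch (not from the patent text), Equation (5) can be integerized with a rounding, which is one of the approximation choices contemplated here, using the illustrative S = 7:

#include <math.h>
#include <stdio.h>

int main(void)
{
    const int S = 7;                 /* shift factor from the useful range 6..8 */
    const double b = 1.0 / 3.0;      /* reported as visually pleasing           */
    unsigned a = (unsigned)floor(b * (1 << S) / 6.0 + 0.5);  /* a = Round(b*2^S/6) */
    printf("a = %u, reconstructed b = %f\n", a, 6.0 * a / (1 << S));
    return 0;
}

For b = 1/3 this yields a = 7, and the reconstructed b = 6*7/128 = 0.328125, close to the visually pleasing value.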
C. Integerization of the Fractional Sample Position

Next we consider the granularity of the value of x. For practical use, we also approximate x. For example, we can define an integer i such that:

x = i / 2^F    (7)

where F represents a supported fractional sample position precision. For an example of a sufficiently accurate re-sampling operation, consider F = 4 (one-sixteenth or finer sample position precision). This results in the following integer filter tap values:

t-1(i) = (2a - 2^(S-1))*i^3 + (2^S - 3a)*2^F*i^2 - 2^(S-1)*2^(2F)*i + a*2^(3F)
t0(i)  = (3*2^(S-1) - 6a)*i^3 + (9a - 5*2^(S-1))*2^F*i^2 + (2^S - 2a)*2^(3F)
t1(i)  = (6a - 3*2^(S-1))*i^3 + (2^(S+1) - 9a)*2^F*i^2 + 2^(S-1)*2^(2F)*i + a*2^(3F)
t2(i)  = (2^(S-1) - 2a)*i^3 + (3a - 2^(S-1))*2^F*i^2    (8)

For example, consider F = 4. The result will then need to be scaled down by 3F + S positions. It should be noted that each integer tap value above contains a common factor of two (assuming that S is greater than 1). In this way, the tap values can be formulated as follows:

t'k(i) = tk(i) / 2, for k = -1, 0, 1, 2    (9)

where each of the tap values has been divided by 2. The result will then need to be scaled down by only 3F + S - 1 positions. For the downward scaling, we define the function RoundingRightShift(p, R) as the output of a right shift of R bits (with rounding) applied to the input value p, computed as follows:

RoundingRightShift(p, R) = (p + 2^(R-1)) >> R, for R = 1, 2, 3, etc.
RoundingRightShift(p, R) = p, for R = 0    (10)

where the notation ">>" refers to a binary arithmetic right shift operator using two's-complement binary arithmetic. Alternatively, the rounding right shift is performed differently. Some illustrative applications of the rounding right shift are provided below.
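The following C sketch (an illustration under the formulas as reconstructed above, not normative code) computes the integerized tap values of Equation (8) and the rounding right shift of Equation (10):

#include <stdint.h>

enum { F = 4, S = 7 };   /* illustrative values from the text */

/* Equation (10): right shift of R bits with rounding. */
static int64_t rounding_right_shift(int64_t p, int R)
{
    return (R >= 1) ? ((p + ((int64_t)1 << (R - 1))) >> R) : p;
}

/* Equation (8): t[0..3] receive t-1(i) .. t2(i) for phase index i
   (0 <= i < 2^F) and integer bandwidth parameter a. The four values
   sum to 2^(3F+S); a filtered result is later scaled down by 3F+S
   positions (or 3F+S-1 for the halved taps of Equation (9)), for
   example with rounding_right_shift(). */
static void integer_taps(int64_t a, int64_t i, int64_t t[4])
{
    const int64_t i2 = i * i, i3 = i2 * i;
    const int64_t P  = (int64_t)1 << (S - 1);                  /* 2^(S-1) */
    const int64_t f1 = (int64_t)1 << F, f2 = f1 * f1, f3 = f2 * f1;
    t[0] = (2*a - P)*i3   + (2*P - 3*a)*f1*i2 - P*f2*i + a*f3;
    t[1] = (3*P - 6*a)*i3 + (9*a - 5*P)*f1*i2 + (2*P - 2*a)*f3;
    t[2] = (6*a - 3*P)*i3 + (4*P - 9*a)*f1*i2 + P*f2*i + a*f3;
    t[3] = (P - 2*a)*i3   + (3*a - P)*f1*i2;
}

A quick sanity check: at i = 0 the taps are [a*2^(3F), (2^S - 2a)*2^(3F), a*2^(3F), 0], whose sum is 2^(3F+S), consistent with the unit DC gain property.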
D. Dynamic Range Considerations

If pictures with N bits of sample bit depth are filtered two-dimensionally in this manner before doing any rounding, we would need 2*(3F + S - 1) + N + 1 bits of dynamic range in the accumulator prior to shifting the result down by 2*(3F + S - 1) positions and clipping the output to a range of N bits. For example, if we have F = 4, S = 7 and N = 8, we may need to use a 45-bit accumulator to compute the filtered result. Some approaches to mitigate this problem are described in the following sub-sections. These approaches can be used separately or in combination with each other. It should be understood that variations of the described dynamic range mitigation approaches are possible based on the descriptions herein.

1. First Illustrative Dynamic Range Mitigation Approach

Consider an example where horizontal filtering is performed first, followed by vertical filtering. Consider a maximum word length of W bits for any point in the two-dimensional processing pipeline. In a first dynamic range mitigation approach, to perform the filtering we use a rounding right shift of RH bits at the output of the first (horizontal) stage of processing and a rounding right shift of Rv bits at the output of the second (vertical) stage of processing.
In this way we calculate the following:

2*(3F + S - 1) + N + 1 - RH = W    (11)

and therefore

RH = 2*(3F + S - 1) + N + 1 - W    (12)

Then the right shift for the second (vertical) stage can be calculated from

RH + Rv = 2*(3F + S - 1)    (13)

and therefore

Rv = 2*(3F + S - 1) - RH    (14)

For example, for F = 4, S = 7, N = 8 and W = 32, we obtain RH = 13 and Rv = 23. Thus, instead of 45 bits of dynamic range, with rounding right shifts the dynamic range is reduced to 32 bits. Right shifts by different numbers of bits can be used for different values of W.

2. Second Illustrative Dynamic Range Mitigation Approach

A second dynamic range mitigation approach involves reducing the precision of the tap values, rather than reducing the precision of the phase positioning (that is, reducing F), reducing the granularity of the filter bandwidth adjustment parameter (that is, reducing S), or reducing the precision of the output of the first stage (that is, increasing RH). We denote the four integer tap values produced by Equation (9) as [t-1, t0, t1, t2]. It should be noted that the sum of the four filter tap values will be equal to 2^(3F+S-1), that is,

t-1 + t0 + t1 + t2 = 2^(3F+S-1)    (15)

This is an important property for this illustrative dynamic range mitigation approach, because whenever all four input samples have the same value, the output will have that same value. Given a right shift amount Rt for the tap values, we define the following:

u-1 = RoundingRightShift(t-1, Rt)
u1 = RoundingRightShift(t1, Rt)
u2 = RoundingRightShift(t2, Rt)
u0 = 2^(3F+S-1-Rt) - u-1 - u1 - u2

Filtering is then performed with the tap values [u-1, u0, u1, u2] rather than [t-1, t0, t1, t2]. Each increase of 1 in the value of Rt represents one bit less of dynamic range needed in the arithmetic accumulator, and one bit less of right shift to be performed in subsequent stages of processing.

3. Third Illustrative Dynamic Range Mitigation Approach

A prior design uses a trick that is similar in concept but differs from the first illustrative dynamic range mitigation approach in that it makes the amount of right shift after the first stage of processing a function of the value of the phase positioning variable i. It can be recognized that the filter tap values shown in Equation (9) will contain K zero-valued LSBs when the value of i is an integer multiple of 2^K. Thus, if the second stage of the filtering procedure uses a phase positioning variable i that is an integer multiple of 2^K, it is possible to shift the tap values of the second stage right by K bits and decrease the amount of right shift for the first stage by K bits. This can become difficult to keep track of when operating with a generalized re-sampling factor. However, when using simple re-sampling factors of 2:1 or other simple factors, it is easy to recognize that all phases in use for the second stage of the filtering procedure contain the same multiple of 2^K, which allows this approach to be applied in these special cases.
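Following Equations (11) through (14), here is a minimal sketch (not from the patent text) of the first mitigation approach's shift selection, using the example values above:

#include <stdio.h>

int main(void)
{
    const int F = 4, S = 7, N = 8, W = 32;       /* example values from the text */
    const int RH = 2*(3*F + S - 1) + N + 1 - W;  /* Equation (12) */
    const int Rv = 2*(3*F + S - 1) - RH;         /* Equation (14) */
    printf("RH = %d, Rv = %d\n", RH, Rv);        /* prints RH = 13, Rv = 23 */
    return 0;
}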
V. Position Calculation Techniques and Tools

Techniques and tools for calculating position information for spatial SVC are described. Some techniques and tools are directed to how to target a word length B and optimize the precision of the calculation within the limits of that word length. Instead of simply selecting a precision and then requiring whatever word length it implies, applying the new method results in superior precision in a real implementation and expands the range of effective application of the technique, because it uses the entire available word length to maximize precision within the limit. Some techniques and tools are directed to a) offsetting the origin of the coordinate system and b) using unsigned integers rather than signed integers, in order to achieve a better trade-off between precision and word length / dynamic range. Only a small increase in computation is needed to add the origin offset term to each calculated position. Some techniques and tools are directed to breaking up the calculation so that different sections of the sequence of samples are processed in different stages, where the origin of the coordinate system changes at the start of each stage. Again, only a small increase in computational requirements results (since some extra calculations are performed at the start of each stage). If the technique is taken to its logical extreme, the need for multiplication operations can be eliminated while still providing the trade-off between precision and word length / dynamic range. However, certain extra operations will then need to be performed for each sample (since the extra calculation needed for "each stage" becomes necessary for each sample when each stage contains only one sample). As a general theme, designs are described for the position calculation part of the procedure that achieve desirable trade-offs between precision of the calculated results, word length / dynamic range of the processing elements, and the number and type of mathematical operations involved in the processing (for example, shift, addition and multiplication operations).
For example, the described techniques and tools allow flexible precision calculations using B-bit arithmetic (for example, 32 bits). This allows an SVC encoder/decoder to flexibly accommodate different picture sizes without having to convert to a different arithmetic (for example, 16-bit or 64-bit arithmetic) for the calculations. With flexible precision B-bit arithmetic (for example, 32 bits), an encoder/decoder can dedicate a flexible number of bits to the fractional component. This allows increased precision for the calculations as the number of bits required to represent the integer component decreases (for example, for a smaller frame size). As the number of bits required to represent the integer component increases (for example, for a larger frame size), the encoder/decoder can use more bits for the integer component and fewer bits for the fractional component, which reduces precision but maintains the B-bit arithmetic. In this way, switching between different precisions and different frame sizes is greatly simplified. This section includes specific details for an illustrative implementation. However, it should be noted that the specifics described here may vary in other implementations in accordance with the principles described here.
A. Introduction and Position Calculation Principles

Techniques for calculating position and phase information are described that result in lower computational requirements without any significant loss of precision. For example, the described techniques can reduce computational requirements significantly; in particular, nominal dynamic range requirements are reduced dramatically (by dozens of bits). Considering the variety of possible chroma positions that can be used in base and enhancement layers, it is desirable to find a solution that provides proper placement of re-sampled chroma samples relative to luma samples. Therefore, the described techniques allow adjustments to be made to the calculated positions for video formats with different relationships between luma and chroma positions.

A prior up-sampling method designed for extended spatial scalability uses a more difficult method of calculating the position and phase information when the lower resolution layer is up-sampled: it uses a scaled approximation of the inverse of a denominator, which causes amplification of the rounding error of the inverse approximation as the numerator increases (that is, as the up-sampling procedure moves from left to right, or from top to bottom). In comparison, the techniques described here have excellent precision and simplify the calculation. In particular, the techniques described here reduce the dynamic range and the amount of right shifting in the position calculations by dozens of bits. For example, a technique is described for calculating position information to obtain an integer position and a phase position variable i, where 0 <= i <= 2^F - 1, for use in spatial up-sampling for SVC. The described techniques apply the re-sampling procedure to the scalable video coding application rather than to reference picture re-sampling. In this spatially scalable video coding application, certain simplifications can apply. Rather than a general warping procedure, we only need a picture resizing operation. This permits a separable design for each dimension.
B. Position Calculation Design

Consider a problem statement, in each dimension (x or y), as the production of a sequence of samples that lie conceptually on a real-valued range from L to R > L in the new (up-sampled) array. This real-valued range is to correspond to a range from L' to R' > L' in the referenced lower resolution array. For a position T in the new array, where L <= T <= R, we need to calculate the position in the reference array that corresponds to the position in the new array. This is the position T' = L' + (T - L) * (R' - L') / (R - L).

Now, instead of considering the rescaling of the range from L to R, we define an integer M > 0 and consider rescaling the range from L to L + 2^M by the same rescaling ratio (R' - L') / (R - L). The corresponding range in the referenced sample coordinates then runs from L' to R", where R" = L' + 2^M * (R' - L') / (R - L). If M is large enough, that is, if M >= Ceil(Log2(R - L)), then R" >= R'. (Assume for now that this constraint holds, in order to explain the later concepts, although this constraint is not actually necessary for proper operation of the equations.) Now we can use linear interpolation between positions L' and R" for the position calculations. The position L maps to position L', and a position T >= L maps to position ((2^M - (T - L)) * L' + (T - L) * R") / 2^M. This converts the denominator of the operation to a power of 2, which reduces the computational complexity of a division operation by allowing it to be replaced by a binary right shift. Appropriate modifications can then be made to integerize the calculations. The values of L' and R" are rounded to integer multiples of 1 / 2^G, where G is an integer, so that L' is approximated by k / 2^G and R" is approximated by r / 2^G, where k and r are integers. Using this convention, position T maps to position ((2^M - (T - L)) * k + (T - L) * r) / 2^(M+G). Now assume that the relevant values of T and L are integer multiples of 1 / 2^J, where J is an integer, so that T - L = j / 2^J. Using this convention, position T maps to position ((2^(M+J) - j) * k + j * r) / 2^(M+G+J).

Recall from section IV above that the fractional phase of the re-sampling filter is to be an integer in units of 1 / 2^F. So the calculated position, in these units, is Round(((2^(M+J) - j) * k + j * r) / 2^(M+G+J-F)), or

t' = ((2^(M+J) - j) * k + j * r + 2^(M+G+J-F-1)) >> (M+G+J-F)    (16)

or, more simply,

t' = (j * C + D) >> S    (17)

where

S = M + G + J - F    (18)
C = r - k    (19)
D = (k << (M+J)) + (1 << (S-1))    (20)

The only error produced by the method described here (assuming no error in the representation of L and R and L' and R'), other than the rounding of the calculated position to the nearest multiple of 1 / 2^F (which is an error that is present in both designs), is the rounding error from rounding the position R" to the nearest multiple of 1 / 2^G. This quantity is very small if G + M is relatively large. Roughly speaking, this source of error is bounded by a magnitude of approximately (T - L) / 2^(G+M+1). The word length requirements for calculating the results are modest, and modulo arithmetic allows the integer part of the result to be split off to minimize the word length, or allows the calculation to be decomposed in other similar ways as well. F, for example, can be 4 or more. (For some applications, F = 3 or F = 2 may suffice.) Illustrative values of J include J = 1 for luma position calculations and J = 2 for chroma sample positions.
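A minimal C sketch of this basic position computation (illustrative only; it assumes the integerized endpoints k and r, approximating L' * 2^G and R" * 2^G, have already been derived):

#include <stdint.h>

typedef struct { int S; int64_t C, D; } PosParams;

static PosParams pos_params(int64_t k, int64_t r, int M, int G, int J, int F)
{
    PosParams p;
    p.S = M + G + J - F;                               /* Equation (18) */
    p.C = r - k;                                       /* Equation (19) */
    p.D = (k << (M + J)) + ((int64_t)1 << (p.S - 1));  /* Equation (20) */
    return p;
}

static int64_t ref_position(const PosParams *p, int64_t j)
{
    return (j * p->C + p->D) >> p->S;                  /* Equation (17) */
}

The returned position is in units of 1 / 2^F of a reference-picture sample, so its F least significant bits select the filter phase and the remaining bits give the integer sample coordinate.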
The rationale for these illustrative values of J can be found below.

1. First Illustrative Simplified Position Calculation Technique Using Signed B-bit Arithmetic

If R' > 0 and L' >= -R', then all the positions t' to be calculated in the picture to be up-sampled, expressed as an integer in units of 1 / 2^F, will lie between -2^Z and 2^Z - 1, where Z = Ceil(Log2(R')) + F. If the word length of the calculation (j*C + D) is B bits, and we assume the use of two's-complement signed arithmetic, then we require that B - 1 >= Z + S. Highest precision is achieved when the constraint is tight, that is, when B - 1 = Z + M + G + J - F. For reasonably small picture sizes (for example, for levels up to level 4.2 in the current H.264/MPEG-4 AVC standard), B = 32 can be used as the word length. Other values of B can also be used. For very large pictures, a larger B can be used. The calculations can also easily be decomposed into smaller word length sub-calculations for use on 16-bit or other processors. The two remaining degrees of freedom are M and G. Their relationship is flexible as long as G is large enough to avoid any rounding error when representing L' as k / 2^G. Thus, based on issues discussed in the following section for SVC, we can simply choose G = 2, which produces M = B + F - (G + J + Z + 1), that is, M = 32 + 4 - (2 + 1 + Z + 1), that is, M = 32 - Z. For example, if we want to up-sample the luma array of a picture that has a width of 1000 luma samples with B = 32 and L' = 0, we can use F = 4, G = 2, J = 1, M = 18, S = 17, and Z = 14 using this first illustrative position calculation technique.

When T is very close (or equal) to R and R' is very close (or equal) to an integer power of 2, especially when the fractional part of (T - L) * (R' - L') in units of 1 / 2^F is large (for example, greater than 1/2), it may be hypothetically possible for the upper bound to be exceeded by 1. We do not consider such cases further here, although the adjustments to handle such cases are straightforward.

2. Second Illustrative Simplified Position Calculation Technique Using Unsigned B-bit Arithmetic

If all the positions to be calculated in the low resolution picture are greater than or equal to 0, which can be made true by adding an appropriate offset to the origin of the coordinate system, then it may be a better choice to calculate t' = (j*C + D) >> S using unsigned integer arithmetic rather than signed two's-complement arithmetic. This allows one more bit of dynamic range without overflow in the calculations (that is, we can use B bits of dynamic range magnitude rather than B - 1 bits), thereby increasing M (or G) and S each by 1 and thus also increasing the precision of the calculated results. Thus, after including an offset E to adjust the origin of the coordinate system, the form of the calculation would be t' = ((j*C + D') >> S) + E rather than just t' = (j*C + D) >> S.

Additional detail is provided for this more accurate method involving unsigned arithmetic by identifying when the origin offset E is not needed, as follows:
- Choose values for B, F, G, J, and Z as described above.
- Set M = B + F - (G + J + Z).
- Calculate S, C, and D as previously specified in Equations (18), (19) and (20), respectively, where D is calculated as a signed number.
- If D is greater than or equal to zero, no origin offset is needed (that is, no use of E) and the calculation can be performed simply as t' = (j*C + D) >> S using unsigned arithmetic, and the result will have greater precision than the first illustrative position calculation technique described in section V.B.1 above.
In addition to improving precision by enabling calculation using unsigned integers, offsetting the origin can sometimes be used to provide improved precision by enabling a decrease in the value of Z. Without the origin offset, Z is a function of R'. But with the origin offset, we can make Z a function of R' - L', which will make the calculation more accurate if this results in a smaller value of Z.

Additional detail is provided for this more accurate method involving unsigned arithmetic by showing one way to offset the origin, deriving D' and E as follows:
- Choose values for B, F, G, and J as described above.
- Set Z = Ceil(Log2(R' - L')) + F.
- Set M = B + F - (G + J + Z).
- Calculate S, C, and D as specified above in Equations (18), (19) and (20), respectively, where D is calculated as a signed number.
- Set E = D >> S.
- Set D' = D - (E << S).

The position calculation can then be performed as t' = ((j*C + D') >> S) + E. If D' and E (and M, S, and Z) are calculated in this way, the mathematical result of the equation t' = ((j*C + D') >> S) + E will always be theoretically the same as the result of the equation t' = (j*C + D) >> S, except that the value of (j*C + D) can sometimes fall outside the range of values from 0 to 2^B - 1, while the value of (j*C + D') does not. For example, if we want to up-sample the luma array of a picture that has a width of 1000 luma samples with B = 32 and L' = 0, we can use F = 4, G = 2, J = 1, M = 19, S = 18, and Z = 14 using this second illustrative position calculation technique.

Another possibility that would work equally well, rather than offsetting the origin so that all values of j*C + D' are non-negative and thus allowing use of the B-bit calculation range from 0 to 2^B - 1 using unsigned arithmetic, would be to offset the origin further to the right by an additional 2^(B-1) to allow use of the B-bit calculation range from -2^(B-1) to 2^(B-1) - 1 using signed arithmetic. As in the first illustrative position technique in the previous section, "corner case" adjustments may be necessary when T is very close (or equal) to R and R' - L' is very close (or equal) to a power of 2.

3. Illustrative Multiple Stage Techniques for Position Calculation

The designs discussed so far perform the calculation using the same equation, for example t' = ((j*C + D') >> S) + E, with the same values of the variables C, D', S, and E for all the values of j that cover the sample range to be generated (that is, for all values of T between L and R). We now discuss how this assumption can be relaxed, allowing greater precision and/or reduced computational dynamic range requirements. Ordinarily, the re-sampling procedure proceeds from left to right (or from top to bottom) to generate a sequence of consecutive samples at equally spaced positions. In the second illustrative position technique described in section V.B.2 above, we showed how changing the origin using the offset parameter E can make good use of the B-bit dynamic range of the register used to calculate the (j*C + D') part of the position calculation. Recall that in the previous section, only the S least significant bits of D were retained in D', and the rest were moved into E. Thus the remaining major concern for the calculation of (j*C + D') is the magnitude of j*C. Recall that T and L are integer multiples of 1 / 2^J.
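A sketch of the D' and E derivation just described in section V.B.2 (illustrative only; it assumes the document's two's-complement arithmetic right shift behavior for negative values):

#include <stdint.h>

/* Splits D into an offset E and a D' that keeps only the S least
   significant bits, per section V.B.2. Assumes >> on a negative
   int64_t is an arithmetic shift (two's complement). */
static void split_origin_offset(int64_t D, int S, int64_t *Dprime, int64_t *E)
{
    *E = D >> S;               /* whole part moves into the origin offset */
    *Dprime = D - (*E << S);   /* only the S LSBs of D remain             */
}

The position is then computed as t' = ((j*C + *Dprime) >> S) + *E.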
Ordinarily, we perform the up-sampling procedure to generate a sequence of samples at increasing positions in the higher resolution picture, with a spacing of 2^J between consecutively generated samples. Thus we want to calculate the positions t'_i corresponding to the positions T_i = (p + i * 2^J) / 2^J for i = 0 to N - 1, for some values of p and N. This procedure can be summarized in pseudo-code as shown in pseudo-code 1200 of Figure 12 for some values of p and N. As i increases toward N, the value of q increases, and the maximum value of q must stay within the available dynamic range of B bits. The maximum value calculated for q is (p + (N - 1) * 2^J) * C + D'.

Now, instead of generating all the samples in one loop in this way, consider dividing the procedure into multiple stages, for example two stages. In a two-stage procedure, the first stage generates the first N0 < N samples, and the second stage generates the remaining N - N0 samples. Since p is a constant with respect to the loop, we can move its impact into D' and E before the first stage. This results in the two-stage procedure illustrated in pseudo-code 1300 of Figure 13. At the beginning of each stage in pseudo-code 1300, the origin is reset so that all but the S least significant bits of the first value of q for the stage are moved into E (that is, into E0 for the first stage and E1 for the second stage). Thus, the operation of each of the two stages requires a smaller dynamic range. After dividing the procedure into stages in this way, the maximum value of q will be N0 * C' + D'0 or (N - N0 - 1) * C' + D'1, whichever is greater (where C' denotes the per-sample position increment 2^J * C); and since D'0 and D'1 each have no more than S bits of unsigned dynamic range, this will ordinarily be a smaller maximum value than in the single-stage design described previously. The number of samples generated in a stage (that is, N0 for the first stage and N - N0 for the second stage) can affect the dynamic range of the associated calculations. Generally, using a smaller number of samples in each stage results in a smaller dynamic range for the associated calculations.

Each stage can be further divided into stages, and thus the generation of the N total samples can be decomposed further into any number of such smaller stages. For example, the procedure can be divided into stages of equal size so that blocks of, for example, 8 or 16 consecutive samples are generated in each stage. This technique can be used to reduce the required number of dynamic range bits B for calculating q, or to increase the calculation precision (increasing S and G + M) while maintaining the same dynamic range, or to obtain a mixture of these two benefits.

This technique of decomposing the procedure into stages can also be used to perform a continuous re-sampling procedure along a very long sequence of input samples (conceptually, the sequence can be infinitely long), such as when performing sampling rate conversion on samples arriving from an analog-to-digital converter for an audio signal. Clearly, without dividing the procedure into finite-sized stages and resetting the origin incrementally from each stage to the next, an infinitely long sequence of samples could not be processed by the techniques described in the previous sections, since this would require an infinite dynamic range in the processing word length.
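Since pseudo-code 1300 itself is not reproduced in this text, the following C sketch shows one possible rendering of the two-stage idea; it assumes Cp = 2^J * C as the per-sample increment and non-negative q throughout (the unsigned-arithmetic setting):

#include <stdint.h>

static void two_stage_positions(int64_t Cp, int64_t D, int S,
                                int N0, int N, int64_t *out)
{
    int64_t E = D >> S;            /* initial origin offset (E0)          */
    int64_t q = D - (E << S);      /* q starts with only S fraction bits  */
    for (int i = 0; i < N0; i++) { /* first stage                         */
        out[i] = (q >> S) + E;
        q += Cp;
    }
    E += q >> S;                   /* reset origin for second stage (E1)  */
    q -= (q >> S) << S;
    for (int i = N0; i < N; i++) { /* second stage                        */
        out[i] = (q >> S) + E;
        q += Cp;
    }
}

Because multiples of 2^S are only moved between q and E, the output values are identical to those of the single-stage computation; only the dynamic range needed for q changes.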
However, the difficulty of applying the techniques to effectively infinite sequence lengths is not a substantial limitation on such techniques, since application to an effectively infinite length will only be useful when no rounding error is introduced by the representation of the hypothetical benchmark positions L' and R" in integer units representing multiples of 1 / 2^G. Under the circumstances in which the multiple stage position calculation techniques can be applied, they provide a way for the calculations to be performed along an infinitely long sequence of samples without accumulation of "drift" rounding error of any form in the operation of the position calculations through the complete rate conversion procedure.

4. Illustrative Incremental Operation of Position Calculation

An interesting special case of the multiple stage decomposition concept described above is when the number of samples to be produced in each stage is reduced to only one sample per stage. The pseudo-code 1400 in Figure 14 represents a procedure for generating N positions t'_i for i = 0 to N - 1. Since the procedure is described as an up-sampling procedure (although the same principles would also apply to a down-sampling procedure), we know that for each increment of i there is a spacing of 1 in the higher resolution picture, and thus there is an increment of less than or equal to 1 in the lower resolution picture. An increment of 1 in spatial position in the lower resolution picture corresponds to a value of 2^(S+F) for C'. Also, we know that D' < 2^S. Therefore q = C' + D' has a range from 0 to less than 2^(S+F) + 2^S, and therefore q can be calculated with a dynamic range requirement of no more than B = S + F + 1 bits using unsigned integer arithmetic. In this implementation, the dynamic range requirement does not vary with picture size (that is, it does not depend on the value of R' or R' - L'). For scalable video coding and many other such applications, there may be no real need to support up-sampling ratios that are very close to 1. In such applications, we can assume that C' really requires no more than S + F bits.

For example, if we want to up-sample the luma array of a picture that has a width of 1000 luma samples with B = 32 and L' = 0, we can use F = 4, G = 2, J = 1, M = 29, S = 28, and Z = 14 using this method. The result will be so extraordinarily accurate as to make a smaller value of B seem a more reasonable choice. Alternatively, if we want to up-sample the luma array of a picture that has a width of 1000 luma samples with B = 16 and L' = 0, we can use F = 4, G = 2, J = 1, M = 13, S = 12, and Z = 14 using this method. Additional knowledge of the circumstances of the up-sampling operation to be performed can provide additional optimization opportunities. For example, if the up-sampling ratio is significantly greater than two, the dynamic range requirement is reduced by another bit, and so on for up-sampling ratios greater than four, eight, sixteen, etc. None of the changes (relative to the illustrative multiple stage position calculation technique discussed above) described with reference to the illustrative incremental position calculation technique in this section affects the actual calculated values of the positions t'_i for given values of C, D and S. Only the dynamic range needed to support the calculation changes. The inner loop in pseudo-code 1400 for this form of decomposition does not require any multiplication operations.
This fact can be beneficial, providing reduced computation time on some processors.
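One possible rendering of this multiplication-free inner loop in C (pseudo-code 1400 itself is not reproduced in this text; this sketch assumes the unsigned setting with C' fitting in S + F bits, so that q fits in 32 bits for the example values above):

#include <stdint.h>

static void positions_incremental(uint32_t Cp, uint32_t D, int S,
                                  int N, int32_t *out)
{
    const uint32_t mask = (1u << S) - 1;
    uint32_t q = D & mask;           /* D': the S fraction bits of D        */
    int32_t  E = (int32_t)(D >> S);  /* origin offset absorbs the rest      */
    for (int n = 0; n < N; n++) {
        out[n] = E;                  /* q's whole part is already in E      */
        q += Cp;                     /* per-sample increment C' = 2^J * C   */
        E += (int32_t)(q >> S);      /* fold the new whole part into E      */
        q &= mask;                   /* keep only S fraction bits           */
    }
}

The output positions are in units of 1 / 2^F, exactly as in the single-stage and two-stage forms; only additions, shifts, and masks are used per sample.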
5. Additional Observations

For common re-sampling ratios such as 2:1, 3:2, etc., and in any case in which rounding is not necessary to approximate the positions L' and R" as integers in units of 1 / 2^G, there is no rounding error at all when using these methods (other than whatever rounding error may be introduced when rounding the final result to an integer in units of 1 / 2^F, which is an error that will be present regardless of the position calculation method).
C. Luma and Chroma Positions and Relationships

Assume exact alignment of the new complete (up-sampled) picture and the reference picture arrays. Relative to the luma sampling grid index coordinates, the positions L and R in the current picture coordinates are L = -1/2 and R = W - 1/2, where W is the number of samples in the picture vertically or horizontally, depending on the relevant re-sampling dimension. Equivalently, we can set the origin of the picture spatial coordinate system half a sample to the left of (or above) grid index position 0 and add 1/2 when converting from spatial picture coordinates to grid index values, thus avoiding the need to deal with negative numbers when calculations are performed in the spatial coordinate system. The positions L' and R' in the referenced (lower resolution) picture are referenced to its sampling grid coordinates in the same way, where in this case W is the number of samples in the referenced picture rather than the new picture.

For the chroma sampling grid (whether in the new picture or the referenced picture), the solution is somewhat less straightforward. To establish the designated alignment of chroma samples relative to luma, consider the picture rectangle that is represented by the chroma samples to be the same as the rectangle that is represented by the luma samples. This produces the following cases:

- Horizontally, for 4:2:0 chroma sampling types 0, 2, and 4 (see Figure 5D), the current picture coordinates are defined by L = -1/4 and R = W - 1/4.
- Horizontally, for 4:2:0 chroma sampling types 1, 3, and 5 (see Figure 5D), the current picture coordinates are defined by L = -1/2 and R = W - 1/2.
- Vertically, for 4:2:0 chroma sampling types 2 and 3 (see Figure 5D), the current picture coordinates are defined by L = -1/4 and R = W - 1/4.
- Vertically, for 4:2:0 chroma sampling types 0 and 1 (see Figure 5D), the current picture coordinates are defined by L = -1/2 and R = W - 1/2.
- Vertically, for 4:2:0 chroma sampling types 4 and 5 (see Figure 5D), the current picture coordinates are defined by L = -3/4 and R = W - 3/4.
- Horizontally, for 4:2:2 chroma sampling, the current picture coordinates for the 4:2:2 sampling typically used in industry practice are defined by L = -1/4 and R = W - 1/4.
- Vertically, for 4:2:2 chroma sampling, the current picture coordinates for the 4:2:2 sampling typically used in industry practice are defined by L = -1/2 and R = W - 1/2.
- Both horizontally and vertically, for 4:4:4 chroma sampling, the current picture coordinates are defined by L = -1/2 and R = W - 1/2.

Again, an offset can be used to place the origin of the coordinate system sufficiently far to the left of position L to avoid the need to work with negative numbers. The integer coordinates and the fractional phase offset remainder are calculated by adjusting the integer coordinate positions of the samples in the up-sampled array to compensate for the fractional offset L, and then applying the transformation shown at the end of section V.B. Conceptually, shifting the result right by F bits yields the integer coordinate index in the reference picture, and subtracting the left-shifted integer coordinate (shifted by F bits) gives the phase offset remainder.
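A sketch of the 4:2:0 left-edge positions enumerated above as C helpers (the function names are hypothetical; R is W + L in each case):

/* Left picture-edge position L, in the picture's own chroma sample grid
   units, for the 4:2:0 chroma sampling types of Figure 5D listed above. */
static double chroma420_left_edge_horizontal(int chroma_type)
{
    switch (chroma_type) {
    case 0: case 2: case 4: return -0.25;  /* chroma co-sited with left luma */
    default:                return -0.50;  /* types 1, 3, 5: centered chroma */
    }
}

static double chroma420_left_edge_vertical(int chroma_type)
{
    switch (chroma_type) {
    case 2: case 3: return -0.25;  /* co-sited with the top luma line      */
    case 0: case 1: return -0.50;  /* centered between luma lines          */
    default:        return -0.75;  /* types 4, 5: sited on the bottom line */
    }
}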
D. Extra Precision in Position Calculation for Up-sampling

This section describes how to map the position calculation method of the preceding sections onto a specific up-sampling procedure, such as an up-sampling procedure that could be used in the SVC extension of H.264. The position calculation is applied in a very flexible way to maximize the precision for both the luma and chroma channels in various chroma formats, as well as for both progressive and interlaced frame formats. The techniques described in this section can be varied depending on the implementation and for different up-sampling procedures.

In the position calculations described above (in the previous sections V.A-C), the re-scaling parameter (which is the variable C, labeled deltaX (or deltaY) in the following equations) is scaled up by a scaling factor equal to 2^J (where J = 1 for luma and 2 for chroma) to form the increment added to generate each sample position from left to right or top to bottom. The scaling was selected so that the scaled increment fits in 16 bits.

1. Maximum Precision for Scaled Position Calculation

A direct way to apply the position calculation method is to scale up the scaling parameter by a scaling factor equal to 2^J, where J = 1 for luma and 2 for chroma, to form the increment added to generate each sample position from left to right or top to bottom. The scaling parameters are then selected to ensure that the scaled increment fits in a specific word length such as 16 bits. A more flexible design is described in the following sections to maximize position precision.

a. Luma Channel

The "direct" luma position calculation method can be summarized with the following illustrative equations for F = 4 and S = 12 (along the horizontal direction):

deltaX = Floor(((BaseImageWidth << 15) + (ScaledBaseWidth >> 1)) / ScaledBaseWidth)
xf = ((2 * (xP - ScaledBaseLeftOffset) + 1) * deltaX - 30720) >> 12

Here, BaseImageWidth is the horizontal resolution of the base layer or low resolution picture; ScaledBaseWidth is the horizontal resolution of the region or window of the high resolution picture; deltaX is the intermediate re-scaling parameter, which in this case is a rounded approximation of 32768 times the inverse of the up-sampling ratio; xP represents the sample position in the high resolution picture; ScaledBaseLeftOffset represents the relative position of the picture window in the high resolution picture; and Floor() denotes the largest integer less than or equal to its argument. The constant value 30720 results from adding 2^(S-1) as the rounding offset prior to the right shift and subtracting 2^S * 2^F / 2 for the half-sample offset of the luma sampling grid reference location, as discussed at the beginning of the previous section V.C. It is notable that each increment of xP results in an increment of 2 * deltaX within the equations, and the LSB of the quantity 2 * deltaX is always zero, so one bit of computational precision is essentially wasted.
Extra precision can be obtained, without any significant increase in complexity, by changing these equations to:

deltaX = Floor(((BaseImageWidth << 16) + (ScaledBaseWidth >> 1)) / ScaledBaseWidth)
xf = ((xP - ScaledBaseLeftOffset) * deltaX + (deltaX >> 1) - 30720) >> 12

or to a (slightly) more accurate form as follows:

deltaXa = Floor(((BaseImageWidth << 16) + (ScaledBaseWidth >> 1)) / ScaledBaseWidth)
deltaXb = Floor(((BaseImageWidth << 15) + (ScaledBaseWidth >> 1)) / ScaledBaseWidth)
xf = ((xP - ScaledBaseLeftOffset) * deltaXa + deltaXb - 30720) >> 12

The latter of these two forms is suggested because of its superior accuracy and negligible complexity impact (although the difference in accuracy also seems very small). It should be noted that in processing architectures in which division calculations are difficult to perform, having the result of one of these equations can simplify the calculation of the other. The value of deltaXa will always be within the range of 2 * deltaXb plus or minus 1. The following simplified rule can therefore be derived to avoid the need to perform a division operation for the deltaXa calculation:

deltaXa = deltaXb << 1
remainderDiff = (BaseImageWidth << 16) + (ScaledBaseWidth >> 1) - deltaXa * ScaledBaseWidth
if (remainderDiff < 0) deltaXa--
else if (remainderDiff >= ScaledBaseWidth) deltaXa++

b. Chroma Channels

A multiplication factor of four can be used for the chroma channels, rather than the factor of two used for luma, in this part of the design to allow representation of the chroma positions for 4:2:0 sampling (using J = 2 for chroma rather than J = 1 as described for luma). The "direct" equations are therefore:

deltaXC = Floor(((BaseImageWidthC << 14) + (ScaledBaseWidthC >> 1)) / ScaledBaseWidthC)
xfC = ((((4 * (xC - ScaledBaseLeftOffsetC) + (2 + ScaledBaseChromaPhaseX)) * deltaXC) + 2048) >> 12) - 4 * (2 + BaseChromaPhaseX)

Here, BaseChromaPhaseX and ScaledBaseChromaPhaseX represent the chroma sampling grid position offsets for the low and high resolution pictures, respectively. The values of these parameters can be conveyed explicitly as information sent from the encoder to the decoder, or they can have specific values determined by the application. All other variables are similar to those defined for the luma channel, with an additional "C" suffix to denote application to the chroma channel. Each increment of xC results in an increment of 4 * deltaXC within the equation. Therefore, approximately two extra bits of precision can be obtained, without any substantial increase in complexity, by changing these equations to:

deltaXC = Floor(((BaseImageWidthC << 16) + (ScaledBaseWidthC >> 1)) / ScaledBaseWidthC)
xfC = (((xC - ScaledBaseLeftOffsetC) * deltaXC + (2 + ScaledBaseChromaPhaseX) * ((deltaXC + K) >> 2) + 2048) >> 12) - 4 * (2 + BaseChromaPhaseX)

where K = 0, 1, or 2. Using K = 0 avoids an extra operation; using K = 1 or K = 2 will have slightly higher accuracy. Correspondingly, slightly more accurate is the following:

deltaXCa = Floor(((BaseImageWidthC << 16) + (ScaledBaseWidthC >> 1)) / ScaledBaseWidthC)
deltaXCb = Floor(((BaseImageWidthC << 14) + (ScaledBaseWidthC >> 1)) / ScaledBaseWidthC)
xfC = (((xC - ScaledBaseLeftOffsetC) * deltaXCa + (2 + ScaledBaseChromaPhaseX) * deltaXCb + 2048) >> 12) - 4 * (2 + BaseChromaPhaseX)

As with the luma case, the last variant is preferred since the difference in complexity seems insignificant (although the difference in precision also seems very small).
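A sketch of the recommended higher-precision luma variant in C (illustrative only; the identifier names here are English renderings of the translated names in the source text, and are assumptions about the original spelling):

#include <stdint.h>

static int32_t luma_xf(int32_t xP, int32_t BaseImageWidth,
                       int32_t ScaledBaseWidth, int32_t ScaledBaseLeftOffset)
{
    /* Integer division implements Floor() here, since all operands
       involved are non-negative for valid picture dimensions. */
    int64_t deltaXa = (((int64_t)BaseImageWidth << 16) + (ScaledBaseWidth >> 1))
                      / ScaledBaseWidth;
    int64_t deltaXb = (((int64_t)BaseImageWidth << 15) + (ScaledBaseWidth >> 1))
                      / ScaledBaseWidth;
    return (int32_t)(((int64_t)(xP - ScaledBaseLeftOffset) * deltaXa
                      + deltaXb - 30720) >> 12);
}

The returned xf is the reference position with F = 4 fractional bits, so (xf >> 4) selects the reference sample and (xf & 15) selects the filter phase.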
c. Interlaced Field Coordinates

The reference for the coordinate system of a picture is ordinarily based on half-sample positions in luma frame coordinates, which results in the scaling factor of two for luma coordinate reference positions as described above. A half-sample position change in luma frame coordinates corresponds to a quarter-sample position change in 4:2:0 chroma frame coordinates, which is why a factor of four rather than a factor of two is used in the scaling for the chroma coordinates as described above. Horizontally, there is no substantial difference in operation between coded pictures representing a frame and those representing a single field of interlaced video. However, when a coded picture represents a single field, a half-sample position change in vertical luma frame coordinates corresponds to a quarter-sample position change in vertical luma field coordinates. Thus, a scaling factor of four rather than two must be applied in the calculation of the vertical luma coordinate positions. Similarly, when a coded picture represents a single field, a half-sample position change in vertical luma frame coordinates corresponds to a one-eighth-sample position change in vertical chroma field coordinates. Thus, a scaling factor of eight rather than four must be applied in the calculation of the vertical chroma coordinate positions. These scaling factors for calculating vertical coordinate positions in coded field pictures can be incorporated into the calculation of the vertical increment deltaY in the same manner as described above for the increment calculation in coded frame pictures. In this case, because an increased scaling factor is applied, the precision improvement becomes approximately two added bits of precision for luma positions and three added bits of precision for chroma positions (vertically).

2. Restriction and Refinement for 4:2:2 and 4:4:4 Chroma

The position calculation method of section V.D.1.b requires the use of a different multiplication factor for chroma than for luma. This makes sense for 4:2:0 video, and it is also reasonable for 4:2:2 video horizontally, but it is unnecessary for 4:2:2 video vertically and for 4:4:4 video either horizontally or vertically, since in those cases the resolution of luma and chroma is the same and the luma and chroma samples are therefore presumably co-located. As a result, the method of section V.D.1.b may require separate calculations to determine luma and chroma positions even when the resolution of luma and chroma is the same in some dimension and no phase change is intended, only because the rounding would be performed slightly differently in the two cases. This is undesirable, so a different chroma handling is suggested in this section for use with 4:2:2 and 4:4:4 sampling structures.
a. Vertical Positions for 4:2:2, and Horizontal and Vertical Positions for 4:4:4

For the vertical dimension of 4:2:2 video and for both the vertical and horizontal dimensions of 4:4:4 video, there is no apparent need for custom chroma phase control. Therefore, whenever the chroma resolution is the same as the luma resolution in some dimension, the equations for calculating chroma positions should be modified to result in calculation of exactly the same positions for both luma and chroma samples whenever the chroma sampling format has the same resolution for luma and chroma in a particular dimension. One option is simply to set the chroma position variables equal to the luma position variables; another is to set up the chroma position equations so that they produce the same result.

b. Horizontal Positions for 4:2:2

While there is no functional problem with allowing horizontal chroma phase adjustment for 4:2:2 video, if there is only one type of horizontal sub-sampling structure in use for 4:2:2, such as the one that corresponds to the value -1 for ScaledBaseChromaPhaseX or BaseChromaPhaseX in the equations of section V.D.1.b, it may be desirable to consider forcing these values to be used whenever the color sampling format is 4:2:2.

VI. Extensions and Alternatives

The techniques and tools described here can also be applied to multi-resolution video coding using reference picture re-sampling as found, for example, in Annex P of ITU-T International Standard Recommendation H.263. The techniques and tools described here can also be applied not only to up-sampling of picture sample arrays, but also to up-sampling of residual data signals or other signals. For example, the techniques and tools described here can also be applied to up-sampling of residual data signals for reduced-resolution update coding as found, for example, in Annex Q of ITU-T International Standard Recommendation H.263. As another example, the techniques and tools described here can also be applied to up-sampling of residual data signals for prediction of high resolution residual signals from lower resolution residual signals in a spatially scalable video coding design. As a further example, the techniques and tools described here can also be applied to up-sampling of motion vector fields in a design for spatially scalable video coding. As a further example, the techniques and tools described here can also be applied to up-sampling of graphics images, still images, audio sample signals, etc.
Having described and illustrated the principles of my invention with reference to the various described embodiments, it will be recognized that the described embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, procedures, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or specialized computing environments can be used with, or perform operations in accordance with, the teachings described herein. Elements of the described embodiments shown in software can be implemented in hardware and vice versa. In view of the many possible embodiments to which the principles of my invention may be applied, I claim as my invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims (22)

1. - A method comprising: performing re-sampling of picture data according to a horizontal re-sampling scale factor, wherein the re-sampling comprises calculation of a sample value at horizontal position i in a re-sampled array, and wherein the calculation comprises: computing a derived horizontal sub-sample position x in a manner that is mathematically equivalent in result to the formula x = (i * C + D) >> S, where C is derived by approximating a value equal to 2^(S+F) multiplied by an inverse of the horizontal re-sampling scale factor, and where F, C, D, and S are integer values.
2. A method comprising: performing re-sampling of picture data according to a vertical re-sampling scale factor, wherein the re-sampling comprises calculation of a sample value at vertical position j in a re-sampled array, and wherein the calculation comprises: computing a derived vertical sub-sample position y in a manner that is mathematically equivalent in result to the formula y = (j * C + D) >> S, where C is derived by approximating a value equal to 2^(S+F) multiplied by an inverse of the vertical re-sampling scale factor, and where F, C, D, and S are integer values.
3. A method comprising: performing up-sampling of a video picture according to a horizontal up-sampling scale factor and a vertical up-sampling scale factor, wherein the up-sampling comprises calculation of an interpolated sample value at horizontal position i and vertical position j in an up-sampled array, and wherein the calculation comprises: computing a derived horizontal sub-sample position x in a manner that is mathematically equivalent in result to the formula x = (i * C + D) >> S, where C is derived by approximating a value equal to 2^(S+F) multiplied by an inverse of the horizontal up-sampling scale factor, and where F, C, D, and S are integer values; computing a derived vertical sub-sample position y in a manner that is mathematically equivalent in result to the formula y = (j * C + D) >> S, where C is derived by approximating a value equal to 2^(S+F) multiplied by an inverse of the vertical up-sampling scale factor; and interpolating a sample value at the derived sub-sample position x, y.
4. The method according to claim 3, wherein the calculation further comprises: selecting a horizontal re-sampling filter based on the F least significant bits of the derived horizontal sub-sample position x; and selecting lower resolution samples to filter based on the remaining most significant bits of the derived horizontal sub-sample position x; and wherein interpolating a sample value at the derived sub-sample position x, y comprises: interpolating the sample value based on the selected lower resolution samples and using the selected horizontal re-sampling filter.
5. The method according to claim 4, wherein a horizontal re-sampling filter applied for at least one value of the F least significant bits of the derived horizontal sub-sample position x is a finite impulse response filter with more than two non-zero filter tap values.
6. The method according to claim 5, wherein a horizontal re-sampling filter applied for all non-zero values of the F least significant bits of the derived horizontal sub-sample position x is a finite impulse response filter with four non-zero filter tap values.
7. The method according to claim 3, wherein the calculation further comprises: selecting a vertical re-sampling filter based on the F least significant bits of the derived vertical sub-sample position y; and selecting lower resolution samples to filter based on the remaining most significant bits of the derived vertical sub-sample position y; and wherein interpolating a sample value at the derived sub-sample position x, y comprises: interpolating the sample value based on the selected lower resolution samples and using the selected vertical re-sampling filter.
8. The method according to claim 7, wherein a vertical re-sampling filter applied for at least one value of the F least significant bits of the derived vertical sub-sample position y is a finite impulse response filter with more than two non-zero filter tap values.
9. The method according to claim 8, wherein a vertical re-sampling filter applied for all non-zero values of the F least significant bits of the derived vertical sub-sample position y is a finite impulse response filter with four non-zero filter tap values.
10. The method according to claim 3, wherein the up-sampling is performed using one or more Mitchell-Netravali re-sampling filters.
11. The method according to claim 3, wherein the up-sampling is performed using one or more Catmull-Rom re-sampling filters.
12. The method according to claim 3, wherein at least one of the vertical or horizontal values of F, C, D, or S differs based at least in part on whether the sample value is a chroma sample value or a luma sample value.
13. The method according to claim 3, wherein a form that is mathematically equivalent in result to the formula x = (i * C + D) >> S comprises an implementation of the formula x = ((i * C + D) >> S) + E, where E is an integer value.
14. The method according to claim 3, wherein the upsampling is performed using one or more resampling filters having filter tap values controlled by bandwidth control parameters.
15. The method according to claim 3, wherein the upsampling is performed in a layered, spatially scalable video decoding process.
16. The method according to claim 3, wherein the upsampling is performed in a spatially scalable video encoding process.
17. The method according to claim 3, wherein the upsampling is performed to resample a reference picture.
18. The method according to claim 3, wherein the value of F is equal to 4 and the value of S is equal to 12.
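With the claim-18 values F = 4 and S = 12, positions carry four fractional bits (sixteen filter phases) and C approximates 2^16 times the inverse scale factor. A small self-contained check, assuming a 2:1 upsampling from 720 to 1440 samples and D = 0:

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    const int S = 12, F = 4;          /* claim 18: F = 4, S = 12        */
    const int orig = 720, up = 1440;  /* assumed 2:1 upsampling example */
    int64_t C = (int64_t)llround(ldexp((double)orig / up, S + F)); /* 32768 */

    for (int i = 0; i < 6; i++) {
        int64_t x = ((int64_t)i * C) >> S;  /* position in 1/16 samples */
        printf("i=%d -> x=%lld (sample %lld, phase %lld/16)\n", i,
               (long long)x, (long long)(x >> F), (long long)(x & 15));
    }
    /* i = 5 prints sample 2, phase 8: position 2.5, as expected for 2:1 */
    return 0;
}
```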
19. The method according to claim 3, wherein the approximating comprises rounding.
20. The method according to claim 3, wherein the inverse is an approximate inverse.
21. The method according to claim 3, wherein at least one of the integer values F, C, D, and S differs between the horizontal computation and the vertical computation.
22. A method comprising: upsampling a video picture according to an upsampling scale factor, wherein the upsampling comprises computation of an interpolated sample value at a horizontal position i and a vertical position j in an upsampled array, and wherein the computation comprises: computing a derived horizontal sub-sample position x in a manner that is mathematically equivalent in result to the formula x = ((2^J * i + Q) * C + D) >> S, where C is derived by approximating a value equal to 2^(S+F) multiplied by an inverse of the upsampling scale factor, and where F, C, D, S, J, and Q are integer values; computing a derived vertical sub-sample position y in a manner that is mathematically equivalent in result to the formula y = ((2^J * j + Q) * C + D) >> S; and interpolating a sample value at the derived sub-sample position x, y.
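A sketch of the claim-22 variant: the upsampled index is pre-scaled by 2^J and offset by Q before the multiply-shift. One plausible reading, offered here as an assumption, is that this lets the same integer machinery express sub-sample grid alignments (for example, J = 1 with an odd Q placing positions halfway between output samples, as arises in chroma siting); only the formula itself comes from the claim.

```c
#include <stdint.h>

/* x = ((2^J * i + Q) * C + D) >> S, computed entirely in integers. */
static int64_t derived_position_v2(int i, int64_t c, int64_t d,
                                   int s, int j, int q)
{
    return ((((int64_t)i << j) + q) * c + d) >> s;
}
```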
MXMX/A/2008/008762A 2006-01-06 2008-07-04 Resampling and picture resizing operations for multi-resolution video coding and decoding MX2008008762A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US60/756,846 2006-01-06
US60/786,573 2006-03-27
US60/829,515 2006-10-13

Publications (1)

Publication Number Publication Date
MX2008008762A (en) 2008-09-26

Similar Documents

Publication Publication Date Title
US9319729B2 (en) Resampling and picture resizing operations for multi-resolution video coding and decoding
US8107571B2 (en) Parameterized filters and signaling techniques
US7116831B2 (en) Chrominance motion vector rounding
US7620109B2 (en) Sub-pixel interpolation in motion estimation and compensation
US7110459B2 (en) Approximate bicubic filter
US7305034B2 (en) Rounding control for multi-stage interpolation
US20030156646A1 (en) Multi-resolution motion estimation and compensation
KR20150034699A (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
KR101562343B1 (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
MX2008008762A (en) Resampling and picture resizing operations for multi-resolution video coding and decoding
KR101934840B1 (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
KR101810198B1 (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
KR101700411B1 (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
KR20190004247A (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
KR20190004246A (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes