WO2012058394A1 - Systems and methods for adaptive video coding
Systems and methods for adaptive video coding
- Publication number
- WO2012058394A1 (PCT/US2011/058027)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sampling
- video
- video data
- coding
- error value
Classifications
- All within H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
- H04N19/117—Adaptive coding: filters, e.g. for pre-processing or post-processing
- H04N19/132—Adaptive coding: sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/146—Adaptive coding: data rate or code amount at the encoder output
- H04N19/147—Adaptive coding: data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/172—Adaptive coding: the coding unit being an image region, e.g. an object, the region being a picture, frame or field
- H04N19/189—Adaptive coding: characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Adaptive coding: specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/587—Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
- H04N19/59—Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
- H04N19/60—Transform coding
- H04N19/85—Pre-processing or post-processing specially adapted for video compression
Description
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like.
- Many digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.
- a video encoding method comprises receiving video data, and determining a sampling error value at each of a plurality of downsampling ratios.
- the video encoding method may also comprise, for a bit rate, determining a coding error value at each of the plurality of downsampling ratios and summing the sampling error value and the coding error value at each of the plurality of downsampling ratios.
- the video encoding method may also comprise selecting one of the plurality of downsampling ratios based on the sum of the sampling error value and the coding error value at the selected downsampling ratio.
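Taken together, these steps describe a search over candidate downsampling ratios for the one with the smallest combined error. A minimal sketch of that selection loop, assuming hypothetical sampling_error() and coding_error() estimators (the patent does not name such functions):

```python
def select_downsampling_ratio(ratios, sampling_error, coding_error, bit_rate):
    """Return the candidate ratio minimizing sampling error + coding error."""
    best_ratio, best_total = None, float("inf")
    for m in ratios:
        total = sampling_error(m) + coding_error(m, bit_rate)  # sum per ratio
        if total < best_total:
            best_ratio, best_total = m, total
    return best_ratio
```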
- a video decoding method comprises receiving compressed video data and receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the video decoding method may also comprise decoding the compressed video data to form reconstructed video data, upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data, and outputting the filtered video data.
- a video decoding system comprises a video decoder.
- the video decoder may be configured to receive compressed video data, and receive an indication of a selected sampling ratio, where the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the video decoder may also be configured to decode the compressed video data to form reconstructed video data, upsample the reconstructed video data to increase the resolution of the reconstructed video data, and output the upsampled video data.
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the adaptive coding techniques described herein;
- FIG. 2 is a block diagram illustrating an example video encoder that may implement techniques for the adaptive encoding of a video signal;
- FIG. 3 is a block diagram illustrating an example video decoder that may implement techniques for the adaptive decoding of a video signal;
- FIG. 4 shows a coding scheme applying a codec directly on an input video;
- FIG. 5 shows an exemplary embodiment utilizing coding with down-sampling and up-sampling stages;
- FIGS. 6A and 6B show the processing illustrated in FIG. 5 decomposed into a sampling component and a coding component, respectively;
- FIG. 7 is a look-up table for α in accordance with one non-limiting embodiment;
- FIG. 8 is a look-up table for β in accordance with one non-limiting embodiment;
- FIGS. 9A, 9B and 9C illustrate searching strategies to find the sampling ratio M in accordance with various non-limiting embodiments;
- FIGS. 10A and 10B are process flows in accordance with one non-limiting embodiment
- FIG. 11 is a block diagram of a horizontal downsampling process having a downsampling ratio of A/B in accordance with one non-limiting embodiment;
- FIG. 12 illustrates an example downsampling process
- FIG. 13 illustrates an example upsampling process
- FIG. 14 illustrates an example Gaussian window function
- FIG. 15 illustrates pixels during an example upsampling process
- FIG. 16 illustrates an exemplary encoder architecture in accordance with one non-limiting embodiment
- FIG. 17 illustrates an exemplary decoder architecture in accordance with one non-limiting embodiment
- FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder
- FIG. 19A is a system diagram of an example communications system in which one or more disclosed embodiments may be implemented.
- FIG. 19B is a system diagram of an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 19A; and
- FIGS. 19C, 19D, and 19E are system diagrams of example wireless transmit/receive units (WTRUs) that may be used within the communications system illustrated in FIG. 19A.
- Wireless communications technology has dramatically increased the wireless bandwidth and improved the quality of service for mobile users.
- although wireless communications technology has greatly improved, the fast-growing demand for video content, such as high-definition (HD) video content for example, over the mobile Internet brings new challenges for mobile video content providers, distributors and carrier service providers.
- Video and multimedia content that is available on the wired web has driven users to desire equivalent on-demand access to that content from a mobile device.
- a much higher percentage of the world's mobile data traffic is becoming video content.
- Mobile video has the highest growth rate of any application category measured within the mobile data portion of the Cisco VNI Forecast at this time.
- the block size for processing video content under current compression standards, such as the H.264 (AVC) standard for example, is 16x16. Therefore, current compression standards may be good for small-resolution video content, but not for higher quality and/or higher resolution video content, such as HD video content for example.
- video coding standards may be created that may further reduce the data rate needed for high quality video coding, as compared to the current standards, such as AVC for example.
- systems and methods are needed to meet the growing demand of high quality and/or resolution video content delivery over mobile Internet.
- systems and methods may be provided for high quality and/or resolution video content compatibility with current standards, such as HD video content compatibility with the AVC video compression standard for example.
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the adaptive coding techniques described herein.
- system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16.
- Source device 12 and destination device 14 may comprise any of a wide range of devices.
- source device 12 and destination device 14 may comprise wireless receive/transmit units (WRTUs), such as wireless handsets or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless.
- the systems and methods described herein, however, are not necessarily limited to wireless applications or settings.
- communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.
- source device 12 includes a video source 18, video encoder 20, a modulator (generally referred to as a modem) 22 and a transmitter 24.
- Destination device 14 includes a receiver 26, a demodulator (generally referred to as a modem) 28, a video decoder 30, and a display device 32.
- video encoder 20 of source device 12 may be configured to apply the adaptive coding techniques described in more detail below.
- a source device and a destination device may include other components or arrangements.
- source device 12 may receive video data from an external video source 18, such as an external camera.
- destination device 14 may interface with an external display device, rather than including an integrated display device.
- the data stream generated by the video encoder may be conveyed to other devices without the need for modulating the data onto a carrier signal, such as by direct digital transfer, wherein the other devices may or may not modulate the data for transmission.
- the illustrated system 10 of FIG. 1 is merely one example.
- the techniques described herein may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "CODEC.” Moreover, the techniques of this disclosure may also be performed by a video preprocessor.
- Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 include video encoding and decoding components.
- system 10 may support one-way or two-way video transmission between devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.
- the source device may be a video streaming server for generating encoded video data for one or more destination devices, where the destination devices may be in communication with the source device over wired and/or wireless communication systems.
- Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics- based data as the source video, or a combination of live video, archived video, and computer- generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20.
- a video capture device such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider.
- video source 18 may generate computer graphics- based data as the source video, or a combination of live video, archived video, and computer- generated
- the encoded video information may then be modulated by modem 22 according to a communication standard, and transmitted to destination device 14 via transmitter 24.
- Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation.
- Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
- Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information.
- the information communicated over channel 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of macroblocks and other coded units, e.g., GOPs.
- Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
- communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media.
- Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet.
- Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media.
- Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
- Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC).
- the techniques of this disclosure are not limited to any particular coding standard.
- Other examples include MPEG-2 and ITU-T H.263.
- video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
- the ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT).
- the H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, and dated March, 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/ AVC standard or specification.
- the Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
- Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof.
- Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, media aware network element, or the like.
- a video sequence typically includes a series of video frames.
- a group of pictures generally comprises a series of one or more video frames.
- a GOP may include syntax data in a header of the GOP, a header of one or more frames of the GOP, or elsewhere, that describes a number of frames included in the GOP.
- Each frame may include frame syntax data that describes an encoding mode for the respective frame.
- Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data.
- a video block may correspond to a macroblock, a partition of a macroblock, or a collection of blocks or macroblocks.
- the video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
- Each video frame may include a plurality of slices.
- Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks.
- Many popular video coding standards, such as H.263, MPEG-2, MPEG-4, H.264/AVC (Advanced Video Coding), and HEVC (High Efficiency Video Coding), utilize motion-compensated prediction techniques.
- An image or a frame of a video may be partitioned into multiple macroblocks and each macroblock can be further partitioned.
- Macroblocks in an I-frame may be encoded by using the prediction from spatial neighbors (that is, other blocks of the I-frame).
- Macroblocks in a P- or B-frame may be encoded by using either the prediction from their spatial neighbors (spatial prediction or intra-mode encoding) or areas in other frames (temporal prediction or inter-mode encoding).
- Video coding standards define syntax elements to represent coding information. For example, for every macroblock, H.264 defines an mb_type value that represents the manner in which a macroblock is partitioned and the method of prediction (spatial or temporal).
- Video encoder 20 may provide individual motion vectors for each partition of a macroblock. For example, if video encoder 20 elects to use the full macroblock as a single partition, video encoder 20 may provide one motion vector for the macroblock. As another example, if video encoder 20 elects to partition a 16x16 pixel macroblock into four 8x8 partitions, video encoder 20 may provide four motion vectors, one for each partition. For each partition (or sub-macroblock unit), video encoder 20 may provide an mvd (motion vector difference) value and a ref_idx value to represent motion vector information. The mvd value may represent an encoded motion vector for the partition, relative to a motion predictor.
- the ref_idx (reference index) value may represent an index into a list of potential reference pictures, that is, reference frames. As an example, H.264 provides two lists of reference pictures: list 0 and list 1. The ref_idx value may identify a picture in one of the two lists. Video encoder 20 may also provide information indicative of the list to which the ref_idx value relates.
- the ITU-T H.264 standard supports intra prediction in various block partition sizes, such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8x8 for chroma components, as well as inter prediction in various block sizes, such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 for luma components and corresponding scaled sizes for chroma components.
- "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of the block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels.
- an NxN block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value.
- the pixels in a block may be arranged in rows and columns.
- blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction.
- blocks may comprise NxM pixels, where M is not necessarily equal to N.
- Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video block data representing pixel differences between coded video blocks and predictive video blocks.
- a video block may comprise blocks of quantized transform coefficients in the transform domain.
- a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks.
- Each slice may be an independently decodable unit of a video frame.
- frames themselves may be decodable units, or other portions of a frame may be defined as decodable units.
- coded unit or “coding unit” may refer to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, a group of pictures (GOP) also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques.
- the H.264 standard supports motion vectors having one-quarter-pixel precision. That is, encoders, decoders, and encoders/decoders (CODECs) that support H.264 may use motion vectors that point to either a full pixel position or one of fifteen fractional pixel positions. Values for fractional pixel positions may be determined using adaptive interpolation filters or fixed interpolation filters. In some examples, H.264-compliant devices may use filters to calculate values for the half-pixel positions, then use bilinear filters to determine values for the remaining one-quarter-pixel positions. Adaptive interpolation filters may be used during an encoding process to adaptively define interpolation filter coefficients, and thus the filter coefficients may change over time when adaptive interpolation is performed.
- Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients.
- the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
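As a rough illustration of the bit-depth reduction described above (a sketch only; real codecs quantize with quantization parameters and rounding rules rather than a plain shift):

```python
def reduce_bit_depth(value, n, m):
    # round an n-bit value down to an m-bit value (n > m)
    assert n > m >= 1
    return value >> (n - m)
```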
- entropy coding of the quantized data may be performed, e.g., according to content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology.
- a processing unit configured for entropy coding, or another processing unit may perform other processing functions, such as zero run length coding of quantized coefficients and/or generation of syntax information such as coded block pattern (CBP) values, macroblock type, coding mode, maximum macroblock size for a coded unit (such as a frame, slice, macroblock, or sequence), or the like.
- Video encoder 20 may further send syntax data, such as block-based syntax data, frame- based syntax data, slice-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header.
- the GOP syntax data may describe a number of frames in the respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame.
- Video decoder 30 may receive a bitstream including motion vectors encoded according to any of the techniques of this disclosure. Accordingly, video decoder 30 may be configured to interpret the encoded motion vector. For example, video decoder 30 may first analyze a sequence parameter set or slice parameter set to determine whether the encoded motion vector was encoded using a method that keeps all motion vectors in one motion resolution, or using a method where the motion predictor was quantized to the resolution of the motion vector. Video decoder 30 may then decode the motion vector relative to the motion predictor by determining the motion predictor and adding the value for the encoded motion vector to the motion predictor.
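The final reconstruction step can be sketched as follows; the helper is hypothetical, and in H.264 the motion predictor is actually derived from neighboring partitions:

```python
def reconstruct_motion_vector(mvd, predictor):
    # decoded motion vector = motion predictor + motion vector difference
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])
```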
- Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof.
- Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC).
- An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
- Video encoder 200 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks.
- Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame.
- Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence.
- Intra-mode (I-mode) may refer to any of several spatial-based compression modes, and inter-modes such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several temporal-based compression modes.
- Although components for inter-mode encoding are depicted in FIG. 2, it should be understood that video encoder 200 may further include components for intra-mode encoding. However, such components are not illustrated for the sake of brevity.
- the input video signal 202 is processed block by block.
- the video block unit may be 16 pixels by 16 pixels (i.e., a macroblock (MB)).
- the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG is developing the next generation video coding standard called High Efficiency Video Coding (HEVC).
- in HEVC, extended block sizes (called a "coding unit" or CU) are used.
- a CU can be up to 64x64 pixels and down to 4x4 pixels.
- a CU can be further partitioned into prediction units (PUs), for which separate prediction methods are applied.
- Each input video block (MB, CU, PU, etc.) may be processed by using spatial prediction unit 260 and/or temporal prediction unit 262.
- Spatial prediction (i.e., intra prediction) uses pixels from already coded neighboring blocks in the same video picture to predict the current video block, reducing the spatial redundancy inherent in the video signal.
- Temporal prediction (i.e., inter prediction or motion compensated prediction) uses pixels from the already coded video pictures to predict the current video block.
- Temporal prediction reduces temporal redundancy inherent in the video signal.
- Temporal prediction for a given video block is usually signaled by one or more motion vectors which indicate the amount and the direction of motion between the current block and one or more of its reference block(s).
- the mode decision and encoder controller 280 in the encoder chooses the prediction mode, for example based on a rate-distortion optimization method.
- the prediction block is then subtracted from the current video block at adder 216 and the prediction residual is transformed by transformation unit 204 and quantized by quantization unit 206.
- the quantized residual coefficients are inverse quantized at inverse quantization unit 210 and inverse transformed at inverse transformation unit 212 to form the reconstructed residual.
- the reconstructed block is then added back to the prediction block at adder 226 to form the reconstructed video block.
- Further in-loop filtering, such as a deblocking filter and adaptive loop filters 266, may be applied on the reconstructed video block before it is put in the reference picture store 264 and used to code future video blocks.
- the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are sent to the entropy coding unit 208 to be further compressed and packed to form the bitstream 220.
- the systems and methods described herein may be implemented, at least partially, within the spatial prediction unit 260.
- FIG. 3 is a block diagram of a block-based video decoder in accordance with one non-limiting embodiment.
- the video bitstream 302 is first unpacked and entropy decoded at entropy decoding unit 308.
- the coding mode and prediction information are sent to either the spatial prediction unit 360 (if intra coded) or the temporal prediction unit 362 (if inter coded) to form the prediction block.
- the residual transform coefficients are sent to inverse quantization unit 310 and inverse transform unit 312 to reconstruct the residual block.
- the prediction block and the residual block are then added together at 326.
- the reconstructed block may further go through in-loop filtering unit 366 before it is stored in reference picture store 364.
- the reconstructed video 320 may then be sent out to drive a display device, as well as used to predict future video blocks.
- a pre-processing and/or post-processing system architecture may compress raw video data and/or transcode already encoded video data, such as a bit stream for example, with further compression through jointly controlling the transform-domain quantization and spatial-domain down-sampling, without changing the standard format of the video stream.
- the pre-processing and/or post-processing system architecture may encode and/or decode video data in any format, such as H.263, MPEG-2, Flash, MPEG-4, H.264/AVC, HEVC or any similar multimedia format for example.
- These, and similar, formats may use such video compression methods as discrete cosine transform (DCT), fractal compression methods, matching pursuit, or discrete wavelet transform (DWT) for example, as described above.
- a macroblock (MB) size of 16x16 is a limitation of various existing compression standards, such as H.264/AVC.
- within each MB, pixels may be partitioned into several block sizes dependent on the prediction modes.
- the maximum size of any block may be 16x16 and any two MBs may be independently transformed and quantized.
- This technique may provide very high efficiency for CIF/QCIF and other similar-resolution content.
- however, it may not be efficient for video content of higher resolutions, such as 720p, 1080i/1080p, and/or similar or even higher resolutions, because such content exhibits much higher correlation among pixels in local areas.
- the specified 16x16 MB size may limit further compression of utilizing such correlation information across adjacent MBs.
- the codec elements may include four types of information: 1) motion information, such as motion vector and reference frame index for example; 2) residual data; 3) MB header information, such as MB type, coded block pattern, and/or quantization parameters (QP) for example; 4) sequence-, picture-, and/or slice-layer syntax elements.
- While the motion information and residual data may be highly content-dependent, the MB header information and/or syntax elements may be relatively constant; thus, the MB header information and/or syntax elements may represent the overhead in the bit stream. Given the content and/or encoding profile, a higher compression ratio of an encoder may be achieved by reducing the bit rate of residual data.
- overhead may consume a large part of the bit stream for transmission and storage. Having such a large part of the bit stream consumed by overhead may cause an encoder, such as an H.264 encoder for example, to have low efficiency.
- the pre-processing and/or post-processing in accordance with the systems and methods described herein may lead to less overhead, alignment of the motion compensation accuracy and reconstruction accuracy, enhancement of residual accuracy, and/or lower complexity and/or memory requirements. Less overhead may be produced due to the downsampling performed in the pre-processing, as the number of MBs may be reduced in proportion to the downsampling rate. Thus, the near-constant MB header and/or slice-layer syntax elements may be reduced.
- the motion compensation accuracy and reconstruction accuracy may also be aligned in the pre-processing and/or post-processing of video data.
- the number of motion vector differences (MVD) may be reduced.
- the reduction in MVD may save bits for encoding motion information.
- the saved bits may be used to encode the prediction error in low bit rate scenarios. Therefore, the reconstruction accuracy may be improved by aligning the accuracy of motion compensation and accuracy of quantized prediction error.
- the pre-processing and/or post-processing of video data may also enhance residual accuracy.
- after downsampling, the same transform block size may correspond to a larger transform block size in the original frames.
- for example, an 8x8 transform block size may correspond to a transform block size of 16x16 at a 1/4 downsampling rate.
- since the quantization steps may be the same for the transform coefficients in an encoder, such as an H.264 encoder for example, the encoder may lose information in both high frequency and low frequency components. Therefore, the pre-processing and/or post-processing of video data described herein may preserve the accuracy of low frequency components better than traditional encoders in the high resolution and low bit rate encoding cases, which may produce better subjective quality.
- the upsampling process in a decoder may be used to interpolate the pixels to recover the original frames.
- the pre-processing and/or post-processing of video data may also result in less complexity and/or memory requirements.
- the complexity and/or memory requirements of encoding (or transcoding) may be reduced to the level associated with the lower-resolution, downsampled video.
- the complexity and/or memory requirements of decoding may likewise be reduced to that level.
- These encoding and/or decoding processes may facilitate the application of lower resolution encoders and/or decoders, such as the encoding in mobile phones and other resource-limited devices for example.
- these encoding and/or decoding processes may facilitate the incorporation and/or application of the H.264 encoder and/or decoder in mobile phones.
- FIG. 4 shows a coding scheme applying a codec (i.e., an H.264/AVC codec) directly on an input video.
- FIG. 5 shows an exemplary embodiment utilizing coding with down- and up-sampling stages. Compared with the approach illustrated in FIG. 4, the approach illustrated in FIG. 5 may be able to allocate more bits to code the intra- and inter- prediction errors in the coding step; hence it may obtain a better reconstruction with higher visual quality.
- although down-sampling introduces information loss (specifically, of the high frequency components), when the operating bit rate is low due to network limitations, better reconstruction at the coding stage may outweigh the detail loss in the downsampling process; hence better overall visual quality is provided. Additionally, computation power can be saved by coding a smaller (i.e., downsampled) video. However, since downsampling causes information loss prior to the coding process, if the original video is downsampled too much, the information loss introduced upfront may outweigh the benefit of higher fidelity in the coding stage. Thus, the systems and methods described herein generally seek to balance the information loss introduced during downsampling and the information loss introduced during coding.
- the processes described herein may derive a plurality of downsampling ratios, and select a downsampling ratio that reduces a total amount of distortion introduced during the down- sampling and coding stages.
- the selected down-sampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters.
- the selected down-sampling ratio may be the down-sampling ratio that optimally reduces the overall distortion.
- the flexibility provided by the filters described herein may be more useful than other filters, such as anti-aliasing filters that may provide only 2x2 down- sampling and up-sampling, for example.
- the downsampling ratio 2x2 is so high that the high frequency components are significantly lost and cannot be compensated even with lossless coding. Therefore, at high bit-rates, the sampling ratio may be adjusted to provide a tradeoff between resolution reduction and detail preservation.
- the downsampling ratio, denoted M, is a variable which may be determined as a function of various parameters, such as the available data transmission capacity, the Quality of Service Class Identifier (QCI) of the bearer associated with the video, and characteristics of the input video signal. For example, if the data transmission capacity is relatively plentiful for the input video signal, then an H.264/AVC encoder will have enough bits to code the prediction errors; in this case, the value of M may be set approaching 1.0.
- conversely, when transmission capacity is scarce, a larger value of M may be selected (resulting in more downsampling), as the information loss due to the downsampling process will be well compensated by the smaller coding error in the coding stage.
- since the data transmission capacity is usually represented by a bit rate, which may have fine granularity, the value of M may be very flexible in various embodiments.
- systems and methods are provided to determine a selected sampling ratio M based, at least in part, on the available data transmission capacity and the input video signal. Given the selected sampling ratio M, a dedicated filter may be calculated to downsample the video for coding and upsample the decoded video for display.
- Various techniques for designing anti-aliasing filters for arbitrary rational-valued sampling ratios are also described in more detail below with regard to FIGS. 11-15.
- the video input is denoted f, the output of the conventional codec (FIG. 4) is denoted f₁, and the output of an example codec in accordance with the systems and methods described herein (FIG. 5) is denoted f₂.
- the reconstruction error of the codec in FIG. 4 may be defined as equation (1): ε₁² = E[(f − f₁)²] (1)
- the reconstruction error of the codec in FIG. 5 may be defined as equation (2): ε₂² = E[(f − f₂)²] (2)
- the codec in FIG. 5 performs better than the codec in FIG. 4 if ε₂² is smaller than ε₁².
- the gap between ε₁² and ε₂² may be increased (and in some cases maximized) by finding the M shown in equation (3): M* = argmax_M (ε₁² − ε₂²) (3)
- because ε₁² does not depend on M, equation (3) is simplified and is stated as equation (4): M* = argmin_M ε₂² (4)
- the sampling ratio M may be identified, such that the reconstruction error (ε₂²) of the codec shown in FIG. 5 is reduced.
- the sampling ratio M may be determined which will result in reconstruction error reaching the minimum (or at least substantially near the minimum).
- the sampling ratio M is selected from among a set of predetermined sampling ratios, where the selected ratio M provides the smallest reconstruction error from among the set of predetermined sampling ratios.
- M is a scalar, such that the horizontal and vertical directions have the same ratio. Given the resolution of the video W x H, the resolution of the downsampled video is (W/M) x (H/M).
- in this way, the sample aspect ratio (SAR) and the picture aspect ratio (PAR) may remain unchanged.
- this disclosure is not so limited, however; some embodiments may utilize a coding process with uneven ratios applied in each direction.
- the processing illustrated in FIG. 5 may be decomposed into the sampling component (FIG. 6A) and the coding component (FIG. 6B).
- in the sampling component shown in FIG. 6A, for the input original video sequence f, upsampling with a factor M 608 is applied right after downsampling with a factor M 602 to generate f₃; that is, the error between f and f₃ is caused only by sampling. It may be referred to as the "downsampling error," denoted σ_d², and defined by equation (5): σ_d² = E[(f − f₃)²] (5)
- likewise, the error introduced by the coding component (FIG. 6B) may be referred to as the "coding error," denoted σ_c², and defined by equation (6): σ_c² = E[(d₁ − d₂)²] (6)
- The relationship among ε₂² (equation (2)), σ_d, and σ_c may thus be defined by equation (7): ε₂² = σ_d² + σ_c² + 2λ·σ_d·σ_c (7)
- λ is a weighting factor in the range of [0, 1].
- the weighting factor λ is set to 1 for the exemplary embodiments described herein.
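Treating the reconstruction of equation (7) above as an assumption (the original formula is garbled in this extract), the combined error can be computed as:

```python
import math

def total_reconstruction_error(sigma_d_sq, sigma_c_sq, lam=1.0):
    # equation (7): eps_2^2 = sigma_d^2 + sigma_c^2 + 2*lam*sigma_d*sigma_c,
    # with weighting factor lam in [0, 1] (set to 1 in the text)
    return sigma_d_sq + sigma_c_sq + 2.0 * lam * math.sqrt(sigma_d_sq * sigma_c_sq)
```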
- f may be filtered by an anti-aliasing filter, which may be a type of low-pass filter, before f is downsampled. Additional details regarding example filters are described below with regard to FIGS. 11-15.
- the output of the sampling stage, denoted f₃ (FIG. 6A), is a blurred version of f, because f₃ no longer possesses the energy in frequency components higher than the cut-off frequency of the anti-aliasing filter applied to f. Therefore, in some embodiments, the sampling error can be measured in the frequency domain by measuring the energy of the high frequency components that exist in f but are lost in f₃.
- the energy distribution of f can be modeled based on the real Power Spectral Density (PSD) or the estimated PSD, as described in more detail below.
- other techniques may be used to assess the sampling ratio's effect on the video signal's frequency content.
- the PSD S_xx(ω₁, ω₂) may be calculated by applying the 2-D discrete-time Fourier transform (DTFT) to the autocorrelation function R(x_h, x_v), as in equation (9): S_xx(ω₁, ω₂) = Σ_{x_h} Σ_{x_v} R(x_h, x_v)·e^{−j(ω₁·x_h + ω₂·x_v)} (9)
- in practice, R(x_h, x_v) may be an estimate based on a set of video signals. Applying the 2-D DTFT to the estimated R(x_h, x_v) produces an estimated PSD, which may no longer be consistent.
- in one embodiment, the PSD is estimated by the periodogram of the random field, as given in equation (10): S_xx(ω₁, ω₂) = |X_x(ω₁, ω₂)|² / (W·H) (10)
- x[w,h] is one frame in the video sequence f; X_x(ω₁, ω₂) is x[w,h]'s representation in the frequency domain.
- the video sequence f may consist of consistent content, such as a single shot.
- S_xx(ω₁, ω₂) calculated based on one typical frame x[w,h], e.g., the first frame, in f may represent the energy distribution of the whole sequence f.
- alternatively, S_xx(ω₁, ω₂) can be the average of a plurality of PSDs.
- the techniques for estimating the PSD of the whole sequence may vary. For example, in one embodiment a plurality of frames x₁[w,h], x₂[w,h], etc. may be picked out from f at a regular interval, e.g., one second, and a plurality of corresponding PSDs S_xx1(ω₁, ω₂), S_xx2(ω₁, ω₂), etc. may be calculated and averaged to generate S_xx(ω₁, ω₂).
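A compact sketch of this periodogram-averaging estimate (equation (10)) using NumPy; selecting frames at a regular interval is left to the caller:

```python
import numpy as np

def estimate_psd(frames):
    """Average the 2-D periodograms of the given frames (equation (10))."""
    psds = []
    for frame in frames:
        x = np.asarray(frame, dtype=np.float64)
        X = np.fft.fft2(x)                      # frequency-domain representation of x[w,h]
        psds.append((np.abs(X) ** 2) / x.size)  # periodogram, normalized by W*H
    return np.mean(psds, axis=0)
```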
- in another embodiment, the video sequence f is divided into I segments, where each segment consists of a group of successive frames (for example, such segmentation may be based on content, motion, texture, the structure of edges, etc.), and has an assigned weight w_i; the whole-sequence PSD may then be formed as the weighted average of the per-segment PSDs.
- the PSD S xx may be modeled using formulas, as shown in equations (13), (14) and (15):
- S_xx(ω₁, ω₂) ≈ F(ω₁, ω₂, b) (13)
- b = [b₀, b₁, ..., b_{n−1}] is a vector containing the arguments of the function F(·).
- in one embodiment, the function F(·) used to model S_xx has one parameter, as shown in equation (14), where K is a factor to ensure energy conservation. Since the exact total energy in the spatial domain is unknown (since x[w,h] is unavailable), in some embodiments it may be estimated as shown in equation (15).
- b₀ is an argument which may be determined by the resolution and content of the video sequence.
- for selecting b₀, the video content is classified into three categories: simple, medium, and tough.
- Empirical values of b 0 for different resolutions and context in accordance with one non-limiting embodiment are shown in Table 1.
- since the ratio M is a rational number, it can be represented as A/B, with A ≥ B.
- a downsampled video has the resolution (W·B/A) × (H·B/A).
- the proportion of the reduced resolution is equal to (1 − B/A).
- the proportion of the lost frequency components is also equal to (1 − B/A), and all these lost components are located in the high frequency band, if the anti-aliasing filter applied to f has a sharp cut-off frequency at (B/A)·π.
- the PSD of the filtered signal may be estimated from S_xx(ω₁, ω₂) by setting the values of S_xx(ω₁, ω₂) for ω₁, ω₂ ∈ [−π, −(B/A)·π] ∪ [(B/A)·π, π] equal to zero, as shown in equation (16).
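For concreteness, a small sketch of these geometry computations for a rational ratio M = A/B (function and variable names are illustrative, not from the patent):

```python
from fractions import Fraction

def downsampling_geometry(W, H, a, b):
    """Downsampled resolution and lost-frequency proportion for M = A/B, A >= B."""
    assert a >= b > 0
    inv_ratio = Fraction(b, a)                       # 1/M = B/A
    new_w, new_h = int(W * inv_ratio), int(H * inv_ratio)  # (W*B/A) x (H*B/A)
    lost_proportion = 1.0 - float(inv_ratio)         # equals 1 - B/A
    return new_w, new_h, lost_proportion
```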
- the downsampling error σ_d² may be calculated by equation (18): σ_d² = (1/(4π²)) ∬ [S_xx(ω₁, ω₂) − S̃_xx(ω₁, ω₂)] dω₁ dω₂ (18), where S̃_xx(ω₁, ω₂) is the truncated PSD of equation (16).
- the downsampling error σ_d² provided by equation (18) provides an indication of the difference in high frequency energy content between the input video signal and the video signal sampled at a given downsampling rate.
- Other techniques may be used to generate the downsampling error σ_d².
- the downsampling error σ_d² may be obtained by determining the mean squared error (MSE) between the downsampled and upsampled video signal f₃ and the input video signal f.
- the downsampling error σ_d² may be obtained by applying the anti-aliasing filter to the input video signal f and determining the MSE between the filtered f and the original input video f.
- the downsampling error σ_d² may be obtained by applying a high-pass filter that has the same cut-off frequency as the aforementioned anti-aliasing filter to the input video signal f and determining the average energy per pixel of the high-pass filtered f.
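A sketch of this last variant, using an ideal sharp cut-off at (B/A)·π in the DFT domain as a stand-in for the anti-aliasing filters described later:

```python
import numpy as np

def downsampling_error_highpass(frame, a, b):
    """Average energy per pixel above the cut-off (B/A)*pi (ideal high-pass)."""
    x = np.asarray(frame, dtype=np.float64)
    Wd, Hd = x.shape
    X = np.fft.fft2(x)
    w1 = np.abs(np.fft.fftfreq(Wd) * 2.0 * np.pi)   # radian frequencies, axis 0
    w2 = np.abs(np.fft.fftfreq(Hd) * 2.0 * np.pi)   # radian frequencies, axis 1
    cutoff = (b / a) * np.pi
    passband = (w1[:, None] < cutoff) & (w2[None, :] < cutoff)
    X_high = np.where(passband, 0.0, X)             # keep only high frequencies
    # Parseval for the 2-D DFT: sum|x|^2 = sum|X|^2 / (W*H)
    return np.sum(np.abs(X_high) ** 2) / (Wd * Hd) ** 2
```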
- the coding error σ_c² may be estimated by a rate-distortion (R-D) model, as given in equation (19), which expresses the coding distortion as a function of the bits per pixel.
- r is the average number of bits allocated to each pixel, i.e., bits per pixel (bpp).
- r may be calculated by equation (20): r = (R × M_h × M_v) / (fps × W × H) (20); a computational sketch follows the symbol definitions below.
- fps is the frame rate, i.e., the number of frames captured each second
- M_h and M_v are the sampling ratios in the horizontal and vertical directions, respectively
- W is the horizontal resolution
- H is the vertical resolution
- R is the bit rate.
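A direct transcription of equation (20) as reconstructed above (the formula body was dropped in this extract, so the reconstruction is an assumption):

```python
def bits_per_pixel(R, fps, W, H, m_h=1.0, m_v=1.0):
    # equation (20): average bpp of the downsampled video, whose
    # resolution is (W / m_h) x (H / m_v)
    return (R * m_h * m_v) / (fps * W * H)
```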
- the bit rate R may be acquired, or otherwise deduced, by a variety of techniques.
- the bit rate R may be provided by a user of the coding system.
- a network node associated with the coding system such as a video server or media-aware network element, may monitor the bit rates associated with various video streams.
- the video encoder may then query the network node to request a bit rate indication for a particular video stream.
- the bit rate may change over time, such as during handovers or IP Flow Mobility (IFOM) functionality associated with a user device receiving video.
- the encoder may receive messages containing updated target bit rates.
- the bit rate R may be deduced by the decoder from the Quality of Service Class Indicator (QCI) assigned to the video stream. For example, QCIs one through four currently offer guaranteed bit rates (GBR).
- the GBR may be utilized by the video encoder to determine coding error a c 2 .
- the bit rate R may be determined, or otherwise provided, by a user device associated with a decoder. For example, the user device may provide to the encoder through appropriate signaling an estimate of the total aggregate data transmission throughput.
- the bit rate R may be an indication of the throughput through two or more radio access technologies such as a cellular RAT and a non-cellular RAT, for example.
- the RTP/RTCP protocols may be used to ascertain bit rate information. For example, RTP/RTCP may be run in a WRTU and a base station in order to collect the application layer bit rate. This bit rate R may then be utilized in equation (20).
- the R-D model in equation (19) has two parameters, α and β, whose values vary according to factors including, but not limited to, the content of the sequence, the resolution of the sequence, the encoder implementation and configurations, and so forth.
- the coding error σ_c² for a particular sampling ratio may then be calculated.
- the average bits per pixel r may first be determined using equation (20).
- the determined average bits per pixel r may then be used to calculate the coding error σ_c², as described by equation (19).
- the coding error σ_c² may then be calculated for different sampling ratios.
- a new average bits per pixel r may be calculated using new sampling ratio values in equation (20). This new value of r may then be used to solve equation (19).
- off-line training may be utilized to find the values for α and β which most accurately predict, or model, the distortion from the coding process.
- a video may be preprocessed to determine a relationship between the bit-rate and the coding distortion. The determined relationship may then be utilized when determining a sampling ratio as the available bit rate, or target bit rate, changes over time during video transmission.
- the relationship may be influenced by factors including, but not limited to, the content of the video data, the resolution of the video data, the encoder implementation and configurations, and so forth.
- an encoder configured at known settings may encode a given sequence at the full resolution. This simulation may be performed at a range of bit-rates {R_0, R_1, ..., R_{N-1}}, producing a set of distortions {D_0, D_1, ..., D_{N-1}} corresponding to each bit-rate.
- the bit-rates may be normalized to bpp {r_0, r_1, ..., r_{N-1}} using equation (21): r_i = R_i / (fps × W × H)
- the corresponding distortions may be normalized accordingly to mean squared error (MSE), denoted as {d_0, d_1, ..., d_{N-1}}.
- the pairs of normalized bit-rate and distortion (r_i, d_i) (0 ≤ i < N) may be plotted as an R-D curve.
- a numerical optimization algorithm may be used to fit that R-D curve by solving the equation in (22) to find desired values of α_opt and β_opt.
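- a sketch of this fit using SciPy's least-squares curve fitting; the training pairs and the power-law model form are illustrative assumptions:

    import numpy as np
    from scipy.optimize import curve_fit

    # Illustrative normalized R-D pairs (r_i, d_i); real values come from
    # encoding the sequence at several bit rates as described above.
    r_train = np.array([0.05, 0.10, 0.20, 0.40])
    d_train = np.array([45.0, 22.0, 10.5, 5.0])

    def rd_model(r, alpha, beta):
        # Assumed power-law form of equation (19).
        return alpha * r ** (-beta)

    # Solve equation (22) numerically: least-squares fit of the R-D curve.
    (alpha_opt, beta_opt), _ = curve_fit(rd_model, r_train, d_train, p0=(1.0, 1.0))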
- the video sequence, or a segment of the sequence, may be accessible for pre-processing while off-line training is unaffordable for the application because of its high complexity, for example.
- a signal analysis may be performed based on the available part of the video sequence and useful features may be extracted that reflect the characteristics of the video sequence, such as motion, texture, edge, and so forth.
- the extracted features and the values of the parameters α and β have high correlations, and therefore the extracted features may be used to estimate the values of α and β, providing a reduction in coding-induced distortion.
- the video sequence may be analyzed based on the PSD, and two features may be extracted from S_xx.
- One feature that may be utilized is the percentage of energy of the DC component, F_DC.
- the cut-off frequency ω_c represents the speed of PSD decay toward the high-frequency band, with the absolute value of ω_c in the range [0, π].
- F_DC and ω_c may be calculated by equations (23) and (24), respectively:
- F_DC is truncated to the range [0.85, 0.99] and quantized by an H-step uniform quantizer.
- ω_c is truncated to the range [0, 0.9π] and quantized by an L-step uniform quantizer.
- in one embodiment, F_DC is quantized by a 15-step uniform quantizer with reconstruction points at {0.85, 0.86, ..., 0.98, 0.99} and ω_c is quantized by a 10-step uniform quantizer with reconstruction points at {0, 0.1π, ..., 0.8π, 0.9π}.
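- a sketch of this feature quantization; rounding to the nearest reconstruction point is an assumption about the quantizer design:

    import math

    def quantize_features(f_dc, omega_c):
        # Truncate F_DC to [0.85, 0.99] and omega_c to [0, 0.9*pi].
        f_dc = min(max(f_dc, 0.85), 0.99)
        omega_c = min(max(omega_c, 0.0), 0.9 * math.pi)
        # 15 reconstruction points 0.85, 0.86, ..., 0.99 (step 0.01).
        i = round((f_dc - 0.85) / 0.01)
        # 10 reconstruction points 0, 0.1*pi, ..., 0.9*pi (step 0.1*pi).
        j = round(omega_c / (0.1 * math.pi))
        return i, j

    # (i, j) index 2-D look-up tables such as FIG. 7 and FIG. 8 for alpha, beta.
    i, j = quantize_features(0.93, 0.45 * math.pi)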
- Look-up tables for α and β using F_DC and ω_c as indices in accordance with one embodiment are shown in FIG. 7 and FIG. 8, respectively. It is noted that -1.0 in some entries does not indicate a value of α or β; instead, the combinations of F_DC and ω_c that map to entries with value -1.0 do not occur in practice.
- in some cases, none of the frames that represent the typical content of a sequence (e.g., x[w,h] in equation (10)) is accessible for pre-processing to estimate the PSD or, consequently, to extract features from the PSD to analyze the video sequence.
- a mode (referred to herein as a "simplified mode") may be used to estimate α and β.
- the values of α and β may be determined by looking up 2-D tables.
- the pre-defined resolution formats may be commonly used ones, such as CIF, WVGA, VGA, 720p, 1080p, and so forth. In case the actual resolution of the input I is not one of the pre-defined formats, the most similar pre-defined resolution may be used for approximation.
- the content of a video sequence may include motion, texture, structure of edges, and so forth. Given the bit rate, video with simple content may be less degraded after coding than video with complex content. In some embodiments, the content of a video sequence can be classified into several categories from "simple" to "tough", depending on the level of granularity that the application requires.
- the type of content may, for example, be indicated by users based on their prior knowledge of the video; or, when prior knowledge does not exist, the content type may be automatically set to the default value.
- Table 2 may be used as the 2-D look-up tables for the values of α and β. Table 2 indicates values of α and β for different resolutions and content in accordance with various embodiments.
- while the pre-defined resolutions include CIF, WVGA, 720p, and 1080p, and three categories of content (simple, medium, tough) are used, this disclosure is not so limited.
- additional levels of granularity may be included in the table.
- the default content type may be set to "medium.”
- the complexity of the video may be ascertained through a variety of techniques. For example, in one embodiment user input is received which indicates a relative level of complexity. This user input may then be used to determine an appropriate a and ⁇ to be used in equation (19).
- video characteristic information (such as complexity) may be received from a network node that has access to the information. Based on this video information, suitable values of a and ⁇ may be determined (e.g., via a look up table) and subsequently used in equation (19).
- a complexity value for the video may be calculated or estimated from content statistics by prestoring some frames before downsampling the first frame. In this regard, a variety of techniques may be utilized, such as pixel value gradients, histograms, variances, and so forth.
- Identifying the minimum of the overall error is equivalent to finding the minimum of the summation of the sampling error σ_s² and the coding error σ_c².
- techniques for estimating σ_s² and σ_c² in accordance with various non-limiting embodiments are discussed above.
- Various algorithms that may be used to search for the M that reduces, and in some cases minimizes, the overall error are described in more detail below.
- the sampling ratio M for the horizontal and vertical directions must be the same.
- this requirement may serve as a first constraint.
- as a second constraint, for many applications it may be preferred that the downsampled resolution (W/M) × (H/M) be integers for a digital video format. In some applications, however, some cropping and/or padding may be used to obtain an integer number of pixels in either dimension. In any event, with these two constraints, the possible values of M are limited. Denoting the greatest common divisor (GCD) of W and H as G, possible ratios may be represented by equation (25).
- in some embodiments, the output resolution is not only required to be integers, but is also required to be a multiple of K. For example, K equal to 16 aligns the downsampled dimensions with whole macroblocks (MBs).
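- a sketch of this candidate enumeration, assuming equation (25) takes the form M = G/i for integer i (consistent with the GCD constraint above); the K parameter optionally enforces macroblock alignment:

    from math import gcd

    def candidate_ratios(W, H, K=1):
        # Assumed form of equation (25): with G = gcd(W, H), every ratio
        # M = G / i (i = 1..G) yields integer downsampled dimensions.
        G = gcd(W, H)
        ratios = []
        for i in range(1, G + 1):
            w, h = W * i // G, H * i // G   # downsampled width and height
            if w % K == 0 and h % K == 0:   # optional multiple-of-K constraint
                ratios.append(G / i)
        return ratios

    # For 720p with K = 16 (whole macroblocks): M in {5.0, 2.5, 5/3, 1.25, 1.0}.
    print(candidate_ratios(1280, 720, K=16))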
- a search method which finds an appropriate value of M without determining the overall error for all possible values of M is utilized.
- FIG. 9A, 9B, and 9C illustrate searching strategies to find the sampling ratio Mj in accordance with various non-limiting embodiments.
- FIG. 9A shows an exhaustive searching strategy
- FIG. 9B shows searching with large steps
- FIG. 9C shows fine searching.
- M_13 is selected as the sampling ratio in the illustrated embodiment.
- searching may be performed in large steps, as shown in FIG. 9B, in order to reach the range in which the desired M_j is located. Then, a further search with finer steps within that range is conducted, as shown in FIG. 9C.
- M has 24 possible values and the exhaustive search in FIG. 9A calculates the overall error 24 times to find the selected M_j; in comparison, the combination of coarse and fine search in FIG. 9B and FIG. 9C reduces the computations by half.
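- a sketch of the coarse-then-fine strategy of FIG. 9B and FIG. 9C; the candidate list and the overall-error callback are placeholders for the estimators described above:

    def coarse_fine_search(ratios, overall_error, step=4):
        # Coarse pass (FIG. 9B): evaluate every `step`-th candidate only.
        coarse = range(0, len(ratios), step)
        best = min(coarse, key=lambda k: overall_error(ratios[k]))
        # Fine pass (FIG. 9C): exhaustive search around the coarse winner.
        lo, hi = max(0, best - step + 1), min(len(ratios), best + step)
        k = min(range(lo, hi), key=lambda k: overall_error(ratios[k]))
        return ratios[k]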
- the selected sampling ratio may be any suitable ratio that produces an overall error beneath an overall error threshold.
- any one of the sampling ratios resulting in an overall error level beneath the threshold may be selected as a sampling ratio for coding.
- the encoding may proceed with that ratio as the selected sampling ratio.
- the joint event of (M_h, M_v) can have W × H possibilities.
- the exhaustive search that goes through all these possibilities, while possible, may be too time-consuming for most applications.
- the W × H possibilities may be processed using large steps, as shown in equation (29) and equation (30), where Δ_h and Δ_v are integer step sizes for the horizontal and vertical directions, respectively:
- the sampling ratio identified by this strategy may be one of the local minima instead of the global optimum.
- several ratios (M_h1, M_v1), (M_h2, M_v2), and so forth are identified which provide relatively small values of the error.
- a fine search is performed in the neighborhood of each candidate to find the respectively refined ratios (M̂_h1, M̂_v1), (M̂_h2, M̂_v2), and so forth that yield local minimum errors within the given neighborhood.
- the final ratio may then be selected among (M̂_h1, M̂_v1), (M̂_h2, M̂_v2), and so forth as the one yielding the lowest overall error.
- a search with large steps is performed first with the constraint of the same ratio in the two directions, similar to FIG. 9B.
- the ratio found from this first step may be identified as M_j.
- M_j is applied for both horizontal and vertical directions.
- a range [M_a, M_b] may be defined which encloses the desired ratio M_j, that is, M_a ≤ M_j ≤ M_b.
- the constraint of enforcing the same ratio for the horizontal and vertical directions is then relaxed and the following search may be performed to obtain selected sampling ratios for each of the two directions separately.
- the search ranges of the horizontal and vertical ratios, M_h and M_v, are shown in equation (31) and equation (32), respectively:
- the search range of (M_h, M_v) is thereby reduced from W × H to the much smaller set defined by equations (31) and (32). Then, the aforementioned combination of coarse search followed by fine search is applied within this search range to find the final selected subsampling ratios for the horizontal and the vertical directions.
- FIG. 10A illustrates a process flow 1000 for encoding video data in accordance with one non-limiting embodiment.
- video data to be encoded is received.
- a sampling error value is determined at each of a plurality of sampling ratios.
- the sampling error value is determined using a power spectral density (PSD) of the received video data and an estimation of the PSD of downsampled video data.
- a data-based technique may be used to estimate the PSD for the video data.
- a model-based technique may be used to estimate the PSD for the video data.
- a coding error value may be determined at each of a plurality of sampling ratios. The coding error may be based on a given bit rate.
- the bit rate may be received from a network node, such as a video server or an end-user device, for example.
- a coding error model may be developed to provide coding error values for each of the plurality of sampling ratios.
- the coding error model may comprise a first parameter and a second parameter that each independently varies based on characteristics of the received video data. Values for the first and second parameters may be determined using any suitable technique. For example, in one embodiment, the first and second parameters are identified through a curve-fitting process. In another embodiment, the first and second parameters may be identified through consultation of various look-up tables, as described in more detail above.
- the coding error values at 1006 may be determined before the sampling error values at 1004.
- a sampling ratio is selected.
- a plurality of sampling ratios may be selected throughout the duration of the video encoding process. For example, a first sampling ratio may be selected at the beginning of the received video data and one or more additional sampling ratios may subsequently be selected during the encoding event.
- an exhaustive search is performed to identify a selected sampling ratio.
- a non-exhaustive search is performed to identify a selected sampling ratio. For example, only errors associated with a subset of the plurality of sampling ratios may be summed.
- a sampling ratio may be selected. In some embodiments, additional searching may be utilized to further refine the search for the selected sampling ratio.
- the video data may be downsampled at the selected sampling ratio and, at 1016, the downsampled video data may be encoded.
- the encoding process may be re-evaluated to determine an updated sampling ratio.
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio. These horizontal and vertical sampling ratios may be the same or different.
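- process flow 1000 in miniature, as a hedged sketch (the two error estimators are placeholders for the techniques described above):

    def select_sampling_ratio(ratios, sampling_error, coding_error):
        # Steps 1004/1006: estimate both errors at each candidate ratio;
        # step 1008: select the ratio minimizing their sum (exhaustive variant).
        return min(ratios, key=lambda M: sampling_error(M) + coding_error(M))

    # Steps 1014/1016 then downsample the video at the returned ratio and encode.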
- FIG. 10B illustrates a process flow 1050 for decoding video data in accordance with one non-limiting embodiment.
- compressed video data are received.
- the video data may be received from any suitable provider, such as a live video stream or previously stored video.
- an indication of a selected sampling ratio is received.
- the sampling ratio may be based on, for example, a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the block of coefficients is decoded to form reconstructed video data.
- the reconstructed video data is upsampled at the selected sampling ratio to increase the resolution of the reconstructed video data.
- the upsampled video data may be outputted.
- the downsampling process may downsample it by factors a and b for the horizontal and vertical directions, respectively, where a and b are positive rational numbers.
- the output video has the resolution (W/a) × (H/b)
- a and b can be any positive rational numbers, represented by N_h/M_h and N_v/M_v, respectively, where M_h, N_h, M_v, and N_v are all positive integers
- the output of a downsampling process is also a digital video, which has integer numbers of rows and columns of pixels.
- the output resolution is therefore (W × M_h/N_h) × (H × M_v/N_v), with N_h and N_v being factors of W and H, respectively, to satisfy output resolution requirements.
- the upsampling process (i.e., by upsampling unit 1712 in FIG. 17) may have an upsampling ratio equal to the downsampling ratio of the downsampling process which results in the processed video having the same resolution as the original input video.
- the upsampling ratio is decoupled from the downsampling ratio, which may allow for a more flexible upsampling ratio. For example, assuming the video to be upsampled has the resolution W_i × H_i, the upsampling ratios may be set to c and d for the horizontal and vertical directions, respectively, making the resolution of the output video equal to cW_i × dH_i, where c and d are positive rational numbers.
- c and d may be configured before upsampling based on various criteria. For example, in order to make the output video have a resolution greater than or equal to the input resolution, the factors c and d should be greater than or equal to 1.0. Moreover, while c and d can be any positive rational numbers, represented by K_h/L_h and K_v/L_v, respectively, where K_h, L_h, K_v, and L_v are all positive integers, in various embodiments L_h and L_v are factors of W_i and H_i, respectively. Additional criteria for choosing c and d may also be applied.
- FIG. 11 is a block diagram 1100 for a horizontal downsampling process having a downsampling ratio of M_h/N_h.
- the block diagram 1100 comprises upsampling M_h times at block 1102, applying the filter f_d,h at block 1104, and downsampling N_h times at block 1106.
- the width of the output video is W × M_h/N_h.
- the original row (FIG. 12(a)) with the spectrum F (FIG. 12(b)) is first upsampled M h times by inserting zero-valued samples.
- the resulting row is illustrated as X u in FIG. 12(c).
- the spectrum F is squeezed M_h times as shown in FIG. 12(d), denoted as F_u.
- in F_u, the spectral images centered at integer multiples of 2π/M_h are introduced by the zero-insertion and need to be removed by the filter f_d,h (as shown in block 1104 in FIG. 11).
- the cutoff frequency of f_d,h should therefore be π/max(M_h, N_h) (e.g., π/N_h when N_h > M_h), so that both the inserted spectral images and the frequencies that would alias after downsampling by N_h are removed.
- a two-step strategy may be used: applying the horizontal and vertical filters consecutively (in either order) to the original video.
- alternatively, a 2-D non-separable filter f_d,2D may be calculated as the 2-D convolution of f_d,h and f_d,v, and f_d,2D may be applied to the original video directly.
- Designing the upsampling filter may be similar to designing the downsampling filter.
- the horizontal direction may be focused on first, and then it may be extended to the vertical direction.
- a resolution of the input video having a width W_i will be changed to W_i × K_h/L_h after upsampling.
- a window function may be utilized to limit the size of the above- referenced filters.
- Suitable types of the window functions include, but are not limited to, Hanning, Hamming, triangular, Gaussian, and Blackman windows, for example.
- a Gaussian window function expressed in equation (38) is used, where N denotes the length of the filter and σ is the standard deviation of the Gaussian function.
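- a sketch of a Gaussian-windowed low-pass prototype for these filters; the exact parameterization of equation (38) is not reproduced here, so the window below takes its standard deviation in samples as an assumption:

    import numpy as np

    def gaussian_windowed_lowpass(num_taps, cutoff, sigma):
        # `cutoff` in radians, e.g., pi / max(M_h, N_h) for the FIG. 11 chain.
        n = np.arange(num_taps) - (num_taps - 1) / 2.0
        ideal = (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)  # ideal low-pass
        window = np.exp(-0.5 * (n / sigma) ** 2)                # Gaussian window
        taps = ideal * window
        return taps / taps.sum()                                # unity DC gain

    # Example: an 11-tap prototype for 2:1 downsampling (cutoff pi/2).
    f_d_h = gaussian_windowed_lowpass(11, np.pi / 2, sigma=2.0)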
- a two-step strategy may be used: applying the horizontal and vertical filters consecutively (in either order) to the original video.
- alternatively, a 2-D non-separable filter f_u,2D may be calculated as the 2-D convolution of f_u,h and f_u,v, and f_u,2D may be applied to the original video directly.
- although frames may be interpolated to (W × M_h) × (H × M_v) and (W_i × K_h) × (H_i × K_v) as intermediates for downsampling and upsampling, respectively, many of the interpolated pixels may not be used. For instance, in some embodiments, only the needed pixels are picked out to form the final output video with the resolution (W × M_h/N_h) × (H × M_v/N_v) for downsampling (or (W_i × K_h/L_h) × (H_i × K_v/L_v) for upsampling).
- the pixels 1504a, 1504b, 1504c, etc. represent the integer pixels and the white ones 1506 represent inserted zeros.
- the pixels forming the final downsampled row are first selected, as shown in row 1508 of FIG. 15. Then these selected positions may be classified into M_h categories, based on their phases. In one embodiment, the phase of a pixel is determined by its distances from the neighboring integer pixels.
- each of the down-sampling and up-sampling filters (i.e., f_d,h, f_d,v, f_u,h, and f_u,v) is decomposed into a set of phase filters, and each phase filter is used to interpolate the associated pixels.
- in Table 3, the lengths of f_d,h, f_d,v, f_u,h, and f_u,v are denoted as N_d,h, N_d,v, N_u,h, and N_u,v, respectively.
- the decomposition process is provided in Table 3, where i is a non-negative integer and k is the index of the filter.
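- a sketch of the decomposition; collecting every M-th tap into the phase-k filter is the standard polyphase construction and is presented as an assumption about Table 3's exact indexing:

    def polyphase_decompose(taps, M):
        # Phase k keeps taps k, k + M, k + 2M, ...; each phase filter
        # interpolates only the output pixels sharing that phase, so the
        # zeros inserted by M-fold upsampling are never multiplied.
        return [taps[k::M] for k in range(M)]

    # Example: split an assumed 12-tap filter into M_h = 3 phase filters.
    phases = polyphase_decompose(list(range(12)), 3)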
- FIG. 16 and FIG. 17 illustrate example embodiments of architectures including preprocessing and/or post-processing steps and that may be used before, after, and/or concurrently with encoding, decoding, and/or transcoding video data in accordance with the systems and methods described herein.
- the pre-processing and/or post-processing may be an adaptive process including quantization, down-sampling, upsampling, anti-aliasing, low-pass interpolation filtering, and/or anti-blur filtering of video data, for example.
- the pre-processing and/or post-processing of the video data may enable the use of standard encoders and/or decoders, such as H.264 encoders and/or decoders for example.
- FIG. 16 illustrates an exemplary encoder architecture 1600 which includes the processing and pre-processing that may be performed prior to or concurrently with encoding of video data in order to obtain the selected sampling ratio.
- the transform 1608, quantization 1610, entropy encoding 1612, inverse quantization 1614, inverse transform 1616, motion compensation 1620, memory 1618 and/or motion estimation 1624 described above with reference to FIG. 2 may be a part of the encoder processing for the video data.
- the anti-aliasing filter 1604, downsampling unit 1606, and encoder controller 1622 may be a part of the pre-processing steps for encoding the video data.
- These pre-processing elements may be incorporated into an encoder, work independently of the encoder, or be configured to sit on top of the encoder. In any event, after the video data from the input 1602 has been encoded, the encoded video data may be transmitted via a channel 1626 and/or to storage.
- an output buffer may be provided for storing the output encoded video data.
- the buffer fullness may be monitored, or the buffer input and output rates may be compared to determine its relative fullness level, and may indicate the relative fullness level to the controller.
- the output buffer may indicate the relative fullness level using, for example, a buffer fullness signal provided from the output buffer to the encoder controller 1622.
- the encoder controller 1622 may monitor various parameters and/or constraints associated with the channel 1626, computational capabilities of the video encoder system, demands by the users, etc., and may establish target parameters to provide an attendant quality of experience (QoE) suitable for the specified constraints and/or conditions of the channel 1626.
- the target bit rate may be adjusted from time to time depending upon the specified constraints and/or channel conditions. Typical target bit rates include, for example, 64 kbps, 128 kbps, 256 kbps, 384 kbps, 512 kbps, and so forth.
- video data is received from an input 1602, such as a video source.
- the video data being received may include an original or decoded video signal, video sequence, bit stream, or any other data that may represent an image or video content.
- the received video data may be pre-processed by the anti-aliasing filter 1604, downsampling unit 1606, and/or encoder controller 1622 in accordance with the systems and methods described herein.
- the anti-aliasing filter 1604, downsampling unit 1606, and/or encoder controller 1622 may be in communication with one another and/or with other elements of an encoder to encode the received video data for transmission.
- the anti-aliasing filter 1604 may be designed using the techniques described above with respect to FIGS. 11-15.
- the preprocessing of the received video data may be performed prior to or concurrently with the processing performed by the transform, quantization, entropy encoding, inverse quantization, inverse transform, motion compensation, motion estimation, and/or other elements of the encoder.
- the original and/or decoded video data may be transmitted to an anti-aliasing filter 1604 for pre-processing.
- the anti-aliasing filter may be used to restrict the frequency content of the video data to satisfy the conditions of the downsampling unit 1606.
- the anti-aliasing filter 1604 for 2:1 downsampling may be an 11-tap FIR, i.e., [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/64.
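- a sketch applying the quoted 11-tap filter to each row of a luma frame; the symmetric edge padding is an implementation assumption:

    import numpy as np

    AA_TAPS = np.array([1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]) / 64.0

    def antialias_rows(frame):
        # Filter each row of an (H, W) frame ahead of 2:1 horizontal
        # downsampling; edge padding keeps the output width equal to W.
        pad = len(AA_TAPS) // 2
        padded = np.pad(frame, ((0, 0), (pad, pad)), mode="edge")
        return np.stack([np.convolve(row, AA_TAPS, mode="valid") for row in padded])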
- the anti-aliasing filter may be adaptive to the content being received and/or jointly designed with quantization parameters (QP).
- the encoder controller 1622 may determine the selected sampling ratio and communicate with the downsampling unit 1606 during pre-processing of the video data to provide the downsampling unit 1606 with the selected sampling ratio. For example, the encoder controller 1622 may adaptively select the filter types (separable or non-separable), filter coefficients, and/or filter length in any dimension based on the statistics of video data and/or channel data transmission capacity.
- the pre-processing of the video data may include down- sampling the video data using down-sampling unit 1606.
- the down-sampling unit 1606 may downsample at the sampling ratio M, as described in detail above.
- the video data may be transmitted to the downsampling unit 1606 from the anti-aliasing filter 1604.
- the original and/or decoded video data may be transmitted to the downsampling unit 1606 directly.
- the downsampling unit 1606 may downsample the video data to reduce the sampling ratio of the video data. Down-sampling the video data may produce a lower resolution image and/or video than the original image and/or video represented by the video data.
- the sampling ratio M of the downsampling unit 1606 may be adaptive to the received content and/or jointly designed with QP.
- the encoder controller 1622 may adaptively select the downsampling ratio, such as 1/3 or a rational fraction for example, based on the instantaneous video content and/or channel data transmission capacity.
- the pre-processing performed by the anti-aliasing filter 1604 and/or downsampling unit 1606 may be controlled and/or aided by communication with the encoder controller 1622.
- the encoder controller 1622 may additionally, or alternatively, control the quantization performed in the processing of the video data.
- the encoder controller 1622 may be configured to choose the encoding parameters.
- the encoder controller may be content dependent and may utilize motion information, residual data, and other statistics from the video data to determine the encoding parameters and/or pre-processing parameters, such as the sampling ratio M for example.
- FIG. 17 illustrates an exemplary decoder architecture 1700 for the processing and postprocessing that may be performed to decode video data.
- the entropy decoding 1704, inverse quantization 1706, inverse transform 1708, and/or motion compensation 1720 may be a part of the decoder processing for the video data.
- the upsampling unit 1712, low-pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710 may be a part of the post-processing steps for decoding the video data.
- These post-processing elements may be incorporated into the decoder 1700, work independently of the decoder, or be configured to sit on top of the decoder.
- the decoded video data may be transmitted via output 1718, such as to a storage medium or an output device for example.
- video data is received via a channel 1702, such as from an encoder or storage medium for example.
- the video data being received may include an encoded video signal, video sequence, bit stream, or any other data that may represent an image or video content.
- the received video data may be processed using the entropy decoding, inverse quantization, inverse transform, and/or motion compensation, as illustrated in FIG. 3.
- the processing of the encoded video data may be performed prior to or concurrently with the postprocessing.
- the encoded video data may be post-processed by the upsampling unit 1712, low- pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710.
- the decoder controller 1710 may receive an indication of the selected sampling ratio and transmit the selected sampling ratio to the upsampling unit 1712.
- the upsampling unit 1712, low-pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710 may be in communication with one another and/or with other elements of a decoder 1700 to decode the received video data for storage and/or output to a display.
- the low-pass filter 1714 may be designed using the techniques described above with respect to FIGS. 14-18.
- the post-processing of the video data may include upsampling the video data.
- the upsampling ratio may be the selected ratio M_j, as described above.
- the video data may be transmitted to the upsampling unit 1712 after being processed by the decoder 1700 (as illustrated).
- the upsampling unit 1712 may increase the resolution and/or quality of the reconstructed video.
- the upsampling of the video data may correspond to the down-sampling performed on the video data at the pre-processing of the encoder. Similar to the downsampling unit 1606 (FIG. 16), the upsampling unit 1712 may have a dynamic sampling ratio for upsampling the video data.
- the post-processing of the video data may include a low- pass interpolation filter 1714.
- the low-pass interpolation filter may implement anti-aliasing and improve the quality and definition of the video content represented by the video data.
- the low-pass interpolation filter for 1:2 upsampling may include a 4-tap FIR, i.e., [0.25, 0.75, 0.75, 0.25].
- the low-pass interpolation filter 1714 may be adaptive to the content and/or jointly designed with QP.
- the decoder controller may adaptively select the filter types, filter coefficients and/or filter length in any dimension. The selections made by the decoder controller may be based on the statistics and/or syntax in the encoded video data, such as statistics of previous frames and QP of current frame for example, as described in detail above.
- the post-processing of the video data may, in some embodiments, include an anti-blur (or sharpening) filter 1716.
- the anti-blur filter 1716 may be used to compensate the blurriness caused by the down-sampling and/or low-pass filtering.
- the anti-blur filter may include a 2-D Laplacian filter, i.e., [0, 0, 0; 0, 1, 0; 0, 0, 0] + [-1, -1, -1; -1, 8, -1; -1, -1, -1]/5.
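- a sketch constructing the quoted anti-blur kernel; applying it by 2-D convolution to the decoded frame is an implementation assumption:

    import numpy as np

    # Identity plus a scaled 8-neighbor Laplacian, exactly as quoted above.
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    laplacian = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]) / 5.0
    ANTI_BLUR = identity + laplacian

    # Convolving a decoded frame with ANTI_BLUR boosts high frequencies to
    # counter blur from down-sampling and low-pass interpolation.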
- the anti-blur filter may be adaptive to the content and/or jointly designed with QP.
- the decoder controller 1710 may adaptively select the filter types, filter coefficients, and/or filter length in any dimension. The selections may be based on the statistics and/or syntax in the encoded video bit stream, such as statistics of previous frames and QP of current frame for example, as described in more detail above.
- the encoder and decoder performing the pre-processing and post-processing, respectively, may be aware of one another.
- the encoder and decoder may have a communication link (such as communication channel 16 in FIG. 1) that enables transmission of information corresponding to the pre-processing of the video data to the decoder.
- the decoder may transmit information corresponding to the post-processing of the video data to the encoder via the communication link.
- Such a communication link may enable the decoder to adjust the post-processing based on the pre-processing that occurs at the encoder.
- the communication link may enable the encoder to adjust the pre-processing based on the post-processing that occurs at the decoder.
- a similar communication link may also be established with other entities performing the pre-processing and/or post-processing of the video data if the pre-processing and post-processing are not performed at the encoder and decoder, respectively.
- FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder.
- video data 1804 may be received, such as a bit stream, a video signal, video sequence, or any other data that may represent an image or video content.
- the video data may be pre-processed by the anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802.
- the anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802 may be in communication with one another and/or with other elements of an encoder and/or decoder.
- the pre-processing of the received video data may be performed prior to or concurrently with the processing performed by the encoder and/or decoder.
- the video data may be pre-processed as described above with regard to the discussion of the pre-processing of video data in FIG. 16.
- video coded in accordance with the systems and methods described herein may be sent via a communication channel 16, which may include wireline connections and/or wireless connections, through a communications network.
- the communications network may be any suitable type of communication system, as described in more detail below with respect to FIGS. 19A, 19B, 19C, and 19D.
- FIG. 19A is a diagram of an example communications system 1900 in which one or more disclosed embodiments may be implemented.
- the communications system 1900 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
- the communications system 1900 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
- the communications systems 1900 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single- carrier FDMA (SC-FDMA), and the like.
- the communications system 1900 may include wireless transmit/receive units (WTRUs) 1902a, 1902b, 1902c, 1902d, a radio access network (RAN) 1904, a core network 1906, a public switched telephone network (PSTN) 1908, the Internet 1910, and other networks 1912, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
- WTRUs 1902a, 1902b, 1902c, 1902d may be any type of device configured to operate and/or communicate in a wireless environment.
- the WTRUs 1902a, 1902b, 1902c, 1902d may be configured to transmit and/or receive wireless signals and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing compressed video communications.
- the communications systems 1900 may also include a base station 1914a and a base station 1914b.
- Each of the base stations 1914a, 1914b may be any type of device configured to wirelessly interface with at least one of the WTRUs 1902a, 1902b, 1902c, 1902d to facilitate access to one or more communication networks, such as the core network 1906, the Internet 1910, and/or the networks 1912.
- the base stations 1914a, 1914b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1914a, 1914b are each depicted as a single element, it will be appreciated that the base stations 1914a, 1914b may include any number of interconnected base stations and/or network elements.
- the base station 1914a may be part of the RAN 1904, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
- the base station 1914a and/or the base station 1914b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown).
- the cell may further be divided into cell sectors.
- the cell associated with the base station 1914a may be divided into three sectors.
- the base station 1914a may include three transceivers, i.e., one for each sector of the cell.
- the base station 1914a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
- the base stations 1914a, 1914b may communicate with one or more of the WTRUs 1902a, 1902b, 1902c, 1902d over an air interface 1916, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.).
- the air interface 1916 may be established using any suitable radio access technology (RAT).
- the communications system 1900 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
- the base station 1914a in the RAN 1904 and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1916 using wideband CDMA (WCDMA).
- WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
- HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High- Speed Uplink Packet Access (HSUPA).
- the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1916 using Long Term Evolution (LTE) and/or LTE- Advanced (LTE-A).
- the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
- the base station 1914b in FIG. 19A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like.
- the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
- the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
- the base station 1914b and the WTRUs 1902c, 1902d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, etc.) to establish a picocell or femtocell.
- the base station 1914b may have a direct connection to the Internet 1910.
- the base station 1914b may not be required to access the Internet 1910 via the core network 1906.
- the RAN 1904 may be in communication with the core network 1906, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 1902a, 1902b, 1902c, 1902d.
- the core network 1906 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high- level security functions, such as user authentication.
- the RAN 1904 and/or the core network 1906 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1904 or a different RAT.
- the core network 1906 may also be in communication with another RAN (not shown) employing a GSM radio technology.
- the core network 1906 may also serve as a gateway for the WTRUs 1902a, 1902b, 1902c, 1902d to access the PSTN 1908, the Internet 1910, and/or other networks 1912.
- the PSTN 1908 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
- the Internet 1910 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite.
- the networks 1912 may include wired or wireless communications networks owned and/or operated by other service providers.
- the networks 1912 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 1904 or a different RAT.
- the WTRUs 1902a, 1902b, 1902c, 1902d in the communications system 1900 may include multi-mode capabilities, i.e., the WTRUs 1902a, 1902b, 1902c, 1902d may include multiple transceivers for communicating with different wireless networks over different wireless links.
- the WTRU 1902c shown in FIG. 19A may be configured to communicate with the base station 1914a, which may employ a cellular-based radio technology, and with the base station 1914b, which may employ an IEEE 802 radio technology.
- FIG. 19B is a system diagram of an example WTRU 1902.
- the WTRU 1902 may include a processor 1918, a transceiver 1920, a transmit/receive element 1922, a speaker/microphone 1924, a keypad 1926, a display/touchpad 1928, non-removable memory 1930, removable memory 1932, a power source 1934, a global positioning system (GPS) chipset 1936, and other peripherals 1938.
- the processor 1918 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphics processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
- the processor 1918 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1902 to operate in a wireless environment.
- the processor 1918 may be coupled to the transceiver 1920, which may be coupled to the transmit/receive element 1922. While FIG. 19B depicts the processor 1918 and the transceiver 1920 as separate components, it will be appreciated that the processor 1918 and the transceiver 1920 may be integrated together in an electronic package or chip.
- the transmit/receive element 1922 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1914a) over the air interface 1916.
- the transmit/receive element 1922 may be an antenna configured to transmit and/or receive RF signals.
- the transmit/receive element 1922 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
- the transmit/receive element 1922 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1922 may be configured to transmit and/or receive any combination of wireless signals.
- the WTRU 1902 may include any number of transmit/receive elements 1922. More specifically, the WTRU 1902 may employ MIMO technology. Thus, in one embodiment, the WTRU 1902 may include two or more transmit/receive elements 1922 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1916.
- the transceiver 1920 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1922 and to demodulate the signals that are received by the transmit/receive element 1922.
- the WTRU 1902 may have multi-mode capabilities.
- the transceiver 1920 may include multiple transceivers for enabling the WTRU 1902 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
- the processor 1918 of the WTRU 1902 may be coupled to, and may receive user input data from, the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
- the processor 1918 may also output user data to the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928.
- the processor 1918 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1930 and/or the removable memory 1932.
- the non-removable memory 1930 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
- the removable memory 1932 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
- the processor 1918 may access information from, and store data in, memory that is not physically located on the WTRU 1902, such as on a server or a home computer (not shown).
- the processor 1918 may receive power from the power source 1934, and may be configured to distribute and/or control the power to the other components in the WTRU 1902.
- the power source 1934 may be any suitable device for powering the WTRU 1902.
- the power source 1934 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
- the processor 1918 may also be coupled to the GPS chipset 1936, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1902.
- the WTRU 1902 may receive location information over the air interface 1916 from a base station (e.g., base stations 1914a, 1914b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1902 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
- the processor 1918 may further be coupled to other peripherals 1938, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
- the peripherals 1938 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
- FIG. 19C is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment.
- the RAN 1904 may employ a UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the RAN 1904 may also be in communication with the core network 1906.
- the RAN 1904 may include Node-Bs 1940a, 1940b, 1940c, which may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the Node-Bs 1940a, 1940b, 1940c may each be associated with a particular cell (not shown) within the RAN 1904.
- the RAN 1904 may also include RNCs 1942a, 1942b. It will be appreciated that the RAN 1904 may include any number of Node-Bs and RNCs while remaining consistent with an embodiment.
- the Node-Bs 1940a, 1940b may be in communication with the RNC 1942a. Additionally, the Node-B 1940c may be in communication with the RNC 1942b. The Node-Bs 1940a, 1940b, 1940c may communicate with the respective RNCs 1942a, 1942b via an lub interface. The RNCs 1942a, 1942b may be in communication with one another via an Iur interface. Each of the RNCs 1942a, 1942b may be configured to control the respective Node- Bs 1940a, 1940b, 1940c to which it is connected. In addition, each of the RNCs 1942a, 1942b may be configured to carry out or support other functionality, such as outer loop power control, load control, admission control, packet scheduling, handover control, macrodiversity, security functions, data encryption, and the like.
- the core network 1906 shown in FIG. 19C may include a media gateway (MGW) 1944, a mobile switching center (MSC) 1946, a serving GPRS support node (SGSN) 1948, and/or a gateway GPRS support node (GGSN) 1950. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the RNC 1942a in the RAN 1904 may be connected to the MSC 1946 in the core network 1906 via an IuCS interface.
- the MSC 1946 may be connected to the MGW 1944.
- the MSC 1946 and the MGW 1944 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the RNC 1942a in the RAN 1904 may also be connected to the SGSN 1948 in the core network 1906 via an IuPS interface.
- the SGSN 1948 may be connected to the GGSN 1950.
- the SGSN 1948 and the GGSN 1950 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
- the core network 1906 may also be connected to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- FIG. 19D is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment.
- the RAN 1904 may employ an E-UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the RAN 1904 may also be in communication with the core network 1906.
- the RAN 1904 may include eNode-Bs 1960a, 1960b, 1960c, though it will be appreciated that the RAN 1904 may include any number of eNode-Bs while remaining consistent with an embodiment.
- the eNode-Bs 1960a, 1960b, 1960c may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the eNode-Bs 1960a, 1960b, 1960c may implement MIMO technology.
- the eNode-B 1960a for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
- Each of the eNode-Bs 1960a, 1960b, 1960c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in FIG. 19D, the eNode-Bs 1960a, 1960b, 1960c may communicate with one another over an X2 interface.
- the core network 1906 shown in FIG. 19D may include a mobility management gateway (MME) 1962, a serving gateway 1964, and a packet data network (PDN) gateway 1966. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the MME 1962 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via an S1 interface and may serve as a control node.
- the MME 1962 may be responsible for authenticating users of the WTRUs 1902a, 1902b, 1902c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 1902a, 1902b, 1902c, and the like.
- the MME 1962 may also provide a control plane function for switching between the RAN 1904 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
- the serving gateway 1964 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via the S1 interface.
- the serving gateway 1964 may generally route and forward user data packets to/from the WTRUs 1902a, 1902b, 1902c.
- the serving gateway 1964 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 1902a, 1902b, 1902c, managing and storing contexts of the WTRUs 1902a, 1902b, 1902c, and the like.
- the serving gateway 1964 may also be connected to the PDN gateway 1966, which may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP- enabled devices.
- the core network 1906 may facilitate communications with other networks.
- the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the core network 1906 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 1906 and the PSTN 1908.
- the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- FIG. 19E is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment.
- the RAN 1904 may be an access service network (ASN) that employs IEEE 802.16 radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the communication links between the different functional entities of the WTRUs 1902a, 1902b, 1902c, the RAN 1904, and the core network 1906 may be defined as reference points.
- the RAN 1904 may include base stations 1970a, 1970b, 1970c, and an ASN gateway 1972, though it will be appreciated that the RAN 1904 may include any number of base stations and ASN gateways while remaining consistent with an embodiment.
- the base stations 1970a, 1970b, 1970c may each be associated with a particular cell (not shown) in the RAN 1904 and may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the base stations 1970a, 1970b, 1970c may implement MIMO technology.
- the base station 1970a for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
- the base stations 1970a, 1970b, 1970c may also provide mobility management functions, such as handoff triggering, tunnel establishment, radio resource management, traffic classification, quality of service (QoS) policy enforcement, and the like.
- the ASN gateway 1972 may serve as a traffic aggregation point and may be responsible for paging, caching of subscriber profiles, routing to the core network 1906, and the like.
- the air interface 1916 between the WTRUs 1902a, 1902b, 1902c and the RAN 1904 may be defined as an R1 reference point that implements the IEEE 802.16 specification.
- each of the WTRUs 1902a, 1902b, 1902c may establish a logical interface (not shown) with the core network 1906.
- the logical interface between the WTRUs 1902a, 1902b, 1902c and the core network 1906 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
- the communication link between each of the base stations 1970a, 1970b, 1970c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations.
- the communication link between the base stations 1970a, 1970b, 1970c and the ASN gateway 1972 may be defined as an R6 reference point.
- the R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 1902a, 1902b, 1902c.
- the RAN 1904 may be connected to the core network 1906.
- the communication link between the RAN 1904 and the core network 1906 may be defined as an R3 reference point that includes protocols for facilitating data transfer and mobility management capabilities, for example.
- the core network 1906 may include a mobile IP home agent (MIP-HA) 1974, an authentication, authorization, accounting (AAA) server 1976, and a gateway 1978. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the MIP-HA 1974 may be responsible for IP address management, and may enable the WTRUs 1902a, 1902b, 1902c to roam between different ASNs and/or different core networks.
- the MIP-HA 1974 may provide the WTRUs 1902a, 1902b, 1902c with access to packet- switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
- the AAA server 1976 may be responsible for user authentication and for supporting user services.
- the gateway 1978 may facilitate interworking with other networks.
- the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- the RAN 1904 may be connected to other ASNs and the core network 1906 may be connected to other core networks.
- the communication link between the RAN 1904 and the other ASNs may be defined as an R4 reference point, which may include protocols for coordinating the mobility of the WTRUs 1902a, 1902b, 1902c between the RAN 1904 and the other ASNs.
- the communication link between the core network 1906 and the other core networks may be defined as an R5 reference point, which may include protocols for facilitating interworking between home core networks and visited core networks.
- a video encoding method comprising receiving video data; at each of a plurality of sampling ratios, determining a sampling error value; for a bit rate, at each of the plurality of sampling ratios, determining a coding error value; summing the sampling error value and the coding error value at each of the plurality of sampling ratios; selecting one of the plurality of sampling ratios based on the sum of the sampling error value and the coding error value at the selected sampling ratio; downsampling the video data at the selected sampling ratio; and encoding the downsampled video data.
- selecting one of the plurality of sampling ratios comprises selecting the one of the plurality of sampling ratios resulting in the lowest summation of the sampling error value and the coding error value.
- selecting one of the plurality of sampling ratios comprises selecting one of the plurality of sampling ratios resulting in a summation of the sampling error value and the coding error value having an overall error value beneath an overall error threshold.
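- As an illustration of the selection step above, the following Python sketch performs the minimum-total-error selection; `sampling_error` and `coding_error` are hypothetical callables standing in for the error models described in the embodiments below, not functions defined by this disclosure.

```python
def select_sampling_ratio(ratios, bit_rate, sampling_error, coding_error):
    """Return the sampling ratio with the lowest total error
    (sampling error + coding error) at the given bit rate.

    A threshold-based variant would instead accept any ratio whose
    total error falls beneath an overall error threshold.
    """
    best_ratio, best_total = None, float("inf")
    for r in ratios:
        total = sampling_error(r) + coding_error(bit_rate, r)
        if total < best_total:
            best_ratio, best_total = r, total
    return best_ratio
```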
- the sampling error value is based on a power spectral density (PSD) of the video data and an estimation of the PSD of downsampled video data.
- PSD power spectral density
- a method of any of the preceding embodiments wherein the estimation of the PSD of downsampled video data is a function, wherein at least one parameter of the function is determined by at least one characteristic of the video data.
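- As one hedged illustration of a PSD-based sampling error, the spectral energy discarded by downsampling can be integrated numerically; the discretization, the trapezoidal rule, and the function name are assumptions rather than the exact formulation of this disclosure.

```python
import numpy as np

def sampling_error_from_psd(psd, freqs, ratio):
    """Estimate sampling error as the spectral energy removed when
    downsampling at `ratio` (0 < ratio <= 1).

    `psd` and `freqs` are 1-D arrays holding the video's power
    spectral density and the corresponding normalized frequencies.
    """
    cutoff = ratio * freqs.max()   # Nyquist limit after downsampling
    mask = freqs > cutoff          # components lost to downsampling
    if not mask.any():
        return 0.0                 # a ratio of 1 discards nothing
    return float(np.trapz(psd[mask], freqs[mask]))
```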
- the sampling error value is based on a difference between the received video data and anti-aliasing filtered video data.
- the coding error value is based on a coding error model, wherein the coding error model is a function of the bit rate and a sampling ratio.
- the coding error model comprises a first parameter and a second parameter, and wherein the first parameter and the second parameter are each determined by at least one characteristic of the video data.
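- For concreteness, a two-parameter exponential decay in bits per pixel is a common model shape in the rate-distortion literature and is sketched below; the exact functional form and the parameter names `a` and `b` are assumptions for illustration, not the model specified by this disclosure.

```python
import math

def coding_error_model(bit_rate, ratio, a, b, width, height, fps):
    """Illustrative coding error model D_c = a * exp(-b * bpp), where
    bpp is the bits available per pixel after downsampling at `ratio`
    and `a`, `b` are the content-dependent parameters noted above."""
    pixels_per_second = (width * ratio) * (height * ratio) * fps
    bpp = bit_rate / pixels_per_second
    return a * math.exp(-b * bpp)
```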
- a method of any of the preceding embodiments further comprising: for each of a plurality of bit rates, determining a bit per pixel value; for each of the plurality of bit rates, determining a distortion value; for each of the plurality of bit rates, determining a plurality of estimated distortion values based on a plurality of values for the first parameter and a plurality of values for the second parameter of the coding error model; and selecting a value for the first parameter and a value for the second parameter of the coding error model such that the plurality of distortion values has the minimum difference from the plurality of estimated distortion values.
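- The fitting step just described can be realized as a brute-force grid search over candidate parameter values, minimizing the squared difference between measured and estimated distortions; the grids and the reuse of the illustrative exponential model are assumptions.

```python
import math

def fit_parameters(bpp_values, distortions, a_grid, b_grid):
    """Grid-search (a, b) minimizing the sum of squared differences
    between measured distortions and the model's estimates."""
    best_a, best_b, best_sse = None, None, float("inf")
    for a in a_grid:
        for b in b_grid:
            sse = sum((d - a * math.exp(-b * p)) ** 2
                      for p, d in zip(bpp_values, distortions))
            if sse < best_sse:
                best_a, best_b, best_sse = a, b, sse
    return best_a, best_b
```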
- a method of any of the preceding embodiments further comprising selecting a value for the first parameter from a first look-up table; and selecting a value for the second parameter from a second look-up table.
- a method of any of the preceding embodiments further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on a DC component of the power spectral density.
- a method of any of the preceding embodiments further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on the decay speed of the power spectral density toward the high frequency band.
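- As a sketch of how the two look-up keys above might be computed, the DC component can be read directly from the PSD and the decay speed approximated by a low-band to high-band energy ratio; both feature definitions are hypothetical stand-ins for whatever statistics an implementation actually uses.

```python
import numpy as np

def psd_features(psd):
    """Return (dc, decay) from a 1-D PSD array: the DC component and
    a crude decay-speed proxy (low-band over high-band energy)."""
    dc = float(psd[0])
    half = len(psd) // 2
    low, high = psd[:half].sum(), psd[half:].sum()
    decay = float(low / max(high, 1e-12))  # larger => faster decay
    return dc, decay
```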
- the at least one characteristic is a complexity value of the received video data, wherein the complexity value is received from one of a user input and a network node.
- a method of any of the preceding embodiments further comprising receiving an indication of the bit rate from a network node.
- a method of any of the preceding embodiments further comprising, subsequent to selecting the one of the plurality of sampling ratios: receiving an indication of a second bit rate; for the second bit rate, determining an updated coding error value at each of the plurality of sampling ratios; selecting an updated sampling ratio based on a summation of the sampling error value and the updated coding error value; downsampling the input video at the updated sampling ratio; and encoding the downsampled video sequence.
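- Because the sampling error term depends only on the content, a bit rate update only requires recomputing the coding error term; the snippet below reuses the hypothetical `select_sampling_ratio` sketch from earlier.

```python
def reselect_on_rate_change(ratios, new_bit_rate,
                            sampling_error, coding_error):
    # Sampling errors are content-dependent and can be cached; only
    # the coding errors change with the newly signaled bit rate.
    return select_sampling_ratio(ratios, new_bit_rate,
                                 sampling_error, coding_error)
```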
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is different from the vertical sampling ratio.
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is the same as the vertical sampling ratio.
- a method of any of the preceding embodiments wherein a first selection of the sampling ratio is performed at the beginning of the received video data and at least a second selection of the sampling ratio is performed during the duration of the received video data.
- a video decoding method comprising receiving compressed video data; receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decoding the compressed video data to form reconstructed video data; upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data; and outputting the upsampled video data.
- a video decoding system comprising a video decoder, the video decoder configured to receive compressed video data; receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decode the compressed video data to form reconstructed video data; upsample the reconstructed video data to increase the resolution of the reconstructed video data; and output the upsampled video data.
- a video decoding system of the preceding embodiment further comprising a wireless transmit/receive unit in communication with a communication system, wherein the wireless transmit/receive unit is configured to receive the video data from the communication system.
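- On the decoder side, the upsampling step amounts to resizing each reconstructed frame back toward the pre-downsampling resolution; OpenCV and bilinear interpolation are assumptions here, and any resampling filter could be substituted.

```python
import cv2  # assumption: OpenCV is available for resizing

def upsample_reconstructed(frame, ratio):
    """Invert the encoder's downsampling ratio (0 < ratio <= 1) by
    resizing a decoded frame back toward its original resolution."""
    h, w = frame.shape[:2]
    target = (round(w / ratio), round(h / ratio))  # cv2 wants (w, h)
    return cv2.resize(frame, target, interpolation=cv2.INTER_LINEAR)
```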
- ROM read only memory
- RAM random access memory
- registers, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
- a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
- processing platforms, computing systems, controllers, and other devices containing processors are noted. These devices may contain at least one Central Processing Unit (“CPU”) and memory.
- CPU Central Processing Unit
- FIG. 1 A block diagram illustrating an exemplary computing system
- Such acts and operations or instructions may be referred to as being “executed,” “computer executed,” or “CPU executed.”
- an electrical system represents data bits that can cause a resulting transformation or reduction of the electrical signals and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals.
- the memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to or representative of the data bits. It should be understood that the exemplary embodiments are not limited to the above-mentioned platforms or CPUs and that other platforms and CPUs may support the described methods.
- the data bits may also be maintained on a computer readable medium including magnetic disks, optical disks, and any other volatile (e.g., Random Access Memory (“RAM”)) or nonvolatile (e.g., Read-Only Memory (“ROM”)) mass storage system readable by the CPU.
- RAM Random Access Memory
- ROM Read-Only Memory
- the computer readable medium may include cooperating or interconnected computer readable media, which exist exclusively on the processing system or are distributed among multiple interconnected processing systems that may be local or remote to the processing system. It should be understood that the exemplary embodiments are not limited to the above-mentioned memories and that other platforms and memories may support the described methods.
- the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. Further, the terms “any of” followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include “any of,” “any combination of,” “any multiple of,” and/or “any combination of multiples of” the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items. Further, as used herein, the term “set” is intended to include any number of items, including zero. Further, as used herein, the term “number” is intended to include any number, including zero.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020137013488A KR20130105870A (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
EP11779073.3A EP2633685A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
CN2011800628602A CN103283227A (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
AU2011319844A AU2011319844A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US40732910P | 2010-10-27 | 2010-10-27 | |
US61/407,329 | 2010-10-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012058394A1 true WO2012058394A1 (en) | 2012-05-03 |
Family
ID=44906484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2011/058027 WO2012058394A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP2633685A1 (en) |
KR (1) | KR20130105870A (en) |
CN (1) | CN103283227A (en) |
AU (1) | AU2011319844A1 (en) |
WO (1) | WO2012058394A1 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103475880A (en) * | 2013-09-11 | 2013-12-25 | 浙江大学 | Method for low-complexity video transcoding from H.264 to HEVC based on statistic analysis |
CN103945222A (en) * | 2014-04-21 | 2014-07-23 | 福州大学 | Code rate control model updating method based on HEVC standards |
WO2014143008A1 (en) | 2013-03-15 | 2014-09-18 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
CN105430395A (en) * | 2015-12-03 | 2016-03-23 | 北京航空航天大学 | HEVC (High Efficiency Video Coding) CTU (Coding Tree Unit) grade code rate control method based on optimal bit allocation |
EP3097694A1 (en) * | 2014-01-24 | 2016-11-30 | Cisco Technology, Inc. | Line rate visual analytics on edge devices |
WO2018018445A1 (en) * | 2016-07-27 | 2018-02-01 | 王晓光 | Method and system for sending video advertisement on the basis of video capacity |
WO2019240631A1 (en) * | 2018-06-15 | 2019-12-19 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
WO2020042269A1 (en) * | 2018-08-31 | 2020-03-05 | 网宿科技股份有限公司 | Code rate adjustment method and device for encoding process |
US10825206B2 (en) * | 2018-10-19 | 2020-11-03 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
CN112560552A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Video classification method and device |
US10986370B2 (en) | 2013-10-07 | 2021-04-20 | Vid Scale, Inc. | Combined scalability processing for multi-layer video coding |
US20210358083A1 (en) | 2018-10-19 | 2021-11-18 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US11184638B1 (en) * | 2020-07-16 | 2021-11-23 | Facebook, Inc. | Systems and methods for selecting resolutions for content optimized encoding of video data |
EP3934261A1 (en) * | 2020-07-02 | 2022-01-05 | Samsung Electronics Co., Ltd. | Electronic device and method of operating the same |
US11381816B2 (en) | 2013-03-15 | 2022-07-05 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
US11395001B2 (en) | 2019-10-29 | 2022-07-19 | Samsung Electronics Co., Ltd. | Image encoding and decoding methods and apparatuses using artificial intelligence |
CN115052146A (en) * | 2022-06-16 | 2022-09-13 | 上海大学 | Content self-adaptive down-sampling video coding optimization method based on classification |
US11688038B2 (en) | 2018-10-19 | 2023-06-27 | Samsung Electronics Co., Ltd. | Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102119300B1 (en) | 2017-09-15 | 2020-06-04 | 서울과학기술대학교 산학협력단 | Apparatus and method for encording 360-degree video, recording medium for performing the method |
CN112367147B (en) * | 2020-09-27 | 2022-09-09 | 苏州宣怀智能科技有限公司 | Data display method and device, electronic equipment and computer readable medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6104434A (en) * | 1996-10-24 | 2000-08-15 | Fujitsu Limited | Video coding apparatus and decoding apparatus |
WO2009055899A1 (en) * | 2007-11-02 | 2009-05-07 | Ecole De Technologie Superieure | System and method for quality-aware selection of parameters in transcoding of digital images |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003531533A (en) * | 2000-04-18 | 2003-10-21 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Bitrate allocation in joint bitrate transcoding |
US7536469B2 (en) * | 2004-12-10 | 2009-05-19 | Microsoft Corporation | System and process for controlling the coding bit rate of streaming media data employing a limited number of supported coding bit rates |
CN101389021B (en) * | 2007-09-14 | 2010-12-22 | 华为技术有限公司 | Video encoding/decoding method and apparatus |
- 2011
- 2011-10-27 CN CN2011800628602A patent/CN103283227A/en active Pending
- 2011-10-27 KR KR1020137013488A patent/KR20130105870A/en unknown
- 2011-10-27 AU AU2011319844A patent/AU2011319844A1/en not_active Abandoned
- 2011-10-27 WO PCT/US2011/058027 patent/WO2012058394A1/en active Application Filing
- 2011-10-27 EP EP11779073.3A patent/EP2633685A1/en not_active Ceased
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6104434A (en) * | 1996-10-24 | 2000-08-15 | Fujitsu Limited | Video coding apparatus and decoding apparatus |
WO2009055899A1 (en) * | 2007-11-02 | 2009-05-07 | Ecole De Technologie Superieure | System and method for quality-aware selection of parameters in transcoding of digital images |
Non-Patent Citations (4)
Title |
---|
A. BRUCKSTEIN: "On optimal image digitization", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. 35, no. 4, 1 April 1987 (1987-04-01), pages 553 - 555, XP055018088, ISSN: 0096-3518, DOI: 10.1109/TASSP.1987.1165148 * |
BRUCKSTEIN A M ET AL: "Down-scaling for better transform compression", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 12, no. 9, 1 September 2003 (2003-09-01), pages 1132 - 1144, XP011099900, ISSN: 1057-7149, DOI: 10.1109/TIP.2003.816023 * |
EKMEKCIOGLU E ET AL: "Bit-Rate Adaptive Downsampling for the Coding of Multi-View Video with Depth Information", 3DTV CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, IEEE, PISCATAWAY, NJ, USA, 28 May 2008 (2008-05-28), pages 137 - 140, XP031275230, ISBN: 978-1-4244-1760-5 * |
EKMEKCIOGLU E ET AL: "Low-delay random view access in multi-view coding using a bit-rate adaptive downsampling approach", MULTIMEDIA AND EXPO, 2008 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 23 June 2008 (2008-06-23), pages 745 - 748, XP031312829, ISBN: 978-1-4244-2570-9 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11381816B2 (en) | 2013-03-15 | 2022-07-05 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
EP3934247A3 (en) * | 2013-03-15 | 2022-03-16 | Icelero Inc. | Method and system for improved video codec rate-distortion performance by pre and post-processing |
WO2014143008A1 (en) | 2013-03-15 | 2014-09-18 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
US10785481B2 (en) | 2013-03-15 | 2020-09-22 | Crunch Mediaworks Llc | Method and system for video codec rate-distortion performance by pre and post-processing |
EP2974311A4 (en) * | 2013-03-15 | 2016-08-24 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
US10230951B2 (en) | 2013-03-15 | 2019-03-12 | Crunch Mediaworks, Llc | Method and system for video codec rate-distortion performance by pre and post-processing |
US11856191B2 (en) | 2013-03-15 | 2023-12-26 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
CN103475880A (en) * | 2013-09-11 | 2013-12-25 | 浙江大学 | Method for low-complexity video transcoding from H.264 to HEVC based on statistic analysis |
US10986370B2 (en) | 2013-10-07 | 2021-04-20 | Vid Scale, Inc. | Combined scalability processing for multi-layer video coding |
EP3097694A1 (en) * | 2014-01-24 | 2016-11-30 | Cisco Technology, Inc. | Line rate visual analytics on edge devices |
CN103945222B (en) * | 2014-04-21 | 2017-01-25 | 福州大学 | Code rate control model updating method based on HEVC standards |
CN103945222A (en) * | 2014-04-21 | 2014-07-23 | 福州大学 | Code rate control model updating method based on HEVC standards |
CN105430395A (en) * | 2015-12-03 | 2016-03-23 | 北京航空航天大学 | HEVC (High Efficiency Video Coding) CTU (Coding Tree Unit) grade code rate control method based on optimal bit allocation |
WO2018018445A1 (en) * | 2016-07-27 | 2018-02-01 | 王晓光 | Method and system for sending video advertisement on the basis of video capacity |
US11546631B2 (en) | 2018-06-15 | 2023-01-03 | Huawei Technologies Co., Ltd. | Method and apparatus for DC intra prediction of rectangular blocks using an aspect ratio |
WO2019240631A1 (en) * | 2018-06-15 | 2019-12-19 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
WO2020042269A1 (en) * | 2018-08-31 | 2020-03-05 | 网宿科技股份有限公司 | Code rate adjustment method and device for encoding process |
US20210358083A1 (en) | 2018-10-19 | 2021-11-18 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US11748847B2 (en) | 2018-10-19 | 2023-09-05 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US10825206B2 (en) * | 2018-10-19 | 2020-11-03 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
US11688038B2 (en) | 2018-10-19 | 2023-06-27 | Samsung Electronics Co., Ltd. | Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image |
US11663747B2 (en) | 2018-10-19 | 2023-05-30 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
CN112560552A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Video classification method and device |
US11395001B2 (en) | 2019-10-29 | 2022-07-19 | Samsung Electronics Co., Ltd. | Image encoding and decoding methods and apparatuses using artificial intelligence |
US11405637B2 (en) | 2019-10-29 | 2022-08-02 | Samsung Electronics Co., Ltd. | Image encoding method and apparatus and image decoding method and apparatus |
US11706261B2 (en) | 2020-07-02 | 2023-07-18 | Samsung Electronics Co., Ltd. | Electronic device and method for transmitting and receiving content |
EP3934261A1 (en) * | 2020-07-02 | 2022-01-05 | Samsung Electronics Co., Ltd. | Electronic device and method of operating the same |
US11184638B1 (en) * | 2020-07-16 | 2021-11-23 | Facebook, Inc. | Systems and methods for selecting resolutions for content optimized encoding of video data |
CN115052146A (en) * | 2022-06-16 | 2022-09-13 | 上海大学 | Content self-adaptive down-sampling video coding optimization method based on classification |
Also Published As
Publication number | Publication date |
---|---|
KR20130105870A (en) | 2013-09-26 |
EP2633685A1 (en) | 2013-09-04 |
CN103283227A (en) | 2013-09-04 |
AU2011319844A1 (en) | 2013-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012058394A1 (en) | Systems and methods for adaptive video coding | |
US11405621B2 (en) | Sampling grid information for spatial layers in multi-layer video coding | |
JP6592145B2 (en) | Inter-layer reference image enhancement for multi-layer video coding | |
US10237555B2 (en) | System and method of video coding quantization and dynamic range control | |
US10218971B2 (en) | Adaptive upsampling for multi-layer video coding | |
JP2022023856A (en) | Codec architecture for layer videos coding | |
US20190014333A1 (en) | Inter-layer prediction for scalable video coding | |
EP2917892A2 (en) | Temporal filter for denoising a high dynamic range video | |
WO2017020021A1 (en) | Scalable high efficiency video coding to high efficiency video coding transcoding | |
WO2012061258A2 (en) | Parametric bit rate model for frame-level rate control in video coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11779073 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
REEP | Request for entry into the european phase |
Ref document number: 2011779073 Country of ref document: EP |
WWE | Wipo information: entry into national phase |
Ref document number: 2011779073 Country of ref document: EP |
ENP | Entry into the national phase |
Ref document number: 20137013488 Country of ref document: KR Kind code of ref document: A |
ENP | Entry into the national phase |
Ref document number: 2011319844 Country of ref document: AU Date of ref document: 20111027 Kind code of ref document: A |