AU2011319844A1 - Systems and methods for adaptive video coding - Google Patents

Systems and methods for adaptive video coding

Info

Publication number
AU2011319844A1
Authority
AU
Australia
Prior art keywords
sampling
video
video data
coding
error value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2011319844A
Inventor
Zhifeng Chen
Serhad Doken
Jie Dong
Yan Ye
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vid Scale Inc
Original Assignee
Vid Scale Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vid Scale Inc filed Critical Vid Scale Inc
Publication of AU2011319844A1
Current legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding

Abstract

Systems and methods are described for determining an optimized sampling ratio for coding video data that reduces the overall distortion introduced by the coding process. The approach seeks to balance the information loss introduced during downsampling against the information loss introduced during coding. The sampling ratio is generally determined by reducing, or in some cases minimizing, the overall error introduced through the downsampling process and the coding process, and may be adapted based on the content of the video data being processed and a target bit rate. Computation power can also be saved by coding the downsampled video. The process derives a plurality of downsampling ratios and selects a downsampling ratio that reduces the total amount of distortion introduced during the downsampling and coding stages. The downsampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters, and may optimally reduce the overall distortion.

Description

SYSTEMS AND METHODS FOR ADAPTIVE VIDEO CODING

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 61/407,329, filed October 27, 2010, the entirety of which is incorporated herein by reference.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Many digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently. Although wireless communications technology has dramatically increased the wireless bandwidth and improved the quality of service for users of mobile devices, the fast-growing demand for video content, such as high-definition (HD) video content, over the mobile Internet brings new challenges for mobile video content providers, distributors, and carrier service providers.

SUMMARY

In accordance with one embodiment, a video encoding method comprises receiving video data, and determining a sampling error value at each of a plurality of downsampling ratios. The video encoding method may also comprise, for a bit rate, determining a coding error value at each of the plurality of downsampling ratios and summing the sampling error value and the coding error value at each of the plurality of downsampling ratios. The video encoding method may also comprise selecting one of the plurality of downsampling ratios based on the sum of the sampling error value and the coding error value at the selected downsampling ratio, downsampling the video data at the selected sampling ratio, and encoding the downsampled video data.

In accordance with another embodiment, a video decoding method comprises receiving compressed video data and receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios. The video decoding method may also comprise decoding the compressed video data to form reconstructed video data, upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data, and outputting the filtered video data.

In accordance with another embodiment, a video decoding system comprises a video decoder. The video decoder may be configured to receive compressed video data, and receive an indication of a selected sampling ratio, where the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios. The video decoder may also be configured to decode the compressed video data to form reconstructed video data, upsample the reconstructed video data to increase the resolution of the reconstructed video data, and output the upsampled video data.
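The encoding method summarized above, for a given bit rate, sums the error introduced by downsampling and the error introduced by coding at each candidate ratio and selects the ratio with the smallest sum. The following sketch is illustrative only and is not part of the claimed embodiments; the dictionary-based interface and all numeric values are assumptions chosen to show the selection step in isolation.

```python
def select_sampling_ratio(sampling_errors, coding_errors):
    """Given per-ratio sampling-error and coding-error estimates (dicts keyed by
    candidate downsampling ratio), return the ratio with the smallest summed error."""
    totals = {m: sampling_errors[m] + coding_errors[m] for m in sampling_errors}
    return min(totals, key=totals.get)

# Hypothetical error values for three candidate ratios at one target bit rate.
sampling_errors = {1.0: 0.0, 1.5: 4.2, 2.0: 9.8}    # distortion added by downsampling
coding_errors   = {1.0: 20.5, 1.5: 11.3, 2.0: 7.1}  # distortion added by compression
best = select_sampling_ratio(sampling_errors, coding_errors)
print(best)  # 1.5 here: the smallest combined distortion among the candidates
```

In such a flow, the selected ratio would drive the downsampling stage before encoding, and an indication of the ratio would be conveyed to the decoder so that the reconstructed video can be upsampled back to full resolution.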
BRIEF DESCRIPTION OF THE DRAWINGS

A more detailed understanding may be had from the following description, given by way of example in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the adaptive coding techniques described herein;
FIG. 2 is a block diagram illustrating an example video encoder that may implement techniques for the adaptive encoding of a video signal;
FIG. 3 is a block diagram illustrating an example video decoder that may implement techniques for the adaptive decoding of a video signal;
FIG. 4 shows a coding scheme applying a codec directly on an input video;
FIG. 5 shows an exemplary embodiment utilizing coding with downsampling and upsampling stages;
FIGS. 6A and 6B show the processing illustrated in FIG. 5 decomposed into a sampling component and a coding component, respectively;
FIG. 7 is a look-up table for a in accordance with one non-limiting embodiment;
FIG. 8 is a look-up table for P in accordance with one non-limiting embodiment;
FIGS. 9A, 9B and 9C illustrate searching strategies to find the sampling ratio M in accordance with various non-limiting embodiments;
FIGS. 10A and 10B are process flows in accordance with one non-limiting embodiment;
FIG. 11 is a block diagram of a horizontal downsampling process having a downsampling ratio of Mh in accordance with one non-limiting embodiment;
FIG. 12 illustrates an example downsampling process;
FIG. 13 illustrates an example upsampling process;
FIG. 14 illustrates an example Gaussian window function;
FIG. 15 illustrates pixels during an example upsampling process;
FIG. 16 illustrates an exemplary encoder architecture in accordance with one non-limiting embodiment;
FIG. 17 illustrates an exemplary decoder architecture in accordance with one non-limiting embodiment;
FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder;
FIG. 19A is a system diagram of an example communications system in which one or more disclosed embodiments may be implemented;
FIG. 19B is a system diagram of an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 19A; and
FIGS. 19C, 19D, and 19E are system diagrams of example wireless transmit/receive units (WTRUs) that may be used within the communications system illustrated in FIG. 19A.

DETAILED DESCRIPTION

Both multimedia technology and mobile communications have experienced massive growth and commercial success in recent years. Wireless communications technology has dramatically increased the wireless bandwidth and improved the quality of service for mobile users. For example, the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) standard has improved the quality of service as compared to 2nd Generation (2G) and/or 3rd Generation (3G) systems. While wireless communications technology has greatly improved, the fast-growing demand for video content, such as high-definition (HD) video content for example, over the mobile Internet brings new challenges for mobile video content providers, distributors and carrier service providers.

Video and multimedia content that is available on the wired web has driven users to desire equivalent on-demand access to that content from a mobile device. A much higher percentage of the world's mobile data traffic is becoming video content. Mobile video has the highest growth rate of any application category measured within the mobile data portion of the Cisco VNI Forecast at this time.
As video content demands increase, so does the amount of data needed to meet these demands. The block size for processing video content under current compression standards, such as the H.264 (AVC) standard for example, is 16x16. Therefore, current compression standards may be good for small resolution video content, but not for higher quality and/or higher resolution video content, such as HD video content for example. Driven by the demand for high quality and/or high resolution video content and the availability of more advanced compression techniques, video coding standards may be created that may further reduce the data rate needed for high quality video coding, as compared to the current standards, such as AVC for example. For example, groups such as the Joint Collaborative Team on Video Coding (JCT-VC), which was formed by the International Telecommunication Union Video Coding Experts Group (ITU-VCEG) and the International Organization for Standardization Moving Picture Experts Group (ISO-MPEG), are being created to develop new video coding standards that improve upon current video coding standards.

However, the expected long research, development and deployment period of a new video standard, based on the experience of previous video standards development, may not meet the enormous emerging demand for high quality and/or high resolution video content delivery over the mobile Internet as quickly as demand may require. Therefore, systems and methods are needed to meet the growing demand for high quality and/or high resolution video content delivery over the mobile Internet. For example, systems and methods may be provided for high quality and/or high resolution video content compatibility with current standards, such as HD video content compatibility with the AVC video compression standard for example.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the adaptive coding techniques described herein. As shown in FIG. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. Source device 12 and destination device 14 may comprise any of a wide range of devices. In some cases, source device 12 and destination device 14 may comprise wireless transmit/receive units (WTRUs), such as wireless handsets or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless. The systems and methods described herein, however, are not necessarily limited to wireless applications or settings. For example, these techniques may apply to over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet video transmissions, encoded digital video that is encoded onto a storage medium, or other scenarios. Accordingly, communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.

In the example of FIG. 1, source device 12 includes a video source 18, video encoder 20, a modulator (generally referred to as a modem) 22 and a transmitter 24. Destination device 14 includes a receiver 26, a demodulator (generally referred to as a modem) 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the adaptive coding techniques described in more detail below.
In other examples, a source device and a destination device may include other components or 10 arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device. In other embodiments, the data stream generated by the video encoder may be conveyed to other devices without the need for modulating the data onto a carrier signal, such as by direct digital transfer, 15 wherein the other devices may or may not modulate the data for transmission. The illustrated system 10 of FIG. 1 is merely one example. The techniques described herein may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a 20 "CODEC." Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 include video encoding and decoding components. Hence, 25 system 10 may support one-way or two-way video transmission between devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony. In some embodiments, the source device may be a video streaming server for generating encoded video data for one or more destination devices, where the destination devices may be in communication with the source device over wired and/or wireless communication systems. 30 Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics based data as the source video, or a combination of live video, archived video, and computer -5- WO 2012/058394 PCT/US2011/058027 generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, 5 pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be modulated by modem 22 according to a communication standard, and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation. Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more 10 antennas. Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information. Again, the video decoding process may implement one or more of the techniques described herein. 
The information communicated over channel 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that 15 includes syntax elements that describe characteristics and/or processing of macroblocks and other coded units, e.g., GOPs. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. 20 In the example of FIG. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable 25 communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media. Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14. 30 Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263. Although -6- WO 2012/058394 PCT/US2011/058027 not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU 5 H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP). The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to 10 the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, and dated March, 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC. 15 Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. 
Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, 20 either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, media aware network element, or the like. A video sequence typically includes a series of video frames. A group of pictures (GOP) generally comprises a series of one or more video frames. A GOP may include syntax data in a 25 header of the GOP, a header of one or more frames of the GOP, or elsewhere, that describes a number of frames included in the GOP. Each frame may include frame syntax data that describes an encoding mode for the respective frame. Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data. A video block may correspond to a macroblock, a partition of a macroblock, or a collection of blocks or macroblocks. The video 30 blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame may include a plurality of slices. Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks. -7- WO 2012/058394 PCT/US2011/058027 Many popular video coding standards, such as H.263, MPEG-2, and MPEG-4, H.264/AVC (advanced video coding), HEVC (High Efficiency Video Coding) utilize motion compensated prediction techniques. An image or a frame of a video may be partitioned into multiple macroblocks and each macroblock can be further partitioned. Macroblocks in an I 5 frame may be encoded by using the prediction from spatial neighbors (that is, other blocks of the I-frame). Macroblocks in a P- or B-frame may be encoded by using either the prediction from their spatial neighbors (spatial prediction or intra-mode encoding) or areas in other frames (temporal prediction or inter-mode encoding). Video coding standards define syntax elements to represent coding information. For example, for every macroblock, H.264 defines an mb-type 10 value that represents the manner in which a macroblock is partitioned and the method of prediction (spatial or temporal). Video encoder 20 may provide individual motion vectors for each partition of a macroblock. For example, if video encoder 20 elects to use the full macroblock as a single partition, video encoder 20 may provide one motion vector for the macroblock. As another 15 example, if video encoder 20 elects to partition a 16x16 pixel macroblock into four 8x8 partitions, video encoder 20 may provide four motion vectors, one for each partition. For each partition (or sub-macroblock unit), video encoder 20 may provide an mvd (motion vector difference) value and a refidx value to represent motion vector information. The mvd value may represent an encoded motion vector for the partition, relative to a motion predictor. The refidx 20 (reference index) value may represent an index into a list of potential reference pictures, that is, reference frames. As an example, H.264 provides two lists of reference pictures: list 0 and list 1. The refidx value may identify a picture in one of the two lists. Video encoder 20 may also provide information indicative of the list to which the refidx value relates. 
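As a simple illustration of the motion information described above, the following sketch reconstructs a motion vector from a predictor and an mvd value and looks up the reference picture by its refidx; the list contents and numeric values are hypothetical and do not follow the exact H.264 bitstream syntax.

```python
# Illustrative only: hypothetical values, not actual H.264 syntax elements.
def reconstruct_motion_vector(predictor, mvd):
    """A decoder adds the signaled motion vector difference to the motion predictor."""
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])

list0 = ["frame_35", "frame_34", "frame_33"]   # hypothetical reference picture list 0
refidx = 1                                     # signaled index into the chosen list
predictor = (4, -2)                            # predictor derived from neighboring partitions
mvd = (1, 3)                                   # signaled motion vector difference

mv = reconstruct_motion_vector(predictor, mvd)
print(mv, list0[refidx])                       # (5, 1) frame_34
```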
As an example, the ITU-T H.264 standard supports intra prediction in various block 25 partition sizes, such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8x8 for chroma components, as well as inter prediction in various block sizes, such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 for luma components and corresponding scaled sizes for chroma components. In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of the block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 30 by 16 pixels. In general, a 16x16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an NxN block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks -8- WO 2012/058394 PCT/US2011/058027 need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise NxM pixels, where M is not necessarily equal to N. Block sizes that are less than 16 by 16 may be referred to as partitions of a 16 by 16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of 5 transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video block data representing pixel differences between coded video blocks and predictive video blocks. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain. 10 Smaller video blocks can provide better prediction and less residual, and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and the various partitions, sometimes referred to as sub-blocks, may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame. Alternatively, 15 frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term "coded unit" or "coding unit" may refer to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, a group of pictures (GOP) also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques. 20 The H.264 standard supports motion vectors having one-quarter-pixel precision. That is, encoders, decoders, and encoders/decoders (CODECs) that support H.264 may use motion vectors that point to either a full pixel position or one of fifteen fractional pixel positions. Values for fractional pixel positions may be determined using adaptive interpolation filters or fixed interpolation filters. In some examples, H.264-compliant devices may use filters to calculate 25 values for the half-pixel positions, then use bilinear filters to determine values for the remaining one-quarter-pixel positions. Adaptive interpolation filters may be used during an encoding process to adaptively define interpolation filter coefficients, and thus the filter coefficients may change over time when performing adaptive interpolation filters. 
Following intra-predictive or inter-predictive coding to produce predictive data and 30 residual data, and following any transforms (such as the 4x4 or 8x8 integer transform used in H.264/AVC or a discrete cosine transform DCT) to produce transform coefficients, quantization of transform coefficients may be performed. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the -9- WO 2012/058394 PCT/US2011/058027 coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m. Following quantization, entropy coding of the quantized data may be performed, e.g., 5 according to content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology. A processing unit configured for entropy coding, or another processing unit, may perform other processing functions, such as zero run length coding of quantized coefficients and/or generation of syntax information such as coded block pattern (CBP) values, macroblock type, coding mode, 10 maximum macroblock size for a coded unit (such as a frame, slice, macroblock, or sequence), or the like. Video encoder 20 may further send syntax data, such as block-based syntax data, frame based syntax data, slice-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data 15 may describe a number of frames in the respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame. Video decoder 30 may receive a bitstream including motion vectors encoded according to any of the techniques of this disclosure. Accordingly, video decoder 30 may be configured to interpret the encoded motion vector. For example, video decoder 30 may first analyze a sequence 20 parameter set or slice parameter set to determine whether the encoded motion vector was encoded using a method that keeps all motion vectors in one motion resolution, or using a method where the motion predictor was quantized to the resolution of the motion vector. Video decoder 30 may then decode the motion vector relative to the motion predictor by determining the motion predictor and adding the value for the encoded motion vector to the motion predictor. 25 Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more 30 encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone. - 10 - WO 2012/058394 PCT/US2011/058027 FIG. 2 is a block diagram illustrating an example of video encoder 200 that may implement techniques for the adaptive encoding of a video signal. 
Video encoder 200 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence. Intra-mode (I-mode) may refer to any of several spatial-based compression modes, and inter-modes such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several temporal-based compression modes. Although components for inter-mode encoding are depicted in FIG. 2, it should be understood that video encoder 200 may further include components for intra-mode encoding. However, such components are not illustrated for the sake of brevity and clarity.

The input video signal 202 is processed block by block. The video block unit may be 16 pixels by 16 pixels (i.e., a macroblock (MB)). Currently, JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T/SG16/Q.6/VCEG and ISO/IEC/MPEG is developing the next-generation video coding standard called High Efficiency Video Coding (HEVC). In HEVC, extended block sizes (called a "coding unit" or CU) are used to compress high resolution (1080p and beyond) video signals more efficiently. In HEVC, a CU can be up to 64x64 pixels and down to 4x4 pixels. A CU can be further partitioned into prediction units (PU), for which separate prediction methods are applied. Each input video block (MB, CU, PU, etc.) may be processed by using spatial prediction unit 260 and/or temporal prediction unit 262.

Spatial prediction (i.e., intra prediction) uses pixels from the already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal. Temporal prediction (i.e., inter prediction or motion compensated prediction) uses pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. Temporal prediction for a given video block is usually signaled by one or more motion vectors which indicate the amount and the direction of motion between the current block and one or more of its reference block(s). If multiple reference pictures are supported (as is the case for recent video coding standards such as H.264/AVC or HEVC), then for each video block its reference picture index is also sent. The reference index is used to identify from which reference picture in the reference picture store 264 the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision and encoder controller 280 in the encoder chooses the prediction mode, for example based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block at adder 216, and the prediction residual is transformed by transformation unit 204 and quantized by quantization unit 206. The quantized residual coefficients are inverse quantized at inverse quantization unit 210 and inverse transformed at inverse transformation unit 212 to form the reconstructed residual. The reconstructed residual is then added back to the prediction block at adder 226 to form the reconstructed video block.
Further in-loop filtering, such as a deblocking filter and adaptive loop filters 266, may be applied on the reconstructed video block before it is put in the reference picture store 264 and used to code future video blocks. To form the output video bitstream 220, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are sent to the entropy coding unit 208 to be further compressed and packed to form the bitstream 220. As described in more detail below, the systems and methods described herein may be implemented, at least partially, within the spatial prediction unit 260.

FIG. 3 is a block diagram of a block-based video decoder in accordance with one non-limiting embodiment. The video bitstream 302 is first unpacked and entropy decoded at entropy decoding unit 308. The coding mode and prediction information are sent to either the spatial prediction unit 360 (if intra coded) or the temporal prediction unit 362 (if inter coded) to form the prediction block. The residual transform coefficients are sent to inverse quantization unit 310 and inverse transform unit 312 to reconstruct the residual block. The prediction block and the residual block are then added together at adder 326. The reconstructed block may further go through in-loop filtering unit 366 before it is stored in reference picture store 364. The reconstructed video 320 may then be sent out to drive a display device, as well as used to predict future video blocks.

According to an embodiment, a pre-processing and/or post-processing system architecture may compress raw video data and/or transcode already encoded video data, such as a bit stream for example, with further compression through jointly controlling the transform-domain quantization and spatial-domain down-sampling, without changing the standard format of the video stream. The pre-processing and/or post-processing system architecture may encode and/or decode video data in any format, such as H.263, MPEG-2, Flash, MPEG-4, H.264/AVC, HEVC or any similar multimedia format for example. These, and similar, formats may use such video compression methods as the discrete cosine transform (DCT), fractal compression methods, matching pursuit, or the discrete wavelet transform (DWT), for example, as described above.

A limitation of various existing compression standards, such as H.264/AVC, is the specified Macroblock (MB) size, such as 16x16 for example. Inside one MB, pixels may be partitioned into several block sizes dependent on the prediction modes. The maximum size of any block may be 16x16, and any two MBs may be independently transformed and quantized. This technique may provide very high efficiency for CIF/QCIF and other similar resolution contents. However, it may not be efficient for video contents of higher resolutions, such as 720p, 1080i/1080p and/or similar or even higher resolutions for example. This may be because there is much higher correlation among pixels in local areas. As a result, the specified 16x16 MB size may limit further compression that could exploit such correlation information across adjacent MBs. High resolution content encoded with a small MB size may cause unnecessary overhead.
For example, in an H.264 bit stream, the codec elements may include four types of information: 1) motion information, such as motion vector and reference frame index for example; 2) residual data; 3) MB header information, such as MB type, coded block pattern, and/or quantization parameters (QP) for example; and 4) sequence-, picture-, and/or slice-layer syntax elements. While the motion information and residual data may be highly content-dependent, the MB header information and/or syntax elements may be relatively constant. Thus the MB header information and/or syntax elements may represent the overhead in the bit stream. Given the content and/or encoding profile, a higher compression ratio of an encoder may be achieved by reducing the bit rate of residual data. For example, a higher compression ratio of an H.264 encoder may be achieved by reducing the bit rate of residual data. The higher the compression ratio is, the higher the percentage of overhead that may exist. As a result, in high resolution and/or low bit rate applications, overhead may consume a large part of the bit stream for transmission and storage. Having such a large part of the bit stream consumed by overhead may cause an encoder, such as an H.264 encoder for example, to have low efficiency.

The pre-processing and/or post-processing in accordance with the systems and methods described herein may lead to less overhead, alignment of the motion compensation accuracy and reconstruction accuracy, enhancement of residual accuracy, and/or lower complexity and/or memory requirements. Less overhead may be produced due to the downsampling performed in the pre-processing, as the number of MBs may be reduced in proportion to the downsampling rate. Thus, the near-constant MB header and/or slice-layer syntax elements may be reduced.

The motion compensation accuracy and reconstruction accuracy may also be aligned in the pre-processing and/or post-processing of video data. In the down-sampled frames, the number of motion vector differences (MVD) may be reduced. According to an embodiment, the reduction in MVD may save bits for encoding motion information. In an embodiment, the saved bits may be used to encode the prediction error in low bit rate scenarios. Therefore, the reconstruction accuracy may be improved by aligning the accuracy of motion compensation and the accuracy of the quantized prediction error.

The pre-processing and/or post-processing of video data may also enhance residual accuracy. For example, in the down-sampled frames, the same transform block size may correspond to a larger transform block size in the original frames. According to one example, an 8x8 transform block size may correspond to a transform block size of 16x16 at a 1/4 downsampling rate. As the quantization steps may be the same for the transform coefficients in an encoder, such as an H.264 encoder for example, the encoder may lose information in both high frequency and low frequency components. Therefore, the pre-processing and/or post-processing of video data described herein may preserve higher accuracy in the low frequency components than traditional encoders for the high resolution and low bit rate encoding cases, which may produce better subjective quality. The upsampling process in a decoder may be used to interpolate the pixels to recover the original frames.

The pre-processing and/or post-processing of video data may also result in lower complexity and/or memory requirements.
As the number of pixels to be encoded after downsampling may be reduced in proportion to the downsampling rate, the complexity and/or memory requirements of encoding (or transcoding) may be reduced to the same level. Accordingly, the complexity and/or memory requirements of decoding may also be reduced to the same level. These encoding and/or decoding processes may facilitate the application of lower resolution encoders and/or decoders, such as encoding in mobile phones and other resource-limited devices for example. According to an exemplary embodiment, these encoding and/or decoding processes may facilitate the incorporation and/or application of an H.264 encoder and/or decoder in mobile phones.

To address the limitation of traditional encoders in high resolution and/or low bit rate applications, the systems and methods described herein may independently and/or jointly control the transform-domain quantization and spatial-domain down-sampling to achieve further compression. The quantization and down-sampling may be performed with an acceptable subjective quality. FIG. 4 shows a coding scheme applying a codec (i.e., an H.264/AVC codec) directly on an input video. FIG. 5 shows an exemplary embodiment utilizing coding with down-sampling and up-sampling stages. Compared with the approach illustrated in FIG. 4, the approach illustrated in FIG. 5 may be able to allocate more bits to code the intra- and inter-prediction errors in the coding step; hence it may obtain a better reconstruction with higher visual quality. Although down-sampling introduces information loss (specifically in the high frequency components), when the operating bit rate is low due to network limitations, the better reconstruction at the coding stage may outweigh the detail loss in the downsampling process; hence better overall visual quality is provided. Additionally, computation power can be saved by coding a smaller (i.e., downsampled) video. However, since downsampling causes information loss prior to the coding process, if the original video is downsampled too much, the information loss introduced upfront may outweigh the benefit of higher fidelity in the coding stage. Thus, the systems and methods described herein generally seek to balance the information loss introduced during downsampling and the information loss introduced during coding. Specifically, the processes described herein may derive a plurality of downsampling ratios, and select a downsampling ratio that reduces the total amount of distortion introduced during the down-sampling and coding stages. The down-sampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters. In some embodiments, the selected down-sampling ratio may be the down-sampling ratio that optimally reduces the overall distortion.

The flexibility provided by the filters described herein may be more useful than other filters, such as anti-aliasing filters that may provide only 2x2 down-sampling and up-sampling, for example. At high bit rates, such as 512 kbit/s for CIF, for example, a downsampling ratio of 2x2 is so high that the high frequency components are significantly lost and cannot be compensated even using lossless coding. Therefore, at high bit rates, the sampling ratio may be adjusted to provide a tradeoff between resolution reduction and detail preservation.
Referring now to FIG. 5, the downsampling ratio, denoted as M, is a variable which may be determined as a function of various parameters, such as the available data transmission capacity, the Quality of Service Class Identifier (QCI) of the bearer associated with the video, and characteristics of the input video signal. For example, if the data transmission capacity is relatively plentiful for the input video signal, then an H.264/AVC encoder will have enough bits to code the prediction errors; in this case, the value of M may be set approaching 1.0. On the contrary, if the data transmission capacity is deemed to be insufficient for the input signal, then a larger value of M may be selected (resulting in more downsampling), as the information loss due to the downsampling process will be well compensated by the smaller coding error at the coding stage. As the data transmission capacity is usually represented by bit rate, which may be in fine granularity, in various embodiments the value of M may be very flexible. As described in more detail below, systems and methods are provided to determine a selected sampling ratio M based, at least in part, on the available data transmission capacity and the input video signal. Given the selected sampling ratio M, a dedicated filter may be calculated to downsample the video for coding and upsample the decoded video for display. Various techniques for designing anti-aliasing filters for arbitrary rational-valued sampling ratios are also described in more detail below with regard to FIGS. 11-15.

Still referring to FIG. 4 and FIG. 5, the video input is denoted as f, the output of the conventional codec is denoted as f1, and the output of an example codec in accordance with the systems and methods is denoted as f2. The reconstruction error of the codec in FIG. 4 may be defined as equation (1):

σ1² = E[(f − f1)²]    (1)

The reconstruction error of the codec in FIG. 5 may be defined as equation (2):

σ2² = E[(f − f2)²]    (2)

Therefore, the codec in FIG. 5 performs better than the codec in FIG. 4 if σ2² is smaller than σ1². In accordance with the systems and methods described herein, the gap between σ1² and σ2² may be increased (and in some cases maximized) by finding the ratio M, as shown in equation (3):

M = arg max_M (σ1² − σ2²)    (3)

Since σ1² may be a constant given the target bit rate, in some embodiments equation (3) is simplified and stated as equation (4):

M = arg min_M σ2²    (4)

Therefore, in accordance with the systems and methods described herein, for a given bit rate, the sampling ratio M may be identified such that the reconstruction error (σ2²) of the codec shown in FIG. 5 is reduced. In some embodiments, the sampling ratio M may be determined which will result in the reconstruction error reaching the minimum (or at least substantially near the minimum). In some embodiments, the sampling ratio M is selected from among a set of predetermined sampling ratios, where the selected ratio M provides the smallest reconstruction error from among the set of predetermined sampling ratios.

In some embodiments, M is a scalar, such that the horizontal and vertical directions have the same ratio. Given the resolution of the video W x H, the resolution of the downsampled video is (W/M) x (H/M).
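As a quick illustrative calculation (the numbers here are assumptions for illustration, not values taken from this disclosure): with a 1920 x 1080 input and a scalar ratio M = 1.5, the downsampled resolution W/M x H/M is 1280 x 720; with M = 2.0 it is 960 x 540; and with M = 1.0 the video is coded at its original resolution.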
For some embodiments with decoders that support non-square samples (i.e., a sample aspect ratio (SAR) that is not equal to 1:1) and can interpolate the downsampled video to the full resolution with the correct picture aspect ratio (PAR), the horizontal and vertical ratios may be different. In this case, M = [Mh, Mv] may be a vector, with Mh and Mv representing the sampling ratios for the horizontal and vertical directions, respectively. Thus, while some example embodiments are described in a scalar context, this disclosure is not so limited. Instead, some embodiments may utilize a coding process with uneven ratios applied for each direction.

For ease of explanation, the processing illustrated in FIG. 5 may be decomposed into the sampling component (FIG. 6A) and the coding component (FIG. 6B). Referring to the sampling component shown in FIG. 6A, for the input original video sequence f, upsampling with a factor M 608 is applied right after downsampling with a factor M 602 to generate f3; that is, the error between f and f3 is caused only by sampling and may be referred to as the "downsampling error", denoted as σd², which may be defined by equation (5):

σd² = E[(f − f3)²]    (5)

Referring to the coding component shown in FIG. 6B, the input is the downsampled video d1, and d1 is encoded by encoder 612 and decoded by decoder 614 to obtain the reconstruction signal d2, which is a degraded version of d1. The error between d1 and d2 is caused only by coding and may be referred to as the "coding error", denoted as σc², which may be defined by equation (6):

σc² = E[(d1 − d2)²]    (6)

The relationship among σ2² (equation (2)), σd², and σc² may thus be defined by equation (7):

σ2² = ρ·σd² + σc²    (7)

Therefore, the optimization problem in (4) may be re-written as in equation (8):

M = arg min_M (ρ·σd² + σc²)    (8)

In equations (7) and (8), ρ is a weighting factor in the range [0,1]. For purposes of simplification, but without loss of generality, the weighting factor ρ is set to 1 for the exemplary embodiments described herein.

Estimation of Sampling Error

During the sampling stage, f may be filtered by an anti-aliasing filter, which may be a type of low-pass filter, before f is downsampled. Additional details regarding example filters are described below with regard to FIGS. 11-15. The output of the sampling stage, denoted as f3 (FIG. 6A), is a blurred version of f, because f3 no longer possesses the energy components with frequencies higher than the cut-off frequency of the anti-aliasing filter applied to f. Therefore, in some embodiments, the sampling error can be measured in the frequency domain by measuring the energy of the high frequency components that exist in f but are lost in f3. In accordance with various embodiments, the energy distribution of f can be modeled based on the real Power Spectral Density (PSD) or the estimated PSD, as described in more detail below. Alternatively, other techniques may be used to assess the sampling ratio's effect on the video signal's frequency content.

Data-Based Estimation of PSD of f

Given a Wide-Sense Stationary (WSS) random field with auto-correlation R(τh, τv), the PSD Sxx(ω1, ω2) may be calculated by the 2-D discrete-time Fourier transform (DTFT) as in equation (9):

Sxx(ω1, ω2) = Σ_{τh=−∞..∞} Σ_{τv=−∞..∞} R(τh, τv)·e^(−jω1τh)·e^(−jω2τv)    (9)

R(τh, τv) may be an estimate based on a set of video signals. Applying the 2-D DTFT to the estimated R(τh, τv) produces an estimated PSD, which may no longer be consistent. In accordance with various embodiments, the PSD is estimated by the periodogram of the random field, as given in equation (10):

Ŝxx(ω1, ω2) = |X(ω1, ω2)|² / (WH) = (1/(WH)) · |Σ_{w=0..W−1} Σ_{h=0..H−1} x[w, h]·e^(−jω1w)·e^(−jω2h)|²    (10)

where W and H represent the width and height of a video sequence. The factor 1/(WH) may be used to guarantee that the total energy in the frequency domain is equal to that in the spatial domain, as shown in equation (11):

∫∫ Ŝxx(ω1, ω2) dω1 dω2 = Σ_{w=0..W−1} Σ_{h=0..H−1} |x[w, h]|²    (11)

In accordance with the systems and methods described herein, when the video sequence f is given, which means the input is a deterministic 2-D signal instead of a WSS random field, Ŝxx(ω1, ω2) in equation (10) is also known as the energy spectral density (ESD). In equation (10), x[w,h] is one frame in the video sequence f, and Ŝxx(ω1, ω2) is x[w,h]'s representation in the frequency domain. In one embodiment, the video sequence f may consist of consistent content, such as a single shot. In this case, Ŝxx(ω1, ω2) calculated based on one typical frame x[w,h], e.g., the first frame in f, may represent the energy distribution of the whole sequence f. In another embodiment, f contains scene changes; in this case, Ŝxx(ω1, ω2) can be the average of a plurality of PSDs Ŝxx1(ω1, ω2), Ŝxx2(ω1, ω2), etc., which are calculated based on a plurality of frames x1[w,h], x2[w,h], etc., respectively. Further, the frames xi[w,h] (i = 1, 2, etc.) may be selected from scene #i. In some embodiments, the techniques for estimating the PSD of the whole sequence may vary.
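A per-frame periodogram of the kind written in equation (10) can be computed directly with a 2-D FFT. The sketch below, which assumes a numpy array holding one luma frame, is a minimal illustration of equations (10) and (11); it is not the claimed estimator, and the normalization simply follows the convention stated above.

```python
import numpy as np

def periodogram_psd(frame):
    """Per-frame periodogram in the sense of equation (10): |X(w1, w2)|^2 / (W*H),
    where X is the 2-D DFT of the frame."""
    frame = np.asarray(frame, dtype=np.float64)
    height, width = frame.shape
    X = np.fft.fft2(frame)                    # 2-D DFT on a width x height frequency grid
    return (np.abs(X) ** 2) / (width * height)

# Energy check in the spirit of equation (11): with this normalization the sum of
# the periodogram samples equals the spatial-domain energy of the frame.
frame = np.random.randint(0, 256, size=(720, 1280)).astype(np.float64)
S_hat = periodogram_psd(frame)
print(np.allclose(S_hat.sum(), (frame ** 2).sum()))  # expected: True
```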
For example, in one embodiment a plurality of frames: xi[w,h], x 2 [w,h], etc. may be picked out from f at a regular interval, e.g., one second, and a plurality of corresponding PSDs: 15 XX1(o1, (02), $xx2 (O1, 02), etc. may be calculated and averaged to generate $x (Co 1 , 02). In one embodiment, the video sequence f is divided into I segments, where each segment consists of a group of successive frames (for example, such segmentation may be based on content, motion, texture, and the structure of edges, etc), and has an assigned weight of wi. Then, the overall PSD $xX(oi, (0) is set to the weighted average of PSDs of frame xi[w,h] (i=0,1,2, . . . I-1) , each 20 picked out from segment #i, as shown in equation (12): -11-1 W -1 H -1 2 (- 2) = Iwi|X (-), -2)2 wI xi [w, h]e-1w-iW2h i=0 i=O w=O h=O (12) Model-Based Estimation of the PSD off In some embodiments, such as embodiments associated with real time video streaming, none of the frames that represent the typical content of a sequence may be accessible for pre 25 processing (i.e., x[w,h] in equation (10)), to estimate PSD. Therefore, in some embodiments, the PSD $xx may be modeled using formulas, as shown in equations (13), (14) and (15): 5(i, )2) = F(wi, W2, fb) (13) - 19 - WO 2012/058394 PCT/US2011/058027 where b = [bo, bi, ... b,_ 1 ] is a vector containing the arguments of the function F(-). In one embodiment, the function F(-) used to model S,,, has one parameter, as shown in equation (14): S K - e(14) where K is a factor to ensure energy conservation. Since the exact total energy in the spatial 5 domain is unknown (since x[w,h] is unavailable), in some embodiments it may be estimated as shown in equation (15): W-1 H-1 5f(.'(0,2)dojdo)2 =x[w, h]|2 = W x H x 1282 - r w=O h=O (15) In equation (14), be is an argument which may be determined by the resolution and content of the video sequence. In one embodiment, the content of be is classified into three categories: 10 simple, medium, and tough. Empirical values of be for different resolutions and context in accordance with one non-limiting embodiment are shown in Table 1. FORMAT SIMPLE MEDIUM TOUGH CIF 0.1061 0.137 0.1410 WVGA 0.1020 0.124 0.1351 1280x720 0.0983 0.105 0.1261 1920x1080 0.0803 0.092 0.1198 Table 1 Estimation of the PSD of f 3 A 15 Since the ratio M is a rational number, it can be represented as -,A B. Thus, a downsampled video has the resolution (W x H). In other words, the proportion of the reduced resolution is equal to (1 - ). In the frequency domain, the proportion of the lost B frequency components is also equal to (1 -) and all these lost components are located in the high frequency domain, if the anti-aliasing filter applied to f has a sharp cut-off frequency at B 20 i -. In this ideal case, (i.e., the output of down-sampling followed by up-sampling), all the - 20 - WO 2012/058394 PCT/US2011/058027 high frequency components of f 3 in FIG. 6A in the band -T, - AT1] and [A T, TE] are lost. The PSD off 3 , denoted as Syy(o 1 , (o)2), may be estimated from $x( 1 , 2 ) by setting the values of $xx(oi, W, (W1, W2 [ -r, - Br] U [B, r, equal to zero, as shown in equation (16): otherwise (16) 5 It is noted that the estimation of Syy(o 1 , o 2 ) in (11) may not be exactly true, because the anti-aliasing filter does not have an ideally sharp cut-off frequency, but it is a good approximation of the true PSD off 3 . 
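Continuing the sketch above, the PSD of the down/up-sampled video f3 in equation (16) can be approximated by zeroing every frequency bin beyond the cut-off (B/A)·π. The FFT-ordered frequency grid and the helper name are assumptions of this illustration; the uneven-ratio form of equation (17), described next, follows the same pattern with separate horizontal and vertical cut-offs.

```python
import numpy as np

def downsampled_psd(S_xx, A, B):
    # Equation (16): S_yy equals S_xx inside [-B/A*pi, B/A*pi] in both
    # directions and is zero outside, i.e. an ideal anti-aliasing filter
    # with a sharp cut-off at B/A*pi is assumed.
    H, W = S_xx.shape
    cutoff = float(B) / float(A)                 # fraction of pi retained
    w1 = 2.0 * np.abs(np.fft.fftfreq(W))         # |omega_1| / pi per column
    w2 = 2.0 * np.abs(np.fft.fftfreq(H))         # |omega_2| / pi per row
    keep = (w2[:, None] <= cutoff) & (w1[None, :] <= cutoff)
    return np.where(keep, S_xx, 0.0)
```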
Furthermore, when the horizontal and vertical directions have different sampling ratios Mh = Lh and M, = -, respectively, the estimation of Syy(oi, oz) may be re-written as in 10 equation (17): . B Bh B Ba yy( W2) =2) if Ah " Ah and 2 AVAV otherwise (17) Sampling Error Calculation After estimating the PSD of f and f3, (i.e., Sxx(o 1 , o 2 ) and Syy( 2 )), the 2 downsampling error 0 d may be calculated by equation (18): 2 1 " "~f aid - WH ' fTxx (W1, W2) - Syy(W1, W2) doidW2 15 (18) Generally, the downsampling error od 2 provided by equation (18) provides an indication of the difference of high frequency energy content between the input video signal and the video signal sampled at a downsampling rate. Other techniques may be used to generate downsampling error 0d. For example, in some embodiments, the downsampling error 0 d may 20 be obtained by determining the mean squared error (MSE) between the downsampled and upsampled video signal f 3 and the input video signal f For another example, in some -21- WO 2012/058394 PCT/US2011/058027 embodiments, the downsampling error o may be obtained by applying the anti-aliasing filter to the input video signal f and determining the MSE between the filtered f and the original input video f For another example, in some embodiments, the downsampling error 0 d may be obtained by applying a high-pass filter that has the same cut-off frequency with the 5 aforementioned anti-aliasing filter to the input video signalf and determining the average energy per pixel of the high-pass filteredf Estimate the Coding Error 2 Given the target bit-rate R, the coding error oc2 may be estimated by a model. In some embodiments, the following rate-distortion (R-D) model shown by equation (19) is used: 2 Ta 10 (19) where r is the average number of bits allocated to each pixel, i.e., bits per pixel (bpp). In some embodiments r may be calculated by equation (20): R x Mh X M, fps x W x H (20) In equation (20), fps is the frame rate, which means the number of frames captured in each 15 second, Mh and M, are the sampling ratios in the horizontal and vertical directions, respectively, W is the horizontal resolution, H is the vertical resolution, and R is the bit rate. The bit rate R may be acquired, or otherwise deduced, by a variety of techniques. For example, the bit rate R may be provided by a user of the coding system. In some embodiments, a network node associated with the coding system, such as a video server or media-aware network 20 element, may monitor the bit rates associated with various video streams. The video encoder may then query the network node to request a bit rate indication for a particular video stream. In some embodiments, the bit rate may change over time, such as during handovers or IP Flow Mobility (IFOM) functionality associated with a user device receiving video. The encoder may receive messages containing updated target bit rates. In some embodiments, the bit rate R may 25 be deduced by the decoder from the Quality of Service Class Indicator (QCI) assigned to the video stream. For example, QCIs one through four currently offer guaranteed bit rates (GBR). The GBR may be utilized by the video encoder to determine coding error o 2. In some embodiments, the bit rate R may be determined, or otherwise provided, by a user device associated with a decoder. For example, the user device may provide to the encoder through - 22 - WO 2012/058394 PCT/US2011/058027 appropriate signaling an estimate of the total aggregate data transmission throughput. 
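Whatever the source of the target bit rate R, the two error terms can then be evaluated per candidate ratio. The sketch below assumes the power-law R-D form σ_c² = α·r^(−β) suggested by equation (19) and the bits-per-pixel of equation (20); the function names are illustrative only.

```python
import numpy as np

def sampling_error(S_xx, S_yy):
    # Equation (18): per-pixel energy of the high-frequency components
    # present in f but absent from f3; with discrete FFT bins the double
    # integral becomes a sum over bins.
    H, W = S_xx.shape
    return float(np.sum(S_xx - S_yy)) / (W * H)

def coding_error(R, fps, W, H, Mh, Mv, alpha, beta):
    # Equations (19) and (20), assuming the R-D model alpha * r**(-beta).
    # W and H are the full resolution; with Mh, Mv >= 1 the downsampled
    # picture is (W/Mh) x (H/Mv), so r is the bits per downsampled pixel.
    r = (R * Mh * Mv) / (fps * W * H)
    return alpha * r ** (-beta)
```

Additional ways of obtaining the bit rate R are described next.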
In the case of user devices capable of multi-radio access technology (RAT) communications, the bit rate R may be an indication of the throughput through two or more radio access technologies such as a cellular RAT and a non-cellular RAT, for example. In some embodiments, the RTP/RTCP 5 protocols may be used to ascertain bit rate information. For example, RTP/RTCP may be run in a WRTU and a basestation in order to collect the application layer bit rate. This bit rate R may then be utilized in equation (20). The R-D model in equation (19) has two parameters: a and fl, of which the values vary according to factors including, but not limited to, the content of the sequence, the resolution of 10 the sequence, the encoder implementation and configurations, and so forth. Various embodiments for finding the appropriate values of a and f# are described in more detail below. Once values for a and f# have been identified using any suitable technique, the coding error 0c2 for a particular sampling ratio may then be calculated. For sampling ratios Mh and Mv, the average bits per pixel r using equation (20) may be first determined. Next, the determined 2 15 average bits per pixel r may then be used to calculate the coding error o2 , as described by 2 equation (19). The coding error oc2 may then be calculated for different sampling ratios. First, a new average bits per pixel r may be calculated using new sampling ratio values in equation (19). This new value of r may then be used to solve equation (19). Values of a and p - Off-line Mode 20 In some embodiments, when the sampling ratio may be selected without time constraint, off-line training may be utilized to find the values for a and f# which which most accurately predict, or model, the distortion from the coding process. Thus, in one embodiment, a video may be preprocessed to determine a relationship between the bit-rate and the coding distortion. The determined relationship may then be utilized when determining a sampling ratio as the available 25 bit rate, or target bit rate, changes over time during video transmission. The relationship may be influenced by factors including by not limited to the content of the video data, the resolution of the video data, the encoder implementation and configurations, and so forth. Fixing the aforementioned factors, an encoder configured at known settings may encode a given sequence at the full-resolution. This simulation may be performed at a range of bit-rates 30 {Ro, R 1 , ... , RN-1), producing a set of distortions {Do, DI, ... , DN-1) corresponding to each bit rate. The bit-rates may be normalized to bpp {ro, r 1 , ... , rN-1) using equation (21): -23 - WO 2012/058394 PCT/US2011/058027 Ri ri= fps x W x H (21) The corresponding distortions may be normalized accordingly to mean squared error (MSE), denoted as {do, d 1 , ... , dN-1}. The pairs of normalized bit-rate and distortion [ri, dj] (O i<N) may be plotted as an R-D curve. A numerical optimization algorithm may be used to fit that R-D 5 curve by solving the equation in (22) to find desired values of aopt and port. [a o p t, ilopt] = arg minp )_-j1 (di - (22) Values of a and p - On-line Mode For some embodiments, the video sequence or a segment of the sequence is accessible for 10 pre-processing, but off-line training is unaffordable for the applications because of the high complexity, for example. 
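For the off-line mode, the numerical fit of equation (22) may be sketched as follows. SciPy's curve_fit is used purely for illustration, and the power-law model form is the same assumption as above.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_rd_parameters(bitrates, mse_distortions, fps, W, H):
    # Normalise bit-rates to bits per pixel (equation (21)) and fit the
    # assumed model d = alpha * r**(-beta) to the (r_i, d_i) pairs, as in
    # the numerical optimisation of equation (22).
    r = np.asarray(bitrates, dtype=float) / (fps * W * H)
    d = np.asarray(mse_distortions, dtype=float)

    def model(r, alpha, beta):
        return alpha * r ** (-beta)

    (alpha_opt, beta_opt), _ = curve_fit(model, r, d, p0=(1.0, 1.0))
    return alpha_opt, beta_opt
```

In the on-line mode just introduced, where such off-line training is unaffordable, a lighter-weight analysis is used instead.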
In these embodiments, a signal analysis may be performed based on the available part of the video sequence and useful features may be extracted that reflect the characteristics of the video sequence, such as motion, texture, edge, and so forth. The extracted features and the values of parameter a and fl have high correlations, and therefore the extracted 15 features may be used to estimate the values of a and fl providing a reduction in coding-induced distortion. In one embodiment, the video sequence based on the PSD (described in detail above) may be analyzed and two features may be extracted from $XX. One feature that may be utilized is the percentage of energy of the DC component, FDC, and the other feature is the cut-off 20 frequency, ±c, where the energy of the components with frequencies outside the range of iac has less than a threshold T (e.g., T=0.5%) of the total energy. Generally, the cut-off frequency ioc represents the PSD decay speed toward the high frequency band, with the absolute value of ioc is in the range [0, 7r]. Thus, the smaller the value of +), the faster the PSD decays toward the high frequency band. FDC and wc may be calculated by equations (23) and (24), 25 respectively: FDc =XX(O, 0) (23) - 24 - WO 2012/058394 PCT/US2011/058027 (OC = min f fS' ((01, (2)doid(02 >! (1 - T) (24) In one embodiment, FDc is truncated to the range of [0.85, 0.99] and quantized by an H-step uniform quantizer. In one embodiment, o, is truncated to the range of [0, 0.9 7r] and quantized by an L-step uniform quantizer. These two extracted features, i.e., quantized FDc and ac, denoted 5 as PDc and 0c, may be used as two indices to look up the entries in two 2-D tables to obtain the values of a and fl, respectively. In one embodiment, FDC is quantized by a 15-step uniform quantizer with the reconstruction points at {0.85, 0.86, ... , 0.98, 0.99} and oc is quantized by a 10-step uniform quantizer with the reconstruction points at {0.07c, 0.17, ..., 0.87r, 0.97c}. Look-up tables for a and f# using PDC and 0, as indices in accordance with one embodiment are shown in 10 FIG. 7 and FIG. 8, respectively. It is noted that -1.0 in some entries does not indicate the values of a or f; instead, the combinations of PDC and c that goes to the entries with value -1.0 could not happen in practice. Values of a and p - Simplified Mode In some embodiments, such as real time video streaming, for example, none of the frames 15 that represent the typical content of a sequence is accessible for pre-processing, (e.g., x[w,h] in equation (10)) to estimate PSD or consequently extract features from PSD to analyze the video sequence. Under these circumstances, a mode (referred to herein as a "simplified mode") may be used estimate a and fi. Given the resolution and the category of the content of the input video f the values of a 20 and fi may be determined by looking up 2-D tables. The pre-defined resolution formats may be the commonly used ones, such as CIF, WVGA, VGA, 720p, 1080p, and so forth. In case the actual resolution of the input f is not one of the pre-defined, the most similar pre-defined resolution may be used for approximation. The content of a video sequence may include motion, texture, structure of edges, and so forth. Given the bit rate, video with simple content may be less 25 degraded than complex videos after coding. In some embodiments, the content of a video sequence can be classified into several categories from "simple" to "tough", depending on the level of granularity that the application has. 
The type of content may, for example, be indicated by the users based on their prior knowledge of the video; or, when prior knowledge does not exist, the content type maybe automatically set to the default value. In one embodiment, Table 2 - 25 - WO 2012/058394 PCT/US2011/058027 may be used as the 2-D look-up tables for the values of a and fl. Table 2 indicates values of a and f for different resolutions and content in accordance with various embodiments. a f Format Simple Medium Tough Simple Medium Tough CIF 0.76 0.93 1.23 1.49 5.45 8.66 WVGA 0.87 1 1.32 1.09 3.19 6.72 1280x720 0.95 1.04 1.3 1.46 2.8 4.81 1920x1080 0.93 1.1 1.45 1.06 2.4 4.21 Table 2 While, the pre-defined resolutions includes CIF, WVGA, 720p, and 1080p, and three 5 categories of content (simple, medium, tough) are used, this disclosure is not so limited. In some embodiments, additional level of granularity may be included in the table. Furthermore, in some embodiments, the default content type may be set to "medium." According to various embodiments, the complexity of the video may be ascertained through a variety of techniques. For example, in one embodiment user input is received which 10 indicates a relative level of complexity. This user input may then be used to determine an appropriate a and # to be used in equation (19). In some embodiments, video characteristic information (such as complexity) may be received from a network node that has access to the information. Based on this video information, suitable values of a and # may be determined (e.g., via a look up table) and subsequently used in equation (19). In some embodiments, a 15 complexity value for the video may be calculated or estimated from content statistics by prestoring some frames before downsampling the first frame. In this regard, a variety of techniques may be utilized, such as pixel value gradients, histograms, variances, and so forth. Searching for Ratio M Identifying the minimum of overall error ay is equivalent to finding the minimum of the 20 summation of the sampling error o and the coding error u,2, as defined by equation (8). The estimation of o and uo2 in accordance with various non-limiting embodiments are discussed above. Various algorithms that may be used to search for the M to that reduces, and in some cases minimizes, the overall error are described in more detail below. Even Sampling Ratio M for Horizontal and Vertical Directions 25 When the pixel aspect ratio (PAR) of the downsampled video is required to be identical to that of the full-resolution video and the shape of each pixel is required to be square, i.e., storage aspect ratio (SAR) equal to 1, the sampling ratio M = for the horizontal and vertical B -26- WO 2012/058394 PCT/US2011/058027 directions must be the same. Thus, in some embodiments, this requirement may serve as a first constraint. As the second constraint, for many applications it may be preferred that the downsampled resolution B W x B H be integers for a digital video format. In some applications, A A however, some cropping and/or padding may be used to obtain integer number of pixels in either 5 dimension. In any event, with these two constraints, the possible values of M are limited. Denoting the greatest common divisor (GCD) of W and H as G, possible ratios may be represented by equation (25). G M=-n, O n G-1 G -n (25) Sometime, the output resolution is not only required to be integers, but also required to be the 10 multiples of K. 
For example, some H.264 encoders only handle the case that K is equal to 16, because they don't support padding the frames to obtain an integer number of macroblocks (MB). Under this additional constraint, the possible values of M are further reduced, and (25) may be re-written as equation (26). G G M=-, On<--1 G -nK K (26) 15 In any event, in some embodiments, an "exhaustive" search method may be used to find the 2 overall errors 02 for all the possible M, which are denoted as a vector M =M 1 , M 2 , }, and select the sampling ratio Mi, which provides the minimum overall error. In other embodiments, a search method which finds an appropriate value of M without determining the overall error for all possible values of M is utilized. 20 FIG. 9A, 9B, and 9C illustrate searching strategies to find the sampling ratio Mi in accordance with various non-limiting embodiments. FIG. 9A shows an exhaustive searching strategy, FIG. 9B shows searching with large steps, and FIG. 9C shows fine searching. Referring first to FIG. 9A, after calculating the overall error o2 2 for all values of M, M 13 is selected as the sampling ratio in the illustrated embodiment. To save time without missing the 25 Mi which provides a reduction in coding distortion, searching may be performed in large steps, as shown in FIG. 9B, in order to reach the range that the desired M; is located. Then, further search with finer steps within that range is conducted as shown in FIG. 9C. In the example illustrated in FIG. 9, M has 24 possible values and the exhaustive search in FIG. 9A calculates the overall error G2 24 times to find the selected Mi; in comparison, the combination of coarse 30 and fine search in FIG. 9B and FIG. 9C reduces the computations by half. - 27 - WO 2012/058394 PCT/US2011/058027 In some embodiments, the selected sampling ratio may be selected from any suitable ratio that produces an overall error u' beneath an overall error threshold. In other words, as opposed to identifying a single sampling ratio resulting in an "absolute" minimum overall error value, there may be a plurality of sampling ratios that result in an overall error beneath a desired overall 5 error threshold. Thus, in accordance with various embodiments, any one of the sampling ratios resulting in an overall error level beneath the threshold may be selected as a sampling ratio for coding. In some embodiments, once a sampling ratio is identified generates an overall error level beneath a particular threshold amount, the encoding may proceed with that ratio as the selected sampling ratio. 10 Uneven Sampling Ratio Mh and M, for Horizontal and Vertical Directions In various embodiments, then the constraint of even ratio for both directions is not imposed, the horizontal vertical ratios, Mh and M, can be selected more freely. Possible values of Mh and M, are shown in equation (27) and equation (28), respectively: W Mh - m, !5 M!5 W - 1 W-m (27) H MV - , n0 5 n :5 H - 1 H-n 15 (28) Therefore, the joint event of (Mh, M,) can have W x H possibilities. The exhaustive search that goes through all these possibilities, while possible, may be too time-consuming for most applications. 
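For the even-ratio case, the candidate ratios of equation (26) and the coarse-then-fine search of FIG. 9B and FIG. 9C may be sketched as follows. The callable overall_error stands for the σ_d² + σ_c² evaluation described above, and the step size is an assumption of this illustration.

```python
import math

def candidate_ratios(W, H, K=16):
    # Equation (26): M = G / (G - n*K) for n = 0 .. G/K - 1, where G is the
    # greatest common divisor of W and H, so that the downsampled picture
    # W/M x H/M contains a whole number of K x K blocks.
    G = math.gcd(W, H)
    return [G / float(G - n * K) for n in range(G // K)]

def coarse_then_fine_search(ratios, overall_error, step=4):
    # Coarse pass over every `step`-th candidate (FIG. 9B), then a fine
    # pass in the neighbourhood of the best coarse candidate (FIG. 9C).
    ratios = sorted(ratios)
    best_coarse = min(ratios[::step], key=overall_error)
    i = ratios.index(best_coarse)
    lo, hi = max(0, i - step), min(len(ratios), i + step + 1)
    return min(ratios[lo:hi], key=overall_error)
```

For the uneven-ratio case, the much larger set of W × H possibilities calls for a faster strategy, as described next.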
As one of the fast searching strategies, the W x H possibilities may be processed 20 using large steps, as shown in equation (29) and equation (30), where Ah and A, are integerstep sizes for the horizontal and vertical directions, respectively: W W MhW-A, 0!5m<-1 W - M~hAh (29) H H MV = , 0 5 n - - 1 H - nA' AV (30) -28- WO 2012/058394 PCT/US2011/058027 Thus, the number of possibilities reduce to W x H, among which the approximate range (Flh, RV) providing the smallest ay may be found. A further fine search may then be performed in the neighborhood of (Rh, Fqv). However, in some embodiments, when a2 has local minimums with respect to the W x H 5 possibilities of (Mh, M,), the sampling ratio identified found by this strategy may be one of the local minimums instead of the global optimum. In one embodiment, several ratios (4,, R, 1 ), (Fah2, 2 ), and so forth are identified which provide relatively small values of the error ca. Then, a fine search is performed in the neighborhood of each candidate to find the respectively refined ratios(RMh, RV1), (Rh 2 , M, 2 ), and so forth that yield local minimum errors C within the 10 given neighborhood. The final ratio may then be selected among (Rhi, RV 1 ), (Rh 2 , MV 2 ), and so forth as the one yielding the lowest . In another embodiment, search with large steps is performed first with the constraint of even ratio in the two directions, similar to FIG. 9B. The ratio found from this first step may be identified as M . Note that since the constraint of even ratio is enforced, Mi is applied for both 15 horizontal and vertical directions. Then, a range of [Ma, Mb] may be defined which encloses the desired ratio M , that is, Ma < Mi < Mb. The constraint of enforcing the same ratio for the horizontal and vertical directions is then released and the following search may be performed to obtain selected sampling ratios for each of the two directions separately. The search range of the horizontal and vertical ratios, Mh and M., are shown in equation (31) and equation (32), 20 respectively: W W W Mh - ,-- <5 M < - W-m Ma Mb (31) H H H M H -n'M n H- a Mb (32) As can be seen, the search range of (Mh , M) is reduced from W x H to - x 25 ( - . Then, the aforementioned combination of coarse search followed by fine search is -29- WO 2012/058394 PCT/US2011/058027 applied within this search range to find the final selected subsampling ratios for the horizontal and the vertical directions. FIG. 1OA illustrates a process flow 1000 for encoding video data in accordance with one non-limiting embodiment. At 1002 video data to be encoded is received. At 1004, a sampling 5 error value is determined at each of a plurality of sampling ratios. In some embodiments, the sampling error value is determined using a power spectral density (PSD) of the received video data and an estimation of the PSD of downsampled video data. As described above, in various embodiments, a data-based technique may be used to estimate the PSD for the video data. In various embodiments, a model-based technique may be used to estimate the PSD for the video 10 data. At 1006, a coding error value may be determined at each of a plurality of sampling ratios. The coding error may be based on a given bit rate. In some embodiments, the bit rate may be received from a network node, such as a video server or an end-user device, for example. For the given bit rate, a coding error model may be developed to provide coding error values for each of the plurality of sampling ratios. 
The coding error model may comprise a first parameter and a 15 second parameter that each independently varies based on characteristics of the received video data. Values for the first and second parameters may be determined using any suitable technique. For example, in one embodiment, the first and second parameters are identified through a curve-fitting process. In another embodiment, the first and second parameters may be identified through consultation of various look-up tables, as described in more detail above. In 20 some embodiments, the coding error values at 1006 may be determined before the sampling error values at 1004. At 1008, the sampling error values and the coding error values at each sampling ratio are summed to identify a sampling ratio that reduces the over error value. At 1010, a sampling ratio is selected. In some embodiments, a plurality of sampling ratios may be selected throughout the duration of the video encoding process. For example, a first sampling ratio may 25 be selected at the beginning of the received video data and subsequently one or more additional sampling ratios may be selected during the duration of encoding event. In some embodiments, an exhaustive search is performed to identify a selected sampling ratio. In other embodiments, a non-exhaustive search is performed to identify a selected sampling ratio. For example, only errors associated with a subordinate set (subset) of the plurality of sampling ratios may be 30 summed. From that subset of summed sampling errors and coding errors, a sampling ratios may be selected. In some embodiments, additional searching may be utilized to further refine the search for the selected sampling ratio. In any event, at 1014 the video data may downsampled at the selected sampling ratio and, at 1016, the downsampled video data may be encoded. In some - 30 - WO 2012/058394 PCT/US2011/058027 embodiments, if the bit rate changes, the encoding process may be re-evaluated to determine an updated sampling ratio. Furthermore, in some embodiments, the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio. These horizontal and vertical sampling ratios may be the same or different. 5 FIG. 10B illustrates a process flow 1050 for decoding video data in accordance with one non-limiting embodiment. At 1052, compressed video data are received. The video data may be received from any suitable provider, such as a live video stream or previously stored video. At 1054 an indication of a selected sampling ratio is received. The sampling ratio may be based on, for example, a summation of a sampling error value and a coding error value across a plurality of 10 sampling ratios. At 1056, the block of coefficients is decoded to form reconstructed video data. At 1058, the reconstructed video data is upsampled at the selected sampling ratio to the resolution of the reconstructed video data. At 1060, the upsampled video data may be outputted. According to various embodiments, for an input video with the resolution WxH, the downsampling process (i.e., by downsampling unit 1606 in FIG. 16) may downsample it by 15 factors a and b for the horizontal and vertical directions, respectively, where a and b are positive W H rational numbers. Then, the output video has the resolution - x -. 
While a and b can be any a b positive rational numbers, represented by and , respectively, where Mh, Nh, Me, and N, are all positive integers, the output of a downsampling process is also a digital video, which has W H integer numbers of rows and columns of pixels. Thus, in various embodiments, w and - (i.e., a b 20 WXMf and HxMv ) are integers with Nh and Nv being factors of W and H to satisfy output Nh Nv resolution requirements. In some embodiments, the upsampling process (i.e., by upsampling unit 1712 in FIG. 17) may have an upsampling ratio equal to the downsampling ratio of the downsampling process which results in the processed video having the same resolution as the original input video. In 25 other embodiments, the upsampling ratio is decoupled from the downsampling ratio, which may allow for a more flexible upsampling ratio. For example, assuming the video to be upsampled has the resolution W, xH, the upsampling ratios may be set to c and d for the horizontal and vertical directions, respectively, and get the resolution of the output video equal to cW, xdH], where c and d are positive rational numbers. The values of c and d may be configured before 30 upsampling based on various criteria. For example, in order to make the output video has a resolution greater than or equal to the input resolution, the factors c and d should be greater than -31- WO 2012/058394 PCT/US2011/058027 or equal to 1.0. Moreover, while c and d can be any positive rational numbers, represented by L and -, respectively, where Kh, Lh, K, and L, are all positive integers, in various embodiments, Lh and L, are factors of W, and H, respectively. As an additional criteria for choosing c and d, the picture aspect ratio (PAR) may be kept at = . a b* 5 FIG. 11 is a block diagram 1100 for a horizontal downsampling process having a downsampling ratio of L. The block diagram 1100 comprises of upsampling Mh times at block 1102, applying filterfdh at block 1104, and downsampling Nh times at block 1106. After being processed by the block diagram 1100, the width of the output video is wxNh FIG. 12 illustrates an example downsampling process with Mh= 3 and Nh= 4. The 10 original row X (FIG. 12(a)) with the spectrum F (FIG. 12(b)) is first upsampled Mh times by inserting zero-valued samples. The resulting row is illustrated as X, in FIG. 12(c). As a result of the upsampling, the spectrum F is squeezed Mh times as shown in FIG. 12(d), denoted as F,. In F,, the spectra centering at integer multiples of - are introduced by the zero-insertion and need to be removed by the filterfdh (as shown in block 1104 in FIG. 11). Since X, will subsequently 15 be downsampled by a factor of Nh at block 1406, the cutoff frequency offdh should be -- (e.g., + ) instead of -, as shown in FIG. 12(f). The filter gain offdh is Mh, because the row X is upsampled Mh times the length and the energy is also increased Mh times. Therefore, fdh can be calculated by applying the inverse Fourier transform to the ideal frequency response Hd as illustrated in FIG. 12(f), as shown in equation (33): 20 fah(n) = f N Haeicd = ! fNh M en' do =- Sinc( n) (33) Nh Nh where sin ) x # 0 Sinc(x) = x (34) By multiplying F, (FIG. 12(d)) with Hd (FIG. 12(f)), the remaining spectrum Zf is determined, as illustrated in FIG. 12(g). In the spatial domain, Zj corresponds to the filtered row, denoted as Xf 25 (see the upper row in FIG. 12(e). Xf is then downsampled by the factor Nh (block 1406 in FIG. 14) by simply picking out every Nh pixels from Xf. 
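The down-sampling filter of equations (33) and (34) may be sketched as follows for one row, shaped by the Gaussian window discussed further below with respect to equation (38). The window normalisation is partly garbled in the source, so the common form w(n) = exp(−½·((n−(N−1)/2)/(σ·(N−1)/2))²) is assumed, and NumPy's normalised sinc is used so that np.sinc(n/Nh) equals Sinc(π·n/Nh) in the notation of equation (34).

```python
import numpy as np

def downsampling_filter(Mh, Nh, N=71, sigma=1.5):
    # Ideal filter of equation (33): gain Mh, cut-off pi/Nh, truncated to
    # N taps and shaped by an assumed Gaussian window (equation (38)).
    n = np.arange(N) - (N - 1) / 2.0
    ideal = (float(Mh) / Nh) * np.sinc(n / Nh)
    window = np.exp(-0.5 * (n / (sigma * (N - 1) / 2.0)) ** 2)
    return ideal * window

def resample_row(row, Mh, Nh, taps):
    # FIG. 11 / FIG. 12: zero-insert by Mh, filter, keep every Nh-th sample.
    up = np.zeros(len(row) * Mh)
    up[::Mh] = np.asarray(row, dtype=float)
    filtered = np.convolve(up, taps, mode="same")
    return filtered[::Nh]
```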
Finally, the downsampled row Xd (FIG. 12(e)) and its spectrum Zd (FIG. 12(h)), are determined. Similarly, a vertical downsampling filterfd,v can be calculated using equation (35): - 32 - WO 2012/058394 PCT/US2011/058027 fa,,(N) = fn Mes""do = Sinc( n) (35) To generate the intermediate frame with the resolution Mh WxMyH, a two-step strategy may be used: applying the horizontal and vertical filters consequently (in any order) to the original video. In some embodiments, a 2-D non-separable filterfd,2D may be calculated, which is 5 the 2-D convolution offdh andfd, and applyfd,2D to the original video directly. Designing the upsampling filter may be similar to designing the downsampling filter. For example, the horizontal direction may be focused on first, and then it may be extended to the vertical direction. A resolution of the input video having a width W, will be changed to W 1 XKh after upsampling. As illustrated in FIG. 13, the upsampling process 1300 may comprise 10 upsampling the original row Kh times by zero-insertion at block 1302, applying filterfuh at block 1304, and downsampling Lh times at block 1306 by picking one pixel out of every Lh pixels, where the filterf,h may be calculated by equation(36): Iff,(n) = R Kheinl"d = Sinc( T" n) (36) Kh Similarly, the vertical upsampling filterfd,v may be calculated by (37): 15 fh(n) f Kveind = Sinc(-n) (37) 2K Ky KV Ky In some embodiments, a window function may be utilized to limit the size of the above referenced filters. Suitable types of the window functions include, but are not limited to, Hanning, Hamming, triangular, Gaussian, and Blackman windows, for example. In one embodiment, a Gaussian window function expressed in equation (38) is used, 20 where N denotes the length of the filter and o- is the standard variance of the Gaussian function. FIG. 14 illustrates an example of the window function with (N=71, a-=1.5). 1fn-(N-1)/2 2 w(n) = e 2 -(N-)12) (38) To generate the intermediate frame with the resolution WIKh xHiKv, a two-step strategy may be used: applying the horizontal and vertical filters consequently (in any order) to the 25 original video. In some embodiments, a 2-D non-separable filterfu,2D may be calculated, which is the 2-D convolution of fh and f,, and apply f,2D to the original video directly. While frames may be interpolated to WMh x HMv and W 1 Kh x H 1 Kv as the intermediate for downsampling and upsampling, respectively, many of the interpolated pixels may not be - 33 - WO 2012/058394 PCT/US2011/058027 used. For instance, in some embodiments, only NfxNy (or ) pixels are picked out to form the final output video with the resolution WMx HMv for downsampling (or W K HK_ for Nh NV Lh LV upsampling). Therefore, most of the computation is not utilized. In light of this result, in some embodiments, only the pixels that will finally be picked out to form the output videos are 5 interpolated. FIG. 15 illustrates an embodiment where upsampling is performed with Mh = 3 and Nh = 4. In row 1502, the 1504a, 1504b, 1504c, etc. represent the integer pixels and the white ones 1506 represent inserted zeros. Instead of interpolating all the unknown positions, the pixels forming the final downsampled row are first selected, as shown in row 1508 of FIG. 15. Then 10 these selected positions may be classified into Mh categories, based on their phases. In one embodiment, the phase of a pixel is determined by its distances from the neighboring integer pixels. In row 1512 of FIG. 
15, there are three different phases, illustrated as zero phase 1514, first phase 1516, and second phases 1518. In some embodiments, each of the down- sampling and up-sampling filters (i.e., fdh, fd,, 15 fuh, andf,,) are decomposed to a set of phase filters, and each phase filter is used to interpolate the associated pixels. In Table 3, the lengths offdh, fd,v, fu,h, and f, are denoted as ND,H, ND,V, NuH, and Nu v, respectively. The decomposition process is provided in Table 3, where i is a non negative integer and k is the index of the filter. Filter Number of Scenario Filters of Phase m (m starts from 0) Length Phases Horizontal Downsampling ND,H M fm = fdh(k), k < ND,H and k = m + i x Mh Vertical Downsampling ND,V Mf v fd,v(k), k <ND,V and k =m + i xM Horizontal Horizon NU,H Kh f. = fu,(k), k <NU, and k =m + i xK Vertical N~ .rialN, K fj,) = fu,,(k), k <Nu,V and k =m+ i xK Table 3 20 FIG. 16 and FIG. 17 illustrate example embodiments of architectures including pre processing and/or post-processing steps and that may be used before, after, and/or concurrently with encoding, decoding, and/or transcoding video data in accordance with the systems and -34 - WO 2012/058394 PCT/US2011/058027 methods described herein. The pre-processing and/or post-processing may by an adaptive process including quantization, down-sampling, upsampling, anti-aliasing, low-pass interpolation filtering, and/or anti-blur filtering of video data, for example. According to an embodiment, the pre-processing and/or post-processing of the video data may enable the use of standard encoders 5 and/or decoders, such as H.264 encoders and/or decoders for example. Exemplary Encoder Architecture FIG. 16 illustrates an exemplary encoder architecture 1600 which includes the processing and pre-processing that may be performed prior to or concurrently with encoding of video data in order to obtain the selected sampling ratio. The transform 1608, quantization 1610, entropy 10 encoding 1612, inverse quantization 1614, inverse transform 1616, motion compensation 1620, memory 1618 and/or motion estimation 1624 described above with reference to FIG. 2 may be a part of the encoder processing for the video data. The anti-aliasing filter 1604, downsampling unit 1606, and encoder controller 1622 may be a part of the pre-processing steps for encoding the video data. These pre-processing elements may be incorporated into an encoder, work 15 independently of the encoder, or be configured to sit on top of the encoder. In any event, after the video data from the input 1602 has been encoded, the encoded video data may be transmitted via a channel 1626 and/or to storage. In some embodiments, an output buffer may be provided for storing the output encoded video data. The buffer fullness may be monitored, or the buffer input and output rates may be 20 compared to determine its relative fullness level, and may indicate the relative fullness level to the controller. The output buffer may indicate the relative fullness level using, for example, a buffer fullness signal provided from the output buffer to the encoder controller 1622. The encoder controller 1622 may monitor various parameters and/or constraints associated with the channel 1626, computational capabilities of the video encoder system, demands by the users, 25 etc., and may establish target parameters to provide an attendant quality of experience (QoE) suitable for the specified constraints and/or conditions of the channel 1626. 
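Returning to the decomposition of Table 3, splitting a filter into its phase filters is a one-line operation, sketched below; interpolating each retained pixel with only its own phase filter avoids computing the discarded positions. The helper name is illustrative.

```python
def phase_filters(taps, M):
    # Table 3: phase m collects the taps at indices k = m, m + M, m + 2M, ...
    # so that only the output pixels that are actually retained need to be
    # interpolated.
    return [list(taps[m::M]) for m in range(M)]
```

For the Mh = 3 example of FIG. 15, phase_filters(taps, 3) yields the zero-, first-, and second-phase filters. The discussion of the encoder controller 1622 of FIG. 16 continues below.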
The target bit rate may be adjusted from time to time depending upon the specified constraints and/or channel conditions. Typical target bit rates include, for example 64 kbps, 128 kbps, 256 kbps, 384 kbps, 512 kbps, and so forth. 30 As illustrated in FIG. 16, video data is received from an input 1602, such as a video source. The video data being received may include an original or decoded video signal, video sequence, bit stream, or any other data that may represent an image or video content. The received video data may be pre-processed by the anti-aliasing filter 1604, downsampling unit - 35 - WO 2012/058394 PCT/US2011/058027 1606, and/or encoder controller 1622 in accordance with the systems and methods described herein. The anti-aliasing filter 1604, downsampling unit 1606, and/or encoder controller 1622 may be in communication with one another and/or with other elements of an encoder to encode the received video data for transmission. In some embodiments, the anti-aliasing filter 1604 may 5 be designed using the techniques described above with respect to FIGS. 11-15. The pre processing of the received video data may be performed prior to or concurrently with the processing performed by the transform, quantization, entropy encoding, inverse quantization, inverse transform, motion compensation, and/or motion estimation other elements of the encoder. 10 As illustrated in FIG. 16, the original and/or decoded video data may be transmitted to an anti-aliasing filter 1604 for pre-processing. The anti-aliasing filter may be used to restrict the frequency content of the video data to satisfy the conditions of the downsampling unit 1606. According to an embodiment, the anti-aliasing filter 1604 for 2:1 downsampling may be an 11 tap FIR, i.e. [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/64. According to an embodiment, the anti-aliasing 15 filter may be adaptive to the content being received and/or jointly designed with quantization parameters (QP). The encoder controller 1622 may determine the selected sampling ratio and communicate with the downsampling unit 1606 during pre-processing of the video data to provide the downsampling unit 1606 with the selected sampling ratio. For example, the encoder controller 1622 may adaptively select the filter types (separable or non-separable), filter 20 coefficients, and/or filter length in any dimension based on the statistics of video data and/or channel data transmission capacity. As illustrated in FIG. 16, the pre-processing of the video data may include down sampling the video data using down-sampling unit 1606. The down-sampling unit 1606 may downsample at the sampling ratio M, as described in detail above. The video data may be 25 transmitted to the downsampling unit 1606 from the anti-aliasing filter 1604. Alternatively, the original and/or decoded video data may be transmitted to the downsampling unit 1606 directly. In any event, the downsampling unit 1606 may downsample the video data to reduce the sampling ratio of the video data. Down-sampling the video data may produce a lower resolution image and/or video than the original image and/or video represented by the video data. As 30 described above, the sampling ratio M of the downsampling unit 1606 may be adaptive to the received content and/or jointly designed with QP. For example, the encoder controller 1622 may adaptively select the downsampling ratio, such as 1/3 or a rational fraction for example, based on the instantaneous video content and/or channel data transmission capacity. 
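As an illustration of the pre-processing path just described, the 11-tap anti-aliasing filter quoted above can be applied separably before 2:1 decimation. The helper below is a sketch, not the encoder's implementation, and assumes simple "same"-length convolution at the picture borders.

```python
import numpy as np

# 11-tap anti-aliasing FIR for 2:1 downsampling quoted above; the taps sum
# to 64, so dividing by 64 gives unity DC gain.
AA_TAPS = np.array([1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1], dtype=float) / 64.0

def downsample_2to1(frame):
    # Filter rows and columns with the anti-aliasing FIR, then keep every
    # second pixel in both directions.
    rows = np.apply_along_axis(lambda r: np.convolve(r, AA_TAPS, mode="same"), 1, frame)
    both = np.apply_along_axis(lambda c: np.convolve(c, AA_TAPS, mode="same"), 0, rows)
    return both[::2, ::2]
```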
- 36 - WO 2012/058394 PCT/US2011/058027 The pre-processing performed by the anti-aliasing filter 1604 and/or downsampling unit 1606 may be controlled and/or aided by communication with the encoder controller 1622. The encoder controller 1622 may additionally, or alternatively, control the quantization performed in the processing of the video data. The encoder controller 1622 may be configured to choose the 5 encoding parameters. For example, the encoder controller may be content dependent and may utilize motion information, residual data, and other statistics from the video data to determine the encoding parameters and/or pre-processing parameters, such as the sampling ratio M for example. Exemplary Decoder Architecture 10 FIG. 17 illustrates an exemplary decoder architecture 1700 for the processing and post processing that may be performed to decode video data. The entropy decoding 1704, inverse quantization 1706, inverse transform 1708, and/or motion compensation 1720 may be a part of the decoder processing for the video data. The upsampling unit 1712, low-pass filter 1714, anti blur filter 1016, and/or decoder controller 1710 may be a part of the post-processing steps for 15 decoding the video data. These post-processing elements may be incorporated into the decoder 1700, work independently of the decoder, or be configured to sit on top of the decoder. In any event, after the video data from the channel 1702 has been decoded and the post-processing has been performed, the decoded video data may be transmitted via output 1718, such as to a storage medium or an output device for example. 20 As illustrated in FIG. 17, video data is received via a channel 1702, such as from an encoder or storage medium for example. The video data being received may include an encoded video signal, video sequence, bit stream, or any other data that may represent an image or video content. The received video data may be processed using the entropy decoding, inverse quantization, inverse transform, and/or motion compensation, as illustrated in FIG. 3. The 25 processing of the encoded video data may be performed prior to or concurrently with the post processing. The encoded video data may be post-processed by the upsampling unit 1712, low pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710. The decoder controller 1710 may receive an indication of the selected sampling ratio and transmit the selected sampling ratio to the upsampling unit 1712. The upsampling unit 1712, low-pass filter 1714, anti-blur 30 filter 1716, and/or decoder controller 1718 may be in communication with one another and/or with other elements of a decoder 1700 to decode the received video data for storage and/or output to a display. In some embodiments, the low-pass filter 1714 may be designed using the techniques described above with respect to FIGS. 14-18. - 37 - WO 2012/058394 PCT/US2011/058027 As illustrated in FIG. 17, the post-processing of the video data may include upsampling the video data. The upsampling ratio may be the selected rate Mi, as described above. The video data may be transmitted to the upsampling unit 1712 after being processed by the decoder 1700 (as illustrated). The upsampling unit 1712 may increase the resolution and/or quality of the 5 reconstructed video. For example, the upsampling of the video data may correspond to the down-sampling performed on the video data at the pre-processing of the encoder. Similar to the downsampling unit 1606 (FIG. 
16), the upsampling unit 1712 may have a dynamic sampling ratio for upsampling the video data. According to an embodiment, the post-processing of the video data may include a low 10 pass interpolation filter 1714. The low-pass interpolation filter may implement anti-aliasing and improve the quality and definition of the video content represented by the video data. According to an embodiment, the low-pass interpolation filter for 1:2 upsamping may include a 4-tap FIR, i.e. [0.25, 0.75, 0.75, 0.25]. The low-pass interpolation filter 1714 may be adaptive to the content and/or jointly designed with QP. According to an embodiment, the decoder controller 15 may adaptively select the filter types, filter coefficients and/or filter length in any dimension. The selections made by the decoder controller may be based on the statistics and/or syntax in the encoded video data, such as statistics of previous frames and QP of current frame for example, as described in detail above. As illustrated in FIG. 17, the post-processing of the video data may, in some 20 embodiments, include an anti-blur (or sharpening) filter 1716. The anti-blur filter 1716 may be used to compensate the blurriness caused by the down-sampling and/or low-pass filtering. According to an embodiment, the anti-blur filter may include a 2D-Laplacian filter, i.e. [0, 0, 0; 0, 1, 0; 0, 0, 0] + [-1, -1, -1; -1, 8, -1; -1, -1, -1]/5. The anti-blur filter may be adaptive to the content and/or jointly designed with QP. According to an embodiment, the decoder controller 25 1710 may adaptively select the filter types, filter coefficients, and/or filter length in any dimension. The selections may be based on the statistics and/or syntax in the encoded video bit stream, such as statistics of previous frames and QP of current frame for example, as described in more detail above. According to an embodiment, the encoder and decoder performing the pre-processing and 30 post-processing, respectively, may be aware of one another. For example, the encoder and decoder may have a communication link (such as communication channel 16 in FIG. 1) that enables transmission of information corresponding to the pre-processing of the video data to the decoder. Similarly, the decoder may transmit information corresponding to the post-processing - 38 - WO 2012/058394 PCT/US2011/058027 of the video data to the encoder via the communication link. Such a communication link may enable the decoder to adjust the post-processing based on the pre-processing that occurs at the encoder. Similarly, the communication link may enable the encoder to adjust the pre-processing based on the post-processing that occurs at the decoder. A similar communication link may also 5 be established with other entities performing the pre-processing and/or post-processing of the video data if the pre-processing and post-processing are not performed at the encoder and decoder, respectively. FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder. As illustrated in FIG. 18, video data 1804 may be received, such as a bit 10 stream, a video signal, video sequence, or any other data that may represent an image or video content. The video data may be pre-processed by the anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802. The anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802 may be in communication with one another and/or with other elements of an encoder and/or decoder. 
The pre-processing of the received video data may be performed 15 prior to or concurrently with the processing performed by the encoder and/or decoder. The video data may be pre-processed as described above with regard to the discussion of the pre-processing of video data in FIG. 16. As described above with regard to FIG. 1, for example, video coded in accordance with the systems and methods described herein may be sent via a communication channel 16, which 20 may included wireline connections and/or wireless connections, through a communications network. The communications network may be any suitable type of communication system, as described in more detail below with respect to FIGS. 19A, 19B, 19C, and 19D. FIG. 19A is a diagram of an example communications system 1900 in which one or more disclosed embodiments may be implemented. The communications system 1900 may be a 25 multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 1900 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications systems 1900 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access 30 (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single carrier FDMA (SC-FDMA), and the like. As shown in FIG. 19A, the communications system 1900 may include wireless transmit/receive units (WTRUs) 1902a, 1902b, 1902c, 1902d, a radio access network (RAN) - 39 - WO 2012/058394 PCT/US2011/058027 1904, a core network 1906, a public switched telephone network (PSTN) 1908, the Internet 1910, and other networks 1912, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 1902a, 1902b, 1902c, 1902d may be any type of device configured to operate and/or 5 communicate in a wireless environment. By way of example, the WTRUs 1902a, 1902b, 1902c, 1902d may be configured to transmit and/or receive wireless signals and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing 10 compressed video communications. The communications systems 1900 may also include a base station 1914a and a base station 1914b. Each of the base stations 1914a, 1914b may be any type of device configured to wirelessly interface with at least one of the WTRUs 1902a, 1902b, 1902c, 1902d to facilitate access to one or more communication networks, such as the core network 1906, the Internet 15 1910, and/or the networks 1912. By way of example, the base stations 1914a, 1914b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1914a, 1914b are each depicted as a single element, it will be appreciated that the base stations 1914a, 1914b may include any number of interconnected base stations and/or network elements. 
20 The base station 1914a may be part of the RAN 1904, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 1914a and/or the base station 1914b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be 25 divided into cell sectors. For example, the cell associated with the base station 1914a may be divided into three sectors. Thus, in one embodiment, the base station 1914a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 1914a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell. 30 The base stations 1914a, 1914b may communicate with one or more of the WTRUs 1902a, 1902b, 1902c, 1902d over an air interface 1916, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), - 40 - WO 2012/058394 PCT/US2011/058027 visible light, etc.). The air interface 1916 may be established using any suitable radio access technology (RAT). More specifically, as noted above, the communications system 1900 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, 5 FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 1914a in the RAN 1904 and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1916 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA 10 (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High Speed Uplink Packet Access (HSUPA). In another embodiment, the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1916 using Long Term Evolution (LTE) and/or LTE 15 Advanced (LTE-A). In other embodiments, the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global 20 System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like. The base station 1914b in FIG. 19A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the 25 like. In one embodiment, the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 1914b and the WTRUs 1902c, 1902d may utilize a 30 cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, etc.) 
to establish a picocell or femtocell. As shown in FIG. 19A, the base station 1914b may have a direct connection to the Internet 1910. Thus, the base station 1914b may not be required to access the Internet 1910 via the core network 1906. -41 - WO 2012/058394 PCT/US2011/058027 The RAN 1904 may be in communication with the core network 1906, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 1902a, 1902b, 1902c, 1902d. For example, the core network 1906 may provide call control, billing services, mobile location-based 5 services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high level security functions, such as user authentication. Although not shown in FIG. 19A, it will be appreciated that the RAN 1904 and/or the core network 1906 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1904 or a different RAT. For example, in addition to being connected to the RAN 1904, which may be utilizing an 10 E-UTRA radio technology, the core network 1906 may also be in communication with another RAN (not shown) employing a GSM radio technology. The core network 1906 may also serve as a gateway for the WTRUs 1902a, 1902b, 1902c, 1902d to access the PSTN 1908, the Internet 1910, and/or other networks 1912. The PSTN 1908 may include circuit-switched telephone networks that provide plain old telephone 15 service (POTS). The Internet 1910 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 1912 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the 20 networks 1912 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 104 or a different RAT. Some or all of the WTRUs 1902a, 1902b, 1902c, 1902d in the communications system 1900 may include multi-mode capabilities, i.e., the WTRUs 1902a, 1902b, 1902c, 1902d may include multiple transceivers for communicating with different wireless networks over different 25 wireless links. For example, the WTRU 1902c shown in FIG. 19A may be configured to communicate with the base station 1914a, which may employ a cellular-based radio technology, and with the base station 1914b, which may employ an IEEE 802 radio technology. FIG. 19B is a system diagram of an example WTRU 1902. As shown in FIG. 19B, the WTRU 1902 may include a processor 1918, a transceiver 1920, a transmit/receive element 1922, 30 a speaker/microphone 1924, a keypad 1926, a display/touchpad 1928, non-removable memory 1906, removable memory 1932, a power source 1934, a global positioning system (GPS) chipset 1936, and other peripherals 1938. It will be appreciated that the WTRU 1902 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment. 
The processor 1918 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphics processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 1918 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1902 to operate in a wireless environment. The processor 1918 may be coupled to the transceiver 1920, which may be coupled to the transmit/receive element 1922. While FIG. 19B depicts the processor 1918 and the transceiver 1920 as separate components, it will be appreciated that the processor 1918 and the transceiver 1920 may be integrated together in an electronic package or chip.

The transmit/receive element 1922 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1914a) over the air interface 1916. For example, in one embodiment, the transmit/receive element 1922 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 1922 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 1922 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1922 may be configured to transmit and/or receive any combination of wireless signals.

In addition, although the transmit/receive element 1922 is depicted in FIG. 19B as a single element, the WTRU 1902 may include any number of transmit/receive elements 1922. More specifically, the WTRU 1902 may employ MIMO technology. Thus, in one embodiment, the WTRU 1902 may include two or more transmit/receive elements 1922 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1916.

The transceiver 1920 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1922 and to demodulate the signals that are received by the transmit/receive element 1922. As noted above, the WTRU 1902 may have multi-mode capabilities. Thus, the transceiver 1920 may include multiple transceivers for enabling the WTRU 1902 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.

The processor 1918 of the WTRU 1902 may be coupled to, and may receive user input data from, the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 1918 may also output user data to the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928. In addition, the processor 1918 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1930 and/or the removable memory 1932. The non-removable memory 1930 may include random access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
The removable memory 1932 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 1918 may access information from, and store data in, memory that is not physically located on the WTRU 1902, such as on a server or a home computer (not shown).

The processor 1918 may receive power from the power source 1934, and may be configured to distribute and/or control the power to the other components in the WTRU 1902. The power source 1934 may be any suitable device for powering the WTRU 1902. For example, the power source 1934 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.

The processor 1918 may also be coupled to the GPS chipset 1936, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1902. In addition to, or in lieu of, the information from the GPS chipset 1936, the WTRU 1902 may receive location information over the air interface 1916 from a base station (e.g., base stations 1914a, 1914b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1902 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.

The processor 1918 may further be coupled to other peripherals 1938, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 1938 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

FIG. 19C is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment. As noted above, the RAN 1904 may employ a UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. The RAN 1904 may also be in communication with the core network 1906. As shown in FIG. 19C, the RAN 1904 may include Node-Bs 1940a, 1940b, 1940c, which may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. The Node-Bs 1940a, 1940b, 1940c may each be associated with a particular cell (not shown) within the RAN 1904. The RAN 1904 may also include RNCs 1942a, 1942b. It will be appreciated that the RAN 1904 may include any number of Node-Bs and RNCs while remaining consistent with an embodiment.

As shown in FIG. 19C, the Node-Bs 1940a, 1940b may be in communication with the RNC 1942a. Additionally, the Node-B 1940c may be in communication with the RNC 1942b. The Node-Bs 1940a, 1940b, 1940c may communicate with the respective RNCs 1942a, 1942b via an Iub interface. The RNCs 1942a, 1942b may be in communication with one another via an Iur interface. Each of the RNCs 1942a, 1942b may be configured to control the respective Node-Bs 1940a, 1940b, 1940c to which it is connected.
In addition, each of the RNCs 1942a, 1942b may be configured to carry out or support other functionality, such as outer loop power control, load control, admission control, packet scheduling, handover control, macrodiversity, security functions, data encryption, and the like.

The core network 1906 shown in FIG. 19C may include a media gateway (MGW) 1944, a mobile switching center (MSC) 1946, a serving GPRS support node (SGSN) 1948, and/or a gateway GPRS support node (GGSN) 1950. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The RNC 1942a in the RAN 1904 may be connected to the MSC 1946 in the core network 1906 via an IuCS interface. The MSC 1946 may be connected to the MGW 1944. The MSC 1946 and the MGW 1944 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.

The RNC 1942a in the RAN 1904 may also be connected to the SGSN 1948 in the core network 1906 via an IuPS interface. The SGSN 1948 may be connected to the GGSN 1950. The SGSN 1948 and the GGSN 1950 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.

As noted above, the core network 1906 may also be connected to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.

FIG. 19D is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment. As noted above, the RAN 1904 may employ an E-UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. The RAN 1904 may also be in communication with the core network 1906.

The RAN 1904 may include eNode-Bs 1960a, 1960b, 1960c, though it will be appreciated that the RAN 1904 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 1960a, 1960b, 1960c may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. In one embodiment, the eNode-Bs 1960a, 1960b, 1960c may implement MIMO technology. Thus, the eNode-B 1960a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.

Each of the eNode-Bs 1960a, 1960b, 1960c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in FIG. 19D, the eNode-Bs 1960a, 1960b, 1960c may communicate with one another over an X2 interface.

The core network 1906 shown in FIG. 19D may include a mobility management gateway (MME) 1962, a serving gateway 1964, and a packet data network (PDN) gateway 1966. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The MME 1962 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via an S1 interface and may serve as a control node.
For example, the MME 1962 may be responsible for authenticating users of the WTRUs 1902a, 1902b, 1902c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 1902a, 1902b, 1902c, and the like. The MME 1962 may also provide a control plane function for switching between the RAN 1904 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.

The serving gateway 1964 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via the S1 interface. The serving gateway 1964 may generally route and forward user data packets to/from the WTRUs 1902a, 1902b, 1902c. The serving gateway 1964 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 1902a, 1902b, 1902c, managing and storing contexts of the WTRUs 1902a, 1902b, 1902c, and the like.

The serving gateway 1964 may also be connected to the PDN gateway 1966, which may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.

The core network 1906 may facilitate communications with other networks. For example, the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices. For example, the core network 1906 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 1906 and the PSTN 1908. In addition, the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.

FIG. 19E is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment. The RAN 1904 may be an access service network (ASN) that employs IEEE 802.16 radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. As will be further discussed below, the communication links between the different functional entities of the WTRUs 1902a, 1902b, 1902c, the RAN 1904, and the core network 1906 may be defined as reference points.

As shown in FIG. 19E, the RAN 1904 may include base stations 1970a, 1970b, 1970c, and an ASN gateway 1972, though it will be appreciated that the RAN 1904 may include any number of base stations and ASN gateways while remaining consistent with an embodiment. The base stations 1970a, 1970b, 1970c may each be associated with a particular cell (not shown) in the RAN 1904 and may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. In one embodiment, the base stations 1970a, 1970b, 1970c may implement MIMO technology. Thus, the base station 1970a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
The base stations 1970a, 1970b, 1970c may also provide mobility management functions, such as handoff triggering, tunnel establishment, radio resource management, traffic classification, quality of service (QoS) policy enforcement, and the like. The ASN gateway 1972 may serve as a traffic aggregation point and may be responsible for paging, caching of subscriber profiles, routing to the core network 1906, and the like.

The air interface 1916 between the WTRUs 1902a, 1902b, 1902c and the RAN 1904 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 1902a, 1902b, 1902c may establish a logical interface (not shown) with the core network 1906. The logical interface between the WTRUs 1902a, 1902b, 1902c and the core network 1906 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.

The communication link between each of the base stations 1970a, 1970b, 1970c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 1970a, 1970b, 1970c and the ASN gateway 1972 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 1902a, 1902b, 1902c.

As shown in FIG. 19E, the RAN 1904 may be connected to the core network 1906. The communication link between the RAN 1904 and the core network 1906 may be defined as an R3 reference point that includes protocols for facilitating data transfer and mobility management capabilities, for example. The core network 1906 may include a mobile IP home agent (MIP-HA) 1974, an authentication, authorization, accounting (AAA) server 1976, and a gateway 1978. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The MIP-HA 1974 may be responsible for IP address management, and may enable the WTRUs 1902a, 1902b, 1902c to roam between different ASNs and/or different core networks. The MIP-HA 1974 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices. The AAA server 1976 may be responsible for user authentication and for supporting user services. The gateway 1978 may facilitate interworking with other networks. For example, the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices. In addition, the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.

Although not shown in FIG. 19E, it will be appreciated that the RAN 1904 may be connected to other ASNs and the core network 1906 may be connected to other core networks.
The communication link between the RAN 1904 and the other ASNs may be defined as an R4 reference point, which may include protocols for coordinating the mobility of the WTRUs 1902a, 1902b, 1902c between the RAN 1904 and the other ASNs. The communication link between the core network 1906 and the other core networks may be defined as an R5 reference point, which may include protocols for facilitating interworking between home core networks and visited core networks.

EMBODIMENTS

A video encoding method, comprising receiving video data; at each of a plurality of sampling ratios, determining a sampling error value; for a bit rate, at each of the plurality of sampling ratios, determining a coding error value; summing the sampling error value and the coding error value at each of the plurality of sampling ratios; selecting one of the plurality of sampling ratios based on the sum of the sampling error value and the coding error value at the selected sampling ratio; downsampling the video data at the selected sampling ratio; and encoding the downsampled video data.

A method of the preceding embodiment, wherein selecting one of the plurality of sampling ratios comprises selecting the one of the plurality of sampling ratios resulting in the lowest summation of the sampling error value and the coding error value.

A method of any of the preceding embodiments, wherein selecting one of the plurality of sampling ratios comprises selecting one of the plurality of sampling ratios resulting in a summation of the sampling error value and the coding error value having an overall error value beneath an overall error threshold.

A method of any of the preceding embodiments, wherein the sampling error value is based on a power spectral density (PSD) of the video data and an estimation of the PSD of downsampled video data.

A method of any of the preceding embodiments, wherein the estimation of the PSD of downsampled video data is a function, wherein at least one parameter of the function is determined by at least one characteristic of the video data.

A method of any of the preceding embodiments, wherein the sampling error value is based on a difference of the received video data and anti-aliasing filtered video data.

A method of any of the preceding embodiments, wherein the coding error value is based on a coding error model, wherein the coding error model is a function of the bit rate and a sampling ratio.

A method of any of the preceding embodiments, wherein the coding error model comprises a first parameter and a second parameter, and wherein the first parameter and the second parameter are each determined by at least one characteristic of the video data.

A method of any of the preceding embodiments, further comprising for each of a plurality of bit rates, determining a bit per pixel value; for each of the plurality of bit rates, determining a distortion value; for each of the plurality of bit rates, determining a plurality of estimated distortion values based on a plurality of values for the first parameter and a plurality of values for the second parameter of the coding error model; and determining a selected value for the first parameter and a value for the second parameter of the coding error model, such that the plurality of distortion values have the minimum difference with the plurality of the estimated distortion values.
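As an illustration only, the grid search below sketches how the first and second parameters of a coding error model could be chosen so that the estimated distortion values most closely match the measured distortion values across a set of bit rates. The power-law model form, the parameter names alpha and beta, the grids, and the example numbers are assumptions for the sketch and are not taken from the embodiments.

```python
import numpy as np

def fit_coding_error_model(bits_per_pixel, measured_distortion,
                           alpha_grid, beta_grid):
    """Pick the (alpha, beta) pair whose estimated distortions are closest,
    in summed squared difference, to the measured distortions over all bit rates.

    The model form D_est = alpha * bpp ** (-beta) is an illustrative assumption;
    the embodiments do not fix a specific functional form.
    """
    bpp = np.asarray(bits_per_pixel, dtype=float)
    d_meas = np.asarray(measured_distortion, dtype=float)

    best = None
    for alpha in alpha_grid:
        for beta in beta_grid:
            d_est = alpha * bpp ** (-beta)              # estimated distortion per bit rate
            err = float(np.sum((d_meas - d_est) ** 2))  # difference to minimize
            if best is None or err < best[0]:
                best = (err, alpha, beta)
    _, alpha_sel, beta_sel = best
    return alpha_sel, beta_sel

# Example: three operating points expressed as bits per pixel, with a measured
# mean-squared-error distortion at each (values are purely illustrative).
alpha, beta = fit_coding_error_model(
    bits_per_pixel=[0.05, 0.10, 0.20],
    measured_distortion=[120.0, 60.0, 28.0],
    alpha_grid=np.linspace(1.0, 20.0, 40),
    beta_grid=np.linspace(0.5, 2.0, 31),
)
```

A look-up table, as in the next embodiment, would simply replace this fit with a table indexed by a content characteristic such as a PSD-derived complexity measure.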
A method of any of the preceding embodiments, further comprising selecting a value for the first parameter from a first look-up table; and selecting a value for the second parameter from a second look-up table.

A method of any of the preceding embodiments, further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on a DC component of the power spectral density.

A method of any of the preceding embodiments, further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on the decay speed toward the high frequency band of the power spectral density.

A method of any of the preceding embodiments, wherein the at least one characteristic is a complexity value of the received video data; and wherein the complexity value is received from one of a user input and a network node.

A method of any of the preceding embodiments, further comprising receiving an indication of the bit rate from a network node.

A method of any of the preceding embodiments, further comprising subsequent to selecting the one of the plurality of sampling ratios, receiving an indication of a second bit rate; for a second bit rate, determining an updated coding error value at each of the plurality of sampling ratios; selecting an updated sampling ratio based on a summation of the sampling error value and updated coding error value; downsampling the input video at the updated sampling ratio; and encoding the downsampled video sequence.

A method of any of the preceding embodiments, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio and the horizontal sampling ratio is different from the vertical sampling ratio.

A method of any of the preceding embodiments, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio and the horizontal sampling ratio is the same as the vertical sampling ratio.

A method of any of the preceding embodiments, wherein a first selection of the sampling ratio is performed at the beginning of the received video data and at least a second selection of the sampling ratio is performed during the duration of the received video data.

A video decoding method, comprising receiving compressed video data; receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decoding the compressed video data to form reconstructed video data; upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the upsampled reconstructed video; and outputting the upsampled video data.

A video decoding system comprising a video decoder, the video decoder configured to receive compressed video data; receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decode the compressed video data to form reconstructed video data; upsample the reconstructed video data to increase the resolution of the reconstructed video data; and output the filtered video data.
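As a minimal sketch only, the following Python illustrates how an encoder could combine the two error terms to pick a sampling ratio and how a decoder could upsample at the signaled ratio. The helper names (sampling_error, coding_error, select_sampling_ratio), the use of an anti-aliasing-difference sampling error, the power-law coding error model, and the stand-in downsampler/upsampler are illustrative assumptions; the embodiments above do not prescribe these specific functions or a particular codec.

```python
import numpy as np
from scipy import ndimage

def sampling_error(frame, ratio):
    """Illustrative sampling error: mean squared difference between the frame and
    an anti-aliasing (Gaussian low-pass) filtered version whose strength grows as
    the candidate downsampling ratio shrinks."""
    sigma = max(1.0 / ratio - 1.0, 0.0)          # no filtering at full resolution
    filtered = ndimage.gaussian_filter(frame, sigma=sigma)
    return float(np.mean((frame - filtered) ** 2))

def coding_error(bit_rate, ratio, alpha, beta, width, height, fps):
    """Illustrative coding error model: distortion as a power law in the bits per
    pixel of the downsampled video (alpha, beta fitted or looked up per content)."""
    bpp = bit_rate / (width * ratio * height * ratio * fps)
    return alpha * bpp ** (-beta)

def select_sampling_ratio(frame, bit_rate, ratios, alpha, beta, fps=30.0):
    """Sum the two error terms at each candidate ratio and keep the minimum."""
    h, w = frame.shape
    totals = {r: sampling_error(frame, r) +
                 coding_error(bit_rate, r, alpha, beta, w, h, fps)
              for r in ratios}
    return min(totals, key=totals.get)

# Encoder side: pick a ratio, downsample, then hand off to any codec.
frame = np.random.rand(720, 1280).astype(np.float32)
ratio = select_sampling_ratio(frame, bit_rate=1_000_000,
                              ratios=(1.0, 0.75, 0.5), alpha=8.0, beta=1.2)
downsampled = ndimage.zoom(frame, ratio)         # stand-in for the downsampler

# Decoder side: after decoding, upsample back toward the original resolution
# using the signaled sampling ratio.
reconstructed = downsampled                      # stand-in for decode()
upsampled = ndimage.zoom(reconstructed, 1.0 / ratio)
```

At low bit rates the coding-error term dominates, so a smaller sampling ratio tends to win; at high bit rates the sampling-error term dominates and full resolution is retained, which is the trade-off the summation above is intended to capture.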
A video decoding system of the preceding embodiment further comprising a wireless receive/transmit unit in communication with a communication system, wherein the wireless receive/transmit unit is configured to receive the video data from the communication system.

Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Variations of the method, apparatus and system described above are possible without departing from the scope of the invention. In view of the wide variety of embodiments that can be applied, it should be understood that the illustrated embodiments are exemplary only, and should not be taken as limiting the scope of the following claims.

Moreover, in the embodiments described above, processing platforms, computing systems, controllers, and other devices containing processors are noted. These devices may contain at least one Central Processing Unit ("CPU") and memory. In accordance with the practices of persons skilled in the art of computer programming, reference to acts and symbolic representations of operations or instructions may be performed by the various CPUs and memories. Such acts and operations or instructions may be referred to as being "executed," "computer executed" or "CPU executed."

One of ordinary skill in the art will appreciate that the acts and symbolically represented operations or instructions include the manipulation of electrical signals by the CPU. An electrical system represents data bits that can cause a resulting transformation or reduction of the electrical signals and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to or representative of the data bits. It should be understood that the exemplary embodiments are not limited to the above-mentioned platforms or CPUs and that other platforms and CPUs may support the described methods.

The data bits may also be maintained on a computer-readable medium including magnetic disks, optical disks, and any other volatile (e.g., Random Access Memory ("RAM")) or non-volatile (e.g., Read-Only Memory ("ROM")) mass storage system readable by the CPU.
The computer-readable medium may include cooperating or interconnected computer-readable medium, which exist exclusively on the processing system or are distributed among multiple interconnected processing systems that may be local or remote to the processing system. It should be understood that the exemplary embodiments are not limited to the above-mentioned memories and that other platforms and memories may support the described methods.

No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article "a" is intended to include one or more items. Where only one item is intended, the term "one" or similar language is used. Further, the terms "any of" followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include "any of," "any combination of," "any multiple of," and/or "any combination of multiples of" the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items. Further, as used herein, the term "set" is intended to include any number of items, including zero. Further, as used herein, the term "number" is intended to include any number, including zero.

Moreover, the claims should not be read as limited to the described order or elements unless stated to that effect. In addition, use of the term "means" in any claim is intended to invoke 35 U.S.C. § 112, ¶ 6, and any claim without the word "means" is not so intended.

Claims (21)

1. A video encoding method, comprising:
receiving video data;
at each of a plurality of sampling ratios, determining a sampling error value;
for a bit rate, at each of the plurality of sampling ratios, determining a coding error value;
summing the sampling error value and the coding error value at each of the plurality of sampling ratios;
selecting one of the plurality of sampling ratios based on the sum of the sampling error value and the coding error value at the selected sampling ratio;
downsampling the video data at the selected sampling ratio; and
encoding the downsampled video data.
2. The method of claim 1, wherein selecting one of the plurality of sampling ratios comprises selecting the one of the plurality of sampling ratios resulting in the lowest summation of the sampling error value and the coding error value.
3. The method of claim 1, wherein selecting one of the plurality of sampling ratios comprises selecting one of the plurality of sampling ratios resulting in a summation of the sampling error value and the coding error value having an overall error value beneath an overall error threshold.
4. The method of claim 1, wherein the sampling error value is based on a power spectral density (PSD) of the video data and an estimation of the PSD of downsampled video data.
5. The method of claim 4, wherein the estimation of the PSD of downsampled video data is a function, wherein at least one parameter of the function is determined by at least one characteristic of the video data.
6. The method of claim 1, wherein the sampling error value is based on a difference of the received video data and anti-aliasing filtered video data.
7. The method of claim 1, wherein the coding error value is based on a coding error model, wherein the coding error model is a function of the bit rate and a sampling ratio.
8. The method of claim 7, wherein the coding error model comprises a first parameter and a second parameter, and wherein the first parameter and the second parameter are each determined by at least one characteristic of the video data.
9. The method of claim 8, comprising:
for each of a plurality of bit rates, determining a bit per pixel value;
for each of the plurality of bit rates, determining a distortion value;
for each of the plurality of bit rates, determining a plurality of estimated distortion values based on a plurality of values for the first parameter and a plurality of values for the second parameter of the coding error model; and
determining a selected value for the first parameter and a value for the second parameter of the coding error model, such that the plurality of distortion values have the minimum difference with the plurality of the estimated distortion values.
10. The method of claim 8, comprising:
selecting a value for the first parameter from a first look-up table; and
selecting a value for the second parameter from a second look-up table.
11. The method of claim 8, comprising: determining a power spectral density of the video data, wherein the values for the first and second parameters are based on a DC component of the power spectral density.
12. The method of claim 8, comprising:
determining a power spectral density of the video data, wherein the values for the first and second parameters are based on the decay speed toward the high frequency band of the power spectral density.
13. The method of claim 8, comprising:
wherein the at least one characteristic is a complexity value of the received video data; and
wherein the complexity value is received from one of a user input and a network node.
14. The method of claim 1, comprising: receiving an indication of the bit rate from a network node.
15. The method of claim 14, comprising:
subsequent to selecting the one of the plurality of sampling ratios, receiving an indication of a second bit rate;
for a second bit rate, determining an updated coding error value at each of the plurality of sampling ratios;
selecting an updated sampling ratio based on a summation of the sampling error value and updated coding error value;
downsampling the input video at the updated sampling ratio; and
encoding the downsampled video sequence.
16. The method of claim 1, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio and the horizontal sampling ratio is different from the vertical sampling ratio.
17. The method of claim 1, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio and the horizontal sampling ratio is the same as the vertical sampling ratio.
18. The method of claim 1, wherein a first selection of the sampling ratio is performed at the beginning of the received video data and at least a second selection of the sampling ratio is performed during the duration of the received video data.
19. A video decoding method, comprising:
receiving compressed video data;
receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios;
decoding the compressed video data to form reconstructed video data;
upsampling the reconstructed video data at the selected sampling ratio to increase resolution of the reconstructed video data; and
outputting the filtered video data.
20. A video decoding system, comprising:
a video decoder, the video decoder configured to:
receive compressed video data;
receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios;
decode the compressed video data to form reconstructed video data;
upsample the reconstructed video data to increase a resolution of the reconstructed video; and
output the upsampled video data.
21. The video decoding system of claim 20, comprising:
a wireless receive/transmit unit in communication with a communication system, wherein the wireless receive/transmit unit is configured to receive the video data from the communication system.
AU2011319844A 2010-10-27 2011-10-27 Systems and methods for adaptive video coding Abandoned AU2011319844A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US40732910P 2010-10-27 2010-10-27
US61/407,329 2010-10-27
PCT/US2011/058027 WO2012058394A1 (en) 2010-10-27 2011-10-27 Systems and methods for adaptive video coding

Publications (1)

Publication Number Publication Date
AU2011319844A1 true AU2011319844A1 (en) 2013-06-13

Family

ID=44906484

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2011319844A Abandoned AU2011319844A1 (en) 2010-10-27 2011-10-27 Systems and methods for adaptive video coding

Country Status (5)

Country Link
EP (1) EP2633685A1 (en)
KR (1) KR20130105870A (en)
CN (1) CN103283227A (en)
AU (1) AU2011319844A1 (en)
WO (1) WO2012058394A1 (en)

Families Citing this family (18)

Publication number Priority date Publication date Assignee Title
WO2014143008A1 (en) 2013-03-15 2014-09-18 Icelero Inc Method and system for improved video codec rate-distortion performance by pre and post-processing
US11381816B2 (en) 2013-03-15 2022-07-05 Crunch Mediaworks, Llc Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing
CN103475880B (en) * 2013-09-11 2016-08-24 浙江大学 A kind of based on statistical analysis by H.264 to HEVC low complex degree video transcoding method
TWI652937B (en) 2013-10-07 2019-03-01 Vid衡器股份有限公司 Multi-layer video coding combination adjustable capability processing method
US9600494B2 (en) * 2014-01-24 2017-03-21 Cisco Technology, Inc. Line rate visual analytics on edge devices
CN103945222B (en) * 2014-04-21 2017-01-25 福州大学 Code rate control model updating method based on HEVC standards
CN105430395B (en) * 2015-12-03 2018-04-27 北京航空航天大学 A kind of HEVC CTU level bit-rate control methods based on optimum bit distribution
WO2018018445A1 (en) * 2016-07-27 2018-02-01 王晓光 Method and system for sending video advertisement on the basis of video capacity
KR102119300B1 (en) * 2017-09-15 2020-06-04 서울과학기술대학교 산학협력단 Apparatus and method for encording 360-degree video, recording medium for performing the method
AU2019286133B2 (en) * 2018-06-15 2023-02-16 Huawei Technologies Co., Ltd. Method and apparatus for intra prediction
CN110876060B (en) * 2018-08-31 2022-07-15 网宿科技股份有限公司 Code rate adjusting method and device in coding process
WO2020080765A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
WO2020080873A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
WO2020080665A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
KR102436512B1 (en) 2019-10-29 2022-08-25 삼성전자주식회사 Method and Apparatus for video encoding and Method and Apparatus for video decoding
KR20220003812A (en) 2020-07-02 2022-01-11 삼성전자주식회사 Electronic device for transmitting pre-processed content using a filter based on status of call channel and method for the same
US11184638B1 (en) * 2020-07-16 2021-11-23 Facebook, Inc. Systems and methods for selecting resolutions for content optimized encoding of video data
CN112367147B (en) * 2020-09-27 2022-09-09 苏州宣怀智能科技有限公司 Data display method and device, electronic equipment and computer readable medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP3466032B2 (en) * 1996-10-24 2003-11-10 富士通株式会社 Video encoding device and decoding device
KR100796177B1 (en) * 2000-04-18 2008-01-21 코닌클리케 필립스 일렉트로닉스 엔.브이. Bit rate allocation in joint bit rate transcoding
US7536469B2 (en) * 2004-12-10 2009-05-19 Microsoft Corporation System and process for controlling the coding bit rate of streaming media data employing a limited number of supported coding bit rates
CN101389021B (en) * 2007-09-14 2010-12-22 华为技术有限公司 Video encoding/decoding method and apparatus
KR101459395B1 (en) * 2007-11-02 2014-11-10 에꼴 드 테크놀로지 수페리에르 System and method for quality-aware selection of parameters in transcoding of digital images

Also Published As

Publication number Publication date
KR20130105870A (en) 2013-09-26
CN103283227A (en) 2013-09-04
WO2012058394A1 (en) 2012-05-03
EP2633685A1 (en) 2013-09-04

Similar Documents

Publication Publication Date Title
AU2011319844A1 (en) Systems and methods for adaptive video coding
US11405621B2 (en) Sampling grid information for spatial layers in multi-layer video coding
US10237555B2 (en) System and method of video coding quantization and dynamic range control
JP6592145B2 (en) Inter-layer reference image enhancement for multi-layer video coding
US10218971B2 (en) Adaptive upsampling for multi-layer video coding
US10321130B2 (en) Enhanced deblocking filters for video coding
JP2022023856A (en) Codec architecture for layer videos coding
US20190037225A1 (en) Parallel decoding method for layered video coding
US10044913B2 (en) Temporal filter for denoising a high dynamic range video
US20190014333A1 (en) Inter-layer prediction for scalable video coding
TW201537953A (en) Methods, apparatus and systems for scalable video coding with mixed interlace and progressive content
WO2017020021A1 (en) Scalable high efficiency video coding to high efficiency video coding transcoding
WO2012061258A2 (en) Parametric bit rate model for frame-level rate control in video coding

Legal Events

Date Code Title Description
MK5 Application lapsed section 142(2)(e) - patent request and compl. specification not accepted