WO2012058394A1 - Systems and methods for adaptive video coding
Systems and methods for adaptive video coding
- Publication number
- WO2012058394A1 (PCT/US2011/058027)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sampling
- video
- video data
- coding
- error value
Classifications
- All within H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
- H04N19/117—Adaptive coding: filters, e.g. for pre-processing or post-processing
- H04N19/132—Adaptive coding: sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/146—Adaptive coding: data rate or code amount at the encoder output
- H04N19/147—Adaptive coding: data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/172—Adaptive coding: the coding unit being an image region, e.g. an object, the region being a picture, frame or field
- H04N19/189—Adaptive coding: characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Adaptive coding: specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/587—Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
- H04N19/59—Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
- H04N19/60—Transform coding
- H04N19/85—Pre-processing or post-processing specially adapted for video compression
Description
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like.
- Many digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.
- a video encoding method comprises receiving video data, and determining a sampling error value at each of a plurality of downsampling ratios.
- the video encoding method may also comprise, for a bit rate, determining a coding error value at each of the plurality of downsampling ratios and summing the sampling error value and the coding error value at each of the plurality of downsampling ratios.
- the video encoding method may also comprise selecting one of the plurality of downsampling ratios based on the sum of the sampling error value and the coding error value at the selected downsampling ratio.
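Taken together, these steps describe a search over candidate downsampling ratios for the one with the smallest combined error. A minimal sketch of that selection loop, assuming hypothetical sampling_error() and coding_error() estimators (the patent does not name such functions):

```python
def select_downsampling_ratio(ratios, sampling_error, coding_error, bit_rate):
    """Return the candidate ratio minimizing sampling error + coding error."""
    best_ratio, best_total = None, float("inf")
    for m in ratios:
        total = sampling_error(m) + coding_error(m, bit_rate)  # sum per ratio
        if total < best_total:
            best_ratio, best_total = m, total
    return best_ratio
```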
- a video decoding method comprises receiving compressed video data and receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the video decoding method may also comprise decoding the compressed video data to form reconstructed video data, upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data, and outputting the filtered video data.
- a video decoding system comprises a video decoder.
- the video decoder may be configured to receive compressed video data, and receive an indication of a selected sampling ratio, where the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the video decoder may also be configured to decode the compressed video data to form reconstructed video data, upsample the reconstructed video data to increase the resolution of the reconstructed video data, and output the upsampled video data.
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the adaptive coding techniques described herein;
- FIG. 2 is a block diagram illustrating an example video encoder that may implement techniques for the adaptive encoding of a video signal;
- FIG. 3 is a block diagram illustrating an example video decoder that may implement techniques for the adaptive decoding of a video signal;
- FIG. 4 shows a coding scheme applying a codec directly on an input video;
- FIG. 5 shows an exemplary embodiment utilizing coding with down-sampling and up-sampling stages;
- FIGS. 6A and 6B show the processing illustrated in FIG. 5 decomposed into a sampling component and a coding component, respectively;
- FIG. 7 is a look-up table for α in accordance with one non-limiting embodiment;
- FIG. 8 is a look-up table for β in accordance with one non-limiting embodiment;
- FIGS. 9A, 9B and 9C illustrate searching strategies to find the sampling ratio M in accordance with various non-limiting embodiments;
- FIGS. 10A and 10B are process flows in accordance with one non-limiting embodiment
- FIG. 11 is a block diagram of a horizontal downsampling process having a downsampling ratio of A/B in accordance with one non-limiting embodiment;
- FIG. 12 illustrates an example downsampling process
- FIG. 13 illustrates an example upsampling process
- FIG. 14 illustrates an example Gaussian window function
- FIG. 15 illustrates pixels during an example upsampling process
- FIG. 16 illustrates an exemplary encoder architecture in accordance with one non-limiting embodiment
- FIG. 17 illustrates an exemplary decoder architecture in accordance with one non-limiting embodiment
- FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder
- FIG. 19A is a system diagram of an example communications system in which one or more disclosed embodiments may be implemented.
- FIG. 19B is a system diagram of an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 19A; and
- FIGS. 19C, 19D, and 19E are system diagrams of example wireless transmit/receive units (WTRUs) that may be used within the communications system illustrated in FIG. 19A.
- Wireless communications technology has dramatically increased the wireless bandwidth and improved the quality of service for mobile users.
- although wireless communications technology has greatly improved, the fast-growing demand for video content, such as high-definition (HD) video content for example, over the mobile Internet brings new challenges for mobile video content providers, distributors and carrier service providers.
- Video and multimedia content that is available on the wired web has driven users to desire equivalent on-demand access to that content from a mobile device.
- a much higher percentage of the world's mobile data traffic is becoming video content.
- Mobile video has the highest growth rate of any application category measured within the mobile data portion of the Cisco VNI Forecast at this time.
- the block size for processing video content under current compression standards, such as the H.264 (AVC) standard for example, is 16x16. Therefore, current compression standards may be good for small-resolution video content, but not for higher quality and/or higher resolution video content, such as HD video content for example.
- video coding standards may be created that may further reduce the data rate needed for high quality video coding, as compared to the current standards, such as AVC for example.
- systems and methods are needed to meet the growing demand of high quality and/or resolution video content delivery over mobile Internet.
- systems and methods may be provided for high quality and/or resolution video content compatibility with current standards, such as HD video content compatibility with the AVC video compression standard for example.
- FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the adaptive coding techniques described herein.
- system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16.
- Source device 12 and destination device 14 may comprise any of a wide range of devices.
- source device 12 and destination device 14 may comprise wireless receive/transmit units (WRTUs), such as wireless handsets or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless.
- the systems and methods described herein, however, are not necessarily limited to wireless applications or settings.
- communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.
- source device 12 includes a video source 18, video encoder 20, a modulator (generally referred to as a modem) 22 and a transmitter 24.
- Destination device 14 includes a receiver 26, a demodulator (generally referred to as a modem) 28, a video decoder 30, and a display device 32.
- video encoder 20 of source device 12 may be configured to apply the adaptive coding techniques described in more detail below.
- a source device and a destination device may include other components or arrangements.
- source device 12 may receive video data from an external video source 18, such as an external camera.
- destination device 14 may interface with an external display device, rather than including an integrated display device.
- the data stream generated by the video encoder may be conveyed to other devices without the need for modulating the data onto a carrier signal, such as by direct digital transfer, wherein the other devices may or may not modulate the data for transmission.
- the illustrated system 10 of FIG. 1 is merely one example.
- the techniques described herein may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "CODEC.” Moreover, the techniques of this disclosure may also be performed by a video preprocessor.
- Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 include video encoding and decoding components.
- system 10 may support one-way or two-way video transmission between devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.
- the source device may be a video streaming server for generating encoded video data for one or more destination devices, where the destination devices may be in communication with the source device over wired and/or wireless communication systems.
- Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics- based data as the source video, or a combination of live video, archived video, and computer- generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20.
- a video capture device such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider.
- video source 18 may generate computer graphics- based data as the source video, or a combination of live video, archived video, and computer- generated
- the encoded video information may then be modulated by modem 22 according to a communication standard, and transmitted to destination device 14 via transmitter 24.
- Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation.
- Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
- Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information.
- the information communicated over channel 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of macroblocks and other coded units, e.g., GOPs.
- Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
- communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media.
- Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet.
- Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media.
- Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
- Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC).
- the techniques of this disclosure are not limited to any particular coding standard.
- Other examples include MPEG-2 and ITU-T H.263.
- video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
- the ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT).
- the H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, and dated March, 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/ AVC standard or specification.
- the Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
- Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof.
- Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, media aware network element, or the like.
- a video sequence typically includes a series of video frames.
- a group of pictures generally comprises a series of one or more video frames.
- a GOP may include syntax data in a header of the GOP, a header of one or more frames of the GOP, or elsewhere, that describes a number of frames included in the GOP.
- Each frame may include frame syntax data that describes an encoding mode for the respective frame.
- Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data.
- a video block may correspond to a macroblock, a partition of a macroblock, or a collection of blocks or macroblocks.
- the video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
- Each video frame may include a plurality of slices.
- Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks.
- Many popular video coding standards, such as H.263, MPEG-2, MPEG-4, H.264/AVC (Advanced Video Coding), and HEVC (High Efficiency Video Coding), utilize motion-compensated prediction techniques.
- An image or a frame of a video may be partitioned into multiple macroblocks and each macroblock can be further partitioned.
- Macroblocks in an I-frame may be encoded by using the prediction from spatial neighbors (that is, other blocks of the I-frame).
- Macroblocks in a P- or B-frame may be encoded by using either the prediction from their spatial neighbors (spatial prediction or intra-mode encoding) or areas in other frames (temporal prediction or inter-mode encoding).
- Video coding standards define syntax elements to represent coding information. For example, for every macroblock, H.264 defines an mb_type value that represents the manner in which a macroblock is partitioned and the method of prediction (spatial or temporal).
- Video encoder 20 may provide individual motion vectors for each partition of a macroblock. For example, if video encoder 20 elects to use the full macroblock as a single partition, video encoder 20 may provide one motion vector for the macroblock. As another example, if video encoder 20 elects to partition a 16x16 pixel macroblock into four 8x8 partitions, video encoder 20 may provide four motion vectors, one for each partition. For each partition (or sub-macroblock unit), video encoder 20 may provide an mvd (motion vector difference) value and a ref_idx value to represent motion vector information. The mvd value may represent an encoded motion vector for the partition, relative to a motion predictor.
- the ref_idx (reference index) value may represent an index into a list of potential reference pictures, that is, reference frames. As an example, H.264 provides two lists of reference pictures: list 0 and list 1. The ref_idx value may identify a picture in one of the two lists. Video encoder 20 may also provide information indicative of the list to which the ref_idx value relates.
- the ITU-T H.264 standard supports intra prediction in various block partition sizes, such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8x8 for chroma components, as well as inter prediction in various block sizes, such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 for luma components and corresponding scaled sizes for chroma components.
- "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of the block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels.
- an NxN block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value.
- the pixels in a block may be arranged in rows and columns.
- blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction.
- blocks may comprise NxM pixels, where M is not necessarily equal to N.
- Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video block data representing pixel differences between coded video blocks and predictive video blocks.
- a video block may comprise blocks of quantized transform coefficients in the transform domain.
- a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks.
- Each slice may be an independently decodable unit of a video frame.
- frames themselves may be decodable units, or other portions of a frame may be defined as decodable units.
- coded unit or “coding unit” may refer to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, a group of pictures (GOP) also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques.
- the H.264 standard supports motion vectors having one-quarter-pixel precision. That is, encoders, decoders, and encoders/decoders (CODECs) that support H.264 may use motion vectors that point to either a full pixel position or one of fifteen fractional pixel positions. Values for fractional pixel positions may be determined using adaptive interpolation filters or fixed interpolation filters. In some examples, H.264-compliant devices may use filters to calculate values for the half-pixel positions, then use bilinear filters to determine values for the remaining one-quarter-pixel positions. Adaptive interpolation filters may be used during an encoding process to adaptively define interpolation filter coefficients, and thus the filter coefficients may change over time when adaptive interpolation is performed.
- Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients.
- the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
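As a rough illustration of the bit-depth reduction described above (a sketch only; real codecs quantize with quantization parameters and rounding rules rather than a plain shift):

```python
def reduce_bit_depth(value, n, m):
    # round an n-bit value down to an m-bit value (n > m)
    assert n > m >= 1
    return value >> (n - m)
```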
- entropy coding of the quantized data may be performed, e.g., according to content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology.
- a processing unit configured for entropy coding, or another processing unit may perform other processing functions, such as zero run length coding of quantized coefficients and/or generation of syntax information such as coded block pattern (CBP) values, macroblock type, coding mode, maximum macroblock size for a coded unit (such as a frame, slice, macroblock, or sequence), or the like.
- Video encoder 20 may further send syntax data, such as block-based syntax data, frame- based syntax data, slice-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header.
- the GOP syntax data may describe a number of frames in the respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame.
- Video decoder 30 may receive a bitstream including motion vectors encoded according to any of the techniques of this disclosure. Accordingly, video decoder 30 may be configured to interpret the encoded motion vector. For example, video decoder 30 may first analyze a sequence parameter set or slice parameter set to determine whether the encoded motion vector was encoded using a method that keeps all motion vectors in one motion resolution, or using a method where the motion predictor was quantized to the resolution of the motion vector. Video decoder 30 may then decode the motion vector relative to the motion predictor by determining the motion predictor and adding the value for the encoded motion vector to the motion predictor.
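The final reconstruction step can be sketched as follows; the helper is hypothetical, and in H.264 the motion predictor is actually derived from neighboring partitions:

```python
def reconstruct_motion_vector(mvd, predictor):
    # decoded motion vector = motion predictor + motion vector difference
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])
```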
- Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof.
- Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC).
- An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
- Video encoder 200 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks.
- Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame.
- Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence.
- Intra-mode (I-mode) may refer to any of several spatial-based compression modes, and inter-modes such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several temporal-based compression modes.
- Although components for inter-mode encoding are depicted in FIG. 2, it should be understood that video encoder 200 may further include components for intra-mode encoding. However, such components are not illustrated for the sake of brevity.
- the input video signal 202 is processed block by block.
- the video block unit may be 16 pixels by 16 pixels (i.e., a macroblock (MB)).
- the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG is developing the next generation video coding standard called High Efficiency Video Coding (HEVC).
- in HEVC, extended block sizes (called a "coding unit" or CU) are used.
- a CU can be up to 64x64 pixels and down to 4x4 pixels.
- a CU can be further partitioned into prediction units (PUs), for which separate prediction methods are applied.
- Each input video block (MB, CU, PU, etc.) may be processed by using spatial prediction unit 260 and/or temporal prediction unit 262.
- Spatial prediction (i.e., intra prediction) uses pixels from already coded neighboring blocks in the same video picture to predict the current video block, reducing the spatial redundancy inherent in the video signal.
- Temporal prediction (i.e., inter prediction or motion compensated prediction) uses pixels from the already coded video pictures to predict the current video block.
- Temporal prediction reduces temporal redundancy inherent in the video signal.
- Temporal prediction for a given video block is usually signaled by one or more motion vectors which indicate the amount and the direction of motion between the current block and one or more of its reference block(s).
- the mode decision and encoder controller 280 in the encoder chooses the prediction mode, for example based on a rate-distortion optimization method.
- the prediction block is then subtracted from the current video block at adder 216 and the prediction residual is transformed by transformation unit 204 and quantized by quantization unit 206.
- the quantized residual coefficients are inverse quantized at inverse quantization unit 210 and inverse transformed at inverse transformation unit 212 to form the reconstructed residual.
- the reconstructed block is then added back to the prediction block at adder 226 to form the reconstructed video block.
- Further in-loop filtering, such as a deblocking filter and adaptive loop filters 266, may be applied on the reconstructed video block before it is put in the reference picture store 264 and used to code future video blocks.
- the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are sent to the entropy coding unit 208 to be further compressed and packed to form the bitstream 220.
- the systems and methods described herein may be implemented, at least partially, within the spatial prediction unit 260.
- FIG. 3 is a block diagram of a block-based video decoder in accordance with one non-limiting embodiment.
- the video bitstream 302 is first unpacked and entropy decoded at entropy decoding unit 308.
- the coding mode and prediction information are sent to either the spatial prediction unit 360 (if intra coded) or the temporal prediction unit 362 (if inter coded) to form the prediction block.
- the residual transform coefficients are sent to inverse quantization unit 310 and inverse transform unit 312 to reconstruct the residual block.
- the prediction block and the residual block are then added together at 326.
- the reconstructed block may further go through in-loop filtering unit 366 before it is stored in reference picture store 364.
- the reconstructed video 320 may then be sent out to drive a display device, as well as used to predict future video blocks.
- a pre-processing and/or post-processing system architecture may compress raw video data and/or transcode already encoded video data, such as a bit stream for example, with further compression through jointly controlling the transform-domain quantization and spatial-domain down-sampling, without changing the standard format of the video stream.
- the pre-processing and/or post-processing system architecture may encode and/or decode video data in any format, such as H.263, MPEG-2, Flash, MPEG-4, H.264/AVC, HEVC or any similar multimedia format for example.
- These, and similar, formats may use such video compression methods as discrete cosine transform (DCT), fractal compression methods, matching pursuit, or discrete wavelet transform (DWT) for example, as described above.
- a macroblock (MB) size of 16x16 is a limitation of various existing compression standards, such as H.264/AVC.
- within each MB, pixels may be partitioned into several block sizes dependent on the prediction modes.
- the maximum size of any block may be 16x16 and any two MBs may be independently transformed and quantized.
- This technique may provide very high efficiency for CIF/QCIF and other similar-resolution content.
- however, it may not be efficient for video content of higher resolutions, such as 720p, 1080i/1080p, and/or similar or even higher resolutions, because such content exhibits much higher correlation among pixels in local areas.
- the specified 16x16 MB size may limit further compression of utilizing such correlation information across adjacent MBs.
- the codec elements may include four types of information: 1) motion information, such as motion vector and reference frame index for example; 2) residual data; 3) MB header information, such as MB type, coded block pattern, and/or quantization parameters (QP) for example; 4) sequence-, picture-, and/or slice-layer syntax elements.
- While the motion information and residual data may be highly content-dependent, the MB header information and/or syntax elements may be relatively constant; thus, the MB header information and/or syntax elements may represent the overhead in the bit stream. Given the content and/or encoding profile, a higher compression ratio of an encoder may be achieved by reducing the bit rate of residual data.
- overhead may consume a large part of the bit stream for transmission and storage. Having such a large part of the bit stream consumed by overhead may cause an encoder, such as an H.264 encoder for example, to have low efficiency.
- the pre-processing and/or post-processing in accordance with the systems and methods described herein may lead to less overhead, alignment of the motion compensation accuracy and reconstruction accuracy, enhancement of residual accuracy, and/or lower complexity and/or memory requirements. Less overhead may be produced due to the downsampling performed in the pre-processing, as the number of MBs may be reduced in proportion to the downsampling rate. Thus, the near-constant MB header and/or slice-layer syntax elements may be reduced.
- the motion compensation accuracy and reconstruction accuracy may also be aligned in the pre-processing and/or post-processing of video data.
- the number of motion vector differences (MVD) may be reduced.
- the reduction in MVD may save bits for encoding motion information.
- the saved bits may be used to encode the prediction error in low bit rate scenarios. Therefore, the reconstruction accuracy may be improved by aligning the accuracy of motion compensation and accuracy of quantized prediction error.
- the pre-processing and/or post-processing of video data may also enhance residual accuracy.
- after downsampling, the same transform block size may correspond to a larger transform block size in the original frames.
- for example, an 8x8 transform block size may correspond to a transform block size of 16x16 at a 1/4 downsampling rate.
- since the quantization steps may be the same for the transform coefficients in an encoder, such as an H.264 encoder for example, the encoder may lose information in both high frequency and low frequency components. Therefore, the pre-processing and/or post-processing of video data described herein may preserve the accuracy of low frequency components better than traditional encoders in the high resolution and low bit rate encoding cases, which may produce better subjective quality.
- the upsampling process in a decoder may be used to interpolate the pixels to recover the original frames.
- the pre-processing and/or post-processing of video data may also result in less complexity and/or memory requirements.
- the complexity and/or memory requirements of encoding (or transcoding) may be reduced to the level associated with the lower-resolution, downsampled video.
- the complexity and/or memory requirements of decoding may likewise be reduced to that level.
- These encoding and/or decoding processes may facilitate the application of lower resolution encoders and/or decoders, such as the encoding in mobile phones and other resource-limited devices for example.
- these encoding and/or decoding processes may facilitate the incorporation and/or application of the H.264 encoder and/or decoder in mobile phones.
- FIG. 4 shows a coding scheme applying a codec (i.e., an H.264/AVC codec) directly on an input video.
- FIG. 5 shows an exemplary embodiment utilizing coding with down- and up-sampling stages. Compared with the approach illustrated in FIG. 4, the approach illustrated in FIG. 5 may be able to allocate more bits to code the intra- and inter- prediction errors in the coding step; hence it may obtain a better reconstruction with higher visual quality.
- although down-sampling introduces information loss (specifically, of the high frequency components), when the operating bit rate is low due to network limitations, better reconstruction at the coding stage may outweigh the detail loss in the downsampling process; hence better overall visual quality is provided. Additionally, computation power can be saved by coding a smaller (i.e., downsampled) video. However, since downsampling causes information loss prior to the coding process, if the original video is downsampled too much, the information loss introduced upfront may outweigh the benefit of higher fidelity in the coding stage. Thus, the systems and methods described herein generally seek to balance the information loss introduced during downsampling and the information loss introduced during coding.
- the processes described herein may derive a plurality of downsampling ratios, and select a downsampling ratio that reduces a total amount of distortion introduced during the down- sampling and coding stages.
- the selected down-sampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters.
- the selected down-sampling ratio may be the down-sampling ratio that optimally reduces the overall distortion.
- the flexibility provided by the filters described herein may be more useful than other filters, such as anti-aliasing filters that may provide only 2x2 down- sampling and up-sampling, for example.
- the downsampling ratio 2x2 is so high that the high frequency components are significantly lost and cannot be compensated even with lossless coding. Therefore, at high bit-rates, the sampling ratio may be adjusted to provide a tradeoff between resolution reduction and detail preservation.
- the downsampling ratio, denoted M, is a variable which may be determined as a function of various parameters, such as the available data transmission capacity, the Quality of Service Class Identifier (QCI) of the bearer associated with the video, and characteristics of the input video signal. For example, if the data transmission capacity is relatively plentiful for the input video signal, then an H.264/AVC encoder will have enough bits to code the prediction errors; in this case, the value of M may be set approaching 1.0.
- conversely, when transmission capacity is scarce, a larger value of M may be selected (resulting in more downsampling), as the information loss due to the downsampling process will be well compensated by the smaller coding error in the coding stage.
- since the data transmission capacity is usually represented by a bit rate, which may have fine granularity, the value of M may be very flexible in various embodiments.
- systems and methods are provided to determine a selected sampling ratio M based, at least in part, on the available data transmission capacity and the input video signal. Given the selected sampling ratio M, a dedicated filter may be calculated to downsample the video for coding and upsample the decoded video for display.
- Various techniques for designing anti-aliasing filters for arbitrary rational-valued sampling ratios are also described in more detail below with regard to FIGS. 11-15.
- the video input is denoted f, the output of the conventional codec (FIG. 4) is denoted f₁, and the output of an example codec in accordance with the systems and methods described herein (FIG. 5) is denoted f₂.
- the reconstruction error of the codec in FIG. 4 may be defined as equation (1): ε₁² = E[(f − f₁)²] (1)
- the reconstruction error of the codec in FIG. 5 may be defined as equation (2): ε₂² = E[(f − f₂)²] (2)
- the codec in FIG. 5 performs better than the codec in FIG. 4 if ε₂² is smaller than ε₁².
- the gap between ε₁² and ε₂² may be increased (and in some cases maximized) by finding the M shown in equation (3): M* = argmax_M (ε₁² − ε₂²) (3)
- because ε₁² does not depend on M, equation (3) is simplified and is stated as equation (4): M* = argmin_M ε₂² (4)
- the sampling ratio M may be identified, such that the reconstruction error (ε₂²) of the codec shown in FIG. 5 is reduced.
- the sampling ratio M may be determined which will result in reconstruction error reaching the minimum (or at least substantially near the minimum).
- the sampling ratio M is selected from among a set of predetermined sampling ratios, where the selected ratio M provides the smallest reconstruction error from among the set of predetermined sampling ratios.
- M is a scalar, such that the horizontal and vertical directions have the same ratio. Given the resolution of the video W x H, the resolution of the downsampled video is (W/M) x (H/M).
- in this way, the sample aspect ratio (SAR) and the picture aspect ratio (PAR) may remain unchanged.
- this disclosure is not so limited, however; some embodiments may utilize a coding process with uneven ratios applied in each direction.
- the processing illustrated in FIG. 5 may be decomposed into the sampling component (FIG. 6A) and the coding component (FIG. 6B).
- in the sampling component shown in FIG. 6A, for the input original video sequence f, upsampling with a factor M 608 is applied right after downsampling with a factor M 602 to generate f₃; that is, the error between f and f₃ is caused only by sampling. It may be referred to as the "downsampling error," denoted σ_d², and defined by equation (5): σ_d² = E[(f − f₃)²] (5)
- likewise, the error introduced by the coding component (FIG. 6B) may be referred to as the "coding error," denoted σ_c², and defined by equation (6): σ_c² = E[(d₁ − d₂)²] (6)
- The relationship among ε₂² (equation (2)), σ_d, and σ_c may thus be defined by equation (7): ε₂² = σ_d² + σ_c² + 2λ·σ_d·σ_c (7)
- λ is a weighting factor in the range of [0, 1].
- the weighting factor λ is set to 1 for the exemplary embodiments described herein.
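Treating the reconstruction of equation (7) above as an assumption (the original formula is garbled in this extract), the combined error can be computed as:

```python
import math

def total_reconstruction_error(sigma_d_sq, sigma_c_sq, lam=1.0):
    # equation (7): eps_2^2 = sigma_d^2 + sigma_c^2 + 2*lam*sigma_d*sigma_c,
    # with weighting factor lam in [0, 1] (set to 1 in the text)
    return sigma_d_sq + sigma_c_sq + 2.0 * lam * math.sqrt(sigma_d_sq * sigma_c_sq)
```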
- f may be filtered by an anti-aliasing filter, which may be a type of low-pass filter, before f is downsampled. Additional details regarding example filters are described below with regard to FIGS. 11-15.
- the output of the sampling stage, denoted f₃ (FIG. 6A), is a blurred version of f, because f₃ no longer possesses the energy in frequency components higher than the cut-off frequency of the anti-aliasing filter applied to f. Therefore, in some embodiments, the sampling error can be measured in the frequency domain by measuring the energy of the high frequency components that exist in f but are lost in f₃.
- the energy distribution of f can be modeled based on the real Power Spectral Density (PSD) or the estimated PSD, as described in more detail below.
- other techniques may be used to assess the sampling ratio's effect on the video signal's frequency content.
- the PSD S_xx(ω₁, ω₂) may be calculated by applying the 2-D discrete-time Fourier transform (DTFT) to the autocorrelation function R(x_h, x_v), as in equation (9): S_xx(ω₁, ω₂) = Σ_{x_h} Σ_{x_v} R(x_h, x_v)·e^{−j(ω₁·x_h + ω₂·x_v)} (9)
- in practice, R(x_h, x_v) may be an estimate based on a set of video signals. Applying the 2-D DTFT to the estimated R(x_h, x_v) produces an estimated PSD, which may no longer be consistent.
- in one embodiment, the PSD is estimated by the periodogram of the random field, as given in equation (10): S_xx(ω₁, ω₂) = |X_x(ω₁, ω₂)|² / (W·H) (10)
- x[w,h] is one frame in the video sequence f; X_x(ω₁, ω₂) is x[w,h]'s representation in the frequency domain.
- the video sequence f may consist of consistent content, such as a single shot.
- S_xx(ω₁, ω₂) calculated based on one typical frame x[w,h], e.g., the first frame, in f may represent the energy distribution of the whole sequence f.
- alternatively, S_xx(ω₁, ω₂) can be the average of a plurality of PSDs.
- the techniques for estimating the PSD of the whole sequence may vary. For example, in one embodiment a plurality of frames x₁[w,h], x₂[w,h], etc. may be picked out from f at a regular interval, e.g., one second, and a plurality of corresponding PSDs S_xx1(ω₁, ω₂), S_xx2(ω₁, ω₂), etc. may be calculated and averaged to generate S_xx(ω₁, ω₂).
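A compact sketch of this periodogram-averaging estimate (equation (10)) using NumPy; selecting frames at a regular interval is left to the caller:

```python
import numpy as np

def estimate_psd(frames):
    """Average the 2-D periodograms of the given frames (equation (10))."""
    psds = []
    for frame in frames:
        x = np.asarray(frame, dtype=np.float64)
        X = np.fft.fft2(x)                      # frequency-domain representation of x[w,h]
        psds.append((np.abs(X) ** 2) / x.size)  # periodogram, normalized by W*H
    return np.mean(psds, axis=0)
```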
- in another embodiment, the video sequence f is divided into I segments, where each segment consists of a group of successive frames (for example, such segmentation may be based on content, motion, texture, the structure of edges, etc.), and has an assigned weight w_i; the whole-sequence PSD may then be formed as the weighted average of the per-segment PSDs.
- the PSD S xx may be modeled using formulas, as shown in equations (13), (14) and (15):
- S_xx(ω₁, ω₂) ≈ F(ω₁, ω₂, b) (13)
- b = [b₀, b₁, ..., b_{n−1}] is a vector containing the arguments of the function F(·).
- in one embodiment, the function F(·) used to model S_xx has one parameter, as shown in equation (14), where K is a factor to ensure energy conservation. Since the exact total energy in the spatial domain is unknown (since x[w,h] is unavailable), in some embodiments it may be estimated as shown in equation (15).
- b₀ is an argument which may be determined by the resolution and content of the video sequence.
- for selecting b₀, the video content is classified into three categories: simple, medium, and tough.
- Empirical values of b 0 for different resolutions and context in accordance with one non-limiting embodiment are shown in Table 1.
- since the ratio M is a rational number, it can be represented as A/B, with A ≥ B.
- a downsampled video has the resolution (W·B/A) × (H·B/A).
- the proportion of the reduced resolution is equal to (1 − B/A).
- the proportion of the lost frequency components is also equal to (1 − B/A), and all these lost components are located in the high frequency band, if the anti-aliasing filter applied to f has a sharp cut-off frequency at (B/A)·π.
- the PSD of the filtered signal may be estimated from S_xx(ω₁, ω₂) by setting the values of S_xx(ω₁, ω₂) for ω₁, ω₂ ∈ [−π, −(B/A)·π] ∪ [(B/A)·π, π] equal to zero, as shown in equation (16).
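For concreteness, a small sketch of these geometry computations for a rational ratio M = A/B (function and variable names are illustrative, not from the patent):

```python
from fractions import Fraction

def downsampling_geometry(W, H, a, b):
    """Downsampled resolution and lost-frequency proportion for M = A/B, A >= B."""
    assert a >= b > 0
    inv_ratio = Fraction(b, a)                       # 1/M = B/A
    new_w, new_h = int(W * inv_ratio), int(H * inv_ratio)  # (W*B/A) x (H*B/A)
    lost_proportion = 1.0 - float(inv_ratio)         # equals 1 - B/A
    return new_w, new_h, lost_proportion
```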
- the downsampling error σ_d² may be calculated by equation (18): σ_d² = (1/(4π²)) ∬ [S_xx(ω₁, ω₂) − S̃_xx(ω₁, ω₂)] dω₁ dω₂ (18), where S̃_xx(ω₁, ω₂) is the truncated PSD of equation (16).
- the downsampling error σ_d² provided by equation (18) provides an indication of the difference in high frequency energy content between the input video signal and the video signal sampled at a given downsampling rate.
- Other techniques may be used to generate the downsampling error σ_d².
- the downsampling error σ_d² may be obtained by determining the mean squared error (MSE) between the downsampled and upsampled video signal f₃ and the input video signal f.
- the downsampling error σ_d² may be obtained by applying the anti-aliasing filter to the input video signal f and determining the MSE between the filtered f and the original input video f.
- the downsampling error σ_d² may be obtained by applying a high-pass filter that has the same cut-off frequency as the aforementioned anti-aliasing filter to the input video signal f and determining the average energy per pixel of the high-pass filtered f.
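A sketch of this last variant, using an ideal sharp cut-off at (B/A)·π in the DFT domain as a stand-in for the anti-aliasing filters described later:

```python
import numpy as np

def downsampling_error_highpass(frame, a, b):
    """Average energy per pixel above the cut-off (B/A)*pi (ideal high-pass)."""
    x = np.asarray(frame, dtype=np.float64)
    Wd, Hd = x.shape
    X = np.fft.fft2(x)
    w1 = np.abs(np.fft.fftfreq(Wd) * 2.0 * np.pi)   # radian frequencies, axis 0
    w2 = np.abs(np.fft.fftfreq(Hd) * 2.0 * np.pi)   # radian frequencies, axis 1
    cutoff = (b / a) * np.pi
    passband = (w1[:, None] < cutoff) & (w2[None, :] < cutoff)
    X_high = np.where(passband, 0.0, X)             # keep only high frequencies
    # Parseval for the 2-D DFT: sum|x|^2 = sum|X|^2 / (W*H)
    return np.sum(np.abs(X_high) ** 2) / (Wd * Hd) ** 2
```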
- the coding error σ_c² may be estimated by a rate-distortion (R-D) model, as given in equation (19), which expresses the coding distortion as a function of the bits per pixel.
- r is the average number of bits allocated to each pixel, i.e., bits per pixel (bpp).
- r may be calculated by equation (20): r = (R × M_h × M_v) / (fps × W × H) (20); a computational sketch follows the symbol definitions below.
- fps is the frame rate, i.e., the number of frames captured each second
- M_h and M_v are the sampling ratios in the horizontal and vertical directions, respectively
- W is the horizontal resolution
- H is the vertical resolution
- R is the bit rate.
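A direct transcription of equation (20) as reconstructed above (the formula body was dropped in this extract, so the reconstruction is an assumption):

```python
def bits_per_pixel(R, fps, W, H, m_h=1.0, m_v=1.0):
    # equation (20): average bpp of the downsampled video, whose
    # resolution is (W / m_h) x (H / m_v)
    return (R * m_h * m_v) / (fps * W * H)
```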
- the bit rate R may be acquired, or otherwise deduced, by a variety of techniques.
- the bit rate R may be provided by a user of the coding system.
- a network node associated with the coding system such as a video server or media-aware network element, may monitor the bit rates associated with various video streams.
- the video encoder may then query the network node to request a bit rate indication for a particular video stream.
- the bit rate may change over time, such as during handovers or IP Flow Mobility (IFOM) functionality associated with a user device receiving video.
- the encoder may receive messages containing updated target bit rates.
- the bit rate R may be deduced by the decoder from the Quality of Service Class Indicator (QCI) assigned to the video stream. For example, QCIs one through four currently offer guaranteed bit rates (GBR).
- the GBR may be utilized by the video encoder to determine coding error a c 2 .
- the bit rate R may be determined, or otherwise provided, by a user device associated with a decoder. For example, the user device may provide to the encoder through appropriate signaling an estimate of the total aggregate data transmission throughput.
- the bit rate R may be an indication of the throughput through two or more radio access technologies such as a cellular RAT and a non-cellular RAT, for example.
- the RTP/RTCP protocols may be used to ascertain bit rate information. For example, RTP/RTCP may be run in a WRTU and a base station in order to collect the application layer bit rate. This bit rate R may then be utilized in equation (20).
- the R-D model in equation (19) has two parameters, α and β, whose values vary according to factors including, but not limited to, the content of the sequence, the resolution of the sequence, the encoder implementation and configurations, and so forth.
- the coding error σ_c² for a particular sampling ratio may then be calculated.
- the average bits per pixel r may first be determined using equation (20).
- the determined average bits per pixel r may then be used to calculate the coding error σ_c², as described by equation (19).
- the coding error σ_c² may then be calculated for different sampling ratios.
- a new average bits per pixel r may be calculated using new sampling ratio values in equation (20). This new value of r may then be used to solve equation (19).
- off-line training may be utilized to find the values for α and β which most accurately predict, or model, the distortion from the coding process.
- a video may be preprocessed to determine a relationship between the bit-rate and the coding distortion. The determined relationship may then be utilized when determining a sampling ratio as the available bit rate, or target bit rate, changes over time during video transmission.
- the relationship may be influenced by factors including, but not limited to, the content of the video data, the resolution of the video data, the encoder implementation and configurations, and so forth.
- an encoder configured at known settings may encode a given sequence at the full resolution. This simulation may be performed at a range of bit-rates {R_0, R_1, ..., R_{N-1}}, producing a set of distortions {D_0, D_1, ..., D_{N-1}} corresponding to each bit-rate.
- the bit-rates may be normalized to bpp {r_0, r_1, ..., r_{N-1}} using equation (21): r_i = R_i / (fps × W × H)
- the corresponding distortions may be normalized accordingly to mean squared error (MSE), denoted as {d_0, d_1, ..., d_{N-1}}.
- the pairs of normalized bit-rate and distortion (r_i, d_i) (0 ≤ i < N) may be plotted as an R-D curve.
- a numerical optimization algorithm may be used to fit that R-D curve by solving the equation in (22) to find desired values of α_opt and β_opt.
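- a sketch of this fit using SciPy's least-squares curve fitting; the training pairs and the power-law model form are illustrative assumptions:

    import numpy as np
    from scipy.optimize import curve_fit

    # Illustrative normalized R-D pairs (r_i, d_i); real values come from
    # encoding the sequence at several bit rates as described above.
    r_train = np.array([0.05, 0.10, 0.20, 0.40])
    d_train = np.array([45.0, 22.0, 10.5, 5.0])

    def rd_model(r, alpha, beta):
        # Assumed power-law form of equation (19).
        return alpha * r ** (-beta)

    # Solve equation (22) numerically: least-squares fit of the R-D curve.
    (alpha_opt, beta_opt), _ = curve_fit(rd_model, r_train, d_train, p0=(1.0, 1.0))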
- the video sequence, or a segment of the sequence, may be accessible for pre-processing while off-line training is unaffordable for the application because of its high complexity, for example.
- a signal analysis may be performed based on the available part of the video sequence and useful features may be extracted that reflect the characteristics of the video sequence, such as motion, texture, edge, and so forth.
- the extracted features and the values of the parameters α and β have high correlations, and therefore the extracted features may be used to estimate the values of α and β, providing a reduction in coding-induced distortion.
- the video sequence may be analyzed based on the PSD, and two features may be extracted from S_xx.
- One feature that may be utilized is the percentage of energy of the DC component, F_DC.
- the cut-off frequency ω_c represents the speed of PSD decay toward the high-frequency band, with the absolute value of ω_c in the range [0, π].
- F_DC and ω_c may be calculated by equations (23) and (24), respectively:
- F_DC is truncated to the range [0.85, 0.99] and quantized by an H-step uniform quantizer.
- ω_c is truncated to the range [0, 0.9π] and quantized by an L-step uniform quantizer.
- in one embodiment, F_DC is quantized by a 15-step uniform quantizer with reconstruction points at {0.85, 0.86, ..., 0.98, 0.99} and ω_c is quantized by a 10-step uniform quantizer with reconstruction points at {0, 0.1π, ..., 0.8π, 0.9π}.
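- a sketch of this feature quantization; rounding to the nearest reconstruction point is an assumption about the quantizer design:

    import math

    def quantize_features(f_dc, omega_c):
        # Truncate F_DC to [0.85, 0.99] and omega_c to [0, 0.9*pi].
        f_dc = min(max(f_dc, 0.85), 0.99)
        omega_c = min(max(omega_c, 0.0), 0.9 * math.pi)
        # 15 reconstruction points 0.85, 0.86, ..., 0.99 (step 0.01).
        i = round((f_dc - 0.85) / 0.01)
        # 10 reconstruction points 0, 0.1*pi, ..., 0.9*pi (step 0.1*pi).
        j = round(omega_c / (0.1 * math.pi))
        return i, j

    # (i, j) index 2-D look-up tables such as FIG. 7 and FIG. 8 for alpha, beta.
    i, j = quantize_features(0.93, 0.45 * math.pi)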
- Look-up tables for α and β using F_DC and ω_c as indices in accordance with one embodiment are shown in FIG. 7 and FIG. 8, respectively. It is noted that -1.0 in some entries does not indicate a value of α or β; instead, the combinations of F_DC and ω_c that map to entries with value -1.0 do not occur in practice.
- in some cases, none of the frames that represent the typical content of a sequence (e.g., x[w,h] in equation (10)) is accessible for pre-processing to estimate the PSD or, consequently, to extract features from the PSD to analyze the video sequence.
- a mode (referred to herein as a "simplified mode") may be used to estimate α and β.
- the values of α and β may be determined by looking up 2-D tables.
- the pre-defined resolution formats may be commonly used ones, such as CIF, WVGA, VGA, 720p, 1080p, and so forth. In case the actual resolution of the input I is not one of the pre-defined formats, the most similar pre-defined resolution may be used for approximation.
- the content of a video sequence may include motion, texture, structure of edges, and so forth. Given the bit rate, video with simple content may be less degraded after coding than video with complex content. In some embodiments, the content of a video sequence can be classified into several categories from "simple" to "tough", depending on the level of granularity that the application requires.
- the type of content may, for example, be indicated by users based on their prior knowledge of the video; or, when prior knowledge does not exist, the content type may be automatically set to the default value.
- Table 2 may be used as the 2-D look-up tables for the values of α and β. Table 2 indicates values of α and β for different resolutions and content in accordance with various embodiments.
- while the pre-defined resolutions include CIF, WVGA, 720p, and 1080p, and three categories of content (simple, medium, tough) are used, this disclosure is not so limited.
- additional levels of granularity may be included in the table.
- the default content type may be set to "medium.”
- the complexity of the video may be ascertained through a variety of techniques. For example, in one embodiment user input is received which indicates a relative level of complexity. This user input may then be used to determine an appropriate a and ⁇ to be used in equation (19).
- video characteristic information (such as complexity) may be received from a network node that has access to the information. Based on this video information, suitable values of a and ⁇ may be determined (e.g., via a look up table) and subsequently used in equation (19).
- a complexity value for the video may be calculated or estimated from content statistics by prestoring some frames before downsampling the first frame. In this regard, a variety of techniques may be utilized, such as pixel value gradients, histograms, variances, and so forth.
- Identifying the minimum of the overall error is equivalent to finding the minimum of the summation of the sampling error σ_s² and the coding error σ_c².
- techniques for estimating σ_s² and σ_c² in accordance with various non-limiting embodiments are discussed above.
- Various algorithms that may be used to search for the M that reduces, and in some cases minimizes, the overall error are described in more detail below.
- the sampling ratio M for the horizontal and vertical directions must be the same.
- this requirement may serve as a first constraint.
- as a second constraint, for many applications it may be preferred that the downsampled resolution (W/M) × (H/M) be integers for a digital video format. In some applications, however, some cropping and/or padding may be used to obtain an integer number of pixels in either dimension. In any event, with these two constraints, the possible values of M are limited. Denoting the greatest common divisor (GCD) of W and H as G, possible ratios may be represented by equation (25).
- in some embodiments, the output resolution is not only required to be integers, but is also required to be a multiple of K. For example, K equal to 16 aligns the downsampled dimensions with whole macroblocks (MBs).
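- a sketch of this candidate enumeration, assuming equation (25) takes the form M = G/i for integer i (consistent with the GCD constraint above); the K parameter optionally enforces macroblock alignment:

    from math import gcd

    def candidate_ratios(W, H, K=1):
        # Assumed form of equation (25): with G = gcd(W, H), every ratio
        # M = G / i (i = 1..G) yields integer downsampled dimensions.
        G = gcd(W, H)
        ratios = []
        for i in range(1, G + 1):
            w, h = W * i // G, H * i // G   # downsampled width and height
            if w % K == 0 and h % K == 0:   # optional multiple-of-K constraint
                ratios.append(G / i)
        return ratios

    # For 720p with K = 16 (whole macroblocks): M in {5.0, 2.5, 5/3, 1.25, 1.0}.
    print(candidate_ratios(1280, 720, K=16))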
- a search method which finds an appropriate value of M without determining the overall error for all possible values of M is utilized.
- FIG. 9A, 9B, and 9C illustrate searching strategies to find the sampling ratio Mj in accordance with various non-limiting embodiments.
- FIG. 9A shows an exhaustive searching strategy
- FIG. 9B shows searching with large steps
- FIG. 9C shows fine searching.
- M_13 is selected as the sampling ratio in the illustrated embodiment.
- searching may be performed in large steps, as shown in FIG. 9B, in order to reach the range in which the desired M_j is located. Then, a further search with finer steps within that range is conducted, as shown in FIG. 9C.
- M has 24 possible values and the exhaustive search in FIG. 9A calculates the overall error 24 times to find the selected M_j; in comparison, the combination of coarse and fine search in FIG. 9B and FIG. 9C reduces the computations by half.
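- a sketch of the coarse-then-fine strategy of FIG. 9B and FIG. 9C; the candidate list and the overall-error callback are placeholders for the estimators described above:

    def coarse_fine_search(ratios, overall_error, step=4):
        # Coarse pass (FIG. 9B): evaluate every `step`-th candidate only.
        coarse = range(0, len(ratios), step)
        best = min(coarse, key=lambda k: overall_error(ratios[k]))
        # Fine pass (FIG. 9C): exhaustive search around the coarse winner.
        lo, hi = max(0, best - step + 1), min(len(ratios), best + step)
        k = min(range(lo, hi), key=lambda k: overall_error(ratios[k]))
        return ratios[k]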
- the selected sampling ratio may be any suitable ratio that produces an overall error beneath an overall error threshold.
- any one of the sampling ratios resulting in an overall error level beneath the threshold may be selected as a sampling ratio for coding.
- the encoding may proceed with that ratio as the selected sampling ratio.
- the joint event of (M_h, M_v) can have W × H possibilities.
- the exhaustive search that goes through all these possibilities, while possible, may be too time-consuming for most applications.
- the W × H possibilities may be processed using large steps, as shown in equation (29) and equation (30), where Δ_h and Δ_v are integer step sizes for the horizontal and vertical directions, respectively:
- the sampling ratio identified by this strategy may be one of the local minima instead of the global optimum.
- several ratios (M_h1, M_v1), (M_h2, M_v2), and so forth are identified which provide relatively small values of the error.
- a fine search is performed in the neighborhood of each candidate to find the respectively refined ratios (M̂_h1, M̂_v1), (M̂_h2, M̂_v2), and so forth that yield local minimum errors within the given neighborhood.
- the final ratio may then be selected among (M̂_h1, M̂_v1), (M̂_h2, M̂_v2), and so forth as the one yielding the lowest overall error.
- a search with large steps is performed first with the constraint of the same ratio in the two directions, similar to FIG. 9B.
- the ratio found from this first step may be identified as M_j.
- M_j is applied for both horizontal and vertical directions.
- a range [M_a, M_b] may be defined which encloses the desired ratio M_j, that is, M_a ≤ M_j ≤ M_b.
- the constraint of enforcing the same ratio for the horizontal and vertical directions is then relaxed and the following search may be performed to obtain selected sampling ratios for each of the two directions separately.
- the search ranges of the horizontal and vertical ratios, M_h and M_v, are shown in equation (31) and equation (32), respectively:
- the search range of (M_h, M_v) is thereby reduced from W × H to the much smaller set defined by equations (31) and (32). Then, the aforementioned combination of coarse search followed by fine search is applied within this search range to find the final selected subsampling ratios for the horizontal and the vertical directions.
- FIG. 10A illustrates a process flow 1000 for encoding video data in accordance with one non-limiting embodiment.
- video data to be encoded is received.
- a sampling error value is determined at each of a plurality of sampling ratios.
- the sampling error value is determined using a power spectral density (PSD) of the received video data and an estimation of the PSD of downsampled video data.
- a data-based technique may be used to estimate the PSD for the video data.
- a model-based technique may be used to estimate the PSD for the video data.
- a coding error value may be determined at each of a plurality of sampling ratios. The coding error may be based on a given bit rate.
- the bit rate may be received from a network node, such as a video server or an end-user device, for example.
- a coding error model may be developed to provide coding error values for each of the plurality of sampling ratios.
- the coding error model may comprise a first parameter and a second parameter that each independently varies based on characteristics of the received video data. Values for the first and second parameters may be determined using any suitable technique. For example, in one embodiment, the first and second parameters are identified through a curve-fitting process. In another embodiment, the first and second parameters may be identified through consultation of various look-up tables, as described in more detail above.
- the coding error values at 1006 may be determined before the sampling error values at 1004.
- a sampling ratio is selected.
- a plurality of sampling ratios may be selected throughout the duration of the video encoding process. For example, a first sampling ratio may be selected at the beginning of the received video data and one or more additional sampling ratios may subsequently be selected during the encoding event.
- an exhaustive search is performed to identify a selected sampling ratio.
- a non-exhaustive search is performed to identify a selected sampling ratio. For example, only errors associated with a subset of the plurality of sampling ratios may be summed.
- a sampling ratio may be selected. In some embodiments, additional searching may be utilized to further refine the search for the selected sampling ratio.
- the video data may be downsampled at the selected sampling ratio and, at 1016, the downsampled video data may be encoded.
- the encoding process may be re-evaluated to determine an updated sampling ratio.
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio. These horizontal and vertical sampling ratios may be the same or different.
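- process flow 1000 in miniature, as a hedged sketch (the two error estimators are placeholders for the techniques described above):

    def select_sampling_ratio(ratios, sampling_error, coding_error):
        # Steps 1004/1006: estimate both errors at each candidate ratio;
        # step 1008: select the ratio minimizing their sum (exhaustive variant).
        return min(ratios, key=lambda M: sampling_error(M) + coding_error(M))

    # Steps 1014/1016 then downsample the video at the returned ratio and encode.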
- FIG. 10B illustrates a process flow 1050 for decoding video data in accordance with one non-limiting embodiment.
- compressed video data are received.
- the video data may be received from any suitable provider, such as a live video stream or previously stored video.
- an indication of a selected sampling ratio is received.
- the sampling ratio may be based on, for example, a summation of a sampling error value and a coding error value across a plurality of sampling ratios.
- the block of coefficients is decoded to form reconstructed video data.
- the reconstructed video data is upsampled at the selected sampling ratio to increase the resolution of the reconstructed video data.
- the upsampled video data may be outputted.
- the downsampling process may downsample it by factors a and b for the horizontal and vertical directions, respectively, where a and b are positive rational numbers.
- the output video has the resolution (W/a) × (H/b)
- a and b can be any positive rational numbers, represented by N_h/M_h and N_v/M_v, respectively, where M_h, N_h, M_v, and N_v are all positive integers
- the output of a downsampling process is also a digital video, which has integer numbers of rows and columns of pixels.
- the output resolution is therefore (W × M_h/N_h) × (H × M_v/N_v), with N_h and N_v being factors of W and H, respectively, to satisfy output resolution requirements.
- the upsampling process (i.e., by upsampling unit 1712 in FIG. 17) may have an upsampling ratio equal to the downsampling ratio of the downsampling process which results in the processed video having the same resolution as the original input video.
- the upsampling ratio is decoupled from the downsampling ratio, which may allow for a more flexible upsampling ratio. For example, assuming the video to be upsampled has the resolution W_i × H_i, the upsampling ratios may be set to c and d for the horizontal and vertical directions, respectively, making the resolution of the output video equal to cW_i × dH_i, where c and d are positive rational numbers.
- c and d may be configured before upsampling based on various criteria. For example, in order to make the output video have a resolution greater than or equal to the input resolution, the factors c and d should be greater than or equal to 1.0. Moreover, while c and d can be any positive rational numbers, represented by K_h/L_h and K_v/L_v, respectively, where K_h, L_h, K_v, and L_v are all positive integers, in various embodiments L_h and L_v are factors of W_i and H_i, respectively. Additional criteria for choosing c and d may also be applied.
- FIG. 11 is a block diagram 1100 for a horizontal downsampling process having a downsampling ratio of M_h/N_h.
- the block diagram 1100 comprises upsampling M_h times at block 1102, applying the filter f_d,h at block 1104, and downsampling N_h times at block 1106.
- the width of the output video is W × M_h/N_h.
- the original row (FIG. 12(a)) with the spectrum F (FIG. 12(b)) is first upsampled M h times by inserting zero-valued samples.
- the resulting row is illustrated as X u in FIG. 12(c).
- the spectrum F is squeezed M_h times as shown in FIG. 12(d), denoted as F_u.
- in F_u, the spectral images centered at integer multiples of 2π/M_h are introduced by the zero-insertion and need to be removed by the filter f_d,h (as shown in block 1104 in FIG. 11).
- the cutoff frequency of f_d,h should therefore be π/max(M_h, N_h) (e.g., π/N_h when N_h > M_h), so that both the inserted spectral images and the frequencies that would alias after downsampling by N_h are removed.
- a two-step strategy may be used: applying the horizontal and vertical filters consecutively (in either order) to the original video.
- alternatively, a 2-D non-separable filter f_d,2D may be calculated as the 2-D convolution of f_d,h and f_d,v, and f_d,2D may be applied to the original video directly.
- Designing the upsampling filter may be similar to designing the downsampling filter.
- the horizontal direction may be focused on first, and then it may be extended to the vertical direction.
- a resolution of the input video having a width W_i will be changed to W_i × K_h/L_h after upsampling.
- a window function may be utilized to limit the size of the above- referenced filters.
- Suitable types of the window functions include, but are not limited to, Hanning, Hamming, triangular, Gaussian, and Blackman windows, for example.
- a Gaussian window function expressed in equation (38) is used, where N denotes the length of the filter and σ is the standard deviation of the Gaussian function.
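- a sketch of a Gaussian-windowed low-pass prototype for these filters; the exact parameterization of equation (38) is not reproduced here, so the window below takes its standard deviation in samples as an assumption:

    import numpy as np

    def gaussian_windowed_lowpass(num_taps, cutoff, sigma):
        # `cutoff` in radians, e.g., pi / max(M_h, N_h) for the FIG. 11 chain.
        n = np.arange(num_taps) - (num_taps - 1) / 2.0
        ideal = (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)  # ideal low-pass
        window = np.exp(-0.5 * (n / sigma) ** 2)                # Gaussian window
        taps = ideal * window
        return taps / taps.sum()                                # unity DC gain

    # Example: an 11-tap prototype for 2:1 downsampling (cutoff pi/2).
    f_d_h = gaussian_windowed_lowpass(11, np.pi / 2, sigma=2.0)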
- a two-step strategy may be used: applying the horizontal and vertical filters consecutively (in either order) to the original video.
- alternatively, a 2-D non-separable filter f_u,2D may be calculated as the 2-D convolution of f_u,h and f_u,v, and f_u,2D may be applied to the original video directly.
- although frames may be interpolated to (W × M_h) × (H × M_v) and (W_i × K_h) × (H_i × K_v) as intermediates for downsampling and upsampling, respectively, many of the interpolated pixels may not be used. For instance, in some embodiments, only the needed pixels are picked out to form the final output video with the resolution (W × M_h/N_h) × (H × M_v/N_v) for downsampling (or (W_i × K_h/L_h) × (H_i × K_v/L_v) for upsampling).
- the pixels 1504a, 1504b, 1504c, etc. represent the integer pixels and the white ones 1506 represent inserted zeros.
- the pixels forming the final downsampled row are first selected, as shown in row 1508 of FIG. 15. Then these selected positions may be classified into M_h categories, based on their phases. In one embodiment, the phase of a pixel is determined by its distances from the neighboring integer pixels.
- each of the down-sampling and up-sampling filters (i.e., f_d,h, f_d,v, f_u,h, and f_u,v) is decomposed into a set of phase filters, and each phase filter is used to interpolate the associated pixels.
- in Table 3, the lengths of f_d,h, f_d,v, f_u,h, and f_u,v are denoted as N_d,h, N_d,v, N_u,h, and N_u,v, respectively.
- the decomposition process is provided in Table 3, where i is a non-negative integer and k is the index of the filter.
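- a sketch of the decomposition; collecting every M-th tap into the phase-k filter is the standard polyphase construction and is presented as an assumption about Table 3's exact indexing:

    def polyphase_decompose(taps, M):
        # Phase k keeps taps k, k + M, k + 2M, ...; each phase filter
        # interpolates only the output pixels sharing that phase, so the
        # zeros inserted by M-fold upsampling are never multiplied.
        return [taps[k::M] for k in range(M)]

    # Example: split an assumed 12-tap filter into M_h = 3 phase filters.
    phases = polyphase_decompose(list(range(12)), 3)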
- FIG. 16 and FIG. 17 illustrate example embodiments of architectures including preprocessing and/or post-processing steps and that may be used before, after, and/or concurrently with encoding, decoding, and/or transcoding video data in accordance with the systems and methods described herein.
- the pre-processing and/or post-processing may be an adaptive process including quantization, down-sampling, upsampling, anti-aliasing, low-pass interpolation filtering, and/or anti-blur filtering of video data, for example.
- the pre-processing and/or post-processing of the video data may enable the use of standard encoders and/or decoders, such as H.264 encoders and/or decoders for example.
- FIG. 16 illustrates an exemplary encoder architecture 1600 which includes the processing and pre-processing that may be performed prior to or concurrently with encoding of video data in order to obtain the selected sampling ratio.
- the transform 1608, quantization 1610, entropy encoding 1612, inverse quantization 1614, inverse transform 1616, motion compensation 1620, memory 1618 and/or motion estimation 1624 described above with reference to FIG. 2 may be a part of the encoder processing for the video data.
- the anti-aliasing filter 1604, downsampling unit 1606, and encoder controller 1622 may be a part of the pre-processing steps for encoding the video data.
- These pre-processing elements may be incorporated into an encoder, work independently of the encoder, or be configured to sit on top of the encoder. In any event, after the video data from the input 1602 has been encoded, the encoded video data may be transmitted via a channel 1626 and/or to storage.
- an output buffer may be provided for storing the output encoded video data.
- the buffer fullness may be monitored, or the buffer input and output rates may be compared to determine its relative fullness level, and may indicate the relative fullness level to the controller.
- the output buffer may indicate the relative fullness level using, for example, a buffer fullness signal provided from the output buffer to the encoder controller 1622.
- the encoder controller 1622 may monitor various parameters and/or constraints associated with the channel 1626, computational capabilities of the video encoder system, demands by the users, etc., and may establish target parameters to provide an attendant quality of experience (QoE) suitable for the specified constraints and/or conditions of the channel 1626.
- the target bit rate may be adjusted from time to time depending upon the specified constraints and/or channel conditions. Typical target bit rates include, for example, 64 kbps, 128 kbps, 256 kbps, 384 kbps, 512 kbps, and so forth.
- video data is received from an input 1602, such as a video source.
- the video data being received may include an original or decoded video signal, video sequence, bit stream, or any other data that may represent an image or video content.
- the received video data may be pre-processed by the anti-aliasing filter 1604, downsampling unit 1606, and/or encoder controller 1622 in accordance with the systems and methods described herein.
- the anti-aliasing filter 1604, downsampling unit 1606, and/or encoder controller 1622 may be in communication with one another and/or with other elements of an encoder to encode the received video data for transmission.
- the anti-aliasing filter 1604 may be designed using the techniques described above with respect to FIGS. 11-15.
- the preprocessing of the received video data may be performed prior to or concurrently with the processing performed by the transform, quantization, entropy encoding, inverse quantization, inverse transform, motion compensation, motion estimation, and/or other elements of the encoder.
- the original and/or decoded video data may be transmitted to an anti-aliasing filter 1604 for pre-processing.
- the anti-aliasing filter may be used to restrict the frequency content of the video data to satisfy the conditions of the downsampling unit 1606.
- the anti-aliasing filter 1604 for 2:1 downsampling may be an 11-tap FIR, i.e., [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/64.
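- a sketch applying the quoted 11-tap filter to each row of a luma frame; the symmetric edge padding is an implementation assumption:

    import numpy as np

    AA_TAPS = np.array([1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]) / 64.0

    def antialias_rows(frame):
        # Filter each row of an (H, W) frame ahead of 2:1 horizontal
        # downsampling; edge padding keeps the output width equal to W.
        pad = len(AA_TAPS) // 2
        padded = np.pad(frame, ((0, 0), (pad, pad)), mode="edge")
        return np.stack([np.convolve(row, AA_TAPS, mode="valid") for row in padded])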
- the anti-aliasing filter may be adaptive to the content being received and/or jointly designed with quantization parameters (QP).
- the encoder controller 1622 may determine the selected sampling ratio and communicate with the downsampling unit 1606 during pre-processing of the video data to provide the downsampling unit 1606 with the selected sampling ratio. For example, the encoder controller 1622 may adaptively select the filter types (separable or non-separable), filter coefficients, and/or filter length in any dimension based on the statistics of video data and/or channel data transmission capacity.
- the pre-processing of the video data may include down- sampling the video data using down-sampling unit 1606.
- the down-sampling unit 1606 may downsample at the sampling ratio M, as described in detail above.
- the video data may be transmitted to the downsampling unit 1606 from the anti-aliasing filter 1604.
- the original and/or decoded video data may be transmitted to the downsampling unit 1606 directly.
- the downsampling unit 1606 may downsample the video data to reduce the sampling ratio of the video data. Down-sampling the video data may produce a lower resolution image and/or video than the original image and/or video represented by the video data.
- the sampling ratio M of the downsampling unit 1606 may be adaptive to the received content and/or jointly designed with QP.
- the encoder controller 1622 may adaptively select the downsampling ratio, such as 1/3 or a rational fraction for example, based on the instantaneous video content and/or channel data transmission capacity.
- the pre-processing performed by the anti-aliasing filter 1604 and/or downsampling unit 1606 may be controlled and/or aided by communication with the encoder controller 1622.
- the encoder controller 1622 may additionally, or alternatively, control the quantization performed in the processing of the video data.
- the encoder controller 1622 may be configured to choose the encoding parameters.
- the encoder controller may be content dependent and may utilize motion information, residual data, and other statistics from the video data to determine the encoding parameters and/or pre-processing parameters, such as the sampling ratio M for example.
- FIG. 17 illustrates an exemplary decoder architecture 1700 for the processing and postprocessing that may be performed to decode video data.
- the entropy decoding 1704, inverse quantization 1706, inverse transform 1708, and/or motion compensation 1720 may be a part of the decoder processing for the video data.
- the upsampling unit 1712, low-pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710 may be a part of the post-processing steps for decoding the video data.
- These post-processing elements may be incorporated into the decoder 1700, work independently of the decoder, or be configured to sit on top of the decoder.
- the decoded video data may be transmitted via output 1718, such as to a storage medium or an output device for example.
- video data is received via a channel 1702, such as from an encoder or storage medium for example.
- the video data being received may include an encoded video signal, video sequence, bit stream, or any other data that may represent an image or video content.
- the received video data may be processed using the entropy decoding, inverse quantization, inverse transform, and/or motion compensation, as illustrated in FIG. 3.
- the processing of the encoded video data may be performed prior to or concurrently with the postprocessing.
- the encoded video data may be post-processed by the upsampling unit 1712, low- pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710.
- the decoder controller 1710 may receive an indication of the selected sampling ratio and transmit the selected sampling ratio to the upsampling unit 1712.
- the upsampling unit 1712, low-pass filter 1714, anti-blur filter 1716, and/or decoder controller 1710 may be in communication with one another and/or with other elements of a decoder 1700 to decode the received video data for storage and/or output to a display.
- the low-pass filter 1714 may be designed using the techniques described above with respect to FIGS. 14-18.
- the post-processing of the video data may include upsampling the video data.
- the upsampling ratio may be the selected ratio M_j, as described above.
- the video data may be transmitted to the upsampling unit 1712 after being processed by the decoder 1700 (as illustrated).
- the upsampling unit 1712 may increase the resolution and/or quality of the reconstructed video.
- the upsampling of the video data may correspond to the down-sampling performed on the video data at the pre-processing of the encoder. Similar to the downsampling unit 1606 (FIG. 16), the upsampling unit 1712 may have a dynamic sampling ratio for upsampling the video data.
- the post-processing of the video data may include a low- pass interpolation filter 1714.
- the low-pass interpolation filter may implement anti-aliasing and improve the quality and definition of the video content represented by the video data.
- the low-pass interpolation filter for 1:2 upsampling may include a 4-tap FIR, i.e., [0.25, 0.75, 0.75, 0.25].
- the low-pass interpolation filter 1714 may be adaptive to the content and/or jointly designed with QP.
- the decoder controller may adaptively select the filter types, filter coefficients and/or filter length in any dimension. The selections made by the decoder controller may be based on the statistics and/or syntax in the encoded video data, such as statistics of previous frames and QP of current frame for example, as described in detail above.
- the post-processing of the video data may, in some embodiments, include an anti-blur (or sharpening) filter 1716.
- the anti-blur filter 1716 may be used to compensate the blurriness caused by the down-sampling and/or low-pass filtering.
- the anti-blur filter may include a 2-D Laplacian filter, i.e., [0, 0, 0; 0, 1, 0; 0, 0, 0] + [-1, -1, -1; -1, 8, -1; -1, -1, -1]/5.
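- a sketch constructing the quoted anti-blur kernel; applying it by 2-D convolution to the decoded frame is an implementation assumption:

    import numpy as np

    # Identity plus a scaled 8-neighbor Laplacian, exactly as quoted above.
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    laplacian = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]) / 5.0
    ANTI_BLUR = identity + laplacian

    # Convolving a decoded frame with ANTI_BLUR boosts high frequencies to
    # counter blur from down-sampling and low-pass interpolation.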
- the anti-blur filter may be adaptive to the content and/or jointly designed with QP.
- the decoder controller 1710 may adaptively select the filter types, filter coefficients, and/or filter length in any dimension. The selections may be based on the statistics and/or syntax in the encoded video bit stream, such as statistics of previous frames and QP of current frame for example, as described in more detail above.
- the encoder and decoder performing the pre-processing and post-processing, respectively, may be aware of one another.
- the encoder and decoder may have a communication link (such as communication channel 16 in FIG. 1) that enables transmission of information corresponding to the pre-processing of the video data to the decoder.
- the decoder may transmit information corresponding to the post-processing of the video data to the encoder via the communication link.
- Such a communication link may enable the decoder to adjust the post-processing based on the pre-processing that occurs at the encoder.
- the communication link may enable the encoder to adjust the pre-processing based on the post-processing that occurs at the decoder.
- a similar communication link may also be established with other entities performing the pre-processing and/or post-processing of the video data if the pre-processing and post-processing are not performed at the encoder and decoder, respectively.
- FIG. 18 illustrates an exemplary embodiment of the pre-processing of the video data with regard to a transcoder.
- video data 1804 may be received, such as a bit stream, a video signal, video sequence, or any other data that may represent an image or video content.
- the video data may be pre-processed by the anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802.
- the anti-aliasing filter 1808, downsampler 1810, and/or encoder controller 1802 may be in communication with one another and/or with other elements of an encoder and/or decoder.
- the pre-processing of the received video data may be performed prior to or concurrently with the processing performed by the encoder and/or decoder.
- the video data may be pre-processed as described above with regard to the discussion of the pre-processing of video data in FIG. 16.
- video coded in accordance with the systems and methods described herein may be sent via a communication channel 16, which may include wireline connections and/or wireless connections, through a communications network.
- the communications network may be any suitable type of communication system, as described in more detail below with respect to FIGS. 19A, 19B, 19C, and 19D.
- FIG. 19A is a diagram of an example communications system 1900 in which one or more disclosed embodiments may be implemented.
- the communications system 1900 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
- the communications system 1900 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
- the communications systems 1900 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single- carrier FDMA (SC-FDMA), and the like.
- the communications system 1900 may include wireless transmit/receive units (WTRUs) 1902a, 1902b, 1902c, 1902d, a radio access network (RAN) 1904, a core network 1906, a public switched telephone network (PSTN) 1908, the Internet 1910, and other networks 1912, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
- WTRUs 1902a, 1902b, 1902c, 1902d may be any type of device configured to operate and/or communicate in a wireless environment.
- the WTRUs 1902a, 1902b, 1902c, 1902d may be configured to transmit and/or receive wireless signals and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing compressed video communications.
- the communications systems 1900 may also include a base station 1914a and a base station 1914b.
- Each of the base stations 1914a, 1914b may be any type of device configured to wirelessly interface with at least one of the WTRUs 1902a, 1902b, 1902c, 1902d to facilitate access to one or more communication networks, such as the core network 1906, the Internet 1910, and/or the networks 1912.
- the base stations 1914a, 1914b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1914a, 1914b are each depicted as a single element, it will be appreciated that the base stations 1914a, 1914b may include any number of interconnected base stations and/or network elements.
- the base station 1914a may be part of the RAN 1904, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
- the base station 1914a and/or the base station 1914b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown).
- the cell may further be divided into cell sectors.
- the cell associated with the base station 1914a may be divided into three sectors.
- the base station 1914a may include three transceivers, i.e., one for each sector of the cell.
- the base station 1914a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
- the base stations 1914a, 1914b may communicate with one or more of the WTRUs 1902a, 1902b, 1902c, 1902d over an air interface 1916, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.).
- the air interface 1916 may be established using any suitable radio access technology (RAT).
- the communications system 1900 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
- the base station 1914a in the RAN 1904 and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1916 using wideband CDMA (WCDMA).
- WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
- HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High- Speed Uplink Packet Access (HSUPA).
- the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1916 using Long Term Evolution (LTE) and/or LTE- Advanced (LTE-A).
- the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
- the base station 1914b in FIG. 19A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like.
- the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
- the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
- the base station 1914b and the WTRUs 1902c, 1902d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, etc.) to establish a picocell or femtocell.
- the base station 1914b may have a direct connection to the Internet 1910.
- the base station 1914b may not be required to access the Internet 1910 via the core network 1906.
- the RAN 1904 may be in communication with the core network 1906, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 1902a, 1902b, 1902c, 1902d.
- the core network 1906 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high- level security functions, such as user authentication.
- the RAN 1904 and/or the core network 1906 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1904 or a different RAT.
- the core network 1906 may also be in communication with another RAN (not shown) employing a GSM radio technology.
- the core network 1906 may also serve as a gateway for the WTRUs 1902a, 1902b, 1902c, 1902d to access the PSTN 1908, the Internet 1910, and/or other networks 1912.
- the PSTN 1908 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
- the Internet 1910 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite.
- the networks 1912 may include wired or wireless communications networks owned and/or operated by other service providers.
- the networks 1912 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 1904 or a different RAT.
- the WTRUs 1902a, 1902b, 1902c, 1902d in the communications system 1900 may include multi-mode capabilities, i.e., the WTRUs 1902a, 1902b, 1902c, 1902d may include multiple transceivers for communicating with different wireless networks over different wireless links.
- the WTRU 1902c shown in FIG. 19A may be configured to communicate with the base station 1914a, which may employ a cellular-based radio technology, and with the base station 1914b, which may employ an IEEE 802 radio technology.
- FIG. 19B is a system diagram of an example WTRU 1902.
- the WTRU 1902 may include a processor 1918, a transceiver 1920, a transmit/receive element 1922, a speaker/microphone 1924, a keypad 1926, a display/touchpad 1928, non-removable memory 1930, removable memory 1932, a power source 1934, a global positioning system (GPS) chipset 1936, and other peripherals 1938.
- the processor 1918 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphics processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
- the processor 1918 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1902 to operate in a wireless environment.
- the processor 1918 may be coupled to the transceiver 1920, which may be coupled to the transmit/receive element 1922. While FIG. 19B depicts the processor 1918 and the transceiver 1920 as separate components, it will be appreciated that the processor 1918 and the transceiver 1920 may be integrated together in an electronic package or chip.
- the transmit/receive element 1922 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1914a) over the air interface 1916.
- the transmit/receive element 1922 may be an antenna configured to transmit and/or receive RF signals.
- the transmit/receive element 1922 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
- the transmit/receive element 1922 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1922 may be configured to transmit and/or receive any combination of wireless signals.
- the WTRU 1902 may include any number of transmit/receive elements 1922. More specifically, the WTRU 1902 may employ MIMO technology. Thus, in one embodiment, the WTRU 1902 may include two or more transmit/receive elements 1922 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1916.
- the transceiver 1920 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1922 and to demodulate the signals that are received by the transmit/receive element 1922.
- the WTRU 1902 may have multi-mode capabilities.
- the transceiver 1920 may include multiple transceivers for enabling the WTRU 1902 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
- the processor 1918 of the WTRU 1902 may be coupled to, and may receive user input data from, the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
- the processor 1918 may also output user data to the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928.
- the processor 1918 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1930 and/or the removable memory 1932.
- the non-removable memory 1930 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
- the removable memory 1932 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
- the processor 1918 may access information from, and store data in, memory that is not physically located on the WTRU 1902, such as on a server or a home computer (not shown).
- the processor 1918 may receive power from the power source 1934, and may be configured to distribute and/or control the power to the other components in the WTRU 1902.
- the power source 1934 may be any suitable device for powering the WTRU 1902.
- the power source 1934 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
- the processor 1918 may also be coupled to the GPS chipset 1936, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1902.
- the WTRU 1902 may receive location information over the air interface 1916 from a base station (e.g., base stations 1914a, 1914b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1902 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
- the processor 1918 may further be coupled to other peripherals 1938, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
- the peripherals 1938 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
- FIG. 19C is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment.
- the RAN 1904 may employ a UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the RAN 1904 may also be in communication with the core network 1906.
- the RAN 1904 may include Node-Bs 1940a, 1940b, 1940c, which may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the Node-Bs 1940a, 1940b, 1940c may each be associated with a particular cell (not shown) within the RAN 1904.
- the RAN 1904 may also include RNCs 1942a, 1942b. It will be appreciated that the RAN 1904 may include any number of Node-Bs and RNCs while remaining consistent with an embodiment.
- the Node-Bs 1940a, 1940b may be in communication with the RNC 1942a. Additionally, the Node-B 1940c may be in communication with the RNC 1942b. The Node-Bs 1940a, 1940b, 1940c may communicate with the respective RNCs 1942a, 1942b via an lub interface. The RNCs 1942a, 1942b may be in communication with one another via an Iur interface. Each of the RNCs 1942a, 1942b may be configured to control the respective Node- Bs 1940a, 1940b, 1940c to which it is connected. In addition, each of the RNCs 1942a, 1942b may be configured to carry out or support other functionality, such as outer loop power control, load control, admission control, packet scheduling, handover control, macrodiversity, security functions, data encryption, and the like.
- the core network 1906 shown in FIG. 19C may include a media gateway (MGW) 1944, a mobile switching center (MSC) 1946, a serving GPRS support node (SGSN) 1948, and/or a gateway GPRS support node (GGSN) 1950. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the RNC 1942a in the RAN 1904 may be connected to the MSC 1946 in the core network 1906 via an IuCS interface.
- the MSC 1946 may be connected to the MGW 1944.
- the MSC 1946 and the MGW 1944 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the RNC 1942a in the RAN 1904 may also be connected to the SGSN 1948 in the core network 1906 via an IuPS interface.
- the SGSN 1948 may be connected to the GGSN 1950.
- the SGSN 1948 and the GGSN 1950 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
- the core network 1906 may also be connected to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- FIG. 19D is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment.
- the RAN 1904 may employ an E-UTRA radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the RAN 1904 may also be in communication with the core network 1906.
- the RAN 1904 may include eNode-Bs 1960a, 1960b, 1960c, though it will be appreciated that the RAN 1904 may include any number of eNode-Bs while remaining consistent with an embodiment.
- the eNode-Bs 1960a, 1960b, 1960c may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the eNode-Bs 1960a, 1960b, 1960c may implement MIMO technology.
- the eNode-B 1960a for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
- Each of the eNode-Bs 1960a, 1960b, 1960c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in FIG. 19D, the eNode-Bs 1960a, 1960b, 1960c may communicate with one another over an X2 interface.
- the core network 1906 shown in FIG. 19D may include a mobility management gateway (MME) 1962, a serving gateway 1964, and a packet data network (PDN) gateway 1966. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the MME 1962 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via an S1 interface and may serve as a control node.
- the MME 1962 may be responsible for authenticating users of the WTRUs 1902a, 1902b, 1902c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 1902a, 1902b, 1902c, and the like.
- the MME 1962 may also provide a control plane function for switching between the RAN 1904 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
- the serving gateway 1964 may be connected to each of the eNode-Bs 1960a, 1960b, 1960c in the RAN 1904 via the S1 interface.
- the serving gateway 1964 may generally route and forward user data packets to/from the WTRUs 1902a, 1902b, 1902c.
- the serving gateway 1964 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 1902a, 1902b, 1902c, managing and storing contexts of the WTRUs 1902a, 1902b, 1902c, and the like.
- the serving gateway 1964 may also be connected to the PDN gateway 1966, which may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP- enabled devices.
- the core network 1906 may facilitate communications with other networks.
- the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the core network 1906 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 1906 and the PSTN 1908.
- the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- FIG. 19E is a system diagram of the RAN 1904 and the core network 1906 according to another embodiment.
- the RAN 1904 may be an access service network (ASN) that employs IEEE 802.16 radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the communication links between the different functional entities of the WTRUs 1902a, 1902b, 1902c, the RAN 1904, and the core network 1906 may be defined as reference points.
- the RAN 1904 may include base stations 1970a, 1970b, 1970c, and an ASN gateway 1972, though it will be appreciated that the RAN 1904 may include any number of base stations and ASN gateways while remaining consistent with an embodiment.
- the base stations 1970a, 1970b, 1970c may each be associated with a particular cell (not shown) in the RAN 1904 and may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916.
- the base stations 1970a, 1970b, 1970c may implement MIMO technology.
- the base station 1970a for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
- the base stations 1970a, 1970b, 1970c may also provide mobility management functions, such as handoff triggering, tunnel establishment, radio resource management, traffic classification, quality of service (QoS) policy enforcement, and the like.
- the ASN gateway 1972 may serve as a traffic aggregation point and may be responsible for paging, caching of subscriber profiles, routing to the core network 1906, and the like.
- the air interface 1916 between the WTRUs 1902a, 1902b, 1902c and the RAN 1904 may be defined as an R1 reference point that implements the IEEE 802.16 specification.
- each of the WTRUs 1902a, 1902b, 1902c may establish a logical interface (not shown) with the core network 1906.
- the logical interface between the WTRUs 1902a, 1902b, 1902c and the core network 1906 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
- the communication link between each of the base stations 1970a, 1970b, 1970c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations.
- the communication link between the base stations 1970a, 1970b, 1970c and the ASN gateway 1972 may be defined as an R6 reference point.
- the R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 1902a, 1902b, 1902c.
- the RAN 1904 may be connected to the core network 1906.
- the communication link between the RAN 1904 and the core network 1906 may be defined as an R3 reference point that includes protocols for facilitating data transfer and mobility management capabilities, for example.
- the core network 1906 may include a mobile IP home agent (MIP-HA) 1974, an authentication, authorization, accounting (AAA) server 1976, and a gateway 1978. While each of the foregoing elements are depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
- the MIP-HA 1974 may be responsible for IP address management, and may enable the WTRUs 1902a, 1902b, 1902c to roam between different ASNs and/or different core networks.
- the MIP-HA 1974 may provide the WTRUs 1902a, 1902b, 1902c with access to packet- switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
- the AAA server 1976 may be responsible for user authentication and for supporting user services.
- the gateway 1978 may facilitate interworking with other networks.
- the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
- the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
- the RAN 1904 may be connected to other ASNs and the core network 1906 may be connected to other core networks.
- the communication link between the RAN 1904 and the other ASNs may be defined as an R4 reference point, which may include protocols for coordinating the mobility of the WTRUs 1902a, 1902b, 1902c between the RAN 1904 and the other ASNs.
- the communication link between the core network 1906 and the other core networks may be defined as an R5 reference point, which may include protocols for facilitating interworking between home core networks and visited core networks.
- a video encoding method comprising receiving video data; at each of a plurality of sampling ratios, determining a sampling error value; for a bit rate, at each of the plurality of sampling ratios, determining a coding error value; summing the sampling error value and the coding error value at each of the plurality of sampling ratios; selecting one of the plurality of sampling ratios based on the sum of the sampling error value and the coding error value at the selected sampling ratio; downsampling the video data at the selected sampling ratio; and encoding the downsampled video data.
- selecting one of the plurality of sampling ratios comprises selecting the one of the plurality of sampling ratios resulting in the lowest summation of the sampling error value and the coding error value.
- selecting one of the plurality of sampling ratios comprises selecting one of the plurality of sampling ratios resulting in a summation of the sampling error value and the coding error value having an overall error value beneath an overall error threshold.
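- As an illustration of the selection step above, the following Python sketch performs the minimum-total-error selection; `sampling_error` and `coding_error` are hypothetical callables standing in for the error models described in the embodiments below, not functions defined by this disclosure.

```python
def select_sampling_ratio(ratios, bit_rate, sampling_error, coding_error):
    """Return the sampling ratio with the lowest total error
    (sampling error + coding error) at the given bit rate.

    A threshold-based variant would instead accept any ratio whose
    total error falls beneath an overall error threshold.
    """
    best_ratio, best_total = None, float("inf")
    for r in ratios:
        total = sampling_error(r) + coding_error(bit_rate, r)
        if total < best_total:
            best_ratio, best_total = r, total
    return best_ratio
```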
- the sampling error value is based on a power spectral density (PSD) of the video data and an estimation of the PSD of downsampled video data.
- PSD power spectral density
- a method of any of the preceding embodiments wherein the estimation of the PSD of downsampled video data is a function, wherein at least one parameter of the function is determined by at least one characteristic of the video data.
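- As one hedged illustration of a PSD-based sampling error, the spectral energy discarded by downsampling can be integrated numerically; the discretization, the trapezoidal rule, and the function name are assumptions rather than the exact formulation of this disclosure.

```python
import numpy as np

def sampling_error_from_psd(psd, freqs, ratio):
    """Estimate sampling error as the spectral energy removed when
    downsampling at `ratio` (0 < ratio <= 1).

    `psd` and `freqs` are 1-D arrays holding the video's power
    spectral density and the corresponding normalized frequencies.
    """
    cutoff = ratio * freqs.max()   # Nyquist limit after downsampling
    mask = freqs > cutoff          # components lost to downsampling
    if not mask.any():
        return 0.0                 # a ratio of 1 discards nothing
    return float(np.trapz(psd[mask], freqs[mask]))
```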
- the sampling error value is based on a difference between the received video data and anti-aliasing filtered video data.
- the coding error value is based on a coding error model, wherein the coding error model is a function of the bit rate and a sampling ratio.
- the coding error model comprises a first parameter and a second parameter, and wherein the first parameter and the second parameter are each determined by at least one characteristic of the video data.
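- For concreteness, a two-parameter exponential decay in bits per pixel is a common model shape in the rate-distortion literature and is sketched below; the exact functional form and the parameter names `a` and `b` are assumptions for illustration, not the model specified by this disclosure.

```python
import math

def coding_error_model(bit_rate, ratio, a, b, width, height, fps):
    """Illustrative coding error model D_c = a * exp(-b * bpp), where
    bpp is the bits available per pixel after downsampling at `ratio`
    and `a`, `b` are the content-dependent parameters noted above."""
    pixels_per_second = (width * ratio) * (height * ratio) * fps
    bpp = bit_rate / pixels_per_second
    return a * math.exp(-b * bpp)
```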
- a method of any of the preceding embodiments further comprising: for each of a plurality of bit rates, determining a bit per pixel value; for each of the plurality of bit rates, determining a distortion value; for each of the plurality of bit rates, determining a plurality of estimated distortion values based on a plurality of values for the first parameter and a plurality of values for the second parameter of the coding error model; and selecting a value for the first parameter and a value for the second parameter of the coding error model such that the plurality of distortion values has the minimum difference from the plurality of estimated distortion values.
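- The fitting step just described can be realized as a brute-force grid search over candidate parameter values, minimizing the squared difference between measured and estimated distortions; the grids and the reuse of the illustrative exponential model are assumptions.

```python
import math

def fit_parameters(bpp_values, distortions, a_grid, b_grid):
    """Grid-search (a, b) minimizing the sum of squared differences
    between measured distortions and the model's estimates."""
    best_a, best_b, best_sse = None, None, float("inf")
    for a in a_grid:
        for b in b_grid:
            sse = sum((d - a * math.exp(-b * p)) ** 2
                      for p, d in zip(bpp_values, distortions))
            if sse < best_sse:
                best_a, best_b, best_sse = a, b, sse
    return best_a, best_b
```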
- a method of any of the preceding embodiments further comprising selecting a value for the first parameter from a first look-up table; and selecting a value for the second parameter from a second look-up table.
- a method of any of the preceding embodiments further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on a DC component of the power spectral density.
- a method of any of the preceding embodiments further comprising determining a power spectral density of the video data, wherein the values for the first and second parameters are based on the decay speed of the power spectral density toward the high frequency band.
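- As a sketch of how the two look-up keys above might be computed, the DC component can be read directly from the PSD and the decay speed approximated by a low-band to high-band energy ratio; both feature definitions are hypothetical stand-ins for whatever statistics an implementation actually uses.

```python
import numpy as np

def psd_features(psd):
    """Return (dc, decay) from a 1-D PSD array: the DC component and
    a crude decay-speed proxy (low-band over high-band energy)."""
    dc = float(psd[0])
    half = len(psd) // 2
    low, high = psd[:half].sum(), psd[half:].sum()
    decay = float(low / max(high, 1e-12))  # larger => faster decay
    return dc, decay
```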
- the at least one characteristic is a complexity value of the received video data, wherein the complexity value is received from one of a user input and a network node.
- a method of any of the preceding embodiments further comprising receiving an indication of the bit rate from a network node.
- a method of any of the preceding embodiments further comprising, subsequent to selecting the one of the plurality of sampling ratios: receiving an indication of a second bit rate; for the second bit rate, determining an updated coding error value at each of the plurality of sampling ratios; selecting an updated sampling ratio based on a summation of the sampling error value and the updated coding error value; downsampling the input video at the updated sampling ratio; and encoding the downsampled video sequence.
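- Because the sampling error term depends only on the content, a bit rate update only requires recomputing the coding error term; the snippet below reuses the hypothetical `select_sampling_ratio` sketch from earlier.

```python
def reselect_on_rate_change(ratios, new_bit_rate,
                            sampling_error, coding_error):
    # Sampling errors are content-dependent and can be cached; only
    # the coding errors change with the newly signaled bit rate.
    return select_sampling_ratio(ratios, new_bit_rate,
                                 sampling_error, coding_error)
```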
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is different from the vertical sampling ratio.
- the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is the same as the vertical sampling ratio.
- a method of any of the preceding embodiments wherein a first selection of the sampling ratio is performed at the beginning of the received video data and at least a second selection of the sampling ratio is performed during the duration of the received video data.
- a video decoding method comprising receiving compressed video data; receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decoding the compressed video data to form reconstructed video data; upsampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data; and outputting the upsampled video data.
- a video decoding system comprising a video decoder, the video decoder configured to receive compressed video data; receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a summation of a sampling error value and a coding error value across a plurality of sampling ratios; decode the compressed video data to form reconstructed video data; upsample the reconstructed video data to increase the resolution of the reconstructed video data; and output the upsampled video data.
- a video decoding system of the preceding embodiment further comprising a wireless transmit/receive unit in communication with a communication system, wherein the wireless transmit/receive unit is configured to receive the video data from the communication system.
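- On the decoder side, the upsampling step amounts to resizing each reconstructed frame back toward the pre-downsampling resolution; OpenCV and bilinear interpolation are assumptions here, and any resampling filter could be substituted.

```python
import cv2  # assumption: OpenCV is available for resizing

def upsample_reconstructed(frame, ratio):
    """Invert the encoder's downsampling ratio (0 < ratio <= 1) by
    resizing a decoded frame back toward its original resolution."""
    h, w = frame.shape[:2]
    target = (round(w / ratio), round(h / ratio))  # cv2 wants (w, h)
    return cv2.resize(frame, target, interpolation=cv2.INTER_LINEAR)
```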
- ROM read only memory
- RAM random access memory
- registers, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
- a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
- processing platforms, computing systems, controllers, and other devices containing processors are noted. These devices may contain at least one Central Processing Unit (“CPU”) and memory.
- CPU Central Processing Unit
- FIG. 1 A block diagram illustrating an exemplary computing system
- Such acts and operations or instructions may be referred to as being “executed,” “computer executed,” or “CPU executed.”
- an electrical system represents data bits that can cause a resulting transformation or reduction of the electrical signals and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals.
- the memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to or representative of the data bits. It should be understood that the exemplary embodiments are not limited to the above-mentioned platforms or CPUs and that other platforms and CPUs may support the described methods.
- the data bits may also be maintained on a computer readable medium including magnetic disks, optical disks, and any other volatile (e.g., Random Access Memory (“RAM”)) or nonvolatile (e.g., Read-Only Memory (“ROM”)) mass storage system readable by the CPU.
- RAM Random Access Memory
- ROM Read-Only Memory
- the computer readable medium may include cooperating or interconnected computer readable media, which exist exclusively on the processing system or are distributed among multiple interconnected processing systems that may be local or remote to the processing system. It should be understood that the exemplary embodiments are not limited to the above-mentioned memories and that other platforms and memories may support the described methods.
- the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. Further, the terms “any of” followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include “any of,” “any combination of,” “any multiple of,” and/or “any combination of multiples of” the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items. Further, as used herein, the term “set” is intended to include any number of items, including zero. Further, as used herein, the term “number” is intended to include any number, including zero.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020137013488A KR20130105870A (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
EP11779073.3A EP2633685A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
CN2011800628602A CN103283227A (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
AU2011319844A AU2011319844A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US40732910P | 2010-10-27 | 2010-10-27 | |
US61/407,329 | 2010-10-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012058394A1 true WO2012058394A1 (en) | 2012-05-03 |
Family
ID=44906484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2011/058027 WO2012058394A1 (en) | 2010-10-27 | 2011-10-27 | Systems and methods for adaptive video coding |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP2633685A1 (en) |
KR (1) | KR20130105870A (en) |
CN (1) | CN103283227A (en) |
AU (1) | AU2011319844A1 (en) |
WO (1) | WO2012058394A1 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103475880A (en) * | 2013-09-11 | 2013-12-25 | 浙江大学 | Method for low-complexity video transcoding from H.264 to HEVC based on statistic analysis |
CN103945222A (en) * | 2014-04-21 | 2014-07-23 | 福州大学 | Code rate control model updating method based on HEVC standards |
WO2014143008A1 (en) | 2013-03-15 | 2014-09-18 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
CN105430395A (en) * | 2015-12-03 | 2016-03-23 | 北京航空航天大学 | HEVC (High Efficiency Video Coding) CTU (Coding Tree Unit) grade code rate control method based on optimal bit allocation |
EP3097694A1 (en) * | 2014-01-24 | 2016-11-30 | Cisco Technology, Inc. | Line rate visual analytics on edge devices |
WO2018018445A1 (en) * | 2016-07-27 | 2018-02-01 | 王晓光 | Method and system for sending video advertisement on the basis of video capacity |
WO2019240631A1 (en) * | 2018-06-15 | 2019-12-19 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
WO2020042269A1 (en) * | 2018-08-31 | 2020-03-05 | 网宿科技股份有限公司 | Code rate adjustment method and device for encoding process |
US10825206B2 (en) * | 2018-10-19 | 2020-11-03 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
CN112560552A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Video classification method and device |
US10986370B2 (en) | 2013-10-07 | 2021-04-20 | Vid Scale, Inc. | Combined scalability processing for multi-layer video coding |
US20210358083A1 (en) | 2018-10-19 | 2021-11-18 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US11184638B1 (en) * | 2020-07-16 | 2021-11-23 | Facebook, Inc. | Systems and methods for selecting resolutions for content optimized encoding of video data |
EP3934261A1 (en) * | 2020-07-02 | 2022-01-05 | Samsung Electronics Co., Ltd. | Electronic device and method of operating the same |
US11381816B2 (en) | 2013-03-15 | 2022-07-05 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
US11395001B2 (en) | 2019-10-29 | 2022-07-19 | Samsung Electronics Co., Ltd. | Image encoding and decoding methods and apparatuses using artificial intelligence |
CN115052146A (en) * | 2022-06-16 | 2022-09-13 | 上海大学 | Content self-adaptive down-sampling video coding optimization method based on classification |
US11688038B2 (en) | 2018-10-19 | 2023-06-27 | Samsung Electronics Co., Ltd. | Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102119300B1 (en) | 2017-09-15 | 2020-06-04 | 서울과학기술대학교 산학협력단 | Apparatus and method for encording 360-degree video, recording medium for performing the method |
CN112367147B (en) * | 2020-09-27 | 2022-09-09 | 苏州宣怀智能科技有限公司 | Data display method and device, electronic equipment and computer readable medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6104434A (en) * | 1996-10-24 | 2000-08-15 | Fujitsu Limited | Video coding apparatus and decoding apparatus |
WO2009055899A1 (en) * | 2007-11-02 | 2009-05-07 | Ecole De Technologie Superieure | System and method for quality-aware selection of parameters in transcoding of digital images |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003531533A (en) * | 2000-04-18 | 2003-10-21 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Bitrate allocation in joint bitrate transcoding |
US7536469B2 (en) * | 2004-12-10 | 2009-05-19 | Microsoft Corporation | System and process for controlling the coding bit rate of streaming media data employing a limited number of supported coding bit rates |
CN101389021B (en) * | 2007-09-14 | 2010-12-22 | 华为技术有限公司 | Video encoding/decoding method and apparatus |
- 2011
- 2011-10-27 CN CN2011800628602A patent/CN103283227A/en active Pending
- 2011-10-27 KR KR1020137013488A patent/KR20130105870A/en unknown
- 2011-10-27 AU AU2011319844A patent/AU2011319844A1/en not_active Abandoned
- 2011-10-27 WO PCT/US2011/058027 patent/WO2012058394A1/en active Application Filing
- 2011-10-27 EP EP11779073.3A patent/EP2633685A1/en not_active Ceased
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6104434A (en) * | 1996-10-24 | 2000-08-15 | Fujitsu Limited | Video coding apparatus and decoding apparatus |
WO2009055899A1 (en) * | 2007-11-02 | 2009-05-07 | Ecole De Technologie Superieure | System and method for quality-aware selection of parameters in transcoding of digital images |
Non-Patent Citations (4)
Title |
---|
A. BRUCKSTEIN: "On optimal image digitization", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. 35, no. 4, 1 April 1987 (1987-04-01), pages 553 - 555, XP055018088, ISSN: 0096-3518, DOI: 10.1109/TASSP.1987.1165148 * |
BRUCKSTEIN A M ET AL: "Down-scaling for better transform compression", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 12, no. 9, 1 September 2003 (2003-09-01), pages 1132 - 1144, XP011099900, ISSN: 1057-7149, DOI: 10.1109/TIP.2003.816023 * |
EKMEKCIOGLU E ET AL: "Bit-Rate Adaptive Downsampling for the Coding of Multi-View Video with Depth Information", 3DTV CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, IEEE, PISCATAWAY, NJ, USA, 28 May 2008 (2008-05-28), pages 137 - 140, XP031275230, ISBN: 978-1-4244-1760-5 * |
EKMEKCIOGLU E ET AL: "Low-delay random view access in multi-view coding using a bit-rate adaptive downsampling approach", MULTIMEDIA AND EXPO, 2008 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 23 June 2008 (2008-06-23), pages 745 - 748, XP031312829, ISBN: 978-1-4244-2570-9 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11381816B2 (en) | 2013-03-15 | 2022-07-05 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
EP3934247A3 (en) * | 2013-03-15 | 2022-03-16 | Icelero Inc. | Method and system for improved video codec rate-distortion performance by pre and post-processing |
WO2014143008A1 (en) | 2013-03-15 | 2014-09-18 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
US10785481B2 (en) | 2013-03-15 | 2020-09-22 | Crunch Mediaworks Llc | Method and system for video codec rate-distortion performance by pre and post-processing |
EP2974311A4 (en) * | 2013-03-15 | 2016-08-24 | Icelero Inc | Method and system for improved video codec rate-distortion performance by pre and post-processing |
US10230951B2 (en) | 2013-03-15 | 2019-03-12 | Crunch Mediaworks, Llc | Method and system for video codec rate-distortion performance by pre and post-processing |
US11856191B2 (en) | 2013-03-15 | 2023-12-26 | Crunch Mediaworks, Llc | Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing |
CN103475880A (en) * | 2013-09-11 | 2013-12-25 | 浙江大学 | Method for low-complexity video transcoding from H.264 to HEVC based on statistic analysis |
US10986370B2 (en) | 2013-10-07 | 2021-04-20 | Vid Scale, Inc. | Combined scalability processing for multi-layer video coding |
EP3097694A1 (en) * | 2014-01-24 | 2016-11-30 | Cisco Technology, Inc. | Line rate visual analytics on edge devices |
CN103945222B (en) * | 2014-04-21 | 2017-01-25 | 福州大学 | Code rate control model updating method based on HEVC standards |
CN103945222A (en) * | 2014-04-21 | 2014-07-23 | 福州大学 | Code rate control model updating method based on HEVC standards |
CN105430395A (en) * | 2015-12-03 | 2016-03-23 | 北京航空航天大学 | HEVC (High Efficiency Video Coding) CTU (Coding Tree Unit) grade code rate control method based on optimal bit allocation |
WO2018018445A1 (en) * | 2016-07-27 | 2018-02-01 | 王晓光 | Method and system for sending video advertisement on the basis of video capacity |
US11546631B2 (en) | 2018-06-15 | 2023-01-03 | Huawei Technologies Co., Ltd. | Method and apparatus for DC intra prediction of rectangular blocks using an aspect ratio |
WO2019240631A1 (en) * | 2018-06-15 | 2019-12-19 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
WO2020042269A1 (en) * | 2018-08-31 | 2020-03-05 | 网宿科技股份有限公司 | Code rate adjustment method and device for encoding process |
US20210358083A1 (en) | 2018-10-19 | 2021-11-18 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US11748847B2 (en) | 2018-10-19 | 2023-09-05 | Samsung Electronics Co., Ltd. | Method and apparatus for streaming data |
US10825206B2 (en) * | 2018-10-19 | 2020-11-03 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
US11688038B2 (en) | 2018-10-19 | 2023-06-27 | Samsung Electronics Co., Ltd. | Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image |
US11663747B2 (en) | 2018-10-19 | 2023-05-30 | Samsung Electronics Co., Ltd. | Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image |
CN112560552A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Video classification method and device |
US11395001B2 (en) | 2019-10-29 | 2022-07-19 | Samsung Electronics Co., Ltd. | Image encoding and decoding methods and apparatuses using artificial intelligence |
US11405637B2 (en) | 2019-10-29 | 2022-08-02 | Samsung Electronics Co., Ltd. | Image encoding method and apparatus and image decoding method and apparatus |
US11706261B2 (en) | 2020-07-02 | 2023-07-18 | Samsung Electronics Co., Ltd. | Electronic device and method for transmitting and receiving content |
EP3934261A1 (en) * | 2020-07-02 | 2022-01-05 | Samsung Electronics Co., Ltd. | Electronic device and method of operating the same |
US11184638B1 (en) * | 2020-07-16 | 2021-11-23 | Facebook, Inc. | Systems and methods for selecting resolutions for content optimized encoding of video data |
CN115052146A (en) * | 2022-06-16 | 2022-09-13 | 上海大学 | Content self-adaptive down-sampling video coding optimization method based on classification |
Also Published As
Publication number | Publication date |
---|---|
KR20130105870A (en) | 2013-09-26 |
EP2633685A1 (en) | 2013-09-04 |
CN103283227A (en) | 2013-09-04 |
AU2011319844A1 (en) | 2013-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012058394A1 (en) | Systems and methods for adaptive video coding | |
US11405621B2 (en) | Sampling grid information for spatial layers in multi-layer video coding | |
JP6592145B2 (en) | Inter-layer reference image enhancement for multi-layer video coding | |
US10237555B2 (en) | System and method of video coding quantization and dynamic range control | |
US10218971B2 (en) | Adaptive upsampling for multi-layer video coding | |
JP2022023856A (en) | Codec architecture for layer videos coding | |
US20190014333A1 (en) | Inter-layer prediction for scalable video coding | |
EP2917892A2 (en) | Temporal filter for denoising a high dynamic range video | |
WO2017020021A1 (en) | Scalable high efficiency video coding to high efficiency video coding transcoding | |
WO2012061258A2 (en) | Parametric bit rate model for frame-level rate control in video coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11779073 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
REEP | Request for entry into the european phase |
Ref document number: 2011779073 Country of ref document: EP |
WWE | Wipo information: entry into national phase |
Ref document number: 2011779073 Country of ref document: EP |
ENP | Entry into the national phase |
Ref document number: 20137013488 Country of ref document: KR Kind code of ref document: A |
ENP | Entry into the national phase |
Ref document number: 2011319844 Country of ref document: AU Date of ref document: 20111027 Kind code of ref document: A |