EP1730846A2 - Method and apparatus for compressing digital image data with motion prediction - Google Patents
Method and apparatus for compressing digital image data with motion prediction
- Publication number
- EP1730846A2 (application number EP05725507A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frame
- sub
- bit stream
- motion
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/129—Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/53—Multi-resolution motion estimation; Hierarchical motion estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/533—Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/57—Motion estimation characterised by a search window with variable size or shape
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/619—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding the transform being operated outside the prediction loop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
- H04N19/64—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
- H04N19/64—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
- H04N19/647—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission using significance based coding, e.g. Embedded Zerotrees of Wavelets [EZW] or Set Partitioning in Hierarchical Trees [SPIHT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/87—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/96—Tree coding, e.g. quad-tree coding
Definitions
- the present invention relates generally to multimedia applications. More particularly, this invention relates to compressing digital image data with motion prediction.
- a motion prediction is performed between the consecutive frames by tracking motion on a luminance map of the frames to generate motion prediction information for the luminance component.
- the motion prediction information of the luminance component is then applied to the chrominance maps.
- the wavelet coefficients of each frame and the motion prediction information are encoded into a bit stream based on a target transmission rate, where the encoded wavelet coefficients satisfy a predetermined threshold according to a predetermined algorithm.
- FIG. 1 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.
- Figure 2 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.
- Figure 3 is a block diagram illustrating an exemplary network stack according to one embodiment.
- Figures 4A and 4B are block diagrams illustrating exemplary encoding and decoding systems according to certain embodiments.
- Figure 5 is a flow diagram illustrating an exemplary encoding process according to one embodiment.
- Figures 6 and 7 are block diagrams illustrating exemplary pixel maps according to certain embodiments.
- Figure 8 is a flow diagram illustrating an exemplary encoding process according to an alternative embodiment.
- Figures 9A-9B and 10A-10B are block diagrams illustrating exemplary encoding and decoding systems with motion prediction according to certain embodiments.
- Figure 11 is a flow diagram illustrating an exemplary encoding process with motion prediction according to one embodiment.
- Figures 12-15 are block diagrams illustrating exemplary pixel maps according to certain embodiments.
- Embodiments of the system are suited for wireless streaming solutions, due to the seamless progressive transmission capability (e.g., over various bandwidths), which helps in graceful degradation of video quality in the event of a sudden shortfall in channel bandwidth. Moreover, it also allows for comprehensive intra- as well as inter-frame rate control, thereby allowing for the optimal allocation of bits to each frame, and an optimal distribution of the frame bit budget between the luma and chroma maps. As a result, this helps in improving the perceived quality of frames that have relatively high levels of detail or motion, while maintaining a minimal threshold on the picture quality of uniform texture and/or slow motion sequences within a video clip.
- Embodiments set forth herein include a stable system to compress and decompress digital audio/video data that is implemented on software and/or hardware platforms. Some advantages of the various embodiments of the invention include, but are not limited to, low battery power consumption, low complexity and low processing load, leading to a more efficient implementation of a commercial audio/video compression/decompression and transmission system.
- some other advantages include, but are not restricted to, a robust error detection and correction routine that exploits the redundancies in the unique data structure used in the source/arithmetic encoder/decoder of the system, and a smaller search space for predicting motion between two consecutive frames, for a more efficient and faster motion prediction routine.
- FIG. 1 is a block diagram of one embodiment of an exemplary multimedia streaming system.
- exemplary system 100 includes server component 101 (also referred to herein as a server suite) communicatively coupled to client components 103-104 (also referred to herein as client suites) over a network 102, which may be a wired network, a wireless network, or a combination of both.
- a server suite is an amalgamation of several services that provide download-and-playback (D&P), streaming broadcast, and/or peer-to-peer communication services. This server suite is designed to communicate with any third party network protocol stack (as shown in Figure 3).
- these components of the system may be implemented in the central server, though a lightweight version of the encoder may be incorporated into a handheld platform for peer-to-peer video conferencing applications.
- the decoder may be implemented in a client-side memory.
- the server component 101 may be implemented as a plug-in application within a server, such as a Web server.
- each of the client components 103-104 may be implemented as a plug-in within a client, such as a wireless station (e.g., a cellular phone, or a personal digital assistant (PDA)).
- server component 101 includes a data acquisition module 105, an encoder 106, and a decoder 107.
- the data acquisition module 105 includes a video/audio repository, an imaging device to capture video in real-time, and/or a repository of video/audio clips.
- an encoder 106 reads the data and entropy/arithmetic encodes it into a byte stream. The encoder 106 may be implemented within a server suite.
- video/audio services are provided to a client engine (e.g., clients 103-104), which is a product suite encapsulating a network stack implementation (as shown in Figure 3) and a proprietary decoder (e.g., 108-109).
- This suite can accept a digital payload at various data rates and footprint formats, segregate the audio and video streams, decode each byte stream independently, and display the data in a coherent and real-life manner.
- encoder module 106 reads raw data in a variety of data formats (which include, but are not limited to, RGB x:y:z, YUV x':y':z' and YCrCb x'':y'':z'', where the letter symbols denote sub-sampling ratios), and converts them into one single standard format for purposes of standardization and simplicity.
- the digital information is read frame wise in a non-interleaved raster format.
- the encoder unit 106 segregates the audio and video streams prior to actual processing. This is useful since the encoding and decoding mechanisms used for audio and video may be different.
- the frame data is then fed into a temporary buffer, and transformed into the spatial-frequency domain using a unique set of wavelet filtering operations.
- the ingenuity in this wavelet transformation lies in its preclusion of extra buffering, and the conversion of computationally complex filtering operations into simple addition/subtraction operations. This makes the wavelet module in this codec more memory-efficient.
- the source encoder/decoder performs compression of the data by reading the wavelet coefficients of every sub-band of the frame obtained from the previous operation in a unique zigzag fashion, known as a Morton scan (similar to the one shown in Figure 7). This allows the system to arrange the data in an order based on the significance of the wavelet coefficients, and code it in that order.
- the coding alphabet can be classified into significance, sign and refinement classes in a manner well-known in the art (e.g., JPEG 2000, etc.)
- the significance, sign and bit plane information of the pixel is coded and transmitted into the byte-stream.
- the first set of coefficients to be coded thus is the coarsest sub-band in the top-left corner of the sub-band map. Once the coarsest sub-band has been exhausted in this fashion, the coefficients in the finer sub-bands are coded in a similar fashion, based on a unique tree-structure relationship between coefficients in spatially homologous sub-bands.
- To further exploit the redundancy of the bit stream, it is partitioned into independent logical groups of bits, based on their source in the sub-band tree map and the type of information they represent (e.g., significance, sign or refinement), and is arithmetic coded for further compression. This process achieves results similar to, but superior to, the context-based adaptive binary arithmetic coding (CABAC) technique specified in the H.264 and MPEG4 standards.
- the temporal redundancy between consecutive frames in a video stream is exploited to reduce the bit count even further, by employing a motion prediction scheme.
- motion is predicted over the four coarsest sub-bands, and, by employing a type of affine transformation, is predicted in the remaining finer sub-bands using a lower-entropy refinement search.
- the effective search area for predicting motion in the finer sub-bands is smaller than in the coarser sub-bands, leading to a speed-up in the overall performance of the system, along with a lower bit-rate as compared to similar video compression systems in current use.
- the video decoder (e.g., decoders 108-109) works in a manner similar to the encoder, with the exception that it does not have the motion prediction feedback loop.
- the decoder performs essentially the inverse of the operations in the encoder.
- the byte stream is read on a bit-by-bit basis, and the coefficients are updated using a non-linear quantization scheme, based on the context decision. Similar logic applies to the wavelet transformation and arithmetic coding blocks.
- the updated coefficient map is inverse wavelet transformed using a set of arithmetic lifting operations, which may be reverse of the operations undertaken in the forward wavelet transform block in the encoder, to create the reconstructed frame.
- the reconstructed frame is then rendered by a set of native API (application programming interface) calls in the decoder client.
- the codec suite is made compatible with several popular third party multimedia network protocol suites.
- the exemplary system can be deployed on a variety of operating systems and environments, both on the hand-held as well as PC domain. These include, but are not restricted to, Microsoft® Windows® 9x/Me/XP/NT 4.x/2000, Microsoft® Windows® CE, PocketLinux™ (and its various third party flavors), SymbianOS™ and PalmOS™. It is available on a range of third-party development platforms. These include, but are not limited to, Microsoft® PocketPC™ 200X, Sun Microsystems® J2ME™ MIDP® X.0/CLDC® X.0, Texas Instruments® OMAP™ and Qualcomm® BREW™.
- embodiments of the invention can be provided as a solution on a wide range of platforms including, but not limited to, Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC) and System-on-Chip (SoC) implementations.
- FIG. 2 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.
- exemplary system 200 includes a server or servers 201 communicatively coupled to one or more clients 202-203 over various types of networks, such as wireless network 204 and/or wired networks 205-206, which may be the same network.
- server 201 may be implemented as server 101 of Figure 1.
- Clients 202-203 may be implemented as clients 103-104 of Figure 1.
- the server platform 201 includes, but is not limited to, three units labeled A, B and C. However, it is not so limited. These units may be implemented as a single unit or module. These units can communicate with one another, as well as with external units, to provide all relevant communication and video/audio processing capabilities.
- Unit C may be an application server, which provides download services for client-side components such as decoder/encoder APIs to facilitate third party support, browser plug-ins, drivers and plug-and-play components.
- Unit B may be a web services platform. This addresses component reusability and scalability issues by providing COM™, COM+™, EJB™, CORBA™, XML and other related web and/or MMS related services. These components are discrete and encapsulate the data. They minimize system dependencies and reduce interaction to a set of inputs and desired outputs. To use a component, a developer may call its interface. The functionality once developed can be used in various applications, hence making the component reusable.
- Unit A may be an actual network services platform.
- Unit A provides network services required to transmit encoded data over the wireless network, either in a D&P (Download and Play) or a streaming profile.
- Unit A also provides support for peer-to-peer (P2P) communications in mobile video conferencing applications, as well as communicates with the wireless service provider to expedite billing and other ancillary issues.
- a user 203 with unrestricted mobility (such as a person driving a car downtown) is able to access his or her wireless multimedia services using the nearest wireless base station (BS) 209 of the service provider to which he or she subscribes.
- the connection could be established using a wide range of technologies including, but not limited to, WCDMA (UMTS), IS-95 A/B, CDMA 1X/EVDO/EVDV, IS-2000 (CDMA2000), GSM/GPRS/EDGE, AMPS, iDEN/WiDEN, and WiMAX.
- the BS 209 communicates with the switching office (MTSO) 210 of the service provider over a TCP/IP or UDP/IP connection on the wireless WAN 204.
- the MTSO 210 handles hand-off, call dropping, roaming and other user profile issues.
- the payload and profile data is sent to the wireless ISP server for processing.
- the user 202 has limited mobility, for example, within a home or office building (e.g., a LAN controlled by access point/gateway 211).
- a user sends in a request for a particular service over a short-range wireless connection, which includes, but is not restricted to, a Bluetooth™, Wi-Fi™ (IEEE™ 802.11x), HomeRF, HiperLAN/1 or HiperLAN/2 connection, via an access point (AP) and the corporate gateway 211, to the web gateway of his or her service provider.
- the ISP communicates with the MTSO 210, to forward the request to the server suite 201. All communications are over a TCP/IP or UDP/IP connection 206.
- peer-to-peer (P2P) communication is enabled by bypassing the server 201 altogether.
- all communications, payload transfer and audio/video processing are routed or delegated through the wireless ISP server (e.g., server 207) without any significant load on the server, other than performing the functions of control, assignment, and monitoring.
- the system capabilities may be classified based on the nature of the services and modes of payload transfer.
- the user waits for the entire payload (e.g., video/audio clip) to be downloaded onto his or her wireless mobile unit or handset before playing it.
- Such a service has a large latency period, but can be transported over secure and reliable TCP/IP connections.
- the payload routing is the same as before, with the exception that it is now transported over a streaming protocol stack (e.g., RTSP/RTP, RTCP, SDP) over a UDP/IP network (e.g., networks 205-206).
- This ensures that the payload packets are transmitted quickly, though there is a chance of data corruption (e.g., packet loss) due to the unreliable nature of the UDP connection.
- the payload is routed through a UDP/IP connection, to ensure live video/audio quality needed for video conferencing applications.
- the decoder as well as the encoder may be available on hardware, software, or a combination of both.
- the encoder may be stored in the remote server, which provides the required service over an appropriate connection, while a lightweight software decoder may be stored in the memory of the wireless handheld terminal.
- the decoder APIs can be downloaded from an application server (e.g., unit A) over an HTTP/FTP-over-TCP/IP connection.
- FIGS 4A and 4B are data flow diagrams illustrating exemplary encoding and decoding processes through an encoding system and a decoding system respectively, according to certain embodiments of the invention.
- Figure 5 is a flow diagram illustrating an exemplary process for encoding digital image data according to one embodiment.
- the exemplary process 500 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system, a server, or a dedicated machine), or a combination of both.
- exemplary process 500 may be performed by a server component (e.g., a server suite), such as, for example, server 101 of Figure 1 or server 201 of Figure 2.
- the codec works on a raw file format 401 having raw YUV color frame data specified by any of the several standard file and frame formats including, but not limited to, high definition television (HDTV), standard definition television (SDTV), extended video graphics array (XVGA), standard video graphics array (SVGA), video graphics array (VGA), common interchange format (CIF), quarter common interchange format (QCIF) and sub-quarter common interchange format (S-QCIF).
- the pixel data is stored in a byte format, which is read in serial fashion and stored in a 1-dimensional array.
- each image, henceforth to be called the 'frame', includes three maps.
- Each of these maps may be designated to either store one primary color component, or in a more asymmetric scheme, one map stores the luminance information (also referred to as a luma map or Y map), while the other two maps store the chrominance information (also referred to as chroma maps or Cb/Cr maps).
- the Y map stores the luma information of the frame, while the chroma information is stored in two quadrature components.
- the system is designed to work on a wide variety of chrominance sub-sampling formats (which includes, but is not restricted to, the 4:1:1 color format).
- the dimensions of the chroma maps are an integral fraction of the dimensions of the luma map, along both the cardinal directions.
- the pixel data is stored in a byte format in the raw payload file, which is read in serial fashion and stored in a set of 3 one-dimensional arrays, one for each map.
- the 2-dimensional co-ordinate of each pixel in the image map is mapped onto the indexing system of the 1-dimensional array representing the current color map.
- the actual 1-dimensional index is divided by the width of the frame, to obtain the "row number".
- a modulo operation on the 1-dimensional index gives a remainder that is the "column number" of the corresponding pixel.
- the pixel coefficient values are scaled up by shifting the absolute value of each pixel coefficient by a predetermined factor (e.g., a factor of 64, i.e., a left shift of six bits). This increases the dynamic range of the pixel coefficient values, thereby allowing for a finer approximation of the reconstructed frame during decoding.
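- as an illustration of the indexing and scaling just described, the following is a minimal Python sketch; the function names are hypothetical, and the power-of-two shift is an assumption consistent with the factor-of-64 example.

```python
def to_row_col(index, width):
    # 2-D coordinate of a pixel from its 1-D array position: integer
    # division by the frame width gives the row number, and the modulo
    # operation gives the column number.
    return index // width, index % width

def scale_up(pixels, factor=64):
    # Widen the dynamic range before filtering, e.g., by a factor of 64
    # (a left shift of six bits when the factor is a power of two).
    return [p * factor for p in pixels]

row, col = to_row_col(325, width=176)  # QCIF luma map, 176 pixels wide
```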
- the next operation in the encoding process is to transform the payload from the spatial domain into the multi-resolution domain.
- a set of forward and backward wavelet filter 402 coefficients with an integral number of taps is used for the low pass and high pass filtering operations (e.g., operation 501).
- the filter coefficients may be modified in such a way that all operations can be done in-place, without a need for buffering the pixel values in a separate area in memory. This saves valuable volatile-memory space and processing time.
- the wavelet filtering operations on each image pixel are performed in-place, and the resultant coefficients maintain their relative position in the sub-band map.
- the entire wavelet decomposition process is split into its horizontal and vertical components, with no particular preference to the order in which the cardinal orientation of filtering may be chosen. Due to the unique lifting nature of the filtering process, the complex mathematical computations involved in the filtering process are reduced to a set of fast, low-complexity addition and/or subtraction operations.
- a row or column (also referred to as a row vector or a column vector) is chosen, depending on the direction of the current filtering process.
- a low pass filtering operation is performed on every pixel that has an even index relative to the first pixel in the current vector.
- a high pass filtering operation is performed on every pixel that has an odd index relative to the first pixel of the same vector.
- the pixel whose wavelet coefficient is to be determined, along with a set of pixels symmetrically arranged around it in its neighborhood along the current orientation of filtering, in the current vector is chosen.
- Wavelet filters with four vanishing moments are applied to the pixels.
- four tap high pass and low pass filters are used for the transformation.
- the high pass filter combines the four neighboring even pixels, weighted and normalized as shown below, for filtering an odd pixel:
  H_k = [9*(X_{k-1} + X_{k+1}) - (X_{k-3} + X_{k+3}) + 16] / 32
- the low pass filter combines the four neighboring odd pixels, weighted and normalized as shown below, for filtering an even pixel:
  L_k = [9*(X_{k-1} + X_{k+1}) - (X_{k-3} + X_{k+3}) + 8] / 16
  where X_k is the pixel at position k.
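- a minimal Python sketch of the two 4-tap combinations above, using only integer additions, subtractions and shifts (consistent with the add/subtract-only claim earlier); the function names are hypothetical, and the symmetric edge extension a real implementation needs is omitted by assuming 3 <= k < len(x) - 3.

```python
def highpass_at(x, k):
    # Odd-indexed pixel: [9*(x[k-1] + x[k+1]) - (x[k-3] + x[k+3]) + 16] / 32,
    # built from the four nearest even-indexed neighbors.
    return (9 * (x[k - 1] + x[k + 1]) - (x[k - 3] + x[k + 3]) + 16) >> 5

def lowpass_at(x, k):
    # Even-indexed pixel: [9*(x[k-1] + x[k+1]) - (x[k-3] + x[k+3]) + 8] / 16,
    # built from the four nearest odd-indexed neighbors.
    return (9 * (x[k - 1] + x[k + 1]) - (x[k - 3] + x[k + 3]) + 8) >> 4
```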
- the wavelet filtering operation is viewed as a dyadic hierarchical filtering process, meaning that the end-result of a single iteration of the filtering process on the image is to decimate it into four sub-bands, or sub-images, each with half the dimensions in both directions as the original image.
- the four sub-bands, or sub-images, are labeled as HH_k, HL_k, LH_k and LL_k (where k is the level of decomposition, beginning with one for the finest level), depending on their spatial orientation relative to the original image.
- the entire filtering process is repeated on only the LL_k sub-image obtained in the previous pass, to obtain four sub-images called HH_{k-1}, LH_{k-1}, HL_{k-1} and LL_{k-1}, which have half the dimensions of LL_k, as explained above.
- This process is repeated for as many levels of decomposition as is desired, or until the LL sub-band has been reduced to a block which is one pixel across, in which case, further decimation is no longer possible.
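- the dyadic halving of the LL band can be sketched as follows (Python, hypothetical names); each pass shrinks only the current LL region until the requested depth, or a one-pixel band, is reached.

```python
def ll_dimensions(width, height, max_levels):
    # Track the shrinking LL sub-band across decomposition levels.
    dims = [(width, height)]
    for _ in range(max_levels):
        if width < 2 or height < 2:
            break  # further decimation is no longer possible
        width, height = width // 2, height // 2
        dims.append((width, height))
    return dims

print(ll_dimensions(176, 144, 5))  # QCIF: the LL band shrinks to 5 x 4
```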
- the filtering is split into horizontal and vertical filtering operations. For the vertical filtering mode, each column (e.g., vertical vector) in the three maps is processed one at a time.
- the temporary vector is split into two halves. Pixels located in the even numbered memory locations (such as 0, 2, 4,...) of the temporary vector are low pass filtered using the low pass filter (LPF) coefficients, while the pixels in the odd numbered memory locations (such as 1, 3, 5,...) of the temporary vector are high pass filtered using the high pass filter (HPF) coefficients.
- the result of each filtering operation (high-pass or low-pass) is stored in the current vector, such that all the results of the low-pass filtering operations are stored in the upper half of the vector (e.g., the top half of a vertical vector, or the left half of a horizontal vector, depending on the current orientation of filtering), while the results from the high-pass filtering operations are stored in the lower half of the column (e.g., the bottom half of a vertical vector, or the right half of a horizontal vector).
- the pixel data is decimated in a single iteration.
- the entire process is repeated for all the columns and rows in the current map and frame.
- the entire process is repeated for all three maps for the current frame, to obtain the wavelet transformed image.
- the bootstrapped source entropy and arithmetic coding process 403 of the wavelet map is also referred to as channel coding (e.g., operation 502).
- the arithmetic coding exploits the intimate relationships between spatially homologous blocks within the sub-band tree structure generated in the wavelet transformation 402 described above.
- the data in the wavelet map is encoded by representing the significance (e.g., with respect to a variable-size quantization threshold), sign and bit plane information of the pixels using a single bit alphabet.
- the bit stream is encoded in an embedded form, meaning that all the relevant information of a single pixel at a particular quantization threshold is transmitted as a continuous stream of bits.
- the quantization threshold depends on the number of bits used to represent the wavelet coefficients. In this embodiment, sixteen bits are used for representing the coefficients. Hence, for the first pass, the quantization threshold is set, for example, at 0x8000. After a single pass, the threshold is lowered, and the pixels are encoded in the same or similar order as before until substantially all the pixels have been processed. This ensures that all pixels are progressively coded and transmitted in the bit stream.
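- the pass structure can be sketched in a few lines of Python (hypothetical names): a 16-bit starting threshold of 0x8000 that is halved after every pass until every bit-plane has been visited.

```python
def bitplane_thresholds(start=0x8000):
    # Quantization thresholds visited over the successive coding passes.
    t = start
    while t > 0:
        yield t
        t >>= 1  # lower the threshold and re-scan the pixels in order

print(list(bitplane_thresholds())[:4])  # [32768, 16384, 8192, 4096]
```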
- the entropy coded bit stream is further compressed by passing the outputted bit through a context based adaptive arithmetic encoder 404 (also referred to as a channel encoder), as shown as operation 503.
- This context based adaptive binary arithmetic coder encodes the bit information depending on the probability of occurrence of a predetermined set of bits immediately preceding the current bit.
- the context in which the current bit is encoded depends on the nature of the information represented by the bit (significance, sign or bit plane information) and the location of the coefficient being coded in the hierarchical tree structure.
- the concept of a CABAC is similar in principle to the one specified in the ITU-T SG16 WP3 Q.6 (VCEG) Rec. H.264 and ISO/IEC JTC 1/SC 29/WG 11 (MPEG) Rec. 14496-10 (MPEG4 part 10). The difference lies in the context modeling, estimation and adaptation of probabilities.
- the coefficients of the embodiment have different statistical characteristics.
- the CABAC-type entropy coder, as specified in the embodiment, is designed to exploit these characteristics to the maximum.
- the context is an n-bit data structure with a dynamic range of 0 to 2^n. With every new bit coded, the context variable assigned to the current bit is updated, based on a probability estimation table (PET).
- the system uses (9 x m) context variables for each frame - for three bit classes over three spatial orientation trees, and all sub-bands over m levels of decomposition.
- the decoder, which may reside in the client, may be implemented similarly to the exemplary encoder 400 of Figure 4A, but in a reversed order, as shown in Figure 4B.
- FIG. 6 is a diagram illustrating an exemplary pixel map for encoding processing according to one embodiment.
- the root of the tree structure may be made up of the set of all the pixels in the coarsest sub-band, LL, with the set labeled as H.
- the pixels in set H are grouped in sets of 2x2, or quads.
- each quad in set H (e.g., block 601) has four pixels, with all but the top-left member 602 of every quad having four descendants (e.g., blocks 603-605) in the spatially homologous next finer level of decomposition.
- the top-right pixel in a quad has four descendant pixels 604 (in a 2x2 format) in the next finer sub-band with the same spatial orientation (HL_{k-1} in this case).
- the relative location of the descendants is related to the spatial orientation of the tree root.
- the first generation descendants of a coefficient (henceforth labeled as offspring) of the top-right pixel in the top-left quad of set H are the top-left 2x2 quad in HL_{k-1} (e.g., block 604).
- the offspring of the bottom-right pixel in any quad of set H lie in spatially homologous positions in the HH_{k-1} sub-band, while the descendants of the bottom-left pixel in any quad of set H lie in spatially homologous positions in the LH_{k-1} sub-band (e.g., block 603).
- Descendants beyond the first generation of pixels, and sets (including quads) thereof, are generally labeled as grandchildren coefficients, for example, blocks 606-611 as shown in Figure 6.
- a unique data structure records the order in which the coefficients are encoded.
- Three dynamically linked data structures, or queues, are maintained for this purpose, labeled as insignificant pixel queue (IPQ), insignificant set queue (ISQ) and significant pixel queue (SPQ).
- each queue is implemented as a dynamic data structure, which includes, but is not restricted to, a doubly linked list or a stack array structure, where each node stores information about the pixel such as coordinates, bit plane number when the pixel becomes significant and type of ISQ list.
- three types of sets of transform coefficients are defined to partition the pixels and their descendant trees. However, more or less sets may be implemented.
- the set D(T) is the set of all descendants of a pixel, or an arbitrary set, T, thereof. This includes direct descendants (e.g., offspring such as blocks 603-605) as well as grandchildren coefficients (e.g., blocks 606-608).
- the set O(T) is defined as the set of all first generation, or direct, descendants of a pixel, or an arbitrary set, T, thereof (e.g., blocks 603-605). The set L(T) denotes the remaining descendants, i.e., D(T) excluding O(T).
- two types of ISQ entries may be defined. ISQ entries of type α represent the set D(T). ISQ entries of type β represent the set L(T).
- a binary metric used extensively in the encoding process is the significance function, S n (T).
- the significance function gives an output of one if the largest wavelet coefficient in the set T is larger than the current quantization threshold level (e.g., the quantization threshold in the current iteration), and an output of zero otherwise.
- the significance function may be defined as follows:
- S_n(T) = 1 if the largest coefficient magnitude in T satisfies max_{(i,j) in T} |c_{i,j}| >= 2^n, and S_n(T) = 0 otherwise, where T is the set of pixels whose significance is to be measured against the current threshold.
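- in code, the significance test is a one-line comparison; this Python sketch (hypothetical names) treats a set T as any iterable of coefficient values.

```python
def significance(T, n):
    # S_n(T): 1 if the largest coefficient magnitude in the set T reaches
    # the current quantization threshold 2**n, else 0.
    return 1 if max(abs(c) for c in T) >= (1 << n) else 0

assert significance([3, -20, 7], n=4) == 1  # |-20| >= 16
assert significance([3, -20, 7], n=5) == 0  # all magnitudes < 32
```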
- Figure 8 is a flow diagram illustrating an exemplary encoding process according to one embodiment.
- the exemplary process of Figure 8 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system, a server, or a dedicated machine), or a combination of both.
- the exemplary process may be performed by a server component (e.g., a server suite), such as, for example, server 101 of Figure 1 or server 201 of Figure 2.
- the first phase in the encoding process of the encoder (also referred to as the initialization pass) is the determination and transmission (as a sequence of 8 bits, in binary format) of the number of passes the encoder has to iterate through (block 801).
- the number of iterations is less than or equal to the number of bit-planes of the largest wavelet coefficient in the current map.
- the number of iterations to code all the bits in a single map is determined by the number of quantization levels. In one embodiment, this is determined using a formula that may be defined as follows: n_i = ceil(log2(|w_max|))
- where w_max is the largest wavelet coefficient in the current map. This number is transmitted (without context) into the byte stream. The coding process is then iterated n_i times over the entire map.
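- a sketch of this initialization computation (Python, hypothetical names), matching the reconstructed formula above:

```python
import math

def num_passes(wavelet_map):
    # n_i = ceil(log2(|w_max|)): bit-plane passes needed for the largest
    # coefficient magnitude in the current map.
    w_max = max(abs(w) for w in wavelet_map)
    return max(1, math.ceil(math.log2(w_max))) if w_max else 0

assert num_passes([100, -3000, 42]) == 12  # 2**11 < 3000 <= 2**12
```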
- the IPQ is populated with all the pixels in set H.
- the ISQ is populated with all the pixels in set H that have descendants (i.e., in set H, all the pixels in every quad except the top-left one).
- the SPQ is kept empty and is filled gradually as pixels become significant against the current quantization threshold.
- all the pixels in the IPQ are sorted to determine which ones have become significant with respect to a current quantization threshold.
- the value of the significance function for the current pixel is determined, and the value is sent out as the output in the form of a single bit.
- if the value of the significance function in the previous operation was one, the sign bit of the pixel entry is sent as the output in the form of a single bit.
- the output of the sign is 1 if the entry is positive and 0 if the entry is negative.
- for every entry in the ISQ of type α (the class of entries that represents all the descendants of the pixel across all generations), the significance of the set D(T) is transmitted as a single bit. If this output bit is one (e.g., the entry has one or more significant descendants), a similar test is performed for the direct (e.g., first generation) descendants, or offspring, of the entry. For all four offspring of the entry (defined by set O(T)), according to one embodiment, two operations are performed. First, the significance of the offspring pixel is determined and transmitted. As a second operation, if the offspring pixel is significant, the sign of the ISQ entry is transmitted.
- a value of one is transmitted if the entry is positive, or a value of zero is transmitted if the entry is negative.
- the entry is then deleted from the ISQ and appended to the SPQ. If, however, the offspring pixel is insignificant, the offspring pixel is removed from the ISQ, and appended to the IPQ.
- the current ISQ entry is retained depending on the depth of the descendant hierarchy. If the entry has no descendants beyond the immediate offspring (L(T) is empty), the entry is purged from the ISQ. If, however, descendants for the current set exist beyond the first generation, the entry is removed from its current position in the ISQ, and appended to the end of the ISQ as an entry of type β (block 805).
- the significance test is performed on the set L(T). For every entry in the ISQ of type β, the significance of the set L(T) is tested (e.g., using the significance function) and transmitted as a single bit. If there exist one or more significant pixels in the set L(T), all four offspring of the current ISQ entry are appended to the ISQ as type α entries at block 806, to be processed in future passes. The current entry in the ISQ is then purged from the queue at block 807.
- the final phase in the coding process is referred to as the refinement pass.
- at the end of the sorting pass, all the pixels (or sets thereof) that have become significant against the current quantization threshold level up to the current iteration are removed from the IPQ and appended to the SPQ.
- the iteration number "n" when the entry was appended to the queue (and the corresponding coefficient became significant against the current quantization threshold level), is recorded along with the co-ordinate information.
- the n-th most significant bit is transmitted.
- the output of the entropy coder may be passed through a CABAC-type processor.
- the embedded output stream of the entropy coder has been designed in a way, such that the compression is optimized by segregating the bit stream based on the context in which the particular bit has been coded.
- the bit stream includes the bits representing the binary decisions made during the coding. The bits corresponding to the same decisions are segregated and coded separately.
- since the wavelet transformed coefficients are arranged such that the coefficients with identical characteristics are grouped together, the decisions made on the coefficients in a group are expected to be similar or identical. Hence the bit stream generated as a result would have longer runs of identical bits, making it more suitable for compression, and achieving a more optimal level of compression.
- the wavelet coefficients "w" have a unique spatial correlation with one another, depending on which sub-band and tree they belong to. Particularly, such a close correlation exists between the pixels of a single sub-band at a particular level, though the level of correlation weakens across pixels of different sub-bands in the same or different trees and levels. Also note that there is a run-length based correlation between bits that have a similar syntactic relationship.
- some bits in the embedded stream represent sign information for a particular pixel, while others represent significance information. For example, a value of one in this case denotes that the pixel currently being processed is significant with respect to the current quantization threshold, while a zero value denotes otherwise.
- a third and final class of bits represents refinement bits, which encode the actual quantization error information.
- each bit in the output stream may be classified based on the nature of the information it represents (3 types) or the location in the sub-band tree map (3*n_l + 1 possible locations, where n_l is the number of levels of decomposition). This gives rise to 3 x (3*n_l + 1) possible contexts in which a bit can exist, and a unique context is used to code an arbitrary bit.
- context variables act as an interface between the output of the entropy coder and the binary arithmetic coder.
- Each context variable is an 8-bit memory location, which updates its value one bit at a time, as additional coded bits are outputted, based on the probability estimation table (PET).
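- the text does not spell out the update rule; one plausible reading, sketched here in Python, shifts each newly coded bit into the 8-bit context variable so that it always reflects the most recent binary decisions.

```python
def update_context(context, bit):
    # Hypothetical update: keep only the 8 most recently coded bits.
    return ((context << 1) | (bit & 1)) & 0xFF
```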
- the wavelet map may be split into blocks of size 32 x 32 (in pixels), and each such block is source coded independent of all other blocks in the wavelet map.
- for each wavelet map, if the dimensions of the map are not a multiple of 32 in either direction, a set of columns and/or rows is padded with zeros such that the new dimensions of the map are a multiple of 32 in both directions.
- the coefficients in each such block are arranged in the hierarchical Mallat format.
- the number of levels of decomposition may be arbitrary. In one embodiment, the number is five, so that the coarsest sub-band in Mallat format of each block is one pixel across.
- the coarsest band is constructed by amalgamating the six coarsest sub-bands in the Mallat format.
- the bands are numbered in a zigzag manner, similar to the sequence shown in Figure 7.
- the coarsest band is labeled as band 0, while the next three bands (HL, LH and HH orientations, in that order) are labeled as bands 1, 2 and 3 respectively, and so on.
- An additional data structure, known as a stripe, may be used to represent a set of 4 x 4 coefficients.
- each of bands 0, 1, 2 and 3 is made up of one such stripe.
- Bands in the second and third level of decomposition are made of four and sixteen stripes, respectively.
- quantization thresholds are assigned to all coefficients in band 0 (coarsest), as well as all finer bands. There exists a linear progressive relationship between the thresholds assigned to the various coefficients and bands in the wavelet map. The values of the thresholds are arbitrary, and a matter of conjecture and rigorous experimentation.
- the top-left (coarsest) sub-band (which is a part of the band 0) is assigned a particular threshold value (labeled x), while the top-right and bottom-left sub-bands of the same level of decomposition (also part of band 0) are assigned a threshold of 2x, and the threshold for the bottom-right sub-band is 4x.
- the threshold for the top-right and bottom-left sub-bands is the same as the threshold value of the bottom-right sub-band of the previous (coarser) level, while the bottom- right sub-band of the current (finer) level has a threshold that is double that value.
- This process is applied to the assignment of threshold values for all consecutive bands numbered 0 through 9 in the current block.
- the initial thresholds for the four coarsest pixels in the top-left corner of band 0 are set at 4000h, 8000h, 8000h and 10000h (h denotes a number in hexadecimal notation).
- the four-pixel quartet in the top-right corner of band 0 is assigned a threshold of 10000h.
- the quartets in the bottom-left and bottom-right corners of band 0 are assigned thresholds of 10000h and 20000h respectively.
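- the doubling progression can be reproduced with a short Python sketch (hypothetical names); starting from x = 0x4000 it yields the hexadecimal values quoted above.

```python
def band_thresholds(x=0x4000, levels=5):
    # Per-level (HL, LH, HH) thresholds: the finer HL/LH bands inherit the
    # coarser level's HH threshold, and the finer HH band doubles it again.
    out, hl_lh, hh = [], 2 * x, 4 * x
    for _ in range(levels):
        out.append((hl_lh, hl_lh, hh))
        hl_lh, hh = hh, 2 * hh
    return out

assert band_thresholds()[0] == (0x8000, 0x8000, 0x10000)
assert band_thresholds()[1] == (0x10000, 0x10000, 0x20000)
```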
- the coding scheme includes four passes, labeled 0 to 3.
- in pass 0, according to one embodiment, the decision on the significance of the current band is considered, beginning with the coarsest band (band 0).
- if the current band is significant against its assigned threshold, the current band is marked as significant.
- An extra bit (e.g., 1) is transmitted to the output stream to represent a significant band. If the current band has already been marked as significant, then no further action is necessary.
- each stripe is a set of 4 x 4 pixels, and each set of 2 x 2 pixels in the stripe has a hierarchical parent-child relationship with a homologous pixel in the previous coarser sub-band with the same orientation.
- each stripe has a parent-child hierarchical relationship with a 2 x 2 quad that is homologous in its spatial orientation in the previous coarser sub-band (see Fig. 11).
- a stripe is designated as significant if its 2 x 2 quad parent (as explained above) is also significant, or the band within which the stripe resides has been marked as significant (in pass 0).
- a parent quad is marked as significant if one or more of the coefficients in the quad is above the current threshold level for the band in which the quad resides.
- the significance information of individual pixels in the current stripe, along with their sign information, is considered.
- the number of pixels in the current stripe that are significant is recorded. This information is used to determine which context variable is to be used to code the significance information of the pixels in the current stripe (see discussion on CABAC above).
- if the current coefficient is significant, a binary 1 is transmitted, followed by a single bit for the sign of that coefficient (1 for a positive coefficient, or a 0 for a negative coefficient). If the current coefficient is insignificant, a 0 is transmitted, and its sign need not be checked. This test is performed on all 16 pixels in the current stripe, and is repeated over all the stripes in the current band, and for all bands in the current block of the wavelet map.
- the refinement information for each pixel in the current block is transmitted. For every band, each pixel is compared against the threshold level for the particular band and stripe. If the absolute value of the coefficient is above the threshold level for the current band and stripe, then a 1 (bit) is transmitted, else a 0 is transmitted.
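- passes 2 and 3 reduce to simple per-coefficient comparisons; a minimal Python sketch follows (hypothetical names, with one 4 x 4 stripe flattened into a list of 16 values).

```python
def pass2_bits(stripe, threshold):
    # One significance bit per coefficient; a sign bit (1 for positive,
    # 0 for negative) follows only when the coefficient is significant.
    bits = []
    for c in stripe:
        if abs(c) >= threshold:
            bits += [1, 1 if c >= 0 else 0]
        else:
            bits.append(0)
    return bits

def pass3_bit(c, threshold):
    # Refinement: 1 if the magnitude lies above the band/stripe threshold.
    return 1 if abs(c) > threshold else 0
```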
- the first three passes are nested within each other for the current block, band and stripe.
- pass 0 is performed on every band in the current block, with the bands numbered sequentially in a zigzag fashion, and tested in that order.
- pass 1 is performed on all the stripes in the band in a raster scan fashion.
- pass 2 is performed on every coefficient of the current stripe, also in raster scan mode.
- Pass 3 is performed on all the coefficients of the block, without consideration of the sequence of bands or stripes within the bands.
- a fast and efficient motion prediction scheme is designed to take optimal advantage of the temporal redundancy inherent in the video stream.
- the spatial shift in the wavelet coefficient's location is tracked using an innovative, fast and accurate motion prediction routine, in order to exploit the temporal redundancy between the wavelet coefficients of homologous sub-bands in successive frames in a video clip.
- every sub-band, or sub-image, in the entire wavelet map for each frame in the video clip represents a sub-sampled and decimated version of the original image.
- a feedback loop is introduced in the linear signal flow path.
- FIGS 9A-9B and 10A-10B are block diagrams illustrating exemplary encoding and decoding processes according to certain embodiments of the invention.
- the overall motion in the original image is tracked by following the motion of homologous blocks of pixels in every sub-band of consecutive frames.
- motion is tracked only in the luma (Y) map, while the same motion prediction information is used in the two chroma (Cr and Cb) maps. This works relatively well since it can be assumed that chroma information follows changes in the luma map fairly closely.
- a full-fledged search of the entire search space is performed only in the four coarsest sub-bands as shown in Figure 6, while this information is scaled and refined using a set of affine transformations, for example, in the six finer sub-bands. This saves a considerable amount of bandwidth, due to the smaller number of bits that now need to be coded and transmitted to represent the motion information, without any significant loss of fidelity.
- current frames that do not need to be predictively coded for temporal redundancies are labeled as intra-coded frames (I-frames).
- Frames that are coded using information from previously coded frames are called predicted frames (P-frames).
- Frames that are coded using information from both previous and subsequent frames are called bi-directional frames (B-frames).
- the luma (Y) map of the current frame may be encoded using the arithmetic coding I/II scheme with a target bit-rate.
- when the bit budget is exhausted (e.g., the number of bits encoded that will be transmitted within a period of time determined by the target bit rate), or all the bit-planes have been coded, the coding is stopped, and the similar reverse procedure (called inverse arithmetic coding I/II) is executed to recover the (lossy) version of the luma (Y) component of the wavelet map of the current frame.
- the version of arithmetic coding to be used here is similar or the same as the version used in the forward entropy coder described above.
- Figure 11 is a flow diagram illustrating an exemplary process for motion prediction according to one embodiment.
- the exemplary process 1100 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system, a server, or a dedicated machine), or a combination of both.
- exemplary process 1100 may be performed by a server component (e.g., a server suite), such as, for example, server 101 of Figure 1 or server 201 of Figure 2.
- the recovered wavelet map is buffered as the reference frame, for use as a reference for the next frame in the sequence (block 1101).
- the second frame is read and decomposed using the n-level wavelet decomposition filter-bank, to generate a new current frame.
- a unique search-and-match algorithm is performed on the wavelet map to keep track of pixels, and sets thereof, which have changed their location due to general motion in the video sequence.
- the search algorithm is referred to as motion estimation (ME), while the match algorithm is referred to as motion compensation (MC).
- a lower threshold is set, to determine which coefficient values need to be tracked for motion, and eventually compensated for.
- most coefficients in the finer sub-bands are automatically quantized to zero, while most of the coefficients in the coarsest sub-bands are typically not quantized to zero.
- it makes sense to determine the largest coefficient in the intermediate (level 2) sub-bands during the encoding process, which is quantized down to zero during the lossy reconstruction process, and use that as a lower threshold (also referred to as loThres).
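As a purely illustrative sketch of how such a threshold might be derived (the function name, the array layout, and the use of NumPy are assumptions, not part of the described embodiment):

```python
import numpy as np

def lower_threshold(level2_orig, level2_quant):
    """loThres: the largest level-2 coefficient that the lossy
    reconstruction quantized down to zero (one possible reading)."""
    lo_thres = 0.0
    for orig, quant in zip(level2_orig, level2_quant):
        zeroed = np.abs(orig)[quant == 0]  # coefficients lost in the lossy pass
        if zeroed.size:
            lo_thres = max(lo_thres, float(zeroed.max()))
    return lo_thres
```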
- a traditional search-and-match is performed on the four coarsest sub-bands of the wavelet maps of the reference and current frames (block 1102).
- the motion prediction routine performed on these sub-bands involves a simple block search-and-match algorithm on homologous blocks of pixels. This operation identifies the blocks where motion has occurred. The amount of motion is determined and compensated for. This reduces the amount of information that is required to be transmitted, hence leading to better compression.
- a block neighborhood is defined around the block of pixels in the reference map, whose motion is to be estimated (which is called the reference block), as shown in Figure 12.
- the depth of the neighborhood around the pixel block is usually set equal to k and l respectively, though a slightly lower value (e.g., k-1 and l-1) performs equally well.
- the neighborhood region spills over outside the sub-band.
- an edge extension zone is used to create the block neighborhood.
- a mirroring scheme is used to create the edge extension zone.
- columns of pixels in the neighborhood zone are filled with pixels in the same column along the horizontal edge of the block, in a reverse order.
- the pixel in the neighborhood zone closest to the edge is filled with the value of the pixel directly abutting it in the same column, and inside the block.
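A minimal sketch of this mirroring scheme; NumPy's 'symmetric' padding mode happens to reproduce the behavior described (the extension pixel closest to the edge equals the abutting in-block pixel), though the actual embodiment may differ:

```python
import numpy as np

def mirror_extend(block, depth):
    """Extend `block` by `depth` pixels on every side, mirroring the rows
    and columns nearest each edge in reverse order."""
    return np.pad(block, depth, mode='symmetric')
```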
- the blocks of pixels in the current as well as the reference frames that are in the same relative position in homologous sub-bands are used for the ME routine.
- the block of pixels in the current frame which is to be matched is called the current block.
- the region encompassed by the block neighborhood around the reference block can be viewed as being made up of several blocks having the same dimensions as the reference (or current) block.
- the metric used to measure the objective numerical difference between the current block and any block of the same size in the neighborhood of the reference block is the popular L1, or mean absolute error (MAE), metric.
- a block of pixels with the same dimensions as the current block is identified within the neighborhood zone.
- the absolute difference between corresponding pixels in the two blocks is computed and summed. This process is repeated for all such possible blocks of pixels within the neighborhood region, including the reference block itself (block 1104).
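For illustration, the MAE computation might look as follows (a sketch under the assumption that blocks are integer NumPy arrays of equal shape):

```python
import numpy as np

def mae(block_a, block_b):
    """Mean absolute error (L1 metric) between two equally sized blocks."""
    return float(np.mean(np.abs(block_a.astype(np.int32)
                                - block_b.astype(np.int32))))
```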
- One important aspect of the search technique is the order in which the search takes place. Rather than using a traditional raster scan, according to one embodiment, an innovative outward coil technique is used.
- the first block in the current neighborhood in the current sub-band of the reference frame to be matched with the current block (of the homologous sub-band in the current frame) is the reference block itself.
- once the reference block has been tested, all the blocks which are at a one pixel offset on all sides of the reference block are tested. After the first iteration, all blocks that are at a two-pixel offset from the reference block are tested. In this fashion, the search space progressively moves outwards, until all the blocks in the current neighborhood have been tested.
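One way to generate this outward-coil visiting order is sketched below; the generator name is an assumption, and the order of positions within each ring is not specified by the text:

```python
def coil_offsets(max_depth):
    """Yield (dy, dx) offsets: the reference block itself first, then the
    ring of blocks one pixel out on all sides, then two pixels out, ..."""
    yield (0, 0)
    for radius in range(1, max_depth + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if max(abs(dy), abs(dx)) == radius:  # only the current ring
                    yield (dy, dx)
```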
- the particular block within the neighborhood region that possesses the minimum MAE is of special interest to the current system (also referred to as a matching block). This is the block of pixels in the reference (previous) frame which is closest, in terms of absolute difference, to the current block of pixels in the current frame.
- a unique data structure, also referred to as a motion vector (MV), is utilized.
- the MV of the block being tested contains information on the relative displacement between the reference block (e.g., a block of a previous or future frame) and the matching block (in the current frame).
- the top-left corner of each block is chosen as the point of reference to track matching blocks.
- the relative shift between the coordinates of the top-left corner of the reference block and that of the matching block is stored in the motion vector data structure.
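Tying the sketches above together, a hypothetical ME routine returning such a motion vector could read as follows; it reuses `mae` and `coil_offsets` from the earlier sketches and assumes an edge-extended sub-band so that all slices stay in bounds:

```python
def motion_estimate(ref_subband, cur_block, top, left, depth):
    """Find the matching block for cur_block (anchored at (top, left))
    and return the MV as the shift between top-left corners."""
    h, w = cur_block.shape
    best_mv, best_err = (0, 0), float('inf')
    for dy, dx in coil_offsets(depth):
        candidate = ref_subband[top + dy:top + dy + h,
                                left + dx:left + dx + w]
        err = mae(cur_block, candidate)
        if err < best_err:
            best_err, best_mv = err, (dy, dx)
    return best_mv, best_err
```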
- the motion vectors in the LL k sub-band are labeled V_k^0, while the motion vectors in the three other coarsest sub-bands are labeled V_k^o, where o is the orientation (HL, LH and HH), as shown in Figure 12.
- the data is transmitted without context through a baseline binary arithmetic compression algorithm (also referred to herein as the 'pass through mode').
- a hierarchical order is followed while transmitting motion information, especially the motion vector data structure.
- Motion vector information, both absolute (from coarser levels) and refined (from finer levels), according to one embodiment, has a hierarchical structure.
- the motion vectors corresponding to blocks that share a parent-child relationship along the same spatial orientation tree have some degree of correlation, and hence may be transmitted using the same context variable.
- the pixel values of the matching block in the reference frame are subtracted from the homologous block in the current frame, and the result of each operation is used to overwrite the corresponding pixel in the current block.
- This difference block, also referred to as the compensated block, replaces the current block in the current frame.
- This process is referred to as motion compensation (MC).
- the previously defined lower threshold (loThres) is used to perform motion compensation only on such coefficients.
- the compensated coefficient may be quantized down to zero. This ensures that only those coefficients that make some significant contribution to the overall fidelity of the reconstructed frame are allowed to contribute to the final bit rate.
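A sketch of the compensation step with the loThres rule folded in (the names and the in-place update are assumptions):

```python
import numpy as np

def motion_compensate(cur_sb, ref_sb, top, left, mv, size, lo_thres):
    """Overwrite the current block with its difference from the matching
    block; residues below loThres are quantized down to zero."""
    dy, dx = mv
    cur = cur_sb[top:top + size, left:left + size]
    match = ref_sb[top + dy:top + dy + size, left + dx:left + dx + size]
    residue = cur - match
    residue[np.abs(residue) < lo_thres] = 0  # drop insignificant residues
    cur_sb[top:top + size, left:left + size] = residue
```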
- the above ME/MC process is repeated over all 2 x 2 blocks in the four coarsest sub-bands of the current and reference wavelet maps.
- a refinement motion prediction scheme may be implemented using an affine transformation over the motion vectors corresponding to the homologous blocks in the coarsest sub-bands, and applying a regular search routine over a limited area in the region around the displaced reference block, as shown in Figure 12.
- the relative position of the reference block in the finer sub-bands is closely related to the reference blocks in the coarsest sub-bands.
- the descendants of the top-left 2 x 2 block of pixels in the HL 3 sub-band include the 4 x 4 block of pixels in the top-left corner of HL 2, and the 8 x 8 block of pixels in the top-left corner of HL 1, as shown in Figure 6.
- the size of a reference block along both dimensions is twice that of a homologous reference block in the previously coarser sub-band.
- the size of a motion vector in the finer sub-band may be assumed to be twice that of the motion vector in a homologous coarser sub-band. This provides a very coarse approximation of the spatial shift of the pixels in the reference block in the sub-band. To further refine this approximation and track the motion of pixels in finer sub-bands more accurately, according to one embodiment, a refined-search-and-match routine is performed on reference blocks in the finer sub-bands.
- the dimensions of the reference block depend upon the level of the sub-band where the reference block resides.
- reference blocks in level 2 are of size 4 x 4, while those in level 3 are of size 8 x 8, and so on.
- the size of a reference block along both directions is twice that of the reference block in the immediately preceding (coarser) sub-band.
- a block with the same or similar dimensions as the reference block in a particular level and shifted by a certain amount along both cardinal directions is identified.
- the amount of displacement depends on the level where the reference block resides, as shown in Figure 12.
- the approximate displacement is 2 * V_k^o, where V_k^o is the motion vector for a homologous reference block in the coarsest (level 1) sub-band.
- the new reference block is displaced by 2 * V_k^o from the original reference block.
- a search region, identical to the neighborhood zone defined earlier around the reference block, is defined around the new reference block, along with edge extension if the block happens to abut the sub-band edge.
- the depth of the neighborhood zone depends on the level of decomposition. In one embodiment, it has been set at 4 pixels for level 2 sub-bands, and 8 for level 3 sub-bands, and so on.
- the refined-search-and-match routine is implemented in a manner that is similar or identical to the search-and-match routine for the level 1 (coarsest) sub-bands, as described above.
- the (resultant) corrected motion vector, V_k^o, pointing to the net displacement of the matching block is given by adding the approximate (scaled) motion vector, 2 * V_{k-1}^o, and the refinement vector, δ_k^o.
- the approximate motion vector (to account for the doubling of the dimensions of the reference block) is given by δ_k^o + (2 * V_{k-1}^o).
- a block that is displaced from the original reference block by the approximate motion vector is then used as the new reference block.
- the depth of the neighborhood zone around this block is set at twice that used in the immediately coarser level.
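The scale-and-refine step for a finer sub-band might then be sketched as follows, reusing `motion_estimate` from the earlier sketch (the factor of 2 and the summation follow the text; everything else is an assumption):

```python
def refine_motion_vector(coarse_mv, ref_subband, cur_block, top, left, depth):
    """Scale the coarser-level MV by 2, displace the reference block by
    it, then run the ordinary search in a small neighborhood there."""
    dy0, dx0 = 2 * coarse_mv[0], 2 * coarse_mv[1]  # approximate (scaled) MV
    delta, _ = motion_estimate(ref_subband, cur_block,
                               top + dy0, left + dx0, depth)
    return (dy0 + delta[0], dx0 + delta[1])        # corrected MV
```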
- the new refined motion vector, δ_k^o, thus obtained is transmitted in a manner similar to that of coarser levels (see Fig. 12).
- the motion compensation (MC) routine for the refined motion prediction algorithm performed on the finer sub-bands is similar or identical to the process outlined for the coarsest sub-bands.
- the matching block, pointed to by the refined motion vector, is subtracted pixel-by-pixel from the current block (in the homologous position in the current frame), and the difference is overwritten in the location occupied by the current block. This block is now called the compensated block (as described above for coarser sub-bands).
- the new frame is called the compensated frame.
- This compensated (difference) frame is also called the predicted frame.
- the bit stream is transmitted over the transmission channel (e.g., blocks 403-405 of Figure 4A).
- the source coding and motion compensation feedback loop for a predicted frame is similar to the process employed for intra-frames, with some minor modifications. It is well known that the statistical distribution of coefficient values in a predicted frame is different from the one found in intra-coded frames. In the case of intra-coded frames, the wavelet filter ensures superior energy compaction, so that a majority of the energy is concentrated in the four coarsest sub-bands. Throughout that setup, the data has the non-deterministic statistical properties of real-time visual signals, such as video sequences. But in the case of predicted frames, only the spatially variant difference values of the pixels are stored, and these coefficients lack the entropy of a real video clip. Hence, the superior energy compaction of the predicted wavelet map cannot be taken for granted.
- the coarsest sub-band has the largest mean and variance of coefficient values, and these statistics decrease along a logarithmic curve towards finer levels.
- Such a "downhill” contour maintains the high level of energy compaction in the wavelet map.
- This "top-heavy" distribution helps in the high coding efficiency and gain of the source coder.
- the first and second statistical moments of these sub-bands are not so intimately related in predicted wavelet maps.
- the wavelet coefficients of the finer sub-bands in a predicted map may be scaled down from their original values.
- this process of scaling is reversed in the decoding process.
- scaling factors of 8, 16 and 32 may be used for the finest sub-bands (other than the LL k sub-band) along a particular tree orientation for a three-level decomposition.
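As a hedged illustration of this scaling (which factor attaches to which level is not fixed by the text, so the mapping below is an assumption; the decoder reverses the operation):

```python
def scale_predicted_map(finer_subbands, factors=((1, 32), (2, 16), (3, 8))):
    """Scale down the detail sub-bands of a predicted map before entropy
    coding. `finer_subbands` maps level -> list of HL/LH/HH arrays."""
    for level, factor in factors:
        for sb in finer_subbands[level]:
            sb //= factor  # in-place integer scale-down
    return finer_subbands
```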
- a group-of-frames is defined as a temporally contiguous set of frames, beginning with an intra-coded frame, and succeeded by predicted (B or P or otherwise) frames.
- an intra-coded frame signals the beginning of a new GOF.
- An important facet of rate control is to ensure that intra-coded frames are introduced only when needed, due to their inherently higher coding rates.
- the two events that warrant the introduction of intra-coded frames are a fall in the average frame PSNR below acceptable levels and/or a change of scene in a video clip. Due to the accurate motion prediction routine used by the system, the average PSNR of the frame is less likely to fall below a previously accepted threshold (thereby ensuring good subjective quality throughout the entire video sequence).
- the absolute difference of homologous pixels in the LL k sub-band is computed and compared with respect to a threshold.
- This threshold is determined through experimentation on a wide range of video clips. In a particular embodiment, a value of 500 is suitable for most purposes. This absolute differencing operation is performed on all coefficients of the coarsest sub-band, and a counter keeps track of the number of cases where the value of the absolute difference exceeds the threshold.
- if the number of pixels for which the absolute difference exceeds the threshold is above or equal to a predetermined level, it can be assumed that there has been such a drastic change in the scenery in the video frame as to warrant the introduction of an intra-coded frame, thereby marking the end of the current GOF and the beginning of a new one.
- the numeric level, hereby labeled the scene change factor (SCF), that determines a scene change is a matter of experimentation.
- a value of 50 is suitable for most cases.
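A compact sketch of the scene-change test (500 and 50 are the example values quoted above; the function signature is an assumption):

```python
import numpy as np

def scene_changed(ll_ref, ll_cur, diff_thres=500, scf=50):
    """Return True when enough coarsest-band coefficients have changed
    drastically, signalling a new GOF headed by an I-frame."""
    diff = np.abs(ll_cur.astype(np.int64) - ll_ref.astype(np.int64))
    return int((diff > diff_thres).sum()) >= scf
```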
- a technique is employed to ensure that only those matching blocks (within a sub-band) that satisfy certain minimum and maximum threshold requirements are compensated and coded. This technique is called adaptive threshold.
- the first block to be compared with the current block is the reference block.
- the MAE of each candidate block is compared with the MAE of the reference block against a threshold. If the difference between the MAE values of these two blocks is less than the threshold value, the match is discarded, and the reference block continues to be regarded as the best match.
- the threshold value may be determined by experimentation, and is different for different levels of the wavelet tree structure. At the coarser levels (higher sub-bands) the coefficients are average values, while at the finer levels (lower sub-bands) the coefficients are difference values. Average values are larger than difference values.
- for the coarser sub-bands, the threshold value is therefore higher than for the other sub-bands. All the sub-bands at a given decomposition level have the same quantization value, and the value reduces as we go down the decomposition levels.
- two quantities are compared: the energy of the current block in the current frame, and the energy of the compensated block obtained by differencing homologous pixels of the current block in the current frame and the matching block in the reference frame.
- the energy in this case is a simple first order metric obtained by summing the coefficient values of the particular compensated block.
- if the energy of the compensated block exceeds that of the current block, the compensated block is discarded and the current block is used in its place in the compensated (residual) frame.
- the value of the current threshold level may be determined through extensive experimentation, and is different for the various levels of the wavelet pyramid.
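For illustration, the energy test described above might be sketched as follows (the simple first-order metric follows the text, read here as a sum of absolute coefficient values; the rest is an assumption):

```python
import numpy as np

def keep_compensation(cur_block, comp_block):
    """Keep the compensated (residual) block only if it carries less
    energy than the original current block."""
    energy = lambda b: float(np.abs(b).sum())  # simple first-order metric
    return energy(comp_block) < energy(cur_block)
```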
- the motion prediction routine used in certain embodiments is referred to herein as bi-directional multi-resolution motion prediction (B-MRMP).
- motion is estimated from a previous as well as succeeding frame.
- the temporal offset between past, current and future frames used for motion prediction is a matter of conjecture.
- a temporal offset of one is usually applied for best results.
- frames are read and wavelet transformed in pairs. In such a scenario, three popular sequence modes are possible.
- the first frame in the pair is the bi-directionally predicted frame, where each block in each sub-band of this frame which undergoes the motion prediction routine is tested against a homologous block in both a previously coded (reference) frame as well as a future (P or otherwise) frame.
- the frame data is read and wavelet transformed in the natural order.
- the (succeeding) P frame is motion predicted before the B frame.
- the P frame is predicted by applying the motion prediction routine using the second frame of the last pair of frames (e.g., the reference frame).
- the frame is then reconstructed and compensated using the motion prediction techniques, to recover a lossy version of the frame.
- Each block in the B frame is now motion predicted with homologous blocks from both the (past) reference frame as well as the (future) P frame. If estimation/compensation with the reference block from the reference frame gives a lower energy compensated block, the particular block is compensated using the reference block of the (past) reference frame, or else, compensation is carried out using the reference block of the (future) P frame.
- the decision to use one of the two frames (past reference or future P) for compensation is based on the frame used for this purpose in the parent blocks in the four coarsest sub-bands.
- While recording and transmitting the motion information of the B frame, an array stores the identity of the frame (past reference or future P) used in the compensation process, using a 2-bit alphabet. This information for all blocks in the frame is transmitted with context over the channel prior to other motion information.
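A possible reading of the per-block decision and its 2-bit record (the residual-energy criterion follows the text; the specific bit codes are assumptions):

```python
import numpy as np

def choose_direction(cur_block, past_match, future_match):
    """Compensate from whichever frame yields the lower-energy residual;
    return the residual and a 2-bit symbol identifying the frame used."""
    past_res = cur_block - past_match
    fut_res = cur_block - future_match
    if np.abs(past_res).sum() <= np.abs(fut_res).sum():
        return past_res, 0b00  # compensated from the past reference frame
    return fut_res, 0b01       # compensated from the future P frame
```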
- the advantage of using B frames is that they do not need compensation and reconstruction in the motion prediction feedback loop, since they are less likely to be used as reference frames to predict future frames. Thus this routine passes through the feedback reconstruction loop in the encoding process for half as many non-intra-coded frames as in other systems, thereby saving a considerable amount of processing time.
- the first frame in the pair is predictive coded using the second frame of the previous pair of frames as reference.
- the intra-coded frame in the latter part of this pair is used as reference for the next pair of frames.
- the first frame is an intra-coded frame and is used as reference for the (unidirectional) motion prediction of the second frame in the pair.
- the second (P) frame is reassigned as the new reference frame for the next pair of frames.
- the motion prediction is performed using a single predicted frame, also referred to as uni-directional multi-resolution motion prediction (U-MRMP mode).
- the motion compensation (MC) scheme may be replaced with a motion block superposition (MBS).
- the motion estimation is performed as described above.
- the arithmetic encoding scheme is highly inefficient in coding predicted (error) maps (B and P). Due to the skewed probability distribution of coefficients in B and P frames, they do not satisfy the top-heavy tree structure assumptions made in the case of arithmetic coding. This results in several of the large coefficients being interspersed in the finer sub-bands, causing the iterative mechanism of arithmetic coding to loop through several bit planes before these isolated coefficients have been coded for higher fidelity.
- one way to resolve this problem is to avoid working on error maps altogether.
- the arbitrary GOF size is replaced by a GOF of fixed size.
- the number of frames in the GOF may be equal to the number of frames per second (e.g., a new GOF every second).
- a new GOF is defined from this new I frame.
- the coefficient values of the current block are replaced with the homologous pixels of the matching block in the reference frame. This saves time by not computing the difference of the two blocks, and also maintains the general statistics of an intra-coded frame. In effect, this results in the blocks of the first intra-coded frame in the current GOF being moved around within a limited region, like a jig-saw puzzle, with the motion being represented using only the corresponding motion vectors.
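A sketch of this block substitution (the names are assumptions; note that, unlike MC, no difference block is computed):

```python
def motion_block_superpose(cur_frame, ref_frame, top, left, mv, size):
    """Copy the matching block's pixels over the current block, so the
    frame keeps intra-frame statistics; only the MV encodes the motion."""
    dy, dx = mv
    cur_frame[top:top + size, left:left + size] = \
        ref_frame[top + dy:top + dy + size, left + dx:left + dx + size]
```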
- a threshold, also referred to as the motion information factor (MIF), may be used to decide on the mode in which the current and future frames are to be temporally coded.
- two independent thresholds are used to compute the MIF.
- Coefficients in the sub-bands in the wavelet map may be used for this purpose.
- the decision tree to classify blocks based on the average amount of motion is based on the segregation of the coefficients into three categories. For blocks whose total energy after compensation is greater than the energy of the original current block itself, the corresponding motion vector co-ordinates are set to a predetermined value, such as, for example, a value of 127.
- the other two categories of blocks have motion vectors with both coordinates equal to a value other than the predetermined value.
- these blocks are labeled as NC (non-compensated), Z (zero) and NZ (non-zero) respectively.
- the first threshold is set for the four coarsest sub-bands in the wavelet map.
- if α is less than 10% of the value of β, then the particular frame is repeated. Otherwise, motion prediction (B-MRMP) is performed.
- a similar test with the same test parameters (α and β) is performed on the remaining finer sub-bands. In one embodiment, if α is less than 10% of β, motion block superposition (MBS) is performed. Otherwise, motion prediction (B-MRMP) is performed.
- the threshold factor and the number of sub-bands to be used in either test is a matter of conjecture and diligent experimentation. In one embodiment, 4 (out of a possible 10) sub-bands are used for the first test and the remaining 6 are used for the second test, with a threshold factor of 10% in either case.
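One plausible arrangement of the two tests as a mode selector (α and β are the test parameters named above; how they are computed from the NC/Z/NZ block counts is not spelled out here, so this is only a sketch):

```python
def select_temporal_mode(alpha_coarse, beta_coarse, alpha_fine, beta_fine):
    """Two-stage MIF test with the 10% threshold factor from the text."""
    if alpha_coarse < 0.10 * beta_coarse:
        return 'REPEAT_FRAME'  # negligible motion in the coarsest sub-bands
    if alpha_fine < 0.10 * beta_fine:
        return 'MBS'           # low motion elsewhere: block superposition
    return 'B-MRMP'            # otherwise full motion prediction
```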
- a full search routine of the pixel (spatial) map is introduced prior to the wavelet transformation block, in order to predict and track motion in the spatial domain and thereby exploit the temporal redundancy between consecutive frames in the video sequence, as shown in Figures 9B and 10B.
- a 16 x 16 block size is best suited for tracking real-world global and local motion. This includes, but is not limited to, rotational, translational, camera-pan, and zoom motion. Hence, blocks of this size are referred to as standard macroblocks.
- a unidirectional motion prediction (U-MP) is employed to predict motion between consecutive frames using a full search technique.
- the frame is divided into blocks with height and width of the standard macroblock size (16 x 16).
- frame dimensions are edge extended to be a multiple of 16.
- a standard and uniform technique is applied across all frames.
- the edge extended zone can be filled with the pixel values along the edge of the actual image, for instance, or may be padded with zeros throughout. A variety of techniques may be utilized depending upon the specific configuration.
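For instance, the edge extension to a multiple of 16 could be sketched with replicated edge pixels (zero padding would use mode='constant' instead; the function name is an assumption):

```python
import numpy as np

def pad_to_macroblock_grid(frame, mb=16):
    """Extend frame dimensions to the next multiple of the macroblock
    size by replicating the pixels along the image edge."""
    h, w = frame.shape
    return np.pad(frame, ((0, -h % mb), (0, -w % mb)), mode='edge')
```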
- the U-MP routine is applied to all such blocks in a raster scan sequence.
- a neighborhood zone is defined around the edges of the macroblock, as shown in Figure 13.
- the depth of the neighborhood zone is chosen to be equal to 15 pixels in every direction.
- each macroblock to be processed using U-MP is padded with a 15 pixel neighborhood zone around it from all directions.
- the neighborhood zone may extend over to the region outside the image map.
- the neighborhood zone for the macroblock uses pixels from the edge extended zone.
- the U-MP routine may be split into five basic operations.
- a threshold is set to determine which pixels, or sets thereof, need to be compensated in the U-MP process.
- each pixel in the reference frame is subtracted from the homologous pixel in the current frame, thereby generating a difference map.
- Each pixel in the difference map is then compared against a pre-determined threshold. The value of the threshold is a matter of conjecture and rigorous experimentation.
- if the absolute difference exceeds the threshold, the pixel is marked as active; else it is marked as inactive.
- a count of the number of such active pixels in each 16 x 16 macroblock in the reference frame is recorded. If the number of active pixels in the macroblock is above a pre-determined threshold, the macroblock is marked as active; else it is marked as inactive.
- the value of the threshold is a matter of conjecture and rigorous experimentation.
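A sketch of the two-stage activity test (both thresholds are left as parameters, since their values are experimental; the frame is assumed already edge-extended to a multiple of 16):

```python
import numpy as np

def active_macroblocks(ref, cur, pixel_thres, count_thres, mb=16):
    """Boolean map marking macroblocks with enough changed pixels."""
    active = np.abs(cur.astype(np.int32) - ref.astype(np.int32)) > pixel_thres
    h, w = active.shape
    per_block = active.reshape(h // mb, mb, w // mb, mb).sum(axis=(1, 3))
    return per_block > count_thres
```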
- the second operation in the U-MP process is the unidirectional motion prediction (U-MP) operation.
- U-MP unidirectional motion prediction
- a modification of the traditional half-pel motion prediction algorithm is performed.
- each frame is interpolated by a factor of two, leading to a search area that is four times the original image map.
- the previous frame (known as the reference frame) is used as one basis of comparison, while the current frame is used as the other.
- the homologous blocks in these two frames that are compared are called the reference block and the current block respectively, as shown in Figure 13.
- the non-integer-pel motion interpolation scheme may be further modified to perform a form of quarter-pel motion prediction, as shown in Figure 14.
- the luma maps of the current and reference frames are interpolated by a factor of four along both cardinal directions, such that the effective search area in the search-and-match routine is increased by a factor of sixteen.
- the choice of the interpolation mechanism is a matter of conjecture and rigorous experimentation, which includes, but is not restricted to, bi-linear, quadratic and cubic-spline interpolation schemes. The tradeoff between accurate prediction of the interpolated coefficients and speed of computation is a major deciding factor for the choice of scheme.
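As one concrete possibility among those listed, a bilinear interpolation by a factor of 2 (half-pel) or 4 (quarter-pel) might be sketched as follows:

```python
import numpy as np

def interpolate(frame, factor=2):
    """Bilinear up-sampling of a 2-D luma map along both axes."""
    h, w = frame.shape
    ys = np.linspace(0, h - 1, factor * h)
    xs = np.linspace(0, w - 1, factor * w)
    y0, x0 = ys.astype(int), xs.astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    f = frame.astype(np.float64)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bot = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```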
- each macroblock in the current frame is subtracted pixel-by-pixel from the homologous macroblock in the reference frame. This generates the non-displaced compensated block.
- an integer search is performed on every 16 x 16 macroblock of the current frame. In this routine, the pixels of the current macroblock are superimposed over every set of pixels of the same size as the current block in the neighborhood zone around the reference block.
- the metric employed for comparing these two sets of pixels is the L1 (sum of absolute differences, or SAD) metric.
- the SAD is computed for all 16 x 16 blocks in the neighborhood zone of the reference block, and the position of the block with the lowest value of SAD is labeled as a matching block.
- the relative position between the matching block and the reference block is recorded using a unique data structure known as the motion vector for the current reference block.
- a half-pel search is performed on every 16 x 16 macroblock of the cunent frame (see Fig. 13).
- the motion vector obtained for a particular macroblock in the integer search mode is doubled, and a refined search is performed.
- the depth of the refined search area is one pixel across in all directions. This operation helps in detecting motion which is less than or equal to half a pixel in all directions.
- the resultant motion vector is obtained by summing the scaled motion vector obtained in the integer search mode and the vector obtained in the refined search mode. This and the corresponding SAD value are recorded for future mode selection (see Fig. 13).
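Combining the two stages, a hypothetical half-pel refinement might read as follows (operating on maps produced by the `interpolate` sketch above; slice bounds are assumed to stay inside an edge-extended map):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences (L1) between two pixel blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def half_pel_refine(interp_ref, cur_block2x, top2, left2, int_mv):
    """Double the integer-search MV onto the 2x map, then test the
    one-pel ring around it; return the resultant MV and its SAD."""
    h, w = cur_block2x.shape
    base = (2 * int_mv[0], 2 * int_mv[1])
    best_mv, best_sad = base, float('inf')
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            y, x = top2 + base[0] + dy, left2 + base[1] + dx
            s = sad(cur_block2x, interp_ref[y:y + h, x:x + w])
            if s < best_sad:
                best_sad, best_mv = s, (base[0] + dy, base[1] + dx)
    return best_mv, best_sad
```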
- each macroblock is split into four blocks of size 8 x 8, and half-pel search is performed on each of the four blocks (see Fig. 14).
- the set of four resultant motion vectors, obtained by summing the scaled motion vector obtained in the integer search and refined search modes, and their corresponding SAD values are recorded for mode selection later on (see Fig. 14).
- each 8 x 8 block within the current macroblock is further split into four 4 x 4 blocks, and the technique of scaling and refined-search outlined above may be repeated for all possible search areas of dimensions 4 x 4, 4 x 8 and 8 x 4 pixels.
- the SAD values obtained from the refined motion estimation routines outlined in this paragraph are also tabulated for future mode selection.
- the weights are imposed by comparing the SAD values against some predetermined threshold.
- the value of the threshold in each of the three cases outlined above, is a matter of conjecture and rigorous experimentation. This is done to ensure that a mode with higher rate is chosen for a particular macroblock, only when the advantage so obtained, in terms of higher fidelity (and lower SAD), is fairly substantial.
- OBMC overlapped block matching/compensation
- DFD displaced frame difference
- the choice of the matching block is a function of the motion vectors of the reference block currently being tested, as well as its abutting neighbors, as shown in Figure 15.
- the motion vectors from all three blocks are translated to any one corner of the reference block being tested (with no preference being given to any particular corner, though this choice should be consistent throughout the compensation procedure for that block), and the corresponding matching blocks are determined.
- the dimensions of all three matching blocks should be equal to the dimensions of the reference block (see Fig. 15).
- homologous pixels from all the matching blocks so determined are summed with different weights, and then differenced with the homologous pixel in the current block (in the current frame). The difference values are overwritten on the corresponding pixel positions in the current block. This difference block is labeled as the compensated block.
- the matching block is of size 8 x 8.
- each of the four 8 x 8 blocks carved out of the original 16 x 16 reference block is used to perform OBMC. If the block directly abutting any one of the 8 x 8 blocks is of mode 1MV, its single motion vector is used in the OBMC process. If the abutting block is of mode 4MV, only that 8 x 8 block of such an abutting block, which shares an entire line of pixels as the border with the 8 x 8 block in question (in the reference block being tested) is used (see Fig. 15).
- the weighting function applied to the pixels or sets thereof in the reference block currently being tested, as well as the function applied to the pixels or sets thereof in the blocks abutting the reference block, can be determined using a process of rigorous experimentation.
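The weighted blend at the heart of OBMC might be sketched as below (the weights themselves are experimental, per the text; the matching blocks are assumed to have been fetched already via the translated motion vectors):

```python
import numpy as np

def obmc_residual(cur_block, matching_blocks, weights):
    """Blend the matching blocks with the given weights, then difference
    the blend against the current block to form the compensated block."""
    blend = sum(w * m.astype(np.float64)
                for w, m in zip(weights, matching_blocks))
    return cur_block.astype(np.float64) - blend
```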
- a residual frame is generated as a direct outcome of the OBMC operation described above. Using the DFD routine, each block (8 x 8 or 16 x 16) is differenced, and the pixel values are overwritten onto the corresponding pixel positions in the current block, thereby generating the residual block. Once all the blocks in the current frame have been tested, the resultant frame is labeled as the residual frame.
- the SAD may be compared against a predetermined threshold. If the SAD is below the predetermined threshold, the particular macroblock is marked as a non-compensated macroblock (NCMB). If four such NCMBs are found adjacent to each other in a 2 x 2 grid array arrangement, this set of four blocks is jointly labeled as a non-coded block (NCB).
- the decoder, which decodes the encoded bit stream, has the reverse signal flow of the encoder.
- the relative order of the various signal processing operations is reversed (for example, the wavelet reconstruction block, or I-DWT, comes after the source/entropy decoder, inverse arithmetic coding).
- the motion vector information for a particular block of pixels (of any arbitrary sub-band at any arbitrary level of resolution) is used to mark the current block under consideration, and the residual frame is updated (or 'compensated') by simply adding the values of the homologous pixels from the residual block to the current block, as shown in Figures 9B and 10B.
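Decoder-side compensation for one block might then be sketched as follows (the names are assumptions; the per-pixel addition mirrors the text):

```python
def decoder_compensate(recon, residual_block, ref_frame, top, left, mv):
    """Add the decoded residual to the matching block located by the MV."""
    h, w = residual_block.shape
    dy, dx = mv
    match = ref_frame[top + dy:top + dy + h, left + dx:left + dx + w]
    recon[top:top + h, left:left + w] = match + residual_block
```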
- Embodiments of the present invention also relate to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable ROMs (EPROMs), electrically erasable programmable ROMs (EEPROMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
- a machine-readable medium includes read only memory ("ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US55235604P | 2004-03-10 | 2004-03-10 | |
US55215304P | 2004-03-10 | 2004-03-10 | |
US11/077,106 US7522774B2 (en) | 2004-03-10 | 2005-03-09 | Methods and apparatuses for compressing digital image data |
US11/076,746 US20050207495A1 (en) | 2004-03-10 | 2005-03-09 | Methods and apparatuses for compressing digital image data with motion prediction |
PCT/US2005/008391 WO2005086981A2 (en) | 2004-03-10 | 2005-03-10 | Methods and apparatuses for compressing digital image data with motion prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1730846A2 true EP1730846A2 (de) | 2006-12-13 |
EP1730846A4 EP1730846A4 (de) | 2010-02-24 |
Family
ID=34976280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05725507A Withdrawn EP1730846A4 (de) | 2004-03-10 | 2005-03-10 | Verfahren und vorrichtungen zum komprimieren digitaler bilddaten mit bewegungsprädiktion |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1730846A4 (de) |
JP (1) | JP2007529184A (de) |
KR (1) | KR20070026451A (de) |
WO (1) | WO2005086981A2 (de) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7583844B2 (en) * | 2005-03-11 | 2009-09-01 | Nokia Corporation | Method, device, and system for processing of still images in the compressed domain |
CN1964219B (zh) * | 2005-11-11 | 2016-01-20 | 上海贝尔股份有限公司 | 实现中继的方法和设备 |
US8654833B2 (en) | 2007-09-26 | 2014-02-18 | Qualcomm Incorporated | Efficient transformation techniques for video coding |
KR100950417B1 (ko) * | 2008-01-16 | 2010-03-29 | 에스케이 텔레콤주식회사 | 방향성 필터링 기반 웨이블렛 변환에서 문맥 모델링 방법및 웨이블렛 코딩 장치와 이를 위한 기록 매체 |
KR101423466B1 (ko) | 2008-05-06 | 2014-08-18 | 삼성전자주식회사 | 비트 플레인 영상의 변환 방법 및 장치, 역변환 방법 및장치 |
KR101634228B1 (ko) * | 2009-03-17 | 2016-06-28 | 삼성전자주식회사 | 디지털 이미지 처리장치, 추적방법, 추적방법을 실행시키기위한 프로그램을 저장한 기록매체 및 추적방법을 채용한 디지털 이미지 처리장치 |
US9232230B2 (en) * | 2012-03-21 | 2016-01-05 | Vixs Systems, Inc. | Method and device to identify motion vector candidates using a scaled motion search |
CN113630391B (zh) * | 2015-06-02 | 2023-07-11 | 杜比实验室特许公司 | 具有智能重传和插值的服务中质量监视系统 |
KR20210145754A (ko) | 2019-04-12 | 2021-12-02 | 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 | 행렬 기반 인트라 예측에서의 산출 |
JP2022535726A (ja) | 2019-05-31 | 2022-08-10 | 北京字節跳動網絡技術有限公司 | 行列ベースイントラ予測における制約されたアップサンプリングプロセス |
WO2020244610A1 (en) | 2019-06-05 | 2020-12-10 | Beijing Bytedance Network Technology Co., Ltd. | Context determination for matrix-based intra prediction |
CN117041597B (zh) * | 2023-10-09 | 2024-01-19 | 中信建投证券股份有限公司 | 一种视频编码、解码方法、装置、电子设备及存储介质 |
- 2005
- 2005-03-10 EP EP05725507A patent/EP1730846A4/de not_active Withdrawn
- 2005-03-10 WO PCT/US2005/008391 patent/WO2005086981A2/en active Application Filing
- 2005-03-10 KR KR1020067021047A patent/KR20070026451A/ko not_active Application Discontinuation
- 2005-03-10 JP JP2007503104A patent/JP2007529184A/ja not_active Withdrawn
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993017524A1 (en) * | 1992-02-26 | 1993-09-02 | General Electric Company | Data compression system including significance quantizer |
US5477272A (en) * | 1993-07-22 | 1995-12-19 | Gte Laboratories Incorporated | Variable-block size multi-resolution motion estimation scheme for pyramid coding |
US5495292A (en) * | 1993-09-03 | 1996-02-27 | Gte Laboratories Incorporated | Inter-frame wavelet transform coder for color video compression |
US6084908A (en) * | 1995-10-25 | 2000-07-04 | Sarnoff Corporation | Apparatus and method for quadtree based variable block size motion estimation |
US6148027A (en) * | 1997-05-30 | 2000-11-14 | Sarnoff Corporation | Method and apparatus for performing hierarchical motion estimation using nonlinear pyramid |
Non-Patent Citations (3)
Title |
---|
CHOW R K Y ET AL: "Scalable video delivery to unicast handheld-based clients" NETWORKS, 2000. (ICON 2000). PROCEEDINGS. IEEE INTERNATIONAL CONFERENC E ON SEPTEMBER 5-8, 2000, PISCATAWAY, NJ, USA,IEEE, 5 September 2000 (2000-09-05), pages 93-98, XP010514085 ISBN: 978-0-7695-0777-4 * |
JO YEW THAM ET AL: "Highly Scalable Wavelet-Based Video Codec for Very Low Bit-Rate Environment" IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, US, vol. 16, no. 1, 1 January 1998 (1998-01-01), XP011054745 ISSN: 0733-8716 * |
See also references of WO2005086981A2 * |
Also Published As
Publication number | Publication date |
---|---|
EP1730846A4 (de) | 2010-02-24 |
JP2007529184A (ja) | 2007-10-18 |
KR20070026451A (ko) | 2007-03-08 |
WO2005086981A2 (en) | 2005-09-22 |
WO2005086981A3 (en) | 2006-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7522774B2 (en) | Methods and apparatuses for compressing digital image data | |
US20050207495A1 (en) | Methods and apparatuses for compressing digital image data with motion prediction | |
WO2005086981A2 (en) | Methods and apparatuses for compressing digital image data with motion prediction | |
US10375409B2 (en) | Method and apparatus for image encoding with intra prediction mode | |
CN108848376B (zh) | 视频编码、解码方法、装置和计算机设备 | |
JP5606591B2 (ja) | ビデオ圧縮方法 | |
EP3570545B1 (de) | Intraprädiktion zur videocodierung mit geringer komplexität | |
US11284107B2 (en) | Co-located reference frame interpolation using optical flow estimation | |
US8761252B2 (en) | Method and apparatus for scalably encoding and decoding video signal | |
CN113923455B (zh) | 一种双向帧间预测方法及装置 | |
US11876974B2 (en) | Block-based optical flow estimation for motion compensated prediction in video coding | |
US20100002770A1 (en) | Video encoding by filter selection | |
US20080247467A1 (en) | Adaptive interpolation filters for video coding | |
WO2014120374A1 (en) | Content adaptive predictive and functionally predictive pictures with modified references for next generation video coding | |
EP1466477A2 (de) | Codierung dynamischer filter | |
US20100086048A1 (en) | System and Method for Video Image Processing | |
JP5230798B2 (ja) | 符号化及び復号化方法、コーダ及びデコーダ | |
CN118540480A (zh) | 编码装置、解码装置和非暂时性计算机可读介质 | |
Hua et al. | Inter frame video compression with large dictionaries of tilings: algorithms for tiling selection and entropy coding | |
JP2024102378A (ja) | 動画像符号化装置、復号装置 | |
US8218639B2 (en) | Method for pixel prediction with low complexity | |
JP2007235299A (ja) | 画像符号化方法 | |
Wang | Fully scalable video coding using redundant-wavelet multihypothesis and motion-compensated temporal filtering | |
CN114830645A (zh) | 图像编码方法和图像解码方法 | |
CN114521325A (zh) | 图像编码方法和图像解码方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20061005 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA HR LV MK YU |
|
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20100126 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04N 7/26 20060101ALI20100120BHEP Ipc: H04N 7/12 20060101ALI20100120BHEP Ipc: H04B 1/66 20060101AFI20061102BHEP |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20100423 |