US20190238872A1 - Method and apparatus to process video sequences in transform space - Google Patents
- Publication number
- US20190238872A1 (application US16/377,489; US201916377489A)
- Authority
- US
- United States
- Prior art keywords
- transform
- frames
- frequency domain
- dimensional
- domain representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/426—characterised by implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements using memory downsizing methods
- H04N19/124—Quantisation
- H04N19/176—using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/182—using adaptive coding characterised by the coding unit, the unit being a pixel
- H04N19/48—using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
- H04N19/625—using transform coding using discrete cosine transform [DCT]
- H04N19/63—using transform coding using sub-band based transform, e.g. wavelets
- H04N19/65—using error resilience
- H04N19/85—using pre-processing or post-processing specially adapted for video compression
Definitions
- Pre-filtering is used in video encoding to remove undesirable noise from video sources. For example, a plurality of video frames are processed by a pre-filter to produce a plurality of filtered video frames. The plurality of filtered video frames are then compressed by a video encoder. Without a pre-filter, the noise degrades the performance of a video encoder by wasting bits on representing the noise itself and by introducing encoding artifacts such as blocking and ringing noise.
- a finite impulse response filter converts an input sequence in pixel space into a filtered output sequence equal in number to the input sequence, by processing through a sequence of multiply-add operations.
- Construction of a finite impulse response filter implementing a specified frequency response requires construction of a finite set of taps representing the inverse Discrete Fourier Transform of the desired frequency response, which substantially limits the ability to construct an arbitrary noise removal function.
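As an illustrative sketch of the finite impulse response construction described above (the function names and the example response are assumptions, not taken from this disclosure), the taps may be obtained as the inverse Discrete Fourier Transform of the sampled desired response and applied as a sequence of multiply-add operations:

```python
import numpy as np

def fir_taps_from_response(desired_response):
    """Taps approximating a desired magnitude response, obtained as the
    inverse DFT of the response sampled on [0, 1) normalized frequency.
    fftshift centers the impulse response (linear phase)."""
    taps = np.real(np.fft.ifft(desired_response))
    return np.fft.fftshift(taps)

def fir_apply(taps, pixels):
    # Sliding multiply-add; mode="same" keeps the filtered output equal
    # in number to the input sequence, as stated above.
    return np.convolve(pixels, taps, mode="same")

# Hypothetical example: an ideal low-pass response keeping the lowest
# quarter of frequencies, mirrored so the taps come out real.
N = 32
response = np.zeros(N)
response[:N // 4] = 1.0
response[-(N // 4) + 1:] = 1.0
taps = fir_taps_from_response(response)
smoothed = fir_apply(taps, np.random.default_rng(0).standard_normal(256))
```

The taps can only realize responses expressible on this finite grid, which is exactly the limitation the paragraph above points out.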
- FIG. 1 illustrates an example block diagram of a system including compressing video sequences using a transform-domain video processor to pre-compress the video sequences and a video encoder to compress the pre-compressed video sequences according to some implementations.
- FIG. 2 illustrates an example graphical representation of transform coefficients from a family of equally-spaced transforms usable to pre-compress a video sequence according to some implementations.
- FIG. 3 illustrates an example graphical representation of transform coefficients from a family of unequally-spaced transforms usable to pre-compress a video sequence according to some implementations.
- FIG. 4 illustrates an example graphical representation of multiplicative constants in a one-dimensional transform domain associated with a one-dimensional description of human visibility as a function of frequency according to some implementations.
- FIG. 5 illustrates an example graphical representation of quantizing values in a one-dimensional transform domain associated with a one-dimensional description of human contrast sensitivity as a function of frequency according to some implementations.
- FIG. 6 illustrates an example of a one-dimensional processor for pre-compressing a video sequence according to some implementations.
- FIG. 7 illustrates an example of a one-dimensional pre-compressor for pre-compressing a video sequence according to some implementations.
- FIG. 8 illustrates an example graphical representation of multiplicative constants in a two-dimensional transform domain associated with a two-dimensional description of human visibility as a function of frequency according to some implementations.
- FIG. 9 illustrates an example representation of a two-dimensional description of human contrast sensitivity as a function of frequency according to some implementations.
- FIG. 10 illustrates an example of a two-dimensional processor for pre-compressing a video sequence according to some implementations.
- FIG. 11 illustrates an example of a two-dimensional pre-compressor for pre-compressing a video sequence according to some implementations.
- FIG. 12 illustrates an example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations.
- FIG. 13 illustrates another example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations.
- FIG. 14 illustrates an example of a three-dimensional processor for pre-compressing a video sequence according to some implementations.
- FIG. 15 illustrates an example of a three-dimensional pre-compressor for pre-compressing a video sequence according to some implementations.
- FIG. 16 illustrates example components of an electronic device that may be configured to perform pre-compression according to some implementations.
- This disclosure includes techniques and implementations for pre-compressing image data, including spatiotemporal three-dimensional video sequences, to improve compression rates achieved by a video encoder. For example, rather than pre-filtering the image data in the pixel space and/or the temporal space, which may result in a perceivable reduction in video quality as data is removed, the implementations described herein pre-compress the image data in the transform domain and/or the frequency domain using functions representative of the contrast sensitivity of the human eye, configuring the image data for compression in a manner that results in changes that are substantially imperceptible to the human eye.
- the image data may be pre-compressed as a series of frames in one-dimensional, two-dimensional, or three-dimensional spaces.
- a one-dimensional processor may pre-compress the image data by processing series of adjacent pixels, such as a row of pixels within a frame, a column of pixels within a frame, or as a series of pixels having a shared coordinate within each of a plurality of frames.
- a two-dimensional processor may pre-compress the image data by processing blocks of pixels within a frame.
- a three-dimensional processor may pre-compress the image data by processing series of blocks of pixels having a shared coordinate within each of a plurality of frames.
- a dimensional processor may apply a noise filtering function specified in the transform domain to an input video sequence.
- the transform domain may comprise a wavelet transform, the Discrete Cosine Transform, the Karhunen-Loeve Transform, or a combination of these or other linear transforms in one to three dimensions.
- the noise filtering function may include a white noise filter, a pink-noise filter, a band-pass filter or other filter function.
- the dimensional processor may generate an output video sequence that is substantially similar to the input video sequence when viewed by a human (e.g., the output video sequence differs from the input video sequence in ways that are imperceptible to the human eye). However, the output video sequence achieves improved compression, when compared with the input video sequence, following compression by a video encoder.
- a frequency response function internal to the dimensional processor may remove subjectively redundant visual information by calculating optimal visually-weighted quantizers corresponding to the decorrelating-transformed block decomposition of a sequence of video images.
- a function representative of the contrast sensitivity of the human eye to actual time-varying transform-domain frequency of each transform component may be calculated.
- the resolution of the transformed data (e.g., the transform representation of the original video sequence) may be reduced based on the calculated quantizers.
- the dimensional processor may implement a three-dimensional Discrete Cosine Transform as a decorrelating transform.
- FIG. 1 illustrates an example block diagram of a system 100 for compressing video sequences 102 using a transform domain pre-compressor 104 , which includes a transform-domain dimensional engine (not shown) and a transform space shaping engine to pre-compress the video sequences 102 , and a video encoder 108 to compress the pre-compressed video sequences 106 according to some implementations.
- an input video sequence 102 including a plurality of individual frames 110 may be received at an image buffer (not shown).
- the video sequence 102 may then be pre-compressed by the transform domain pre-compressor 104 in one, two, or three dimensions.
- the video sequence may be processed by the transform domain pre-compressor 104 in one dimension, as a series of adjacent pixels (such as a row of pixels within a frame, a column of pixels within a frame, or a series of pixels having a shared coordinate within each of a plurality of frames); in two dimensions, as blocks of pixels within a frame; or in three dimensions, as a series of blocks of pixels having a shared coordinate within each of a plurality of frames.
- the transform domain pre-compressor 104 may process the frames 110 in the transform domain space or frequency domain space using a function representative of the contrast sensitivity of the human eye to alter the frames in a manner visually imperceptible to the human eye.
- the resulting pre-compressed video sequence 106 has a plurality of frames 112 with visual quality substantially equivalent to the visual quality of the frames 110 of the input video sequence 102 .
- the video encoder 108 receives the frames 112 of the pre-compressed video sequence 106 from the transform domain pre-compressor 104 .
- the video encoder 108 compresses the frames 112 of the pre-compressed video sequence 106 into a compressed representation of the frames 112 , referred to herein as compressed frames 114 .
- each compressed frame 114 is smaller than a compressed representation of the corresponding frame 110 of the input video sequence 102 .
- the video quality is substantially maintained as, for example, pixels are not removed from the frames 110 , as is the case when pre-filtering in the pixel domain.
- pre-filtering in the pixel space or the temporal space may be applied prior to the pre-compression of the frames 110 by the transform domain pre-compressor 104 to further improve compression rates during compression by the video encoder 108 .
- FIG. 2 illustrates an example graphical representation 200 of transform coefficients from a family of equally-spaced transforms usable to pre-compress a video sequence according to some implementations.
- some examples of evenly-spaced domains may include the Discrete Cosine Transform (DCT) or the Karhunen-Loeve Transform (KLT).
- a desired frequency response may be approximated by specifying an amplitude per evenly-spaced transform domain component frequency, generally indicated by the graph 204 .
- the desired frequency response 202 may be approximated by specifying a quantizer calculated as the inverse of the amplitude per evenly-spaced transform domain component frequency, generally indicated by the graph 206 .
- as the amplitude of the desired frequency response 202 increases, the number of steps of each quantizer F 0 -F n-1 increases and the size of each quantizer F 0 -F n-1 decreases.
- the image data of the video sequence is divided by the corresponding quantizer F 0 -F n-1 and rounded to the nearest integer in the domain space.
- quantizer F 0 is a relatively large quantizer that allows for few steps or output data values, as the desired frequency response 202 is low at F 0 in the domain space.
- more information related to the video sequence is maintained at points where the desired frequency response 202 is high than at points where the desired frequency response 202 is low.
- as the desired frequency response 202 is a function of the contrast sensitivity of the human eye, more information associated with the video sequence is maintained at points along the contrast sensitivity function at which the human eye is able to discern more information.
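The divide-round-dequantize step described above can be sketched as follows; the coefficient amplitudes and quantizer steps are made-up illustrative values, not values from the disclosure:

```python
import numpy as np

def shape_by_quantizers(amplitudes, quantizers):
    """Quantize each frequency amplitude F0..Fn-1 by its own step size,
    then dequantize back to amplitude scale.  Small steps (high desired
    response) preserve detail; large steps (low response) discard it."""
    quantized = np.round(amplitudes / quantizers)   # divide and round
    return quantized * quantizers                   # dequantize

amplitudes = np.array([100.0, 41.3, 7.9, 2.4])
quantizers = np.array([1.0, 2.0, 8.0, 16.0])   # larger step = fewer levels
shaped = shape_by_quantizers(amplitudes, quantizers)
# The 7.9 amplitude snaps to 8.0 (step 8); the 2.4 amplitude with step 16
# rounds away to 0.0, removing information the eye is unlikely to notice.
```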
- FIG. 3 illustrates an example graphical representation 300 of transform coefficients from a family of unequally-spaced transforms usable to pre-compress a video sequence according to some implementations.
- the family of wavelet transforms is an example of an unevenly-spaced transform domain, which may be utilized by a dimensional processor to pre-compress video sequences or other image data.
- an unevenly-spaced frequency transform may again be representative of a contrast sensitivity of the human eye.
- the desired frequency response 302 may be approximated by specifying an amplitude per unevenly-spaced transform domain component frequency, generally indicated by graph 304 .
- the desired frequency response 302 may be approximated by specifying a quantizer calculated as the inverse of the amplitude per unevenly-spaced transform domain component frequency, generally indicated as 306 .
- FIG. 4 illustrates an example graphical representation of multiplicative constants 400 in a one-dimensional transform domain associated with a one-dimensional description of human visibility as a function of frequency according to some implementations.
- a reversible linear discrete transform, such as the Discrete Cosine Transform 402 , is selected, which may be written (up to a normalization constant) as X u = Σ i=0..N-1 x i · cos( (π/N) · (i + ½) · u ), where:
- N is the block size of the transform
- i is a pixel index counting from 0 to N-1
- x i is the i th pixel in the block
- u is the mapped discrete frequency index, from 0 to N-1
- X u is the u th frequency amplitude.
- the reversible linear discrete transform 402 maps to a set of discrete frequencies, generally indicated by 404 .
- the frequencies 404 may be either evenly-spaced or unevenly spaced, as described above with respect to FIGS. 2 and 3 .
- a normalized human visual system transfer function 406 is defined in frequency space such that substantially perfect reproduction of a video sequence following pre-compression is defined as unity: H(f) = CSF(f) / CSF max , where CSF(f) is the contrast sensitivity of the human eye at frequency f and CSF max is its peak value, so that H(f) = 1 at the frequency of peak sensitivity:
- the normalized human visual system transfer function 406 may be sampled at the discrete frequencies 404 of the reversible linear discrete transform 402 to generate the multiplicative constants 400 . It should be understood that one multiplicative constant 400 may be generated per discrete frequency 404 .
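Sampling a normalized contrast sensitivity function at the discrete transform frequencies, one multiplicative constant per frequency, might look like the following sketch; the CSF shape and the index-to-frequency mapping are placeholder assumptions, not the disclosure's model:

```python
import numpy as np

def normalized_csf(f):
    """Illustrative contrast-sensitivity curve normalized so its peak is
    unity; the actual human visual system model is not reproduced here."""
    s = (0.2 + f) * np.exp(-0.3 * f)   # placeholder band-pass-like shape
    return s / s.max()

N = 8                              # transform block size
u = np.arange(N)                   # discrete frequency indices 0..N-1
f = u * 0.5                        # assumed mapping to cycles per degree
constants = normalized_csf(f)      # one multiplicative constant per frequency
```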
- FIG. 5 illustrates an example graphical representation of quantizing values 500 in a one-dimensional transform domain associated with a one-dimensional description of human contrast sensitivity as a function of frequency according to some implementations.
- a dimensional pre-compressor may assign fixed quantizer values 500 to each individual frequency amplitude F 0 -F n-1 .
- a reversible linear discrete transform 502 is selected based at least in part on its mapping to a set of discrete frequencies 504 .
- the set of discrete frequencies 504 may either be evenly-spaced or unevenly spaced, as discussed above with respect to FIGS. 2 and 3 .
- the scaled inverse of the normalized human visual system contrast sensitivity function 506 may be defined in transform space or frequency space as Q(f) = s / H(f), where H(f) is the normalized contrast sensitivity at frequency f and s is a scaling constant setting the overall quantizer magnitude.
- the scaled inverse normalized human visual system contrast sensitivity function 506 may be sampled at the discrete frequencies F 0 -F n-1 504 of the reversible linear discrete transform 502 , one quantizer per discrete frequency F 0 -F n-1 . In the illustrated example, the sampling at the discrete frequencies F 0 -F n-1 504 generates the quantizer value for each sampled frequency.
- the scaled inverse normalized human visual system contrast sensitivity function 506 may be sampled at the discrete frequency F 4 , generally indicated by the line 508 , to generate the quantizer value at F 4 .
- FIG. 6 illustrates an example of a one-dimensional processor 600 for pre-compressing a video sequence according to some implementations.
- the one-dimensional processor 600 may include a linear pixel input buffer 602 , a forward transform engine 604 , a frequency domain frequency response shaping engine 606 , an inverse transform engine 608 , and a linear pixel output buffer 610 .
- the one-dimensional processor 600 receives input pixels 612 and stores the input pixels in the linear pixel input buffer 602 prior to pre-compression.
- the input pixels 612 may be associated with a frame or multiple frames of a video sequence or a still image or photograph.
- the input pixels 612 may be a series of adjacent pixels, such as a row or column of an image, photograph, or frame of a video sequence.
- the input pixels 612 may include a series of adjacent pixels over multiple frames of the video sequence, such as pixels having the same coordinate in each sequential frame of the video sequence.
- the forward transform engine 604 may receive a reversible linear forward transform 614 selected as the transform for the pre-compression operations.
- the reversible linear forward transform 614 may be a stored internal transform that is fixed relative to the pixel input 612 , while in other cases, the reversible linear forward transform 614 may be selected per buffering or per video sequence.
- the forward transform 614 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform).
- the forward transform engine 604 may be configured to perform the reversible linear forward transform 614 on the input pixels 612 stored in the linear pixel input buffer 602 .
- the operation of the reversible linear forward transform 614 on the input pixels 612 generates a frequency domain representation of the input pixels 612 stored in the linear pixel input buffer 602 .
- the frequency domain frequency response shaping engine 606 receives the frequency domain representation of the input pixels 612 .
- the frequency domain frequency response shaping engine 606 may be configured to apply a frequency response shape function 616 to the frequency domain representation of the input pixels 612 while stored in the linear pixel input buffer 602 .
- the frequency response shape function 616 may cause the frequency domain frequency response shaping engine 606 to apply a multiplicative constant to individual frequency amplitudes of the frequency domain representation of the input pixels 612 .
- the frequency response shape function 616 may cause the frequency domain frequency response shaping engine 606 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input pixels 612 .
- the reverse transform engine 608 may be configured to receive a reverse transform 618 to convert the frequency domain representation of the input pixels 612 to the pixel space.
- the reverse transform engine 608 may be configured to perform the reverse transform 618 on an output of the frequency domain frequency response shaping engine 606 to generate the output pixels 620 , i.e., a pixel domain representation of the output of the frequency domain frequency response shaping engine 606 .
- the output pixels 620 may be stored in the linear pixel output buffer 610 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output pixels 620 for transmission or further processing.
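The FIG. 6 pipeline (forward transform engine, frequency domain frequency response shaping engine, inverse transform engine) can be sketched with an orthonormal DCT as the reversible linear transform; all names below are illustrative, and the shaping constants are assumed inputs:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis as rows; M @ M.T is the identity, so
    M.T inverts the forward transform (a reversible linear transform)."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    M = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + 0.5) * k)
    M[0] /= np.sqrt(2.0)
    return M

def one_d_process(pixels, constants):
    """Forward transform engine -> frequency domain frequency response
    shaping engine (multiplicative constants) -> inverse transform engine."""
    M = dct_matrix(len(pixels))
    coeffs = M @ pixels          # frequency domain representation
    shaped = coeffs * constants  # per-frequency shaping
    return M.T @ shaped          # back to pixel space

line = np.linspace(0.0, 255.0, 16)       # one row, column, or temporal line
out = one_d_process(line, np.ones(16))   # identity shaping round-trips
```

With constants below 1 at frequencies the eye resolves poorly, the output would stay visually similar while becoming easier for a video encoder to compress.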
- FIG. 7 illustrates an example of a one-dimensional pre-compressor 700 for pre-compressing a video sequence according to some implementations.
- a buffer such as the linear input buffer 602 of FIG. 6 above, may store a video sequence 702 including a plurality of individual frames 704 to be pre-compressed.
- the one-dimensional pre-compressor 700 may be configured to process a series of adjacent pixels from within the frames 704 of the video sequence 702 .
- a temporal line of pixels is being processed as a unit.
- one pixel from a fixed location or coordinate within each of the frames 704 may be selected to form the set of pixels 706 .
- each of the one-dimensional transform processors 708 may include a one-dimensional transform engine (such as forward transform engine 604 ), a one-dimensional frequency domain frequency response shaping engine (such as the frequency domain frequency response shaping engine 606 ), and a one-dimensional inverse transform engine (such as the inverse transform engine 608 ).
- a one-dimensional transform engine such as forward transform engine 604
- a one-dimensional frequency domain frequency response shaping engine such as the frequency domain frequency response shaping engine 606
- a one-dimensional inverse transform engine such as the inverse transform engine 608 .
- the set of pixels 706 may be pre-compressed by the one-dimensional transform processor 708 ( 1 ) to generate a set of output pixels 710 representative of the same visual content as the set of pixels 706 in the video sequence 702 but in a manner that, when compressed by a video encoder, results in an improved rate of compression.
- the one-dimensional processor 700 may include a plurality of one-dimensional transform processors 708 ( 1 )-(K).
- each of the one-dimensional transform processors 708 ( 1 )-(K) may process a set of pixels, such as set of pixels 706 , in parallel.
- the one-dimensional transform processor 708 (K) is configured to receive a set of pixels 712 of a second set of frames 714 of a second video sequence 716 (such as additional frames of the same video as the video sequence 702 or another video altogether).
- the one-dimensional transform processor 708 (K) may pre-compress the set of pixels 712 into a set of pre-compressed output pixels 718 that may result in an improved rate of compression over the set of pixels 712 when compressed by a video encoder.
- each of the one-dimensional transform processors 708 ( 1 )-(K) may process sets of pixels from the same video sequence and/or the same frames in parallel to improve the overall throughput of the one-dimensional pre-compressor 700 .
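Forming the temporal lines described above (one pixel per frame at a shared coordinate) can be sketched as a reshape; the array names and sizes are hypothetical:

```python
import numpy as np

def temporal_lines(frames):
    """Rearrange a stack of frames of shape (T, H, W) so that each row
    holds the pixels sharing one (row, col) coordinate across all T
    frames, i.e. one temporal line per spatial location."""
    T, H, W = frames.shape
    return frames.reshape(T, H * W).T   # shape (H*W, T)

rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(8, 4, 4)).astype(float)
lines = temporal_lines(frames)          # 16 temporal lines of length 8
# Each row of `lines` can be handed to its own one-dimensional transform
# processor, so the K processors can work through the lines in parallel.
```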
- FIG. 8 illustrates an example graphical representation of multiplicative constants 800 in a two-dimensional transform domain associated with a two-dimensional description of human visibility as a function of frequency according to some implementations.
- a dimensional pre-compressor may assign multiplicative constants 800 to each individual frequency amplitude of a normalized two-dimensional human visual system transfer function.
- a reversible two-dimensional linear discrete transform 802 may be selected to map to a two-dimensional set of discrete frequencies 804 .
- the discrete frequencies 804 may be mapped using either an evenly-spaced or an unevenly spaced function in either of the two-dimensions.
- a normalized human visual system transfer function 806 may be defined in the two-dimensional frequency space such that substantially perfect reproduction of visual data as detected by a human eye may be defined as unity: H(u,v) = CSF(u,v) / CSF max , where:
- u and v are each associated with a frequency of the visual stimulus in cycles per degree in either pixel or line direction and CSF(u,v) is defined as a relative sensitivity of the human eye to the joint frequency u and v.
- the normalized two-dimensional human visual system transfer function is non-linear with respect to pixel-direction frequencies (u) and line-direction frequencies (v).
- the normalized two-dimensional human visual system transfer function 806 is sampled at each frequency of the two-dimensional set of discrete frequencies 804 .
- the illustrated example depicts a graph 808 of the normalized two-dimensional human visual system transfer function.
- the graph 808 illustrates a typical grid of frequencies at which sampling of the two-dimensional human visual system transfer function 806 takes place.
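Sampling a two-dimensional transfer function on a grid of joint frequencies (u, v) might look like the following sketch; the separable radial shape here is a placeholder, since the disclosure's actual model is non-linear in u and v and depends on viewing conditions:

```python
import numpy as np

def csf_2d(u, v):
    """Illustrative two-dimensional sensitivity surface over the joint
    frequencies (u, v); a placeholder for the actual viewing-condition
    dependent model, normalized so its peak is unity."""
    r = np.hypot(u, v)                 # joint radial frequency
    s = (0.2 + r) * np.exp(-0.3 * r)
    return s / s.max()

N = 8
u, v = np.meshgrid(np.arange(N) * 0.5, np.arange(N) * 0.5, indexing="ij")
constants_2d = csf_2d(u, v)            # one constant per (u, v) grid point
```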
- FIG. 9 illustrates an example representation of a two-dimensional description of human contrast sensitivity as a function of frequency according to some implementations.
- a two-dimensional system contrast sensitivity function 900 is defined in terms of viewing conditions.
- the viewing conditions may include an expected average ambient luminance I 902 , and additional variables such as u 904 (e.g., temporal frequency), X 0 906 (e.g., the angle subtended by a DCT block), and X max 908 (e.g., the angle subtended by the display surface).
- FIG. 10 illustrates an example of a two-dimensional processor 1000 for pre-compressing a video sequence according to some implementations.
- the two-dimensional processor 1000 may include a frame sub-block input buffer 1002 , a forward two-dimensional transform engine 1004 , a two-dimensional frequency domain frequency response shaping engine 1006 , a two-dimensional inverse transform engine 1008 , and a frame sub-block output buffer 1010 .
- the two-dimensional processor 1000 receives input blocks 1012 and stores the input blocks in the frame sub-block input buffer 1002 prior to pre-compression.
- the input blocks 1012 may be associated with a block of pixels of a frame of a video sequence, a still image, or a photograph.
- the two-dimensional forward transform engine 1004 may receive a reversible two-dimensional linear forward transform 1014 selected as the transform for the pre-compression operations.
- the reversible two-dimensional linear forward transform 1014 may be a stored internal transform that is fixed relative to the input blocks 1012 , while in other cases, the reversible two-dimensional linear forward transform 1014 may be selected per buffering or per video sequence.
- the reversible two-dimensional forward transform 1014 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform).
- the two-dimensional forward transform engine 1004 may be configured to perform the reversible two-dimensional linear forward transform 1014 on the input blocks 1012 stored in the frame sub-block input buffer 1002 .
- the operation of the reversible two-dimensional linear forward transform 1014 on the input blocks 1012 generates a frequency domain representation of the input blocks 1012 stored in the input buffer 1002 .
- the two-dimensional frequency domain frequency response shaping engine 1006 receives the frequency domain representation of the input blocks 1012 .
- the two-dimensional frequency domain frequency response shaping engine 1006 may be configured to apply a two-dimensional frequency response shape function 1016 to the two-dimensional frequency domain representation of the input blocks 1012 while stored in the input buffer 1002 .
- the two-dimensional frequency response shape function 1016 may cause the two-dimensional frequency domain frequency response shaping engine 1006 to apply a multiplicative constant to individual frequency amplitudes of the two-dimensional frequency domain representation of the input blocks 1012 .
- the two-dimensional frequency response shape function 1016 may cause the two-dimensional frequency domain frequency response shaping engine 1006 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input blocks 1012 .
- the two-dimensional reverse transform engine 1008 may be configured to receive a two-dimensional reverse transform 1018 to convert the two-dimensional frequency domain representation of the input blocks 1012 to the pixel space.
- the two-dimensional reverse transform engine 1008 may be configured to perform the two-dimensional reverse transform 1018 on an output of the two-dimensional frequency domain frequency response shaping engine 1006 to generate output blocks 1020 (i.e., a pixel domain representation of the output of the two-dimensional frequency domain frequency response shaping engine 1006 ).
- the output blocks 1020 may be stored in the output buffer 1010 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output blocks 1020 for transmission or further processing.
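The forward transform, frequency response shaping, and inverse transform steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: it uses an orthonormal DCT-II on a square block, and the function names `dct_matrix` and `precompress_block` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row u, column i holds cos(pi*(2i+1)*u/(2n)).
    u = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * u / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)  # DC-row scaling makes C orthonormal (C @ C.T == I)
    return c

def precompress_block(block, shape_fn):
    # Forward two-dimensional transform (separable; assumes a square block).
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    # Frequency response shaping: one multiplicative constant per frequency.
    shaped = coeffs * shape_fn
    # Inverse transform back to the pixel domain.
    return c.T @ shaped @ c
```

With `shape_fn` set to all ones the round trip reproduces the block exactly; attenuating high-frequency entries discards detail the shaping function deems invisible.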
- FIG. 11 illustrates an example of a two-dimensional pre-compressor 1100 for pre-compressing a video sequence according to some implementations.
- a buffer such as the input block buffer 1002 of FIG. 10 above, may store a video sequence including a plurality of individual frames 1102 to be pre-compressed.
- the two-dimensional pre-compressor 1100 may be configured to process blocks of image data or sub-blocks of a frame 1104 .
- the two-dimensional pre-compressor 1100 may be pre-compressing the sub-block 1104 ( 3 ).
- the sub-block 1104 ( 3 ) may be accessed or received by one of a plurality of two-dimensional transform processors 1106 .
- Each of the two-dimensional transform processors 1106 may include a two-dimensional transform engine (such as two-dimensional forward transform engine 1004 ), a two-dimensional frequency domain frequency response shaping engine (such as the two-dimensional frequency domain frequency response shaping engine 1006 ), and a two-dimensional inverse transform engine (such as the two-dimensional inverse transform engine 1008 ).
- the sub-block 1104 ( 3 ) is being processed by the two-dimensional transform processors 1106 ( 3 ).
- the other two-dimensional transform processors 1106 ( 1 ) and 1106 ( 2 )- 1106 (K) may pre-compress the corresponding sub-blocks 1104 ( 1 ) and 1104 ( 2 )- 1104 (K).
- the two-dimensional transform processor 1106 ( 3 ) may convert pixels of the sub-block 1104 ( 3 ) into the frequency domain, quantize a frequency domain representation of the pixels of the sub-block 1104 ( 3 ), and convert the frequency domain representation of the pixels of the sub-block 1104 ( 3 ) back into the pixel domain following quantization.
- the two-dimensional transform processor 1106 ( 3 ) may generate a pre-compressed sub-block 1108 ( 3 ) that is a substantially visually equivalent representation of the sub-block 1104 ( 3 ) when viewed by a human.
- the pre-compressed sub-block 1108 ( 3 ) may result in an improved rate of compression over the sub-block 1104 ( 3 ) when compressed by a video encoder.
- the size of a frame sub-block 1108 is taken as the size of a block in the two-dimensional transform. In some cases, the size of the sub-blocks processed using the two-dimensional transform may be equal to the size of a frame of the video sequence 1102 .
- FIG. 12 illustrates an example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations.
- FIG. 12 shows a formulation of a three-dimensional human vision system transfer function 1202 which may be used to generate or assign multiplicative constants to individual frequency amplitudes in three-dimensions.
- the normalized three-dimensional human visual system transfer function is not separable by pixel frequency u, line frequency v, or frame rate w and, thus, requires pre-compression using a three-dimensional pre-compressor or a three-dimensional domain transform.
- FIG. 13 illustrates another example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations.
- the illustrated example shows a formulation of a three-dimensional system contrast sensitivity function 1300 .
- the three-dimensional system contrast sensitivity function 1300 may be defined in terms of viewing conditions.
- the viewing conditions may include an expected average ambient luminance I 1302 , additional variables s 1304 (e.g., spatial frequency), w 1306 (e.g., temporal frequency), X 0 1308 (e.g., angle subtended by DCT block), and X max 1310 (e.g., angle subtended by display surface).
- FIG. 14 illustrates an example of a three-dimensional processor 1400 for pre-compressing a video sequence according to some implementations.
- the three-dimensional processor 1400 may include a frame sub-block input buffer 1402 , a forward three-dimensional transform engine 1404 , a three-dimensional frequency domain frequency response shaping engine 1406 , a three-dimensional inverse transform engine 1408 , and a frame sub-block output buffer 1410 .
- the three-dimensional processor 1400 receives three-dimensional input blocks 1412 and stores the input blocks in the input block buffer 1402 prior to pre-compression.
- the input blocks 1412 may be associated with a block of pixels of a plurality of frames of a video sequence.
- the three-dimensional forward transform engine 1404 may receive a reversible three-dimensional linear forward transform 1414 selected as the transform for the pre-compression operations.
- the reversible three-dimensional linear forward transform 1414 may be a stored internal transform that is fixed relative to the input blocks 1412 , while in other cases, the reversible three-dimensional linear forward transform 1414 may be selected per buffering or per video sequence.
- the three-dimensional forward transform 1414 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform).
- the three-dimensional forward transform engine 1404 may be configured to perform the reversible three-dimensional linear forward transform 1414 on the input blocks 1412 stored in the input buffer 1402 .
- the operation of the reversible three-dimensional linear forward transform 1414 on the input blocks 1412 generates a frequency domain representation of the input blocks 1412 stored in the input buffer 1402 .
- the three-dimensional frequency domain frequency response shaping engine 1406 receives the frequency domain representation of the input blocks 1412 .
- the three-dimensional frequency domain frequency response shaping engine 1406 may be configured to apply a three-dimensional frequency response shape function 1416 to the three-dimensional frequency domain representation of the input blocks 1412 while stored in the input buffer 1402 .
- the three-dimensional frequency response shape function 1416 may cause the three-dimensional frequency domain frequency response shaping engine 1406 to apply a multiplicative constant to individual frequency amplitudes of the three-dimensional frequency domain representation of the input blocks 1412 .
- the three-dimensional frequency response shape function 1416 may cause the three-dimensional frequency domain frequency response shaping engine 1406 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input blocks 1412 .
- the three-dimensional reverse transform engine 1408 may be configured to receive a three-dimensional reverse transform 1418 to convert the three-dimensional frequency domain representation of the input blocks 1412 to the pixel space.
- the three-dimensional reverse transform engine 1408 may be configured to perform the three-dimensional reverse transform 1418 on an output of the three-dimensional frequency domain frequency response shaping engine 1406 to generate output blocks 1420 (i.e., a pixel domain representation of the output of the three-dimensional frequency domain frequency response shaping engine 1406 ).
- the output blocks 1420 may be stored in the output buffer 1410 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output blocks 1420 for transmission or further processing.
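A minimal three-dimensional sketch of the same forward-shape-inverse pipeline can be written in Python. As before this is an illustration, not the patented implementation: it assumes a separable orthonormal DCT-II applied along the frame, row, and column axes, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of size n x n.
    u = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * u / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def apply_along(mat, cube, axis):
    # Multiply 'mat' along one axis of a 3-D array, keeping axis order.
    moved = np.tensordot(mat, cube, axes=([1], [axis]))
    return np.moveaxis(moved, 0, axis)

def precompress_3d(cube, shape_fn):
    # cube: (frames, rows, cols) block spanning several frames.
    mats = [dct_matrix(n) for n in cube.shape]
    coeffs = cube
    for axis, c in enumerate(mats):      # forward 3-D transform
        coeffs = apply_along(c, coeffs, axis)
    shaped = coeffs * shape_fn           # per-frequency shaping
    out = shaped
    for axis, c in enumerate(mats):      # inverse 3-D transform
        out = apply_along(c.T, out, axis)
    return out
```

Because the transform is orthonormal, an all-ones shaping function reproduces the input cube exactly; a three-dimensional contrast sensitivity model would instead attenuate the less visible spatiotemporal frequencies.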
- FIG. 15 illustrates an example of a three-dimensional pre-compressor 1500 for pre-compressing a video sequence according to some implementations.
- a buffer, such as the input block buffer 1402 of FIG. 14 above, may store multiple sub-blocks over a plurality of individual frames 1502 to be pre-compressed as a unit.
- the sub-blocks 1504 ( 1 ) may be stored in the input block buffer 1402 to process as a unit or block of image data.
- the three-dimensional pre-compressor 1500 may be configured to process three-dimensional blocks of image data or sub-blocks of multiple frames 1504 .
- the three-dimensional pre-compressor 1500 may be pre-compressing the sub-block 1504 ( 3 ) over multiple frames.
- the sub-blocks 1504 ( 1 ) may be accessed or received by one of a plurality of three-dimensional transform processors 1506 .
- Each of the three-dimensional transform processors 1506 ( 1 )-(K) may include a three-dimensional transform engine (such as three-dimensional forward transform engine 1404 ), a three-dimensional frequency domain frequency response shaping engine (such as the three-dimensional frequency domain frequency response shaping engine 1406 ), and a three-dimensional inverse transform engine (such as the three-dimensional inverse transform engine 1408 ).
- the sub-block 1504 ( 1 ) is being processed by the three-dimensional transform processors 1506 ( 1 ).
- the other three-dimensional transform processors 1506 ( 2 )- 1506 (K) may pre-compress the corresponding sub-blocks 1504 ( 2 )- 1504 (K).
- the three-dimensional transform processor 1506 ( 1 ) may convert pixels of the sub-block 1504 ( 1 ) into the frequency domain, quantize a frequency domain representation of the pixels of the sub-block 1504 ( 1 ), and convert the frequency domain representation of the pixels of the sub-block 1504 ( 1 ) back into the pixel domain following quantization.
- the three-dimensional transform processor 1506 ( 1 ) may generate a pre-compressed sub-block 1508 ( 1 ) that is a substantially visually equivalent representation of the sub-block 1504 ( 1 ) when viewed by a human.
- the pre-compressed sub-block 1508 ( 1 ) may result in an improved rate of compression over the sub-block 1504 ( 1 ) when compressed by a video encoder.
- the size of a frame sub-block 1504 is taken as the size of a block in the three-dimensional transform. In some cases, the size of the sub-blocks processed using the three-dimensional transform may be equal to the size of a frame 1502 . It should also be understood that the multiple frames 1502 may include more than four sub-blocks.
- FIG. 16 illustrates example components of an electronic device that may be configured to perform pre-compression according to some implementations.
- a dimensional pre-compressor 1600 may be formed in software.
- the dimensional pre-compressor 1600 may include processing resources, as represented by processors 1602 , and computer-readable storage media 1604 .
- the computer-readable storage media 1604 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
- Such memory includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium which can be used to store the desired information and which can be accessed by a computing device.
- the dimensional pre-compressor 1600 may also include one or more communication interfaces 1606 , which may support both wired and wireless connection to various networks, such as cellular networks, radio (e.g., radio-frequency identification RFID), WiFi networks, short-range or near-field networks (e.g., Bluetooth®), infrared signals, local area networks, wide area networks, the Internet, and so forth.
- the communication interfaces 1606 may allow the dimensional pre-compressor 1600 to receive image data, such as video sequences, frames, or still images.
- the communication interfaces 1606 may also allow the dimensional pre-compressor 1600 to send the output data (e.g., the pre-compressed frames) to a video encoder or remote receiver device.
- modules, sets of instructions, data stores, and so forth may be stored within the computer-readable media 1604 and configured to execute on the processors 1602 .
- For example, the modules may include a dimensional forward transform module 1608 , a dimensional transform space shaping module 1610 , and a dimensional reverse transform module 1612 , as well as other modules.
- the computer-readable media 1604 may store data, such as store input pixel or block data 1612 (e.g., the original video sequences or images), output pixel or block data 1614 (e.g., the pre-compressed video sequences or images), one or more contrast sensitivity functions (e.g., the contrast sensitivity of the human eye), one or more transforms 1618 (e.g., one, two, or three dimensional transforms in the forward or reverse direction including wavelet transforms, the Discrete Cosine Transform, the Karhunen-Loeve transform, or other linear transforms), various block sizes associated with the one or more transforms 1618 , and one or more shaping functions.
- the dimensional forward transform module 1608 may apply a forward transform to generate a frequency domain representation of an input image or video sequence.
- the dimensional transform space shaping module 1610 may quantize and/or dequantize the frequency domain representation of an input image to remove information from the input image or video sequence that is substantially imperceptible to the human eye, using a function representative of the contrast sensitivity of the human eye.
- the dimensional reverse transform module 1612 may apply a reverse transform to generate a pixel domain representation of the output of the dimensional transform space shaping module 1610 .
Abstract
A system configured to perform pre-compression on video sequences within a transform space to improve the compressibility of the video sequences during standard video encoding. In some cases, the pre-compression is configured to prevent the introduction of perceivable distortion into the video sequence or to substantially minimize the introduction of perceivable distortion. In some examples, a transform-domain video processor may pre-compress or pre-process the video sequence in one, two, or three dimensional blocks or sequences using models of human visual contrast sensitivity.
Description
- This application is a continuation of and claims priority to U.S. application Ser. No. 15/091,625, filed on Apr. 6, 2016 and entitled “METHOD AND APPARATUS TO PROCESS VIDEO SEQUENCES IN TRANSFORM SPACE,” which is a non-provisional of and claims priority to U.S. Provisional Application Ser. No. 62/143,648, filed on Apr. 6, 2015, entitled “METHOD AND APPARATUS TO PROCESS VIDEO SEQUENCES IN TRANSFORM SPACE”, the entireties of which are incorporated herein by reference.
- Pre-filtering is used in video encoding to remove undesirable noise from video sources. For example, a plurality of video frames are processed by a pre-filter to produce a plurality of filtered video frames. The plurality of filtered video frames are then compressed by a video encoder. Without a pre-filter, the noise degrades the performance of a video encoder by wasting a number of bits to represent the noise itself, and by introducing encoding artifacts such as blocking and ringing noise.
- Existing pre-filtering solutions are implemented in two ways, as a finite impulse response filter operating over the entire frame in pixel space, or as a finite impulse response filter operating in temporal space over a selected region of fast motion discovered by means of a motion estimation operation. A finite impulse response filter converts an input sequence in pixel space into a filtered output sequence equal in number to the input sequence, by processing through a sequence of multiply-add operations.
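The multiply-add structure of such a finite impulse response pre-filter can be sketched as follows. This is an illustration only; the 3-tap low-pass kernel is a hypothetical example, not a tap set from this disclosure.

```python
import numpy as np

def fir_filter(pixels, taps):
    # Each output sample is a weighted sum (multiply-add) of neighboring
    # input samples; mode='same' keeps the output equal in length to the input.
    return np.convolve(pixels, taps, mode='same')

# Hypothetical symmetric 3-tap smoothing kernel.
taps = np.array([0.25, 0.5, 0.25])
filtered = fir_filter(np.array([0.0, 0.0, 4.0, 0.0, 0.0]), taps)
```

An isolated noise spike is spread and attenuated by the taps, while the output sequence stays equal in number to the input sequence, as described above.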
- Construction of a finite impulse response filter solution implementing a specified frequency response requires construction of a finite set of taps representing the inverse Discrete Fourier Transform of the desired frequency response, substantially decreasing the possibility of constructing an arbitrary noise removal function.
- The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical components or features.
-
FIG. 1 illustrates an example block diagram of a system for compressing video sequences using a transform-domain video processor to pre-compress the video sequences and a video encoder to compress the pre-compressed video sequences according to some implementations. -
FIG. 2 illustrates an example graphical representation of transform coefficients from a family of equally-spaced transforms usable to pre-compress a video sequence according to some implementations. -
FIG. 3 illustrates an example graphical representation of transform coefficients from a family of unequally-spaced transforms usable to pre-compress a video sequence according to some implementations. -
FIG. 4 illustrates an example graphical representation of multiplicative constants in a one-dimensional transform domain associated with a one-dimensional description of human visibility as a function of frequency according to some implementations. -
FIG. 5 illustrates an example graphical representation of quantizing values in a one-dimensional transform domain associated with a one-dimensional description of human contrast sensitivity as a function of frequency according to some implementations. -
FIG. 6 illustrates an example of a one-dimensional pre-compressor for pre-compressing a video sequence according to some implementations. -
FIG. 7 illustrates an example of a one-dimensional processor for pre-compressing a video sequence according to some implementations. -
FIG. 8 illustrates an example graphical representation of multiplicative constants in a two-dimensional transform domain associated with a two-dimensional description of human visibility as a function of frequency according to some implementations. -
FIG. 9 illustrates an example representation of a two-dimensional description of human contrast sensitivity as a function of frequency according to some implementations. -
FIG. 10 illustrates an example of a two-dimensional processor for pre-compressing a video sequence according to some implementations. -
FIG. 11 illustrates an example of a two-dimensional pre-compressor for pre-compressing a video sequence according to some implementations. -
FIG. 12 illustrates an example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations. -
FIG. 13 illustrates another example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations. -
FIG. 14 illustrates an example of a three-dimensional processor for pre-compressing a video sequence according to some implementations. -
FIG. 15 illustrates an example of a three-dimensional pre-compressor for pre-compressing a video sequence according to some implementations. -
FIG. 16 illustrates example components of an electronic device that may be configured to perform pre-compression according to some implementations. - This disclosure includes techniques and implementations for pre-compressing image data, including spatiotemporal three-dimensional video sequences, to improve compression rates by a video encoder. For example, rather than pre-filtering the image data in the pixel domain space and/or the temporal space, which may result in a perceivable reduction in video quality as data is removed, the implementations described herein pre-compress the image data in the transform domain space and/or the frequency domain space using functions representative of the contrast sensitivity of the human eye to configure the image data for compression in a manner that results in changes in the data that are substantially imperceptible to the human eye.
- In some examples, the image data may be pre-compressed as a series of frames in one-dimensional, two-dimensional, or three-dimensional spaces. For instance, a one-dimensional processor may pre-compress the image data by processing a series of adjacent pixels, such as a row of pixels within a frame, a column of pixels within a frame, or a series of pixels having a shared coordinate within each of a plurality of frames. In another instance, a two-dimensional processor may pre-compress the image data by processing blocks of pixels within a frame. In yet another instance, a three-dimensional processor may pre-compress the image data by processing a series of blocks of pixels having a shared coordinate within each of a plurality of frames.
- In one specific example, a dimensional processor may apply a noise filtering function specified in transform domain to an input video sequence. In some cases, the transform domain may comprise a combination of wavelet transform, the Discrete Cosine Transform, the Karhunen-Loeve transform, or other linear transforms in one to three dimensions. The noise filtering function may include a white noise filter, a pink-noise filter, a band-pass filter or other filter function. In this example, the dimensional processor may generate an output video sequence that is substantially similar to the input video sequence when viewed by a human (e.g., the output video sequence differs from the input video sequence in ways that are imperceptible to the human eye). However, the output video sequence results in improved compression when compared with the input video sequence following compression by a video encoder.
- In another specific example, a frequency response function internal to the domain processor may remove subjectively redundant visual information by calculating optimal visually-weighted quantizers corresponding to the decorrelating-transformed block decomposition of a sequence of video images. A function representative of the contrast sensitivity of the human eye to the actual time-varying transform-domain frequency of each transform component may be calculated. The resolution of the transformed data (e.g., the transform representation of the original video sequence) is reduced by the calculated function representative of the contrast sensitivity of the human eye. For example, the dimensional processor may implement a three-dimensional Discrete Cosine Transform as a decorrelating transform.
-
FIG. 1 illustrates an example block diagram of a system 100 for compressing video sequences 102 using a transform domain pre-compressor 104 that includes a transform-domain dimensional engine (not shown) and a transform space shaping engine to pre-compress the video sequences 102 and a video encoder 108 to compress the pre-compressed video sequences 106 according to some implementations. In some cases, an input video sequence 102 including a plurality of individual frames 110 may be received at an image buffer (not shown). The video sequence 102 may then be pre-compressed by the transform domain pre-compressor 104 in one, two, or three dimensions. For example, the video sequence may be processed by the transform domain pre-compressor 104 in one dimension, as a series of adjacent pixels (such as a row of pixels within a frame, a column of pixels within a frame, or a series of pixels having a shared coordinate within each of a plurality of frames), in two dimensions, as blocks of pixels within a frame, or in three dimensions, as a series of blocks of pixels having a shared coordinate within each of a plurality of frames. - The transform domain pre-compressor 104 may process the
frames 110 in the transform domain space or frequency domain space using a function representative of the contrast sensitivity of the human eye to alter the frames in a manner visually imperceptible to the human eye. Thus, after processing, the resulting pre-compressed video sequence 106 has a plurality of frames 112 with visual quality substantially equivalent to the visual quality of the frames 110 of the input video sequence 102 . - The
video encoder 108 receives the frames 112 of the pre-compressed video sequence 106 from the transform domain pre-compressor 104 . The video encoder 108 compresses the frames 112 of the pre-compressed video sequence 106 into a compressed representation of the frames 112 , referred to herein as compressed frames 114 . It should be understood that the compressed frame 114 is smaller than a compressed representation of the frame 110 of the input video sequence 102 . Further, unlike traditional pixel space or temporal space pre-filtering, the video quality is substantially maintained as, for example, pixels are not removed from the frames 110 as is the case when pre-filtering in the pixel domain. Further, the improved compression rates are maintained even when conventional pre-filtering in the pixel space or temporal space is applied to the frames 112 before compression by the video encoder 108 . Additionally, pre-filtering in the pixel space or the temporal space may be applied prior to the pre-compression of the frames 110 by the transform domain pre-compressor 104 to further improve compression rates during compression by the video encoder 108 . -
FIG. 2 illustrates an example graphical representation 200 of transform coefficients from a family of equally-spaced transforms usable to pre-compress a video sequence according to some implementations. For instance, some examples of evenly-spaced domains may include the Discrete Cosine Transform (DCT) or the Karhunen-Loeve Transform (KLT). - In the illustrated example, a desired frequency response, generally indicated by
graph 202, may be approximated by specifying an amplitude per evenly-spaced transform domain component frequency, generally indicated by thegraph 204. Alternatively, the desiredfrequency response 202 may be approximated by specifying a quantizer calculated as the inverse of the amplitude per evenly-spaced transform domain component frequency, generally indicated by thegraph 206. Thus, as illustrated, the amplitude of the desiredfrequency response 202 increases, the number of each quantizer F0-Fn-1 increases and the size of each quantizer F0-Fn-1 decreases. - In this example, when a dimensional processor pre-compresses the video sequence based on the desired
frequency response 202 , the image data of the video sequence is divided by the corresponding quantizer F0-Fn-1 and rounded to the nearest integer in the domain space. For example, quantizer F0 is a relatively large quantizer that allows for few steps or output data values, as the desired frequency response 202 is low at F0 in the domain space. As such, more information related to the video sequence is maintained where the desired frequency response 202 is high than at points where it is low. Thus, when the desired frequency response 202 is a function of the contrast sensitivity of the human eye, more information associated with the video sequence is maintained at points along the contrast sensitivity function at which the human eye is able to decipher more information. -
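The divide-and-round behavior described above can be sketched in a few lines of Python. The quantizer values here are hypothetical, chosen only to illustrate how a coarse quantizer discards a small amplitude while a fine quantizer preserves a large one.

```python
import numpy as np

def quantize_dequantize(coeffs, quantizers):
    # Divide each frequency amplitude by its quantizer, round to the
    # nearest integer, then multiply back. A large quantizer (where the
    # desired response is low) leaves few steps and can zero small
    # amplitudes entirely.
    return np.round(coeffs / quantizers) * quantizers

# A large low-frequency amplitude survives a fine quantizer; a small
# high-frequency amplitude is zeroed by a coarse one.
result = quantize_dequantize(np.array([100.0, 7.0]), np.array([2.0, 16.0]))
```

The zeroed and coarsely rounded amplitudes are what make the pre-compressed data cheaper for a downstream encoder to represent.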
FIG. 3 illustrates an example graphical representation 300 of transform coefficients from a family of unequally-spaced transforms usable to pre-compress a video sequence according to some implementations. For example, the family of wavelet transforms is an example of an unevenly-spaced transform domain, which may be utilized by a dimensional processor to pre-compress video sequences or other image data. - In the illustrated example, an unevenly-spaced frequency transform, generally indicated by
graph 302 , may again be representative of a contrast sensitivity of the human eye. In the current example, the unevenly-spaced frequency transform 302 may be approximated by specifying an amplitude per unevenly-spaced transform domain component frequency, generally indicated by graph 304 . Alternatively, the desired frequency response 302 may be approximated by specifying a quantizer calculated as the inverse of the amplitude per unevenly-spaced transform domain component frequency, generally indicated as 306 . Thus, as described above with respect to FIG. 2 , when the desired frequency response 302 is high, more information associated with the video sequence is maintained, thereby maintaining more information at points along the contrast sensitivity function at which the human eye is able to decipher more information. -
FIG. 4 illustrates an example graphical representation of multiplicative constants 400 in a one-dimensional transform domain associated with a one-dimensional description of human visibility as a function of frequency according to some implementations. In the illustrated example, a reversible linear discrete transform, the Discrete Cosine Transform 402 , is selected: -
Xu = Σ (i=0 to N-1) xi·cos[π(2i+1)u/(2N)]
- The reversible linear
discrete transform 402 maps to a set of discrete frequencies, generally indicated by 404. Thefrequencies 404 may be either evenly-spaced or unevenly spaced, as described above with respect toFIGS. 2 and 3 . A normalized human visualsystem transfer function 406 is defined in frequency space such that substantially perfect reproduction of a video sequence following pre-compression is defined as unity: -
CSF(u) = 2.6*(0.0192 + 0.114u)*e^(-(0.114u)^1.1) - where u is the frequency of the visual stimulus in cycles per degree and CSF(u) is the relative sensitivity of the human eye to the frequency u. In this example, the normalized human visual
system transfer function 406 may be sampled at thediscrete frequencies 404 of the reversible lineardiscrete transform 402 to generate themultiplicative constants 400. It should be understood, that onemultiplicative constant 400 may be generated per saiddiscrete frequency 404. -
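As a sketch of how the multiplicative constants 400 could be produced, the snippet below samples the contrast sensitivity formula given above at a list of discrete frequencies and normalizes the peak to unity. The peak normalization is an assumption for illustration; the excerpt only states that perfect reproduction corresponds to unity.

```python
import math

def csf(u):
    """Contrast sensitivity of the human eye at frequency u (cycles per degree)."""
    return 2.6 * (0.0192 + 0.114 * u) * math.exp(-((0.114 * u) ** 1.1))

def multiplicative_constants(freqs):
    """One constant per discrete transform frequency (see FIG. 4), peak-normalized."""
    raw = [csf(u) for u in freqs]
    peak = max(raw)
    return [r / peak for r in raw]
```

The curve peaks in the mid frequencies, so mid-band amplitudes are preserved while very low and very high frequencies are attenuated.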
FIG. 5 illustrates an example graphical representation of quantizing values 500 in a one-dimensional transform domain associated with a one-dimensional description of human contrast sensitivity as a function of frequency according to some implementations. In the current example, a dimensional pre-compressor may assign fixed quantizer values 500 to each individual frequency amplitude F0-Fn-1. - In the illustrated example, a reversible linear discrete transform 502 is selected based at least in part to result in a mapping to a set of discrete frequencies 504. The set of discrete frequencies 504 may be either evenly-spaced or unevenly-spaced, as discussed above with respect to FIGS. 2 and 3. In this example, the scaled inverse of the normalized human visual system contrast sensitivity function 506 may be defined in transform space or frequency space as: -
- where N is the transform block size, u is the frequency of the visual stimulus in cycles per degree, and Q(u) is the quantizer value associated with the frequency u. In the illustrated example, the scaled inverse normalized human visual system contrast sensitivity function 506 may be sampled at the discrete frequencies F0-Fn-1 504 of the reversible linear discrete transform 502, one sample per discrete frequency F0-Fn-1. In the illustrated example, the sampling at the discrete frequencies F0-Fn-1 504 generates the quantizer value for each sampled frequency. For instance, in the current example, the scaled inverse normalized human visual system contrast sensitivity function 506 may be sampled at the discrete frequency F4, generally indicated by the line 508, to generate the quantizer value at F4. -
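The quantizer derivation above can be sketched as a scaled inverse of the sensitivity curve. The exact scaling in the excerpt (which involves the block size N) is not reproduced by the omitted equation image, so the `scale` constant below is an illustrative assumption.

```python
import math

def csf(u):
    """Contrast sensitivity of the human eye at frequency u (cycles per degree)."""
    return 2.6 * (0.0192 + 0.114 * u) * math.exp(-((0.114 * u) ** 1.1))

def quantizer_values(freqs, scale=16.0):
    """Quantizer per discrete frequency F0..Fn-1 as a scaled inverse of the CSF.
    `scale` is illustrative, not the patent's exact scaling."""
    peak = max(csf(u) for u in freqs)
    return [scale * peak / csf(u) for u in freqs]
```

Frequencies where the eye is most sensitive receive the smallest quantizers (the least information discarded); insensitive frequencies receive large quantizers.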
FIG. 6 illustrates an example of a one-dimensional processor 600 for pre-compressing a video sequence according to some implementations. The one-dimensional processor 600 may include a linear pixel input buffer 602, a forward transform engine 604, a frequency domain frequency response shaping engine 606, an inverse transform engine 608, and a linear pixel output buffer 610. - During operation, the one-
dimensional processor 600 receives input pixels 612 and stores the input pixels in the input pixel buffer 602 prior to pre-compression. The input pixels 612 may be associated with a frame or multiple frames of a video sequence, a still image, or a photograph. The input pixels 612 may be a series of adjacent pixels, such as a row or column of an image, photograph, or frame of a video sequence. In some cases, the input pixels 612 may include a series of adjacent pixels over multiple frames of the video sequence, such as pixels having the same coordinate in each sequential frame of the video sequence. - Once the
input pixels 612 are buffered and arranged in a desired grouping, the forward transform engine 604 may receive a reversible linear forward transform 614 selected as the transform for the pre-compression operations. In some cases, the reversible linear forward transform 614 may be a stored internal transform that is fixed relative to the pixel input 612, while in other cases, the reversible linear forward transform 614 may be selected per buffering or per video sequence. For instance, the forward transform 614 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform). - In some examples, the
forward transform engine 604 may be configured to perform the reversible linear forward transform 614 on the input pixels 612 stored in the linear pixel input buffer 602. The operation of the reversible linear forward transform 614 on the input pixels 612 generates a frequency domain representation of the input pixels 612 stored in the linear pixel input buffer 602. - The frequency domain frequency
response shaping engine 606 receives the frequency domain representation of the input pixels 612. For example, the frequency domain frequency response shaping engine 606 may be configured to apply a frequency response shape function 616 to the frequency domain representation of the input pixels 612 while stored in the linear pixel input buffer 602. The frequency response shape function 616 may cause the frequency domain frequency response shaping engine 606 to apply a multiplicative constant to individual frequency amplitudes of the frequency domain representation of the input pixels 612. In another example, the frequency response shape function 616 may cause the frequency domain frequency response shaping engine 606 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input pixels 612. - The
reverse transform engine 608 may be configured to receive a reverse transform 618 to convert the frequency domain representation of the input pixels 612 to the pixel space. For example, the reverse transform engine 608 may be configured to perform the reverse transform 618 on an output of the frequency domain frequency response shaping engine 606 to generate the output pixels 620, a pixel domain representation of the output of the frequency domain frequency response shaping engine 606. The output pixels 620 may be stored in the linear pixel output buffer 610 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output pixels 620 for transmission or further processing. -
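The flow through the forward transform engine 604, the shaping engine 606, and the inverse transform engine 608 can be sketched as follows. This is a minimal model rather than the patent's circuit: it assumes an orthonormal DCT and uses the multiplicative-constant form of the shape function 616.

```python
import math

def _dct_matrix(n):
    """Rows are the orthonormal DCT-II basis vectors (one frequency u per row)."""
    return [[math.sqrt((1.0 if u == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
             for i in range(n)] for u in range(n)]

def precompress_1d(pixels, shape):
    """Forward transform, per-frequency multiplicative shaping, inverse transform."""
    n = len(pixels)
    m = _dct_matrix(n)
    freq = [sum(m[u][i] * pixels[i] for i in range(n)) for u in range(n)]
    shaped = [f * s for f, s in zip(freq, shape)]
    # The inverse of an orthonormal transform is its transpose.
    return [sum(m[u][i] * shaped[u] for u in range(n)) for i in range(n)]
```

With an all-ones shape the pipeline is the identity; attenuating high-frequency constants removes detail the shaping function deems invisible, which is what improves the downstream encoder's compression rate.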
FIG. 7 illustrates an example of a one-dimensional pre-compressor 700 for pre-compressing a video sequence according to some implementations. For example, a buffer, such as the linear input buffer 602 of FIG. 6 above, may store a video sequence 702 including a plurality of individual frames 704 to be pre-compressed. As discussed above, the one-dimensional pre-compressor 700 may be configured to process a series of adjacent pixels from within the frames 704 of the video sequence 702. In the illustrated example, a temporal line of pixels is being processed as a unit. Thus, in the current example, one pixel from a fixed location or coordinate within each of the frames 704 may be selected to form the set of pixels 706. - The set of
pixels 706 may then be accessed or received by one of a plurality of one-dimensional transform processors 708. In the illustrated example, each of the one-dimensional transform processors 708 may include a one-dimensional transform engine (such as the forward transform engine 604), a one-dimensional frequency domain frequency response shaping engine (such as the frequency domain frequency response shaping engine 606), and a one-dimensional inverse transform engine (such as the inverse transform engine 608). In the illustrated example, the set of pixels 706 may be pre-compressed by the one-dimensional transform processor 708(1) to generate a set of output pixels 710 representative of the same visual content as the set of pixels 706 in the video sequence 702 but in a manner that, when compressed by a video encoder, results in an improved rate of compression. - Additionally, as illustrated in the current example, the one-
dimensional processor 700 may include a plurality of one-dimensional transform processors 708(1)-(K). Thus, each of the one-dimensional transform processors 708(1)-(K) may process a set of pixels, such as the set of pixels 706, in parallel. In the illustrated example, the one-dimensional transform processor 708(K) is configured to receive a set of pixels 712 of a second set of frames 714 of a second video sequence 716 (such as additional frames of the same video as the video sequence 702 or another video altogether). The one-dimensional transform processor 708(K) may pre-compress the set of pixels 712 into a set of pre-compressed output pixels 718 that may result in an improved rate of compression over the set of pixels 712 when compressed by a video encoder. - While the current example illustrates one-dimensional transform processors 708(1) and 708(K) processing sets of pixels from different video sequences and different frames, any combination of sets of pixels may be processed in parallel by the one-dimensional pre-compressor 700. -
FIG. 8 illustrates an example graphical representation of multiplicative constants 800 in a two-dimensional transform domain associated with a two-dimensional description of human visibility as a function of frequency according to some implementations. In the current example, a dimensional pre-compressor may assign multiplicative constants 800 to each individual frequency amplitude of a normalized two-dimensional human visual system transfer function. - For example, a reversible two-dimensional linear
discrete transform 802 may be selected to map to a two-dimensional set of discrete frequencies 804. The discrete frequencies 804 may be mapped using either an evenly-spaced or an unevenly-spaced function in either of the two dimensions. A normalized human visual system transfer function 806 may be defined in the two-dimensional frequency space such that substantially perfect reproduction of visual data as detected by a human eye may be defined as unity: -
CSF(u,v) = 2.6*(0.0192 + 0.114*√(u*u + v*v))*e^−(0.114*√(u*u + v*v))^1.1 - where u and v are each associated with a frequency of the visual stimulus in cycles per degree in either the pixel or line direction and CSF(u,v) is defined as a relative sensitivity of the human eye to the joint frequency u and v.
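The two-dimensional formula can be sketched directly, reading the attenuation exponent as the same single negative term used in the one-dimensional CSF above (the extracted text shows a doubled sign that appears to be a transcription artifact):

```python
import math

def csf_2d(u, v):
    """Two-dimensional CSF: the 1-D curve applied to the radial (joint) frequency."""
    r = math.sqrt(u * u + v * v)
    return 2.6 * (0.0192 + 0.114 * r) * math.exp(-((0.114 * r) ** 1.1))
```

Because the function depends only on the radial frequency √(u² + v²), it is radially symmetric: a pixel-direction stimulus and a line-direction stimulus of the same frequency receive the same sensitivity.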
- As shown in the illustrated example, the normalized two-dimensional human visual system transfer function is non-linear with respect to pixel-direction frequencies (u) and line-direction frequencies (v). Thus, in some cases, the normalized two-dimensional human visual system transfer function 806 is sampled at each frequency of the two-dimensional set of discrete frequencies 804. For instance, the illustrated example depicts a graph 808 of the normalized two-dimensional human visual system transfer function. As shown, the graph 808 illustrates a typical grid of frequencies at which sampling of the two-dimensional human visual system transfer function 806 takes place. -
FIG. 9 illustrates an example representation of a two-dimensional description of human contrast sensitivity as a function of frequency according to some implementations. In the illustrated example, a two-dimensional system contrast sensitivity function 900 is defined in terms of viewing conditions. For example, the viewing conditions may include an expected average ambient luminance I 902 and additional variables: u 904, a temporal frequency; X0 906 (e.g., the angle subtended by a DCT block); and Xmax 908 (e.g., the angle subtended by the display surface). -
FIG. 10 illustrates an example of a two-dimensional processor 1000 for pre-compressing a video sequence according to some implementations. The two-dimensional processor 1000 may include a frame sub-block input buffer 1002, a forward two-dimensional transform engine 1004, a two-dimensional frequency domain frequency response shaping engine 1006, a two-dimensional inverse transform engine 1008, and a frame sub-block output buffer 1010. - During operation, the two-
dimensional processor 1000 receives input blocks 1012 and stores the input blocks in the input block buffer 1002 prior to pre-compression. The input blocks 1012 may be associated with a block of pixels of a frame of a video sequence, a still image, or a photograph. Once the input blocks 1012 are buffered, the two-dimensional forward transform engine 1004 may receive a reversible two-dimensional linear forward transform 1014 selected as the transform for the pre-compression operations. In some cases, the reversible two-dimensional linear forward transform 1014 may be a stored internal transform that is fixed relative to the input blocks 1012, while in other cases, the reversible two-dimensional linear forward transform 1014 may be selected per buffering or per video sequence. For instance, the reversible two-dimensional forward transform 1014 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform). - In some examples, the two-dimensional
forward transform engine 1004 may be configured to perform the reversible two-dimensional linear forward transform 1014 on the input blocks 1012 stored in the frame sub-block input buffer 1002. The operation of the reversible two-dimensional linear forward transform 1014 on the input blocks 1012 generates a frequency domain representation of the input blocks 1012 stored in the input buffer 1002. - The two-dimensional frequency domain frequency
response shaping engine 1006 receives the frequency domain representation of the input blocks 1012. For example, the two-dimensional frequency domain frequency response shaping engine 1006 may be configured to apply a two-dimensional frequency response shape function 1016 to the two-dimensional frequency domain representation of the input blocks 1012 while stored in the input buffer 1002. The two-dimensional frequency response shape function 1016 may cause the two-dimensional frequency domain frequency response shaping engine 1006 to apply a multiplicative constant to individual frequency amplitudes of the two-dimensional frequency domain representation of the input blocks 1012. In another example, the two-dimensional frequency response shape function 1016 may cause the two-dimensional frequency domain frequency response shaping engine 1006 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input blocks 1012. - The two-dimensional
reverse transform engine 1008 may be configured to receive a two-dimensional reverse transform 1018 to convert the two-dimensional frequency domain representation of the input blocks 1012 to the pixel space. For example, the two-dimensional reverse transform engine 1008 may be configured to perform the two-dimensional reverse transform 1018 on an output of the two-dimensional frequency domain frequency response shaping engine 1006 to generate output blocks 1020, a pixel domain representation of the output of the two-dimensional frequency domain frequency response shaping engine 1006. The output blocks 1020 may be stored in the output buffer 1010 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output blocks 1020 for transmission or further processing. -
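The two-dimensional flow of FIG. 10 can be sketched with a separable orthonormal DCT and the quantize/dequantize form of the shaping function 1016. The separable transform and the round-to-nearest quantizer are illustrative assumptions, not the patent's stated implementation.

```python
import math

def _dct_matrix(n):
    """Orthonormal DCT-II basis matrix (one frequency per row)."""
    return [[math.sqrt((1.0 if u == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
             for i in range(n)] for u in range(n)]

def _rows(m, block):
    """Transform each row of `block` by matrix `m`."""
    n = len(m)
    return [[sum(m[u][i] * row[i] for i in range(n)) for u in range(n)] for row in block]

def _t(block):
    return [list(r) for r in zip(*block)]

def precompress_block(block, q):
    """2-D forward DCT, quantize then dequantize each coefficient, inverse DCT."""
    m = _dct_matrix(len(block))
    f = _t(_rows(m, _t(_rows(m, block))))            # rows, then columns
    f = [[round(c / qv) * qv for c, qv in zip(frow, qrow)]
         for frow, qrow in zip(f, q)]                # the lossy shaping step
    mt = _t(m)                                       # orthonormal: inverse = transpose
    return _t(_rows(mt, _t(_rows(mt, f))))
```

Large quantizer entries discard coefficient precision the eye cannot use; small entries (tending toward zero) make the pipeline approach a lossless round trip.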
FIG. 11 illustrates an example of a two-dimensional pre-compressor 1100 for pre-compressing a video sequence according to some implementations. For example, a buffer, such as the input block buffer 1002 of FIG. 10 above, may store a video sequence including a plurality of individual frames 1102 to be pre-compressed. As discussed above, the two-dimensional pre-compressor 1100 may be configured to process blocks of image data or sub-blocks 1104 of a frame. In the illustrated example, the two-dimensional pre-compressor 1100 may be pre-compressing the sub-block 1104(3). - The sub-block 1104(3) may be accessed or received by one of a plurality of two-
dimensional transform processors 1106. Each of the two-dimensional transform processors 1106 may include a two-dimensional transform engine (such as the two-dimensional forward transform engine 1004), a two-dimensional frequency domain frequency response shaping engine (such as the two-dimensional frequency domain frequency response shaping engine 1006), and a two-dimensional inverse transform engine (such as the two-dimensional inverse transform engine 1008). In the illustrated example, the sub-block 1104(3) is being processed by the two-dimensional transform processor 1106(3). Thus, in one specific example, the other two-dimensional transform processors 1106(1), 1106(2), and 1106(4)-1106(K) may pre-compress the corresponding sub-blocks 1104(1), 1104(2), and 1104(4)-1104(K). - In one example, the two-dimensional transform processor 1106(3) may convert pixels of the sub-block 1104(3) into the frequency domain, quantize a frequency domain representation of the pixels of the sub-block 1104(3), and convert the frequency domain representation of the pixels of the sub-block 1104(3) back into the pixel domain following quantization. Thus, the two-dimensional transform processor 1106(3) may generate a pre-compressed sub-block 1108(3) that may be a substantially visibly equivalent representation of the sub-block 1104(3) when viewed by a human. The pre-compressed sub-block 1108(3) may result in an improved rate of compression over the sub-block 1104(3) when compressed by a video encoder.
- In the illustrated example, it should be understood that the size of a frame sub-block 1108 is taken as the size of a block in the two-dimensional transform. In some cases, the size of the sub-blocks processed using the two-dimensional transform may be equal to a size of a frame of the video sequence 1102. -
FIG. 12 illustrates an example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations. For example, FIG. 12 shows a formulation of a three-dimensional human vision system transfer function 1202 which may be used to generate or assign multiplicative constants to individual frequency amplitudes in three dimensions. In general, the normalized three-dimensional human visual system transfer function is not separable by pixel frequency u, line frequency v, or frame rate w and, thus, requires pre-compressing using a three-dimensional pre-compressor or using a three-dimensional domain transform. -
FIG. 13 illustrates another example representation of a three-dimensional description of human visibility as a function of frequency according to some implementations. For instance, the illustrated example shows a formulation of a three-dimensional system contrast sensitivity function 1300. The three-dimensional system contrast sensitivity function 1300 may be defined in terms of viewing conditions. In some cases, the viewing conditions may include an expected average ambient luminance I 1302 and additional variables: s 1304, a spatial frequency; w 1306, a temporal frequency; X0 1308 (e.g., the angle subtended by a DCT block); and Xmax 1310 (e.g., the angle subtended by the display surface). -
FIG. 14 illustrates an example of a three-dimensional processor 1400 for pre-compressing a video sequence according to some implementations. The three-dimensional processor 1400 may include a frame sub-block input buffer 1402, a forward three-dimensional transform engine 1404, a three-dimensional frequency domain frequency response shaping engine 1406, a three-dimensional inverse transform engine 1408, and a frame sub-block output buffer 1410. - During operation, the three-
dimensional processor 1400 receives three-dimensional input blocks 1412 and stores the input blocks in the input block buffer 1402 prior to pre-compression. The input blocks 1412 may be associated with a block of pixels of a plurality of frames of a video sequence. Once the input blocks 1412 are buffered, the three-dimensional forward transform engine 1404 may receive a reversible three-dimensional linear forward transform 1414 selected as the transform for the pre-compression operations. In some cases, the reversible three-dimensional linear forward transform 1414 may be a stored internal transform that is fixed relative to the input blocks 1412, while in other cases, the reversible three-dimensional linear forward transform 1414 may be selected per buffering or per video sequence. For instance, the three-dimensional forward transform 1414 may be an evenly-spaced transform (e.g., the Discrete Cosine Transform or the Karhunen-Loeve Transform), or an unevenly-spaced transform (e.g., a wavelet transform). - In some examples, the three-dimensional
forward transform engine 1404 may be configured to perform the reversible three-dimensional linear forward transform 1414 on the input blocks 1412 stored in the input buffer 1402. The operation of the reversible three-dimensional linear forward transform 1414 on the input blocks 1412 generates a frequency domain representation of the input blocks 1412 stored in the input buffer 1402. - The three-dimensional frequency domain frequency
response shaping engine 1406 receives the frequency domain representation of the input blocks 1412. For example, the three-dimensional frequency domain frequency response shaping engine 1406 may be configured to apply a three-dimensional frequency response shape function 1416 to the three-dimensional frequency domain representation of the input blocks 1412 while stored in the input buffer 1402. The three-dimensional frequency response shape function 1416 may cause the three-dimensional frequency domain frequency response shaping engine 1406 to apply a multiplicative constant to individual frequency amplitudes of the three-dimensional frequency domain representation of the input blocks 1412. In another example, the three-dimensional frequency response shape function 1416 may cause the three-dimensional frequency domain frequency response shaping engine 1406 to apply a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the input blocks 1412. - The three-dimensional reverse transform engine 1408 may be configured to receive a three-
dimensional reverse transform 1418 to convert the three-dimensional frequency domain representation of the input blocks 1412 to the pixel space. For example, the three-dimensional reverse transform engine 1408 may be configured to perform the three-dimensional reverse transform 1418 on an output of the three-dimensional frequency domain frequency response shaping engine 1406 to generate output blocks 1420, a pixel domain representation of the output of the three-dimensional frequency domain frequency response shaping engine 1406. The output blocks 1420 may be stored in the output buffer 1410 to allow a remote component (e.g., a video encoder or other circuit/module) to access the output blocks 1420 for transmission or further processing. -
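One way to realize the reversible three-dimensional forward and reverse transforms 1414/1418 is a separable DCT applied along each axis of a cubic block (the shaping function itself need not be separable, as noted above for the three-dimensional CSF). The sketch below assumes an orthonormal DCT and cubic blocks; it is illustrative, not the patent's stated transform.

```python
import math

def _dct_matrix(n):
    """Orthonormal DCT-II basis matrix (one frequency per row)."""
    return [[math.sqrt((1.0 if u == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
             for i in range(n)] for u in range(n)]

def _axis0(m, cube):
    """Apply matrix `m` along the first axis of a cubic block."""
    n = len(cube)
    return [[[sum(m[u][i] * cube[i][j][k] for i in range(n))
              for k in range(n)] for j in range(n)] for u in range(n)]

def _cycle(cube):
    """Rotate axes (i, j, k) -> (j, k, i) so _axis0 can reach each axis in turn."""
    n = len(cube)
    return [[[cube[i][j][k] for i in range(n)] for k in range(n)] for j in range(n)]

def dct_3d(cube, inverse=False):
    """Separable 3-D DCT: one 1-D pass per axis; three axis rotations restore order."""
    m = _dct_matrix(len(cube))
    if inverse:
        m = [list(r) for r in zip(*m)]  # orthonormal: inverse = transpose
    for _ in range(3):
        cube = _cycle(_axis0(m, cube))
    return cube
```

Because the per-axis matrices commute across axes, applying the transposed matrix along each axis exactly inverts the forward pass, which is the reversibility the pre-compressor requires.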
FIG. 15 illustrates an example of a three-dimensional pre-compressor 1500 for pre-compressing a video sequence according to some implementations. For example, a buffer, such as the input block buffer 1402 of FIG. 14 above, may store multiple sub-blocks over a plurality of individual frames 1502 to be pre-compressed as a unit. For example, the sub-block 1504(1) may be stored in the input block buffer 1402 to process as a unit or block of image data. As discussed above, the three-dimensional pre-compressor 1500 may be configured to process three-dimensional blocks of image data or sub-blocks 1504 of multiple frames. In the illustrated example, the three-dimensional pre-compressor 1500 may be pre-compressing the sub-block 1504(1) over multiple frames. - The sub-block 1504(1) may be accessed or received by one of a plurality of three-
dimensional transform processors 1506. Each of the three-dimensional transform processors 1506(1)-(K) may include a three-dimensional transform engine (such as the three-dimensional forward transform engine 1404), a three-dimensional frequency domain frequency response shaping engine (such as the three-dimensional frequency domain frequency response shaping engine 1406), and a three-dimensional inverse transform engine (such as the three-dimensional inverse transform engine 1408). In the illustrated example, the sub-block 1504(1) is being processed by the three-dimensional transform processor 1506(1). Thus, in one specific example, the other three-dimensional transform processors 1506(2)-1506(K) may pre-compress the corresponding sub-blocks 1504(2)-1504(K). - In one example, the three-dimensional transform processor 1506(1) may convert pixels of the sub-block 1504(1) into the frequency domain, quantize a frequency domain representation of the pixels of the sub-block 1504(1), and convert the frequency domain representation of the pixels of the sub-block 1504(1) back into the pixel domain following quantization. Thus, the three-dimensional transform processor 1506(1) may generate a pre-compressed sub-block 1508(1) that may be a substantially visibly equivalent representation of the sub-block 1504(1) when viewed by a human. The pre-compressed sub-block 1508(1) may result in an improved rate of compression over the sub-block 1504(1) when compressed by a video encoder.
- In the illustrated example, it should be understood that the size of a frame sub-block 1504 is taken as the size of a block in the three-dimensional transform. In some cases, the size of the sub-blocks processed using the three-dimensional transform may be equal to a size of a frame 1502. It should also be understood that the multiple frames 1502 may include more than four sub-blocks. -
FIG. 16 illustrates example components of an electronic device that may be configured to perform pre-compression according to some implementations. For example, in some cases, a dimensional pre-compressor 1600 may be formed in software. Thus, in some cases, the dimensional pre-compressor 1600 may include processing resources, as represented by processors 1602, and computer-readable storage media 1604. The computer-readable storage media 1604 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Such memory includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium which can be used to store the desired information and which can be accessed by a computing device. - The dimensional pre-compressor 1600 may also include one or
more communication interfaces 1606, which may support both wired and wireless connection to various networks, such as cellular networks, radio (e.g., radio-frequency identification (RFID)), WiFi networks, short-range or near-field networks (e.g., Bluetooth®), infrared signals, local area networks, wide area networks, the Internet, and so forth. For example, the communication interfaces 1606 may allow the dimensional pre-compressor 1600 to receive image data, such as video sequences, frames, or still images. The communication interfaces 1606 may also allow the dimensional pre-compressor 1600 to send the output data (e.g., the pre-compressed frames) to a video encoder or remote receiver device. - Several modules, sets of instructions, data stores, and so forth may be stored within the computer-
readable media 1604 and configured to execute on the processors 1602. For example, the modules may include a dimensional forward transform module 1608, a dimensional transform space shaping module 1610, and a dimensional reverse transform module 1612, as well as other modules. In some implementations, the computer-readable media 1604 may store data, such as input pixel or block data 1612 (e.g., the original video sequences or images), output pixel or block data 1614 (e.g., the pre-compressed video sequences or images), one or more contrast sensitivity functions (e.g., the contrast sensitivity of the human eye), one or more transforms 1618 (e.g., one-, two-, or three-dimensional transforms in the forward or reverse direction, including wavelet transforms, the Discrete Cosine Transform, the Karhunen-Loeve transform, or other linear transforms), various block sizes associated with the one or more transforms 1618, and one or more shaping functions. - In some examples, the dimensional
forward transform module 1608 may apply a forward transform to generate a frequency domain representation of an input image or video sequence. The dimensional transform space shaping module 1610 may quantize and/or dequantize the frequency domain representation of the input image to remove information from the input image or video sequence that is substantially imperceptible to the human eye, using a function representative of the contrast sensitivity of the human eye. The dimensional reverse transform module 1612 may apply a reverse transform to generate a pixel domain representation of the output of the dimensional transform space shaping module 1610. - Although the subject matter has been described in language specific to structural features, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features described. Rather, the specific features are disclosed as illustrative forms of implementing the claims.
Claims (20)
1. A method comprising:
receiving a set of frames;
transforming the set of frames into a linearly equivalent transform representation of the set of frames;
determining transform components associated with a dimensional transform based at least in part on blocks of the visual data associated with the set of frames, each of the blocks having a specified transformation block size;
determining, based on a specified angular size and a resolution, visually quantizers by applying a spatial contrast sensitivity function to frequencies of the transform components;
quantizing the set of frames using the visually quantizers to generate a pre-compressed representation of the set of frames; and
performing an inverse transform operation on the pre-compressed representation of the set of frames to construct a pre-compressed set of frames, the pre-compressed set of frames compressible, by a video encoder, to a smaller size than the set of frames.
2. The method as recited in claim 1 , wherein the dimensional transform and the inverse transform operation are in the linearly equivalent transform space.
3. The method as recited in claim 1 , wherein the visually quantizers are determined at least in part based on an inverse of an amplitude of an unevenly-spaced transform domain component frequency.
4. The method as recited in claim 1 , wherein the dimensional transform is an unevenly-spaced frequency transform.
5. The method as recited in claim 1 , wherein the dimensional transform is an evenly-spaced frequency transform.
6. The method as recited in claim 1 , wherein the set of frames are part of a video sequence.
7. The method as recited in claim 1 , wherein the dimensional transform is a two-dimensional transform.
8. A method comprising:
receiving a set of frames;
generating a series of adjacent pixels from the set of frames;
generating a frequency domain representation of the series of pixels by applying a reversible linear forward transform to the series of adjacent pixels;
applying a frequency response shape function to the frequency domain representation of the series of pixels; and
performing an inverse transform operation on the frequency domain representation of the series of pixels to in part construct a pre-compressed set of frames, the pre-compressed set of frames compressible, by a video encoder, to a smaller size than the set of frames.
9. The method as recited in claim 8 , wherein applying the frequency response shape function to the frequency domain representation of the series of pixels includes applying a multiplicative constant to individual frequency amplitudes of the frequency domain representation of the series of pixels.
10. The method as recited in claim 8 , wherein applying the frequency response shape function to the frequency domain representation of the series of pixels includes applying a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the series of pixels.
11. The method as recited in claim 8 , further comprising:
generating a second series of adjacent pixels from the set of frames;
generating, substantially concurrently with the first frequency domain representation of the first series of pixels, a second frequency domain representation of the second series of pixels by applying the reversible linear forward transform to the second series of adjacent pixels;
applying, substantially concurrently with applying the frequency response shape function to the first frequency domain representation of the first series of pixels, the frequency response shape function to the second frequency domain representation of the second series of pixels; and
performing, substantially concurrently with performing the inverse transform operation on the first frequency domain representation of the first series of pixels, a second inverse transform operation on the second frequency domain representation of the second series of pixels to in part construct the pre-compressed set of frames.
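The "substantially concurrently" language of claim 11 maps naturally onto parallel execution of independent series. A minimal sketch, again assuming numpy's DFT as the transform and a thread pool as the concurrency mechanism (neither is specified by the claim):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def shape_series(series, gains):
    """Forward-transform one pixel series, apply the gains, invert."""
    return np.fft.ifft(np.fft.fft(series) * gains).real

# Two hypothetical adjacent pixel series drawn from the same set of frames.
series_a = np.array([10.0, 12.0, 11.0, 13.0])
series_b = np.array([20.0, 18.0, 19.0, 17.0])
gains = np.array([1.0, 0.5, 0.0, 0.5])  # conjugate-symmetric, real output

# Each series is transformed, shaped, and inverted independently, so the
# two pipelines can run at the same time on separate workers.
with ThreadPoolExecutor() as pool:
    shaped_a, shaped_b = pool.map(shape_series,
                                  [series_a, series_b],
                                  [gains, gains])
```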
12. A method comprising:
receiving a plurality of three-dimensional (3D) input blocks associated with a set of frames, each of the plurality of 3D input blocks having a specified transformation block size;
generating a frequency domain representation of the plurality of 3D input blocks by applying a reversible linear forward transform to the plurality of 3D input blocks;
applying a frequency response shape function to the frequency domain representation of the plurality of 3D input blocks; and
performing an inverse transform operation on the frequency domain representation of the plurality of 3D input blocks to in part construct a pre-compressed set of frames.
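For the 3D blocks of claim 12, the same pipeline extends across the temporal axis: each block spans several frames of a spatial region, so the transform operates over (time, height, width) at once. A sketch assuming numpy's 3D DFT as the reversible 3D linear forward transform of claim 15, with identity shaping to show reversibility:

```python
import numpy as np

def shape_3d_block(block, gains):
    """Claim-12 sketch: a 3D (time x height x width) input block is
    forward-transformed with a reversible 3D linear transform (here the
    3D DFT), shaped per 3D frequency, and inverse-transformed."""
    return np.fft.ifftn(np.fft.fftn(block) * gains).real

rng = np.random.default_rng(0)
block = rng.standard_normal((4, 8, 8))  # 4 frames of one 8x8 spatial region
gains = np.ones((4, 8, 8))              # identity shaping: block is unchanged
roundtrip = shape_3d_block(block, gains)
```

A real shape function would attenuate high spatio-temporal frequencies, which is where the pre-compression gain of the later claims comes from.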
13. The method as recited in claim 12 , further comprising receiving the reversible linear forward transform and storing the reversible linear forward transform prior to generating the frequency domain representation of the plurality of 3D input blocks.
14. The method as recited in claim 12 , further comprising selecting the reversible linear forward transform based at least in part on the set of frames prior to generating the frequency domain representation of the plurality of 3D input blocks.
15. The method as recited in claim 12 , wherein the reversible linear forward transform is a reversible 3D linear forward transform.
16. The method as recited in claim 12 , wherein applying the frequency response shape function to the frequency domain representation of the plurality of 3D input blocks includes applying a multiplicative constant to individual frequency amplitudes of the frequency domain representation of the plurality of 3D input blocks.
17. The method as recited in claim 12 , wherein applying the frequency response shape function to the frequency domain representation of the plurality of 3D input blocks includes applying a quantize operation followed by a dequantize operation using a quantizing factor determined for each individual frequency of the frequency domain representation of the plurality of 3D input blocks.
18. The method as recited in claim 12 , further comprising:
encoding the pre-compressed set of frames into an encoded set of frames; and
sending the encoded set of frames to a remote device.
19. The method as recited in claim 12 , further comprising:
compressing the pre-compressed set of frames into a compressed set of frames, the compressed set of frames having a smaller size than a compressed version of the set of frames.
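The size claim above can be illustrated with a toy experiment: shaping away high frequencies before encoding lets a generic compressor reach a smaller output than compressing the raw samples. Everything here is invented for illustration — `zlib` stands in for the video encoder, and the noisy 1D "frames" stand in for real video.

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
# Raw "frames": noisy 8-bit samples, nearly incompressible as-is.
frames = (128 + 40 * rng.standard_normal(4096)).clip(0, 255)

# Pre-compress: keep only the lowest frequencies (conjugate-symmetric mask).
spectrum = np.fft.fft(frames)
gains = np.ones(4096)
gains[64:-64] = 0.0
shaped = np.fft.ifft(spectrum * gains).real.clip(0, 255)

raw_size = len(zlib.compress(frames.astype(np.uint8).tobytes(), 9))
pre_size = len(zlib.compress(shaped.astype(np.uint8).tobytes(), 9))
# The shaped ("pre-compressed") samples compress to fewer bytes.
```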
20. The method as recited in claim 12 , further comprising:
receiving a second plurality of 3D input blocks associated with the set of frames;
generating, substantially concurrently with the first frequency domain representation of the first plurality of 3D input blocks, a second frequency domain representation of the second plurality of 3D input blocks by applying the reversible linear forward transform to the second plurality of 3D input blocks;
applying, substantially concurrently with applying the frequency response shape function to the first frequency domain representation of the first plurality of 3D input blocks, the frequency response shape function to the second frequency domain representation of the second plurality of 3D input blocks; and
performing, substantially concurrently with performing the inverse transform operation on the first frequency domain representation of the first plurality of 3D input blocks, a second inverse transform operation on the second frequency domain representation of the second plurality of 3D input blocks to in part construct the pre-compressed set of frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/377,489 US20190238872A1 (en) | 2015-04-06 | 2019-04-08 | Method and apparatus to process video sequences in transform space |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562143648P | 2015-04-06 | 2015-04-06 | |
US15/091,625 US10298942B1 (en) | 2015-04-06 | 2016-04-06 | Method and apparatus to process video sequences in transform space |
US16/377,489 US20190238872A1 (en) | 2015-04-06 | 2019-04-08 | Method and apparatus to process video sequences in transform space |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/091,625 Continuation US10298942B1 (en) | 2015-04-06 | 2016-04-06 | Method and apparatus to process video sequences in transform space |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190238872A1 (en) | 2019-08-01 |
Family
ID=66541039
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/091,625 Expired - Fee Related US10298942B1 (en) | 2015-04-06 | 2016-04-06 | Method and apparatus to process video sequences in transform space |
US16/377,489 Abandoned US20190238872A1 (en) | 2015-04-06 | 2019-04-08 | Method and apparatus to process video sequences in transform space |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/091,625 Expired - Fee Related US10298942B1 (en) | 2015-04-06 | 2016-04-06 | Method and apparatus to process video sequences in transform space |
Country Status (1)
Country | Link |
---|---|
US (2) | US10298942B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113518227B (en) * | 2020-04-09 | 2023-02-10 | 于江鸿 | Data processing method and system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8711925B2 (en) * | 2006-05-05 | 2014-04-29 | Microsoft Corporation | Flexible quantization |
US20100226444A1 (en) * | 2009-03-09 | 2010-09-09 | Telephoto Technologies Inc. | System and method for facilitating video quality of live broadcast information over a shared packet based network |
JP2011029954A (en) * | 2009-07-27 | 2011-02-10 | Sony Corp | Image encoding device and image encoding method |
US20120057629A1 (en) * | 2010-09-02 | 2012-03-08 | Fang Shi | Rho-domain Metrics |
US8442338B2 (en) * | 2011-02-28 | 2013-05-14 | Sony Corporation | Visually optimized quantization |
JP5900163B2 (en) * | 2012-05-30 | 2016-04-06 | ソニー株式会社 | Image processing apparatus, image processing method, and program |
- 2016-04-06: US application 15/091,625 (US10298942B1), not active: Expired - Fee Related
- 2019-04-08: US application 16/377,489 (US20190238872A1), not active: Abandoned
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210274231A1 (en) * | 2020-02-27 | 2021-09-02 | Ssimwave Inc. | Real-time latency measurement of video streams |
US11638051B2 (en) * | 2020-02-27 | 2023-04-25 | Ssimwave, Inc. | Real-time latency measurement of video streams |
Also Published As
Publication number | Publication date |
---|---|
US10298942B1 (en) | 2019-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Haghighat et al. | Real-time fusion of multi-focus images for visual sensor networks | |
US10432971B2 (en) | Image data compression and decompression using minimize size matrix algorithm | |
Stamm et al. | Wavelet-based image compression anti-forensics | |
Parmar et al. | Comparison of DCT and wavelet based image compression techniques | |
Jayakar et al. | Color image compression using SPIHT algorithm | |
CN104135664A (en) | Method for digital processing of medical image | |
Deshlahra et al. | A comparative study of DCT, DWT & hybrid (DCT-DWT) transform | |
US20130315317A1 (en) | Systems and Methods for Compression Transmission and Decompression of Video Codecs | |
Lukin et al. | Automatic lossy compression of noisy images by spiht or jpeg2000 in optimal operation point neighborhood | |
Jakisc et al. | Analysis of different influence of compression algorithm on the image filtered Laplacian, Prewitt and Sobel operator | |
Rani et al. | Comparative analysis of image compression using dct and dwt transforms | |
MR et al. | Medical image compression using embedded zerotree wavelet (EZW) coder | |
Patel | Lossless DWT Image Compression using Parallel Processing | |
Zhao et al. | Effects of lossy compression on lesion detection: predictions of the nonprewhitening matched filter | |
Rakshit et al. | A Hybrid JPEG & JPEG 2000 Image Compression Scheme for Gray Images | |
CN110113619B (en) | Encoding method, encoding device, electronic equipment and storage medium | |
Hakami et al. | Improve data compression performance using wavelet transform based on HVS | |
KR100810137B1 (en) | Apparatus and method for reconstructing image using inverse discrete wavelet transforming | |
Sekaran et al. | Performance analysis of compression techniques using SVD, BTC, DCT and GP | |
El-Sharkawey et al. | Comparison between (RLE & Huffman and DWT) Algorithms for Data Compression | |
JPH02122766A (en) | Device and method for compressing picture data and device and method for expanding compression data | |
Siddeq | Novel methods of image compression for 3D reconstruction | |
JPH10336658A (en) | Image processor | |
WO2003084205A2 (en) | Repetition coded compression for highly correlated image data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: ZPEG, INC., TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WESTWATER, RAYMOND;REEL/FRAME:051440/0542. Effective date: 20160809 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |