WO2014107762A1 - Method and apparatus for comparing two blocks of pixels - Google Patents


Info

Publication number
WO2014107762A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
pixels
signature
blocks
vector
Prior art date
Application number
PCT/AU2014/000006
Other languages
English (en)
French (fr)
Inventor
Vincenzo Liguori
Original Assignee
Vincenzo Liguori
Priority date
Filing date
Publication date
Priority claimed from AU2013900077A external-priority patent/AU2013900077A0/en
Application filed by Vincenzo Liguori filed Critical Vincenzo Liguori
Priority to CN201480005080.8A priority Critical patent/CN104937938A/zh
Priority to EP14737954.9A priority patent/EP2944085A4/en
Priority to KR1020157021426A priority patent/KR20150128664A/ko
Publication of WO2014107762A1 publication Critical patent/WO2014107762A1/en
Priority to US14/750,942 priority patent/US20150296207A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 . using adaptive coding
    • H04N 19/102 . . characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 . . . Quantisation
    • H04N 19/90 . using coding techniques not provided for in groups H04N 19/10-H04N 19/85, e.g. fractals
    • H04N 19/134 . . characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 . . . Incoming video signal characteristics or properties
    • H04N 19/137 . . . . Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N 19/139 . . . . . Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N 19/169 . . characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 . . . the unit being an image region, e.g. an object
    • H04N 19/172 . . . . the region being a picture, frame or field
    • H04N 19/176 . . . . the region being a block, e.g. a macroblock
    • H04N 19/182 . . . the unit being a pixel
    • H04N 19/50 . using predictive coding
    • H04N 19/503 . . involving temporal prediction
    • H04N 19/51 . . . Motion estimation or motion compensation
    • H04N 19/547 . . . . Motion estimation performed in a transform domain
    • H04N 19/60 . using transform coding
    • H04N 19/63 . . using sub-band based transform, e.g. wavelets

Definitions

  • a number of problems in image recognition and image compression rely on comparing two blocks of pixels taken from different images. For example, in video transmission and compression schemes the redundancy between successive frames is utilized to reduce the bandwidth and storage requirements of the video.
  • in block motion prediction schemes, each frame is divided into a plurality of fixed-size blocks.
  • a frame to be transmitted is first coded in terms of blocks of a reference frame that has already been sent by finding the block in the reference frame that most closely matches the corresponding block in the current frame.
  • the current block is then represented as the block in the reference frame plus a difference block. If the reference block is a close match to the current block, the difference block will have substantially less information, and hence, can be coded using a lossy high compression image compression scheme that still allows the current block to be reconstructed at the receiver to the desired accuracy.
  • blocks of pixels from an object in a library must be compared to blocks of pixels in the image to determine if the library object is in the image.
  • the block of pixels from the library object must be matched against similar sized blocks at a number of locations in the image to determine whether the object is in the image and the location of that object within the image.
  • blocks of pixels from one view must be compared to blocks of pixels from a second view to identify an object that is present in both views and determine the three-dimensional location of the object.
  • the present invention includes a method for operating a data processing system to compare a first block of pixels, B1, in a current frame to a second block of pixels, B2, in the reference frame.
  • the method includes generating first and second signature vectors, V1 and V2, representing the first and second blocks, respectively.
  • the distance D(V1,V2) is measured to provide a comparison of the similarity of the blocks.
  • the signature vectors are chosen such that if D(B1,B2) < D(B1,B3) then D(V1,V2) < D(V1,V3), where B3 is a third block of pixels in the reference frame, and such that computing D(B1,B2) imposes a first computational workload on the data processing system while computing D(V1,V2) imposes a second computational workload.
  • the sum of the computational workload imposed by generating V1 and V2 on the data processing system and the second computational workload is, on average, less than the first computational workload.
  • generating the first signature vector includes transforming the first block using a linear transformation to generate a component of the first signature vector.
  • the component preferably measures a power in a portion of the first block in spatial frequencies that are less than a first spatial frequency limit.
  • One of the components of the signature vector could also be chosen such that the component measures a power in a portion of the first block in spatial frequencies in a first spatial frequency band having a low-spatial-frequency cut-off greater than zero.
  • the linear transformation is a wavelet transformation.
  • a third signature vector, V3, for a third block in the reference frame is generated.
  • the third signature vector is generated by updating the second signature vector.
  • the data processing system compares the distance between the first and third signature vectors with the distance between the first and second signature vectors to determine which of the second and third blocks is a better match to the first block.
  • the reference frame includes a plurality of rows and columns of pixels and wherein the third block is located on the same row or column of the reference frame as the second block and has pixels in common with the second block.
  • Figure 1 illustrates the matching of blocks between a current frame and a reference frame.
  • Figure 2 illustrates an apparatus for the matching of one block in the current frame to a sequence of blocks in the reference frame.
  • Figure 3 illustrates the transformation of an image using a two-dimensional wavelet transformation.
  • Figure 4 illustrates the transformation of an image block using the class of wavelet transformation discussed with respect to Figure 3.
  • Figure 5 illustrates a video compression engine according to one embodiment of the present invention.
  • Figure 6 illustrates an engine for performing stereo disparity matching.
  • the present invention can be more easily understood in terms of the problems encountered when an nxn block of pixels in a first frame, referred to as the current frame, is to be matched against all of the possible blocks of the same size within some target region in a second frame, referred to as the reference frame.
  • the current frame is shown at 20.
  • a block of pixels 21 in current frame 20 is to be matched against a plurality of target blocks in reference frame 25.
  • a typical target block is shown at 22.
  • While the target blocks are shown as non-overlapping to simplify the drawing, it is to be understood that in general the sequence of target blocks overlap one another, typically being displaced from one another by a distance of one pixel in the image. While the example shown in Figure 1 has the target blocks being shifted along the same horizontal line, it is to be understood that in the more general case, the target blocks may be shifted with respect to one another in both the horizontal and vertical directions.
  • the images are NxN pixel images, where N ≫ n.
  • the computational cost of such a search is approximately N²Cb(n) if the entire reference frame is searched.
  • Cb(n) is the computational cost of comparing two single nxn blocks.
  • comparisons that measure the correlation between the blocks, such as those that utilize a sum of differences of corresponding pixels in each block, have a cost that is proportional to n².
  • the present invention reduces the average computational workload in comparing a block of pixels in the current frame with a sequence of target blocks in the reference frame by defining a signature vector that represents each of the blocks to be compared. The comparison is then carried out on the signature vectors rather than the corresponding blocks of pixels.
  • the number of components in the signature vector is much smaller than n², for an nxn block.
  • the cost of comparing the two signature vectors is substantially less than the cost of comparing the two blocks in the prior art scheme.
  • the signatures of the present invention should satisfy two additional conditions.
  • denote a function that measures the difference between two vectors by D(V1,V2), where V1 and V2 are the two vectors.
  • This function will be referred to as the distance function in the following discussion. For example, D(V1,V2) = Σi |V1,i − V2,i|, where the sum runs over i = 1, …, Nv and Nv is the number of components in each vector. To simplify the following discussion, this particular function will be assumed unless otherwise indicated. However, it is to be understood that there are a number of different functions that could be used to measure the difference between two vectors. For example, a distance function that sums the square of the difference between the components of the vectors is also commonly used to measure the distance between two vectors.
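The distance function just described can be sketched in a few lines of Python. This is an illustration, not the patent's implementation: the component-wise sum of absolute differences is the default the text assumes, and the squared-difference variant is included only for comparison.

```python
def distance(v1, v2):
    """Sum-of-absolute-differences distance between two signature vectors."""
    assert len(v1) == len(v2)
    return sum(abs(a - b) for a, b in zip(v1, v2))

def distance_sq(v1, v2):
    """Alternative distance: sum of squared component differences."""
    assert len(v1) == len(v2)
    return sum((a - b) ** 2 for a, b in zip(v1, v2))
```

Either function preserves the ordering property required of the signatures: closer blocks should yield closer vectors.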
  • denote the signature vector representing a block B by V(B).
  • the average computational workload in comparing a block in the current frame to all possible blocks within the search area of the reference frame must be less than the workload of comparing the blocks directly.
  • Figure 2 illustrates the matching of one block in the current frame to a sequence of blocks in the reference frame.
  • Signature vector 32 only has to be computed once for the entire sequence of blocks that are to be tested in the reference frame with respect to that block.
  • the workload in generating that signature vector can be amortized over all of the target block comparisons in the reference frame, and does not impose a significant computational workload on the process.
  • For each target reference block 33 in the reference frame, a signature vector 34 must be generated.
  • there are signature vector definitions that utilize only a small portion of the block of pixels, and hence can provide the required reduction in computational workload.
  • for example, a signature vector whose components are each generated by summing a few pixels at predetermined locations in the block is computationally inexpensive and less subject to noise than simply selecting a sub-group of pixels.
  • hence, the second requirement can be satisfied.
  • the workload needed to generate a signature for a block can be significantly reduced for some choices of signature vector by using a signature vector computed for a previous block as shown at 35.
  • For example, consider an nxn block of pixels having a base address of (tx, ty) in an image represented by an array, Ix,y.
  • the signature vector for the next block is merely the previous signature with the components of the previous signature vector shifted and the last component being replaced by the sum of the pixels in the new column that is introduced by shifting the block one pixel.
  • the shift operation can be performed by changing a pointer in that buffer.
  • the work to generate the new signature vector is essentially the n additions needed to sum the pixels in the new column.
  • the work to compare the two signature vectors is proportional to n.
  • the workload to compare the blocks using the signature vector method of the present invention is of order n, as opposed to the work to compare the blocks directly, which is of order n².
  • the distance between them is computed by a processor as shown at 36.
  • the minimum value of the distance is stored in the processor together with the location of the block in the reference frame that generated that minimum.
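The column-sum signature and its one-pixel sliding update can be sketched as follows. This is an illustrative Python rendering under assumed conventions (the image as a list of rows, the signature as the per-column sums of the nxn block); the function names are hypothetical, not the patent's.

```python
def column_sums(image, tx, ty, n):
    """Signature vector for the n x n block at base address (tx, ty):
    one component per column, each the sum of that column's pixels."""
    return [sum(image[ty + r][tx + c] for r in range(n)) for c in range(n)]

def slide_right(image, sig, tx, ty, n):
    """Update the signature when the block shifts one pixel to the right:
    drop the left column sum and append the sum of the new right column.
    This costs n additions instead of the n*n of a full recomputation."""
    new_col = sum(image[ty + r][tx + n] for r in range(n))
    return sig[1:] + [new_col]

def best_match(image, ref_sig, row, n, width):
    """Scan one row of the reference frame, tracking the minimum
    distance and the horizontal offset that produced it."""
    sig = column_sums(image, 0, row, n)
    best = (sum(abs(a - b) for a, b in zip(ref_sig, sig)), 0)
    for tx in range(width - n):
        sig = slide_right(image, sig, tx, row, n)
        d = sum(abs(a - b) for a, b in zip(ref_sig, sig))
        best = min(best, (d, tx + 1))
    return best
```

Because the shifted signature shares all but one component with its predecessor, a circular buffer and a pointer change suffice in a hardware realization, as the text notes.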
  • the components of the signature vector are derived from the transform coefficients of a linear transformation of the block of pixels.
  • the above described column summing algorithm is an example of such a linear transform.
  • Transforms used in image compression algorithms are also linear transformations and provide additional benefits.
  • a picture is compressed by first fitting the picture to an expansion using a set of two-dimensional basis functions. The particular set of basis functions depends on the particular image compression algorithm. The coefficients of the basis functions in the expansion become the "pixels" of the transformed image in image compression.
  • Define the "energy" in a block of pixels by the sum of the square of the pixel values over the block.
  • the image transform concentrates the energy in the image into a sub-set of the pixels of the transformed image. That is, the total energy in this sub-set per pixel, on average, is greater than the average energy per pixel in the original image (i.e., the sum of the squares of the pixel intensities divided by the total number of pixels in the original image).
  • the present invention is based on the observation that the transformed pixels having the most energy are also good candidates for constructing a signature vector. Such transformed pixels are less affected by noise and represent the information of interest to a human observer.
  • the image compression transforms can be viewed as providing a plurality of transformed images representing different spatial frequencies in the image.
  • a subset of the transformed "pixels" from the transformed images having different spatial frequency bands are used to construct the signature vector for a block of pixels.
  • the transformations used in image compression are reversible. That is, the picture can be recovered by summing the basis functions multiplied by the coefficients provided the original coefficients were computed to sufficient accuracy.
  • the present invention can utilize transforms or approximations that would not provide a reconstruction of the image.
  • the transform is used to concentrate the information of most interest in the picture into a sub-set of the transform coefficients, which are then provided with a better approximation than those associated with information of less interest to a human observer.
  • the present invention utilizes the observation that the pixels of the transformed image of a block of pixels in which the information that is of most interest to human observers are also likely to be pixels that are good candidates for generating components of a signature representing a block of pixels.
  • not all of the transformed image must be calculated.
  • only the coefficients that are being used to compute a component of a signature vector need be computed.
  • the computational workload to compute the coefficients is substantially less than that needed to transform an entire image prior to approximating the coefficients.
  • a transformation that allows a coefficient of interest to be computed for successive blocks of pixels that are offset with respect to one another by updating the previously computed coefficient rather than computing the coefficient directly from the pixels of the new block can further reduce the computational workload.
  • One class of image compression algorithms utilizes a set of basis functions that are referred to as wavelets.
  • In a wavelet transformation of an image, the original image is transformed into a number of transformed images that represent the information in various spatial frequency ranges in different portions of the image.
  • the transformation of the image is typically performed by filtering the rows and columns of pixels using a plurality of filters. In the simplest case, two filters are used. The first is a low-pass filter that emphasizes the low spatial frequencies, and the second is a high-pass filter that emphasizes the high spatial frequencies of the image.
  • Figure 3 illustrates the transformation of an image using a two-dimensional wavelet transformation.
  • the transformation is normally performed by first filtering the horizontal lines of pixels of the original image 41 with the two filters to create two sub-images 42 and 43, each having half the number of pixels.
  • Sub-image 42 emphasizes the low spatial frequencies in the horizontal direction
  • sub-image 43 represents the high spatial frequencies in the horizontal direction. Since the edges of objects have high spatial frequencies, sub-image 43 resembles a map of the edges in the original image that are crossed by moving in a horizontal direction.
  • Each of these sub-images is again filtered by filtering the columns of pixels in each sub-image through the two filters to arrive at the four sub-images shown at 44-47.
  • Sub-image 44 emphasizes the low spatial frequencies, and the remaining sub-images emphasize edges with different orientations.
  • the "pixels" in the various sub-images are actually coefficients of a fit to the original image using two-dimensional basis functions.
  • the specific basis functions depend on the details of the wavelet transform, which, in turn, determine the filter coefficients of the filters used to process the rows and columns of pixels. Since the transforms are linear in nature, it is sufficient to note that any "pixel" in the transformed image can be obtained from a weighted sum of pixels in the original image. For orthogonal transformations, it can be shown that any given image coefficient, i.e., "pixel" Ti,j in the transformed image, can be computed from a formula of the form Ti,j = Σm,n wi,j,m,n Im,n.
  • the parameters wi,j,m,n are weight factors that depend on the particular transformation. While (m,n) varies over the entire image, the transformation can be chosen such that the weight factors are non-zero for only a small number of pixels for each coefficient of interest. The reduction of the computational workload to only a small number of pixels per coefficient can be further improved by setting some weight factors to zero. While such an approach would not be permitted in an image compression context, for the purposes of generating a signature, such approximations can provide an adequate signature for comparing two blocks of pixels, while further reducing the computational workload.
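One level of a 2-D wavelet transform with weights of ±1 (the unnormalized Haar filters discussed below) can be sketched as follows. The normalization factor is omitted, which is exactly the kind of approximation the text permits for signature generation.

```python
def haar_rows(img):
    """Filter each row with the unnormalized Haar pair: low = a + b,
    high = a - b, halving the number of columns. Weights of +/-1 keep
    everything in integer arithmetic."""
    low, high = [], []
    for row in img:
        low.append([row[i] + row[i + 1] for i in range(0, len(row), 2)])
        high.append([row[i] - row[i + 1] for i in range(0, len(row), 2)])
    return low, high

def transpose(img):
    return [list(col) for col in zip(*img)]

def haar2d(img):
    """One level of the 2-D Haar transform: rows first, then columns,
    yielding the four sub-images of Figure 3 (one low-frequency
    sub-image and three edge-emphasizing high-frequency sub-images)."""
    low, high = haar_rows(img)
    ll, lh = (transpose(s) for s in haar_rows(transpose(low)))
    hl, hh = (transpose(s) for s in haar_rows(transpose(high)))
    return ll, lh, hl, hh
```

On a uniform block all three high-frequency sub-images vanish, reflecting the observation that the low-frequency sub-image concentrates the energy.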
  • the matching process involves matching a block in the current frame to a number of different blocks in the reference frame.
  • the blocks in the reference frame are displaced from one another, usually by an offset of one pixel.
  • the weight factors are either 1 or 0.
  • the example given above in which the block of pixels was transformed by adding the pixels in each column is an example of such a transform.
  • the low frequency filter used to implement a wavelet transform using the Haar basis functions satisfies this constraint.
  • the high frequency filter in the Haar basis has the property that the coefficients are 0, 1, or - 1.
  • the block of pixels in the original image corresponding to this block has the same size but is displaced by one pixel to the right.
  • This block of pixels in the original image differs from block (t) by two columns of pixels. That is, A(t+1) = A(t) + Cn+1 − C1, where Cn+1 is the sum of the pixels in the new column of pixels that is included in block (t+1) and C1 is the sum of the pixels in the column of pixels in block (t) that is not included in block (t+1). If the column sums are saved from block to block, the successive signatures can be computed by updating the previous coefficient by the sum of the new column of pixels. This operation has a computational complexity that is linear in the size of the block of pixels in the original image that determines the coefficient of the signature vector.
  • the component of the signature vector was computed from a low spatial frequency component of a Haar-based transformation of the original image.
  • signature components can also be based on high-frequency pixels in the transformed image.
  • the component can still be written in the form shown in Equation 6; however, the weights wm,n(t) can now also have a value of −1.
  • the component for the (t+1)st block can still be written in the form of a correction to the component for block (t); however, additional columns must now be added to and subtracted from A(t).
  • a component of a signature vector could be a transform coefficient of the type discussed above, it is often advantageous to reduce the number of bits needed to represent that component.
  • the computational workload and computational hardware needed to compute the distance between two signature vectors depends on the form of the components (integer, floating, etc.) and the number of bits needed to represent the component. Integer arithmetic can be carried out in less expensive hardware than floating point arithmetic, and a computation can be completed in less time. Hence, integer signature vector components are preferred. In addition, components that require smaller integers to represent the component are preferred. It should be noted that the transform computations can be carried out in parallel on a computing platform that supports multiple cores. Thus reducing the computational workload can also improve the overall speed of matching, which can be important in real-time applications.
  • transforms that utilize non-zero weight functions having values that can be represented by ±1 are preferred, as such transforms can be carried out in integer arithmetic, since the pixel values used in most image representations are integers. It should be noted that any transform in which all of the weights are ±C, where C is a constant, can be computed using weights of ±1 followed by multiplication by C.
  • the multiplication by C can be included in the computations used to approximate the transform coefficient during quantization of the coefficient.
  • the number of bits needed to represent a transform coefficient can be much greater than the number needed to match blocks using a signature vector.
  • the component could require 16 bits to represent.
  • the transform coefficient could be negative as well as positive.
  • Arithmetic based on 8-bit integers would require less computational hardware.
  • the number of bits can be reduced by quantizing the transform coefficient to arrive at the component of the signature vector. The quantization maps the values obtained by the transform to an integer within some predetermined range.
  • the 16-bit representation of the transform coefficient could be mapped to an 8-bit integer. If the transform coefficient is represented by a 16-bit signed integer, the quantization can be implemented by shifting bits off of the integer while preserving the sign bit. Hence, the computational work involved in quantizing the transform coefficient is small compared to the workload of generating the transform coefficient.
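The bit-shift quantization just described might look like this in Python; the default `drop_bits = 8` maps a 16-bit signed coefficient onto an 8-bit signed range. This is a sketch of the idea, not the patent's exact mapping.

```python
def quantize(coeff, drop_bits=8):
    """Quantize a signed transform coefficient by an arithmetic right
    shift. Python's >> is an arithmetic shift, so the sign is preserved,
    matching the "shift bits off while preserving the sign bit" scheme."""
    return coeff >> drop_bits
```

For example, the 16-bit extremes 32767 and −32768 map to 127 and −128, the extremes of an 8-bit signed range.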
  • Signature vector components can also be constructed by combining transform coefficients.
  • a signature coefficient can be constructed from a weighted sum of two or more transform coefficients of the type discussed above. It should be noted that any linear function of transform coefficients can be expanded to obtain a new transform coefficient that satisfies Equation 5 discussed above.
  • the number of bits needed to represent a signature vector component after quantization can be further reduced by using entropy coding of the component. Entropy coding is a loss-less compression scheme in which the quantized values are replaced by codes having different numbers of bits. The quantized values that are used the most are assigned codes that have fewer bits than the quantized values that appear less frequently.
  • the quantization process can also be applied at the signature vector level. That is, the range of signature vectors obtained from the transform coefficients can be mapped to a predetermined set of vectors having fewer possible vectors.
  • the quantized vectors are then used in the block matching algorithm.
  • Each signature vector can be viewed as representing a point in an Nv-dimensional space.
  • In vector quantization, a set of lattice points is defined in the space. The signature vector is then replaced by the lattice point that is closest to that vector.
  • the set of quantized vectors can be mapped to a set of codes to further reduce the bits needed to specify the quantized vectors.
  • the quantized vectors can then be stored in a separate memory and retrieved via the codes.
  • the computational workload in generating the set of lattice points and determining which lattice point most closely approximates a signature vector can be excessive in the general case, as the optimum set of lattice points depends on the distribution of possible signature vectors.
  • Given a normalized vector, the method provides the identifier, and conversely, given an identifier, the method will return a normalized vector. Whether the additional workload of performing vector quantization provides a significant improvement over just measuring the distance between the signature vectors representing each block depends on the specific application.
  • the identifiers are analogous to the codes generated by pyramid vector quantization algorithms. The advantage of these identifiers lies in the ease of decoding the coded vectors that results from all of the identifier codes having the same length. The codes provided by entropy encoding are of varying lengths, and hence, complicate the decoding process.
  • Pyramid vector quantization schemes generate normalized vectors, and hence, the normalization can be used as a separate component of the final vector, or just the normalized vectors can be compared.
  • the normalized vectors can be useful in matching problems in which the different images have different illumination levels.
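The two ideas above (replacing a signature vector with its nearest lattice point, and separating out the normalization so that illumination-insensitive comparisons are possible) can be sketched in a few lines of Python. The uniform integer lattice and the step size below are assumptions for illustration only; as noted above, an optimized lattice would depend on the distribution of the signature vectors.

```python
import numpy as np

def quantize_to_lattice(v, step=4.0):
    """Replace a signature vector with the nearest point of a scaled
    integer lattice (step * Z^N).  A uniform lattice is assumed here;
    an optimized lattice would depend on the signature distribution."""
    return step * np.round(np.asarray(v, dtype=float) / step)

def normalize(v):
    """Split a signature vector into (norm, unit vector) so the norm can
    be kept as a separate component, or discarded when matching images
    with different illumination levels."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return n, (v / n if n > 0 else v)
```

With this split, two blocks from differently lit images can be compared on their unit vectors alone, while the norms remain available as separate components.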
  • the signature components can be generated starting from a linear transformation of the image block being matched.
  • Figure 4 illustrates the transformation of an image block using the class of wavelet transformation discussed above with respect to Figure 3.
  • the low frequency transform coefficients are again transformed using the same pair of filters so that the final transformed image has the seven "sub-images" shown at 71-77.
  • the sub-images shown at 75-77 are the high spatial frequency sub-images created by the first application of the wavelet transform.
  • the sub-images shown at 71-74 are the sub-images created by processing the original low spatial frequency sub-image a second time.
  • sub-image 71 is the low spatial frequency coefficient for the block and the remaining sub-images are various high-spatial frequency coefficients that have information of less value to human observers.
  • Several of the "pixels" from the low spatial frequency sub-image are selected to be components of the signature vector 80 as shown at 81.
  • one or more coefficients from the high-spatial frequency sub-images 72 and 73 are also selected for components of signature vector 80 as shown at 82 and 83.
  • the pixels can be combined to reduce the number of components in the signature vector.
  • coefficients from the high spatial frequency components 75 and 76 are also combined to form components of signature vector 80 as shown at 84 and 85.
  • the components can be computed from any function of the pixels in question, although linear functions are computationally advantageous.
  • the number of bits needed to represent the components of the signature vector can be reduced further by quantizing the individual components or quantizing the vector as discussed above.
  • the low spatial frequency region shown at 71 will have four coefficients.
  • the two high spatial frequency regions shown at 72 and 73 will also have four coefficients.
  • the four coefficients from sub-image 71 are selected as components of the signature vector.
  • two components are generated from high-spatial frequency sub-image 73 by adding together the components in two 2x2 areas of this region.
  • two additional components are generated from high-spatial frequency sub-image 72 by adding together the components in two 2x2 areas of that region.
  • the resulting signature vector has eight components.
  • the number of bits needed to represent the components can be reduced further by quantizing each of these eight components. If the components are held in integer registers, the quantization can be performed by a simple shift operation on the registers. More complex quantization schemes can also be employed.
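The eight-component signature just described can be sketched as follows, assuming unnormalized Haar sums and differences and one plausible grouping of the level-two high-frequency coefficients (the exact grouping is an assumption; the text leaves it open). The right shift implements the simple quantization mentioned for integer registers.

```python
import numpy as np

def haar_level(img):
    """One level of a 2-D Haar transform (unnormalized sums/differences).
    Returns LL, LH, HL, HH sub-images of half the input size."""
    a = img[0::2, 0::2].astype(int)   # even rows, even cols
    b = img[0::2, 1::2].astype(int)   # even rows, odd cols
    c = img[1::2, 0::2].astype(int)   # odd rows, even cols
    d = img[1::2, 1::2].astype(int)   # odd rows, odd cols
    ll = a + b + c + d
    lh = a + b - c - d                # vertical high-frequency detail
    hl = a - b + c - d                # horizontal high-frequency detail
    hh = a - b - c + d
    return ll, lh, hl, hh

def signature_8x8(block, shift=2):
    """Eight-component signature for an 8x8 block: the four level-2 LL
    coefficients plus two combined components from each of two level-2
    high-frequency sub-images, coarsened by a right shift."""
    ll1, _, _, _ = haar_level(block)            # 8x8 -> four 4x4 sub-images
    ll2, lh2, hl2, _ = haar_level(ll1)          # 4x4 -> four 2x2 sub-images
    comps = [int(x) for x in ll2.ravel()]       # 4 low-frequency components
    comps += [int(lh2[0].sum()), int(lh2[1].sum())]   # 2 combined HF components
    comps += [int(hl2[0].sum()), int(hl2[1].sum())]   # 2 combined HF components
    return [c >> shift for c in comps]          # shift = simple quantization
```

For a flat block all high-frequency components vanish and only the four low-frequency components carry information, which is the expected behavior for this class of transform.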
  • the entire 8x8 starting block does not need to be subjected to the Haar transforms.
  • the Haar transform is a linear transform, and hence, the coefficients of the compressed image need not all be computed.
  • the selected coefficients can be written in terms of the transformation of the original image in the form given above in Equation 5.
  • the weight coefficients can be chosen such that all of the coefficients have absolute values of 1 or 0. It should also be noted that the combined coefficients from high-spatial frequency sub-images 72 and 73 can be computed directly using a relationship of the form shown in Equation 5, as addition is also a linear operation. It should also be noted that the signature vector for a block that is displaced by one pixel from a previous block for which the signature vector is known can be computed by updating the previously computed signature vector, which further reduces the average computational workload in generating the signature vectors for successive blocks in the reference image.
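The incremental-update observation can be illustrated for a sum-type component: when the block slides one pixel to the right, the component is updated in constant time by subtracting the departing column sum and adding the arriving one. The single whole-block-sum component below is an illustrative simplification; components with +/-1 weights update analogously.

```python
import numpy as np

def sliding_sums(image, w=8, h=8):
    """Sum-of-pixels component for every horizontal block position on the
    top h rows of `image`, computed incrementally: each new sum reuses
    the previous one, subtracting the departing column and adding the
    arriving one (an O(1) update per one-pixel shift)."""
    image = np.asarray(image, dtype=int)
    cols = image[:h].sum(axis=0)              # per-column sums over h rows
    s = int(cols[:w].sum())                   # block starting at x = 0
    out = [s]
    for x in range(1, image.shape[1] - w + 1):
        s += int(cols[x + w - 1] - cols[x - 1])
        out.append(s)
    return out
```

Each successive block costs two adds instead of w*h, which is the source of the reduced average workload noted above.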
  • the number of bits that are used to represent each component of a signature vector can be chosen to minimize the total number of bits in the signature vector.
  • the designer is free to set the number of bits per component to optimize other design criteria.
  • the computations involved in measuring the distance between two signature vectors can be run in parallel on special purpose hardware. The difference between one pair of corresponding components in the signature vectors does not depend on the difference between any other pair, and hence, these computations can be done in parallel.
  • One form of special purpose hardware comprises a programmable gate array for constructing the engine that computes the difference between the two signature vectors.
  • the number of gates needed to compare a pair of components is determined by the number of bits used to represent the components. Hence, minimizing the number of bits required to represent each component allows fewer gates to be utilized in the special purpose hardware, and hence, reduces the cost of the special purpose hardware.
  • If signature block distance calculations are being performed on a general purpose computer, it is advantageous to choose a signature vector that is optimized for vector computations on that computer. For example, a single hardware instruction that computes the distance between two vectors having eight or 16 components in which each component is eight bits is available on a number of general purpose computers. In this case, reducing the components below eight bits does not provide a significant advantage.
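The distance computation on 8-bit components can be sketched as follows; the widening to 16 bits is only to avoid wraparound in plain NumPy arithmetic, whereas hardware byte-SAD instructions handle this internally.

```python
import numpy as np

def sad(v1, v2):
    """Sum of absolute differences between two signature vectors held as
    8-bit unsigned components, mirroring the vector hardware described
    above (e.g. an 8- or 16-lane byte SAD instruction)."""
    a = np.asarray(v1, dtype=np.uint8).astype(np.int16)  # widen to avoid wrap
    b = np.asarray(v2, dtype=np.uint8).astype(np.int16)
    return int(np.abs(a - b).sum())
```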
  • the problem of matching a block in a current frame against a number of blocks in a reference frame occurs in a number of important applications.
  • the block in the reference frame that is the best match for the block in the current frame does not need to be from the same object in the two frames.
  • the purpose of the block matching is to provide an approximation to the current block that can be sent to the receiver. That approximation is subtracted from the current block to produce a remainder block that can be compressed with fewer bits than would be needed if the original block were compressed and still provide the desired level of accuracy in the reconstructed image at the receiver.
  • the optimum block size is the size that minimizes the bits that must be transmitted to reconstruct the image.
  • the approximation block in an inter-prediction scheme need not be the corresponding block to the current block in the reference frame image.
  • Denote the location of the current block in the current frame by (x1,y1).
  • In the absence of motion, the best match for this block in the reference frame would be the block at (x1,y1).
  • If the object containing the block has moved, however, the best match to the current block might be at (x2,y2). Since the compression scheme only uses the best block match as an approximation to the current block, it does not matter that the blocks are not corresponding blocks in the two frames.
  • In stereo disparity matching, two frames taken at the same time from cameras that are displaced from one another and view the same scene are processed to provide a three-dimensional reconstruction of the scene.
  • the matching algorithm attempts to identify an object on the current frame with an object in the reference frame by matching a block of pixels in the current frame to a plurality of blocks in the reference frame.
  • the blocks must be big enough to ensure that the matching blocks represent the same part of the same object.
  • the present invention is particularly well suited for such matching.
  • the block of pixels shown at 21 is matched against a sequence of blocks in reference frame 25 that are located on the same horizontal scan line 23. That is, it is assumed that the cameras are at the same elevation relative to the scene. Hence, the blocks in the sequence to be matched differ from one another by one pixel in the horizontal direction.
  • the signature vector for each 8x8 block is derived from a two level Haar transform as shown in Figure 4.
  • a signature vector for a block is constructed by using the four low-spatial frequency components as the first four components of the signature vector, four high spatial frequency components from high-spatial frequency sub-image 72, and four high spatial frequency components from region 74, to provide a 12-component signature vector.
  • Each component of the signature vector requires that four pixels in the original image be added or subtracted.
  • Since an add and a subtract impose the same computational workload, these operations will be referred to as "adds" in this example.
  • the cost of generating a signature vector is of the order of 48 adds if the weights in Equation 5 all have the same absolute values. In the general case, the weights would not have the same absolute values, and hence, 48 adds and 48 multiplies would be needed.
  • the signature vectors for each block in the reference frame on a gi ven horizontal line need only be computed once.
  • the signature vectors for each block in the current frame need only be computed once.
  • the computational workload to compute the 2M signature vectors is 2M × 48 adds.
  • the work to compare each block in the current frame to each block in the reference frame is M² × C, where C is the cost of comparing two signatures. If the sum of the absolute differences is used, C is of the order of 48 adds (one add to form each difference and one to sum the absolute values of the components).
  • the cost of using the signature vectors of the present invention is approximately 48M² + 48M adds. If the blocks in question were matched using the prior art methods, the cost of comparing two 8x8 blocks is 128 adds. The cost of matching all of the blocks is 128M².
  • Hence, the signature vector approach of the present invention is significantly less computationally intense. As the size of the blocks increases, the difference is even more significant.
  • the optimum block size for stereo disparity matching must be sufficient to ensure that two blocks from different objects in the scene do not inadvertently match one another.
  • the computational workload to match blocks using the direct block matching algorithms increases as the square of the block size.
  • the direct method is often limited to sub-optimum block sizes. In contrast, in the present invention, the size of the signature vector does not necessarily increase at this rate with the size of the block, and hence, the present invention can work with much larger blocks.
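The scan-line search described above can be sketched as follows, using the sum of absolute differences as the distance between signatures. The function and its inputs are illustrative; in the engine described above, the signatures would come from the left and right signature memories for a given scan line.

```python
def best_disparity_matches(left_sigs, right_sigs):
    """For each block signature on a scan line of the left (current)
    image, return the index of the closest block signature on the same
    line of the right (reference) image, using the sum of absolute
    differences as the distance between signature vectors."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    matches = []
    for ls in left_sigs:
        best = min(range(len(right_sigs)),
                   key=lambda i: dist(ls, right_sigs[i]))
        matches.append(best)
    return matches
```

Using the text's own figures, for M = 100 blocks per line the signature approach costs roughly 48·100² + 48·100 ≈ 4.8 × 10⁵ adds, against 128·100² ≈ 1.28 × 10⁶ adds for direct 8x8 block matching.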
  • the present invention can also be utilized in motion estimation procedures that are utilized in video compression. These procedures are also known as inter-prediction or intra-prediction.
  • In video compression, the next image to be transmitted is broken into a plurality of blocks, which are compressed separately and then transmitted.
  • a prediction block is identified. The block may be based on a previously transmitted frame or part of the current frame that has already been sent to the receiver. The former case is referred to as inter-prediction, and the latter case is referred to as intra-prediction.
  • the block that is chosen is the one that best approximates the block that is currently being coded for transmission.
  • the prediction block can be transmitted to the receiver in a few bits that identify the location of the block in the previous frame or the type of intra-prediction.
  • the prediction block is then subtracted from the current block to produce a residual block that is then compressed using some form of image transformation and quantization procedure. If the prediction block is a good match to the current block, the range of pixel values in the residual block will be much less than the range of pixel values in the current block, and hence, the number of bits that must be transmitted to reconstruct the current block at the receiver to some predetermined accuracy is substantially reduced.
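The residual path can be sketched minimally as follows. A plain uniform quantizer stands in for the transform/quantization stage described above; a real encoder would transform the residual before quantizing it.

```python
import numpy as np

def encode_residual(current, prediction, q=8):
    """Form the residual block (current minus prediction) and quantize
    it with a uniform step q.  A stand-in for the transform/quantizor."""
    residual = np.asarray(current, dtype=int) - np.asarray(prediction, dtype=int)
    return np.round(residual / q).astype(int)

def decode_block(q_residual, prediction, q=8):
    """Receiver side: dequantize the residual and add back the
    prediction to regenerate the block."""
    return q_residual * q + np.asarray(prediction, dtype=int)
```

A good prediction keeps the residual small, so the quantized residual needs few bits, and the reconstruction error stays within half a quantization step per pixel.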
  • FIG. 5 illustrates a video compression engine according to one embodiment of the present invention.
  • a block of pixels to be encoded for transmission is received on line 60 that provides an input port to the engine.
  • an intra-block generator generates potential prediction blocks based on the blocks of the current frame that have already been sent and stores these blocks in a prediction block library 51.
  • the prediction block library 51 also includes a previously sent frame for use in generating inter- frame prediction blocks to be used in the encoding operation.
  • a signature vector is generated by signature generator 52 for the current block on line 60.
  • Controller 50 uses signature comparator 54 to compare a signature for each of the blocks in prediction block library 51 to the signature for the current block and selects the block corresponding to the best signature match and places that block in a buffer 55.
  • the intra-prediction blocks are added to prediction block library 51 by intra-block generator 53 before the start of the comparisons.
  • a residual block is then created by subtracting the block in buffer 55 from the current block.
  • the residual block is then transformed and quantized in a manner analogous to that discussed above by transform/quantizor 56.
  • the output of transform/quantizor 56 is typically encoded using a loss-less encoding scheme such as entropy encoding as shown at 57 prior to being transmitted to the receiver to further reduce the bandwidth needed to send or store the encoded image.
  • Transform/quantizor 56 and coder 57 are conventional components used in an image compressor that relies on comparing blocks rather than comparing signature vectors.
  • the current block as sent to the receiver is regenerated.
  • the output of transform/quantizor 56 is transformed by inverse transform/quantizor 58 and added to the prediction block to provide a copy of the current block as that block will be regenerated in the receiver.
  • This block is then stored in prediction block library 51 for future use.
  • the signature for this block can also be generated and stored in prediction block library 51 by signature generator 59.
  • prediction block library 51 must either include the signature vectors for all possible blocks or the hardware must generate those signature vectors upon request.
  • the number of possible inter-prediction blocks is approximately the same as the number of pixels in the reference frame.
  • the signature vector of each current block can be generated by signature generator 59 as that block is coded and stored in prediction block library 51 for future use.
  • the signature vector for an inter-prediction block that is not stored in prediction block library 51 is generated the first time the signature vector for that block is requested. The signature vector is then stored for future use. Since each inter-prediction block signature vector will be used multiple times during the encoding of the current frame, the average computational workload for generating these vectors is relatively small. However, the memory needed to store the full complement of signature vectors for the reference frame is approximately N_v times the memory needed to store the reference frame, where N_v is the number of components in each signature vector, and it is assumed that the number of bits per component is substantially the same as the number of bits used for each pixel in the reference frame.
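The memory estimate can be checked with a small helper; the 1920x1080 frame size and 8-bit components in the usage example are illustrative assumptions, not values from the text.

```python
def signature_memory_bytes(width, height, n_components, bits_per_component=8):
    """Approximate memory to hold a signature vector for every possible
    inter-prediction block position (roughly one per pixel).  When
    components and pixels use the same number of bits, this is about
    n_components times the storage for the reference frame itself."""
    positions = width * height               # ~one block position per pixel
    return positions * n_components * bits_per_component // 8
```

For a 1920x1080 frame with 8-component, 8-bit signatures this comes to about 16.6 MB, i.e. N_v = 8 times the roughly 2.1 MB needed for the 8-bit frame itself.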
  • the signature vector for each potential inter-prediction block could be generated when that block is requested by using an algorithm that does not impose this high memory requirement while still requiring less work to compute than the work to compute the signature vector from the pixels of the reference frame.
  • the blocks of pixels in an image that coincide with the blocks that are coded during transmission will be referred to as the "encoded" blocks.
  • the signature vector for each encoded block in a reference frame is generated by signature generator 59 when that block was encoded for transmission in a previous frame.
  • the number of encoded blocks is a small fraction of the number of potential inter-prediction blocks. For example, if 8x8 blocks are encoded, the number of potential inter-prediction blocks is 64 times the number of encoded blocks.
  • the memory needed to store the signature vectors for the encoded blocks is much smaller than that needed to store the reference frame.
  • the components of the signature vectors are chosen such that the signature vector for a block that starts between the starting locations of the encoded blocks can be approximated by interpolating the signature vectors of the encoded blocks that are closest to that block.
  • the computational workload is substantially reduced and the storage requirements are also substantially reduced.
  • the components of the signature vectors are the low-spatial frequency components of an image compression transform such as a wavelet transform
  • the components will have this property for the correct choice of wavelet transform, since the components are positive weighted sums of blocks of adjacent pixels, and the blocks that start on pixels that are different from the starting pixels of the encoded blocks will still have a significant number of pixels in common with encoded blocks; hence, the transform coefficients will change slowly as a function of the starting location of the block.
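The interpolation idea can be sketched as bilinear interpolation over the grid of stored encoded-block signatures. The grid layout and bilinear weights below are assumptions, consistent with components that vary slowly with the block's starting position, as described above.

```python
import numpy as np

def interpolated_signature(sig_grid, x, y, block=8):
    """Approximate the signature of a block starting at pixel (x, y) by
    bilinearly interpolating the stored signatures of the four encoded
    blocks nearest that position.  sig_grid[gy][gx] holds the signature
    of the encoded block starting at (gx * block, gy * block)."""
    gx, fx = divmod(x, block)
    gy, fy = divmod(y, block)
    wx, wy = fx / block, fy / block
    s00 = np.asarray(sig_grid[gy][gx], dtype=float)
    s01 = np.asarray(sig_grid[gy][gx + 1], dtype=float)
    s10 = np.asarray(sig_grid[gy + 1][gx], dtype=float)
    s11 = np.asarray(sig_grid[gy + 1][gx + 1], dtype=float)
    top = (1 - wx) * s00 + wx * s01      # interpolate along x, top row
    bot = (1 - wx) * s10 + wx * s11      # interpolate along x, bottom row
    return (1 - wy) * top + wy * bot     # interpolate along y
```

This trades an exact signature for one interpolated from roughly 1/64th as many stored vectors (for 8x8 encoded blocks), matching the storage saving described above.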
  • FIG. 6 illustrates an engine for performing stereo disparity matching.
  • the current frame will be referred to as the left image and the reference frame will be referred to as the right image; however, the images could be interchanged.
  • each block centered on a given horizontal line in the left image is matched against each block on the same horizontal line in the right image.
  • a block of pixels is defined and entered into a corresponding line buffer.
  • the line buffers for the left and right images are shown at 61 and 62, respectively.
  • a signature is generated for each block by a signature generator; the signature generators are shown at 63 and 64.
  • the signatures may be encoded prior to being stored in a corresponding signature memory.
  • the signature memory for the left image is shown at 65 and the signature memory for the right image is shown at 66.
  • a controller 91 matches a signature from right signature memory 66 with all of the signatures corresponding to blocks on the same horizontal line that are stored in the left signature memory 65.
  • If the signatures have been encoded, a signature decoder generates the actual signature vectors from the encoded forms stored in the signature memories.
  • the signature decoders are shown at 67 and 68.
  • Controller 91 keeps track of the block from the two images whose signature vectors are closest to one another.
  • controller 91 also runs the various modules that generate the signatures and store and retrieve those signatures. To simplify the drawings, the connections between controller 91 and the various modules have been omitted.
  • the present invention can be utilized to improve computational workloads in various image recognition systems.
  • objects are characterized by a set of "invariant" points in the object.
  • the invariant points are actually blocks of pixels that have been transformed to a standard orientation and scale and then weighted to emphasize the pixels near the center of the block.
  • the invariant points associated with that object are stored.
  • blocks of pixels within that scene are compared to invariant points in the library after similarly rotating, scaling, and weighting the blocks.
  • a list of blocks that matched is then compared to the lists of blocks associated with each object in the library to determine if any of the objects in the library are present in the scene.
  • the reference "frame" is the collection of the blocks in the library. Each block in the current frame is compared with all of the blocks in the library after the block in the current frame has been similarly rotated, scaled, and weighted.
  • the present invention reduces the workload in making the comparisons.
  • a signature can be associated with each block in the reference frame by using an energy concentrating transform on the reference blocks. This computation only needs to be done once and the signatures become part of the library.
  • a signature is then created for each block to be compared in the unknown image after a similar rotation, scaling, and weighting operation.
  • the comparisons can now be carried out by using the signature vectors rather than matching the blocks directly.
  • the size of the pixel blocks being compared is set by the invariant point library, and is typically much larger than the size of the blocks used in inter-prediction. Accordingly, the present invention is particularly well suited for such comparisons.
  • the present invention can be implemented in a variety of hardware embodiments.
  • Because the present invention is well suited for applications in which each block in a current frame is to be compared with a plurality of blocks in a reference frame to determine the block in the reference frame that most closely matches the block in the current frame, it is useful to differentiate the nomenclature used for the blocks in the current and reference frames. In general, a first block, C1, of pixels in a current frame is compared to a second block, R1, of pixels in a reference frame.
  • the apparatus includes a signature processor and a distance measuring processor.
  • the signature processor generates signature vectors from blocks of pixels in either frame.
  • the signature processor generates a first signature vector, VC1, for the first block and a second signature vector, VR1, for the second block.
  • the distance measuring processor operates on any two vectors having the same length.
  • the distance measuring processor measures a distance between two vectors V1 and V2 using a distance function D(V1, V2).
  • the distance function and signature vectors are chosen such that if D(VC1, VR1) < D(VC1, VR2), then D(C1, R1) < D(C1, R2), where R2 is a third block of pixels in said reference frame.
  • computing D(C1, R1) imposes a first computational workload on the apparatus, computing D(VC1, VR1) imposes a second computational workload, and generating the signature vectors imposes a third computational workload.
  • the signature vectors and distance functions are chosen such that the sum of the third computational workload and the second computational workload is less than the first computational workload.
  • the apparatus also includes a controller that compares each of a plurality of blocks of pixels in the reference frame to the first block of pixels by causing the signature processor to generate a reference signature vector corresponding to each of the blocks of pixels in the reference frame and measuring the distance between each reference signature vector and VC1.
  • the apparatus can be included in a number of systems including stereo disparity systems and motion compensation systems for image compression.
PCT/AU2014/000006 2013-01-09 2014-01-08 Method and apparatus for comparing two blocks of pixels WO2014107762A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201480005080.8A CN104937938A (zh) 2013-01-09 2014-01-08 用于比较两个像素块的方法和设备
EP14737954.9A EP2944085A4 (en) 2013-01-09 2014-01-08 METHOD AND APPARATUS FOR COMPARING TWO PIXEL BLOCKS
KR1020157021426A KR20150128664A (ko) 2013-01-09 2014-01-08 픽셀의 2개의 블록을 비교하기 위한 방법 및 장치
US14/750,942 US20150296207A1 (en) 2013-01-09 2015-06-25 Method and Apparatus for Comparing Two Blocks of Pixels

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2013900077A AU2013900077A0 (en) 2013-01-09 Compact Signatures for Fast Signal Comparison
AU2013900077 2013-01-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/750,942 Continuation US20150296207A1 (en) 2013-01-09 2015-06-25 Method and Apparatus for Comparing Two Blocks of Pixels

Publications (1)

Publication Number Publication Date
WO2014107762A1 true WO2014107762A1 (en) 2014-07-17

Family

ID=51166429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2014/000006 WO2014107762A1 (en) 2013-01-09 2014-01-08 Method and apparatus for comparing two blocks of pixels

Country Status (5)

Country Link
US (1) US20150296207A1 (ko)
EP (1) EP2944085A4 (ko)
KR (1) KR20150128664A (ko)
CN (1) CN104937938A (ko)
WO (1) WO2014107762A1 (ko)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11575896B2 (en) * 2019-12-16 2023-02-07 Panasonic Intellectual Property Corporation Of America Encoder, decoder, encoding method, and decoding method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003150939A (ja) * 2001-11-14 2003-05-23 Fuji Heavy Ind Ltd 画像処理装置および画像処理方法
KR20080004915A (ko) * 2006-07-07 2008-01-10 주식회사 리버트론 H.264 코딩의 압축모드 예측 장치 및 방법
KR20100097387A (ko) * 2009-02-26 2010-09-03 한양대학교 산학협력단 고속 움직임 추정을 위한 부분 블록정합 방법
US20100329347A1 (en) * 2008-01-29 2010-12-30 Dong Hyung Kim Method and apparatus for encoding and decoding video signal using motion compensation based on affine transformation
US20120308144A1 (en) * 2011-06-01 2012-12-06 Sony Corporation Image processing device, image processing method, recording medium, and program

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6625216B1 (en) * 1999-01-27 2003-09-23 Matsushita Electic Industrial Co., Ltd. Motion estimation using orthogonal transform-domain block matching
WO2001041451A1 (en) * 1999-11-29 2001-06-07 Sony Corporation Video/audio signal processing method and video/audio signal processing apparatus
KR100747958B1 (ko) * 2001-09-18 2007-08-08 마쯔시다덴기산교 가부시키가이샤 화상 부호화 방법 및 화상 복호화 방법
US7231090B2 (en) * 2002-10-29 2007-06-12 Winbond Electronics Corp. Method for performing motion estimation with Walsh-Hadamard transform (WHT)
CN100463525C (zh) * 2006-12-11 2009-02-18 浙江大学 计算复杂度可动态调整的视频编码方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2944085A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016029243A1 (en) * 2014-08-26 2016-03-03 Vincenzo Liguori Video compression system that utilizes compact signature vectors for inter and intra prediction
CN106797463A (zh) * 2014-10-03 2017-05-31 索尼公司 信息处理装置和信息处理方法
CN109493295A (zh) * 2018-10-31 2019-03-19 泰山学院 一种非局部哈尔变换图像去噪方法
CN109493295B (zh) * 2018-10-31 2022-02-11 泰山学院 一种非局部哈尔变换图像去噪方法

Also Published As

Publication number Publication date
EP2944085A4 (en) 2016-02-17
KR20150128664A (ko) 2015-11-18
US20150296207A1 (en) 2015-10-15
CN104937938A (zh) 2015-09-23
EP2944085A1 (en) 2015-11-18

Similar Documents

Publication Publication Date Title
CN113678466A (zh) 用于预测点云属性编码的方法和设备
CN108028941B (zh) 用于通过超像素编码和解码数字图像的方法和装置
US7916958B2 (en) Compression for holographic data and imagery
Lai et al. A fast fractal image coding based on kick-out and zero contrast conditions
US20060112115A1 (en) Data structure for efficient access to variable-size data objects
US20100114871A1 (en) Distance Quantization in Computing Distance in High Dimensional Space
US20150296207A1 (en) Method and Apparatus for Comparing Two Blocks of Pixels
EP3104615B1 (en) Moving image encoding device and moving image encoding method
JP5799080B2 (ja) 画像シーケンスのブロックを符号化する方法および再構成する方法
CN114651270A (zh) 通过时间可变形卷积进行深度环路滤波
CN109544557A (zh) 基于区块的主成分分析转换方法及其装置
CN112911302A (zh) 一种面向动态点云几何信息压缩的新型合并预测编码方法
WO2022131948A1 (en) Devices and methods for sequential coding for point cloud compression
EP3535973A1 (en) Image and video processing apparatuses and methods
JP5931747B2 (ja) 画像シーケンスのブロックの符号化および復元の方法
JP5809574B2 (ja) 符号化方法、復号方法、符号化装置、復号装置、符号化プログラム及び復号プログラム
Abood et al. Fast Full-Search Algorithm of Fractal Image Compression for Acceleration Image Processing
Huilgol et al. Lossless image compression using seed number and JPEG-LS prediction technique
WO2024084952A1 (ja) 情報処理装置および方法
Kucherov et al. A computer system for images compression
US20230007237A1 (en) Filter generation method, filter generation apparatus and program
KR100316411B1 (ko) 제어 그리드 보간방식에 따른 영상부호기에서 판별형 벡터양자화를 이용한 제어점의 움직임 벡터 산출 장치
WO2016029243A1 (en) Video compression system that utilizes compact signature vectors for inter and intra prediction
Liu et al. Spatio-temporal depth data reconstruction from a subset of samples
Schmalz Digital Images: Compression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14737954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20157021426

Country of ref document: KR

Kind code of ref document: A

REEP Request for entry into the european phase

Ref document number: 2014737954

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2014737954

Country of ref document: EP