GB2285359A - Disparity coding images for bandwidth reduction - Google Patents

Disparity coding images for bandwidth reduction

Info

Publication number
GB2285359A
GB2285359A (Application GB9326582A)
Authority
GB
United Kingdom
Prior art keywords
block
image
blocks
sub
criteria
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9326582A
Other versions
GB9326582D0 (en)
Inventor
Vassilis Seferidis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Philips Electronics UK Ltd
Original Assignee
Philips Electronics UK Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Philips Electronics UK Ltd filed Critical Philips Electronics UK Ltd
Priority to GB9326582A priority Critical patent/GB2285359A/en
Publication of GB9326582D0 publication Critical patent/GB9326582D0/en
Publication of GB2285359A publication Critical patent/GB2285359A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/40Tree coding, e.g. quadtree, octree
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/97Determining parameters from multiple pictures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/537Motion estimation other than block-based
    • H04N19/54Motion estimation other than block-based using feature points or meshes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194Transmission of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0081Depth or disparity estimation from stereoscopic image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0085Motion estimation from stereoscopic image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

First and second images (such as the left and right images of a stereo pair) are initially divided (30, figure 2) into corresponding blocks of a first size and, for each block of the first image, a search is made (32) for a block at corresponding and nearby positions in the second image. If the search is unsuccessful, the first image block in question, and the blocks searched in the second image, are subdivided and the search repeated for each sub-block. When a reasonable match is made, the required shift is stored and a series of distortions (34) - see figure 4 - are applied to the selected second image block to identify (36) which distortion pattern, if any, improves the match. The resulting data, from which the first image may be recreated, comprises, for each first image block or sub-block, identification of the selected second image block or sub-block in the second image and the shift and distortion applied.

Description

IMAGE CODING

The present invention relates to image coding and in particular to a method for disparity encoding of a first two-dimensional image in relation to a second two-dimensional image, in which each of the first and second images is divided into a plurality of regular non-overlapping blocks and, for each block of the first image, performing the steps of: comparing the block with those blocks at and near the corresponding position in the second image and selecting that providing the best match; applying a predetermined series of distortions to the selected block of the second image, comparing the result of each distortion with the first image block, and identifying the applied distortion providing the best match to the first image block; and storing the location of the selected second image block and applied distortion.
In the following description, the invention will be described in terms of a stereo image coding technique wherein one image of a stereo pair is not stored or transmitted in full but is recreated from the other image and difference information. It will be readily appreciated by those skilled in the art however that the technique is applicable to image coding generally and particularly to the coding of sequences of video images.
Area-based stereo disparity estimation has many features in common with motion compensation applied to consecutive frames of an image sequence. A difference between them is the objective of each process.
Motion estimation is regarded as satisfactory as long as the identified areas (pixel blocks) satisfy a selection criterion. Disparity estimation on the other hand must always determine the correct disparity vector, since one stereo image will be reconstructed from the other, at the receiver, and used for stereo perception. Hence, incorporation of conventional block-based vector estimation techniques has been found unsatisfactory, mainly due to a blockiness effect which degrades the picture quality and prevents the preservation of depth perception. The problems of incorporation are discussed in greater detail in "Improved Disparity Estimation in Stereoscopic Television" by V. Seferidis and D. Papadimitriou, Electronics Letters, Vol.29, No.9, April 1993, pp. 782-783.
Image formation may be considered as a mapping process in which the three-dimensional (3D) scene space is projected onto the two-dimensional (2D) image plane. Due to the many-to-one nature of the mapping, the 3D depth information is lost after projection. The depth ambiguities in the resulting 2D image not only give rise to many problems in scene analysis and image understanding applications, they also eliminate the cues for the determination of the spatial relationships between points and surfaces in a scene. Stereo vision provides a direct way of inferring the missing depth information by using two images (a stereo pair) destined for the left eye and right eye respectively.
The stereo images are generated by recording two slightly different view angles of the same scene. Typically, the stereo camera arrangement consists of two identical cameras which are placed close to each other on the same baseline. When each image of a stereo pair is viewed by its respective eye, the stereo image is perceived in 3D. The differences between the two images are very important since they embody information about the geometry of the scene, such as the depth of each object point. Such geometric differences will be referred to hereinafter as disparity and the process of assigning a disparity vector for every pixel or group of pixels will be referred to as disparity analysis.
It is known to use disparity analysis for identifying objects in applications such as robot and computer vision, autonomous navigation, photogrammetry, surveillance and model based coding. In addition to these, stereo imaging has become a particularly important technique in recent multi-disciplinary applications such as 3D television, virtual reality and telepresence.
A natural way to approach disparity analysis is by matching. Matching is perhaps the most important process in disparity estimation because it establishes the correspondence of features that represent the same physical entity in each view. The existing stereo matching techniques can be classified into two categories according to the matching primitives used in the process. In so-called area-based methods, each pixel in the right-eye image is searched for correspondence in the left-eye image.
The search is usually performed by calculating a correlation feature (e.g. a cross-correlation) or a distortion measure (e.g. mean-square error or mean-absolute-difference) of pixel blocks surrounding the pixels under consideration. The so-called feature-based techniques on the other hand use symbolic features derived from the intensity images rather than the image intensities themselves. Typical features include lines, edges, corners and zero crossings. Since these features represent geometric properties of the scene, their location can be identified precisely. Moreover, the whole process is less sensitive to intensity variations. There is however a major drawback, namely the sparsity of the estimated disparity field.
For stereo image coding this problem becomes even more important because it affects the quality of reconstruction of one stereo image from the other. A possible solution is to interpolate the missing disparity vectors assuming a monotonic variation of disparity values between the existing samples. The interpolation however increases the computational load without securing a better overall performance. Therefore it has been argued that the requirements of a stereo coding system favour the adoption of area-based methods rather than feature-based ones.
Moreover an area-based stereo coding scheme simplifies the design of coders compatible with existing video coding standards such as H.261, MPEG I, MPEG II. This is very important if disparity and motion estimation algorithms are to be combined in order to exploit temporal as well as spatial similarities of the two sequences.
An additional drawback of traditional matching methods is their poor performance in handling occluded areas, that is to say those parts of the scene which are present in one image only. The problem is worse in stereo compared to interframe occlusion (sometimes referred to as uncovered background) because the twin camera arrangement introduces its own geometric deformations. Such deformations cannot be compensated with traditional block matching algorithms because they inherently estimate only translations. A successful disparity estimation scheme must be able to cope with nonlinear deformations and occluded objects in order to reconstruct accurately the scene depth. We have recognised that one method which inherently has these properties is generalized block matching, which provides a method for disparity encoding as set forth in the opening paragraph. The principles of generalized block matching are described in greater detail hereinafter and also in the following articles, the disclosures of which are incorporated herein by reference: "Three Dimensional Block Matching Motion Estimation" by V. Seferidis, Electronics Letters, Vol.28, No.18, August 1992, pp. 1770-1772; "General Approach to Block Matching Motion Estimation" by V. Seferidis and M. Ghanbari, Optical Engineering, Vol.32, No.7, July 1993, pp. 1464-1474.
It is an object of the present invention to provide means for the efficient coding of stereo image sequences or successive video images for transmission and/or storage.
It is a further object of the present invention to provide a coding method and apparatus which takes into account similarities between a pair of images and allows a good reconstruction of one image to be generated from the other.
In accordance with the present invention there is provided a method for disparity encoding as set forth in the opening paragraph and characterised in that the first and second images are initially divided into blocks of a first size, each block of the first image is compared with the correspondingly positioned block of the second image in accordance with predetermined matching criteria and, if the criteria are not met, the first and second image blocks are divided into corresponding sub-blocks and each sub-block compared according to the same criteria, the step of dividing into sub-blocks and comparing being repeated until the criteria are satisfied, and the steps of comparing, applying distortions and storing are then performed for each block or sub-block meeting the criteria.
By applying the generalized block matching scheme to blocks of differing sizes, the present invention compensates more accurately those parts of a scene having large disparity values due to the allocation of smaller blocks to those areas.
The division of blocks into sub-blocks, and the subsequent division of sub-blocks into further sub-blocks, may suitably comprise dividing into four equal portions of the same shape as the parent. To prevent excessive computation for minimal benefit, a minimum sub-block size may be specified such that the matching criteria are assumed to be satisfied when this block size is reached, regardless of whether or not a further subdivision would otherwise be indicated.
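By way of illustration only, the following Python sketch shows one possible realisation of this block-splitting search; the mean-absolute-difference criterion, the threshold value and the search range are assumptions made for the example and are not taken from the description above.

```python
import numpy as np

def match_block(first, second, y, x, size, search=4, threshold=8.0, min_size=8):
    """Match one block of the first image against the second image,
    subdividing into four equal sub-blocks whenever the matching
    criterion (here an illustrative MAD threshold) is not met.
    Returns a list of (y, x, size, dy, dx) entries, one per matched
    block or sub-block."""
    block = first[y:y + size, x:x + size].astype(float)
    best = None
    # Compare with the correspondingly positioned block and nearby
    # blocks of the second image.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= second.shape[0] - size and 0 <= xx <= second.shape[1] - size:
                cand = second[yy:yy + size, xx:xx + size].astype(float)
                mad = np.mean(np.abs(block - cand))
                if best is None or mad < best[0]:
                    best = (mad, dy, dx)
    if best is None:
        return []  # block lies outside the searchable area of the second image
    # Accept the match if the criterion is satisfied or the minimum
    # permitted block size has been reached.
    if best[0] <= threshold or size <= min_size:
        return [(y, x, size, best[1], best[2])]
    # Otherwise divide into four equal portions and repeat.
    half = size // 2
    result = []
    for oy in (0, half):
        for ox in (0, half):
            result += match_block(first, second, y + oy, x + ox, half,
                                  search, threshold, min_size)
    return result
```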
Where each block or sub-block is made up of a number of pixels, the distortions applied by generalized block matching may comprise sequentially moving each corner of a block or sub-block to a number of different positions about its original position, such as a pattern of nine positions each spaced from the original position by n pixels in the horizontal, vertical or diagonal direction, where n is an integer.
Following identification of the applied distortion providing the best match, the mean absolute difference between the first image block and the undistorted second image block and between the first image block and the distorted second image block may be compared and, where the ratio between the two mean absolute differences is below a predetermined value, the applied distortion is omitted from the stored information.
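A minimal Python sketch of this discrimination step is given below; the two-thirds ratio is taken from the worked example later in the description (equivalent to the factor of 1.5 in Claim 6), while the function and argument names are illustrative.

```python
import numpy as np

def keep_distortion(first_block, undistorted_match, distorted_match, ratio=2.0 / 3.0):
    """Return True if the applied distortion should be kept in the stored
    information, i.e. if it reduces the mean absolute difference to below
    `ratio` times that of the undistorted (purely translational) match."""
    mad_plain = np.mean(np.abs(first_block.astype(float) - undistorted_match.astype(float)))
    mad_warped = np.mean(np.abs(first_block.astype(float) - distorted_match.astype(float)))
    return mad_warped < ratio * mad_plain
```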
Also in accordance with the present invention there is provided disparity encoding apparatus operable to receive first and second two-dimensional images and to encode the first image in terms of its disparity with respect to the second image, the apparatus comprising: image receiving means arranged to receive the first and second images and to divide each into a plurality of corresponding non-overlapping blocks of a first size; first comparison means operable to compare each block of the first image with the corresponding block of the second image and a plurality of blocks surrounding the said corresponding block in accordance with predetermined matching criteria and, where the criteria are not met, operable to indicate so to the image receiving means, the image receiving means thereafter operating to divide the block failing to meet the predetermined matching criteria together with the corresponding block of the second image into a plurality of sub-blocks, with the first comparison means being arranged to then repeat the comparison for each sub-block in accordance with the predetermined matching criteria; image modulation means operable to apply, for each first image block or sub-block, a predetermined series of distortions to the respective selected second image block or sub-block; second comparison means arranged to compare each of the series of distorted versions of the selected second image block or sub-block to the respective first image block or sub-block, in accordance with a second predetermined matching criteria; and storage means connected to the first and second comparison means and operable to receive and store, for each block or sub-block of the first image, information identifying the block of the second image meeting the predetermined matching criteria, and the applied distortion best satisfying the second predetermined matching criteria.
Further features and advantages of the present invention will be apparent from a reading of the claims and the following description of preferred embodiments of the present invention, now described in the context of stereo image coding by way of example only and with reference to the accompanying drawings in which: Figure 1 is a block diagram of a stereoscopic television coding scheme; Figure 2 is a block schematic representation of coding apparatus embodying the present invention; Figure 3 illustrates the principle of generalized block matching; Figure 4 illustrates the matching of quadrilaterals using a 9-position search algorithm; Figure 5 shows the principle of quad-tree segmentation; and Figures 6 and 7 respectively represent the quad-tree segmentation and the absolute difference for a stereo pair of a test image.
In Fig.1 there is shown a stereoscopic television coding scheme which incorporates motion/disparity compensation. Left and right image sequences 10,12 are applied initially to respective motion estimators 14,16 which eliminate the redundancy between successive frames of the same sequence. The resulting images are then passed to a disparity estimator 18 which calculates a disparity field in a manner to be described below. The difference between the original right image and the disparity and motion compensated image (predictor) is coded by a source encoder 20. The motion compensated left image is coded by a second encoder 22. The two encoder outputs may then be transmitted 24 and, on reception, respective left and right decoders 26,28 recreate the left and right image sequences 10a,12a.
The disparity estimator 18 operates in the manner represented by Fig.2.
An image receiver 30 receives the first and second motion compensated images and divides each into a plurality of corresponding non-overlapping blocks of a first size. A first comparator 32 operates to compare each block of the first image with the corresponding block of the second image and a plurality of blocks surrounding the corresponding block in accordance with predetermined matching criteria. Where the matching criteria are not met, the comparator 32 indicates so to the image receiver 30, which then divides the block failing to meet the predetermined matching criteria together with the corresponding block of the second image into a plurality of sub-blocks.
The first comparator 32 then repeats the comparison for each sub-block in accordance with the predetermined matching criteria. When the matching criteria are satisfied, an image modulator 34 applies, for each first image block or sub-block, a predetermined series of distortions to the selected second image block or sub-block, in a manner to be described below. A second comparator 36 compares each of the series of distorted versions of the selected second image block or sub-block to the first image block or sub-block, in accordance with a second predetermined matching criteria and, having determined which applied distortion best satisfies the second predetermined matching criteria, passes details of the selected second image block or sub-block and the applied distortion to a buffer 38 for subsequent storage or transmission.
The technique of distorting and comparing is known as generalized block matching and is a block matching technique which approximates the deformations of real objects by deforming the corresponding blocks in the picture. As with other block matching techniques, the image is divided into non-overlapping square blocks and a multi-dimensional vector is assigned to each one. For a stereo image pair, the vector consists of the mapping parameters which satisfy the following criteria:

x_i^r = f_1(x_i^l, y_i^l)
y_i^r = f_2(x_i^l, y_i^l)
\sum_{i=1}^{N^2} (g_i^r - g_i^l)^2 = \min

where g_i^r and g_i^l represent the grey values of two blocks of NxN pixels each, of the right and left image respectively, and x_i^r, y_i^r and x_i^l, y_i^l are the corresponding x and y coordinates, with i = 1, 2, ..., N^2. As will be appreciated, the above summation represents the mean-square-error criterion which guides the search for the optimal position, although other distortion or correlation measures may be used instead. The mapping functions f_1 and f_2 relate the coordinates of corresponding points in the two stereo images. It is not necessary for f_1 and f_2 to be linear or monotonic: they may represent one-to-many mappings in order to compensate the non-linear deformations introduced by the stereo arrangement. For backwards compatibility with existing block matching techniques however, it is desirable for both functions to facilitate quadrilateral-to-quadrilateral mappings.
Adoption of the general transformation given above for x_i^r and y_i^r suggests that the matching criteria are applied to irregular quadrilateral areas of the stereo pictures. Without loss of generality, it may be assumed that the quadrilaterals in the predicted image remain square blocks as in conventional block matching, as shown by the predicted right image in Fig.3. This assumption simplifies and accelerates the estimation process because the mapping parameters can be derived from a set of simple equations which involve only integer arithmetic.
Experimental results have shown that a considerable improvement can be achieved with the adoption of generalized block matching in motion estimation, both in the prediction picture quality and in the reduction of the bit rate for hybrid DPCM/DCT coders. Results obtained for a variety of pictures suggest that perspective transformations for the mapping functions f_1 and f_2 outperform bilinear and affine transformations. The perspective mappings are given by the following equations:

x_i^r = (a_0 x_i^l + a_1 y_i^l + a_2) / (a_6 x_i^l + a_7 y_i^l + a_8)
y_i^r = (a_3 x_i^l + a_4 y_i^l + a_5) / (a_6 x_i^l + a_7 y_i^l + a_8)

Again, without loss of generality, the mappings can be normalized so that a_8 = 1. The motion or disparity vector is then equivalent to the list of mapping parameters a_i, where i = 0, 1, 2, ..., 7. Fast methods for the calculation of these parameters are described in the above-mentioned Optical Engineering article of V. Seferidis and M. Ghanbari. The idea of fast search algorithms is to selectively check only a small number of possible search positions, assuming that the distortion measure monotonically decreases towards the best match position. By checking only some representatives of the whole set of possible combinations, the same accuracy can hopefully be achieved with only a fraction of the operations. An example of such a search is shown in Fig.4 for a block of 16x16 pixels: for the sake of simplicity, only variations of the top left-hand corner of the matching block in the left-eye image are shown. Each quadrilateral is formed by displacing the top left corner by +/-4 pixels horizontally, vertically or diagonally. For each displacement, all the remaining three corners are similarly displaced and the quadrilateral which minimises the mean-square-error is chosen as the best match. It is easy to verify that the total number of quadrilaterals from the left-eye image that are matched with a square block in the right-eye image is 9^4 = 6561.
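For illustration, a short Python sketch of the perspective mapping and of the 9-position corner search is given below; the parameter ordering follows the equations above, and the helper names are assumptions made for the example.

```python
import itertools

def apply_perspective(a, x, y):
    """Map left-image coordinates (x, y) to right-image coordinates using
    the eight perspective parameters a[0]..a[7], with a8 normalised to 1."""
    denom = a[6] * x + a[7] * y + 1.0
    xr = (a[0] * x + a[1] * y + a[2]) / denom
    yr = (a[3] * x + a[4] * y + a[5]) / denom
    return xr, yr

def corner_offsets(n=4):
    """The nine candidate displacements of one corner: the original
    position and +/-n pixels horizontally, vertically or diagonally."""
    return [(dx, dy) for dx in (-n, 0, n) for dy in (-n, 0, n)]

# Displacing each of the four corners independently gives
# 9**4 = 6561 candidate quadrilaterals per block.
candidates = list(itertools.product(corner_offsets(), repeat=4))
assert len(candidates) == 9 ** 4 == 6561
```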
As with all block matching techniques, the performance of generalized block matching increases with a reduction of the block size, for example using 8x8 pixel blocks rather than 16x16. However, the small block size increases the overhead information (mapping parameters) that must be transmitted or stored. A possible solution to alleviate this problem is to segment the areas that exhibit uniform disparities and to estimate the mapping parameters for each one separately. However, not all segmentation techniques are suitable for disparity compensated stereo coding due to the excessive number of bits required to describe the shape and location of each region. A large amount of overhead is unacceptable in stereo coding applications where the disparity compensation represents only part of a more complex coding scheme and thus it must leave enough bits to encode detailed regions with good fidelity using a conventional source encoder.
We have recognised that a special data structure known as a quad-tree may be used to provide a good compromise between accurate segmentation and the resulting overhead. The segmentation using quad-tree decomposition results in a relatively small overhead because the original image frame is decomposed into sub-blocks whose size, shape and location are predetermined and hence require relatively few bits for their description.
Quad-tree is a tree structure in which each node, unless it is a leaf, generates four children. Each child represents a quarter of the area of its parent. Fig.5b shows an example of quad-tree segmentation applied on the bilevel image of Fig.5a, whereas Fig.5c gives the corresponding graphical representation. In this example the decomposition is driven by a simple homogeneity test regarding the pixel values. It will be noted that each node corresponds to a sub-block of the image that is determined both in size and location by its position in the tree. The subdivision of a parent node into its four children is guided by a hypothesis test in which a decision is made whether four adjacent sub-blocks (children) are homogenous in the property of interest. If the result of the test is positive, there is no need for a subdivision of the currently examined block. If the test fails, four children are generated and the parental region is represented by four independent tree nodes corresponding to the four children sub-blocks.
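A minimal Python sketch of such a quad-tree structure follows; the homogeneity test is passed in as a function so that it can be replaced by, for example, the AC-energy test described below. The class and method names are illustrative assumptions.

```python
class QuadNode:
    """A quad-tree node covering the square region with top-left corner
    (y, x) and side length `size`.  A leaf has no children; an internal
    node has exactly four, each covering a quarter of its parent's area."""

    def __init__(self, y, x, size):
        self.y, self.x, self.size = y, x, size
        self.children = []

    def decompose(self, is_homogeneous, min_size=8):
        # If the hypothesis test succeeds, or the minimum permitted block
        # size is reached, the region remains a leaf; otherwise four
        # children are generated and examined recursively.
        if self.size <= min_size or is_homogeneous(self.y, self.x, self.size):
            return
        half = self.size // 2
        for oy in (0, half):
            for ox in (0, half):
                child = QuadNode(self.y + oy, self.x + ox, half)
                child.decompose(is_homogeneous, min_size)
                self.children.append(child)

    def leaves(self):
        if not self.children:
            return [(self.y, self.x, self.size)]
        return [leaf for child in self.children for leaf in child.leaves()]
```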
In the past, quad-tree segmentation has been successfully used in a number of picture coding applications. The traditional construction of the quad-tree representation starts with the assumption that the whole image can be represented by only one node (root) and an initial hypothesis test decides if further splitting is necessary. However, experimental results for both still and moving pictures have shown that in practice it is preferable to start by testing smaller blocks (typically 32x32 pixels) instead of the entire picture since the homogeneity test within larger blocks is rarely successful.
Similar constraints are introduced for the size of the smallest blocks in order to maintain the overhead information within acceptable limits.
Research on variable transform coding and vector quantization suggests that the lowest level of the quad-tree representation should be in the region of 4x4 pixels. In the case of disparity compensation however, this size is too small and requires an unacceptably large amount of addressing information (overhead). To prevent the overhead information from becoming excessive, a minimum permitted block size of 8x8 pixels is preferred. To further simplify the segmentation process, the AC energy of each pixel block may be used as the hypothesis test.
The AC energy is defined as:

AC energy = \sum_{i=1}^{N^2} (g_i - \bar{g})^2

where g_i are the intensity values of the individual pixels in the block and \bar{g} is the mean intensity value of the whole block. According to this approach, the algorithm calculates the AC energy for each NxN pixel block and if its value is greater than a threshold, further subdivision of the block is carried out. The threshold value is suitably chosen to be half the AC energy calculated over the four children belonging to the same parental node.
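The AC-energy computation, and one possible reading of the threshold rule stated above (a child block is subdivided further when its AC energy exceeds half of the combined AC energy of the four children of the same parent), might be sketched in Python as follows; both the interpretation of the threshold and the function names are assumptions.

```python
import numpy as np

def ac_energy(block):
    """Sum of squared deviations of the block's pixel intensities from
    the block's mean intensity."""
    g = block.astype(float)
    return float(np.sum((g - g.mean()) ** 2))

def children_to_split(image, y, x, size):
    """Apply the hypothesis test to the four children of the block at
    (y, x): a child is marked for further subdivision when its AC energy
    exceeds half of the combined AC energy of the four children."""
    half = size // 2
    regions = [(y + oy, x + ox) for oy in (0, half) for ox in (0, half)]
    energies = [ac_energy(image[cy:cy + half, cx:cx + half]) for cy, cx in regions]
    threshold = 0.5 * sum(energies)
    return [(cy, cx, half) for (cy, cx), e in zip(regions, energies) if e > threshold]
```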
The quad-tree segmentation starts with the application of the above algorithm to 256 blocks of 32x32 pixels which make up the original 512x512 pixel image. The blocks that exhibit large disparity differences (that is to say with an AC energy level above the threshold) are further subdivided into 16x16 or 8x8 blocks according to the recursion of the algorithm. Fig.6 shows the quad-tree segmentation obtained from a stereo pair of images of an engine component. The segmentation consists of 148 blocks of 32x32 pixels, 104 blocks of 16x16 pixels and 1312 blocks of 8x8 pixels each (1564 blocks in total).
Assuming that for each block its starting position (12 bits) as well as its size (2 bits) has to be transmitted, there are approximately 21896 bits of maximum overhead information for the description of the quad-tree to the decoder.
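The overhead figure can be checked directly from the block counts and the per-block cost quoted above (12 bits of starting position plus 2 bits of size):

```python
blocks = 148 + 104 + 1312        # 32x32, 16x16 and 8x8 blocks respectively
bits_per_block = 12 + 2          # starting position + block size
print(blocks, blocks * bits_per_block)   # 1564 blocks, 21896 bits
```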
We have recognised that quad-tree decomposition provides a relatively economical and effective solution to the problem of object segmentation for the generalized block matching. A preferred way to combine the generalized block matching with quad-tree segmentation is to perform disparity estimation in two stages. In the first stage the translational component of the disparity vector is estimated by applying conventional full search block matching to the variable sized blocks that result from the quad-tree segmentation.
After the estimation of the translational component, an attempt to compensate for other non-linear deformations is made. For that, each corner of a block is independently displaced according to a predetermined pattern (i.e. 9 search positions as shown in Fig.4). As noted earlier, there are 6561 candidate quadrilaterals for each block.
Assuming that 3 bits are required to describe the displacement of each corner, there are 12 bits per block to describe the mapping parameters to the decoder. This relatively high figure represents the upper limit for the disparity overhead. Adoption of a more sophisticated coding scheme than the simple PCM used here can further reduce this overhead.
An alternative way to reduce the overhead information is to impose a discrimination process which compensates only those blocks that exhibit considerable improvement after the application of the generalized block matching. Details of such an implementation are described below.
The two-stage algorithm described above is applied to blocks of differing sizes, which has the advantage of compensating more accurately those parts of the scene having larger disparity values due to the smaller size of the blocks allocated to those areas of the picture.
On the other hand, large blocks are assigned to low disparity areas which are usually successfully compensated with only the translational component of the disparity vector. This is in accordance with the characteristics of the Human Visual System (HVS) regarding the depth resolution required for an accurate perception of 3D from stereo-pair images. Hence, the size of each block also gives a cue as to whether the application of the computationally expensive generalized block matching is necessary or not.
As an example of the application of the two-stage algorithm described above, Fig.7 shows the absolute difference between two members of a stereo pair of images of an engine component (for which the quad-tree segmentation is shown in Fig.6), magnified by a factor of 5.
Original monochrome pictures of 512x512 pixels were each quantized to 256 grey levels. Based on this differential signal and starting with pixel blocks of 32x32 pixels, the quad-tree segmentation of Fig.6 was obtained. Then disparity estimation based on generalized block matching was applied on each block. For the first stage of the algorithm the conventional full search block matching was applied over a 65x9 pixel search window. This resulted in 585 possible candidates for each block. The best matched block was then deformed in the second stage according to the 9-position pattern.
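A sketch of the first, translational stage over such a window is given below in Python, assuming a 65x9 window of +/-32 pixels horizontally and +/-4 pixels vertically and the mean-square-error criterion; the exact window limits and the names used are assumptions made for the example.

```python
import numpy as np

def full_search(right_block, left_image, y, x, dx_range=32, dy_range=4):
    """Exhaustive translational search for the block of the right-eye
    image whose top-left corner is at (y, x).  The horizontal range is
    much larger than the vertical one because of the horizontal camera
    baseline; the defaults give 65 x 9 = 585 candidate positions."""
    size = right_block.shape[0]
    ref = right_block.astype(float)
    best_err, best_vec = np.inf, (0, 0)
    for dy in range(-dy_range, dy_range + 1):
        for dx in range(-dx_range, dx_range + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= left_image.shape[0] - size and 0 <= xx <= left_image.shape[1] - size:
                cand = left_image[yy:yy + size, xx:xx + size].astype(float)
                err = np.mean((ref - cand) ** 2)
                if err < best_err:
                    best_err, best_vec = err, (dy, dx)
    return best_vec, best_err
```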
In order to reduce the overhead information, a classification can be introduced based on the mean absolute difference (MAD). According to this, the mapping parameters are transmitted only if the generalized block matching reduces the MAD value of a block to a value lower than two-thirds of that resulting from the first stage (the conventional full search). In the above example, this discrimination process reduced the number of coded blocks from 1564 to 1202. The overhead information for the description of mapping parameters then became 1202 x 12 = 14424 bits and that for the quad-tree description became 1202 x 14 = 16828 bits.
The resulting disparity compensated picture was used as a predictor in a simple DPCM/DCT coder which encodes the right-eye picture. The coder takes the difference between the input right-eye picture and the disparity compensated left-eye picture (predictor) and divides the resulting signal into non-overlapping 8x8 pixel blocks. Each block is then 2D discrete-cosine transformed (DCT) and quantized with a fixed quantization step size. As can be seen from Table 1 below, the number of bits required to code the right-eye picture using the method of the present invention in the example was 325993 bits in total. This compares very favourably with the 386467 bits required to code the same picture using a simpler prediction formed by the conventional full search block matching with fixed sized blocks (16x16 pixels) and utilizing the same search window of 65x9 pixels.
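A simplified Python sketch of such a residual coder is given below, using SciPy's DCT; the fixed quantisation step size shown is an assumption, as the description does not specify its value.

```python
import numpy as np
from scipy.fftpack import dct

def code_residual(right_image, predictor, step=16.0, block=8):
    """Form the prediction error between the right-eye picture and the
    disparity compensated predictor, split it into non-overlapping 8x8
    blocks, apply a 2-D DCT to each block and quantise the coefficients
    with a fixed step size."""
    residual = right_image.astype(float) - predictor.astype(float)
    h, w = residual.shape
    coeffs = np.zeros_like(residual)
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            b = residual[y:y + block, x:x + block]
            c = dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
            coeffs[y:y + block, x:x + block] = np.round(c / step)
    return coeffs
```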
We have found that the generalized block matching using quad-tree decomposition results in a lower bit-rate than conventional block matching methods because it produces a better prediction. In both methods the match is judged on a correlation or distortion feature of the two blocks, and illumination is considered uniform in both the spatial and temporal domain. To satisfy the uniformity assumptions, relatively small pixel blocks (such as 8x8 or 16x16) are used in practice.
Although stereo disparity and dynamic movement analysis are generally closely related, there are several special characteristics typical of each process. Firstly, in stereo images there is a fixed geometric relationship between the two stereo views which can be used to assist the disparity estimation. The best matched blocks, for example, are very likely to be found in the same scan line in the two images due to the horizontal arrangement of the two cameras. Theoretically, this would cause the search window to collapse into a single line, although in reality a search window with a horizontal dimension many times larger than the vertical one is used, such as 126x9 pixels, to account for small misalignments, tilting, noise etc.
Another important difference is related to the amplitude of the encountered displacements. In motion estimation, only a limited number of objects is moving in a scene and their displacement is not very large (e.g. +/- 16 pixels for a CCIR frame). In contrast, the images of a stereo pair have a geometric disparity throughout the picture and the displacements are commonly very large. Moreover, in motion estimation, large interframe differences are mainly caused by fast moving objects whereas in disparity estimation they are due to objects close to the cameras.
From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the design, manufacture and use of image transmission and storage systems, display apparatuses and component parts thereof and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly or any generalisation thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of features during the prosecution of the present application or of any further application derived therefrom.
Table 1:
                              Without disparity   Conventional full search   Generalized block matching
                              compensation        (fixed size blocks)        (variable size blocks)
Total number of blocks        ---                 1024                       1564
Number of coded blocks        ---                 1024                       1202
Prediction quality (SNR)      21.64 dB            30.31 dB                   35.88 dB
Bits to code right image      463072              376227                     294741
Overhead                      0                   10240                      14424 + 16828
Total bits                    463072              386467                     325993

Claims (10)

  1. A method for disparity encoding of a first two-dimensional image in relation to a second two-dimensional image, in which each of the first and second images is divided into a plurality of regular non-overlapping blocks and, for each block of the first image: a) comparing the block with those blocks at and near the corresponding position in the second image and selecting that providing the best match; b) applying a predetermined series of distortions to the selected block of the second image, comparing the result of each distortion with the first image block, and identifying the applied distortion providing the best match to the first image block; and c) storing the location of the selected second image block and applied distortion, characterised in that the first and second images are initially divided into blocks of a first size, each block of the first image is compared with the correspondingly positioned block of the second image in accordance with predetermined matching criteria and, if the criteria are not met, the first and second image blocks are divided into corresponding sub-blocks and each sub-block compared according to the same criteria, the step of dividing into sub-blocks and comparing being repeated until the criteria are satisfied, and steps a), b) and c) are then performed for each block or sub-block meeting the criteria.
  2. A method according to Claim 1, in which the step of dividing a block into sub-blocks, or a sub-block into further sub-blocks, comprises dividing the block or sub-block into four equal portions.
  3. A method according to Claim 1 or Claim 2, in which the step of applying a predetermined series of distortions comprises sequentially moving each corner of the block to a number of different positions about its original position.
  4. A method according to Claim 3, wherein the block is made up of a number of pixels and each corner of the block is moved through nine positions each of which is spaced from the original position by n pixels in the horizontal, vertical or diagonal direction, where n is an integer.
  5. A method according to any of Claims 1 to 4, comprising the further step, following identification of the applied distortion providing the best match, of comparing the mean absolute difference between the first image block and the undistorted second image block and between the first image block and the distorted second image block and, where the ratio between the two mean absolute differences is below a predetermined value, omitting the applied distortion from the stored information.
  6. A method according to Claim 5, in which the applied distortion is omitted when the mean absolute difference between the first image and undistorted second image blocks is less than 1.5 times greater than that between the first image and distorted second image blocks.
  7. A method according to any preceding Claim, in which the said predetermined matching criteria determining whether a block or sub-block is to be divided are assumed to be satisfied when the block or sub-block under consideration is of a predetermined minimum size.
  8. Disparity encoding apparatus operable to receive first and second two-dimensional images and to encode the first image in terms of its disparity with respect to the second image, the apparatus comprising: image receiving means arranged to receive the first and second images and to divide each into a plurality of corresponding non-overlapping blocks of a first size; first comparison means operable to compare each block of the first image with the corresponding block of the second image and a plurality of blocks surrounding the said corresponding block in accordance with predetermined matching criteria and, where the criteria are not met, operable to indicate so to the image receiving means, the image receiving means thereafter operating to divide the block failing to meet the predetermined matching criteria together with the corresponding block of the second image into a plurality of sub-blocks, with the first comparison means being arranged to then repeat the comparison for each sub-block in accordance with the predetermined matching criteria; image modulation means operable to apply, for each first image block or sub-block, a predetermined series of distortions to the respective selected second image block or sub-block; second comparison means arranged to compare each of the series of distorted versions of the selected second image block or sub-block to the respective first image block or sub-block, in accordance with a second predetermined matching criteria; and storage means connected to the first and second comparison means and operable to receive and store, for each block or sub-block of the first image, information identifying the block of the second image meeting the predetermined matching criteria, and the applied distortion best satisfying the second predetermined matching criteria.
  9. A method for disparity encoding substantially as hereinbefore described with reference to the accompanying drawings.
  10. Disparity encoding apparatus substantially as hereinbefore described with reference to the accompanying drawings.
GB9326582A 1993-12-31 1993-12-31 Disparity coding images for bandwidth reduction Withdrawn GB2285359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB9326582A GB2285359A (en) 1993-12-31 1993-12-31 Disparity coding images for bandwidth reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB9326582A GB2285359A (en) 1993-12-31 1993-12-31 Disparity coding images for bandwidth reduction

Publications (2)

Publication Number Publication Date
GB9326582D0 GB9326582D0 (en) 1994-03-02
GB2285359A true GB2285359A (en) 1995-07-05

Family

ID=10747315

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9326582A Withdrawn GB2285359A (en) 1993-12-31 1993-12-31 Disparity coding images for bandwidth reduction

Country Status (1)

Country Link
GB (1) GB2285359A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0005918B1 (en) * 1979-05-09 1983-05-04 Hughes Aircraft Company Scene tracker system
EP0254643A2 (en) * 1986-07-22 1988-01-27 Schlumberger Technologies, Inc. System for expedited computation of laplacian and gaussian filters and correlation of their outputs for image processing
GB2198310A (en) * 1986-11-06 1988-06-08 British Broadcasting Corp 3d video transmission
EP0353644A2 (en) * 1988-08-04 1990-02-07 Schlumberger Technologies Inc Configurable correlation windows for the direct measurement of differential field distortion

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999010841A1 (en) * 1997-08-26 1999-03-04 HEINRICH-HERTZ-INSTITUT FüR NACHRICHTENTECHNIK BERLIN GMBH Device for the view-adaptable synthesis of object images
EP1416738A3 (en) * 2002-11-04 2004-09-01 Samsung Electronics Co., Ltd. Adaptive DCT/IDCT apparatus based on energy and method for controlling the same
US7142598B2 (en) 2002-11-04 2006-11-28 Samsung Electronics Co., Ltd. Adaptive DCT/IDCT apparatus based on energy and method for controlling the same
WO2005011284A1 (en) * 2003-07-24 2005-02-03 Eastman Kodak Company Foveated video coding and transcoding system and method for mono or stereoscopic images
DE102004049163B3 (en) * 2004-10-08 2005-12-29 Siemens Ag Method for determining a complete depth information of a camera image
US8325196B2 (en) 2006-05-09 2012-12-04 Koninklijke Philips Electronics N.V. Up-scaling
US10129522B2 (en) 2011-11-28 2018-11-13 Thomson Licensing Processing device for the generation of 3D content versions and associated device for obtaining content

Also Published As

Publication number Publication date
GB9326582D0 (en) 1994-03-02

Similar Documents

Publication Publication Date Title
US20230308638A1 (en) Encoding method and device therefor, and decoding method and device therefor
EP3571839B1 (en) Video encoder and decoder for predictive partitioning
Merkle et al. The effects of multiview depth video compression on multiview rendering
US9036933B2 (en) Image encoding method and apparatus, image decoding method and apparatus, and programs therefor
US6144701A (en) Stereoscopic video coding and decoding apparatus and method
Shen et al. Efficient intra mode selection for depth-map coding utilizing spatiotemporal, inter-component and inter-view correlations in 3D-HEVC
US20230056139A1 (en) Encoding method based on encoding order change and device therefor, and decoding method based on encoding order change and device thprefor
KR20220162859A (en) Adaptive partition coding
Moellenhoff et al. Transform coding of stereo image residuals
EP2624566A1 (en) Method and device for encoding images, method and device for decoding images, and programs therefor
Ahmmed et al. Dynamic point cloud geometry compression using cuboid based commonality modeling framework
Shahriyar et al. Depth sequence coding with hierarchical partitioning and spatial-domain quantization
Aydinoglu et al. Compression of multi-view images
GB2285359A (en) Disparity coding images for bandwidth reduction
US5990956A (en) Method and apparatus for padding a video signal for shape adaptive transformation
Aydinoglu et al. Region-based stereo image coding
Moellenhoff et al. DCT transform coding of stereo images for multimedia applications
Seo et al. A least-squares-based 2-D filtering scheme for stereo image compression
US6968009B1 (en) System and method of finding motion vectors in MPEG-2 video using motion estimation algorithm which employs scaled frames
Ahmmed et al. Dynamic mesh commonality modeling using the cuboidal partitioning
Etoh et al. Template-based video coding with opacity representation
Chou et al. Video coding algorithm based on image warping and nonrectangular DCT coding
Jiang et al. Efficient block matching for Ray-Space predictive coding in Free-Viewpoint television systems
Takano et al. 3D space coding using virtual object surface
Chien et al. Fast disparity estimation algorithm for mesh-based stereo image/video compression with two-stage hybrid approach

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)