WO2014154574A1

WO2014154574A1 - Method, apparatus and computer program for image processing

Info

Publication number: WO2014154574A1
Application number: PCT/EP2014/055687
Authority: WO
Inventors: Piergiorgio Sartor; Francesco Michielin
Original assignee: Sony Corporation; Sony Deutschland Gmbh
Priority date: 2013-03-25
Filing date: 2014-03-21
Publication date: 2014-10-02

Abstract

Method for image processing, in particular for motion and/or disparity estimation, comprising: providing a set of temporal or spatial related picture frames containing correlated blocks, transforming the picture frames by binary compressing the picture frames at least partially to provide binary pixel blocks of the pixel frames, correlating the binary pixel blocks, and determining the correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.

Description

METHOD, APPARATUS AND COMPUTER PROGRAM FOR IMAGE PROCESSING

BACKGROUND

Field of the Disclosure

[0001] The present disclosure relates to a method for image processing. The invention also relates to an apparatus for image processing, a computer program as well as a non-transitory computer-readable recording medium.

Description of Related Art

[0002] There is an increased demand for 2D/3D and multiple view applications, all of them requiring image processing, like motion estimation, disparity estimation or picture frame interpolation. In the art several methods for estimating motion and disparity are known. Most of them are working independently of each other and do not use spatial and temporal information between multiple picture frames captured by e.g. two or more cameras. The same applies to frame interpolation, i.e. creating new frames based on other frames and estimation of vector information. In order to provide the motion estimation, the disparity estimation or the picture frame interpolation it is necessary to identify correlated pixel blocks in different image frames.

[0003] Therefore there is a demand for providing a reliable and effective image processing to identify correlated pixel blocks in different image frames.

[0004] In order to determine a motion in a set of images, US 2008/0037869 A1 suggests calculating an absolute difference between pixels of a certain pixel region in the image frames and determining the correlated pixel blocks on the basis of the absolute difference between the pixels.

[0005] However, the known methods for identifying correlated pixel blocks in a set of image frames are complex, require a long processing time and a fast processing unit to calculate the respective motion vectors.

[0006] The "background" description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor(s), to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.

SUMMARY

[0007] It is an object to provide a method for image processing which achieves an improved fast motion and/or disparity estimation with low technical effort. It is a further object to provide an apparatus for image processing which achieves an improved fast motion and/or disparity estimation with low technical effort, as well as a corresponding computer program for implementing the method and a non-transitory computer-readable recording medium for implementing the method.

[0008] According to an aspect there is provided a method for image processing, in particular for motion and/or disparity estimation, comprising:

providing a set of temporal and/or spatial related picture frames containing correlated picture blocks,

transforming the picture frames by binary compressing the picture frames at least partially to provide binary pixel blocks of the pixel frames,

correlating the binary pixel blocks, and

determining the correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.

[0009] According to a further aspect there is provided an apparatus for image processing of a number of spatially and/or temporally separated picture frames, in particular for motion and/or disparity estimation, comprising:

a transformation unit adapted to compress the picture frames at least partially and to provide binary pixel blocks of the picture frames,

a correlation unit adapted to correlate the binary pixel blocks, and a detection unit for detecting correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.

[0010] According to still further aspects a computer program comprising program means for causing a computer to carry out the steps of the method disclosed herein, when said computer program is carried out on a computer, as well as a non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method disclosed herein to be performed are provided.

[0011] Preferred embodiments are defined in the dependent claims. It shall be understood that the claimed apparatus, the claimed computer program and the claimed computer-readable recording medium have similar and/or identical preferred embodiments as the claimed method and as defined in the dependent claims.

[0012] One of the aspects of the present disclosure is to implement a motion and/or a disparity estimation using multiple frames in time and/or space. In order to determine correlated pixel blocks in the temporal or spatial related picture frames, the respective picture frames are binarized at least partially to provide respective binary pixel blocks of the pixel frames. Preferably, the pixel blocks which are estimated to be correlated are binarized and compressed to binary pixel blocks in order to reduce the technical effort and the time to process those pixel blocks. After the binarization, the binary pixel blocks are correlated in order to determine the correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.

[0013] Since the picture frames are compressed by means of the binarization, the technical effort and the time for the image correlation is reduced and the correlated picture blocks can be determined with low technical effort by binary pixel block correlation.

[0014] It is to be understood that both the foregoing general description of the invention and the following detailed description are exemplary, but are not restrictive, of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

Fig. 1 shows a schematic block diagram illustrating an embodiment of the method for image processing;

Fig. 2 shows a block diagram illustrating the correlation of pixel blocks in a set of picture frames;

Fig. 3 shows picture frames and binarized picture frames for illustrating the binary correlation;

Fig. 4 shows a diagram for illustrating the cross-correlation of the binarized pixel blocks of Fig. 3; and

Fig. 5 shows a schematic block diagram of an apparatus for image processing.

DESCRIPTION OF THE EMBODIMENTS

[0016] Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views, Fig. 1 shows a schematic block diagram of an embodiment of a method for image processing which is generally denoted by 10.

[0017] A video stream comprising different picture frames is provided as indicated by 12, wherein the picture frames are spatially and/or temporally separated picture frames from which a certain motion or a certain disparity should be estimated and/or certain frames should be interpolated. In a following step as indicated by 14, the picture frames are preprocessed. The preprocessing step 14 may include filtering, scaling in order to reduce the pixel size and/or the resolution of the picture frames, contrast enhancement of the picture frames and/or frequency manipulation such as low-pass filtering or high-pass filtering which may be applied depending on the noise level of the picture frames. The so-preprocessed image frames are binarized in a further step as indicated by 16. This binarization performs a binary compression of the image frames, i.e. the input image is reduced from 8, 10 or 12 bits per pixel to one bit per pixel. The binarization may be performed by calculating an average level of the pixels of the image frame or an average level of a certain pixel block of the image frame and by comparing each pixel value of the image frame or the certain pixel block with the respective average value. Depending on the pixel value with respect to the average value, a 1 or a 0 is assigned to the pixel in order to provide the binary image frame or the binary pixel block. If the pixel value is equal to the average value, either a 1 or a 0 can be assigned to the respective pixel, i.e. the equality case is integrated into the comparison.

[0018] Alternatively each pixel value can be compared with a central value and a 1 or a 0 is assigned to the respective pixel depending on the comparison. In general, the pixel values of the image frame or of a certain block are compared with a certain value and a 1 or a 0 is assigned to the respective pixel to binarize the image frames.
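The thresholding described in paragraphs [0017] and [0018] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `binarize_block`, the list-of-rows block representation and the choice to assign a 1 to pixels equal to the threshold are all hypothetical.

```python
def binarize_block(block, threshold=None):
    """Reduce a block of multi-bit pixel values to one bit per pixel.

    block: list of rows, each a list of integer pixel values (e.g. 8 bit).
    threshold: value to compare against; if None, the block average is used.
    """
    if threshold is None:
        pixels = [p for row in block for p in row]
        threshold = sum(pixels) / len(pixels)
    # Assumption: a pixel equal to the threshold is assigned 1.
    return [[1 if p >= threshold else 0 for p in row] for row in block]


block = [
    [10, 200, 30],
    [220, 40, 210],
    [20, 230, 50],
]
binary = binarize_block(block)          # threshold = block average
fixed = binarize_block(block, 128)      # threshold = predefined value
```

Passing an explicit `threshold` corresponds to the comparison with "a certain value" in paragraph [0018]; leaving it out corresponds to the average-level variant of paragraph [0017].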

[0019] The binarized image frames or pixel blocks are blockwise cross-correlated as indicated by 18. The cross correlation is performed between a binarized source block of a first picture frame and a search block or search area of a second picture frame. The search block may be larger than the source block in order to determine the correlated pixel blocks of the two different image frames.

[0020] The search block or the search area is determined on the basis of an estimation process in order to reduce the search effort to find the correlated pixel blocks. The estimation process selects from a set of vectors a plurality of candidate vectors which are assigned to the source block and point to the search area, where the pixel block correlated to the source block is estimated. The estimation may be a motion estimation process.

[0021] The cross-correlation 18 provides a cross-correlation value by multiplying each pixel of the source block with a corresponding pixel of the search area. The sum of the multiplication results of the pixels forms the cross-correlation result, wherein a high value indicates that most of the compared pixels have an identical value (0 or 1) and have a high degree of correlation. The cross correlation is performed for different positions of the source block in the search area so that a matching can be found if a correlated pixel block is present in the search area. If the correlated pixel blocks are found, a respective vector from the source block to the correlated pixel block in the search area is determined and the determined vector is provided as an output vector as indicated by 20. Further, the so-determined vector is fed back to the cross-correlation 18 in order to select the candidate vectors from the field of vectors for a cross correlation of the next picture frame. The step of feeding the output vector back to the cross correlation in order to select candidate vectors is called predictor setup and is denoted by 22.

[0022] The multiplication of the pixels of the binary pixel blocks may be implemented as a simple XNOR function. Alternatively, the cross-correlation may be implemented as an XOR function, wherein in that case the lowest value of the cross-correlation result indicates that most of the compared pixels have an identical value (0 or 1) and have a high degree of correlation.
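As an illustration of why the one-bit representation makes the correlation cheap, the XNOR and XOR scores can be sketched on bit-packed blocks. This is a sketch under assumptions: the helper names `pack_bits`, `xor_score` and `xnor_score` and the packing of a whole block into one Python integer are hypothetical and not taken from the disclosure.

```python
def pack_bits(binary_block):
    """Pack a binary block (rows of 0/1 values) into a single integer."""
    value = 0
    for row in binary_block:
        for bit in row:
            value = (value << 1) | bit
    return value


def xor_score(a, b):
    """Number of differing pixels: a LOW value means high correlation."""
    return bin(a ^ b).count("1")


def xnor_score(a, b, n_pixels):
    """Number of identical pixels: a HIGH value means high correlation."""
    return n_pixels - xor_score(a, b)


source = pack_bits([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
candidate = pack_bits([[0, 1, 0], [1, 1, 1], [0, 1, 0]])  # one pixel differs

differing = xor_score(source, candidate)
identical = xnor_score(source, candidate, n_pixels=9)
```

The two scores always add up to the number of pixels, so either one ranks candidate positions equivalently. If the binary values 0 and 1 are mapped to -1 and +1, the sum of the pixel products equals `identical - differing`, which is one way to read the multiplication variant as equivalent to the XNOR variant.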

[0023] The method 10 is usually carried out as 3D or 2D recursive motion estimation, wherein the highest correlation value is defined as a match of the correlated pixel blocks.

[0024] Fig. 2 shows an example of two picture frames, which are in Fig. 1 indicated with the reference numerals 24, 26. The picture frames 24, 26 are captured e.g. by two cameras as left and right frames for three-dimensional imaging or as two time instances t, t+1 of a sequence of consecutive images captured by one camera. As generally known, in a digital electronics environment, a frame is built up of pixels, each pixel carrying information for example on colour etc. In Fig. 1 the picture frames 24, 26 are built of an array of pixel blocks 28 each comprising an array of pixels.

[0025] In Fig. 2 a source pixel block 30 is schematically shown in a first position within the picture frame 24. In the second picture frame 26, the pixel block corresponding to the source pixel block 30 is located at a different position, wherein the difference between both positions characterizes a movement of the respective pattern or the disparity between both corresponding pixel blocks. The present method is provided to identify the position of the pixel block corresponding to the source pixel block 30 in the second picture frame 26 in order to determine the disparity between both corresponding pixel blocks and/or the movement of the respective corresponding pixel blocks from one picture frame 24 to the other picture frame 26.

[0026] In order to identify the pixel block corresponding to the source pixel block 30 in the second picture frame 26, a plurality of estimation vectors 32, 34, 36, 38, 40 are determined or selected from a field of vectors estimating a movement or a disparity of the corresponding pixel block and pointing from the position of the source pixel block 30 to an estimated position 42, 44, 46, 48, 50 of the corresponding pixel block in the second picture frame 26. It should be noted that the estimated positions 42-50 are coarse estimations of the position of the corresponding pixel block, estimated e.g. on the basis of previously determined motion vectors and/or disparity vectors, and that the correct position of the corresponding pixel block in the second picture frame 26 has to be determined or verified by a following image processing step.

[0027] Departing from the estimated positions 42-50, a search area 52, 54, 56, 58, 60 or a search pixel block 52-60 is determined for each of the estimated positions 42-50. The search areas 52-60 are usually larger than the source block 30 and usually surround the estimated positions 42-50. In order to identify the correct position of the corresponding pixel block, the source pixel block 30 and the pixel blocks of the search areas 52-60 are binarized to binary pixel blocks and compared by means of a cross-correlation. Since the search areas 52-60 are larger than the source pixel block 30, the source pixel block 30 is successively compared to different positions in the search areas 52-60 and a cross-correlation value is determined for each position by multiplying the binary pixel blocks of the source block 30 and the search area 52-60 or by means of an XNOR operator, wherein the different values of the pixel by pixel comparison are added to determine the cross-correlation value. Hence, the best matching of the source block 30 and the search areas 52-60 leads to the highest cross-correlation value and, therefore, the pixel block in the search area 52-60 corresponding to the source pixel block 30 can be determined in a fast processing with a high reliability. If the alternative XOR function is used for the cross-correlation of the source block 30 and the search area 52-60 and the cross-correlation value is correspondingly determined by adding the results of each of the XOR operations, the best matching of the source block 30 and the search areas 52-60 leads to the lowest cross-correlation value. Hence, the pixel block in the search area 52-60 corresponding to the source pixel block 30 can also be determined on the basis of the XOR operation with a high reliability. The pixel block corresponding to the source pixel block 30 determined in the second picture frame 26 by means of the cross-correlation is indicated as an example by reference numeral 62.
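The successive comparison of the source block with different positions in a larger search area can be sketched as an exhaustive sliding search. This is only an illustrative sketch: the function names `xnor_count` and `best_match`, the list-of-rows representation and the tie-breaking rule (first position found wins) are assumptions, and the disclosure does not prescribe a particular scan order.

```python
def xnor_count(source, area, top, left):
    """Count identical pixels with the source placed at (top, left) in area."""
    return sum(
        1
        for y, row in enumerate(source)
        for x, bit in enumerate(row)
        if bit == area[top + y][left + x]
    )


def best_match(source, area):
    """Slide the binary source block over the binary search area.

    Returns (score, top, left) for the position with the highest XNOR count.
    Assumption: ties are resolved in favour of the first position found.
    """
    src_h, src_w = len(source), len(source[0])
    best = None
    for top in range(len(area) - src_h + 1):
        for left in range(len(area[0]) - src_w + 1):
            score = xnor_count(source, area, top, left)
            if best is None or score > best[0]:
                best = (score, top, left)
    return best


source = [[1, 0],
          [0, 1]]
area = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
score, top, left = best_match(source, area)
```

In this toy example the source pattern appears exactly once in the search area, at row 1 and column 1, where all four pixels agree.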

[0028] On the basis of the so-determined corresponding pixel block 62, a motion vector or disparity vector 64 pointing from the source pixel block 30 to the corresponding pixel block 62 in the second picture frame 26 is determined and provided as the image processing result. The so-determined output vector 64 is used to determine the estimated vectors 32-40, e.g. by selecting the most probable vectors from a field of estimated vectors 42-50 as candidate vectors for the next search and cross-correlation step in the following picture frame. In other words the determination of the corresponding pixel block 62 is based on the results of the previous image processing steps or the analysis of the previous picture frames and fed-back as mentioned above by means of the predictor set-up 22.
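The feedback of the output vector into the candidate selection can be sketched as follows. This is a hedged illustration only: the disclosure does not specify how candidate vectors are derived from the previous output vector, so `next_candidates` (previous vector, its four neighbours and the zero vector) is a hypothetical rule, and for brevity each candidate is tested only at its exact estimated position rather than within a surrounding search area.

```python
def xor_cost(source, frame, top, left):
    """Number of differing pixels between source and frame at (top, left)."""
    return sum(
        bit ^ frame[top + y][left + x]
        for y, row in enumerate(source)
        for x, bit in enumerate(row)
    )


def select_vector(source, src_pos, frame, candidates):
    """Evaluate each candidate vector (dx, dy) and return the best one.

    src_pos is the (x, y) position of the source block in the first frame.
    Candidates pointing outside the second frame are skipped.
    """
    src_h, src_w = len(source), len(source[0])
    best_vec, best_cost = None, None
    for dx, dy in candidates:
        left, top = src_pos[0] + dx, src_pos[1] + dy
        inside = (0 <= top <= len(frame) - src_h
                  and 0 <= left <= len(frame[0]) - src_w)
        if not inside:
            continue
        cost = xor_cost(source, frame, top, left)
        if best_cost is None or cost < best_cost:
            best_vec, best_cost = (dx, dy), cost
    return best_vec


def next_candidates(output_vector):
    """Hypothetical predictor set-up: reuse the previous output vector,
    its four direct neighbours and the zero vector as new candidates."""
    dx, dy = output_vector
    return [(dx, dy), (dx + 1, dy), (dx - 1, dy),
            (dx, dy + 1), (dx, dy - 1), (0, 0)]


source = [[1, 0],
          [0, 1]]
second_frame = [[0, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0]]
output = select_vector(source, (0, 0), second_frame, [(0, 0), (1, 0), (1, 1)])
candidates_for_next_frame = next_candidates(output)
```

The point of the sketch is the loop structure: the vector found for one frame pair seeds the candidate list for the next, so only a handful of positions need to be tested per block.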

[0029] In a preferred embodiment, the size of the search areas 52-60 is variable and is changed on the basis of a reliability of the estimation of the estimation vectors 32-40. If the reliability of the estimation vectors 32-40 is high so that the estimated positions 42-50 are expected to be very close to the correlated pixel block 62 to be determined, the size of the search area 52-60 is reduced. If the reliability of the estimation of the estimation vectors 32-40 is low, the search area 52-60 is increased so that the correlated pixel block 62 can be found within the larger search areas 52-60 with a high probability.

[0030] The reliability of the estimation vectors 32-40 can be determined on the basis of an amount of the estimated vectors 32-40, wherein a large amount of estimation vectors 32-40 or candidate vectors 32-40 indicates that the estimation of the estimated positions 42-50 has a low reliability so that the size of the search areas 52-60 should be increased in order to find the correlated pixel block 62 with a higher probability. Alternatively, the reliability estimation can be based on a uniformity of the estimation vectors 32-40, wherein the estimation is considered to have a poor reliability if the vector candidates 32-40 point in many different directions, i.e. have a low uniformity, and to have a high reliability if the estimation vectors 32-40 point in one direction, i.e. have a high uniformity. If the uniformity of the candidate vectors 32-40 is high, the size of the search area 52-60 can be reduced. Additionally, the reliability estimation can be based on a correlation result of the binary pixel blocks of different picture frames determined in a previous processing step. In other words, if a correlation value of the binary pixel blocks is high for the XNOR operation, the correlated pixel block 62 is determined with a high reliability. Alternatively, if the XOR operation is used, the correlated pixel block 62 is determined with a high reliability if the correlation value of the binary pixel blocks is low. If the output vector 64 of the previous processing step which is used to determine the candidate vectors or the estimation vectors has a high reliability, the estimation of the candidate vectors also has a high reliability. In that case, the size of the search area can be reduced since the reliability of the estimation vector is considered to be high.
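One way to turn the uniformity criterion into a search-area size is sketched below. The disclosure does not define how uniformity is measured, so everything here is an assumption made for illustration: `vector_uniformity` uses the length of the mean unit vector as a hypothetical uniformity measure, and `search_margin` with its `min_margin` and `max_margin` defaults is an arbitrary linear mapping.

```python
import math


def vector_uniformity(vectors):
    """Length of the mean unit vector of the candidates.

    1.0 if all candidates point the same way, close to 0.0 if they point in
    many different directions. Assumption: zero-length vectors are ignored.
    """
    units = []
    for dx, dy in vectors:
        norm = math.hypot(dx, dy)
        if norm > 0:
            units.append((dx / norm, dy / norm))
    if not units:
        return 0.0
    mean_x = sum(u[0] for u in units) / len(units)
    mean_y = sum(u[1] for u in units) / len(units)
    return math.hypot(mean_x, mean_y)


def search_margin(vectors, min_margin=2, max_margin=16):
    """Map uniformity to a margin in pixels around the estimated position:
    high uniformity gives a small margin, low uniformity a large one."""
    u = vector_uniformity(vectors)
    return round(max_margin - u * (max_margin - min_margin))


aligned = [(3, 0), (5, 0), (1, 0)]               # all point in +X
scattered = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # four opposing directions

small = search_margin(aligned)
large = search_margin(scattered)
```

The same mapping could be applied separately to the X and Y components of the candidate vectors if the search area is to be adapted independently per direction.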

[0031] The plurality of estimation vectors 32-40 are usually selected from a field of estimation vectors and are selected on the basis of a motion and/or disparity relation of the different picture frames. The estimated vectors 32-40 can be selected from the field of estimation vectors on the basis of a general movement of the image pattern within the picture frames. In other words, if a general movement e.g. of the background of the image is determined in the picture frames, the estimated vectors 32-40 are selected on the basis of this detected general movement of the image pattern. The general movement can also be determined on the basis of a movement sensor within the used camera which determines the movement of the camera independently and wherein the general movement of the image pattern within the picture frames is estimated on the basis of the measurement of the motion sensor. Further, the estimation vectors 32-40 can be determined on the basis of scaled picture frames of the set of picture frames, wherein the pixel size of the scaled picture frames is reduced in order to provide a fast movement detection within the scaled picture frame so that the coarse position of the correlated pixel block 62 can be determined quickly on the basis of a coarse picture calculation.

[0032] In a further embodiment, the size of the search area 52-60 can be adapted independently in X-direction and in Y-direction depending on the estimation vectors 32-40.

[0033] Fig. 3 shows two picture frames as captured and two corresponding binarized picture frames and the respective source blocks for illustrating the cross-correlation process 18 in order to determine the corresponding pixel block 62. Fig. 3a shows a first picture frame 66 corresponding to the first picture frame 24 and a second picture frame 68 corresponding to the second picture frame 26 shown in Fig. 2. Further, a source block 70 as part of the first picture frame 66 is shown corresponding to the source block 30 shown in Fig. 2. The picture frames 66, 68 and the source block 70 shown in Fig. 3a are monochrome or coloured pictures as captured by the respective camera.

[0034] Fig. 3b shows a first binarized picture frame 72 and a second binarized picture frame 74 corresponding to the first picture frame 66 and the second picture frame 68 shown in Fig. 3a. Further, a binarized source block 76 corresponding to the source block 70 shown in Fig. 3a is shown in Fig. 3b.

[0035] The binarization of the picture frames 66, 68 is performed as described above, wherein the pixel value of each pixel of the picture frames 66, 68 is compared to a certain pixel threshold level and depending on the comparison result a 0 or a 1 is assigned as a binary pixel value to the respective pixel in order to provide the binarized picture frames 72, 74. The threshold value can be predefined or can be determined as a medium value of all pixels of the picture frame 66, 68 or a certain block or area of the picture frames 66, 68. On the basis of the binarized first picture frame 72, the binarized source block 76 is determined in order to determine the respective corresponding pixel block 62 in the binarized second picture frame 74. Alternatively, merely the source block 70 extracted from the captured first picture frame 66 can be binarized in order to reduce the image processing effort.

[0036] In the binarized second picture frame 74, the search area 52-60 is determined as mentioned above and the binarized source block 76 is correlated to the search area 52-60 of the binarized second picture frame 74. The correlation is performed by cross-correlation as mentioned above, wherein a pixel by pixel XNOR (or alternatively XOR) operation or a multiplication of the pixel values is performed for different search positions of the binarized source block 76 within the search area 52-60. The results of the pixel by pixel XNOR (or XOR) or multiplication operation are added in order to determine a sum as a single cross-correlation value. The position of the binarized source block 76 within the search area 52-60 providing the highest cross-correlation value is defined as a matching position in the case of the XNOR operation and is determined as corresponding pixel block 62 which corresponds to the source pixel block 70. Alternatively, in the case of the XOR operation, the position of the binarized source block 76 within the search area 52-60 providing the lowest cross-correlation value is defined as the matching position.

[0037] Fig. 4 shows a map of cross-correlation values determined for different positions of the binarized source block 76 within the search area 52-60 by means of an XNOR operation. The cross-correlation value is generally denoted by A. The map shown in Fig. 4 shows different values depending on the X-position and the Y-position of the binarized source block 76 within the search area 52-60. The cross-correlation value map shows different low peaks and a single high peak, which is generally denoted by 78. The high peak 78 is obviously the highest cross-correlation value A for the search shown in Fig. 4 so that this respective X-position and Y-position is defined as the position of the corresponding pixel block 62 for this XNOR operation. In the alternative case of the use of the XOR operation for the cross-correlation, the cross-correlation value map comprises different high peaks and a single low peak. In that case, the low peak indicates the best correlation for the search so that this respective X-position and Y-position is defined as the matching position of the corresponding pixel block 62. Due to the cross-correlation and the so-determined cross-correlation value A, a fast matching can be achieved since the cross correlation in general provides a fast convergence of the calculation process.
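The relationship between the XNOR map with its single high peak and the XOR map with its single low peak can be illustrated by computing both maps for the same search. This is a sketch on toy data, not a reproduction of Fig. 4: the function names `correlation_maps` and `peak` and the example blocks are hypothetical.

```python
def correlation_maps(source, area):
    """Return (xnor_map, xor_map), indexed as map[y][x] per search position."""
    src_h, src_w = len(source), len(source[0])
    n_pixels = src_h * src_w
    xnor_map, xor_map = [], []
    for top in range(len(area) - src_h + 1):
        xnor_row, xor_row = [], []
        for left in range(len(area[0]) - src_w + 1):
            differing = sum(
                source[y][x] ^ area[top + y][left + x]
                for y in range(src_h)
                for x in range(src_w)
            )
            xor_row.append(differing)
            xnor_row.append(n_pixels - differing)
        xnor_map.append(xnor_row)
        xor_map.append(xor_row)
    return xnor_map, xor_map


def peak(value_map, find_max=True):
    """Return the (x, y) position of the highest (or lowest) map entry."""
    pick = max if find_max else min
    positions = [(x, y)
                 for y, row in enumerate(value_map)
                 for x in range(len(row))]
    return pick(positions, key=lambda p: value_map[p[1]][p[0]])


source = [[1, 0],
          [0, 1]]
area = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
xnor_map, xor_map = correlation_maps(source, area)
high_peak = peak(xnor_map, find_max=True)    # best match for XNOR
low_peak = peak(xor_map, find_max=False)     # best match for XOR
```

Because each XOR entry is the pixel count minus the corresponding XNOR entry, the high peak of one map and the low peak of the other always fall on the same X and Y position.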

[0038] In an alternative embodiment of the present invention, the correlation of the binarized source block 76 and the binarized second picture frame 74 is performed by phase correlation, wherein edges in the binarized source block 76 and the binarized second picture frame 74 are correlated in order to identify the correlated pixel block 62.

[0039] Fig. 5 shows a schematic block diagram of an apparatus for image processing on the basis of the image processing method 10 described above. The apparatus is generally denoted by 80.

[0040] As an input, the apparatus 80 receives a plurality of image frames 66, 68 from an imaging device 82 such as a camera. The picture frames 66, 68 are preprocessed by means of a pre-processing unit 84 which comprises a filter unit, a scaling unit and/or a contrast-enhancement unit for performing the pre-processing step 14 shown in Fig. 1. The pre-processed picture frames 86 are provided to a binarization unit 88. The binarization unit 88 performs a binary compression of the pre-processed picture frames 86 as mentioned above and provides the binarized picture frames 72, 74 and the binarized source block 76 to a cross-correlation unit 90. The cross-correlation unit 90 performs the cross-correlation step 18 as mentioned above and provides the cross-correlation value A as a result of the cross-correlation 18 to an evaluation unit 92, which evaluates the position corresponding to the highest peak 78 for an XNOR operator (or the lowest peak for an XOR operator) of the cross-correlation value A and determines, as a result of the image processing, the output vector 64 pointing from the source block 30 to the correlated pixel block 62. The output vector 64 is also fed back to the cross-correlation unit 90 in order to determine the estimation vectors on the basis of the output vector. This feedback loop is called predictor setup.

[0041] The method 10 for image processing and the apparatus 80 for image processing have the advantage that a fast convergence can be provided, so that the image processing can be performed with low time effort and the technical effort to perform this method is reduced.

[0042] Obviously, numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

[0043] In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

[0044] In so far as embodiments of the invention have been described as being implemented, at least in part, by software-controlled data processing apparatus, it will be appreciated that a non-transitory machine-readable medium carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present invention. Further, such software may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

[0045] Any reference signs in the claims should not be construed as limiting the scope.

Claims

1. Method for image processing, in particular for motion and/or disparity estimation, comprising:

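providing a set of temporal or spatial related picture frames containing correlated picture blocks,

transforming the picture frames by binary compressing the picture frames at least partially to provide binary pixel blocks of the pixel frames,

correlating the binary pixel blocks, and

determining the correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.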

2. Method according to claim 1, wherein the correlating of the binary pixel blocks comprises a cross correlation of the binary pixel blocks.

3. Method as claimed in claim 1, wherein a source pixel block is defined and binarized in a first of the picture frames and a search pixel block is defined and binarized in a second of the picture frames.

4. Method as claimed in claim 3, wherein a cross-correlation function is calculated by cross-correlation of the source pixel block and the search pixel block and wherein the correlated pixel blocks are determined on the basis of an absolute value of the cross-correlation function.

5. Method as claimed in claim 3, wherein the search pixel block comprises a larger amount of pixels than the source pixel block.

6. Method as claimed in claim 1, further comprising the step of providing a plurality of vector candidates for each picture frame indicating at least one estimated motion and/or disparity relation between two of the picture frames.

7. Method as claimed in claim 6, wherein the search pixel blocks are determined on the basis of the vector candidates.

8. Method as claimed in claim 6, wherein the vector candidates are determined on the basis of a motion and/or a disparity relation of correlated pixel blocks of a plurality of picture frames.

9. Method as claimed in claim 6, wherein the vector candidates are determined on the basis of a general movement of image pattern within the picture frames.

10. Method as claimed in claim 6, wherein the vector candidates are determined on the basis of scaled picture frames of the set of picture frames having a reduced pixel size.

11. Method as claimed in claim 6, wherein a size of the search pixel block is adapted on the basis of a reliability estimation of the search pixel block determination.

12. Method as claimed in claim 11, wherein the reliability estimation is based on an amount of the vector candidates estimated for the image frame.

13. Method as claimed in claim 11, wherein the reliability estimation is based on a uniformity of the estimated vector candidates.

14. Method as claimed in claim 11, wherein the reliability estimation is based on a correlation of the binary pixel blocks of different picture frames determined in a previous processing step.

15. Method as claimed in claim 1, further comprising a step of preprocessing in advance of the transformation including filtering, scaling and/or contrast enhancement of the picture frames.

16. Method as claimed in claim 1, wherein the correlation of the binary pixel blocks comprises a phase correlation of the binary pixel blocks.

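17. Apparatus for image processing of a number of spatially and/or temporally separated picture frames, in particular for motion and/or disparity estimation, comprising:

a transformation unit adapted to compress the picture frames at least partially and to provide binary pixel blocks of the picture frames,

a correlation unit adapted to correlate the binary pixel blocks, and

a detection unit for detecting correlated picture blocks in the set of picture frames on the basis of the correlation of the binary pixel blocks.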

18. A computer program comprising program code means for causing a computer to perform the steps of said method as claimed in claim 1 when said computer program is carried out on a computer.

19. A non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method according to claim 1 to be performed.