WO2013034878A2 - Image processing - Google Patents
Image processing
- Publication number
- WO2013034878A2 (PCT/GB2012/000705)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- data
- gaussian
- visual saliency
- successively
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
Apparatus for generating a visual saliency data signal S comprises an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S. The apparatus is configured to successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter calculate a minimum ratio signal matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij); and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
Description
TITLE: IMAGE PROCESSING
DESCRIPTION
TECHNICAL FIELD
The present invention relates to methods of and apparatus for image processing, in particular the derivation of visual saliency data matrices or maps.
BACKGROUND ART
In this document, visual saliency is defined as the perceptual quality that makes a group of pixels stand out relative to its neighbours - cf. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in IEEE CVPR, 2009, pp. 1597-1604. Visual saliency forms the basis of several computer vision applications, including automatic object detection, medical imaging and robotics.
L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE PAMI, vol. 20, no. 11, pp. 1254-1259, Nov. 1998 discloses so-called "biological" models of visual saliency using a bottom-up approach for feature extraction mainly based on colour, intensity and orientation. Inspired by the structure of the human eye, this approach detects the contrast difference between an image region and its surroundings, which is also known as centre-surround contrast. Itti et al. use the Difference-of-Gaussians (DoG) filter for deriving the centre-surround contrast, whereas D. Walther and D. Koch, "Modeling attention to salient proto-objects," Neural Networks, vol. 19, no. 9, pp. 1395-1407, 2006 take this further by adopting the concept of salient proto-objects. As set out in Haonan Yu, Jia Li, Yonghong Tian, and Tiejun Huang, "Automatic interesting object extraction from images using complementary saliency maps," in Proceedings of the international conference on Multimedia, 2010, pp. 891-894, ACM, a common characteristic of these approaches is that they usually produce saliency maps that lack sharpness and detail. Furthermore, the complexity of known biological models means that performance is slow, thus they are more suitable for use in non-real-time applications.
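For illustration only, the centre-surround idea behind these biological models can be sketched as a single-scale Difference-of-Gaussians response. The snippet below is not taken from the cited works; the function name and sigma values are arbitrary choices.

```python
import cv2
import numpy as np

def dog_response(gray, sigma_centre=1.0, sigma_surround=4.0):
    # Difference-of-Gaussians: subtract a coarse (surround) blur from a fine (centre) blur.
    img = gray.astype(np.float32)
    fine = cv2.GaussianBlur(img, (0, 0), sigma_centre)      # kernel size derived from sigma
    coarse = cv2.GaussianBlur(img, (0, 0), sigma_surround)
    return np.abs(fine - coarse)                            # large where a region differs from its surround
```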
DISCLOSURE OF INVENTION
According to the present invention, there is provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij);
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
There is also provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij;
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
The methods may comprise the step of downsampling and/or upsampling using a 5 x 5 Gaussian filter. The methods may comprise the step of creating first and second Gaussian pyramids having a maximum level n=5. Where the image data comprises multiple colour channels, in particular Red, Green and Blue, the methods may be repeated separately for each channel.
There is also provided apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal and an output for a visual saliency data signal and being configured to operate in accordance with either one of the methods described above. There is also provided a system comprising the apparatus described above and having an input connected to an image sensor for generating an image data signal.
An embodiment of the invention will now be described by way of example.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
The example, which may be called a "Division of Gaussians" (DIVoG) approach, comprises three distinct steps: 1) Bottom-up construction of a Gaussian pyramid, 2) Top-down construction of a Gaussian pyramid based on the output of Step 1, 3) Element-by-element division of the input image with the output of Step 2.
Step 1: The Gaussian pyramid U comprises n levels, starting with an image U1 as the base with resolution w x h. Successively higher pyramid levels are derived via downsampling of the preceding pyramid level using a 5 x 5 Gaussian filter. The top pyramid level has a resolution of (w/2^(n-1)) x (h/2^(n-1)). This image may be called Un.
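As an illustrative sketch of Step 1 (not part of the patent text), each level can be obtained by smoothing the previous level with a 5 x 5 Gaussian filter and then discarding every second row and column; the function and variable names below are arbitrary.

```python
import cv2
import numpy as np

def build_bottom_up_pyramid(u1, n=5):
    # Returns [U1, U2, ..., Un]; each level halves the resolution of the previous one.
    levels = [u1.astype(np.float32)]
    for _ in range(n - 1):
        blurred = cv2.GaussianBlur(levels[-1], (5, 5), 0)  # 5 x 5 Gaussian filter
        levels.append(blurred[::2, ::2])                   # keep every second row and column
    return levels
```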
Step 2: Un is used as the top level, Dn, of a second Gaussian pyramid D in order to derive its base D1. In this case, lower pyramid levels are derived via upsampling using a 5 x 5 Gaussian filter.
Step 3: Element-by-element division of U1 and D1 is performed in order to derive the minimum ratio matrix M (also called the MiR matrix) of their corresponding values, as described by the following equation 1:

M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij)    (1)
The saliency map S is then given by the following equation 2, which means that saliency is expressed as a floating-point number in the range 0 to 1:

S_ij = 1 - M_ij    (2)
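The three steps and equations 1 and 2 can be sketched end-to-end as follows. This is an illustrative implementation only: it relies on OpenCV's pyrDown/pyrUp (which use a 5 x 5 Gaussian filter), assumes a single-channel image whose width and height are divisible by 2^(n-1), and substitutes a small epsilon for the minimum-pixel-value floor described later; the name divog_saliency is not from the patent.

```python
import cv2
import numpy as np

def divog_saliency(u1, n=5, eps=1e-6):
    u1 = u1.astype(np.float32) + eps          # guard against division by zero
    # Step 1: bottom-up Gaussian pyramid U1 -> Un
    un = u1.copy()
    for _ in range(n - 1):
        un = cv2.pyrDown(un)
    # Step 2: top-down Gaussian pyramid Dn (= Un) -> D1
    d1 = un
    for _ in range(n - 1):
        d1 = cv2.pyrUp(d1)
    # Step 3: minimum ratio matrix M (equation 1) and saliency map S (equation 2)
    m = np.minimum(d1 / u1, u1 / d1)
    return 1.0 - m
```

Calling divog_saliency on a greyscale float image returns a map whose values already lie in the 0 to 1 range of equation 2.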
The described approach can be further expanded to include element-by-element division of all corresponding levels of pyramids U and D. In this case, the MiR matrix is initialised as a unit matrix (i.e. for each matrix element M_0,ij = 1). Then each pair of pyramid levels Un and Dn is scaled up to the input's resolution. Then the MiR matrix M_n is multiplied by M_(n-1) as described by the DIVoG equation below, which is a generalised form of equation 1.
M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij    (3)

for n greater than or equal to 1. The saliency map is then derived using equation 2. Deriving the MiR matrix through processing of all pyramid levels produces more accurate saliency maps than equation 1, but also increases the computational complexity.
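The multi-level variant of equation 3 can be sketched in the same style; again this is illustrative only, with arbitrary names, assuming a single-channel input meeting the same divisibility condition as above, and it rescales each level pair back to the input resolution with a bilinear resize, which the patent does not prescribe.

```python
import cv2
import numpy as np

def divog_saliency_all_levels(u1, n=5, eps=1e-6):
    u1 = u1.astype(np.float32) + eps
    h, w = u1.shape

    # Bottom-up pyramid U1..Un
    u_levels = [u1]
    for _ in range(n - 1):
        u_levels.append(cv2.pyrDown(u_levels[-1]))

    # Top-down pyramid Dn..D1, reordered so d_levels[k] pairs with u_levels[k]
    d_levels = [u_levels[-1]]
    for _ in range(n - 1):
        d_levels.append(cv2.pyrUp(d_levels[-1]))
    d_levels = d_levels[::-1]

    # M0 is the unit matrix; fold each level pair into M as in equation 3
    m = np.ones((h, w), dtype=np.float32)
    for u_k, d_k in zip(u_levels, d_levels):
        u_full = cv2.resize(u_k, (w, h), interpolation=cv2.INTER_LINEAR)
        d_full = cv2.resize(d_k, (w, h), interpolation=cv2.INTER_LINEAR)
        m *= np.minimum(d_full / u_full, u_full / d_full)

    return 1.0 - m    # equation 2
```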
The above approaches are colourspace independent and can consequently derive saliency maps even from greyscale images, which significantly reduces computational cost. In the present example, all operations are performed using 32-bit floating point matrices. To avoid division by zero, or division by floating point numbers in the range 0 to 1, the minimum pixel value is defined to be equal to k^n, where k is the size of the Gaussian kernel. This ensures that pyramidal downsampling will always result in a value greater than 1. For colour images, the method can be used with any colourspace, each channel being processed separately to produce a saliency map. All the saliency maps in this example have been produced using 24-bit colour images in the RGB colourspace. In this example, the Gaussian pyramids have also been constructed with n = 5 and all saliency maps normalised to fit the 0 to 255 range.
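As an illustrative usage example for a 24-bit RGB input (building on the divog_saliency sketch above), each channel can be processed separately and the result normalised to the 0 to 255 range; averaging the per-channel maps is one simple combination choice and is not specified by the patent, and the file names are hypothetical.

```python
import cv2
import numpy as np

bgr = cv2.imread("input.png")                                  # hypothetical input file; OpenCV loads BGR
channel_maps = [divog_saliency(c) for c in cv2.split(bgr.astype(np.float32))]
s = np.mean(channel_maps, axis=0)                              # one simple way to combine channel maps
s_8bit = cv2.normalize(s, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("saliency.png", s_8bit)
```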
It should be understood that this invention has been described by way of examples only and that a wide variety of modifications can be made without departing from the scope of the invention.
Claims
1. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate a minimum ratio signal matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij); and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
2. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate, for each data level signal of the first and second Gaussian pyramids, a minimum ratio signal matrix M, where M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij; and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
3. Apparatus according to claim 1 or claim 2 and configured to successively downsample the image data signal U1 using a 5 x 5 Gaussian filter.
4. Apparatus according to any preceding claim and configured to successively upsample the data level signal Un using a 5 x 5 Gaussian filter.
5. Apparatus according to any preceding claim and configured to create first and second Gaussian pyramids having a maximum level n=5.
6. Apparatus according to any preceding claim and comprising an input for multiple colour image data signals U1 each having resolution w x h, the apparatus being configured to generate a visual saliency data signal S for each colour image data signal U1.
7. System comprising an image sensor for generating one or more image data signals and apparatus according to any preceding claim, the sensor being connected to the input of the apparatus.
8. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where

M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij)
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
9. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where
M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
10. Method according to claim 8 or claim 9 and comprising the step of downsampling using a 5 x 5 Gaussian filter.
11. Method according to any one of claims 8 to 10 and comprising the step of upsampling using a 5 x 5 Gaussian filter.
12. Method according to any one of claims 8 to 11 and comprising the step of creating first and second Gaussian pyramids having a maximum level n=5.
13. Method according to any of claims 8 to 12 and comprising the step of generating visual saliency data S for each of multiple colour channels.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB1115600.7A GB201115600D0 (en) | 2011-09-09 | 2011-09-09 | Image processing |
GB1115600.7 | 2011-09-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2013034878A2 true WO2013034878A2 (en) | 2013-03-14 |
WO2013034878A3 WO2013034878A3 (en) | 2013-04-25 |
Family
ID=44908309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2012/000705 WO2013034878A2 (en) | 2011-09-09 | 2012-09-10 | Image processing |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB201115600D0 (en) |
WO (1) | WO2013034878A2 (en) |
-
2011
- 2011-09-09 GB GBGB1115600.7A patent/GB201115600D0/en not_active Ceased
-
2012
- 2012-09-10 WO PCT/GB2012/000705 patent/WO2013034878A2/en active Application Filing
Non-Patent Citations (4)
Title |
---|
D. WALTHER; D. KOCH: "Modeling attention to salient proto-objects", NEURAL NETWORKS, vol. 19, no. 9, 2006, pages 1395 - 1407, XP024902864, DOI: doi:10.1016/j.neunet.2006.10.001 |
HAONAN YU; JIA LI; YONGHONG TIAN; TIEJUN HUANG: "Automatic interesting object extraction from images using complementary saliency maps", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2010, pages 891 - 894 |
L. ITTI; C. KOCH; E. NIEBUR: "A model of saliency based visual attention for rapid scene analysis", IEEE PAMI, vol. 20, no. 11, November 1998 (1998-11-01), pages 1254 - 1259, XP001203933, DOI: doi:10.1109/34.730558 |
R. ACHANTA; S. HEMAMI; F. ESTRADA; S. SUSSTRUNK: "Frequency-tuned salient region detection", IEEE CVPR, 2009, pages 1597 - 1604 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105551060A (en) * | 2015-12-10 | 2016-05-04 | 电子科技大学 | Infrared weak small object detection method based on space-time significance and quaternary cosine transformation |
EP3489901A1 (en) * | 2017-11-24 | 2019-05-29 | V-Nova International Limited | Signal encoding |
WO2019101911A1 (en) * | 2017-11-24 | 2019-05-31 | V-Nova International Limited | Signal encoding |
Also Published As
Publication number | Publication date |
---|---|
WO2013034878A3 (en) | 2013-04-25 |
GB201115600D0 (en) | 2011-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016101883A1 (en) | Method for face beautification in real-time video and electronic equipment | |
KR102281184B1 (en) | Method and apparatus for calibrating image | |
US8498444B2 (en) | Blob representation in video processing | |
EP3644599B1 (en) | Video processing method and apparatus, electronic device, and storage medium | |
EP3709266A1 (en) | Human-tracking methods, apparatuses, systems, and storage media | |
US12100162B2 (en) | Method and system for obtaining and processing foreground image and background image | |
CN110335216B (en) | Image processing method, image processing apparatus, terminal device, and readable storage medium | |
US8577137B2 (en) | Image processing apparatus and method, and program | |
EP2863362B1 (en) | Method and apparatus for scene segmentation from focal stack images | |
CN111985281B (en) | Image generation model generation method and device and image generation method and device | |
CN110674759A (en) | Monocular face in-vivo detection method, device and equipment based on depth map | |
CN108805838B (en) | Image processing method, mobile terminal and computer readable storage medium | |
CN110348358B (en) | Skin color detection system, method, medium and computing device | |
CN111080537B (en) | Intelligent control method, medium, equipment and system for underwater robot | |
CN110111347B (en) | Image sign extraction method, device and storage medium | |
CN110796664A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN112348778A (en) | Object identification method and device, terminal equipment and storage medium | |
CN113658065B (en) | Image noise reduction method and device, computer readable medium and electronic equipment | |
CN113673584A (en) | Image detection method and related device | |
US9020269B2 (en) | Image processing device, image processing method, and recording medium | |
CN110717452A (en) | Image recognition method, device, terminal and computer readable storage medium | |
Katramados et al. | Real-time visual saliency by division of gaussians | |
CN117710868B (en) | Optimized extraction system and method for real-time video target | |
CN113205011B (en) | Image mask determining method and device, storage medium and electronic equipment | |
WO2013034878A2 (en) | Image processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12783250 Country of ref document: EP Kind code of ref document: A2 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 12783250 Country of ref document: EP Kind code of ref document: A2 |