WO2013034878A2 - Image processing - Google Patents

Image processing

Info

Publication number
WO2013034878A2
Authority
WO
WIPO (PCT)
Prior art keywords
signal
data
gaussian
visual saliency
successively
Prior art date
Application number
PCT/GB2012/000705
Other languages
French (fr)
Other versions
WO2013034878A3 (en)
Inventor
Toby BRECKON
Ioannis KATRAMADOS
Original Assignee
Cranfield University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cranfield University filed Critical Cranfield University
Publication of WO2013034878A2 publication Critical patent/WO2013034878A2/en
Publication of WO2013034878A3 publication Critical patent/WO2013034878A3/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • the present invention relates to methods of and apparatus for image processing, in particular the derivation of visual saliency data matrices or maps.
  • visual saliency is defined as the perceptual quality that makes a group of pixels stand out relative to its neighbours - cf. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in IEEE CVPR, 2009, pp. 1597-1604.
  • Visual saliency forms the basis of several computer vision applications, including automatic object detection, medical imaging and robotics.
  • the methods may comprise the step of downsampling and/or upsampling using a 5 x 5 Gaussian filter.
  • apparatus for generating a visual saliency data signal S comprising an input for an image data signal and an output for a visual saliency data signal and being configured to operate in accordance with either one of the methods described above.
  • a system comprising the apparatus described above and having an input connected to an image sensor for generating an image data signal.
  • the example, which may be called a "Division of Gaussians" (DIVoG) approach, comprises three distinct steps: 1) Bottom-up construction of a Gaussian pyramid, 2) Top-down construction of a Gaussian pyramid based on the output of Step 1, 3) Element-by-element division of the input image with the output of Step 2.
  • DIVoG Division of Gaussians
  • Step 1 The Gaussian pyramid U comprises n levels, starting with an image U1 as the base with resolution w x h. Successively higher pyramid levels are derived via downsampling of the preceding pyramid level using a 5 x 5 Gaussian filter.
  • the top pyramid level has a resolution of (w/2n-1) x (h/2n-1). This image may be called Un.
  • Step 2 Un is used as the top level, Dn, of a second Gaussian pyramid D in order to derive its base D1.
  • lower pyramid levels are derived via upsampling using a 5 x 5 Gaussian filter
  • Step 3 Element-by-element division of U1 and D1 is performed in order to derive the minimum ratio matrix M (also called MiR matrix) of their corresponding values as described by the following equation 1:
  • the saliency map S is then given by the following equation 2, which means that saliency is expressed as a floating-point number in the range 0 - 1.

Abstract

Apparatus for generating a visual saliency data signal S comprises an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S. The apparatus is configured to successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2n-1) x (h/2n-1); thereafter successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter calculate a minimum ratio signal matrix M, where Mij = min(D1ij/U1ij, U1ij/D1ij); and thereafter generate a visual saliency data signal S, wherein Sij = 1 - Mij.

Description

TITLE: IMAGE PROCESSING
DESCRIPTION
TECHNICAL FIELD
The present invention relates to methods of and apparatus for image processing, in particular the derivation of visual saliency data matrices or maps.
BACKGROUND ART
In this document, visual saliency is defined as the perceptual quality that makes a group of pixels stand out relative to its neighbours - cf. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in IEEE CVPR, 2009, pp. 1597-1604. Visual saliency forms the basis of several computer vision applications, including automatic object detection, medical imaging and robotics.
L. Itti, C. Koch, and E. Niebur, "A model of saliency based visual attention for rapid scene analysis," IEEE PAMI, vol. 20, no. 11, pp. 1254-1259, Nov. 1998 discloses so-called "biological" models of visual saliency using a bottom-up approach for feature extraction mainly based on colour, intensity and orientation. Inspired by the structure of the human eye, this approach detects the contrast difference between an image region and its surroundings, which is also known as centre-surround contrast. Itti et al. use the Difference-of-Gaussians (DoG) filter for deriving the centre-surround contrast, whereas D. Walther and D. Koch, "Modeling attention to salient proto-objects," Neural Networks, vol. 19, no. 9, pp. 1395-1407, 2006 takes this further by adopting the concept of salient proto-objects. As set out in Haonan Yu, Jia Li, Yonghong Tian, and Tiejun Huang, "Automatic interesting object extraction from images using complementary saliency maps," in Proceedings of the international conference on Multimedia, 2010, pp. 891-894, ACM, a common characteristic of these approaches is that they usually produce saliency maps that lack sharpness and detail. Furthermore, the complexity of known biological models means that performance is slow, thus they are more suitable for use in non-real-time applications.
DISCLOSURE OF INVENTION
According to the present invention, there is provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2n-1) x (h/2n-1);
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where
Mij = min(D1ij/U1ij, U1ij/D1ij);
4: generating visual saliency data S, wherein Sij = 1 - Mij. There is also provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2n-1) x (h/2n-1);
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where
Mnij = min(Dnij/Unij, Unij/Dnij) x Mn-1ij;
4: generating visual saliency data S, wherein Sij = 1 - Mij.
The methods may comprise the step of downsampling and/or upsampling using a 5 x 5 Gaussian filter. The methods may comprise the step of creating first and second Gaussian pyramids having a maximum level n=5. Where the image data comprises multiple colour channels, in particular Red, Green and Blue, the methods may be repeated separately for each channel.
There is also provided apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal and an output for a visual saliency data signal and being configured to operate in accordance with either one of the methods described above. There is also provided a system comprising the apparatus described above and having an input connected to an image sensor for generating an image data signal.
An embodiment of the invention will now be described by way of example.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
The example, which may be called a "Division of Gaussians" (DIVoG) approach, comprises three distinct steps: 1) Bottom-up construction of a Gaussian pyramid, 2) Top-down construction of a Gaussian pyramid based on the output of Step 1, 3) Element-by-element division of the input image with the output of Step 2.
Step 1: The Gaussian pyramid U comprises n levels, starting with an image U1 as the base with resolution w x h. Successively higher pyramid levels are derived via downsampling of the preceding pyramid level using a 5 x 5 Gaussian filter. The top pyramid level has a resolution of (w/2n-1) x (h/2n-1). This image may be called Un.
Step 2: Un is used as the top level, Dn, of a second Gaussian pyramid D in order to derive its base D1. In this case, lower pyramid levels are derived via upsampling using a 5 x 5 Gaussian filter.
Step 3: Element-by-element division of U1 and D1 is performed in order to derive the minimum ratio matrix M (also called MiR matrix) of their corresponding values as described by the following equation 1:
Mij = min(D1ij/U1ij, U1ij/D1ij) (1)
The saliency map S is then given by the following equation 2, which means that saliency is expressed as a floating-point number in the range 0 - 1.
Sij = 1 - Mij (2)
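By way of illustration, the following Python sketch implements the three DIVoG steps and equations 1 and 2 as described above. It is a minimal sketch under stated assumptions, not the patented implementation: it assumes OpenCV's pyrDown/pyrUp (which use a 5 x 5 Gaussian kernel) as the pyramid operators, assumes a single-channel floating-point input, and clamps the minimum pixel value to 1 rather than applying the k^n rule discussed further below; all function and variable names are illustrative only.

```python
import cv2
import numpy as np

def divog_saliency(image, n=5):
    """Basic DIVoG saliency (equations 1 and 2) for a single-channel image."""
    # Work in 32-bit floating point, as in the described example; clamp the
    # minimum pixel value to 1.0 to avoid division by zero (an assumption of
    # this sketch, in place of the k^n rule described in the text).
    u1 = np.maximum(image.astype(np.float32), 1.0)

    # Step 1: bottom-up pyramid U, built by n-1 successive Gaussian downsamplings.
    u = [u1]
    for _ in range(n - 1):
        u.append(cv2.pyrDown(u[-1]))
    un = u[-1]  # top level Un, resolution roughly (w/2^(n-1)) x (h/2^(n-1))

    # Step 2: top-down pyramid D, upsampling Un back to the base resolution.
    d = un
    for i in range(n - 2, -1, -1):
        h, w = u[i].shape[:2]
        d = cv2.pyrUp(d, dstsize=(w, h))  # level i of D, sized to match Ui
    d1 = d

    # Step 3: element-by-element minimum ratio matrix M (equation 1).
    m = np.minimum(d1 / u1, u1 / d1)

    # Saliency map S (equation 2): floating-point values in the range 0 - 1.
    return 1.0 - m
```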
The described approach can be further expanded to include element-by-element division of all corresponding levels of pyramids U and D. In this case, the MiR matrix is initialised as a unit matrix (i.e. for each matrix element M0ij = 1). Then each pair of pyramid levels Un and Dn is scaled up to the input's resolution. Then the MiR matrix Mn is multiplied by Mn-1 as described by the DIVoG equation below, which is a generalised form of equation 1.
Mnij = min(Dnij/Unij, Unij/Dnij) x Mn-1ij (3)
for n greater than or equal to 1. The saliency map is then derived using equation 2. Deriving the MiR matrix through processing of all pyramid levels produces more accurate saliency maps than equation 1, but also increases the computational complexity.
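A sketch of this expanded, all-levels variant (equation 3) is given below, under the same assumptions as the previous sketch (OpenCV pyramid operators, single-channel input, clamping instead of the k^n rule, illustrative names). The bilinear rescaling of each level pair to the input resolution is an implementation choice made here, not one specified by the description.

```python
import cv2
import numpy as np

def divog_saliency_all_levels(image, n=5):
    """Expanded DIVoG: accumulates the minimum ratio over all pyramid levels."""
    u = [np.maximum(image.astype(np.float32), 1.0)]  # same clamping assumption as above
    for _ in range(n - 1):
        u.append(cv2.pyrDown(u[-1]))

    # Top-down pyramid D shares its top level with U and is upsampled level by level.
    d = [None] * n
    d[n - 1] = u[n - 1]
    for i in range(n - 2, -1, -1):
        h, w = u[i].shape[:2]
        d[i] = cv2.pyrUp(d[i + 1], dstsize=(w, h))

    h0, w0 = u[0].shape[:2]
    m = np.ones_like(u[0])  # MiR matrix initialised as a unit matrix (M0ij = 1)
    for i in range(n):
        # Scale each pair of levels Ui, Di up to the input resolution...
        ui = cv2.resize(u[i], (w0, h0), interpolation=cv2.INTER_LINEAR)
        di = cv2.resize(d[i], (w0, h0), interpolation=cv2.INTER_LINEAR)
        # ...and accumulate the minimum ratio multiplicatively (equation 3).
        m *= np.minimum(di / ui, ui / di)

    return 1.0 - m  # equation 2
```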
The above approaches are colourspace independent and can consequently derive saliency maps even from greyscale images, which significantly reduces computational cost. In the present example, all operations are performed using 32-bit floating point matrices. To avoid division by zero, or division with floating point numbers in the range 0 to 1, the minimum pixel value is defined to be equal to k^n, where k is the size of the Gaussian kernel. This ensures that pyramidal downsampling will always result in a value greater than 1. For colour images, the method can be used with any colourspace, each channel being processed separately to produce a saliency map. All the saliency maps in this example have been produced using 24-bit colour images in the RGB colourspace. In this example, the Gaussian pyramid has also been constructed with n = 5 and all saliency maps normalised to fit the 0 - 255 range.
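As a usage illustration of these details, the sketch below processes a 24-bit colour image channel by channel with n = 5 (using the divog_saliency sketch above) and normalises the result to the 0 - 255 range. The averaging of the per-channel maps into a single map, and the file names, are assumptions made for this example only; the description leaves the combination of per-channel maps open.

```python
import cv2
import numpy as np

# Hypothetical input file; any 24-bit colour image will do.
bgr = cv2.imread("input.png").astype(np.float32)

# One saliency map per colour channel, as described, using the sketch above.
channel_maps = [divog_saliency(c, n=5) for c in cv2.split(bgr)]

# Combine the per-channel maps by averaging (an assumption of this example).
saliency = sum(channel_maps) / len(channel_maps)

# Normalise to the 0 - 255 range for display, as in the described example.
saliency_8u = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("saliency.png", saliency_8u)
```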
It should be understood that this invention has been described by way of examples only and that a wide variety of modifications can be made without departing from the scope of the invention.

Claims

1. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2n-1) x (h/2n-1); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate a minimum ratio signal matrix M, where
Mij = min(D1ij/U1ij, U1ij/D1ij),
and thereafter generate a visual saliency data signal S, wherein Sij = 1 - Mij.
2. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2n-1) x (h/2n-1); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate, for each data level signal of the first and second Gaussian pyramids, a minimum ratio signal matrix M, where
Mnij = min(Dnij/Unij, Unij/Dnij) x Mn-1ij,
and thereafter generate a visual saliency data signal S, wherein Sij = 1 - Mij.
3. Apparatus according to claim 1 or claim 2 and configured to successively downsample the image data signal U1 using a 5 x 5 Gaussian filter.
4. Apparatus according to any preceding claim and configured to successively upsample the data level signal Un using a 5 x 5 Gaussian filter.
5. Apparatus according to any preceding claim and configured to create first and second Gaussian pyramids having a maximum level n=5.
6. Apparatus according to any preceding claim and comprising an input for multiple colour image data signals U1 each having resolution w x h, the apparatus being configured to generate a visual saliency data signal S for each colour image data signal U1.
7. System comprising an image sensor for generating one or more image data signals and apparatus according to any preceding claim, the sensor being connected to the input of the apparatus.
8. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2n-1) x (h/2n-1);
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where
Mij = min(D1ij/U1ij, U1ij/D1ij);
4: generating visual saliency data S, wherein Sij = 1 - Mij.
9. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2n-1) x (h/2n-1);
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where
Mnij = min(Dnij/Unij, Unij/Dnij) x Mn-1ij;
4: generating visual saliency data S, wherein Sij = 1 - Mij.
10. Method according to claim 8 or claim 9 and comprising the step of downsampling using a 5 x 5 Gaussian filter.
11. Method according to any one of claims 8 to 10 and comprising the step of upsampling using a 5 x 5 Gaussian filter.
12. Method according to any one of claims 8 to 11 and comprising the step of creating first and second Gaussian pyramids having a maximum level n=5.
13. Method according to any of claims 8 to 12 and comprising the step of generating visual saliency data S for each of multiple colour channels.
PCT/GB2012/000705 2011-09-09 2012-09-10 Image processing WO2013034878A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1115600.7A GB201115600D0 (en) 2011-09-09 2011-09-09 Image processing
GB1115600.7 2011-09-09

Publications (2)

Publication Number Publication Date
WO2013034878A2 true WO2013034878A2 (en) 2013-03-14
WO2013034878A3 WO2013034878A3 (en) 2013-04-25

Family

ID=44908309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2012/000705 WO2013034878A2 (en) 2011-09-09 2012-09-10 Image processing

Country Status (2)

Country Link
GB (1) GB201115600D0 (en)
WO (1) WO2013034878A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105551060A (en) * 2015-12-10 2016-05-04 电子科技大学 Infrared weak small object detection method based on space-time significance and quaternary cosine transformation
EP3489901A1 (en) * 2017-11-24 2019-05-29 V-Nova International Limited Signal encoding

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
D. WALTHER; D. KOCH: "Modeling attention to salient proto-objects", NEURAL NETWORKS, vol. 19, no. 9, 2006, pages 1395 - 1407, XP024902864, DOI: doi:10.1016/j.neunet.2006.10.001
HAONAN YU; JIA LI; YONGHONG TIAN; TIEJUN HUANG: "Automatic interesting object extraction from images using complementary saliency maps", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2010, pages 891 - 894
L. ITTI; C. KOCH; E. NIEBUR: "A model of saliency based visual attention for rapid scene analysis", IEEE PAMI, vol. 20, no. 11, November 1998 (1998-11-01), pages 1254 - 1259, XP001203933, DOI: doi:10.1109/34.730558
R. ACHANTA; S. HEMAMI; F. ESTRADA; S. SUSSTRUNK: "Frequency-tuned salient region detection", IEEE CVPR, 2009, pages 1597 - 1604

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105551060A (en) * 2015-12-10 2016-05-04 电子科技大学 Infrared weak small object detection method based on space-time significance and quaternary cosine transformation
EP3489901A1 (en) * 2017-11-24 2019-05-29 V-Nova International Limited Signal encoding
WO2019101911A1 (en) * 2017-11-24 2019-05-31 V-Nova International Limited Signal encoding

Also Published As

Publication number Publication date
WO2013034878A3 (en) 2013-04-25
GB201115600D0 (en) 2011-10-26

Similar Documents

Publication Publication Date Title
KR102281184B1 (en) Method and apparatus for calibrating image
US8498444B2 (en) Blob representation in video processing
EP3644599B1 (en) Video processing method and apparatus, electronic device, and storage medium
JP2020064637A (en) System and method for detecting image forgery and alteration via convolutional neural network, and method for providing non-correction detection service using the same
EP3709266A1 (en) Human-tracking methods, apparatuses, systems, and storage media
CN110335216B (en) Image processing method, image processing apparatus, terminal device, and readable storage medium
US8577137B2 (en) Image processing apparatus and method, and program
EP2863362B1 (en) Method and apparatus for scene segmentation from focal stack images
CN111985281B (en) Image generation model generation method and device and image generation method and device
US8538079B2 (en) Apparatus capable of detecting location of object contained in image data and detection method thereof
CN109948441B (en) Model training method, image processing method, device, electronic equipment and computer readable storage medium
CN110348358B (en) Skin color detection system, method, medium and computing device
CN113673584A (en) Image detection method and related device
US9020269B2 (en) Image processing device, image processing method, and recording medium
CN110674759A (en) Monocular face in-vivo detection method, device and equipment based on depth map
CN112348778A (en) Object identification method and device, terminal equipment and storage medium
Katramados et al. Real-time visual saliency by division of gaussians
CN111080537B (en) Intelligent control method, medium, equipment and system for underwater robot
CN110111347B (en) Image sign extraction method, device and storage medium
WO2018132961A1 (en) Apparatus, method and computer program product for object detection
WO2013034878A2 (en) Image processing
CN110717452B (en) Image recognition method, device, terminal and computer readable storage medium
JP6963038B2 (en) Image processing device and image processing method
CN113256643A (en) Portrait segmentation model training method, storage medium and terminal equipment
CN108470327B (en) Image enhancement method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12783250

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12783250

Country of ref document: EP

Kind code of ref document: A2