WO2013034878A2 - Image processing - Google Patents
Image processing
- Publication number
- WO2013034878A2 (PCT/GB2012/000705)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- data
- gaussian
- visual saliency
- successively
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
Apparatus for generating a visual saliency data signal S comprises an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S. The apparatus is configured to successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter calculate a minimum ratio signal matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij); and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
Description
TITLE: IMAGE PROCESSING
DESCRIPTION
TECHNICAL FIELD
The present invention relates to methods of and apparatus for image processing, in particular the derivation of visual saliency data matrices or maps.
BACKGROUND ART
In this document, visual saliency is defined as the perceptual quality that makes a group of pixels stand out relative to its neighbours - cf. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in IEEE CVPR, 2009, pp. 1597-1604. Visual saliency forms the basis of several computer vision applications, including automatic object detection, medical imaging and robotics.
L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE PAMI, vol. 20, no. 11, pp. 1254-1259, Nov. 1998 discloses so-called "biological" models of visual saliency using a bottom-up approach for feature extraction mainly based on colour, intensity and orientation. Inspired by the structure of the human eye, this approach detects the contrast difference between an image region and its surroundings, which is also known as centre-surround contrast. Itti et al. use the Difference-of-Gaussians (DoG) filter for deriving the centre-surround contrast, whereas D. Walther and D. Koch, "Modeling attention to salient proto-objects," Neural Networks, vol. 19, no. 9, pp. 1395-1407, 2006 take this further by adopting the concept of salient proto-objects. As set out in Haonan Yu, Jia Li, Yonghong Tian, and Tiejun Huang, "Automatic interesting object extraction from images using complementary saliency maps," in Proceedings of the international conference on Multimedia, 2010, pp. 891-894, ACM, a common characteristic of these approaches is that they usually produce saliency maps that lack sharpness and detail. Furthermore, the complexity of known biological models means that performance is slow, thus they are more suitable for use in non-real-time applications.
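For illustration only, the centre-surround idea behind these biological models can be sketched as a single-scale Difference-of-Gaussians response. The snippet below is not taken from the cited works; the function name and sigma values are arbitrary choices.

```python
import cv2
import numpy as np

def dog_response(gray, sigma_centre=1.0, sigma_surround=4.0):
    # Difference-of-Gaussians: subtract a coarse (surround) blur from a fine (centre) blur.
    img = gray.astype(np.float32)
    fine = cv2.GaussianBlur(img, (0, 0), sigma_centre)      # kernel size derived from sigma
    coarse = cv2.GaussianBlur(img, (0, 0), sigma_surround)
    return np.abs(fine - coarse)                            # large where a region differs from its surround
```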
DISCLOSURE OF INVENTION
According to the present invention, there is provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij);
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
There is also provided:
a method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij;
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
The methods may comprise the step of downsampling and/or upsampling using a 5 x 5 Gaussian filter. The methods may comprise the step of creating first and second Gaussian pyramids having a maximum level n=5. Where the image data comprises multiple colour channels, in particular Red, Green and Blue, the methods may be repeated separately for each channel.
There is also provided apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal and an output for a visual saliency data signal and being configured to operate in accordance with either one of the methods described above. There is also provided a system comprising the apparatus described above and having an input connected to an image sensor for generating an image data signal.
An embodiment of the invention will now be described by way of example.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
The example, which may be called a "Division of Gaussians" (DIVoG) approach, comprises three distinct steps: 1) Bottom-up construction of a Gaussian pyramid, 2) Top-down construction of a Gaussian pyramid based on the output of Step 1, 3) Element-by-element division of the input image with the output of Step 2.
Step 1: The Gaussian pyramid U comprises n levels, starting with an image U1 as the base with resolution w x h. Successively higher pyramid levels are derived via downsampling of the preceding pyramid level using a 5 x 5 Gaussian filter. The top pyramid level has a resolution of (w/2^(n-1)) x (h/2^(n-1)). This image may be called Un.
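As an illustrative sketch of Step 1 (not part of the patent text), each level can be obtained by smoothing the previous level with a 5 x 5 Gaussian filter and then discarding every second row and column; the function and variable names below are arbitrary.

```python
import cv2
import numpy as np

def build_bottom_up_pyramid(u1, n=5):
    # Returns [U1, U2, ..., Un]; each level halves the resolution of the previous one.
    levels = [u1.astype(np.float32)]
    for _ in range(n - 1):
        blurred = cv2.GaussianBlur(levels[-1], (5, 5), 0)  # 5 x 5 Gaussian filter
        levels.append(blurred[::2, ::2])                   # keep every second row and column
    return levels
```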
Step 2: Un is used as the top level, Dn, of a second Gaussian pyramid D in order to derive its base D1. In this case, lower pyramid levels are derived via upsampling using a 5 x 5 Gaussian filter.
Step 3: Element-by-element division of U1 and D1 is performed in order to derive the minimum ratio matrix M (also called the MiR matrix) of their corresponding values, as described by the following equation 1:

M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij)    (1)
The saliency map S is then given by the following equation 2, which means that saliency is expressed as a floating-point number in the range 0 to 1:

S_ij = 1 - M_ij    (2)
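The three steps and equations 1 and 2 can be sketched end-to-end as follows. This is an illustrative implementation only: it relies on OpenCV's pyrDown/pyrUp (which use a 5 x 5 Gaussian filter), assumes a single-channel image whose width and height are divisible by 2^(n-1), and substitutes a small epsilon for the minimum-pixel-value floor described later; the name divog_saliency is not from the patent.

```python
import cv2
import numpy as np

def divog_saliency(u1, n=5, eps=1e-6):
    u1 = u1.astype(np.float32) + eps          # guard against division by zero
    # Step 1: bottom-up Gaussian pyramid U1 -> Un
    un = u1.copy()
    for _ in range(n - 1):
        un = cv2.pyrDown(un)
    # Step 2: top-down Gaussian pyramid Dn (= Un) -> D1
    d1 = un
    for _ in range(n - 1):
        d1 = cv2.pyrUp(d1)
    # Step 3: minimum ratio matrix M (equation 1) and saliency map S (equation 2)
    m = np.minimum(d1 / u1, u1 / d1)
    return 1.0 - m
```

Calling divog_saliency on a greyscale float image returns a map whose values already lie in the 0 to 1 range of equation 2.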
The described approach can be further expanded to include element-by-element division of all corresponding levels of pyramids U and D. In this case, the MiR matrix is initialised as a unit matrix (i.e. for each matrix element M_0,ij = 1). Then each pair of pyramid levels Un and Dn is scaled up to the input's resolution. Then the MiR matrix M_n is multiplied by M_(n-1) as described by the DIVoG equation below, which is a generalised form of equation 1.
M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij    (3)

for n greater than or equal to 1. The saliency map is then derived using equation 2. Deriving the MiR matrix through processing of all pyramid levels produces more accurate saliency maps than equation 1, but also increases the computational complexity.
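The multi-level variant of equation 3 can be sketched in the same style; again this is illustrative only, with arbitrary names, assuming a single-channel input meeting the same divisibility condition as above, and it rescales each level pair back to the input resolution with a bilinear resize, which the patent does not prescribe.

```python
import cv2
import numpy as np

def divog_saliency_all_levels(u1, n=5, eps=1e-6):
    u1 = u1.astype(np.float32) + eps
    h, w = u1.shape

    # Bottom-up pyramid U1..Un
    u_levels = [u1]
    for _ in range(n - 1):
        u_levels.append(cv2.pyrDown(u_levels[-1]))

    # Top-down pyramid Dn..D1, reordered so d_levels[k] pairs with u_levels[k]
    d_levels = [u_levels[-1]]
    for _ in range(n - 1):
        d_levels.append(cv2.pyrUp(d_levels[-1]))
    d_levels = d_levels[::-1]

    # M0 is the unit matrix; fold each level pair into M as in equation 3
    m = np.ones((h, w), dtype=np.float32)
    for u_k, d_k in zip(u_levels, d_levels):
        u_full = cv2.resize(u_k, (w, h), interpolation=cv2.INTER_LINEAR)
        d_full = cv2.resize(d_k, (w, h), interpolation=cv2.INTER_LINEAR)
        m *= np.minimum(d_full / u_full, u_full / d_full)

    return 1.0 - m    # equation 2
```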
The above approaches are colourspace independent and can consequently derive saliency maps even from greyscale images, which significantly reduces computational cost. In the present example, all operations are performed using 32-bit floating point matrices. To avoid division by zero, or division by floating point numbers in the range 0 to 1, the minimum pixel value is defined to be equal to k^n, where k is the size of the Gaussian kernel. This ensures that pyramidal downsampling will always result in a value greater than 1. For colour images, the method can be used with any colourspace, each channel being processed separately to produce a saliency map. All the saliency maps in this example have been produced using 24-bit colour images in the RGB colourspace. In this example, the Gaussian pyramids have also been constructed with n = 5 and all saliency maps normalised to fit the 0 to 255 range.
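As an illustrative usage example for a 24-bit RGB input (building on the divog_saliency sketch above), each channel can be processed separately and the result normalised to the 0 to 255 range; averaging the per-channel maps is one simple combination choice and is not specified by the patent, and the file names are hypothetical.

```python
import cv2
import numpy as np

bgr = cv2.imread("input.png")                                  # hypothetical input file; OpenCV loads BGR
channel_maps = [divog_saliency(c) for c in cv2.split(bgr.astype(np.float32))]
s = np.mean(channel_maps, axis=0)                              # one simple way to combine channel maps
s_8bit = cv2.normalize(s, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("saliency.png", s_8bit)
```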
It should be understood that this invention has been described by way of examples only and that a wide variety of modifications can be made without departing from the scope of the invention.
Claims
1. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate a minimum ratio signal matrix M, where M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij); and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
2. Apparatus for generating a visual saliency data signal S, the apparatus comprising an input for an image data signal U1 of resolution w x h and an output for a visual saliency data signal S, the apparatus being configured to:
successively downsample the image data signal U1 using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level signal Un of resolution (w/2^(n-1)) x (h/2^(n-1)); thereafter
successively upsample the data level signal Un using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level signal D1; thereafter
calculate, for each data level signal of the first and second Gaussian pyramids, a minimum ratio signal matrix M, where M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij; and thereafter generate a visual saliency data signal S, wherein S_ij = 1 - M_ij.
3. Apparatus according to claim 1 or claim 2 and configured to successively downsample the image data signal U1 using a 5 x 5 Gaussian filter.
4. Apparatus according to any preceding claim and configured to successively upsample the data level signal Un using a 5 x 5 Gaussian filter.
5. Apparatus according to any preceding claim and configured to create first and second Gaussian pyramids having a maximum level n=5.
6. Apparatus according to any preceding claim and comprising an input for multiple colour image data signals U1 each having resolution w x h, the apparatus being configured to generate a visual saliency data signal S for each colour image data signal U1.
7. System comprising an image sensor for generating one or more image data signals and apparatus according to any preceding claim, the sensor being connected to the input of the apparatus.
8. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating a minimum ratio matrix M, where

M_ij = min(D1_ij / U1_ij, U1_ij / D1_ij)
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
9. Method of generating visual saliency data S from image data U1 of resolution w x h, the method comprising, in order, the steps of:
1: starting with the image data U1, successively downsampling using a Gaussian filter n-1 times to create a first Gaussian pyramid having an nth data level Un of resolution (w/2^(n-1)) x (h/2^(n-1));
2: starting with data level Un, successively upsampling using a Gaussian filter n-1 times to create a second Gaussian pyramid having a base data level D1;
3: generating, for each level of the first and second Gaussian pyramids, a minimum ratio matrix M, where
M_n,ij = min(D_n,ij / U_n,ij, U_n,ij / D_n,ij) · M_(n-1),ij
4: generating visual saliency data S, wherein S_ij = 1 - M_ij.
10. Method according to claim 8 or claim 9 and comprising the step of downsampling using a 5 x 5 Gaussian filter.
11. Method according to any one of claims 8 to 10 and comprising the step of upsampling using a 5 x 5 Gaussian filter.
12. Method according to any one of claims 8 to 11 and comprising the step of creating first and second Gaussian pyramids having a maximum level n=5.
13. Method according to any of claims 8 to 12 and comprising the step of generating visual saliency data S for each of multiple colour channels.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB1115600.7A GB201115600D0 (en) | 2011-09-09 | 2011-09-09 | Image processing |
GB1115600.7 | 2011-09-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2013034878A2 true WO2013034878A2 (en) | 2013-03-14 |
WO2013034878A3 WO2013034878A3 (en) | 2013-04-25 |
Family
ID=44908309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2012/000705 WO2013034878A2 (en) | 2011-09-09 | 2012-09-10 | Image processing |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB201115600D0 (en) |
WO (1) | WO2013034878A2 (en) |
-
2011
- 2011-09-09 GB GBGB1115600.7A patent/GB201115600D0/en not_active Ceased
-
2012
- 2012-09-10 WO PCT/GB2012/000705 patent/WO2013034878A2/en active Application Filing
Non-Patent Citations (4)
Title |
---|
D. WALTHER; D. KOCH: "Modeling attention to salient proto-objects", NEURAL NETWORKS, vol. 19, no. 9, 2006, pages 1395 - 1407, XP024902864, DOI: doi:10.1016/j.neunet.2006.10.001 |
HAONAN YU; JIA LI; YONGHONG TIAN; TIEJUN HUANG: "Automatic interesting object extraction from images using complementary saliency maps", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2010, pages 891 - 894 |
L. ITTI; C. KOCH; E. NIEBUR: "A model of saliency based visual attention for rapid scene analysis", IEEE PAMI, vol. 20, no. 11, November 1998 (1998-11-01), pages 1254 - 1259, XP001203933, DOI: doi:10.1109/34.730558 |
R. ACHANTA; S. HEMAMI; F. ESTRADA; S. SUSSTRUNK: "Frequency-tuned salient region detection", IEEE CVPR, 2009, pages 1597 - 1604 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105551060A (en) * | 2015-12-10 | 2016-05-04 | 电子科技大学 | Infrared weak small object detection method based on space-time significance and quaternary cosine transformation |
EP3489901A1 (en) * | 2017-11-24 | 2019-05-29 | V-Nova International Limited | Signal encoding |
WO2019101911A1 (en) * | 2017-11-24 | 2019-05-31 | V-Nova International Limited | Signal encoding |
Also Published As
Publication number | Publication date |
---|---|
WO2013034878A3 (en) | 2013-04-25 |
GB201115600D0 (en) | 2011-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016101883A1 (en) | Method for face beautification in real-time video and electronic equipment | |
KR102281184B1 (en) | Method and apparatus for calibrating image | |
US8498444B2 (en) | Blob representation in video processing | |
EP3644599B1 (en) | Video processing method and apparatus, electronic device, and storage medium | |
EP3709266A1 (en) | Human-tracking methods, apparatuses, systems, and storage media | |
US12100162B2 (en) | Method and system for obtaining and processing foreground image and background image | |
CN110335216B (en) | Image processing method, image processing apparatus, terminal device, and readable storage medium | |
US8577137B2 (en) | Image processing apparatus and method, and program | |
EP2863362B1 (en) | Method and apparatus for scene segmentation from focal stack images | |
CN111985281B (en) | Image generation model generation method and device and image generation method and device | |
CN110674759A (en) | Monocular face in-vivo detection method, device and equipment based on depth map | |
CN108805838B (en) | Image processing method, mobile terminal and computer readable storage medium | |
CN110348358B (en) | Skin color detection system, method, medium and computing device | |
CN111080537B (en) | Intelligent control method, medium, equipment and system for underwater robot | |
CN110111347B (en) | Image sign extraction method, device and storage medium | |
CN110796664A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN112348778A (en) | Object identification method and device, terminal equipment and storage medium | |
CN113658065B (en) | Image noise reduction method and device, computer readable medium and electronic equipment | |
CN113673584A (en) | Image detection method and related device | |
US9020269B2 (en) | Image processing device, image processing method, and recording medium | |
CN110717452A (en) | Image recognition method, device, terminal and computer readable storage medium | |
Katramados et al. | Real-time visual saliency by division of gaussians | |
CN117710868B (en) | Optimized extraction system and method for real-time video target | |
CN113205011B (en) | Image mask determining method and device, storage medium and electronic equipment | |
WO2013034878A2 (en) | Image processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12783250 Country of ref document: EP Kind code of ref document: A2 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 12783250 Country of ref document: EP Kind code of ref document: A2 |