WO2007004868A1

WO2007004868A1 - Method and apparatus for image characterization

Info

Publication number: WO2007004868A1
Application number: PCT/NL2006/000328
Authority: WO
Inventors: Jan-Mark Geusebroek
Original assignee: Universiteit Van Amsterdam
Priority date: 2005-07-06
Filing date: 2006-07-03
Publication date: 2007-01-11
Also published as: WO2007004864A1

Abstract

Method of characterizing an image comprising: defining one or more image areas of the image; analyzing color and/or intensity transitions within the image area of a predefined color basis; creating a density profile of said transitions in said image area; and fitting said density profile to a predefined parametrization function. According to the method, said density profile is characteristic for an image and can be used for image characterization purposes.

Description

Method and apparatus for image characterization

The invention relates to image characterization by visual inspection. In particular, the invention relates to a method of inspecting an image and associating the image with a predetermined characterization or category of objects. Object appearance is highly influenced by the imaging circumstances under which the object is viewed. Illumination color, shading effects, cast shadows, all affect the appearance of the object. Also local features have received much attention in the field of object recognition. Promising methods include the local SIFT (scale invariant feature transform) features proposed by Lowe, for instance discussed in US6711293. The dependence on local features is crucial for these methods. The SIFT method is however not related to analysing colouring aspects of an object.

Color characteristics of an object in particular are highly sensitive to local illumination circumstances. For this class of appearance effects, color invariant features are known to be very effective in emphasizing the native object characteristics. One publication discussing color invariants is US2003099376. However, this publication is not related to characterizing an object on a local level and the object recognition power is limited.

In "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope", by A. Oliva and A. Torralba, International Journal of Computer Vision 43(3)145-175, 2001 a scene classification is proposed that bypasses the segmentation and the processing of individual objects or regions. It is an aspect of the invention to provide an apparatus and method wherein coloring aspects are taken into account in order to improve a recognition ratio of objects to be inspected. It is another aspect of the invention to provide a reliable classification of objects according to characteristics invariant to local imaging and lighting conditions. In another aspect, it is an object to provide a local characterization of imaging areas for producing and reproducing an image.

Accordingly, the invention provides a method according to the features of claim 1. In another aspect, the invention provides an apparatus according to the features of claim 21. In particular, by defining a plurality of image areas in an image; analyzing color and/or intensity transitions within the image area of a predefined color basis; creating a density profile of said transitions in said image area; and fitting said density profile to a predefined parametrization, a robust image characterization method is provided. This is in particular the case when these color transitions are made invariant to local lighting conditions using color invariants. Further, in particular, the method conforms to a natural image statistics characterization. Examples of such characterizations are Weibull type distributions or integrated Weibull distributions, also known as Generalized Gaussian or Generalized Laplacian. The invention will be further elucidated with reference to the figures. In the figures:

Fig 1 illustrates a score chart between the inventive method and a prior art recognition strategy;

Fig 2 illustrates another comparison between the inventive method and a prior art recognition strategy;

Fig 3 illustrates a local density distribution for various Weibull parameter values;

Fig 4 shows an image to be analyzed;

Fig 5 shows a retina of image analysis kernels for image analysis;

Fig 6 shows an apparatus for visual image characterization according to the invention; and

Fig 7 illustrates an exemplary score chart for categorization of images.

In Fig 1 a score chart is illustrated of the inventive method and a prior art recognition strategy, in particular, the visual recognition strategy of the Lowe SIFT patent. It can be shown that where the prior art only has a high recognition score when an accepted fault tolerance is high, the method according to the invention shows a high recognition score with a much smaller fault tolerance. In particular, when a fault tolerance of 20% is accepted, the inventive method has a 95% recognition score. In contrast, the prior art score is then only 30%. It can be concluded that the method performs particularly well in view of the prior art method.

In Fig 2 another score chart is shown, showing a fault recognition ratio (of a total of 1000 samples) for differing illumination conditions. These conditions are standardized according to the ALOI conditions as is detailed in "The Amsterdam Library of Object Images", Jan-Mark Geusebroek et al, International Journal of Computer Vision 61(1), 103-112, 2005. Here the "l" condition (l1-l8) refers to differing illumination angles; the "r" condition refers to differing viewing angles of the object relative to the camera; and the "c" condition relates to a frontal angle of the camera and corresponding azimuth of the illumination direction. The "i" condition relates to an illumination color, from reddish to white. It appears that the chart of the method of the invention yields a lower error score than the prior art method; for instance, a small rotation gives a dramatic increase of error score for the prior art method, whereas the inventive method only shows a slight increase of error score for rotations close to 90° (0° indicating a frontal view).

In Fig 3 a probability distribution for a set of different gammas (ranging from 0.5 to 2) is shown for a Weibull distribution, to be elaborated further below. It shows that a larger gamma results in a broader distribution, with less pronounced tails, corresponding to relatively small local textureness variations in the picture. A smaller gamma results in wilder and more widely distributed inclinations for color transitions in the picture.

In Fig 4 a schematic approach is given of analysis of an image 1 showing an object 2 using a mask area or retina 3. Here, the mask area is defined by a predefined number of image areas 4 having a predetermined position relative to each other. For each individual image area an error matching parameter is calculated by fitting a density profile of color transitions in said image area to a predefined parametrization function. In particular, the image area 4 is a Gaussian kernel, given by eq. (9) herebelow. Furthermore, the scale of the kernel 4 can be adjusted to conform with scaling properties of the object to be inspected. In particular, the error matching parameter can be provided by eq. (16) and (17) further specified herebelow. An optimal recognition can be obtained by a total error matching parameter of the mask area defined as a product of error matching parameters of said image areas 4.

Fig 5 specifically shows a configuration of a retina or mask area 3. Here a total of 1+6+12+18 = 37 histograms are constructed (for convenience, only a few image areas 4 are referenced), while the kernels are positioned on a hexagonal grid having a spacing distance of roughly 2 σ (the kernel scale). Fig 6 finally shows an apparatus 5 for characterizing an object. The apparatus 5 comprises: an input 6 for receiving a digitized graphical image 7 and a circuit 8 arranged for defining one or more image areas of the object in said digitized graphical image. Accordingly, a number of preselected image areas are defined as explained with reference to Fig 4 and 5.

Furthermore, the apparatus 5 comprises a circuit 9 for receiving digitized input of the image area for analyzing color and/or intensity transitions within the image area of a predefined color basis. These color transitions result in a calculation of color invariant coefficients as further exemplified below with respect to eqs. (4)-(7). Also a circuit 10 is provided for creating a density profile based on the transitions calculated in circuit 9 and for fitting said density profile to a predefined parametrization function. The apparatus 5 further comprises an output 11 for providing the matching parameters of said density profile.

In particular, for image characterization purposes the apparatus 5 is communicatively coupled to a database 12 of a set of objects comprising predetermined density profile characteristics; and matching circuitry 13 is provided for matching a measured density profile or characteristics thereof of said object to said predetermined density profile characteristics for outputting a response 14 in relation to recognizing said object. The matching circuitry 13 is arranged to provide an error matching parameter derived from the measured gamma and beta characteristics of a test density profile relative to a targeted Weibull distribution.

In the remainder, prior to introducing the histogram based invariants according to the invention, a short overview is given of Color Invariant features. The analysis is started by transforming each pixel's RGB value to an opponent color representation,

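[Equation (1): the RGB-to-opponent-color transformation matrix is not reproduced in the source text.] (1)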

The rationale behind this transformation is that the RGB sensitivity curves of the camera are transformed to Gaussian basis functions, being the Gaussian and its first and second order derivative. Hence, the transformed values represent an opponent color system, measuring intensity, yellow versus blue, and red versus green.

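Spatial scale is incorporated by convolving the opponent color images with a Gaussian filter,

$$\hat{E}(x, y, \sigma) = E(x, y) * G(x, y, \sigma), \quad \hat{E}_{\lambda}(x, y, \sigma) = E_{\lambda}(x, y) * G(x, y, \sigma), \quad \hat{E}_{\lambda\lambda}(x, y, \sigma) = E_{\lambda\lambda}(x, y) * G(x, y, \sigma) \qquad (2)$$

where

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \qquad (3)$$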

Note that one now has a spatial Gaussian times a spectral Gaussian, yielding a combined Gaussian measurement in a spatio-spectral Hilbert space. The spatial derivatives result in the color N-jet, denoted by its components {E, Ex, Ey, ..., Eλ, Eλx, Eλy, ..., Eλλ, Eλλx, Eλλy, ...}.

Photometric invariance is now obtained by considering two non-linear transformations. The first one isolates intensity variation from chromatic variation, and is given by (leaving out parameters)

$$W_{x^{n} y^{m}} = \frac{\hat{E}_{x^{n} y^{m}}}{\hat{E}} \qquad (4)$$

That is, all spatial derivatives of the intensity image Ê are normalized by the intensity Ê. The invariant W measures all intensity fluctuations except for the overall intensity level, that is, edges due to shading, cast shadow, and albedo changes of the object surface. A more strict class of invariance is obtained by considering the chromatic invariant C,

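$$C_{\lambda x^{n} y^{m}} = \frac{\partial^{n+m}}{\partial x^{n} \partial y^{m}} \left\{ \frac{\hat{E}_{\lambda}(x, y, \sigma)}{\hat{E}(x, y, \sigma)} \right\} \qquad (5)$$

$$C_{\lambda\lambda x^{n} y^{m}} = \frac{\partial^{n+m}}{\partial x^{n} \partial y^{m}} \left\{ \frac{\hat{E}_{\lambda\lambda}(x, y, \sigma)}{\hat{E}(x, y, \sigma)} \right\} \qquad (6)$$

for which the first order derivatives are given by (leaving out parameters)

$$C_{\lambda x} = \frac{\hat{E}_{\lambda x}\hat{E} - \hat{E}_{\lambda}\hat{E}_{x}}{\hat{E}^{2}}, \qquad C_{\lambda\lambda x} = \frac{\hat{E}_{\lambda\lambda x}\hat{E} - \hat{E}_{\lambda\lambda}\hat{E}_{x}}{\hat{E}^{2}} \qquad (7)$$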

Each of the invariants in C is composed by an algebraic combination of the color N-jet components. For example, Cλx is obtained by filtering the yellow-blue opponent color channel with a first order Gaussian derivative filter, resulting in Êλx. This is pixel-wise multiplied by the Gaussian smoothed version of the intensity channel, Ê, yielding Êλx · Ê. The second combination in the numerator of Cλx is obtained by smoothing the yellow-blue opponent channel, and multiplying with the Gaussian derivative of the intensity channel. The two parts are pixel-wise subtracted, and divided by the smoothed intensity squared, yielding the invariant under consideration. The invariant C measures all chromatic variation in the image, disregarding intensity variation, that is, all variation where the color of the pixels changes. These invariants measure point-properties of the scene, and are referred to as point-based invariants.
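As a minimal sketch of the first-order invariants described above, the following Python functions evaluate Wx and Cλx at a single pixel. The jet components are assumed to have been computed already by Gaussian (derivative) filtering; the function names and the sample values are illustrative and not part of the original disclosure. Scaling all jet components by a common factor, as a uniform change in illumination intensity would, leaves both values unchanged.

```python
def w_x(e, e_x):
    """Intensity invariant W_x: intensity derivative normalized by intensity."""
    return e_x / e


def c_lambda_x(e, e_x, e_l, e_lx):
    """Chromatic invariant C_lambda_x = (E_lx * E - E_l * E_x) / E**2."""
    return (e_lx * e - e_l * e_x) / (e * e)


# Hypothetical jet values at one pixel (assumed, for illustration only).
e, e_x, e_l, e_lx = 0.8, 0.05, 0.1, -0.02

# The same pixel under three times the illumination intensity.
scale = 3.0
base = (w_x(e, e_x), c_lambda_x(e, e_x, e_l, e_lx))
scaled = (
    w_x(scale * e, scale * e_x),
    c_lambda_x(scale * e, scale * e_x, scale * e_l, scale * e_lx),
)
```

Comparing `base` and `scaled` shows that both invariants are insensitive to the overall intensity level.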

Point-based invariants, as provided above, are well known to be unstable and noise sensitive. Increasing the scale of the Gaussian filters overcomes this partially. However, robustness is traded for invariance. In this section, a new class of invariant features is derived, which have high discriminative power, are robust to noise, and improve upon invariant properties of point-based invariants. The main idea is to construct local histograms of responses for the color invariants given in the previous section.

Localization is obtained by estimating the histogram under a kernel. Kernel based descriptors are known to be highly discriminative, and have been successfully applied in tracking applications.

An advantage of using an opponent color space, with additional photometric invariant transformations, is that color values are decorrelated. Hence, for a distinctive image content descriptor, one may as well use the marginal, one-dimensional distributions for each of the color channels. This is in contrast to the histogram of the full 2D chromatic or 3D color space. In the next sections, the one-dimensional channel histograms of the invariant gradients {Ww, Cλw, Cλλw}, or edge detectors {Wx, Wy, Cλx, Cλy, Cλλx, Cλλy}, are considered separately. The resulting histograms may be described by parameterized density functions. The parameters act as a new class of photometric and geometric invariants.

Localization and spatial extent (scale) of local histograms is obtained by weighting the contribution of pixels by a kernel k(.),

$$h(i) = \sum_{x, y} k(x - x_{0}, y - y_{0})\, \delta\left[r(x, y) - i\right] \qquad (8)$$

where δ is the Kronecker delta function, and where r(x, y) is a discretized version of one of the invariant gradients {Ww, Cλw, Cλλw}, or edge detectors {Wx, Wy, Cλx, Cλy, Cλλx, Cλλy}. The histogram h(i) is constructed by taking all pixels with discretized value i, and adding their contribution, weighted by the kernel k(.), to the histogram bin i. The choice of kernel should be such that the contribution to the histogram for pixels far away from the origin (x0, y0) approaches zero. A suitable kernel choice is provided by the Gaussian kernel,

$$k(x, y; \sigma_{k}) = \frac{1}{2\pi\sigma_{k}^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma_{k}^{2}}\right) \qquad (9)$$

The parameter σk represents the size of the kernel, not to be mistaken for the scale σ of the Gaussian filters in the previous section. Hence, there is provided an "inner" scale at which point measurements are taken, which are accumulated over an "outer" scale into a local histogram.
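The kernel-weighted histogram of Eq. (8) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of the disclosure: the response image is assumed to be already discretized into integer bin indices, the Gaussian kernel is assumed to be normalized as a standard isotropic 2-D Gaussian, and the final normalization of the histogram to unit sum is an added convenience. The function names and the toy image are hypothetical.

```python
import math


def gaussian_kernel(dx, dy, sigma_k):
    """Isotropic 2-D Gaussian weight at offset (dx, dy) from the kernel centre."""
    norm = 2.0 * math.pi * sigma_k ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_k ** 2)) / norm


def local_histogram(response, x0, y0, sigma_k, n_bins):
    """Kernel-weighted histogram of discretized responses around (x0, y0).

    `response` is a 2-D list of integer bin indices r(x, y) in range(n_bins).
    Each pixel adds its kernel weight to bin r(x, y), as in Eq. (8).
    """
    hist = [0.0] * n_bins
    for y, row in enumerate(response):
        for x, r in enumerate(row):
            hist[r] += gaussian_kernel(x - x0, y - y0, sigma_k)
    total = sum(hist)
    # Normalizing to unit sum is an added convenience, not part of Eq. (8).
    return [h / total for h in hist] if total > 0 else hist


# Toy 5x5 response image: bin 1 in the centre column, bin 0 elsewhere.
image = [[1 if x == 2 else 0 for x in range(5)] for y in range(5)]
hist = local_histogram(image, x0=2, y0=2, sigma_k=1.0, n_bins=2)
```

In this toy image only one fifth of the pixels fall into bin 1, but because they lie under the centre of the kernel they receive roughly 40% of the histogram mass, which illustrates how the outer scale σk localizes the histogram.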

Besides spatial extent, a kernel may be introduced in the contrast direction. This boils down to the use of a kernel density estimator for the histogram of invariant edge responses. Next it will be shown that a known density function may be fitted through the histogram, effectively describing the data. In that case, the accuracy of histogram estimation is not of major concern.

From natural image statistics research, it is known that histograms of derivative filters can be well modelled by simple distributions. In previous work, we showed that histograms of Gaussian derivative filters in a large collection of images follow a Weibull type distribution. Furthermore, the gradient magnitude for invariants W and C given above follows a Weibull distribution,

$$f(r) = \frac{\gamma}{\beta} \left(\frac{r}{\beta}\right)^{\gamma - 1} \exp\left(-\left(\frac{r}{\beta}\right)^{\gamma}\right) \qquad (10)$$

where r represents the response for one of the invariants {Ww, Cλw, Cλλw}. The local histogram of invariants of derivative filters can be well modelled by an integrated Weibull type distribution

$$f(r) = \frac{\gamma}{2\gamma^{1/\gamma}\beta\,\Gamma(1/\gamma)} \exp\left(-\frac{1}{\gamma}\left|\frac{r - \mu}{\beta}\right|^{\gamma}\right) \qquad (11)$$

In this case, r represents the response for one of the invariants {Wx, Wy, Cλx, Cλy, Cλλx, Cλλy}. Furthermore, Γ(a) represents the complete Gamma function,

$$\Gamma(a) = \int_{0}^{\infty} t^{a - 1} e^{-t}\, dt$$

See Fig 3 for examples of the distribution. For the Weibull distribution, an expression for the MLE (Maximum Likelihood Estimation) of β and γ is given by

$$\beta = \left(\frac{1}{N}\sum_{i=1}^{N} r_{i}^{\gamma}\right)^{1/\gamma} \qquad (12)$$

$$\frac{\sum_{i=1}^{N} r_{i}^{\gamma} \ln r_{i}}{\sum_{i=1}^{N} r_{i}^{\gamma}} - \frac{1}{\gamma} - \frac{1}{N}\sum_{i=1}^{N} \ln r_{i} = 0 \qquad (13)$$

where N is the number of responses, assuming zero centered data. For the local invariant histograms, the histogram density was converted to Weibull form by first centering the data at the mode μ of the distribution, then multiplying each bin frequency p(ri) by its absolute response value |ri|, and normalizing the distribution. These transformations allow the estimation of the Weibull density parameters β and γ. Note that the density estimation is only marginally sensitive to histogram quantization effects. Too small a number of bins will yield poor estimates of the Weibull parameters. Too many bins will have no influence, the limit being one bin for each data point. In that case, the parameters may as well be estimated from the data directly. In general, this yields optimal estimates but at the cost of considerable computing time (typically seconds). As a rule of thumb, choosing the number of bins in the order of the one-dimensional effective extent of the kernel K will yield a good estimate of the parameters, at low computational cost (in the order of milliseconds).
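A maximum-likelihood estimate of the two Weibull parameters directly from response values can be sketched as follows. The sketch uses the standard maximum-likelihood conditions for a two-parameter Weibull density with scale β and shape γ, and solves the shape condition by bisection; the choice of bisection, the search interval, and the function name `weibull_mle` are assumptions made for illustration, not details taken from the disclosure. Synthetic data are drawn with Python's `random.weibullvariate`, whose first argument is the scale and whose second argument is the shape.

```python
import math
import random


def weibull_mle(samples, lo=0.05, hi=50.0, iters=80):
    """Return (beta, gamma) maximizing the Weibull likelihood of positive samples."""
    r = [s for s in samples if s > 0.0]
    n = len(r)
    r_max = max(r)
    logs = [math.log(v) for v in r]
    mean_log = sum(logs) / n

    def score(g):
        # Weights are rescaled by r_max to avoid overflow; the ratio is unaffected.
        w = [(v / r_max) ** g for v in r]
        weighted_log = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted_log - 1.0 / g - mean_log

    # score(g) increases monotonically in g, so bisection finds its single root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    gamma = 0.5 * (lo + hi)
    beta = (sum(v ** gamma for v in r) / n) ** (1.0 / gamma)
    return beta, gamma


rng = random.Random(0)
# Synthetic responses with scale (beta) 2.0 and shape (gamma) 1.5.
data = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
beta_hat, gamma_hat = weibull_mle(data)
```

With a few thousand samples the estimates land close to the generating parameters. Estimating from a histogram instead of raw samples would replace the sums over samples by sums over bin centres weighted by bin frequency.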

So far, Weibull parameters were estimated from either the gradient magnitude of invariant color edge filters, or directly from the derivative response of these filters. Regarding the former case, rotation invariance is trivially obtained by the rotational symmetry of the gradient operator and the rotational symmetry of the kernel K(.) Eq. (9). However, when assessing the response of derivative filters in the latter case, derivatives are taken in a particular direction. It was found previously that Weibull parameters are close to elliptical when plotted as function of orientation. This empirical finding may be explained by the steerable characteristic of the Gaussian derivative filter. If one takes a derivative filter in the x and y-direction, a derivative in any other direction may be achieved by the linear combination [2]

$$E_{\theta} = E_{x} \cos\theta + E_{y} \sin\theta \qquad (14)$$

where Ex and Ey represent the response to the x- and y-derivative filter, respectively, and where Eθ is the resulting response of a derivative filter in the θ-direction. Each of the Ex and Ey responses is characterized by an integral Weibull type probability density, although they may have different parameters.
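The steering relation of Eq. (14) is simple to evaluate per pixel. The short Python sketch below computes the directional response for the four orientations mentioned in the text; the function name and the sample responses are hypothetical.

```python
import math


def steer(e_x, e_y, theta_deg):
    """Response of a first-order derivative filter in direction theta, Eq. (14)."""
    theta = math.radians(theta_deg)
    return e_x * math.cos(theta) + e_y * math.sin(theta)


# Hypothetical x- and y-derivative responses at one pixel.
e_x, e_y = 0.3, -0.4
responses = {angle: steer(e_x, e_y, angle) for angle in (0, 45, 90, 135)}
```

Applying `steer` to every pixel for each angle gives the directional edge images from which per-orientation histograms, and hence per-orientation Weibull parameters, could be estimated.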

Hence, their weighted sum results in a probability density which is given by the linear convolution of the two densities. As a consequence, the Weibull parameters span ellipses when plotted as function of angle. The shortest and longest principal axis for β and γ, together with the orientation of the ellipse, indicate the directional structure in the underlying edge map. To achieve rotation invariance, one needs to estimate the longest and shortest principal axes of these ellipses, disregarding the ellipse's orientation. Many methods exist for elliptic fitting. As a simple example, one could estimate γ and β for 0°, 45°, 90°, and 135°, and use a least squares fitting to obtain the shortest and longest axes γs, γl, βs, and βl, which characterize the local histogram invariant to rotation of the original image.

For translational invariance, consider the two steps of the algorithm, being the edge detection and the local histogram formation. The edge detection boils down to a combination of convolution operators, which is inherently translation invariant. The local histogram formation is parameterized by the kernel K(x, y; σk) of Eq. (9), fixed at its location (x, y). Translation invariance here is obtained by a dense sampling over all image locations, where "dense" implies a sampling such that kernel centers are typically σk apart. Hence, translational invariance is ensured by the convolution operator, followed by densely sampling the local histograms. Regarding scale invariance, both the invariant edge detectors and the local histogram kernel have a scale parameter. Scale invariance is achieved by sampling over increasing scale. Scale selection methods can be applied to detect locally the optimal scale for describing image content [10].

Consider the parameters μ, β, and γ estimated from the local histograms of point-invariants. Recall that the point invariants W and C measure invariant edge strength. The local histogram of Ww represents the probability of finding an edge with contrast w within the local region described by a kernel K. Smooth transitions in the image, smooth compared to the size of the kernel K, will cause a shift in the edge histogram. Hence, its mode μ will be shifted from zero. Such shifts are typically caused by uneven illumination, large scale shading effects, and, most prominently in the chromatic C invariant, by colored illumination. Hence, to achieve color constancy, the value of μ may be ignored. The remaining parameters β and γ indicate the (local) edge contrast, and the (local) roughness or textureness, respectively [5].

To assess the difference between two local histograms, fitted by a Weibull distribution, a goodness-of-fit test is considered between the two respective distributions. A sensitive goodness-of-fit test is obtained when considering the integrated squared error between the cumulative distributions, which is obtained by the Cramér-von Mises statistic,

$$W^{2} = \int_{-\infty}^{\infty} \left[F^{*}(x) - F(x)\right]^{2} dF(x) \qquad (15)$$

Here, F* represents the test distribution, and F the target cumulative distribution function under consideration. For the Weibull distribution with different parameters γ1, β1, γ2, β2, a first order Taylor approximation boils down to the log difference between the parameters. Hence, assessing the ratio between the parameter values,

$$W_{\gamma}^{2} = 1 - \frac{\gamma_{-}}{\gamma_{+}}, \qquad W_{\beta}^{2} = 1 - \frac{\beta_{-}}{\beta_{+}} \qquad (16)$$

where γ+ and γ- are the largest and smallest of γ1 and γ2, respectively. Similarly, β+ and β- represent the largest and smallest values for β. In this way, the errors are normalized between zero and one. Due to independence between the γ and β parameters, the total error is obtained by

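[Equation (17), which combines the γ and β errors into the total error, is not legible in the source text.] (17)

Alternatively, one may use the Mallows distance to end up with an error measure between two Weibull distributions. In this case, one measures the integrated distance between the quantile functions of the two cumulative Weibull distributions F and F*,

$$\varepsilon = \int_{0}^{1} \left[F^{-1}(p) - F^{*-1}(p)\right]^{2} dp$$

which yields the exact result

$$\varepsilon = \beta_{1}^{2}\,\Gamma\!\left(\frac{\gamma_{1} + 2}{\gamma_{1}}\right) - 2\beta_{1}\beta_{2}\,\Gamma\!\left(1 + \frac{1}{\gamma_{1}} + \frac{1}{\gamma_{2}}\right) + \beta_{2}^{2}\,\Gamma\!\left(\frac{\gamma_{2} + 2}{\gamma_{2}}\right) + (\mu_{1} - \mu_{2})^{2} + 2\beta_{1}(\mu_{1} - \mu_{2})\,\Gamma\!\left(1 + \frac{1}{\gamma_{1}}\right) - 2\beta_{2}(\mu_{1} - \mu_{2})\,\Gamma\!\left(1 + \frac{1}{\gamma_{2}}\right)$$

where the last part, consisting of the terms containing μ, vanishes if μ is not considered to be important, that is, μ1 - μ2 = 0. Other distance measures may be used without departing from the scope of the invention.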
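The closed-form Mallows distance between two Weibull distributions can be checked numerically. The Python sketch below evaluates the closed form with `math.gamma` and compares it with a midpoint-rule integration of the squared difference between the two quantile functions. It assumes Weibull distributions with origin μ, width β and shape γ, whose quantile function is μ + β(-ln(1 - p))^(1/γ); the function names and parameter values are illustrative.

```python
import math


def weibull_quantile(p, mu, beta, gamma):
    """Quantile function of a Weibull distribution shifted to origin mu."""
    return mu + beta * (-math.log(1.0 - p)) ** (1.0 / gamma)


def mallows_closed_form(mu1, beta1, gamma1, mu2, beta2, gamma2):
    """Integrated squared difference of the two quantile functions."""
    g = math.gamma
    d_mu = mu1 - mu2
    return (
        beta1 ** 2 * g(1.0 + 2.0 / gamma1)
        - 2.0 * beta1 * beta2 * g(1.0 + 1.0 / gamma1 + 1.0 / gamma2)
        + beta2 ** 2 * g(1.0 + 2.0 / gamma2)
        + d_mu ** 2
        + 2.0 * d_mu * (beta1 * g(1.0 + 1.0 / gamma1) - beta2 * g(1.0 + 1.0 / gamma2))
    )


def mallows_numeric(params1, params2, n=200000):
    """Midpoint-rule approximation of the same integral over p in (0, 1)."""
    total = 0.0
    for i in range(n):
        p = (i + 0.5) / n
        diff = weibull_quantile(p, *params1) - weibull_quantile(p, *params2)
        total += diff * diff
    return total / n


a = (0.0, 1.0, 1.2)  # (mu, beta, gamma), illustrative values
b = (0.1, 1.5, 0.9)
exact = mallows_closed_form(*a, *b)
approx = mallows_numeric(a, b)
```

The two values agree to within the accuracy of the numerical integration, and the closed form returns zero when both parameter sets are identical.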

Note that, for non-overlapping histograms taken at various locations, the histograms may be assumed to be independent, and errors may be multiplied to yield a total error.

To assess the constancy of the proposed invariants under varying illumination conditions, the features have been applied to the ALOI collection. The collection consists of 1,000 objects recorded under various imaging circumstances. Specifically, viewing angle, illumination angle, and illumination color are systematically varied for each object, resulting in over a hundred images of each object. Color constancy, one of the hardest cases of illumination invariance, is tested by assessing the variation in the parameters of the Weibull fit as function of illumination color. Therefore, the i110, ..., i250 recordings of the 1,000 objects in the ALOI collection are considered, yielding black-body illumination in the range of 2175 up to 3075 Kelvin. Parameters were fitted for the central local histogram of the collection, at a scale σk = 30 pixels. The scale for the derivative filters was set at σ = 4 pixels. To compare the proposed features with point-based invariants, the experiments were repeated for Gaussian point-based features at scale σ = 4 and at scale σ = 30, respectively. In this way, insight is gained in the behaviour of the features on which the local histograms are based. Furthermore, comparison with point-based features having the same spatial extent as the local histograms can be made.

Discriminative power was tested by counting the number of patches which could not be distinguished within the 1,000 objects in the ALOI collection. Therefore, the local histogram at the centre of the l8c1 image was constructed, and values were compared against the same patch for the other objects. A patch was counted as indistinguishable if values were within the relative error at i = 110 in the previous graphs. If one only considers intensity W, 858 and 679 patches are indistinguishable for respectively σ = 4 and σ = 30. For β, 672 patches are not distinguishable out of 920. For γ, 742 patches are indistinguishable. If one combines γ and β, only 29 objects are indistinguishable. For chromatic variation only, that is Cλ and Cλλ, 884 and 816 patches are indistinguishable for respectively σ = 1 and σ = 7.5. For β, 636 patches are not distinguishable out of 920. For γ, 565 patches are indistinguishable. If one combines γ and β, only 91 objects are indistinguishable. Most discriminative power is obtained if one combines intensity and chromaticity by taking W, Cλ, and Cλλ into account. In that case, 629 and 170 patches are indistinguishable for the point-based invariants at respectively σ = 4 and σ = 30. For β, 75 patches are not distinguishable out of 920. For γ, 28 patches are indistinguishable. If one combines γ and β, all 920 objects can be recognized. Hence, the newly proposed features outperform point-based invariants with respect to discriminative power.

To illustrate the effectiveness of the proposed features in capturing object properties, a simple algorithm for object recognition and localization is suggested. An object is characterized by learning the invariant Weibull parameters at fixed locations in the training image, representing a sort of fixed "retina" of receptive fields as discussed with reference to Figs 4 and 5. The kernels are positioned at a hexagonal grid, 2σ apart, on three rings from the center of the image. Hence, a total of 1 + 6 + 12 + 18 = 37 histograms are constructed. For each histogram, the invariant Weibull parameters are estimated. The same retinal structure is swept over the target image, and values are compared (Eq. (16)) against the object under consideration (or a database). Hence, the example objects are searched within the composition. The proposed recognition algorithm runs at two frames per second, allowing close to real time recognition rates.
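The retina layout described above can be generated with a few lines of Python. The sketch places kernel centres on a hexagonal grid with a spacing of 2σ, using axial hexagon coordinates, and keeps the centre plus three rings. The function name `retina_centres` and the value σ = 30 are assumptions for illustration.

```python
import math


def retina_centres(sigma, rings=3):
    """Kernel centres on a hexagonal grid, 2*sigma apart, up to `rings` rings."""
    spacing = 2.0 * sigma
    centres = []
    for q in range(-rings, rings + 1):
        for r in range(-rings, rings + 1):
            # In axial hex coordinates the ring index is the hex distance to (0, 0).
            if max(abs(q), abs(r), abs(q + r)) <= rings:
                x = spacing * (q + 0.5 * r)
                y = spacing * (math.sqrt(3.0) / 2.0) * r
                centres.append((x, y))
    return centres


centres = retina_centres(sigma=30.0)
```

The list contains 1 + 6 + 12 + 18 = 37 centres, and neighbouring centres are exactly 2σ apart. Sweeping the retina over a target image amounts to adding the same offset to every centre before building the local histograms.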

It was found that the algorithm is not too sensitive to differences in background. Even a cluttered background yields correct recognition and, despite illumination conditions, the proposed features are well able to capture the intrinsic texture of the material. Furthermore, tests were performed where a composite image was captured by a high-quality CMOS 3RGB camera (Sigma SD9) at night, with electronic flash. The sensitivity curves of the CMOS RGB sensors are known to be highly non-linear. The example objects were captured by a digital video camera (JVC GR-DX25), with digital video compressed output, under daylight/direct sunlight (the bear). Without any calibration, the change in equipment appears not to influence results, demonstrating the robustness of the proposed features. The proposed features appear to be highly stable against compression artifacts.

Although the method hitherto has been explained with reference to a photometric reflectance model, other models can also be used for determining photometric invariants, for example, by deriving coloring coefficients from a transmitted light model (for instance, for the purposes of image analysis in light microscopy), a scattered, translucent or diffused light model (for example, in the analysis of images with diffused light such as translucent plastics), or a fluorescent light model (for instance, for purposes of cell classification methods in fluorescence microscopy/flow cytometry). Furthermore, while the method has been explained with reference to fitting to Weibull functions, generally, given the discretized probability density of the values occurring in the edge image, other functions can also be fitted through the histogram values in order to summarize the content of the histogram by the parameters of the simple function. Such functions should have a settable origin, a settable width, and optionally a settable peakedness. For example, consider a Gaussian function G(r; m, σ) = 1/(sqrt(2π) σ) exp(-(r - m)² / (2σ²)), which has an origin given by its mean "m" and a width given by its standard deviation "σ".

Alternatively, consider a double exponential e(r; m, σ) = 1/(2σ) exp(-|r - m| / σ), with origin given by m and width given by σ. Furthermore, consider the integral form of the Weibull distribution (Weibull, 1951), f(r; m, b, g) = g/(2 b g^(1/g) Γ(1/g)) exp(-(1/g) |(r - m)/b|^g), where Γ(t) is the complete Gamma function, and where m denotes the origin, b the width, and g the peakedness of the distribution. In the case of gradient magnitude, which is an absolute positive quantity obtained by combining orthogonal filter responses, grad_mag = sqrt(Filter_x^2 + Filter_y^2), the response distribution is best characterized by a simple distribution limited at 0 at one extreme, going to infinity at the other extreme, having a settable width and optionally a skewness parameter.

Consider for example the Rayleigh distribution f(x; b) = (2/b) (x/b) exp(-(x/b)^2), where b represents the width, or alternatively the Weibull distribution, f(x; b, g) = (g/b) (x/b)^(g - 1) exp(-(x/b)^g), where b indicates its width and g its skewness. Of course, many additional alternatives can be constructed. Furthermore, the class of "simple" peaked functions is well approximated by combinations of other simple functions, for example by a Mixture of Gaussians. The parameters of the simple function can be optimized such that the parameterized function optimally or suboptimally fits the histogram. Then, the values of the histogram are characterized by the chosen function and its parameters. Hence, the object representation is now coded by the few parameters of the simple function rather than by the original discretized values of the histogram. This results in a large data reduction, such that more objects can be stored in a small memory than with the original histogram representation. Consequently, more objects may be stored and faster search times are achieved. Additionally, invariant properties of the parameters can be characterized.

Furthermore, while the method and apparatus have been discussed in the context of object recognition, more generally, object or image characterization is well within reach of the scope of the invention. These aspects can indeed be used in an even broader context, for instance, in the context of rendering virtual reality images and/or compression technology or image classification in search engines. For instance, classifying statements can be provided for an image such as whether the image is indoor/outdoor, close/distant, portrait, close-up, macro photograph, landscape, rural, urban. In addition, a focus of interest can be provided for analyzing an image, to classify an item in said image and/or to classify the image to derive an image characterization. This latter aspect is determined more by the ensemble of items, rather than by individual items themselves. Thus, with scene categorization, background and surrounding items, that is, the context, are included in the image analysis.

According to an aspect of the invention, statistical information that is locally available in images, can also be used to categorize a scene. An example of such a categorization may be close-up, indoor, outdoor, panorama. At a higher level of semantics, one may aim at categorizing the sort of items in the image: anchorman, explosion, boats, rural, city view, traffic jam. Both categorizations have strong correlations with the statistical structure of the scene. For example, considering Fig. 4 and Fig. 5, these figures show a retinal structure of a connected form, that is swept through an image, for identifying relevant items in the image, which are, by nature, of a connected form too. However, providing a more distributed form of a retinal structure, wherein for example randomly selected elements of an image are taken into account, or rather a fixed distributed form that is not swept through the image, can be used for image characterization. Of course, such a distributed form can also be acquired by combining several sequential inputs from the retinal structure, when swept through the image. In this mode, there is no need to provide a solid object recognition on item level in an image, but the statistics of the ensemble of items can be used to characterize the image. Furthermore, by scaling the image, a selection can be provided relative to the retinal structure, of image elements to be analyzed.

This aspect can be used to provide semantic access to image collections. In particular, complex scenes can be decomposed into proto-concepts such as vegetation, water, fire and sky. A proto-concept can thus be described as an image characteristic that provides an image context to an image object.

Given a fixed vocabulary of proto-concepts, a similarity score can be assigned to all proto-concepts for all regions in an image. Different combinations of a similarity histogram of proto-concepts provide a sufficient characterization of a complex scene. By characterizing a similarity to a number of predetermined vocabulary elements an alternative is given to conventional codebook approaches. A codebook approach uses the single, best matching vocabulary element to represent an image patch. For example, given a blue area, the codebook approach must choose between water and sky, leaving no room for uncertainty.

According to an aspect of the invention, a distance of a scene to a number of predetermined vocabulary elements can be provided. Hence, the uncertainty of assigning an image patch to each vocabulary element is modelled. By using similarities to the whole vocabulary, scenes can be modelled that consist of items not in the codebook vocabulary.
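The difference between a codebook assignment and a similarity to the whole vocabulary may be sketched as follows. This is an illustrative sketch only. Each vocabulary element and each image patch is assumed to be summarized by a Weibull parameter pair (beta, gamma); the similarity used here, the product of the ratios of the smaller to the larger parameter, is a simple stand-in derived from the gamma and beta errors of claim 6 and is not the Cramer-von Mises statistic of the description. All names and parameter values are hypothetical.

```python
def ratio_similarity(p, q):
    """Similarity in (0, 1] between two (beta, gamma) parameter pairs.

    Stand-in measure: (1 - beta error) * (1 - gamma error), with each
    error defined as 1 - smaller/larger.
    """
    (b1, g1), (b2, g2) = p, q
    return (min(b1, b2) / max(b1, b2)) * (min(g1, g2) / max(g1, g2))


# Hypothetical vocabulary of proto-concepts with illustrative parameters.
VOCABULARY = {
    "water": (0.8, 1.1),
    "sky": (0.7, 1.3),
    "vegetation": (2.5, 0.7),
}


def codebook_assign(patch):
    """Codebook approach: keep only the single best matching element."""
    return max(VOCABULARY, key=lambda name: ratio_similarity(patch, VOCABULARY[name]))


def soft_assign(patch):
    """Similarity of the patch to every vocabulary element."""
    return {name: ratio_similarity(patch, params) for name, params in VOCABULARY.items()}


blue_patch = (0.75, 1.2)  # hypothetical patch lying between water and sky
print(codebook_assign(blue_patch))
print(soft_assign(blue_patch))
```

For the hypothetical blue patch, the codebook approach is forced to output a single label, whereas the similarity scores retain the information that water and sky match almost equally well.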

Discussing the scene-characterization in more detail, a proto-concept occurrence histogram can be provided to characterize both global and local texture information. Global information can then be described by computing an occurrence histogram accumulated over all regions in the image. Local information can be taken into account by constructing another occurrence histogram for only the response of the best matching region. For each proto-concept, or bin b, the accumulated occurrence histogram and the best occurrence histogram are constructed by

H_{accu}(b) = \sum_{r \in R(im)} \sum_{a \in A(b)} C^2(a, r),

H_{best}(b) =

where R(im) denotes the set of regions in image im, A(b) represents the set of stored annotations for proto-concept b, and C² is the Cramer-von Mises statistic as introduced in equation (15).

The quantity Haccu, denoting a proto-concept occurrence histogram for proto-concepts b, thus counts the relative amount of proto-concepts present in a scene, hence how much of a proto-concept is present in a scene. This feature is important for characterizing, for example, airplanes and boats. In these cases, the accumulated histogram indicates the presence of a large water body or a large area of sky. The quantity Hbest indicates the presence of proto-concepts, hence indicates which proto-concepts provide a best match with items present in a scene.
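The construction of the accumulated and best occurrence histograms may be sketched as follows, by way of illustration only. The sketch assumes a generic score function in which a higher value means a closer match between a stored annotation and a region, and it takes the "best matching region" to be the region with the largest summed score; both are assumptions of the sketch and are not stated in this form in the description. The toy descriptors and the function toy_score are hypothetical and merely stand in for region statistics and for the Cramer-von Mises statistic.

```python
def occurrence_histograms(regions, annotations, score):
    """Return (h_accu, h_best), each keyed by proto-concept.

    regions: descriptors of the regions of one image.
    annotations: dict mapping a proto-concept to its stored descriptors.
    score: function(annotation, region) -> float, higher is a closer
        match (assumption of this sketch).
    """
    h_accu, h_best = {}, {}
    for concept, stored in annotations.items():
        # Summed response of every region to the stored annotations.
        per_region = [sum(score(a, r) for a in stored) for r in regions]
        h_accu[concept] = sum(per_region)  # accumulated over all regions
        h_best[concept] = max(per_region)  # best matching region only
    return h_accu, h_best


def toy_score(annotation, region):
    """Hypothetical similarity between two scalar descriptors."""
    return 1.0 / (1.0 + abs(annotation - region))


regions = [0.1, 0.2, 0.9]                             # hypothetical regions
annotations = {"sky": [0.1, 0.15], "water": [0.9]}    # hypothetical annotations
h_accu, h_best = occurrence_histograms(regions, annotations, toy_score)
print(h_accu)
print(h_best)
```

In this sketch the accumulated histogram grows with the number of regions resembling a proto-concept, whereas the best histogram depends only on the single closest region.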

In Fig. 7 an exemplary score chart is illustrated for categorization of images according to an aspect of the inventive image characterization method. Scores of correct classification of the first 100 results for each category are given, retrieved from a large collection of more than 50,000 photos, whereas categories were learned from a separate collection of 10,000 photos. It shows that the method is well able to categorize images of Sunsets, Flowers, Aviation, Fireworks, Forest, Mountain, Boats, and Architecture. Good results are obtained for the other categories.

While the invention has been described with reference to the drawings, it is not limited thereto and these embodiments are for illustrative purposes only, where variations and modifications to the basic inventive concept are possible without departing from the scope of the claims, as defined hereinafter.

Claims

1. Method of characterizing an image comprising:

- defining a plurality of image areas in the image;

- analyzing color and/or intensity transitions within the image area of a predefined color basis;

- creating a density profile of said transitions in said image area; and

- fitting said density profile to a predefined parametrization function.

2. Method according to claim 1, wherein said parametrization conforms to a natural image statistics characterization.

3. Method according to claim 1, wherein said parametrization conforms to a Weibull distribution or integrated Weibull distribution.

4. Method according to any of claims 1-3, further comprising defining an error matching parameter according to a Cramer von Mises statistic, or according to a Mallows distance.

5. Method according to claim 4, wherein a gamma error and a beta error for gamma and beta are derived from a test density profile relative to a targeted Weibull distribution p'(r), defining gamma and beta as

6. Method according to claim 5, wherein said gamma error ε(γ) is given by: ε(γ) = 1 - γ-/γ+; wherein γ- is the smaller and γ+ is the larger of the γ parameters of the test and target distributions; and wherein said β error ε(β) is given by: ε(β) = 1 - β-/β+; wherein β- is the smaller and β+ is the larger of the β parameters of the test and target distributions.

7. Method according to claim 5 and 6, further comprising a step of analysing color transitions in at least two different directions; comparing the density distributions in said at least two directions; and deriving a rotation invariant feature from said at least two density distributions.

8. Method according to claim 7, wherein said rotation invariant feature is provided from the group of long and a short axes of an ellipse describing a Weibull parameter curve as a function of rotation.

9. Method according to claim 1, further comprising providing a mask area comprising a predefined number of image areas having a predetermined position relative to each other, wherein a total error matching parameter of the mask area is defined as the product of error matching parameters of said image areas.

10. Method according to claim 9, wherein the mask area is of a connected form for providing object recognition purposes.

11. Method according to claim 9, wherein the mask area is of a distributed form for providing image classification purposes.

12. Method according to claim 1, wherein said image area is defined by a Gaussian kernel.

13. Method according to any of the preceding claims, wherein said method further comprises providing a database of a set of predetermined density profiles characteristics; and matching a measured density profile or characteristics thereof of said image to said predetermined density profile characteristics for recognizing said image.

14. Method according to claim 13, wherein said set of predetermined density profiles characteristics is associated to a collection of predetermined items to be matched for object recognition purposes.

15. Method according to claim 13, wherein said set of predetermined density profiles characteristics is associated to a collection of image characteristics that provide an image context for image classification purposes.

16. Method according to claim 1, wherein said color basis is an invariant color basis providing point coloring characteristics of said image which are invariant to lighting and imaging conditions.

17. Method according to claim 16, wherein said invariant color basis is formed by coloring coefficients derived from a photometric reflectance model of imaging light reflected by items in said image.

18. Method according to claim 16, wherein said invariant color basis is formed by coloring coefficients derived from a transmitted light model, a scattered or transluded or diffused light model, or a fluorescent light model.

19. Method according to claim 17, wherein said color invariant coefficients are expressed by spatial derivatives W of a normalized intensity value; and spatial derivatives C of a chromatic variation value of opponent color images transforming red, green and blue in intensity, yellow/blue contrast and red/green contrast.

20. Method according to claim 1, wherein a density profile is created from an image area by spatially integrating said point coloring characteristics by a weighted contribution in a predefined image area.

21. Apparatus for characterizing an image comprising:

- an input for receiving a digitized graphical image;

- a circuit arranged for defining a plurality of image areas in the image;

- analyzing color and/or intensity transitions within the image area of a predefined color basis;

- creating a density profile of said transitions in said image area; and

- fitting said density profile to a predefined parametrization function; and

- an output for providing the matching parameters of said density profile.

22. Apparatus according to claim 21, wherein the apparatus is further communicatively coupled to a database comprising predetermined density profiles characteristics; and matching circuitry is provided for matching a measured density profile or characteristics thereof of said images to said predetermined density profile characteristics for recognizing said image.

23. Apparatus according to claim 21, wherein said set of predetermined density profiles characteristics is associated to a collection of predetermined items to be matched for object recognition purposes.

24. Apparatus according to claim 21, wherein said set of predetermined density profiles characteristics is associated to a collection of image characteristics that provide an image context for image classification purposes.