WO2006090313A2

WO2006090313A2 - Object recognition using adrc (adaptive dynamic range coding)

Info

Publication number: WO2006090313A2
Application number: PCT/IB2006/050518
Authority: WO
Inventors: Ahmet Ekin
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2005-02-24
Filing date: 2006-02-17
Publication date: 2006-08-31
Also published as: WO2006090313A3

Abstract

The invention relates to type identification of one or more digital objects by means of texture-based features using an Adaptive Dynamic Range Coding (ADRC) process. A method for identifying a type of one or more digital objects in a digital data representation, such as in image data, an object detector, a system for object detection, as well as applications of the method are disclosed. A type may be identified by transforming at least a part of the digital data representation from a first form, such as an image form to a second form by means of an Adaptive Dynamic Range Coding (ADRC) process. The second form being a histogram representation of a probability density function computed from texture feature statistics. The type identification is obtained from a probabilistic-based comparison between the digital data in the second form and type data in a type repository.

Description

Type identification of digital object(s)

The invention relates to identification of a type of one or more digital objects. In particular the invention relates to type identification by means of texture-based features.

Object detection in digital image data is of interest for a number of applications, such as surveillance systems, face recognition systems, video-based computer/machine interfaces, etc. It has proven a difficult task to provide a machine, e.g. a computer-based machine, with the ability to recognize a type of an object in an image with a high certainty, i.e. to assign an abstract type to a concrete ensemble of image elements. Methods of identifying a type of a digital object include differentiating one object from another by means of texture features. In texture-based object detection, however, extraction of texture features, such as Gabor filter coefficients, is usually computationally heavy. Because of that, easily computable color-based features may be used instead of texture features to detect objects. The problem with using color-based features is the variability of object color features, e.g. due to changes in illumination, image capturing parameters, and the employed color spaces.

The publication "Face Detection in Still Gray Images", B. Heisele, T. Poggio and M. Pontil, A.I. Memo 1687, Massachusetts Institute of Technology, 2000, describes a trainable system for detecting frontal and near-frontal views of faces in still grey images by use of Support Vector Machines (SVMs). Object analysis based on SVMs is, however, computationally heavy.

The inventor of the present invention has appreciated that an improved method for type identification of a digital object is of benefit, and has in consequence devised the present invention.

The present invention seeks to provide improved means for identifying one or more objects in a digital data representation. Preferably, the invention alleviates or mitigates one or more of the above or other disadvantages singly or in any combination.

Accordingly there is provided, in a first aspect, a method for identifying a type of one or more digital objects in a digital data representation, the method comprising the steps of: transforming at least a part of the digital data representation from a first form to a second form, and identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form and type data in a type repository, wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

A digital data representation may be an image, such as a 2D image made up of pixels, it may be a 3D image made up of voxels, it may be a stream of images, such as a video stream, etc. The format of the image may be any type of format, such as standard image and video stream formats. An object in the image may be any kind of image object, such as a graphical object defined by a selection of image elements.

The digital data representation may initially be in a first form, and at least a part of the digital data representation may be transformed into a second form. The second form may be obtained by means of running one or more algorithms, one or more mathematical transformations, etc. on the data to obtain the digital data in the second form. The data may in the transformation process be present in one or more intermediate forms, e.g. in connection with running a number of algorithms. The dimensionality of the data may be altered in the transformation, e.g. 2D and/or 3D image data may be transformed into 1D data. The transformation to the second form may include statistical data analysis.

A repository of type data may be consulted, and a type of the object may be identified based on a probabilistic comparison between the data in the second form and the type data in the repository. The probabilistic comparison may include a likelihood analysis assessing a statistical likelihood between the data in the second form and the type data to determine whether or not an object of a specific type is represented in the type repository. The type repository may comprise one or more data sets, each data set corresponding to data of a specific type or data specifically not corresponding to a given type.

The transformation of the data from the first form to the second form may be done by means of an Adaptive Dynamic Range Coding (ADRC) process. An ADRC process is a method to efficiently extract texture characteristics of image data. See e.g. US 5,241,381 and US 5,825,313. A range of advantages may be attributed to identifying a type by means of an ADRC process. ADRC processing offers a means for fast extraction of texture features in a digital image. Texture-based features bring about large descriptive power for object detection; extracting them, however, is typically costly. The present invention circumvents this problem by describing objects by easily computable ADRC features. Furthermore, ADRC features are in widespread use for spatial image up-scaling and temporal video up-conversion. ADRC features may therefore, in a number of applications, such as TV-sets, DVD-players etc., already be computed for the spatial and/or temporal up-sampling applications, and the current invention may immediately benefit from this, since available temporal and spatial up-sampling architectures can be provided with detection capability after only minor extension. And even if such ADRC features are not already provided for other purposes, the features may be extracted at low cost.

The features as defined in claim 2 have the advantage that a histogram representation facilitates fast and easy extraction of texture feature statistics, as compared to standard texture feature analytical tools, such as texture features based on Gabor filter responses.

The features as defined in claim 3 have the advantage that a pattern of characteristics, such as texture features of the image elements located near the image element of interest, may be chosen based on a type of the object. This is advantageous since texture features may differ between object types, and a pattern may be chosen which is known to provide a high certainty in type identification. Alternatively, in a situation where a type may not be identified with a sufficiently high certainty, different patterns may be tried to improve the certainty. Another advantage may be that the complexity of the pattern, e.g. the size of the pattern and/or the inclusion of different weights for different pixel elements in the pattern, may be altered in accordance with the type of digital data representation, resolution of the digital data representation, computational power, accepted calculation time, etc.

The feature as defined in claim 4 has the advantage that a mathematical formalism describing probability density functions (pdfs) is available. Probability density function formalism is a generic object detection framework and applicable to any type of object, ensuring robust and trustworthy analysis. Furthermore, the construction of a histogram-based non-parametric pdf is faster than the construction of parametric pdfs, such as the ones based on Gaussian mixture models (GMMs).

The feature as defined in claim 5 has the advantage that by comparison to both type data and data different from the type, e.g. non-type data, a more efficient type identification may be provided.

The feature as defined in claim 6 has the advantage that type data may be generated at a specified identification resolution, or at a number of predefined identification resolutions. This is advantageous since the type data may be present in a ready-to-use form which may be easily accessed, e.g. in look-up tables.

According to a second aspect is provided an object detector for identifying a type of one or more digital objects in a digital data representation, the object detector comprising: a transformer for transforming at least a part of the digital data representation from a first form to a second form, and an analyzer for identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form and type data in a type repository, wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

The method according to the first aspect of the invention may be implemented in a device, such as a stand-alone device or a module suitable for implementation in a device for providing object identification capability to the device. The implementation may be provided by means of software implementation or hardware implementation, e.g. in an implementation comprising one or more ICs, or any other suitable way of implementation.

It is an advantage to provide an object detector since such a device may be part of, or may easily be made part of, a device where object detection is desirable.

According to a third aspect of the invention is provided an integrated circuit (IC) for identification of a type of one or more digital objects in a digital data representation, the IC being adapted to identify a digital object according to the first aspect of the invention.

The IC may be a single chip, a group of chips or an electronic circuit comprising a variety of electronic components. In particular, the IC may be incorporated as a part of the object detector according to the second aspect of the invention.

According to a fourth aspect of the invention is provided a computer readable code for identification of a type of one or more digital objects in a digital data representation, the code being adapted to implement the method according to the first aspect of the invention. The computer readable code may be implemented in an object detector according to the second aspect of the invention and/or in an IC according to the third aspect of the invention.

According to a fifth aspect of the invention is provided a system for identification of a type of one or more digital objects in a digital data representation, the system comprising: an input module for inputting at least a part of the digital data representation, a transforming module for transforming the digital data representation accessed from the input module from a first form to a second form, a repository for storing type data, an identification module for identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form and type data in the type repository, and an output module for outputting a type of the identified one or more digital objects, wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

The input module may be a software application or a hardware section, e.g. an interface means for interfacing one or more signals, such as data streams, to the transforming module comprising a processing means. However, in general the input module may be any type of means provided for feeding or providing one or more signals or data to the transforming module. The input signal may be an output signal from a given unit, e.g. an input signal may be a signal provided by a processing means storing digital data for visual or other purposes. The transforming module and the identification module may comprise separate or shared processing means. The processing means may be any type of processing means, either dedicated processing means or processing means that are part of a general purpose computer, such as a computer program. The output module may be a storage means enabling access to the result or the output module, e.g. as an intermediate step in connection with showing the result graphically. The system may be a system for implementing the method according to the first aspect of the invention and/or a system including the object detector according to the second aspect of the invention. Furthermore, the system may implement IC means according to the third aspect of the invention, as well as computer readable code according to the fourth aspect of the invention. In general the various aspects of the invention may be combined and coupled in any way possible within the scope of the invention.

These and other aspects, features and/or advantages of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which:

Fig. 1 shows a general scheme illustrating an embodiment of the present invention,

Fig. 2 shows an example of a digital data representation in the first form,

Fig. 3 shows a schematic illustration of a selection of image elements located near an image element of interest, and

Fig. 4 shows plots of histograms computed from a collection of face and non-face images.

A general scheme illustrating an embodiment of the present invention for identifying a type of one or more digital objects in a digital data representation is presented in Fig. 1. A specific type of an object refers to, for example, whether an object is a car, a face, a text region, etc. A digital data representation is in connection with the description of Figs. 1 to 4 exemplified by a 2D image. It is however to be understood that a digital data representation is not limited to a 2D image.

In Fig. 1 a digital object is provided as a digital data representation in a first form in a first step 10. The digital data representation of the object is in another step 11 transformed into a second form. In a further step 12 the digital data representation in the second form is compared to type data in a type repository 13, resulting in a type identification 14 of the object present in the digital data representation in the first form.

The digital data representation in the first form may be an image represented in a bitmap type representation, such as a standard bitmap format, e.g. a jpeg, gif, bmp, etc. format. The image in the first form may be transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process, where ADRC features are extracted from the image in the first form, and used to generate the second form. The description of an object by ADRC features according to an embodiment of the present invention is described first.

An example of an image 20 in the first form is provided in Fig. 2. A part 21 of the image, i.e. a selection of image elements (pixels 22), may be analyzed in a pixel-by-pixel manner, where a value (ADRC value) may be assigned to each pixel in the selection of image elements. The image shown in Fig. 2 shows only one object 23, being a face; however, it is to be understood that more than one object may be present in an image. The present invention may be applied to more objects, e.g. by selecting one object to be identified or by sequential (or parallel) identification of more than one object in an image. For each pixel in the selection of image elements, ADRC features are computed for a plurality of image elements, typically a local window with a given aperture size (aperture sizes of 3x3, 5x5 and 7x7 are common), which is shifted pixel by pixel to span the whole image.

A schematic illustration of an area 30 containing 5x5 pixels is shown in Fig. 3, the local area being a part of the selection of image elements. The ADRC value of a pixel (here illustrated for the pixel designated 0) is computed for the given window (aperture) size 31. The ADRC value is based on the computation of the average (I_avg) of the pixel intensity values in the window and the assignment of the pixels in the same window to a level based on their differences from that average value, referred to as L-levels. In the embodiment illustrated here, an aperture size 31 of 3x3 is used, and the case where L=2 is explained; it is however to be understood that L may be larger, resulting in a larger number of possible patterns. In the L=2 case, if a pixel intensity value in the window is greater than I_avg, the pixel is assigned to one, and otherwise zero. These settings result in 511 (2^(3x3) - 1) possible patterns for a window (all zero or all one cases, depending on where the equality is assigned, if the all one case is impossible to realize due to the definition of ADRC; two is subtracted). The resulting patterns of the local windows are classified into a set of classes. The patterns are thus a pattern of texture characteristics (here whether the individual pixels are darker or lighter than the average) of the image elements located near the image element of interest, i.e. the center pixel 0. The local window is in the present embodiment a square including nearest and next nearest neighbors; however, other shapes may be employed, such as shapes with an envelope shape of a rectangle, a triangle, a polygon, a circle, an ellipse, etc.
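The L=2 assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `adrc_code`, the row-by-row pixel ordering and the choice of the first pixel as most significant bit are assumptions made for the example.

```python
def adrc_code(window):
    """Return the 1-bit (L=2) ADRC class of a flat list of pixel intensities.

    A pixel maps to 1 if its intensity is strictly greater than the window
    average, otherwise to 0. The bits are concatenated in list order, first
    pixel as most significant bit (an assumed ordering).
    """
    total = sum(window)
    n = len(window)
    code = 0
    for value in window:
        # value > total / n, written with integers to avoid rounding issues
        bit = 1 if value * n > total else 0
        code = (code << 1) | bit
    return code


# Hypothetical 3x3 window, listed row by row (a bright cross on a dark ground).
window = [10, 200, 10,
          200, 200, 200,
          10, 200, 10]

print(format(adrc_code(window), "09b"))  # prints 010111010
```

With a strict "greater than" comparison, a uniform window yields the all-zero class, and the all-one class cannot occur because not every pixel can exceed the average.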

Thus for the window designated 31 in Fig. 3A, the average intensity is calculated and the intensities of the pixels 0-8 are evaluated according to whether they possess a greater or a smaller intensity than the average value. This is exemplified in Fig. 3B where pixels 1, 7 and 8 are found (by way of example) to be darker whereas the pixels 0, 2-6 are found to be brighter. Based on all the pixel assignments in the window, a class of the pattern is computed and assigned to the center pixel of the window, i.e. to pixel 0. The class may e.g. be a corresponding binary number, such as the class of the window illustrated in Fig. 3B may be the binary number: 011111100 that is assigned to pixel 0. In this way, each pixel can be assigned a value in the range [0, L^(MxM) - 1], where MxM is the aperture size. These values are binned in a histogram, where each bin counts the number of times a given pattern appears for each pixel in the image or in a subsection of the image. The ADRC values are used to obtain the digital data representation in the second form. In a given embodiment the histogram of the ADRC values, provided as explained above, may be taken as the data representation in the second form.
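The binning of ADRC values into a histogram may be sketched as follows. The function name `adrc_histogram`, the list-of-rows image layout and the decision to skip border pixels whose window would leave the image are assumptions made for this example; the text does not specify border handling.

```python
def adrc_histogram(image, aperture=3):
    """Count 1-bit ADRC classes over all full windows of a 2D image.

    `image` is a list of equal-length rows of intensities. Border pixels
    whose window would extend past the image are skipped (an assumption).
    """
    half = aperture // 2
    n = aperture * aperture
    hist = [0] * (2 ** n)  # one bin per possible bit pattern
    for r in range(half, len(image) - half):
        for c in range(half, len(image[0]) - half):
            window = [image[r + dr][c + dc]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)]
            total = sum(window)
            code = 0
            for value in window:
                # 1 if the pixel is strictly brighter than the window average
                code = (code << 1) | (1 if value * n > total else 0)
            hist[code] += 1
    return hist


# Hypothetical 5x5 image with a vertical dark-to-bright edge.
edge_image = [[0, 0, 255, 255, 255] for _ in range(5)]
edge_hist = adrc_histogram(edge_image)
print({code: count for code, count in enumerate(edge_hist) if count})
```

Only three bins are populated for this image, which reflects that an edge produces a small set of characteristic patterns.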

The ADRC values can be described in terms of object probability density functions (pdfs). A pdf can be described either in parametric form or in non-parametric form. Parametric pdfs are powerful if the assumed distribution is true. However, if not, estimated pdf will be very different from the true one. In contrast to this drawback of parametric density estimation methods, non-parametric descriptors do not depend on such assumptions. As a result, the use of a non-parametric descriptor (a histogram) may be preferable to represent the pdf of an object.

In the following, identification of a type of an object is exemplified in terms of face recognition, i.e. the type of the object is taken to be a face. In order to obtain the pdf of a face class, i.e. in order to obtain a type repository (i.e. a face repository), ADRC statistics are collected for a number of face images to compute a face histogram (H_face). Please note that all the histograms are normalized, so that the sum of their bins is equal to one. A face histogram 40 is illustrated in Fig. 4A. It may be more efficient to also model non-faces, by collecting images that do not contain faces and building a non-face histogram (H_non-face) from those images. A non-face histogram 41 is illustrated in Fig. 4B. By using both face and non-face data, or generally first and second data referring to object and non-object data, the type identification may be based on comparison to first and second data.
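Collecting ADRC statistics over a set of training images and normalizing the result can be sketched as below. The function name `build_type_histogram` is hypothetical, the ADRC codes are assumed to be computed beforehand (one list of codes per training image), and the code values used here are invented purely for illustration.

```python
def build_type_histogram(code_lists, n_bins=512):
    """Accumulate ADRC codes from several images into one normalized histogram.

    `code_lists` holds one list of ADRC codes per training image. The result
    sums to one, so it can be read as a non-parametric pdf estimate.
    """
    counts = [0] * n_bins
    for codes in code_lists:
        for code in codes:
            counts[code] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no ADRC codes supplied")
    return [count / total for count in counts]


# Invented ADRC codes standing in for two face and two non-face images.
face_codes = [[0, 73, 73, 219], [73, 219, 219, 0]]
non_face_codes = [[0, 0, 0, 16], [0, 16, 0, 0]]

h_face = build_type_histogram(face_codes)
h_non_face = build_type_histogram(non_face_codes)
print(h_face[73], h_non_face[0])
```

Because the histograms are normalized, type data built from different numbers of training images remain directly comparable.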

When a new image is presented to detect a particular type of object, the ADRC statistics are collected to compute the histogram of the unknown object, H_unknown. In an embodiment of the present invention the statistics may be computed from a block of pixels with a given identification resolution (e.g. a fixed block size of 24x24). Larger objects can be detected by down-sampling the image so that the object size gets close to the predefined block size, and, vice versa, smaller objects can be up-sampled. The size-independence of the object detector can be guaranteed by running the detector at multiple scales. In order to verify that the histogram computed from the image corresponds to a face (or a known object), a probabilistic-based comparison may be made, in this embodiment by computing the similarity of H_unknown to H_face and H_non-face. The similarities to the face histogram (S_face) and the non-face histogram (S_non-face) are found by employing a popular histogram similarity metric. Without limiting the scope of the invention, an embodiment utilizing histogram intersection as defined below for histograms H₁ and H₂ is explained.

\[ S(H_1, H_2) = \frac{\sum_{i} \min\bigl(H_1(i), H_2(i)\bigr)}{\sum_{i} H_2(i)} \tag{1} \]
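Histogram intersection, the similarity metric named in the text, may be sketched as follows. The sketch assumes both histograms are normalized to sum to one, as stated for the type histograms, so that any normalizing denominator formed from a bin sum equals one and the similarity is the sum of bin-wise minima. The function name and sample values are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms as the sum of bin-wise minima.

    Returns a value in [0, 1]: 1 for identical histograms, 0 when no bin is
    populated in both.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 4-bin normalized histograms.
h_a = [0.5, 0.25, 0.25, 0.0]
h_b = [0.25, 0.25, 0.0, 0.5]

print(histogram_intersection(h_a, h_b))  # prints 0.5
print(histogram_intersection(h_a, h_a))  # prints 1.0
```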

The detection decision may in a given embodiment be based on the similarity values and several thresholds. A type, i.e. a face, may be set as detected if the following conditions are satisfied:

\[ S_{\text{face}} > Thr_1 \quad \text{and} \quad S_{\text{face}} - S_{\text{non-face}} > Thr_2 \tag{2} \]

where we set Thr_1 = 0.75 and Thr_2 = 0.1.

Other conditions may be employed, a routine may be present for adjusting the detection condition, and other thresholds may likewise be used. The values of 0.75 and 0.1 are mentioned only for illustrative purposes, and should not be taken as a limitation. Any suitable threshold value may be assumed, and the threshold values may depend upon the type of object to be identified, the nature of the digital data representation, etc. A routine may be present for adjusting the threshold values in a situation of use.
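The two-threshold detection condition can be written as a small predicate. This is a sketch only: the function and parameter names are hypothetical, and the default values are the illustrative thresholds of 0.75 and 0.1 mentioned in the text, which are expected to be tuned per application.

```python
def is_type_detected(s_type, s_non_type, thr_1=0.75, thr_2=0.1):
    """Apply the two-threshold detection condition.

    The type is reported only if the similarity to the type histogram is
    high in absolute terms (thr_1) and exceeds the similarity to the
    non-type histogram by a margin (thr_2).
    """
    return s_type > thr_1 and (s_type - s_non_type) > thr_2


# Hypothetical similarity values.
print(is_type_detected(0.8, 0.6))   # similar to face, clear margin
print(is_type_detected(0.8, 0.75))  # similar to face, margin too small
print(is_type_detected(0.7, 0.1))   # large margin, absolute similarity too low
```

The second condition guards against blocks that resemble both the type and the non-type data, which the absolute threshold alone would accept.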

Experiments with face detection have been conducted. Figs. 4A and 4B show the plots of histograms H_face and H_non-face that are computed from a collection of face and non-face images, respectively. By using 100 face/non-face images, including difficult sets of face images with different orientations, eye-glasses, and faces of people across different age groups, a correct type identification rate of 85% was achieved, with 5% false identifications; in 10% of the cases, a face/non-face type identification could not be made.

These results are mentioned only for illustrative purposes and should not be taken as a limitation of the present invention. The attainable correct type identification rate may in a situation of use depend upon a number of factors, such as the type of object to be identified, the nature of the digital data representation, the nature and quality of the type data, etc. In another embodiment certain bins, i.e. certain patterns, may be weighted. For example, a certain type of object may be very sensitive to the occurrence of certain patterns, and large weights may be given to these bins:

Object data and non-object data may be provided in a variety of ways. An application may be supplied with given data sets, i.e. given object histograms and non-object histograms. However, histograms may also be provided in a training process. The weights assigned to bins, as explained above, may likewise be learned in a training process.

Although the present invention has been described in connection with preferred embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims.

In this section, certain specific details of the disclosed embodiment such as number of process steps, algorithm details, object type, etc., are set forth for purposes of explanation rather than limitation, so as to provide a clear and thorough understanding of the present invention. However, it should be understood readily by those skilled in this art, that the present invention may be practiced in other embodiments which do not conform exactly to the details set forth herein, without departing significantly from the spirit and scope of this disclosure. Further, in this context, and for the purposes of brevity and clarity, detailed descriptions of well-known apparatus, circuits and methodology have been omitted so as to avoid unnecessary detail and possible confusion. Reference signs are included in the claims, however the inclusion of the reference signs is only for clarity reasons and should not be construed as limiting the scope of the claims.

Claims

CLAIMS:

1. Method for identifying a type (14) of one or more digital objects (10,23) in a digital data representation (20), the method comprising the steps of: transforming at least a part (21) of the digital data representation (20) from a first form to a second form (40,41), and identifying the type (14) of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form (11) and type data in a type repository (13), wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

2. Method according to claim 1, wherein the first form is an image form (20) and wherein the transformation to the second form, for at least a selection of image elements in the image form, includes the step of assigning a value to an image element of interest (0) in at least the selection of image elements (31), the value being assigned based on a plurality of image elements located near the image element of interest, and wherein the second form is a histogram representation (40,41) of the assigned image element values.

3. Method according to claim 2, wherein the assigned value of the image element of interest is assigned in accordance with a pattern of the plurality of image elements located near the image element of interest, the pattern being a pattern of characteristics of the image elements located near the image element of interest.

4. Method according to claim 1, wherein the second form is a representation of texture features of the one or more digital objects, and wherein the representation is a probability density function (pdf) computed from texture feature statistics.

5. Method according to claim 1, wherein the type repository (13) comprises first data (40) describing the type of the selected digital object and second data (41) describing a type different from the type of the selected digital object, and wherein the probabilistic-based comparison includes comparison to both the first and the second data.

6. Method according to claim 1, wherein the identification of the type of one or more objects in a digital data representation is performed using a given identification resolution of the digital data representation, so that if a resolution of the digital data representation is smaller than the identification resolution, the digital data representation is up-sampled to fit the identification resolution, and if the resolution of the digital data representation is larger than the identification resolution, the digital data representation is down-sampled to fit the identification resolution.

7. Object detector for identifying a type (14) of one or more digital objects (10,23) in a digital data representation (20), the object detector comprising: a transformer for transforming at least a part (21) of the digital data representation (20) from a first form to a second form (40,41), and an analyzer (12) for identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form (11) and type data in a type repository (13), wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

8. Integrated circuit (IC) for identification of a type of one or more digital objects in a digital data representation, the IC being adapted to identify a digital object according to the method of claim 1.

9. Computer readable code for identification of a type of one or more digital objects in a digital data representation, the code being adapted to conduct the steps: transforming at least a part of the digital data representation from a first form to a second form, and identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form and type data in a type repository, wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

10. System for identification of a type of one or more digital objects in a digital data representation, the system comprising: an input module for inputting at least a part of the digital data representation, a transforming module for transforming the digital data representation accessed from the input module from a first form to a second form, a repository for storing type data, an identification module for identifying the type of the one or more digital objects from a probabilistic-based comparison between the digital data in the second form and type data in the type repository, and an output module for outputting a type of the identified one or more digital objects, wherein the digital data is transformed into the second form by means of an Adaptive Dynamic Range Coding (ADRC) process.

11. Use of an Adaptive Dynamic Range Coding (ADRC) process for identification of a type of one or more digital objects in a digital data representation.