MXPA01002354A

MXPA01002354A - Method and device for displaying or searching for object in image and computer-readable storage medium

Info

Publication number: MXPA01002354A
Application number: MXPA/A/2001/002354A
Authority: MX
Inventors: Z BOBER Miroslaw
Original assignee: Mitsubishi Electric Information Technology Centre Europe Bv
Priority date: 1999-07-05
Filing date: 2001-03-05
Publication date: 2002-02-26

Abstract

A method for representing an object appearing in a still or video image by processing a signal corresponding to the image. The method comprises a step of deriving numerical values relating to features appearing on the outline of the object, starting from an arbitrary point on the outline, and a step of arriving at a representation of the outline by applying a predetermined ordering to the numerical values.

Description

METHOD, APPARATUS, COMPUTER PROGRAM, COMPUTER SYSTEM, AND COMPUTER-READABLE STORAGE MEDIUM FOR REPRESENTING AND SEARCHING FOR AN OBJECT IN AN IMAGE

TECHNICAL FIELD

The present invention relates to the representation of an object appearing in a still or video image, such as an image stored in a multimedia database, especially for search purposes, and to a method and apparatus for searching for an object using such a representation.

BACKGROUND OF THE INVENTION

In applications such as image or video libraries, it is desirable to have an efficient representation and storage of the profile or shape of objects, or parts of objects, appearing in still or video images. A well-known technique for shape-based indexing and retrieval uses the Curvature Scale Space (CSS) representation. Details of the CSS representation can be found in the papers "Robust and Efficient Shape Indexing through Curvature Scale Space", Proc. British Machine Vision Conference, pp. 53-62, Edinburgh, UK, 1996, and "Indexing an Image Database by Shape Content using Curvature Scale Space", Proc. IEE Colloquium on Intelligent Databases, London, 1996, both by F. Mokhtarian, S. Abbasi and J. Kittler, the contents of which are incorporated herein by reference.

The CSS representation uses a curvature function for the profile of the object, starting from an arbitrary point on the profile. The curvature function is studied as the shape of the profile is evolved by a series of deformations that smooth the shape. Specifically, the zero crossings of the derivative of the curvature function convolved with a family of Gaussian filters are computed. The zero crossings are plotted on a graph, known as the curvature scale space, where the x axis is the normalised arc length of the curve and the y axis is the evolution parameter, specifically the parameter of the filter applied. The points plotted on the graph form loops characteristic of the profile. Each convex or concave part of the object profile corresponds to a loop in the CSS image. The coordinates of the peaks of the most prominent loops in the CSS image are used as a representation of the profile.

To search for objects in images stored in a database that match the shape of an input object, the CSS representation of the input shape is calculated.
The similarity between an input shape and the stored shapes is determined by comparing the position and height of the peaks in the respective CSS images using a matching algorithm.

A problem with the known CSS representation is that the peaks for a given profile are based on a curvature function that is computed starting from an arbitrary point on the profile. If the starting point is changed, there is a cyclic shift of the peaks along the x axis of the CSS image. Consequently, when a similarity measure is computed, all possible shifts need to be investigated, or at least the most likely shift. This increases the complexity of the search and matching procedure.

Accordingly, the present invention provides a method of representing an object appearing in a still or video image by processing signals corresponding to the image, the method comprising deriving a plurality of numerical values associated with features appearing on the profile of an object, starting from an arbitrary point on the profile, and applying a predetermined ordering to the values to arrive at a representation of the profile. Preferably, the values are derived from a CSS representation of the profile, and preferably they correspond to the CSS peak values.

As a result of the invention, the computation involved in the matching procedures can be greatly reduced without a significant reduction in retrieval accuracy.

DESCRIPTION OF THE INVENTION

The method set forth in claim 1 represents an object appearing in a still or video image by processing signals corresponding to the image. The method comprises deriving a plurality of numerical values associated with features appearing on the profile of the object, starting from an arbitrary point on the profile, and applying a predetermined ordering to the values to arrive at a representation of the profile. In the method set forth in claim 2, the predetermined ordering is such that the resulting representation is independent of the starting point on the profile. In the method set forth in claim 3, the numerical values reflect points of inflection on the curve. In the method set forth in claim 4, a curvature scale space representation of the profile is obtained by smoothing the profile in a plurality of stages using a smoothing parameter sigma, resulting in a plurality of profile curves, using values for the maxima and minima of the curvature of each profile curve to derive curves characteristic of the original profile, and selecting the coordinates of the peaks of the characteristic curves as the numerical values. In the method set forth in claim 5, the coordinates of the characteristic curves correspond to an arc-length parameter of the profile and the smoothing parameter.

In the method set forth in claim 6, the peak coordinate values are ordered on the basis of the peak height values, which correspond to the smoothing parameter. In the method set forth in claim 7, the values are ordered starting from the largest value. In the method set forth in claim 8, the values are ordered in decreasing size. In the method set forth in claim 9, the values are ordered starting from the smallest value. The method set forth in claim 10 represents an object appearing in a still or video image by processing signals corresponding to the image. The method comprises deriving a plurality of numerical values associated with features appearing on the profile of an object to represent the profile, and deriving a factor indicating the reliability of the representation using a relationship between at least two of the values. In the method set forth in claim 11, the factor is based on the ratio between two of the values.

In the method set forth in claim 12, the ratio is that of the two largest values. In the method set forth in claim 13, a curvature scale space representation of the profile is obtained by smoothing the profile in a plurality of stages using a smoothing parameter sigma, resulting in a plurality of profile curves, using values for the maxima and minima of the curvature of each profile curve to derive curves characteristic of the original profile, and selecting the coordinates of the peaks of the characteristic curves as the numerical values. In the method set forth in claim 14, the values are derived using a method according to any of claims 1 to 9. The method set forth in claim 15 searches for an object in a still or video image by processing signals corresponding to images. The method comprises inputting a query in the form of a two-dimensional profile, deriving a descriptor of the profile using a method according to any of claims 1 to 9, obtaining descriptors of objects in stored images derived using a method according to any of claims 1 to 9, comparing the query descriptor with each descriptor for a stored object, and selecting and displaying at least one result corresponding to an image containing an object for which the comparison indicates a degree of similarity between the query and the object. In the method set forth in claim 16, a factor is derived for the query profile and for each stored profile using a method according to any of claims 10 to 12, and the comparison is made using the predetermined ordering only, or the predetermined ordering and some other ordering, depending on these factors.
The method set forth in claim 17 represents a plurality of objects appearing in still or video images by processing signals corresponding to the images. The method comprises deriving a plurality of numerical values associated with features appearing on the profile of each object and applying the same predetermined ordering to the values for each profile to arrive at a representation of each profile.

An apparatus set forth in claim 18 is adapted to implement a method according to any of claims 1 to 17. A computer program set forth in claim 19 implements a method according to any of claims 1 to 17. A computer system set forth in claim 20 is programmed to operate according to a method according to any of claims 1 to 17. A computer-readable storage medium set forth in claim 21 stores computer-executable process steps for implementing a method according to any of claims 1 to 17. A method for representing objects in still or video images set forth in claim 22 is described with reference to the accompanying drawings. A method for searching for objects in still or video images set forth in claim 23 is described with reference to the accompanying drawings. A computer system set forth in claim 24 is described with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a video database system;
Figure 2 is a drawing of a profile of an object;
Figure 3 is a CSS representation of the profile of Figure 2;
Figure 4 is a block diagram illustrating a search method.

BEST MODE FOR CARRYING OUT THE INVENTION

First Embodiment

Figure 1 shows a computerised video database system according to an embodiment of the invention. The system includes a control unit 2 in the form of a computer, a display unit 4 in the form of a monitor, a pointing device 6 in the form of a mouse, an image database 8 containing stored still and video images, and a descriptor database 10 storing descriptors of objects or parts of objects appearing in the images stored in the image database 8.

A descriptor for the shape of each object of interest appearing in an image in the image database is derived by the control unit 2 and stored in the descriptor database 10. The control unit 2 derives the descriptors operating under the control of a suitable program that implements a method as described below.

First, for a given object profile, a CSS representation of the profile is derived. This is done using the known method described in one of the papers mentioned above. More specifically, the profile is expressed by a representation

$$\Psi = \{ (x(u), y(u)),\ u \in [0, 1] \}$$

where $u$ is a normalised arc-length parameter. The profile is smoothed by convolving $\Psi$ with a 1-D Gaussian kernel $g(u, \sigma)$, and the curvature zero crossings of the evolving curve are examined as $\sigma$ changes. The zero crossings are identified using the following expression for the curvature:

$$k(u, \sigma) = \frac{X_u(u, \sigma)\, Y_{uu}(u, \sigma) - X_{uu}(u, \sigma)\, Y_u(u, \sigma)}{\left( X_u(u, \sigma)^2 + Y_u(u, \sigma)^2 \right)^{3/2}}$$

where

$$X(u, \sigma) = x(u) * g(u, \sigma), \qquad Y(u, \sigma) = y(u) * g(u, \sigma)$$

and

$$X_u(u, \sigma) = x(u) * g_u(u, \sigma), \qquad X_{uu}(u, \sigma) = x(u) * g_{uu}(u, \sigma),$$

with $Y_u$ and $Y_{uu}$ defined in the same way from $y(u)$. In the above, $*$ represents convolution and the subscripts represent derivatives.

The number of curvature zero crossings changes as $\sigma$ changes, and when $\sigma$ is sufficiently high $\Psi$ is a convex curve with no zero crossings. The zero crossing points $(u, \sigma)$ are plotted on a graph, known as the CSS image space. This results in a plurality of curves characteristic of the original profile.
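The curvature computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program run by the control unit 2: it assumes the closed profile is given as two equal-length lists of samples spaced roughly evenly along the curve, measures sigma in samples, truncates the Gaussian at about four sigma, and uses hypothetical function names (`gaussian_derivative_kernels`, `circular_convolve`, `curvature`, `zero_crossings`) that do not appear in the source.

```python
import math


def gaussian_derivative_kernels(sigma, n):
    """Sampled first and second derivatives of a unit-sum Gaussian.

    sigma is in samples; the support is cut at about 4 sigma (an assumption)
    and is kept shorter than the closed curve of n samples.
    """
    half = min((n - 1) // 2, math.ceil(4 * sigma))
    offsets = list(range(-half, half + 1))
    g = [math.exp(-t * t / (2.0 * sigma * sigma)) for t in offsets]
    total = sum(g)
    g = [v / total for v in g]
    g_u = [(-t / sigma ** 2) * v for t, v in zip(offsets, g)]
    g_uu = [(t * t / sigma ** 4 - 1.0 / sigma ** 2) * v for t, v in zip(offsets, g)]
    return offsets, g_u, g_uu


def circular_convolve(signal, offsets, kernel):
    """Convolve a closed-curve coordinate signal with a kernel, wrapping around."""
    n = len(signal)
    return [
        sum(k * signal[(i - t) % n] for t, k in zip(offsets, kernel))
        for i in range(n)
    ]


def curvature(xs, ys, sigma):
    """Curvature of the smoothed curve: (Xu*Yuu - Xuu*Yu) / (Xu^2 + Yu^2)^(3/2)."""
    offsets, g_u, g_uu = gaussian_derivative_kernels(sigma, len(xs))
    x_u = circular_convolve(xs, offsets, g_u)
    x_uu = circular_convolve(xs, offsets, g_uu)
    y_u = circular_convolve(ys, offsets, g_u)
    y_uu = circular_convolve(ys, offsets, g_uu)
    return [
        (xu * yuu - xuu * yu) / (xu * xu + yu * yu) ** 1.5
        for xu, xuu, yu, yuu in zip(x_u, x_uu, y_u, y_uu)
    ]


def zero_crossings(values):
    """Indices i where the sign changes between sample i and the next one (cyclic)."""
    n = len(values)
    return [i for i in range(n) if values[i] * values[(i + 1) % n] < 0]
```

Calling `zero_crossings(curvature(xs, ys, sigma))` for a range of increasing sigma values and plotting each crossing index (divided by the number of samples) against sigma would give one possible discrete approximation of the CSS image space.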
The peaks of the characteristic curves are identified and the corresponding coordinates are extracted and stored. In general terms, this gives a set of n coordinate pairs [(x1, y1), (x2, y2), ... (xn, yn)], where n is the number of peaks, xi is the arc-length position of the i-th peak and yi is the peak height.

The order and position of the characteristic curves and the corresponding peaks as they appear in the CSS image space depend on the starting point for the curvature function described above. According to the invention, the peak coordinates are re-ordered using a specific ordering function.

Ordering is performed by a one-to-one mapping T of the peak indices {1 ... n} to a new set of indices {1 ... n}. In this embodiment, the coordinate pairs are ordered by considering the size of the y coordinates. First, the highest peak is selected. Suppose the k-th peak is the most prominent. Then (xk, yk) becomes the first in the ordered set of values. In other words, T(k) = 1. Similarly, the other peak coordinates are re-ordered in terms of decreasing peak height. If two peaks have the same height, the peak whose x coordinate is closer to that of the preceding coordinate pair is placed first. In other words, each coordinate pair having an original index i is assigned a new index j where T(i) = j and yj >= y(j+1). Also, each value xi is subjected to a cyclic shift of -xk.

As a specific example, the profile shown in Figure 2 results in a CSS image as shown in Figure 3. Details of the coordinates of the peaks of the curves in the CSS image are given in Table 1 below.

Table 1

The peaks are ordered using the ordering described above. In other words, the coordinates are arranged in terms of decreasing peak height. Also, the x coordinates are all shifted towards zero by an amount equal to the original x coordinate of the highest peak.

This results in reordered peak coordinates as given in Table 2 below.

Table 2

These re-ordered peak coordinates form the basis of the descriptor stored in the database 10 for the object profile. In this embodiment, the peak coordinates are stored in the order shown in Table 2. Alternatively, the coordinates may be stored in the original order, together with an associated index indicating the new order.
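The ordering of the first embodiment can be illustrated with a short Python sketch. It rests on assumptions that the text does not spell out: peaks are (x, y) tuples with x a normalised arc length in [0, 1), the cyclic shift is taken modulo 1, ties in height are broken by cyclic distance to the previously placed peak, and a tie for the highest peak is resolved by taking the first one encountered. The names `cyclic_distance` and `order_by_height` are hypothetical.

```python
def cyclic_distance(a, b):
    """Distance between two arc-length positions on a closed curve of length 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def order_by_height(peaks):
    """Order CSS peaks by decreasing height and shift x by -xk (modulo 1).

    peaks: list of (x, y) pairs in any order; returns a new list whose first
    element is the highest peak with x == 0.
    """
    if not peaks:
        return []
    remaining = list(peaks)
    first = max(remaining, key=lambda p: p[1])  # highest peak (xk, yk)
    remaining.remove(first)
    ordered = [first]
    while remaining:
        top = max(p[1] for p in remaining)
        ties = [p for p in remaining if p[1] == top]
        prev_x = ordered[-1][0]
        # Equal heights: the peak closer in x to the preceding pair goes first.
        nxt = min(ties, key=lambda p: cyclic_distance(p[0], prev_x))
        remaining.remove(nxt)
        ordered.append(nxt)
    xk = first[0]
    return [((x - xk) % 1.0, y) for x, y in ordered]


# The same shape traced from two different starting points (x offset by 0.25)
# yields the same descriptor.
a = order_by_height([(0.125, 500), (0.5, 2000), (0.75, 1000)])
b = order_by_height([(0.375, 500), (0.75, 2000), (0.0, 1000)])
print(a == b)
```

The peak values in the usage lines are made up for illustration and are not the values of Table 1.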

Second Embodiment

An alternative method of representing the object profile according to a second embodiment will now be described.

As described above, a CSS representation of the profile is derived. However, the ordering of the peak coordinates is different from the ordering in the first embodiment described above. More specifically, the highest peak is selected first. Suppose the k-th peak is the most prominent. Then (xk, yk) becomes the first peak in the ordered set of peaks. The subsequent peaks are ordered so that, for each peak coordinate of original index i, T(i) = j and xj <= x(j+1). Also, all values xi are shifted downwards by an amount xk equal to the original x coordinate of the original peak k. In other words, in the ordering method according to the second embodiment, the highest peak is selected and placed first, and the remaining peaks then follow in the original sequence starting from the highest peak. Table 3 shows the peak values of Table 1 ordered according to the second embodiment.
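A minimal Python sketch of this second ordering follows, under the same assumptions as before: peaks are (x, y) tuples with x a normalised arc length in [0, 1), the shift by xk wraps modulo 1, and no two peaks share the same x. The function name `order_from_highest` is hypothetical.

```python
def order_from_highest(peaks):
    """Place the highest peak first, then the rest in increasing shifted x.

    peaks: list of (x, y) pairs in any order. After shifting every x by -xk
    (modulo 1), sorting by x reproduces the original cyclic sequence of peaks
    starting from the highest one.
    """
    if not peaks:
        return []
    xk = max(peaks, key=lambda p: p[1])[0]
    shifted = [((x - xk) % 1.0, y) for x, y in peaks]
    return sorted(shifted, key=lambda p: p[0])


print(order_from_highest([(0.125, 1000), (0.5, 2000), (0.75, 500)]))
```

The sample values are invented for illustration. Unlike the first embodiment, the result keeps the neighbourhood relations between peaks along the profile, and only the choice of the first peak depends on the heights.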

Table 3

In a development of the first and second embodiments described above, a confidence factor (CF) is additionally associated with each representation of a shape. The CF is calculated from the ratio of the second highest peak value to the highest peak value for a given shape. For the profile shown in Figure 2, the CF value is CF = 1001/2120. In this example, the CF is quantised by rounding to the nearest 0.1 to reduce storage requirements. Therefore, here CF = 0.5. The CF value in this example reflects the accuracy or uniqueness of the representation. Here, a CF value close to 1 means low confidence and a CF value close to zero means high confidence. In other words, the closer the two highest peak values are, the less likely the representation is to be accurate. The CF value can be useful when performing a matching procedure, as shown in the following description.
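The confidence factor calculation can be written out as a short Python sketch. The function name `confidence_factor` is hypothetical, and the return value of 0.0 for a shape with fewer than two peaks is an assumption, since the text does not cover that case.

```python
def confidence_factor(peak_heights):
    """Ratio of the second highest to the highest peak, quantised to 0.1.

    Values near 1 mean low confidence, values near 0 mean high confidence.
    """
    if len(peak_heights) < 2:
        return 0.0  # assumption: a single peak leaves no ambiguity
    highest, second = sorted(peak_heights, reverse=True)[:2]
    # Quantise in tenths; Python's round() sends exact halves to the even tenth.
    return round(10 * second / highest) / 10


# The two highest peaks quoted for the profile of Figure 2.
print(confidence_factor([2120, 1001]))  # 0.5
```

Quantising to one decimal place means that the factor can be stored in a few bits alongside the peak coordinates.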

Third Embodiment

A method of searching for an object in an image according to an embodiment of the invention will now be described with reference to Figure 4, which is a block diagram of the search method. Here, the descriptor database 10 of the system of Figure 1 stores descriptors derived according to the first ordering method described above, together with the associated CF values.

The user initiates a search by drawing an object profile on the display using the pointing device (step 410). The control unit 2 then derives a CSS representation of the input profile and orders the peak coordinates using the same ordering function as was used for the images in the database, to arrive at a descriptor for the input profile (step 420). The control unit 2 also calculates a CF value for the input profile by calculating the ratio of the second highest peak value to the highest peak value and quantising the result (step 430).

The control unit 2 then compares the CF value for the input profile with a predetermined threshold (step 440). In this example, the threshold is 0.75. If the CF value is lower than the threshold, indicating relatively high confidence in the accuracy of the input descriptor, the next step is to consider the CF value for the model (i.e. the image stored in the database) under consideration. If the model CF is also lower than the threshold (step 450), the input and the model are compared using the respective descriptors in the predetermined ordering only (step 460). If the CF for either the input or the model is greater than the threshold, matching is performed by comparing all possible different orderings of the coordinate values in the input descriptor with the model descriptor in the database (step 470).

The matching comparison is carried out using a suitable algorithm that yields a similarity measure for each descriptor in the database.
A known matching algorithm, such as that described in the papers mentioned above, can be used. This matching procedure is briefly described below.

Given two closed contour shapes, the image curve Ψi and the model curve Ψm, and their respective sets of peaks {(xi1, yi1), (xi2, yi2), ..., (xin, yin)} and {(xm1, ym1), (xm2, ym2), ..., (xmn, ymn)}, the similarity measure is calculated. The similarity measure is defined as the total cost of matching the peaks in the model to the peaks in the image. The matching that minimises the total cost is determined using dynamic programming. The algorithm recursively matches the peaks of the model to the peaks of the image and calculates the cost of each such match. Each model peak can be matched with only one image peak and each image peak can be matched with only one model peak. Some of the model or image peaks may remain unmatched, and there is an additional penalty cost for each unmatched peak. Two peaks can be matched if their horizontal distance is less than 0.2. The cost of a match is the length of the straight line between the two matched peaks. The cost of an unmatched peak is its height.

In more detail, the algorithm works by creating and expanding a tree-like structure, where the nodes correspond to matched peaks:

1. Create a starting node consisting of the largest maximum of the image (xik, yik) and the largest maximum of the model (xmr, ymr).
2. For each remaining model peak that is within 80 percent of the largest maximum of the image peaks, create an additional starting node.
3. Initialise the cost of each starting node created in 1 and 2 to the absolute difference of the y coordinates of the image and model peaks linked by this node.
4. For each starting node in 3, compute the CSS shift parameter alpha, defined as the difference in the x (horizontal) coordinates of the model and image peaks matched in this starting node. The shift parameter will be different for each node.
5. For each starting node, create a list of model peaks and a list of image peaks. The lists hold the information as to which peaks are yet to be matched. For each starting node, mark the peaks matched in this node as "matched", and all other peaks as "unmatched".
6. Recursively expand a lowest cost node (starting from each node created in steps 1-6 and following with its children nodes) until the condition in point 8 is fulfilled. To expand a node, use the following procedure.
7. Expanding a node:
If there is at least one image peak and one model peak left unmatched: select the largest scale image curve CSS maximum that is not matched (xip, yip). Apply the starting node shift parameter (computed in step 4) to map the selected maximum to the model CSS image; the selected peak now has coordinates (xip - alpha, yip). Locate the nearest model curve peak that is unmatched (xms, yms). If the horizontal distance between the two peaks is less than 0.2 (i.e. |xip - alpha - xms| < 0.2), match the two peaks and define the cost of the match as the length of the straight line between the two peaks. Add the cost of the match to the total cost of that node. Remove the matched peaks from the respective lists by marking them as "matched". If the horizontal distance between the two peaks is greater than 0.2, the image peak (xip, yip) cannot be matched. In that case, add its height yip to the total cost and remove only the peak (xip, yip) from the image peak list by marking it as "matched".
Otherwise (there are only image peaks or only model peaks left unmatched): define the cost of the match as the height of the highest unmatched image or model peak and remove that peak from the list.
8. If, after expanding a node in 7, there are no unmatched peaks in both the image and model lists, the matching procedure is terminated. The cost of this node is the similarity measure between the image and model curves. Otherwise, go to point 7 and expand the lowest cost node.

The above procedure is repeated with the image curve peaks and the model curve peaks swapped. The final matching value is the lower of the two.

As another example, for each position in the ordering, the distance between the input x value and the corresponding model x value and the distance between the input y value and the corresponding model y value are calculated. The total distance over all the positions is calculated, and the smaller the total distance, the closer the match. If the number of peaks for the input and the model are different, the peak heights of the leftover peaks are included in the total distance.

The above steps are repeated for each model in the database (step 480). The similarity measures resulting from the matching comparisons are ordered (step 490), and the objects corresponding to the descriptors whose similarity measures indicate the closest match (that is, here, the lowest similarity measures) are then displayed on the display unit 4 for the user (step 500). The number of objects to be displayed can be pre-set or selected by the user.

In the above embodiment, if the CF value is greater than the threshold, all possible orderings of the input descriptor values are considered in the matching. It is not necessary to consider all possible orderings; instead, only some possible orderings may be considered, such as some or all of the cyclic shifts of the original CSS representation. Furthermore, in the above embodiment the threshold value is set to 0.75, but the threshold can be set to different levels. For example, if the threshold is set to zero, all matching is performed by analysis of some or all of the possible orderings.
The amount of computation required is increased compared with the case where the threshold is above zero, but since the peaks have already been ordered and their x coordinate has been adjusted for a particular starting point or object rotation, the amount of computation required is reduced compared with the original system, where no such adjustment has been made. Consequently, setting the threshold to zero offers some reduction in computational cost, and the retrieval performance is exactly the same as in the original system. Alternatively, if the threshold is set to one, matching is performed using only the stored ordering. There is then a significant reduction in the computation required, with only a small deterioration in retrieval accuracy.

Various modifications of the embodiments described above are possible. For example, instead of ordering the CSS peak coordinate values as described in the first and second embodiments, other orderings can be used. For example, the values can be placed in order of increasing rather than decreasing peak height. Instead of storing the ordered values in the database, the ordering can be carried out during the matching procedure.
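The threshold test of steps 440 to 470, combined with the simple position-by-position distance described above, might be sketched in Python as follows. This is a hedged illustration and not the tree-based matching algorithm: the names `positional_distance`, `reanchor` and `match_cost` are hypothetical, a CF equal to the threshold is treated as low confidence, peak heights are assumed to be scaled comparably to x, and the "other orderings" are limited to re-anchoring the query on each of its peaks in turn, which is only one of the options the text allows.

```python
def positional_distance(query, model):
    """Sum of |dx| + |dy| over matching positions, plus leftover peak heights."""
    total = 0.0
    for (xq, yq), (xm, ym) in zip(query, model):
        total += abs(xq - xm) + abs(yq - ym)
    shared = min(len(query), len(model))
    longer = query if len(query) > len(model) else model
    total += sum(y for _, y in longer[shared:])
    return total


def reanchor(descriptor, i):
    """Alternative ordering: peak i first, x shifted by -x_i, rest by height."""
    x0 = descriptor[i][0]
    shifted = [((x - x0) % 1.0, y) for x, y in descriptor]
    rest = shifted[:i] + shifted[i + 1:]
    rest.sort(key=lambda p: -p[1])
    return [shifted[i]] + rest


def match_cost(query, query_cf, model, model_cf, threshold=0.75):
    """Lower cost means a closer match between query and model descriptors."""
    if not query or (query_cf < threshold and model_cf < threshold):
        # Both descriptors are trusted: compare in the stored ordering only.
        return positional_distance(query, model)
    # Otherwise try each re-anchored ordering of the query and keep the best.
    return min(
        positional_distance(reanchor(query, i), model)
        for i in range(len(query))
    )
```

Ranking the database then amounts to evaluating `match_cost` for every stored descriptor and sorting the results in increasing order, as in steps 480 to 500.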

INDUSTRIAL APPLICABILITY

A system according to the invention can be provided, for example, in an image library. Alternatively, the databases can be located remotely from the control unit of the system, connected to the control unit by a temporary link such as a telephone line or by a network such as the Internet. The image and descriptor databases can be provided, for example, in permanent storage or on a portable data storage medium such as a CD-ROM or DVD.

The components of the system as described can be provided in software or hardware form. Although the invention has been described in the form of a computer system, it could be implemented in other forms, for example using a dedicated chip.

Specific examples have been given of methods of representing a two-dimensional shape of an object and of methods for calculating values representing similarities between two shapes, but any suitable such methods can be used.

The invention can also be used, for example, for matching images of objects for verification purposes, or for filtering.

Claims

CLAIMS

1. A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, the method comprising deriving a plurality of numerical values associated with features appearing on the profile of an object starting from an arbitrary point on the profile and applying a predetermined ordering to the values to arrive at a representation of the profile.
2. A method according to claim 1, wherein the predetermined ordering is such that the resulting representation is independent of the starting point on the profile.
3. A method according to claim 1, wherein the numerical values reflect points of inflection on the curve.
4. A method according to claim 1, wherein a curvature scale space representation of the profile is obtained by smoothing the profile in a plurality of stages using a smoothing parameter sigma, resulting in a plurality of profile curves, using values for the maxima and minima of the curvature of each profile curve to derive curves characteristic of the original profile, and selecting the coordinates of the peaks of the characteristic curves as the numerical values.
5. A method according to claim 4, wherein the coordinates of the characteristic curves correspond to an arc-length parameter of the profile and the smoothing parameter.
6. A method according to claim 5, wherein the peak coordinate values are ordered on the basis of the peak height values, which correspond to the smoothing parameter.
7. A method according to any of claims 1 to 6, wherein the values are ordered starting from the largest value.
8. A method according to claim 7, wherein the values are ordered in decreasing size.
9. A method according to any of claims 1 to 6, wherein the values are ordered starting from the smallest value.
10. A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, the method comprising deriving a plurality of numerical values associated with features appearing on the profile of an object to represent the profile, and deriving a factor indicating the reliability of the representation using a relationship between at least two of the values.
11. A method according to claim 10, wherein the factor is based on the ratio between two of the values.
12. A method according to claim 11, wherein the ratio is that of the two largest values.
13. A method according to any of claims 10 to 12, wherein a curvature scale space representation of the profile is obtained by smoothing the profile in a plurality of stages using a smoothing parameter sigma, resulting in a plurality of profile curves, using values for the maxima and minima of the curvature of each profile curve to derive curves characteristic of the original profile, and selecting the coordinates of the peaks of the characteristic curves as the numerical values.
14. A method according to claim 10, wherein the values are derived using a method according to any of claims 1 to 9.
15. A method of searching for an object in a still or video image by processing signals corresponding to images, the method comprising inputting a query in the form of a two-dimensional profile, deriving a descriptor of the profile using a method according to any of claims 1 to 9, obtaining descriptors of objects in stored images derived using a method according to any of claims 1 to 9, comparing the query descriptor with each descriptor for a stored object, and selecting and displaying at least one result corresponding to an image containing an object for which the comparison indicates a degree of similarity between the query and the object.
16. A method according to claim 15, wherein a factor is derived for the query profile and for each stored profile using a method according to any of claims 10 to 12, and the comparison is made using the predetermined ordering only, or the predetermined ordering and some other ordering, depending on these factors.
17. A method of representing a plurality of objects appearing in still or video images, by processing signals corresponding to the images, the method comprising deriving a plurality of numerical values associated with features appearing on the profile of each object and applying the same predetermined ordering to the values for each profile to arrive at a representation of each profile.
18. An apparatus adapted to implement a method according to any of claims 1 to 17.
19. A computer program for implementing a method according to any of claims 1 to 17.
20. A computer system programmed to operate according to a method according to any of claims 1 to 17.
21. A computer-readable storage medium storing computer-executable process steps for implementing a method according to any of claims 1 to 17.