IMAGE COMPARING SYSTEM
BACKGROUND OF THE INVENTION The present invention relates generally to image processing techniques for comparing images, and in particular to a method and system for extracting from an image database a set of images that are close in appearance to a target image.
The general problem to be solved is that of retrieving from a large and diverse database of images all of the images that share certain properties. Attempts have been made to solve this problem by assigning to each image a set of keywords at the time it is inserted into the database. Images are then judged to be similar if they are tagged with the same keywords. The problem with this method is that it is impossible to encapsulate in a few words everything about the image that might be used as a basis for judging image similarity. For example, a picture of a car on a beach may be tagged with the keywords "car" and "beach", but probably will not be tagged with such terms as "brown pebbly beach" or "beach next to lake with blue-green water" or "beach on the left; lake on the right". People see many things they do not commonly put into words. However, actual image comparison is often based on just these non-verbal attributes of an image, i.e., on what the image is like rather than on how the image would be described in words.
The advent of databases of digital images on computers makes it possible to compare images on the basis of their actual visual attributes (colors, textures, shapes, etc.). This permits image search by example; the operator of a computer image search system selects a given target image and then requests the computer system to find all images in the database which resemble the example.
It is, however, difficult to design a successful search-by-example system. The problem is that a human being, in deciding whether or not two images are similar, processes the data in an image in a complex manner. Color, shape, texture, etc. are interdependent in the effect they exert on a person's judgment of image similarity. Existing prior-art systems, however, have placed too much emphasis on single sets of features of the images being compared. A particular prior-art technique consists of generating a frequency diagram or histogram of all the colors in the image. Two images are judged to be similar if their color histograms are similar. Such techniques ignore the shapes of objects in the scene, and hence do a poor job of imitating a human's methodology for comparison of images. Other techniques look for image shapes of a specific type, for example, human faces or thumbprints. These methods do analyze objects in the image, but they are limited to the specific task of the identification of specific target objects.
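By way of illustration, the following sketch shows how such a prior-art color-histogram comparison might be coded; the bin count and the use of histogram intersection are illustrative assumptions, not features of any particular prior-art system.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Histogram an H x W x 3 uint8 RGB image into bins**3 color cells."""
    pixels = image.reshape(-1, 3) // (256 // bins)          # quantize each channel
    cells = (pixels[:, 0] * bins + pixels[:, 1]) * bins + pixels[:, 2]
    hist = np.bincount(cells, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                # normalize to sum to 1

def histogram_similarity(img_a, img_b):
    """Histogram intersection: 1.0 for identical color distributions,
    0.0 for completely disjoint ones."""
    return np.minimum(color_histogram(img_a), color_histogram(img_b)).sum()
```

As the text notes, two images with very different layouts but similar overall color content would score highly under this measure, which is the weakness the present invention addresses.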
Many prior-art computer techniques also require extensive analysis of candidate images at search time and hence are slow. Accordingly, it is clear that what is needed in the art is an improved methodology for extracting images from a database that is both quicker and more like real human image-matching methodology than is the prior art.
SUMMARY OF THE INVENTION The present invention provides a method and system for quickly comparing a target image with candidate images in a database, and for extracting those images in the database that best match the target. The invention associates with each image a set of image statistics characterizing the image. Hence the invention is similar to prior-art keyword-tagging search schemes in the sense that a set of characteristics is assigned to each image. In the present invention, however, the selection of this set of characteristics is based on algorithmic examination and decomposition of the image by a computer program and is not subject to human idiosyncrasies or errors. When a target image is selected or inputted, the same decomposition is done to it.
The process that associates with each image a set of image statistics makes use of decomposition of the image into a set of "blobs". Each blob is a cohesive area of the original image (roughly uniform in color, or bounded by a distinct edge) which can be transformed into an exactly uniform-in-color region in the decomposed image. Each cohesive region or blob is characterized by a limited set of numerical parameters (e.g. x and y extent, center of gravity, color, shape, texture, etc.). The set of blobs in the image, along with the characterizing statistics of each blob, constitute the characterizing statistics for the image. An image-similarity score is calculated for any pair of images based on a comparison of the image statistics of the two images. The computation of an image-similarity score between two images typically comprises the three steps of (a) placing the blobs of the two images in one-to-one correspondence, (b) computing a similarity score for each pair of blobs, and then (c) obtaining an overall similarity score for the two images as a function of the similarity scores of the paired blobs in the two images. The user is able to modify aspects of the image-comparison algorithm by varying the weights assigned to the parameters (e.g., size, color, position, etc.) used in generating an image-similarity score.
A computer-implemented method is provided for selecting from a computer database of candidate images one or more images which closely match a target image. The method typically includes the steps of extending the image database by computing, for each candidate image, image-characterizing statistics and adding the statistics to the database;
computing image-characterizing statistics for the target image; computing, for each candidate image, a measure of its similarity to the target image, wherein the measure is computed as a function of the image-characterizing statistics of the target image and of the candidate image; and displaying at least a portion of one or more of the candidate images having the best image-similarity measures.
An image processing system is also provided. The image processing system typically includes a memory for storing a plurality of candidate images and image-characterizing statistics associated with each candidate image; and input means for inputting a target image for comparison with the candidate images. The system also typically includes a microprocessor coupled to the memory and the input means, wherein the microprocessor computes image-characterizing statistics for the target image, and wherein for each candidate image the microprocessor determines a measure of the similarity of the candidate image to the target image, wherein the similarity measure is computed as a function of the image-characterizing statistics of the target image and the image-characterizing statistics of the candidate image; and a display for displaying at least a portion of one or more of the candidate images having the best image-similarity measures.
Reference to the remaining portions of the specification, including the drawings and claims, will realize other features and advantages of the present invention. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
BRIEF DESCRIPTION OF THE DRAWINGS Figure 1 illustrates an exemplary image processing system for extracting images from a database "by example" according to an embodiment of the present invention; Figure 2 is a flowchart showing the process of analyzing images to be stored in the database;
Figure 3 is a flowchart showing the process of obtaining a target image, generating statistics for it, comparing it with images stored in the database and displaying the result;
Figure 4 illustrates a lion cub image and an owl image and accompanying statistics after reduction of the images to blobs;
Figure 5 shows the computer display a user might see after seeking a set of twenty candidate images matching the lion cub image;
Figure 6 illustrates the image match controls in an embodiment of the invention; Figure 7 is a flowchart showing the process of comparing the target image with images stored in the database; and
Figure 8 is a flowchart showing the process of generating match scores for blob pairs.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS Figure 1 illustrates an embodiment of an image processing system for implementing the image processing and comparison techniques of the present invention. Image processing system 70 includes a computer system 71 comprising a microprocessor 72 and a memory 74. Microprocessor 72 performs the image processing and memory 74 stores computer code for processing images. Computer system 71 is any type of computer, such as a PC, a Macintosh, laptop, mainframe or the like. Imaging system 70 also includes a scanner 80 for scanning images directly. Computer system 71 is coupled to monitor 76 for displaying a graphical user interface as well as images. Computer system 71 is also coupled to various interface devices such as internal or external memory, a mouse and a keyboard (not shown). Printer 78 allows for the printing of any images as required by the user. Cable 82 provides the ability to transfer images to and from another computer device via e-mail, the Internet, direct access or the like.
Figure 2 is a flowchart showing the process of analyzing images to store their characteristic data according to an embodiment of the present invention. The process breaks down into the following general steps as shown in Figure 2:
a. insert an image; b. create "blobs"; c. analyze "blobs"; and d. store results.
In step 10, an image is provided to the imaging system. In one embodiment, the image is provided by selecting an image from an existing collection of images stored in a memory. Alternatively, an image could be provided to the imaging system using a color scanner, digital camera, paint program, or the like.
Once an image is provided, the sequence of operations outlined by box 15 of Figure 2 will result in the generation of a set of statistics characterizing the image. This sequence of operations is decomposed into the specific steps described below.
At step 20, the image is resized to a standard size while maintaining the aspect ratio. The scale factor is stored for later comparisons based on size. In one embodiment, the image is reduced to a maximum 64-by-64 pixel resolution. Other resolutions may be used as desired. There is a tradeoff between the speed and the accuracy of the image comparison process. Smaller resolutions provide for increased speed with some loss of accuracy. Maintenance of the aspect ratio means that if the original image is non-square, then the longer axis of the reduced image will have the designated size (e.g., 64 pixels) and the shorter axis will be proportionately smaller. In step 30, detail is removed from the reduced-size image. In one embodiment, the image is blurred using a 10-pixel radius Gaussian blur filter. This effectively removes most of the detail of the image while keeping the dominant colors mostly intact. Alternatively or additionally, a median filter may also be used to blur the image. In another embodiment, an edge-preserving blur is used to reduce detail. One embodiment uses a Sigma filter as an edge-preserving blur; each pixel is replaced by a mean value of all pixels
(a) which are within a given distance of the target pixel, and (b) whose color differs from the color of the target pixel by less than a specified amount.
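A minimal sketch of such a Sigma filter follows; the radius and color-difference threshold are illustrative values, a square neighborhood is used to approximate "within a given distance", and a Euclidean RGB difference stands in for the unspecified color-difference measure. A production version would be vectorized for speed.

```python
import numpy as np

def sigma_filter(image, radius=2, threshold=30.0):
    """Edge-preserving blur: replace each pixel by the mean of nearby
    pixels whose color is within `threshold` of that pixel's own color."""
    img = image.astype(float)
    out = img.copy()
    h, w, _ = img.shape
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = img[y0:y1, x0:x1].reshape(-1, 3)
            # keep only neighbors whose color differs by less than the threshold;
            # the center pixel always qualifies, so the mean is never empty
            close = np.linalg.norm(window - img[y, x], axis=1) < threshold
            out[y, x] = window[close].mean(axis=0)
    return out.astype(np.uint8)
```

Because pixels across a strong edge fail the color test, the edge survives the blur while detail within roughly uniform regions is smoothed away.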
In step 40, the blurred, reduced-size image is decomposed into a set of cohesive regions or "blobs". In one embodiment, blobs of identical color are generated by reducing the number of different colors in the image to a small number. This is done according to one embodiment by using resampling techniques developed originally for computer video displays. Early computer video displays had a small palette of distinct displayable colors. In some such displays the number of displayable colors was limited to a value such as 64 or 256, but each color in this limited palette could be chosen at run time from millions of candidate colors. Hence, technologies were developed to reduce the set of perhaps millions of distinct colors in an image to a representative set of, say, 256 of these colors. One embodiment of this invention uses one such image resampling technique, a median-cut algorithm, as is described, for example, by James D. Foley, van Dam, Feiner and Hughes, in Computer Graphics, Principles and Practice, Addison-Wesley, 1995, at p. 600, the disclosure of which is hereby incorporated by reference. Although more colors can be used, in preferred aspects the number of colors should be less than about 10 to speed the subsequent image-match algorithm. Note that these colors can be completely different across images. The image is now divided into a set of areas of solid color, i.e., blobs. These blobs are catalogued using a flood-fill type algorithm, as is well-known in the art and which is described in Foley, van Dam, Feiner and Hughes, op. cit., pp. 979-980.
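The following sketch illustrates this embodiment, assuming Pillow's quantize routine (which performs a median-cut style palette reduction on an RGB image) is available; the color count and 4-connected flood fill are illustrative choices.

```python
from collections import deque

import numpy as np
from PIL import Image

def catalog_blobs(img, n_colors=8):
    """Reduce the palette of an RGB Pillow image (median-cut resampling),
    then flood-fill to catalog each connected same-color region as a blob.
    Returns a per-pixel blob label array and the total blob count."""
    small = img.quantize(colors=n_colors)       # median-cut palette reduction
    palette_idx = np.array(small)               # palette index per pixel
    h, w = palette_idx.shape
    blob_id = -np.ones((h, w), dtype=int)
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if blob_id[sy, sx] >= 0:
                continue
            queue = deque([(sy, sx)])           # flood fill from this seed
            blob_id[sy, sx] = next_id
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and blob_id[ny, nx] < 0
                            and palette_idx[ny, nx] == palette_idx[y, x]):
                        blob_id[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return blob_id, next_id
```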
An alternative embodiment for the blob-generation process of step 40 employs an adaptive color seed-fill algorithm, thus eliminating the need for image resampling. In this embodiment, the image is scanned pixel by pixel, left to right, top to bottom. The first pixel in the image, at the top left, is taken to be the first pixel of the first blob. The second pixel scanned is added to the first blob if it is sufficiently similar in color to the first pixel. Otherwise, it becomes the first pixel of a second blob. A pixel scanned subsequently is added to the blob enclosing one of its adjacent already-scanned neighbor pixels if its color is sufficiently similar to the color of the adjacent pixel. Otherwise it becomes the first pixel in a new blob. This algorithm is a variant of a seed-fill algorithm as is well-known in the art and as is described in Foley, van Dam, Feiner and Hughes, op. cit., pp. 979-980. This algorithm
varies from standard seed-fill algorithms in its adaptive property. Standard seed-fill algorithms cease adding pixels to an area when a pixel is encountered that fails a fixed test; e.g., the test might be that the pixel have a color not too different from black. The seed-fill algorithm used in this embodiment is adaptive in the sense that the test for inclusion of a pixel into the blob enclosing a neighbor pixel depends on the color of the neighbor pixel. Hence, because the colors of the pixels within a blob vary at this stage, the test for inclusion or exclusion of pixels adapts itself depending on the color of the target pixel. A result is that an original-image area of gradually-changing color (e.g., a vignette, gradient, or ramp) may be parsed by the blob-generating algorithm as a single blob. At the end of the blob generation step 40 the entire reduced-size image has been partitioned into a set of blobs; every pixel in the reduced-size image has been assigned to a blob.
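A sketch of this adaptive seed-fill variant follows; the color-difference threshold is an illustrative assumption. Because each pixel is tested against its already-scanned neighbor rather than against a fixed reference color, an area of gradually changing color can accrete into a single blob, as described above.

```python
import numpy as np

def adaptive_seed_fill(image, threshold=25.0):
    """Scan left-to-right, top-to-bottom; join a pixel to the blob of an
    already-scanned neighbor when its color is close to that neighbor's
    color; otherwise start a new blob. Returns labels and blob count."""
    img = image.astype(float)
    h, w, _ = img.shape
    blob_id = np.zeros((h, w), dtype=int)
    next_id = 0
    for y in range(h):
        for x in range(w):
            assigned = False
            for ny, nx in ((y, x - 1), (y - 1, x)):   # already-scanned neighbors
                if ny >= 0 and nx >= 0:
                    # adaptive test: compare against the neighbor's own color
                    if np.linalg.norm(img[y, x] - img[ny, nx]) < threshold:
                        blob_id[y, x] = blob_id[ny, nx]
                        assigned = True
                        break
            if not assigned:                          # first pixel of a new blob
                blob_id[y, x] = next_id
                next_id += 1
    return blob_id, next_id
```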
When the blob-generation step 40 is completed, step 45 is entered. Step 45 is used to ascertain whether a pre-specified set of criteria concerning the total number of blobs in the reduced-size image has been achieved. If so, flow of control passes to the blob-analysis step 50. If not, steps 30 and 40 are repeated, but with the parameters of the detail-removal and blob-generation algorithms modified so that on the subsequent pass through steps 30 and 40 the image will be decomposed into a smaller total number of blobs. For example, if the adaptive color seed-fill algorithm is used to generate blobs, then on each iteration through step 40 it may be programmed to be more liberal and less discriminating in the criteria it applies when deciding whether or not to add a given image pixel to an existing blob. The system is programmed to cycle through steps 30, 40 and 45 until the predetermined goal has been reached, or until a predetermined maximum number of cycles have been taken. Control then passes to step 50. On each iteration through steps 30 and 40, the total number of blobs in the decomposed image declines. (Strictly speaking, the number either declines or stays the same.) The goal of iterating over steps 30 and 40 is, roughly speaking, to reduce the number of blobs to a predetermined maximum number. In one embodiment, the number of blobs is preferably reduced to ten blobs. However, any number of blobs can be used. Additionally, in many embodiments it is efficient to impose a halting criterion that does not refer explicitly
to the target number of blobs into which the reduced-size image has been decomposed. For example, one such halting criterion is that the largest p blobs (for example, the largest 10 blobs) occupy an area equal to a pre-specified proportion (e.g., 75%) of the reduced-size image. If the minimal-perimeter adaptive seed-fill algorithm has been used for blob generation, there is no guarantee that each individual blob remaining in the image at the beginning of step 50 will be filled with pixels of identical color. However, it is required that at the beginning of step 50 a mean color for each blob be known. Hence, in this case, there may be an additional step between steps 45 and 50 comprising replacing the colors of pixels within each blob with the average of the colors of all pixels in the blob. Alternatively, the average color of each blob may be computed as the blob is constructed, so that the mean color of the blob is known before step 50 is entered.
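The iteration over steps 30, 40 and 45, with the area-coverage halting criterion just described, might be organized as in the following sketch. Here `remove_detail` stands in for the blur of step 30, `adaptive_seed_fill` is the routine sketched above, and the threshold schedule (loosening by 50% per pass) is an illustrative assumption.

```python
import numpy as np

def decompose_until_few_blobs(image, max_blobs=10, coverage=0.75, max_cycles=8):
    """Cycle detail removal (step 30) and blob generation (step 40),
    loosening the inclusion test each pass, until the largest `max_blobs`
    blobs cover `coverage` of the image or the cycle budget is spent."""
    threshold = 25.0
    blob_id = None
    for _ in range(max_cycles):
        blurred = remove_detail(image)                         # step 30
        blob_id, _ = adaptive_seed_fill(blurred, threshold)    # step 40
        areas = sorted(np.bincount(blob_id.ravel()), reverse=True)
        if sum(areas[:max_blobs]) >= coverage * blob_id.size:  # step 45 test
            break
        threshold *= 1.5          # be less discriminating on the next pass
    return blob_id
```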
Figure 4 shows two images after they have been reduced to 64-by-64 pixel resolution and decomposed into ten blobs each. Image 200 of Figure 4 is the blob image of the original lion cub image 300 shown in Figure 5 (and of thumbnail image 305 of Figure 5). Image 210 of
Figure 4 is the blob image of the great horned owl thumbnail image 310 shown in Figure 5. In step 50, the characteristics (e.g., color, size, center of gravity, moment of inertia, texture, etc.) of the blobs are determined, and a numerical view of the blobs is created. For efficiency, in one embodiment, step 50 is combined with step 40; i.e., the image statistics are in fact generated as the blobs are being generated.
The numerical view of each image created in step 50 is stored (usually, but not necessarily, in a database) in step 60.
Figure 4 shows statistics for the four largest (amongst ten generated) of the blobs in the lion cub image 200 and the owl image 210, and, in addition, other statistics generated after matching the two images. Column 0, headed "Match", enumerates the matches between the largest four blobs of the two images, in order, with the best match shown first. Column 1, headed "Blob", shows which blobs are matched in each Match. The first two entries in the "Blob" column as shown are zero and zero, indicating that the match is between blob 0 of image 0, background area 202 of lion cub image 200, and blob 0 of image 1, background area 212 of owl image 210. The next column, headed "ValA", shows an overall match score for the two blobs. The next column, headed "Val", shows a normalized match score, ValA divided by an Area measure, for the two blobs. The next column, headed "Area", shows the areas in pixels of the two blobs. Subsequent columns show the statistics summarized below (in each case the statistic characterizes a blob):
X: the X position of the center of gravity;
Y: the Y position of the center of gravity;
H: the hue (a color measure);
S: the saturation (a color measure);
V: the value (a color measure);
Xe: the X extent, in pixels;
Ye: the Y extent, in pixels;
Mo: the moment of inertia;
Ra: the minimum radius;
An: the angle from the horizontal of the major axis; and
Sk: the skewness.
The image statistics illustrated in Figure 4 exemplify one embodiment. Other embodiments will vary. The shared goal in the various embodiments is to include statistics measuring for each blob its size (Area in the example), location (X and Y in the example), color (H, S and V in the example), and shape (Area, Xe, Ye, Mo, Ra, An and Sk in the example). Other embodiments add to this list a set of measures of the textures of blobs.
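The per-blob statistics named above might be computed as in the following sketch; the minimum radius, major-axis angle and skewness are omitted for brevity, and the HSV conversion via the standard colorsys module is an illustrative choice.

```python
import colorsys

import numpy as np

def blob_statistics(image, mask):
    """Compute per-blob statistics for one blob. `image` is an H x W x 3
    uint8 RGB array; `mask` is a boolean array marking the blob's pixels."""
    ys, xs = np.nonzero(mask)
    area = len(xs)                                    # Area, in pixels
    cx, cy = xs.mean(), ys.mean()                     # center of gravity (X, Y)
    xe = xs.max() - xs.min() + 1                      # X extent, in pixels
    ye = ys.max() - ys.min() + 1                      # Y extent, in pixels
    r2 = (xs - cx) ** 2 + (ys - cy) ** 2
    moment = r2.mean()                                # moment of inertia about c.g.
    mean_rgb = image[mask].mean(axis=0) / 255.0       # mean blob color
    h, s, v = colorsys.rgb_to_hsv(*mean_rgb)          # H, S, V color measures
    return dict(Area=area, X=cx, Y=cy, H=h, S=s, V=v,
                Xe=xe, Ye=ye, Mo=moment)
```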
The above process of image statistics generation, as shown in box 15 of Figure 2, is repeated for each image desired to be stored.
After all information has been created, a user inputs a target image desired to be matched with the collection of stored images. The target image is analyzed as above. Figure 3 is a flowchart showing the process of analyzing and comparing the target image with a collection of stored images. The matching process breaks down into the following general steps as shown in Figure 3:
a. generate image statistics; b. obtain user requirements (e.g., color important, position important, etc.);
c. compare to stored images; and d. display results of best/closest match(es).
According to one embodiment, in step 110 of Figure 3 the target image is provided by selecting an image from a pre-existing collection of images. Alternatively, the target image is provided to the imaging system using a color scanner, digital camera, paint program, or the like.
Once the target image is provided, the target image is subjected in step 115 of Figure 3 to the same sequence of image statistics generation operations as were applied to database images in step 15 of Figure 2.
At step 150 of Figure 3, the numerical results of the statistic-generating step 115 are cached in computer memory for later comparison to the same statistics generated for images in the database, which statistics were stored in the database at step 60 of Figure 2.
In step 160, the specific requirements of the image processing system operator are obtained. The user has control in determining which search parameters are most important (e.g., whether color and/or location are the most important parameters when searching for matches). A set of sliders, such as are shown in Figure 6, is presented to the user to permit setting of the importance of various factors to be used in the comparison of the target image with candidate images. These factors include, for example: 1. The maximum number of candidate image matches to display (e.g., the
"Max Ids to Return" slider 400 in Figure 6).
2. The maximum number of blobs to compare (e.g., the "Max Blobs to Compare" slider 405 in Figure 6).
3. A measure of the importance of color in the match (e.g., the "Color Weight" slider 420 in Figure 6).
4. A measure of the importance of position in the match (e.g., the "Location Weight" slider 415 in Figure 6), affecting how the center of gravity parameter is weighted in the matching computation.
5. Measures of the importance of shape in the match (e.g., the "Area", "Extents", "Inertia", "Radius", "Angle", and "Skew" sliders 410, 425, 430, 435, 440 and 445, respectively, of Figure 6). These affect how the moment of inertia, the x and y extents, etc., are used in the match.
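A sketch of how such slider weights might enter the blob comparison follows; the statistic keys mirror the columns of Figure 4, while the normalizing ranges and the linear difference-to-similarity mapping are illustrative assumptions.

```python
# Illustrative normalizing ranges for each statistic, sized to a
# 64-by-64 reduced image; real values would be tuned empirically.
SCALE = {'X': 64.0, 'Y': 64.0, 'H': 1.0, 'S': 1.0, 'V': 1.0,
         'Xe': 64.0, 'Ye': 64.0, 'Area': 4096.0, 'Mo': 2048.0}

def blob_similarity(stats_a, stats_b, weights):
    """Combine per-statistic similarity scores (each in 0..1) into a
    single blob match score, weighted by the user's slider settings,
    e.g. weights = {'H': 2.0, 'X': 0.5, 'Area': 1.0, ...}."""
    score, total_weight = 0.0, 0.0
    for key, weight in weights.items():
        diff = abs(stats_a[key] - stats_b[key])
        component = max(0.0, 1.0 - diff / SCALE[key])  # 1 = identical, 0 = far apart
        score += weight * component
        total_weight += weight
    return score / total_weight if total_weight else 0.0
```

With all weights equal, this reduces to a plain mean of the component similarity scores.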
In step 170, once the statistics for the target image have been determined, the given target image is compared with all stored candidate images and a match score is generated for each pair of the form (target image, candidate image).
Figure 7 is a flow chart displaying the details of the "Compare with Images in Database" step 170 of Figure 3. An image match score for each pair of images is generated from the similarity scores of the paired blobs from the two images. Consequently, before an image match score is generated it is necessary to place all or some of the blobs from the two images into one-to-one correspondence. The correspondence is such that similar blobs from each image are paired with each other. In one preferred embodiment, 10 blobs are generated for each image, and four blobs from each image are placed in one-to-one correspondence with each other. The general rule is that if p blobs are generated for each image, then n blobs from each image, n ≤ p, are placed in one-to-one correspondence with each other. The former number p is the number of generated blobs in the image and the latter number n is the number of significant blobs in the image. The process of placing the significant blobs in one-to-one correspondence is shown as step 510 of Figure 7.
In step 510, the n significant blobs are placed in one-to-one correspondence. This requires as input a set of measures of the similarity of blob pairs. These measures are generated at step 500 of Figure 7, the first step in operation 170.
In step 500, match scores are developed for pairs of blobs. In one embodiment, match scores are generated for all p-by-p pairs of generated blobs, with each pair consisting of one generated blob from the target image and one generated blob from the candidate image. The set of n significant blobs (n ≤ p) to be placed in one-to-one correspondence is then chosen on the basis of these match scores: if the best (largest) match score matches blob i of the target image to blob j of the candidate image, then blob i from the target is one of the n significant blobs, as is blob j from the candidate. Target blob i and candidate blob j are then placed in one-to-one correspondence. This process is repeated until n blobs from the target have been placed in one-to-one correspondence with n blobs from the
candidate. In another embodiment, the n significant blobs to be matched from each image are chosen a priori to be the n largest blobs in the image and blob match scores are generated for only the n-by-n pairs of these blobs. In this latter embodiment the matching of blobs at step 510 is done on the basis of these n-by-n match scores: if the best match score matches blob i of the target image to blob j of the candidate image, then target blob i and candidate blob j are then placed in one-to-one correspondence. This process is repeated until all n significant blobs from the target have been placed in one-to-one correspondence with all n significant blobs from the candidate.
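The first embodiment's greedy placement of blobs in one-to-one correspondence might be sketched as follows, assuming a `blob_similarity` routine such as the one sketched earlier and per-blob statistics dictionaries as inputs.

```python
def pair_blobs(target_stats, candidate_stats, weights, n=4):
    """Greedily place the n best-matching blob pairs in one-to-one
    correspondence: repeatedly take the highest-scoring unused pair."""
    scores = {(i, j): blob_similarity(a, b, weights)
              for i, a in enumerate(target_stats)
              for j, b in enumerate(candidate_stats)}
    pairs, used_i, used_j = [], set(), set()
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if i not in used_i and j not in used_j:
            pairs.append((i, j, s))          # (target blob, candidate blob, score)
            used_i.add(i)
            used_j.add(j)
            if len(pairs) == n:
                break
    return pairs
```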
Figure 8 shows details of step 500, the process of generating match scores for pairs of blobs. First, at step 600, for each given pair of blobs, similarity scores are generated for each separate statistical component - that is, for each of the several measures which collectively measure the area, location, color, shape and texture of a blob. At step 610, an overall blob match score is generated from the individual component similarity scores. In some embodiments the individual component similarity scores share the same bounds (from 0 to 1, or from 0 to 100), and the overall blob match score is a measure of the mean of the individual component scores, either the arithmetic mean or the geometric mean or some other measure with the property of a mean. In one embodiment, the latter mean similarity score is weighted by the mean areas of the blobs being compared, so as to give a larger similarity score to paired large blobs. After n significant blobs from the target image have been placed in one-to-one correspondence with n significant blobs from the candidate image at step 510 of Figure 7, the overall image match score for the pair of images is generated at step 520 of Figure 7. The overall image match score is generated as a sum or mean (or other increasing function) of the n match scores for the n paired blobs in the one-to-one correspondence list of step 510. The blob match scores used are the same ones that were generated at step 500 of Figure 7.
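Building on `pair_blobs` above, the overall image match score of step 520 might be formed as an area-weighted mean of the paired blob scores, as in this sketch; the area weighting corresponds to the embodiment, described above, that gives larger similarity scores to paired large blobs.

```python
def image_match_score(target_stats, candidate_stats, weights, n=4):
    """Overall image match score: area-weighted mean of the paired blob
    scores, so that agreement between large blobs counts for more."""
    pairs = pair_blobs(target_stats, candidate_stats, weights, n)
    weighted, total_area = 0.0, 0.0
    for i, j, score in pairs:
        mean_area = (target_stats[i]['Area'] + candidate_stats[j]['Area']) / 2.0
        weighted += score * mean_area
        total_area += mean_area
    return weighted / total_area if total_area else 0.0
```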
The set of all candidate-image match scores is computed by the operation of step 170 of Figure 3, which comprises steps 500, 510 and 520 of Figure 7. When step 520 is completed the resulting set of candidate-image match scores is passed to the final step 180 of Figure 3.
In step 180, the system is programmed to display the candidate images in the database identified as having the best matches with the target image (e.g., the top 20, or the top 50, etc.) given the user's desired input requirements (i.e., parameter settings). If the results are not to the user's liking, the user is able to modify the input parameters and search again.
Figure 5 shows one set of displayed results, and Figure 4 shows the associated image and match statistics for one match. In the case illustrated by Figure 5 the goal was to match the lion cub image 300 with images in the database. The system returned the 20 best matches. Because the target lion cub image 300 is itself a member of the database, the best match is between the lion cub image and itself, as shown by the thumbnail lion cub image 305. The best non-trivial match is the second best overall match, between the target lion cub image 300 and the great horned owl image 310.
In conclusion, the present invention provides a simple, efficient solution to the problem of extracting images from a database "by example." While the above is a complete description of the preferred embodiments of the invention, various alternatives, modifications and equivalents may be used.
In one variant embodiment, the target image is not a photographic image but an image painted by the user using computer graphic painting software. In this embodiment the user who wants to find a lion cub in the image database first paints an image of the lion cub and then looks for matches to the painted image. The search for matching candidate images can be iterated as the painting progresses; a rough draft of the lion cub painting will yield a first set of matches. As detail is added other sets of matches will be found.
It is often useful to look for image matches based on image textures, such as the textures in fabrics, in grass, in sand, or in the bark of trees. Texture matching techniques
are often based on a spectral decomposition, such as can be obtained by a Fourier transformation, of areas of the image; texture matching can also be done by a process known as Wold decomposition, described in "A New Wold Ordering for Image Similarity," by Rosalind W. Picard and Fang Liu, Proc. IEEE Conf. on Acoustics, Speech, and Signal Processing, Adelaide, Australia, April 1994, pp. 129-132, the contents of which are hereby incorporated by reference for all purposes. Texture-based comparisons are introduced into this invention
in the following manner. Once the image has been decomposed into blobs, each such blob can be used as an index back into the original image. For example, in Figure 4, blob 202 is the body of the lion cub; blob 212 is the body of the owl. Areas of the original images 300 and 310 of Figure 5 corresponding to each such blob are found, and texture measures are computed over the indicated areas of the original images. The resulting texture measures are added to the set of blob-characterizing statistics, and a texture similarity score is computed for each blob pair. Referring to Figure 5, a texture comparison from, on the one hand, the bodies of the owl 310, serval 315, second lion cub 320 and puma 325 to, on the other hand, the body of the target lion cub 300 will reveal that the fur-to-fur comparisons between the cat-cat pairs yield greater similarity than the fur-to-feather comparison between the cat-owl pair.
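One simple spectral texture measure of the kind mentioned is sketched below; it is illustrative only and is not the Wold decomposition of the cited reference. The blob's bounding box in a grayscale version of the original, full-detail image is transformed with an FFT, and the spectral energy is summarized as low-, mid- and high-frequency proportions; the band boundaries are assumptions.

```python
import numpy as np

def texture_measure(original_gray, mask):
    """Spectral texture summary for one blob: FFT band-energy proportions
    over the blob's bounding box in the original (full-detail) image."""
    ys, xs = np.nonzero(mask)
    patch = original_gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    spectrum = np.abs(np.fft.fft2(patch - patch.mean()))   # remove DC first
    energy = spectrum ** 2
    h, w = energy.shape
    # radial frequency of each FFT cell, in cycles per pixel (0..~0.7)
    fy = np.minimum(np.arange(h), h - np.arange(h))[:, None] / h
    fx = np.minimum(np.arange(w), w - np.arange(w))[None, :] / w
    freq = np.hypot(fy, fx)
    bands = [energy[(freq >= lo) & (freq < hi)].sum()
             for lo, hi in ((0.0, 0.1), (0.1, 0.25), (0.25, 1.0))]
    total = sum(bands) or 1.0
    return [b / total for b in bands]   # low/mid/high energy proportions
```

Smooth regions concentrate energy in the low band, while fine textures such as fur or feathers shift energy toward the higher bands.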
Another variant embodiment modifies the image similarity score algorithm and then cycles through the image-comparison step 170 of Figure 3, culling the set of candidates to a smaller number on each pass. The very first comparisons between a target image and the set of candidate images may be a simple and fast culling operation using a relatively small set of image statistics over a small number of blobs per image, and basing image comparisons on relatively simple measures of differences between blobs. Such a first-pass culling operation can be used to reduce the number of candidate images from, for example, about 1,000,000 to about 100,000. A slightly more sophisticated set of tests is then used to reduce the set of candidate images to about 10,000, and so on, until a manageable number of candidate images, for example about 20, remain.
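This multi-pass culling variant might be organized as in the following sketch, where each pass applies a progressively more expensive scoring function to a progressively smaller pool; the pass schedule mirrors the example figures above and is an illustrative assumption.

```python
def cull_candidates(target, candidates, passes):
    """Multi-pass culling. `passes` is a list of (score_fn, keep_count)
    tuples ordered from cheapest to most expensive; each score_fn maps
    (target, candidate) to a similarity score."""
    pool = candidates
    for score_fn, keep in passes:
        ranked = sorted(pool, key=lambda c: score_fn(target, c), reverse=True)
        pool = ranked[:keep]   # e.g. 1,000,000 -> 100,000 -> 10,000 -> ... -> 20
    return pool
```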
Another variant of the invention bases the search for candidate images not on a single target image but on n = 2 or more target images. The candidate images are then the ones that match best to all n target images, as measured by the mean of all n matches, or the maximum or minimum of the n matches, or some compound of such measures.
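The multi-target variant reduces to combining the per-target scores, as sketched below; the choice of combining function (a mean, the minimum, or the maximum) is the one described above.

```python
def multi_target_score(targets, candidate, score_fn, combine=min):
    """Score a candidate against several target images and combine the
    scores; `combine` may be min, max, or a mean such as statistics.mean."""
    return combine(score_fn(t, candidate) for t in targets)
```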
Other variants of the invention use blob-comparison measures and image-comparison measures other than similarity measures. Comparison can be based on difference measures just as well as on similarity measures, because difference measures can be constructed as inverses of similarity measures. Comparison can also be based on propinquity measures, since two sets of numbers can be said to be similar to the extent that
they are close to each other. Comparison can also be based on distance measures just as well as on propinquity measures, since distance measures can be constructed as inverses of propinquity measures.
In light of the various alternatives, modifications and equivalents to the present invention, the above description should not be taken as limiting the scope of the invention, which is defined by the appended claims.