CN108805148B - Method of processing image and apparatus for processing image


Info

Publication number
CN108805148B
Authority
CN
China
Prior art keywords
image
model
image model
similarity
representative
Prior art date
Legal status
Active
Application number
CN201710295810.8A
Other languages
Chinese (zh)
Other versions
CN108805148A (en
Inventor
曹琼
刘汝杰
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201710295810.8A
Publication of CN108805148A
Application granted
Publication of CN108805148B

Classifications

    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/22 Matching criteria, e.g. proximity measures


Abstract

Exemplary embodiments disclosed herein relate to a method of processing images and an apparatus for processing images. According to the method, at least one image model is generated by clustering a plurality of images, wherein each image model is represented by a group of mutually similar images. If the number of images representing an image model exceeds a threshold, a visual dictionary is learned from the images representing the image model, and the image model is represented with the visual dictionary in place of those images.

Description

Method of processing image and apparatus for processing image
Technical Field
Exemplary embodiments disclosed herein relate to image processing. More particularly, the exemplary embodiments relate to automatic classification or identification of images.
Background
With the rapid development of fields such as digital products and the Internet, large volumes of image content are generated that urgently need to be analyzed, identified, organized, classified and retrieved. Effective identification of image information has therefore become a research hotspot in fields such as image processing, machine vision, pattern recognition, artificial intelligence and neuroscience, and image classification is an important research topic within it.
Image classification is an image processing technique that assigns images to different categories based on the characteristics reflected in the image information. Common image classification methods can be divided into supervised classification methods and unsupervised classification methods.
The difference between supervised and unsupervised classification lies in whether training data is used to obtain prior class knowledge. A supervised classification method selects feature parameters from the samples provided by a training data set, establishes a discriminant function, and classifies the images to be classified; it therefore depends on the selected training data. In contrast, an unsupervised classification method requires no prior knowledge and classifies only according to the natural clustering characteristics of the image data, which makes it simple to apply while still achieving high accuracy. One example of an unsupervised classification method is the K-means method.
Disclosure of Invention
According to one exemplary embodiment disclosed herein, a method of processing an image is provided. According to the method, at least one image model is generated by clustering a plurality of images, wherein each image model is represented by a group of mutually similar images among the plurality of images. If the number of images representing an image model exceeds a threshold, a visual dictionary is learned from the images representing the image model, and the image model is represented with the visual dictionary in place of those images.
According to another exemplary embodiment disclosed herein, a method of processing an image is provided. According to the method, the similarity between the image and at least one image model is calculated, and an image model whose similarity is higher than a similarity threshold is identified as the image model to which the image belongs. If one of the image models is a first type image model represented by at least one representative image, the similarity between the image and that image model is calculated based on the similarity between the image and the representative image. If one of the image models is a second type image model represented by a visual dictionary, the similarity between the image and that image model is calculated based on the similarities between the features of the image and the visual words of the visual dictionary.
According to another exemplary embodiment disclosed herein, there is provided an apparatus for processing an image, including at least one processor. The at least one processor is configured to perform a method according to the exemplary embodiments disclosed herein.
Further features and advantages of exemplary embodiments of the present invention, as well as the structure and operation of exemplary embodiments of the present invention, are described in detail below with reference to the accompanying drawings. It should be noted that the present invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Other embodiments will occur to those skilled in the relevant art based on the teachings contained herein.
Drawings
The exemplary embodiments disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIG. 1 is a flow chart for illustrating a method of generating an image model according to an exemplary embodiment;
FIG. 2 is a flow chart for explaining an image classification method according to an exemplary embodiment;
FIG. 3 is a flowchart for explaining a similarity calculation method according to an exemplary embodiment;
FIG. 4 is a flowchart for explaining a similarity calculation method according to another exemplary embodiment;
FIG. 5 is a flowchart for explaining an image classification method according to another exemplary embodiment;
FIG. 6 is pseudo code for explaining an image classification judgment algorithm according to an exemplary embodiment;
FIG. 7 is a flowchart for explaining an image model merging method according to an exemplary embodiment;
FIG. 8 is a flowchart for explaining an image classification method as a variation of the exemplary embodiment of FIG. 2;
FIG. 9 is a flowchart for explaining an image classification method as a variation of the exemplary embodiment of FIG. 5;
FIG. 10 is a flowchart for explaining an image model updating method according to an exemplary embodiment;
FIG. 11 is a block diagram illustrating an exemplary system for implementing aspects of the exemplary embodiments disclosed herein.
Detailed Description
Exemplary embodiments disclosed herein are described below with reference to the accompanying drawings. It should be noted that for the sake of clarity, representations and explanations relating to parts and processes known to a person skilled in the art but not related to the exemplary embodiments have been omitted from the drawings and the description.
As will be appreciated by one skilled in the art, aspects of the exemplary embodiments may be embodied as a system, method or computer program product. Thus, aspects of the exemplary embodiments may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware portions, all of which may generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the illustrative embodiments may take the form of a computer program product embodied on one or more computer-readable media having computer-readable program code embodied thereon. The computer program may be distributed, for example, over a computer network, or it may be located on one or more remote servers or embedded in the memory of the device.
Any combination of one or more computer-readable media may be used. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any suitable form, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied in a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the exemplary embodiments disclosed herein may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
Various aspects of the exemplary embodiments disclosed herein are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to exemplary embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
FIG. 1 is a flow chart illustrating a method 100 of generating an image model according to an exemplary embodiment.
As shown in fig. 1, method 100 begins at step 101. In step 103, a plurality of images I1 to IM are clustered to generate at least one image model O1 to ON, where N ≥ 1. Clustering methods group images according to their similarity, or their distance from each other, in a feature space. Clustering is performed so as to drive a chosen clustering criterion to an extreme value, thereby obtaining a clustering result for the images. Examples of clustering methods include, but are not limited to, iterative dynamic clustering algorithms (e.g., the C-means algorithm and the ISODATA algorithm) and non-iterative hierarchical clustering algorithms. Through clustering, the images I1 to IM are divided into different groups. The images in each group are similar to each other, or close to each other in the feature space. Such a group of images is called an image model. Each image model Oi can be represented by the images Ij ∈ Oi that it contains (also referred to as the representative images of the image model Oi). Such an image model is also referred to as a first type of image model in this disclosure. Each image model Oi can be stored as the images Ij it contains, or as the features of feature points extracted from each image Ij.
In step 105, it is determined whether the number of images in the current image model Ok, among the image models obtained in step 103, exceeds a threshold. If the number of images does not exceed the threshold, the method 100 proceeds to step 111. If the number of images exceeds the threshold, a visual dictionary is learned in step 107 from the images representing the image model Ok. For example, the visual dictionary of the image model Ok may be obtained by extracting features from the images representing the image model Ok and clustering the extracted features.
In one example, multiple attributes may be extracted as image features to embed different cues in the image model. For example, the extracted features may include scale-invariant feature transform (SIFT) features and/or Color Name (CN) features.
In one example in which the extracted features include color name features, the color name feature of a local block in the image may be calculated as the average of the color name features of all pixels within the local block. Taking an image retrieval system as an example, scale-invariant feature transform (SIFT) features, which describe local gradient distributions, are commonly used in image retrieval systems, and inverted indexes are employed to build bag-of-words (BoW) based image retrieval systems, where each entry corresponds to a visual word defined in a codebook of SIFT features. However, relying on SIFT features alone means that other characteristics of the image (e.g., color) are disregarded. This, together with the loss of information during quantization, results in many false-alarm matches. To enhance the discriminative ability of SIFT visual words, color name features may be employed, which assign an 11-dimensional vector to each pixel. Around each detected feature point, a local block with an area proportional to the feature point scale may be obtained. The CN vector of each pixel in this block is then calculated, and the average CN vector is taken as the color feature.
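To make the local-block color feature above concrete, the following sketch averages per-pixel CN vectors over a block around a feature point. It assumes the 11-dimensional per-pixel CN map has already been computed (the array cn_map is a hypothetical input, not something defined in this disclosure), and the proportionality factor alpha is an illustrative choice.

```python
import numpy as np

def block_color_feature(cn_map, center, scale, alpha=6.0):
    """Average the per-pixel color name (CN) vectors over a local block.

    cn_map : H x W x 11 array of per-pixel CN vectors (assumed precomputed).
    center : (row, col) of the detected feature point.
    scale  : feature point scale; the block side is proportional to it.
    """
    h, w, _ = cn_map.shape
    r, c = int(round(center[0])), int(round(center[1]))
    half = int(round(alpha * scale))              # block size proportional to the scale
    r0, r1 = max(0, r - half), min(h, r + half + 1)
    c0, c1 = max(0, c - half), min(w, c + half + 1)
    block = cn_map[r0:r1, c0:c1].reshape(-1, 11)
    return block.mean(axis=0)                     # average CN vector as the color feature
```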
In step 109, instead of representing the image model with the images it contains, the image model is represented with the learned visual dictionary. Such an image model is also referred to as a second type of image model in this disclosure. When the number of images contained in an image model is large, storing and processing the images or their features incurs a large overhead in practical applications. Representing the image model with a visual dictionary instead reduces this overhead.
In step 111, it is determined whether there is a next unprocessed image model among the image models obtained in step 103. If there is, it is set as the current image model and the method 100 returns to step 105. If there is not, the method 100 ends at step 113.
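The sketch below is one minimal way to realize the flow of FIG. 1 in Python, assuming each image is already described by a matrix of local descriptors (e.g., SIFT and/or CN features). The use of k-means, the pooling of local descriptors into a global descriptor, and the values of the threshold and dictionary size are illustrative assumptions rather than choices prescribed by this embodiment.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_image_models(image_features, threshold=50, dict_size=256):
    """Cluster images into image models (step 103) and, for large models,
    replace the representative images by a learned visual dictionary (steps 105-109).

    image_features : list of (L_i x D) arrays, one per image (local descriptors).
    Returns a list of models; each model is either
      ("representative", [image indices])  -- first type of image model, or
      ("dictionary", K x D visual words)   -- second type of image model.
    """
    # Step 103: cluster images by a global descriptor (here: mean of local features).
    global_desc = np.stack([f.mean(axis=0) for f in image_features])
    n_models = max(1, len(image_features) // 10)             # illustrative choice of N
    labels = KMeans(n_clusters=n_models, n_init=10, random_state=0).fit_predict(global_desc)

    models = []
    for m in range(n_models):
        members = [i for i, l in enumerate(labels) if l == m]
        if len(members) <= threshold:                         # step 105: small model
            models.append(("representative", members))
        else:                                                 # steps 107-109: large model
            feats = np.vstack([image_features[i] for i in members])
            k = min(dict_size, len(feats))
            words = KMeans(n_clusters=k, n_init=4, random_state=0).fit(feats).cluster_centers_
            models.append(("dictionary", words))
    return models
```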
FIG. 2 is a flow chart illustrating an image classification method 200 according to an exemplary embodiment. In the application scenario of this exemplary embodiment, the image models O1 to ON (N ≥ 1) represent different classifications, and the image model to which a query image q belongs is identified, so that the classification of the identified image model is taken as the classification of the query image q.
As shown in fig. 2, method 200 begins at step 201. In step 203, it is determined whether the current image model Ok among the image models O1 to ON is of the first type or the second type.
If the current image model Ok is a first type image model, the similarity between the query image q and the current image model Ok is calculated in step 205 based on the similarities between the query image q and the representative images of the current image model Ok.
If the current image model Ok is a second type image model, the similarity between the query image q and the current image model Ok is calculated in step 207 based on the similarities between the features of the query image q and the visual words of the visual dictionary of the current image model Ok.
After the similarity is calculated, it is determined in step 209 whether the calculated similarity is above a similarity threshold. If it is not, the method 200 proceeds to step 213. If the calculated similarity is higher than the similarity threshold, the corresponding current image model Ok is identified in step 211 as the image model to which the query image q belongs. The method 200 then proceeds to step 215.
In one example, the similarity may be measured using the Euclidean distance. In another example, the Hamming distance may be used to measure the similarity. A smaller distance indicates a higher degree of similarity.
In one example, the image features include CN features. Given that each dimension of a CN feature has a clear semantic meaning, each dimension of a CN feature may be binarized to produce a binary feature.
In one example, the similarities of all features (e.g., SIFT features and/or CN features) of the query image may be combined, for example by an arithmetic mean or a weighted sum, to obtain the final similarity.
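As an illustration of the measures mentioned above, the sketch below shows Euclidean and Hamming based similarities, per-dimension binarization of a CN vector, and a weighted combination of per-feature-type similarities. The distance-to-similarity mapping, the binarization threshold, and the weights are illustrative assumptions.

```python
import numpy as np

def binarize_cn(cn_vec, thresh=None):
    """Binarize each dimension of an 11-D color name vector (one bit per dimension)."""
    thresh = 1.0 / len(cn_vec) if thresh is None else thresh   # illustrative default threshold
    return (np.asarray(cn_vec) > thresh).astype(np.uint8)

def hamming_similarity(a, b):
    """Similarity derived from the Hamming distance between two binary vectors."""
    return 1.0 - np.count_nonzero(a != b) / len(a)             # smaller distance -> higher similarity

def euclidean_similarity(a, b):
    """Similarity derived from the Euclidean distance between two real-valued vectors."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(a) - np.asarray(b)))

def combined_similarity(sift_sim, cn_sim, w_sift=0.5, w_cn=0.5):
    """Fuse similarities of different feature types by a weighted sum (or use their arithmetic mean)."""
    return w_sift * sift_sim + w_cn * cn_sim
```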
In step 213, it is determined whether there is a next unprocessed image model among the image models O1 to ON. If there is, it is set as the current image model and the method 200 returns to step 203. If there is not, the method 200 ends at step 215.
In the method shown in fig. 2, the first current image model found in step 209 to have a similarity above the similarity threshold is identified as the image model to which the query image belongs. In a modified exemplary embodiment, after the similarities between the query image and all image models have been calculated, the image model with the highest similarity among those above the similarity threshold may be identified as the image model to which the query image belongs. In another modified embodiment, after all similarities have been calculated, the image models corresponding to at least two equal highest similarities among those above the threshold may all be identified as image models to which the query image belongs. In yet another modified embodiment, after all similarities have been calculated, the image models corresponding to at least two similarities that satisfy a predetermined proximity criterion with each other and are higher than the remaining similarities above the threshold may be identified as image models to which the query image belongs.
Fig. 3 is a flowchart for explaining the similarity calculation method of step 205 according to an exemplary embodiment.
As shown in fig. 3, method 300 begins at step 301. In step 303, feature points p1 to pL are identified in the query image q and the features of these feature points are extracted. In step 305, for the current feature point pt of the query image q, a feature point is selected from a representative image of the current image model Ok such that the degree of closeness between the feature of pt and the feature of the selected feature point satisfies a predetermined requirement, for example the similarity being above a threshold, the distance being below a threshold, the similarity being the highest, or the distance being the lowest. The similarity St,k between the feature of pt and the feature of the correspondingly selected feature point is then calculated.
In step 307, it is determined whether there is a next unprocessed feature point among the feature points of the query image q. If there is, processing switches to that feature point and the method 300 returns to step 305. If there is not, the similarity Sk between the query image q and the current image model Ok is calculated in step 309 based on the similarities S1,k to SL,k between the features of the feature points p1 to pL of the query image q and the features of the correspondingly selected feature points. For example, the similarity Sk can be calculated as an arithmetic mean or a weighted sum of the similarities S1,k to SL,k of the individual feature points.
In step 311, it is determined whether there is a next unprocessed image model among the image models O1 to ON. If there is, processing switches to that image model and the method 300 returns to step 305. If there is not, the method ends at step 313.
Those skilled in the art will appreciate that the feature extraction operation of step 303 may also be performed outside the flow of method 300, for example, at a time prior to the similarity calculation operation of step 205 in the flow of method 200.
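A minimal sketch of the per-feature-point matching of method 300: each query feature is matched to its closest feature among the representative images (nearest neighbour under Euclidean distance as one possible closeness requirement), and the per-point similarities S1,k to SL,k are averaged. The distance-to-similarity conversion is an illustrative assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist

def similarity_to_first_type_model(query_feats, representative_feats):
    """Similarity between a query image and a first-type image model.

    query_feats          : L x D features of the query feature points p1..pL.
    representative_feats : R x D features pooled from all representative images of the model.
    """
    d = cdist(query_feats, representative_feats)   # pairwise Euclidean distances
    nearest = d.min(axis=1)                        # step 305: closest representative feature per point
    per_point_sim = 1.0 / (1.0 + nearest)          # S_{t,k} for each query feature point
    return per_point_sim.mean()                    # step 309: arithmetic mean (a weighted sum also works)
```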
Fig. 4 is a flowchart for explaining the similarity calculation method of step 207 according to an exemplary embodiment.
As shown in fig. 4, the method 400 begins at step 401. In step 403, feature points p1 to pL are identified in the query image q and the features of these feature points are extracted. In step 405, for the current feature point pt of the query image q, a visual word is selected from the visual dictionary of the current image model Ok such that the degree of closeness between the feature of pt and the selected visual word satisfies a predetermined requirement, for example the similarity being above a threshold, the distance being below a threshold, the similarity being the highest, or the distance being the lowest. The similarity St,k between the feature of pt and the correspondingly selected visual word is then calculated.
In step 407, it is determined whether there is a next unprocessed feature point among the feature points of the query image q. If there is, processing switches to that feature point and the method 400 returns to step 405. If there is not, the similarity Sk between the query image q and the current image model Ok is calculated in step 409 based on the similarities S1,k to SL,k between the features of the feature points p1 to pL of the query image q and the correspondingly selected visual words. For example, the similarity Sk can be calculated as an arithmetic mean or a weighted sum of the similarities S1,k to SL,k of the individual feature points.
In step 411, it is determined whether there is a next unprocessed image model among the image models O1 to ON. If there is, processing switches to that image model and the method 400 returns to step 405. If there is not, the method ends at step 413.
Those skilled in the art will appreciate that the feature extraction operation of step 403 may also be performed outside the flow of method 400, for example, at a time prior to the similarity calculation operation of step 207 in the flow of method 200.
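A corresponding sketch of method 400, in which the query features are matched against the visual words of the model's visual dictionary rather than against representative-image features; the same illustrative distance-to-similarity mapping is assumed.

```python
import numpy as np
from scipy.spatial.distance import cdist

def similarity_to_second_type_model(query_feats, visual_words):
    """Similarity between a query image and a second-type image model (visual dictionary).

    query_feats  : L x D features of the query feature points p1..pL.
    visual_words : K x D visual words of the model's visual dictionary.
    """
    d = cdist(query_feats, visual_words)
    nearest = d.min(axis=1)                 # step 405: closest visual word per feature point
    per_point_sim = 1.0 / (1.0 + nearest)   # S_{t,k}
    return per_point_sim.mean()             # step 409: arithmetic mean (or weighted sum)
```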
When the image model to which the query image belongs is identified from the similarities between the query image and the image models, it may happen that the similarities between the query image and several image models are all high and close to one another. In such a case, these image models may all be identified as image models to which the query image belongs, as in the embodiment described in conjunction with fig. 2, or they may be merged into one image model and the image model to which the query image belongs identified among the merged image models.
FIG. 5 is a flow diagram illustrating a method 500 of image classification, including a process of merging image models, according to an example embodiment.
As shown in fig. 5, method 500 begins at step 501. In step 503, it is determined whether the current image model Ok among the image models O1 to ON is of the first type or the second type.
If the current image model Ok is a first type image model, the similarity between the query image q and the current image model Ok is calculated in step 505 based on the similarities between the query image q and the representative images of the current image model Ok.
If the current image model Ok is a second type image model, the similarity between the query image q and the current image model Ok is calculated in step 507 based on the similarities between the features of the query image q and the visual words of the visual dictionary of the current image model Ok.
After the similarity is calculated, it is determined in step 509 whether there is a next unprocessed image model among the image models O1 to ON. If there is, it is set as the current image model and the method 500 returns to step 503. If there is not, it is determined in step 511 whether there are at least two image models whose similarities are high and whose degree of closeness satisfies a predetermined condition, for example whether, among the similarities above the similarity threshold, there are at least two that satisfy a predetermined proximity criterion with each other and are higher than the remaining similarities. If no such similarities exist, the method 500 proceeds to step 515. If such at least two similarities exist, the corresponding image models are merged into one image model in step 513, and the similarity between the query image and the merged image model is calculated.
At step 515, the image model with the highest similarity above the threshold is identified as the image model to which the query image belongs. The method 500 then ends at step 517.
In the process of step 513, the merged image model may instead be directly identified as the image model to which the query image belongs, without calculating the similarity between the query image and the merged image model and without executing step 515.
FIG. 6 is pseudo code for illustrating an image classification decision algorithm according to an exemplary embodiment, in which a specific example of merging and recognition logic is provided.
In the example shown in FIG. 6, assume there are a query image q and n existing image models. The calculated similarities between the query image and the image models are arranged in descending order as Sk1, Sk2, ..., Skn, where kj denotes the index of the corresponding image model. In the example shown in fig. 6, the degree of closeness of two similarities is measured by the ratio R between them. If the ratio R is greater than a threshold th2, the similarities are determined not to be close; otherwise, they are determined to be close. In the example shown in fig. 6, if the two highest similarities are found to be close, the corresponding image models are merged, and then the iterative process is restarted.
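The pseudo code of FIG. 6 is not reproduced here; the sketch below gives one possible reading of the merge-and-identify logic just described. The top two similarities are compared through their ratio R against the threshold th2, close models are merged, and the iteration restarts. The callables compute_similarity and merge_models are placeholders for the similarity and merging procedures described elsewhere in this disclosure.

```python
def classify_query(query, models, sim_threshold, th2, compute_similarity, merge_models):
    """Iterative classification with model merging, one possible reading of FIG. 6.

    compute_similarity(query, model) -> float   (placeholder for methods 300/400)
    merge_models(model_a, model_b)   -> model   (placeholder for method 700)
    """
    while True:
        sims = sorted(((compute_similarity(query, m), i) for i, m in enumerate(models)),
                      reverse=True)                            # S_k1 >= S_k2 >= ... >= S_kn
        s1, i1 = sims[0]
        if len(sims) > 1:
            s2, i2 = sims[1]
            ratio_r = s1 / s2 if s2 > 0 else float("inf")
            if ratio_r <= th2:                                 # R not greater than th2: the two are "close"
                merged = merge_models(models[i1], models[i2])
                models = [m for j, m in enumerate(models) if j not in (i1, i2)] + [merged]
                continue                                       # restart with the merged model set
        return models[i1] if s1 > sim_threshold else None      # highest similarity above threshold wins
```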
FIG. 7 is a flowchart illustrating an image model merging method 700 according to an example embodiment.
As shown in fig. 7, method 700 begins at step 701. In step 703, it is determined whether one of the at least two image models to be merged is an image model of the second type (represented by a visual dictionary). If neither of the image models to be merged is of the second type, the method 700 proceeds to step 709. If one of the image models to be merged is of the second type, a visual dictionary representing the merged image model is learned in step 705 from the representations of the image models to be merged. The representation of an image model is either its representative images or a visual dictionary. If the image models to be merged are all represented by representative images, features of feature points are extracted from these representative images and the extracted features are clustered to learn the visual dictionary representing the merged image model. If the image models to be merged are all represented by visual dictionaries, the visual words of these visual dictionaries are clustered to learn the visual dictionary representing the merged image model. If the image models to be merged include both a model represented by a visual dictionary and a model represented by representative images, features of feature points are extracted from the representative images, and the extracted features together with the visual words of the visual dictionary are clustered to learn the visual dictionary representing the merged image model. The merged image model is then represented by the learned visual dictionary in step 707. The method 700 then ends at step 713.
At step 709, it is determined whether the number of representative images of the image models to be merged exceeds a threshold. If the number of representative images of the image models to be merged exceeds a threshold, the method 700 proceeds to step 705. If the number of representative images of the image models to be merged does not exceed the threshold, the merged image model is represented with the representative images of the image models to be merged in step 711. The method 700 then ends at step 713.
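A sketch of the merging rules of FIG. 7, reusing the illustrative model representation from the generation sketch above (a tuple of either representative image indices or visual words). The dictionary size, the threshold, and the use of k-means are assumptions, not requirements of the embodiment.

```python
import numpy as np
from sklearn.cluster import KMeans

def merge_models(model_a, model_b, image_features, threshold=50, dict_size=256):
    """Merge two image models following the rules of FIG. 7.

    Models use the representation from the generation sketch above:
    ("representative", [image indices]) or ("dictionary", K x D visual words).
    """
    kinds = {model_a[0], model_b[0]}
    if kinds == {"representative"}:
        members = model_a[1] + model_b[1]
        if len(members) <= threshold:                     # step 711: keep the representative images
            return ("representative", members)
        feats = np.vstack([image_features[i] for i in members])
    else:
        # Step 705: pool whatever represents each model (descriptors and/or visual words).
        parts = []
        for kind, data in (model_a, model_b):
            if kind == "representative":
                parts.append(np.vstack([image_features[i] for i in data]))
            else:
                parts.append(data)
        feats = np.vstack(parts)
    k = min(dict_size, len(feats))
    words = KMeans(n_clusters=k, n_init=4, random_state=0).fit(feats).cluster_centers_
    return ("dictionary", words)                          # step 707: merged model uses a visual dictionary
```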
Determining the image model to which the query image belongs is a basic function of image classification. Furthermore, the image model may also be updated from the query image. In the case where an image model to which the query image belongs is identified, the update may be performed by incorporating the query image into the identified image model.
Fig. 8 is a flowchart for explaining an image classification method 800 as a variation on the exemplary embodiment of fig. 2.
As shown in fig. 8, method 800 begins at step 801. In step 803, it is determined whether the current image model Ok among the image models O1 to ON is of the first type or the second type.
If the current image model Ok is a first type image model, the similarity between the query image q and the current image model Ok is calculated in step 805 based on the similarities between the query image q and the representative images of the current image model Ok.
If the current image model Ok is a second type image model, the similarity between the query image q and the current image model Ok is calculated in step 807 based on the similarities between the features of the query image q and the visual words of the visual dictionary of the current image model Ok.
After the similarity is calculated, it is determined in step 809 whether the calculated similarity is above a similarity threshold. If it is not, the method 800 proceeds to step 813. If the calculated similarity is higher than the similarity threshold, the corresponding current image model Ok is identified in step 811 as the image model to which the query image q belongs. In step 815, the identified image model is updated by incorporating the query image into it. The method 800 then proceeds to step 817.
In step 813, it is determined whether there is a next unprocessed image model among the image models O1 to ON. If there is, it is set as the current image model and the method 800 returns to step 803. If there is not, the method 800 ends at step 817.
In a variant of the exemplary embodiment of the image classification method described above, if no image model to which the query image belongs is identified, a new image model may be built and identified as the image model to which the query image belongs. This new image model takes the query image as the representative image.
Fig. 9 is a flowchart for explaining an image classification method 900 as a variation on the exemplary embodiment of fig. 5.
As shown in fig. 9, method 900 begins at step 901. In step 903, it is determined whether the current image model Ok among the image models O1 to ON is of the first type or the second type.
If the current image model Ok is a first type image model, the similarity between the query image q and the current image model Ok is calculated in step 905 based on the similarities between the query image q and the representative images of the current image model Ok.
If the current image model Ok is a second type image model, the similarity between the query image q and the current image model Ok is calculated in step 907 based on the similarities between the features of the query image q and the visual words of the visual dictionary of the current image model Ok.
After the similarity is calculated, it is determined in step 909 whether there is a next unprocessed image model among the image models O1 to ON. If there is, it is set as the current image model and the method 900 returns to step 903. If there is not, it is determined in step 911 whether there are at least two image models whose similarities are high and whose degree of closeness satisfies a predetermined condition, for example whether, among the similarities above the similarity threshold, there are at least two that satisfy a predetermined proximity criterion with each other and are higher than the remaining similarities. If no such similarities exist, the method 900 proceeds to step 915. If such at least two similarities exist, the corresponding image models are merged into one image model in step 913, and the similarity between the query image and the merged image model is calculated.
At step 915, the image model with the highest similarity above the threshold is identified as the image model to which the query image belongs. In step 917, the identified image model is updated by incorporating the query image into the identified image model. Method 900 then ends at step 919.
In the process of step 913, the merged image model may also be directly identified as the image model to which the query image belongs after the merging, and the method proceeds to step 917.
FIG. 10 is a flowchart illustrating an image model update method 1000 according to an example embodiment.
As shown in fig. 10, method 1000 begins at step 1001. In step 1003, it is determined whether the image model to be updated is an image model of the second type (represented by a visual dictionary). If it is not, the method 1000 proceeds to step 1009. If it is, features of feature points are extracted from the query image in step 1005, and the extracted features together with the visual words of the visual dictionary representing the image model to be updated are clustered to learn a visual dictionary representing the updated image model. If the image model to be updated is of the first type (represented by representative images), features of feature points are extracted from the query image and the representative images, and the extracted features are clustered to learn a visual dictionary representing the updated image model. The updated image model is then represented by the learned visual dictionary in step 1007. Method 1000 then ends at step 1013.
In step 1009, it is determined whether the number of representative images of the image model to be updated, plus one, exceeds a threshold. If it does, the method 1000 proceeds to step 1005. If it does not, the updated image model is represented in step 1011 by the query image together with the representative images of the image model to be updated. The method 1000 then ends at step 1013.
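A sketch of the update rules of FIG. 10, again using the illustrative model representation from the earlier sketches. The query image's local features are assumed to have been extracted beforehand; the threshold, dictionary size, and use of k-means are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def update_model(model, query_index, query_feats, image_features, threshold=50, dict_size=256):
    """Update an image model with a newly classified query image (FIG. 10).

    model       : ("representative", [image indices]) or ("dictionary", K x D visual words).
    query_index : index of the query image, used when it is kept as a representative image.
    query_feats : L x D local features extracted from the query image.
    """
    kind, data = model
    if kind == "representative" and len(data) + 1 <= threshold:
        return ("representative", data + [query_index])      # step 1011: add the query as a representative image
    if kind == "representative":                              # step 1009 -> 1005: too many representative images
        feats = np.vstack([image_features[i] for i in data] + [query_feats])
    else:                                                     # step 1005: cluster query features with visual words
        feats = np.vstack([data, query_feats])
    k = min(dict_size, len(feats))
    words = KMeans(n_clusters=k, n_init=4, random_state=0).fit(feats).cluster_centers_
    return ("dictionary", words)                              # step 1007: updated model uses a visual dictionary
```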
FIG. 11 is a block diagram illustrating an exemplary system for implementing aspects of the exemplary embodiments disclosed herein.
In fig. 11, a Central Processing Unit (CPU)1101 performs various processes in accordance with a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage section 1108 to a Random Access Memory (RAM) 1103. In the RAM 1103, data necessary when the CPU 1101 executes various processes and the like is also stored as necessary.
The CPU 1101, ROM 1102, and RAM 1103 are connected to each other via a bus 1104. An input/output interface 1105 is also connected to bus 1104.
The following components are connected to the input/output interface 1105: an input portion 1106 including a keyboard, mouse, and the like; an output portion 1107 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet.
A drive 1110 is also connected to the input/output interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
In the case where the above-described steps and processing are implemented by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 1111.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The foregoing description of the invention has been presented for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise form disclosed. It will be apparent to those skilled in the art that many modifications and variations can be made in the present invention without departing from the scope and spirit thereof. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The following exemplary embodiments (each denoted as a "supplementary note") are also described herein.
Supplementary note 1. a method of processing an image, comprising:
generating at least one image model by clustering a plurality of images, wherein each image model is represented by images that are similar to each other in the images; and
if the number of images representing an image model exceeds a threshold, a visual dictionary is learned from the images representing the image model and the image model is represented with the visual dictionary in place of the images representing the image model.
Supplementary notes 2. the method of supplementary notes 1, wherein the learning is based on features extracted from an image representing the image model, and the features include scale-invariant feature transform features and/or color name features.
Supplementary note 3. The method of supplementary note 2, wherein in the case where the feature includes a color name feature, the color name feature of a local block is calculated as an average of the color name features of all pixels within the local block.
Supplementary note 4. A method of processing an image, comprising:
calculating a similarity between the image and at least one image model; and
identifying an image model corresponding to a higher similarity above a similarity threshold as an image model to which the image belongs,
wherein if one of said image models is a first type image model represented by at least one representative image, a similarity between said image and said image model is calculated based on a similarity between said image and said representative image, and
if one of the image models is a second type of image model represented by a visual dictionary, calculating a similarity between the image and the image model based on a similarity between features of the image and visual words of the visual dictionary.
Supplementary note 5. The method of supplementary note 4, wherein the features on which the visual dictionary is based include scale-invariant feature transform features and/or color name features.
Supplementary notes 6. the method of supplementary notes 4, wherein the calculation of the similarity comprises:
if one of said image models is a first type of image model, then
For each feature point of the image, selecting a feature point in a representative image from the representative images of the image model, wherein the closeness degree between the features of the feature point of the image and the features of the feature point of the selected representative image meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and features of the respectively selected feature points.
Supplementary note 7. the method of supplementary note 4, wherein the calculation of the similarity comprises:
if one of said image models is an image model of the second type, then
Selecting a visual word from a visual dictionary of the image model for each feature point of the image, wherein the degree of closeness between the features of the feature points of the image and the selected visual word meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and the respective selected visual words.
Supplementary note 8. The method of supplementary note 6 or 7, wherein the similarity is calculated as a weighted sum of the similarities of the features of the respective feature points.
Supplementary note 9. The method of supplementary note 6 or 7, wherein the features on which the calculation of the similarity is based comprise scale invariant feature transform features and/or color name features.
Supplementary note 10. The method of supplementary note 4, wherein said identifying comprises:
if at least two image models have higher similarity and the similarity meets a preset condition, combining the at least two image models into one image model; and
identifying the merged image model as the image model to which the image belongs,
wherein if one of the at least two image models is represented by a visual dictionary, the merging comprises:
a visual dictionary representing the merged image model is learned from the representations of the at least two image models.
Supplementary notes 11. the method of supplementary notes 10, wherein if the at least two image models are both represented by representative images and the number of representative images exceeds a threshold, the merging comprises:
a visual dictionary representing the merged image model is learned from a representative image representation representing the at least two image models.
Supplementary note 12. The method of supplementary note 10, wherein if the at least two image models are both represented by representative images and the number of representative images does not exceed a threshold, the merging comprises:
the merged image model is represented by a representative image representing the at least two image models.
Supplementary note 13. the method of supplementary note 4, 10, 11 or 12, further comprising:
if the recognized image model is represented by a visual dictionary, the visual dictionary representing the recognized image model is learned from the visual dictionary representing the recognized image model and the image.
Supplementary notes 14. the method of supplementary notes 4, 10, 11 or 12, further comprising:
if the identified image model is represented by a representative image and the total number of the representative image and the images exceeds a threshold, a visual dictionary is learned from the representative image and the images in place of the representative image representing the identified image model.
Supplementary note 15. the method of supplementary note 4, 10, 11 or 12, further comprising:
representing the identified image model with a representative image and the image if the identified image model is represented by the representative image and a total number of the representative image and the image does not exceed a threshold.
Supplementary note 16. An apparatus for processing an image, comprising:
at least one processor configured to:
generating at least one image model by clustering a plurality of images, wherein each image model is represented by images that are similar to each other in the images; and
if the number of images representing an image model exceeds a threshold, a visual dictionary is learned from the images representing the image model and the image model is represented with the visual dictionary in place of the images representing the image model.
Supplementary note 17. The apparatus of supplementary note 16, wherein the learning is based on features extracted from an image representing the image model, and the features include scale-invariant feature transform features and/or color name features.
Supplementary notes 18. the apparatus of supplementary notes 17, wherein in the event that the features include color name features, the color name features of a local block are calculated as the mean of the color name features of all pixels within the local block.
Supplementary note 19. an apparatus for processing an image, comprising:
at least one processor configured to:
calculating a similarity between the image and at least one image model; and
identifying an image model corresponding to a higher similarity above a similarity threshold as an image model to which the image belongs,
wherein if one of said image models is a first type image model represented by at least one representative image, a similarity between said image and said image model is calculated based on a similarity between said image and said representative image, and
if one of the image models is a second type of image model represented by a visual dictionary, calculating a similarity between the image and the image model based on a similarity between features of the image and visual words of the visual dictionary.
Supplementary note 20. The apparatus of supplementary note 19, wherein the features on which the visual dictionary is based comprise scale-invariant feature transform features and/or color name features.
Supplementary notes 21. the apparatus of supplementary notes 19, wherein the calculation of the similarity comprises:
if one of said image models is a first type of image model, then
For each feature point of the image, selecting a feature point in a representative image from the representative images of the image model, wherein the closeness degree between the features of the feature point of the image and the features of the feature point of the selected representative image meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and features of the respectively selected feature points.
Supplementary notes 22. the apparatus of supplementary notes 19, wherein the calculation of the similarity comprises:
if one of said image models is an image model of the second type, then
Selecting a visual word from a visual dictionary of the image model for each feature point of the image, wherein the degree of closeness between the features of the feature points of the image and the selected visual word meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and the respective selected visual words.
Supplementary notes 23. the apparatus as claimed in supplementary notes 21 or 22, wherein the similarity is calculated as a weighted sum of the similarities of the features of the respective feature points.
Supplementary notes 24. the apparatus of supplementary notes 21 or 22, wherein the features on which the calculation of similarity is based include scale-invariant feature transform features and/or color name features.
Supplementary note 25. The apparatus of supplementary note 19, wherein said identifying comprises:
if at least two image models have higher similarity and the similarity meets a preset condition, combining the at least two image models into one image model; and
identifying the merged image model as the image model to which the image belongs,
wherein if one of the at least two image models is represented by a visual dictionary, the merging comprises:
a visual dictionary representing the merged image model is learned from the representations of the at least two image models.
Supplementary notes 26. the apparatus of supplementary notes 25, wherein if the at least two image models are each represented by representative images and the number of representative images exceeds a threshold, the merging comprises:
a visual dictionary representing the merged image model is learned from a representative image representation representing the at least two image models.
Supplementary note 27. The apparatus of supplementary note 25, wherein if the at least two image models are both represented by representative images and the number of representative images does not exceed a threshold, the merging comprises:
the merged image model is represented by a representative image representing the at least two image models.
Supplementary note 28. the apparatus of supplementary note 19, 25, 26, or 27, wherein the processor is further configured to:
if the recognized image model is represented by a visual dictionary, the visual dictionary representing the recognized image model is learned from the visual dictionary representing the recognized image model and the image.
Supplementary note 29. The apparatus of supplementary note 19, 25, 26, or 27, wherein the processor is further configured to:
if the identified image model is represented by a representative image and the total number of the representative image and the images exceeds a threshold, a visual dictionary is learned from the representative image and the images in place of the representative image representing the identified image model.
Supplementary note 30 the apparatus of supplementary note 19, 25, 26, or 27, wherein the processor is further configured to:
representing the identified image model with a representative image and the image if the identified image model is represented by the representative image and a total number of the representative image and the image does not exceed a threshold.

Claims (10)

1. A method of processing an image, the method being based on two types of image models and comprising:
calculating a similarity between the image and at least one image model; and
identifying an image model corresponding to a higher similarity above a similarity threshold as an image model to which the image belongs,
wherein if one of said image models is a first type image model represented by at least one representative image, a similarity between said image and said image model is calculated based on a similarity between said image and said representative image, and
if one of the image models is a second type of image model represented by a visual dictionary, calculating a similarity between the image and the image model based on a similarity between features of the image and visual words of the visual dictionary.
2. The method of claim 1, wherein the calculating of the similarity comprises:
if one of the image models is a first-type image model, then
for each feature point of the image, selecting a feature point from the representative images of the image model, wherein the degree of closeness between the features of the feature point of the image and the features of the selected feature point meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and features of the respectively selected feature points.
3. The method of claim 1, wherein the calculating of the similarity comprises:
if one of the image models is a second-type image model, then
for each feature point of the image, selecting a visual word from the visual dictionary of the image model, wherein the degree of closeness between the features of the feature point of the image and the selected visual word meets a preset requirement; and
calculating a similarity between the image and the image model based on a similarity between features of respective feature points of the image and the respective selected visual words.
4. The method of claim 1, wherein the identifying comprises:
if at least two image models have a higher similarity and the similarities meet a preset condition, merging the at least two image models into one image model; and
identifying the merged image model as the image model to which the image belongs,
wherein if one of the at least two image models is represented by a visual dictionary, the merging comprises:
learning a visual dictionary representing the merged image model from the representations of the at least two image models.
5. The method of claim 4, wherein if the at least two image models are each represented by representative images and the number of the representative images exceeds a threshold, the merging comprises:
learning a visual dictionary representing the merged image model from the representative images representing the at least two image models (an illustrative model-merging sketch follows the claims).
6. The method of claim 1, 4 or 5, further comprising:
if the identified image model is represented by a visual dictionary, learning the visual dictionary representing the identified image model from the existing visual dictionary and the image.
7. The method of claim 1, 4 or 5, further comprising:
if the identified image model is represented by representative images and the total number of the representative images and the image exceeds a threshold, learning a visual dictionary from the representative images and the image to represent the identified image model in place of the representative images (see the model-update sketch after the claims).
8. The method of claim 1, 4 or 5, further comprising:
if the identified image model is represented by representative images and the total number of the representative images and the image does not exceed a threshold, representing the identified image model with the representative images and the image.
9. An apparatus for processing an image, comprising:
at least one processor configured to perform the method of any one of claims 1 to 8.
10. A method of processing an image, comprising:
generating at least one image model by clustering a plurality of images, wherein each image model is represented by those of the images that are similar to each other; and
if the number of images representing an image model exceeds a threshold, learning a visual dictionary from the images representing the image model and representing the image model with the visual dictionary in place of the images representing the image model (an illustrative clustering sketch appears after the claims).
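
A minimal sketch of the identification and model-update flow of claims 1 and 6 to 8, under assumed data layouts: a first-type image model is stored as a list of per-image descriptor arrays, a second-type image model as a k-means visual dictionary, and the threshold values, helper names, and the choice of k-means are illustrative assumptions, not the patented implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

MAX_REPRESENTATIVES = 5     # threshold of claims 7/8 (assumed value)
SIMILARITY_THRESHOLD = 0.6  # similarity threshold of claim 1 (assumed value)

def learn_visual_dictionary(descriptor_sets, n_words=64):
    """Learn visual words (cluster centroids) from descriptors of several images."""
    descriptors = np.vstack(descriptor_sets)
    k = min(n_words, len(descriptors))
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors).cluster_centers_

def model_similarity(image_desc, model):
    """Average best-match cosine similarity of the image's descriptors against
    the model's representative-image descriptors or its visual words."""
    if model["type"] == "representative":             # first-type model
        candidates = np.vstack(model["images"])
    else:                                              # second-type model
        candidates = model["dictionary"]
    a = image_desc / (np.linalg.norm(image_desc, axis=1, keepdims=True) + 1e-12)
    b = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-12)
    return float((a @ b.T).max(axis=1).mean())

def identify_and_update(image_desc, models):
    """Assign the image to the most similar model (claim 1) and update that
    model (claims 6-8); start a new single-image model if nothing matches."""
    scored = [(model_similarity(image_desc, m), m) for m in models]
    best_score, best = max(scored, key=lambda s: s[0]) if scored else (0.0, None)
    if best is None or best_score < SIMILARITY_THRESHOLD:
        models.append({"type": "representative", "images": [image_desc]})
        return models[-1]
    if best["type"] == "representative":
        best["images"].append(image_desc)              # claim 8: keep representatives
        if len(best["images"]) > MAX_REPRESENTATIVES:  # claim 7: switch to a dictionary
            best["dictionary"] = learn_visual_dictionary(best["images"])
            best["type"] = "dictionary"
            del best["images"]
    else:                                              # claim 6: refresh the dictionary
        best["dictionary"] = learn_visual_dictionary([best["dictionary"], image_desc])
    return best
```

The per-descriptor cosine matching above is only a stand-in for the feature-point matching of claims 2 and 3.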
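
The merging of claims 4 and 5 (and of supplementary notes 25 to 27) can be sketched with the same assumed structures; the pooling strategy, threshold, and dictionary size below are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_visual_dictionary(descriptor_sets, n_words=64):
    """Same illustrative k-means helper as in the previous sketch."""
    descriptors = np.vstack(descriptor_sets)
    k = min(n_words, len(descriptors))
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors).cluster_centers_

def merge_models(model_a, model_b, max_representatives=5):
    """Merge two image models into one (illustrative, not the patented procedure)."""
    def representations(model):
        # Representative-image descriptors, or the visual words, of one model.
        return model["images"] if model["type"] == "representative" else [model["dictionary"]]

    pooled = representations(model_a) + representations(model_b)
    either_is_dictionary = "dictionary" in (model_a["type"], model_b["type"])
    if either_is_dictionary or len(pooled) > max_representatives:
        # Claims 4/5: learn a dictionary from the representations of both models.
        return {"type": "dictionary", "dictionary": learn_visual_dictionary(pooled)}
    # Supplementary note 27: few representative images, so simply pool them.
    return {"type": "representative", "images": pooled}
```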
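
Claim 10, the clustering-based generation of image models, admits a similarly hedged sketch: agglomerative clustering over mean descriptors is only a stand-in for whatever clustering the disclosure actually uses, and the model count, threshold, and dictionary size are assumed values.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def build_image_models(image_descriptors, n_models=3, max_representatives=5, n_words=64):
    """Cluster images (each an (n_points, d) descriptor array) into image models;
    clusters with many member images are converted to visual dictionaries (claim 10)."""
    signatures = np.vstack([d.mean(axis=0) for d in image_descriptors])
    labels = AgglomerativeClustering(n_clusters=n_models).fit_predict(signatures)
    models = []
    for label in range(n_models):
        members = [d for d, l in zip(image_descriptors, labels) if l == label]
        if len(members) > max_representatives:
            pooled = np.vstack(members)
            k = min(n_words, len(pooled))
            words = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled).cluster_centers_
            models.append({"type": "dictionary", "dictionary": words})
        else:
            models.append({"type": "representative", "images": members})
    return models

# Example with random stand-in descriptors:
# rng = np.random.default_rng(0)
# models = build_image_models([rng.normal(size=(50, 128)) for _ in range(12)])
```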
CN201710295810.8A 2017-04-28 2017-04-28 Method of processing image and apparatus for processing image Active CN108805148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710295810.8A CN108805148B (en) 2017-04-28 2017-04-28 Method of processing image and apparatus for processing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710295810.8A CN108805148B (en) 2017-04-28 2017-04-28 Method of processing image and apparatus for processing image

Publications (2)

Publication Number Publication Date
CN108805148A CN108805148A (en) 2018-11-13
CN108805148B true CN108805148B (en) 2022-01-11

Family

ID=64069278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710295810.8A Active CN108805148B (en) 2017-04-28 2017-04-28 Method of processing image and apparatus for processing image

Country Status (1)

Country Link
CN (1) CN108805148B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030230A (en) * 2007-04-18 2007-09-05 北京北大方正电子有限公司 Image searching method and system
CN104462382A (en) * 2014-12-11 2015-03-25 北京中细软移动互联科技有限公司 Trademark image inquiry method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100369051C (en) * 2005-01-11 2008-02-13 富士通株式会社 Grayscale character dictionary generation apparatus
US8515193B1 (en) * 2011-04-08 2013-08-20 Google Inc. Image compression using exemplar dictionary based on hierarchical clustering
CN102402621A (en) * 2011-12-27 2012-04-04 浙江大学 Image retrieval method based on image classification
CN108073948A (en) * 2012-01-17 2018-05-25 华为技术有限公司 A kind of photo sort management, server, apparatus and system
CN102855492B (en) * 2012-07-27 2015-02-04 中南大学 Classification method based on mineral flotation foam image
CN103778146B (en) * 2012-10-23 2017-03-01 富士通株式会社 Image clustering device and method

Also Published As

Publication number Publication date
CN108805148A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
US11804069B2 (en) Image clustering method and apparatus, and storage medium
EP3171332B1 (en) Methods and systems for inspecting goods
US10061999B1 (en) System and method for using segmentation to identify object location in images
US8660368B2 (en) Anomalous pattern discovery
US8768048B1 (en) System and method for exploiting segment co-occurrence relationships to identify object location in images
CN112164391A (en) Statement processing method and device, electronic equipment and storage medium
US20150110387A1 (en) Method for binary classification of a query image
US20120045132A1 (en) Method and apparatus for localizing an object within an image
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
CN113313053B (en) Image processing method, device, apparatus, medium, and program product
CN111783665A (en) Action recognition method and device, storage medium and electronic equipment
US10380456B2 (en) Classification dictionary learning system, classification dictionary learning method and recording medium
CN114898266B (en) Training method, image processing device, electronic equipment and storage medium
Nguyen-Trang A new efficient approach to detect skin in color image using Bayesian classifier and connected component algorithm
JP2014228995A (en) Image feature learning device, image feature learning method and program
CN114463552A (en) Transfer learning and pedestrian re-identification method and related equipment
US20190377823A1 (en) Unsupervised classification of documents using a labeled data set of other documents
CN112241470B (en) Video classification method and system
Sathiyaprasad Ontology-based video retrieval using modified classification technique by learning in smart surveillance applications
CN108805148B (en) Method of processing image and apparatus for processing image
CN114973107B (en) Unsupervised cross-domain video action identification method based on multi-discriminator cooperation and strong and weak sharing mechanism
US9619521B1 (en) Classification using concept ranking according to negative exemplars
JP2022150552A (en) Data processing apparatus and method
CN111625672B (en) Image processing method, image processing device, computer equipment and storage medium
US20240203094A1 (en) Utilizing machine learning models to identify implicit bias

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant