METHOD AND APPARATUS OF IDENTIFYING SIMILAR IMAGES
CROSS REFERENCE TO RELATED PATENT APPLICATIONS
This application claims priority to Chinese Patent Application No. 201110031701.8, filed on January 28, 2011, entitled "METHOD AND DEVICE OF SIMILAR IMAGE RECOGNITION," which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
This disclosure relates to the field of multimedia image recognition. More specifically, the disclosure relates to methods and devices of image recognition.
BACKGROUND
Similar image retrieval, a type of multimedia recognition technology, is developing rapidly. Similar image retrieval may comprise feature extraction, index construction, querying, and similarity sorting. However, conventional technologies of similar image retrieval may present some problems (e.g., poor compatibility and fault tolerance problems) because conventional image signatures do not contain content information of images. For example, two images that have identical content but are saved in different formats (e.g., bmp, jpeg, png, or gif) may be considered different images.
SUMMARY
This disclosure provides methods and devices for identifying similar images. In some embodiments, a user may submit an image and request from a server a set of images that are similar to the submitted image. Upon receiving the submitted image, the server may generate an image signature based on the content of that image. The server may also conduct a Hash operation on the image signature to generate one or more Hash values. The one or more Hash values may be used to identify candidate images similar to the image. The server may then determine an image similarity between each of these candidate images and the image.
In some embodiments, to generate the image signature, the server may convert the image into a gray image and then divide the gray image into multiple sub-images. The server may then calculate edge histograms of each of the multiple sub-images. Based on the edge histograms, the server may generate the image signature. In some embodiments, to generate the one or more Hash values, the server may conduct a Hash operation on the image signature using Locality-sensitive hashing functions to generate one or more first Hash values. The server may then conduct a Hash operation on each of the one or more first Hash values to generate one or more second Hash values. The one or more second Hash values may be used to identify the candidate images. In some embodiments, to determine the image similarity, the server may determine a similarity between the image and each of the one or more candidate images using at least one of Hamming distance or Euclidean distance.
BRIEF DESCRIPTION OF THE DRAWINGS
The Detailed Description is described with reference to the accompanying figures. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 is a schematic diagram showing an exemplary environment for identifying similar images.
FIG. 2 is a schematic block diagram showing an exemplary configuration of the image server(s) of FIG. 1.
FIG. 3 is a schematic block diagram showing another exemplary configuration of the image server(s) of FIG. 1.
FIG. 4 is a flowchart showing an exemplary process of identifying similar images.
FIG. 5 is a flowchart showing an exemplary process of obtaining an image signature.
FIG. 6 is a flowchart showing an exemplary process of creating an image index.
FIG. 7 is a flowchart showing another exemplary process of identifying similar images.
DETAILED DESCRIPTION
The detailed description of methods and devices for identifying similar images is set forth with reference to the accompanying figures.
FIG. 1 is a schematic diagram showing an exemplary environment 100 for identifying similar images. The environment 100 may include image servers 102 interacting with a user 104 and a user 106. The image servers 102 may include upload servers 108, an image database device 110, index servers 112, image calculation servers 114, and image search front servers 116.
The user 104 may upload an image to the upload servers 108 to create an image index database. The upload servers 108 may calculate an image signature of the image and send the image signature to the image database device 110. In some embodiments, the image may be obtained from the Internet and saved in the upload servers 108 using a crawler system. In some embodiments, the upload servers 108 may calculate the image signature of the image based on content information of the image, thus reducing the probability of a conflict. For example, the content information of the image may include metadata (e.g., color, grain, etc.) instead of a non-byte stream.
The image database device 110 may, after receiving the image signature, conduct a Hash operation on the image signature by using Locality-sensitive hashing (LSH). The resulting Hash values are then sent as index information to the index servers 112.
The user 106 may send an image to the image search front servers 116 and request the image servers 102 to return a set of images similar to the submitted image. The image search front servers 116 may transmit the image to the image calculation servers 114 and request the similar images or a list of the similar images. In some embodiments, an image signature may be generated based on the content information of the image, and then the index servers 112 search for similar images based on the image signature. In some embodiments, the image calculation servers 114 may search for the similar images (e.g., IDs) based on Hash values of these similar images on the index servers 112. For example, the image calculation servers 114 may identify IDs associated with images having the same Hash value as the image. The similar image or a list of similar images may be returned to the image search front servers 116 and displayed to the user 106.
FIG. 2 is a schematic block diagram showing details of exemplary image servers 102 of FIG. 1. The image servers 102 may be configured as any suitable servers. In one exemplary configuration, the image servers 102 include processors 202, input/output interfaces 204, network interfaces 206, and memory 208.
The memory 208 may include computer-readable media in the form of volatile memory, such as random-access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash RAM. The memory 208 is an example of computer-readable media.
Computer-readable media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.
Turning to the memory 208 in more detail, the memory 208 may store any number of software modules or units, including an output unit 210, a recognition unit 212, a searching unit 214, an obtaining unit 216, a Hash operation unit 218, and a Hash table creation unit 220. The obtaining unit 216 may obtain an image signature of an image submitted by the user 106. The Hash operation unit 218 may conduct a Hash operation on the image signature. The searching unit 214 may search for candidate images in predetermined Hash tables. The recognition unit 212 may identify and retrieve image(s) similar to the submitted image from the candidate images.
In some embodiments, the image servers 102 may determine similarities of images based on image signatures generated from content of the images. For example, the more similar the images are, the more similar their image signatures are. In some embodiments, the similarities may be related to Hamming distance or Euclidean distance. For example, the shorter these distances are, the more similar the images are. Accordingly, images that have the same content but are saved in different formats may have the same image signatures. Also, problems of fault tolerance with the conventional technologies may be resolved.
In some embodiments, the output unit 210 may output the candidate images based on the similarities. In some embodiments, the Hash table creation unit 220 may create L Hash tables by conducting Hash operations on the image signature to obtain L first Hash values by using L LSH functions, wherein L is an integer. The Hash table creation unit 220 may also conduct a Hash operation on the L first Hash values to obtain L second Hash values by using an overall Hash function. The Hash table creation unit 220 may record a jth second Hash value, an image identification, and an image signature in the jth Hash table, wherein j=1...L.
FIG. 3 is a schematic block diagram showing another exemplary configuration of image servers 300 that may include, similar to the image servers 102 of FIGS. 1 and 2, the recognition unit 212, the searching unit 214, the obtaining unit 216, and the Hash operation unit 218.
The obtaining unit 216 may include a conversion module 302, a calculation module 304, and a generation module 306. The conversion module 302 may convert an image submitted by the user 106 to a gray image. The calculation module 304 may then divide the gray image into NxN sub-images, calculate edge histograms from M directions for each of the sub-images, and obtain NxNxM calculation results, wherein N and M are integers. The generation module 306 may assemble the NxNxM calculation results as an NxNxM dimensional vector of the image signature. In some embodiments, non-uniform quantitative technologies based on human eye sensitivity can be applied to the NxNxM dimensional vector. In some embodiments, statistic data of edge histograms may be used as basic features of the image signature. The statistic data, combined with the image division and the non-uniform quantitative technologies based on human eye sensitivity, may resolve problems caused by different colors, scales, or partial blurring and distortion. In some embodiments, the gray images may be divided into NxP sub-images.
In some embodiments, the calculation module 304 may, for each image block included in a sub-image, calculate a gradient value in each of the M directions. The calculation module 304 may then select the one of the M directions that has the greatest gradient value as the direction of the corresponding image block for statistics. The calculation module 304 may further count, for each of the M directions, the number of image blocks for which that direction was selected, and use the resulting statistic values as the histogram of the sub-image.
Suppose that a sub-image has 1000 image blocks and that there are five directions, A, B, C, D and E (i.e., M=5). Further, suppose that direction A is selected as the direction for statistics for 100 image blocks, direction B for 200 image blocks, direction C for 300 image blocks, direction D for 400 image blocks, and direction E for no image blocks. As a result, the statistics value (i.e., histogram) of the sub-image may be represented by a vector (100, 200, 300, 400, 0).
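The counting step in this example can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `direction_histogram` and the representation of each image block as a list of M gradient values are assumptions made only for the sketch.

```python
from collections import Counter


def direction_histogram(block_gradients, m):
    """Count, for each of the m directions, how many blocks selected it.

    block_gradients holds one sequence of m gradient values per image
    block (a hypothetical input layout). The direction with the greatest
    gradient value is taken as the direction of the block.
    """
    counts = Counter()
    for gradients in block_gradients:
        # Index of the greatest gradient value; ties go to the lowest index.
        best = max(range(m), key=lambda d: gradients[d])
        counts[best] += 1
    return [counts[d] for d in range(m)]


# Rebuild the example: 1000 blocks, directions A..E mapped to indices 0..4.
blocks = []
for direction, n_blocks in enumerate([100, 200, 300, 400, 0]):
    gradients = [0.0] * 5
    gradients[direction] = 1.0  # make this direction the strongest
    blocks.extend([gradients] * n_blocks)

histogram = direction_histogram(blocks, 5)
```

Here `histogram` holds one count per direction, in the order A through E.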
In some embodiments, the Hash operation unit 218 may include a first Hash operation module 308 and a second Hash operation module 310. The first Hash operation module 308 may conduct Hash operations on the image signature, and obtain L first Hash values by using L LSH functions, wherein L is an integer. The second Hash operation module 310 may conduct Hash operations on the L first Hash values, and obtain L second Hash values by using an overall Hash function. In some embodiments, a method of indexing a large number of similar images based on the LSH technology may be used to reduce the time complexity of a search to a non-linear level.
In some embodiments, the searching unit 214 may include a searching module 312 and an addition module 314. The searching module 312 may search for an entry recording the second Hash value in the Hash tables. Each entry of the Hash tables records a Hash value and an image identification of an image. Each entry may further include an image signature. The addition module 314 may add an image to the candidate images upon identifying an entry having the second Hash value in the Hash tables. In some embodiments, an ith second Hash value may correspond to an ith Hash table, wherein i=1...L.
In some embodiments, the first Hash operation module 308 may convert an image signature to an R-dimensional binary vector, wherein R is an integer. The first Hash operation module 308 may then generate L LSH functions by using the R-dimensional binary vector. For example, each of the LSH functions is generated based on one or more dimensions of the R-dimensional binary vector.
In some embodiments, the first Hash operation module 308 may assign K as an input parameter of an LSH function, randomly select K dimensions from the R-dimensional binary vector, and combine the selected K binary values as a return value of the LSH function, wherein K<R.
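A minimal sketch of such an LSH function is given below, assuming that "randomly select" means sampling K distinct positions once, when the function is created, and reusing the same positions for every vector. The name `make_lsh_function`, the `seed` parameter, and the choice to return the selected bits as a string are hypothetical details that the disclosure does not specify.

```python
import random


def make_lsh_function(r, k, seed=None):
    """Return a function that reads k fixed positions of an r-bit vector."""
    if not 0 < k < r:
        raise ValueError("k must satisfy 0 < k < r")
    rng = random.Random(seed)
    # The positions are drawn once so that every vector is hashed the same way.
    positions = sorted(rng.sample(range(r), k))

    def lsh(bits):
        if len(bits) != r:
            raise ValueError("expected a vector of length r")
        # Combine the selected bits into a single return value.
        return "".join(str(bits[p]) for p in positions)

    return lsh


g = make_lsh_function(r=16, k=4, seed=0)
first_hash = g([0, 1] * 8)  # a 4-character string of '0' and '1'
```

Two vectors that agree on the K sampled positions receive the same return value, which is why similar vectors are more likely to collide than dissimilar ones.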
In some embodiments, the recognition unit 212 may include a calculation module 316 and a recognition module 318. The calculation module 316 may calculate a spatial distance between an image signature of a submitted image and an image signature of a candidate image (e.g., Hamming distance or Euclidean distance). The recognition module 318 may then determine a similarity between the candidate image and the submitted image based on the spatial distance. For example, the smaller the spatial distance is, the more similar the candidate image is to the submitted image.
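The two spatial distances named above can be sketched as follows. The function names are hypothetical, and the signatures are assumed to be equal-length sequences of numbers (or of bits, for the Hamming distance).

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def more_similar(query, first, second, distance=hamming_distance):
    """Return whichever candidate signature is closer to the query."""
    return first if distance(query, first) <= distance(query, second) else second
```

A smaller return value from either distance function corresponds to a more similar candidate image.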
FIGS. 4-7 are flowcharts showing exemplary processes of identifying similar images. The processes are illustrated as a collection of blocks in logical flowcharts, which represent a sequence of operations that can be implemented in hardware, software, or a combination. For discussion purposes, the processes are described with reference to the image servers 102 shown in FIGS. 1-3. However, the processes may be performed using different environments and devices. Moreover, the environments and devices described herein may be used to perform different processes.
FIG. 4 is a flowchart showing an exemplary process 400 of identifying similar images. The process 400 may be performed by one or more processors of the image servers, which may be inclusive of one or more microcontrollers (MCU) or one or more field-programmable gate arrays (FPGAs).
At 402, the image servers 102 may receive an image submitted by the user 106 via, for example, a USB transmission interface, a Bluetooth transmission interface, or an Ethernet transmission interface. At 404, the obtaining unit 216 may generate or otherwise obtain an image signature of the submitted image. For example, the obtaining unit 216 may involve an MCU or an FPGA to calculate the image signature.
At 406, the Hash operation unit 218 may conduct a Hash operation on the image signature. In some embodiments, the Hash operation unit 218 may include an MCU or an FPGA configured to perform the Hash operations. In some embodiments, the functions performed by the Hash operation unit 218 and the obtaining unit 216 may be implemented by the same processor(s).
At 408, the searching unit 214 may search entries corresponding to a result of the Hash operation in predetermined Hash tables to identify candidate images. At 410, the recognition unit 212 may recognize an image similar to the submitted image from candidate images corresponding to the entries. In some embodiments, the image calculation servers 114 obtain, locally or from third-party equipment, the candidate images based on the IDs of similar images returned by the index servers 112, and recognize the image similar to the submitted image from the candidate images based on the entries. In some embodiments, the recognition unit 212 may obtain, locally or from third-party equipment, the candidate images based on the IDs of the similar images returned by the searching unit 214, and recognize the image similar to the submitted image from the candidate images based on the entries.
In some embodiments, the output unit 210 may include a Bluetooth transmission module, an infrared transmission module, and/or an Ethernet transmission module to output the identified similar image and display it to the user 106.
FIG. 5 is a flowchart showing an exemplary process 500 of obtaining an image signature. The process 500 may generate an image signature based on the grain features of edge histograms.
At 502, the image servers 102 may convert an image submitted by the user 106 to a gray image so that final results are not sensitive to color and/or light. At 504, the calculation module 304 may divide the gray image into NxN sub-images. At 506, the calculation module 304 may further divide each of the sub-images into a fixed number of image blocks. The area of each image block depends on the area of the submitted image. At 508, the calculation module 304 may then calculate edge histograms from one or more directions for each of the sub-images. For example, the calculation module 304 may calculate a gradient value of an image block in five (5) directions and select the direction having the greatest gradient value as the direction of the image block for statistics. The calculation module 304 may calculate the frequency with which each of the 5 directions is selected as the direction for statistics in one sub-image. The frequencies may be used as the histogram of the sub-image.
Suppose that a sub-image is divided into 1000 image blocks, and the one or more directions include five directions, A, B, C, D and E. Further, suppose that direction A is selected as the direction for statistics for 100 image blocks, direction B for 200 image blocks, direction C for 300 image blocks, direction D for 400 image blocks, and direction E for 0 image blocks. As a result, the statistics value (i.e., the corresponding histogram) of the sub-image is a vector (100, 200, 300, 400, 0).
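Operations 502 through 508 can be sketched as follows. The disclosure does not name the five directions or the gradient operator, so this sketch makes several assumptions: it borrows the five 2x2 edge filters of the MPEG-7 edge histogram descriptor, treats each image block as a 2x2 group of gray values, and uses ITU-R BT.601 luma weights for the gray conversion. The names `to_gray`, `block_direction`, and `sub_image_histogram` are hypothetical.

```python
import math

# Assumed 2x2 filter coefficients, listed as (top-left, top-right,
# bottom-left, bottom-right). They follow the MPEG-7 edge histogram
# descriptor and are not taken from the disclosure.
S = math.sqrt(2)
FILTERS = [
    (1, -1, 1, -1),   # vertical edge
    (1, 1, -1, -1),   # horizontal edge
    (S, 0, 0, -S),    # 45-degree diagonal edge
    (0, S, -S, 0),    # 135-degree diagonal edge
    (2, -2, -2, 2),   # non-directional edge
]


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luma values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]


def block_direction(block):
    """Index of the filter with the greatest absolute response.

    block is a tuple of four gray values in the same order as the
    filter coefficients. Ties go to the lowest index.
    """
    responses = [abs(sum(c * v for c, v in zip(f, block))) for f in FILTERS]
    return max(range(len(FILTERS)), key=lambda d: responses[d])


def sub_image_histogram(gray, top, left, size):
    """Histogram of block directions over one size x size sub-image.

    Uses non-overlapping 2x2 blocks, so size is assumed to be even.
    """
    hist = [0] * len(FILTERS)
    for y in range(top, top + size - 1, 2):
        for x in range(left, left + size - 1, 2):
            block = (gray[y][x], gray[y][x + 1],
                     gray[y + 1][x], gray[y + 1][x + 1])
            hist[block_direction(block)] += 1
    return hist
```

In this sketch a completely flat block produces zero response for every filter and falls to index 0; a practical system would probably apply a threshold so that such blocks are not counted as edges.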
At 510, the generation module 306 may combine the vectors of the sub-images into a multiple dimensional vector as the image signature of the image submitted by the user 106. If N=4, an image signature may be presented as a 4x4x5=80 dimensional vector. In some embodiments, at 512, a non-uniform quantitative method may be adopted given non-uniform characteristics of human eye sensitivity. The generation module 306 may conduct quantitative compression on, for example, the 80 dimensional vector to achieve better space utilization. For example, an image signature would be approximately 240 bits (i.e., 30 bytes) if each dimension is quantized to one of 8 values between 0 and 7 (i.e., 3 bits per dimension). The process may save 90% of storage.
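The size arithmetic of the compression step can be checked with a short sketch. The threshold values below are hypothetical, since the disclosure does not give the non-uniform quantization boundaries; they are chosen only so that small values get finer steps. The sketch also assumes that each histogram value has first been normalized to the range 0 to 1.

```python
# Hypothetical non-uniform thresholds: 7 boundaries give 8 levels (0..7).
THRESHOLDS = [0.01, 0.03, 0.07, 0.15, 0.30, 0.50, 0.75]


def quantize(value, thresholds=THRESHOLDS):
    """Map a normalized value to a level by counting thresholds it reaches."""
    level = 0
    for t in thresholds:
        if value >= t:
            level += 1
    return level


def pack_levels(levels, bits_per_level=3):
    """Pack small integers into bytes, bits_per_level bits each."""
    acc = 0
    for level in levels:
        if not 0 <= level < (1 << bits_per_level):
            raise ValueError("level does not fit in the given bit width")
        acc = (acc << bits_per_level) | level
    total_bits = len(levels) * bits_per_level
    return acc.to_bytes((total_bits + 7) // 8, "big")


# An 80-dimensional signature at 3 bits per dimension occupies 240 bits.
signature = pack_levels([quantize(0.05)] * 80)
```

With 80 dimensions and 3 bits per dimension, `signature` is 30 bytes long, matching the figure given above.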
At 514, the image servers 102 may obtain the image signature after the compression process. In some embodiments, the conversion module 302, the calculation module 304, and the generation module 306 may be implemented in an MCU.
FIG. 6 is a flowchart showing an exemplary process 600 of creating an image index. At 602, the first Hash operation module 308 may convert a vector obtained in the process 500 to a high-dimensional binary vector (i.e., each dimension is 1 or 0) in a Hamming space. For example, suppose that the value of one dimension of the vector is X and the greatest possible value is C. As a result, that dimension is presented in the Hamming space as a C-dimensional binary vector including X consecutive 1s right after C-X consecutive 0s.
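The conversion at 602 can be sketched as follows, assuming each dimension holds an integer between 0 and C. The names `unary_encode` and `to_hamming_space` are hypothetical.

```python
def unary_encode(x, c):
    """Encode integer x (0 <= x <= c) as c bits: c - x zeros, then x ones."""
    if not 0 <= x <= c:
        raise ValueError("x must lie between 0 and c")
    return [0] * (c - x) + [1] * x


def to_hamming_space(vector, c):
    """Concatenate the unary encoding of every dimension of the vector."""
    bits = []
    for x in vector:
        bits.extend(unary_encode(x, c))
    return bits


# A 3-dimensional vector with C = 7 becomes a 21-dimensional binary vector.
encoded = to_hamming_space([3, 0, 7], 7)
```

A useful property of this encoding is that the Hamming distance between two encoded vectors equals the sum of the absolute differences of the original values, so closeness in the original space carries over to the Hamming space.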
At 604, the first Hash operation module 308 may define a Hash function G by randomly selecting K dimensions from the binary vector obtained in the operation 602, and returning a value that combines the selected binary values. In these instances, the greater the similarity between the vectors is, the greater the probability of generating the same Hash value is.
At 606, the first Hash operation module 308 may apply L Hash functions to the high dimensional vector obtained in the operation 602 to reduce errors of the similarity search, wherein L is an integer. At 608, the second Hash operation module 310 may conduct Hash operations on the results from the operation 606 with a traditional Hash function (e.g., Message-Digest Algorithm 5 (MD5)). At 610, the second Hash operation module 310 may save the Hash results of the operation 608 as keys, together with unique IDs of the images, in the corresponding L Hash tables. In some embodiments, images with the same image signatures may be saved in the same bin, while images with different image signatures are more likely to be saved in different bins. In some embodiments, the first Hash operation module 308 and the second Hash operation module 310 may be implemented by a same encoding chip or a same MCU.
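Operations 604 through 610 can be sketched together as follows. This is a minimal illustration under stated assumptions: each LSH function is represented by a list of K sampled bit positions, the first Hash value is the string of selected bits, and each Hash table is an in-memory dictionary from an MD5 hex digest to a list of image IDs. All function names are hypothetical.

```python
import hashlib
import random
from collections import defaultdict


def make_lsh_positions(r, k, l, seed=0):
    """Draw l independent sets of k distinct positions out of r."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(r), k)) for _ in range(l)]


def second_hash(first_hash):
    """Apply a conventional Hash function (MD5) to a first Hash value."""
    return hashlib.md5(first_hash.encode("ascii")).hexdigest()


def keys_for(bits, lsh_positions):
    """Compute the l second Hash values of one binary vector."""
    keys = []
    for positions in lsh_positions:
        first = "".join(str(bits[p]) for p in positions)
        keys.append(second_hash(first))
    return keys


def build_index(signatures, lsh_positions):
    """Store every image ID under its jth key in the jth Hash table.

    signatures maps an image ID to its binary vector.
    """
    tables = [defaultdict(list) for _ in lsh_positions]
    for image_id, bits in signatures.items():
        for table, key in zip(tables, keys_for(bits, lsh_positions)):
            table[key].append(image_id)
    return tables
```

Images with identical binary vectors produce identical keys in every table and therefore land in the same bins, which matches the behavior described above.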
In some embodiments, because accuracy and recall rates may be dramatically affected by different values of K and L, a simulation may be conducted in advance to predict their effect. The image index, supported by appropriate hardware, may be stored in the memory to increase efficiency in searching. In some embodiments, an image index document may be stored in local disks or processed in a distributed manner for a mass sample database.
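Before running a full simulation, the effect of K and L can be estimated analytically. The sketch below assumes that each LSH function samples K distinct positions uniformly from an R-dimensional binary vector and that the L functions are drawn independently; neither assumption is stated in the disclosure, and the function names are hypothetical.

```python
from math import comb


def collision_probability(r, d, k):
    """Chance that one LSH function gives equal values for two vectors.

    The two vectors differ in d of r positions, and the function reads
    k distinct positions, so they collide only if all k sampled
    positions avoid the d differing ones.
    """
    if not 0 <= d <= r or not 0 < k <= r:
        raise ValueError("require 0 <= d <= r and 0 < k <= r")
    return comb(r - d, k) / comb(r, k)


def recall_probability(r, d, k, l):
    """Chance that the two vectors share a bin in at least one of l tables."""
    p = collision_probability(r, d, k)
    return 1.0 - (1.0 - p) ** l
```

Under these assumptions, raising K makes each table more selective (fewer false candidates, but more misses), while raising L recovers recall at the cost of more tables to store and search.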
FIG. 7 is a flowchart showing an exemplary process 700 of identifying similar images. At 702, the obtaining unit 216 may calculate an image signature of the image submitted by the user 106. At 704, the Hash operation unit 218 may define an LSH function based on the image signature. At 706, a Hash code of the image signature may be calculated using multiple LSH functions that are randomly selected. At 708, a Hash operation (e.g., MD5) may be conducted on the obtained results. At 710, the searching unit 214 may search L Hash tables using the results obtained in the operation 708 as keys.
At 712, the searching unit 214 may add candidate images based on the searched results to a candidate image array. At 714, the recognition unit 212 may calculate a spatial distance (e.g., Hamming distance or Euclidean distance) between each image signature in the candidate image array and the image signature of the image submitted by the user 106. In some embodiments, the distance may indicate similarity between the candidate image and the submitted image.
At 716, the output unit 210 may sort and output candidate images based on the distance. In these instances, the number of image signatures in the candidate image array is much smaller than the number of image signatures in the database. Accordingly, the cost of calculation can be greatly reduced when compared with conventional technologies.
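Operations 710 through 716 can be sketched as follows, assuming the L Hash tables are dictionaries from a key to a list of image IDs and that the query keys have already been computed. The function names and the toy data are hypothetical and serve only to show the flow.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def collect_candidates(tables, query_keys):
    """Look up the jth key in the jth table and merge the image IDs found."""
    candidates = set()
    for table, key in zip(tables, query_keys):
        candidates.update(table.get(key, ()))
    return candidates


def rank_candidates(query_signature, candidates, signatures,
                    distance=hamming_distance):
    """Return (distance, image_id) pairs, nearest first."""
    scored = [(distance(query_signature, signatures[i]), i)
              for i in candidates]
    return sorted(scored)


# Toy data: three indexed images and two Hash tables.
signatures = {
    "img1": [0, 0, 1, 1],
    "img2": [0, 1, 1, 1],
    "img3": [1, 1, 0, 0],
}
tables = [{"k1": ["img1", "img2"]}, {"k2": ["img2", "img3"]}]

query = [0, 0, 1, 1]
query_keys = ["k1", "missing"]  # the second table has no matching entry
ranked = rank_candidates(query, collect_candidates(tables, query_keys),
                         signatures)
```

Only the two candidates found in the first table are compared against the query; `img3` is never scored, which illustrates how the candidate array keeps the number of distance calculations small.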
In some embodiments, the Hash operation unit 218 may be an MCU or an FPGA. In some embodiments, the functions performed by the Hash operation unit 218 and the obtaining unit 216 can be implemented by a same processor. In some embodiments, the searching unit 214 may communicate with the database through an internal bus, and the searching unit 214 and the recognition unit 212 may be implemented by a same MCU.
In this disclosure, an image signature may be determined based on content of the image. Similarity of image signatures depends on similarity of image content. For example, the more similar two images are, the more similar their image signatures are. The similarity may be represented by a spatial distance (e.g., Hamming distance or Euclidean distance). Therefore, two images that are identical but saved in different formats may have the same image signatures, resolving the fault tolerance problem existing in the conventional similar image recognition technologies. The disclosure also introduces methods of creating a large-scale similar image index based on an LSH Hash technology. The methods may reduce the time complexity of a search to a non-linear level, and sort and output results based on similarities. The disclosure also provides edge histogram statistics as a basic feature of the image signature, combined with image division and the non-uniform quantitative technology based on human eye sensitivity.
The embodiments in this disclosure are merely for illustrating purposes and are not intended to limit the scope of this disclosure. A person having ordinary skill in the art would be able to make changes and alterations to embodiments provided in this disclosure. Any changes and alterations that persons with ordinary skill in the art would appreciate fall within the scope of this disclosure.