US20120093371A1 - Generating search requests from multimodal queries - Google Patents
- Publication number: US20120093371A1 (application US 13/332,248)
- Authority: US (United States)
- Prior art keywords
- image
- query
- images
- component
- multimodal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
- Y10S707/99948—Application of database or data structure, e.g. distributed, multimedia, or image
Definitions
- search engine services, such as Google and Overture, provide for searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request or query that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling and indexing” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages.
- the keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on.
- the search engine service then ranks the web pages of the search result based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on.
- the search engine service may also generate a relevance score to indicate how relevant the information of the web page may be to the search request.
- the search engine service displays to the user links to those web pages in an order that is based on their rankings.
- search engine services may, however, not be particularly useful in certain situations.
- a person who noticed an unusual flower at the side of the road while traveling may, when returning home, formulate the search request of “picture of yellow tulip-like flower in Europe” in hopes of seeing a picture of the flower.
- the search result may identify so many web pages that it may be virtually impossible for the person to locate the correct picture assuming that the person can even accurately remember the details of the flower.
- with a mobile device, such as a personal digital assistant (“PDA”) or cell phone, the person may be able to submit the search request while still at the side of the road.
- Such mobile devices have limited input and output capabilities, which make it difficult both to enter the search request and to view the search result.
- although a Content-Based Information Retrieval (“CBIR”) system can detect a duplicate image when its image database happens to contain one, the image database will not contain a duplicate of the picture of the flower at the side of the road. If a duplicate image is not in the database, it can be prohibitively expensive computationally, if even possible, to find a “matching” image. For example, if the image database contains an image of a field of yellow tulips and the picture contains only a single tulip, then the CBIR system may not recognize the images as matching.
- a method and system for generating a search request from a multimodal query is provided.
- the multimodal query system inputs a multimodal query that includes a query image and query text.
- the multimodal query system provides a collection of images along with one or more words associated with each image.
- the multimodal query system identifies images of the collection that are textually related to the query image based on similarity between associated words and the query text.
- the multimodal query system selects those images of the identified images that are visually related to the query image.
- the multimodal query system may formulate a search request based on keywords of the web pages that contain the selected images and submit that search request to a search engine service, a dictionary service, an encyclopedia service, or the like.
- upon receiving the search result, the multimodal query system provides that search result as the search result for the multimodal query.
- FIG. 1 is a block diagram that illustrates the overall processing of the multimodal query system in one embodiment.
- FIG. 2 is a block diagram that illustrates components of the multimodal query system in one embodiment.
- FIG. 3 is a flow diagram that illustrates the processing of the perform multimodal query component in one embodiment.
- FIG. 4 is a diagram that illustrates the generating of a signature of an image in one embodiment.
- FIG. 5 is a flow diagram that illustrates the processing of the calculate image signature component in one embodiment.
- FIG. 6 is a flow diagram that illustrates the processing of the find related images component in one embodiment.
- FIG. 7 is a flow diagram that illustrates the processing of the identify images by textual relatedness component in one embodiment.
- FIG. 8 is a flow diagram that illustrates the processing of the select images by visual relatedness component in one embodiment.
- FIG. 9 is a flow diagram that illustrates the processing of the create indexes component in one embodiment.
- FIG. 10 is a flow diagram that illustrates the processing of the generate signature-to-image index component in one embodiment.
- FIG. 11 is a flow diagram that illustrates the processing of the generate image-to-related-information index component in one embodiment.
- FIG. 12 is a flow diagram that illustrates the processing of the select keywords for web page component in one embodiment.
- FIG. 13 is a flow diagram that illustrates the processing of the score keywords of web page component in one embodiment.
- FIG. 14 is a flow diagram that illustrates the processing of the generate word-to-image index in one embodiment.
- the multimodal query system inputs a multimodal query that includes an image (i.e., query image) and verbal information (i.e., query text).
- a multimodal query may include a picture of a flower along with the word “flower.”
- the verbal information may be input as text via a keyboard, as audio via a microphone, and so on.
- the multimodal query system provides a collection of images along with one or more words associated with each image. For example, each image of the collection may have associated words that describe the subject of the image. In the case of an image of a yellow tulip, the associated words may include yellow, tulip, lily, flower, and so on.
- the multimodal query system identifies images of the collection whose associated words are related to the query text. The identifying of images based on relatedness to the query text helps to reduce the set of images that may be related to the query image.
- the multimodal query system selects those images of the identified images that are visually related to the query image. For example, the multimodal query system may use a content-based information retrieval (“CBIR”) system to determine which of the identified images are most visually similar to the query image.
- the multimodal query system may return the selected images as the search result. For example, the multimodal query system may provide links to web pages that contain the selected images.
- the multimodal query system may formulate a search request based on keywords of the web pages that contain the selected images and submit that search request to a search engine service, a dictionary service, an encyclopedia service, or the like.
- the keywords of the web pages that contain the selected images may include the phrases yellow tulip, tulipa, Liliaceae lily flower, Holland yellow flower, and so on, and the formulated search request may be “yellow tulip lily flower Holland.”
- upon receiving the search result, the multimodal query system provides that search result as the search result for the multimodal query. In this way, the multimodal query system allows the multimodal query to specify needed information more precisely than is specified by a unimodal query (e.g., query image alone or query text alone).
- the multimodal query system may generate from the collection of images a word-to-image index for use in identifying the images that are related to the query text.
- the word-to-image index maps each word to the images with which it is associated. For example, the words tulip, flower, and yellow may each map to the image of a field of yellow tulips.
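such an index can be sketched as a simple inverted map; the image identifiers and word lists below are illustrative stand-ins, not data from the patent:

```python
from collections import defaultdict

def build_word_to_image_index(collection):
    """Invert the collection: map each associated word to the identifiers
    of the images it describes."""
    index = defaultdict(set)
    for image_id, words in collection.items():
        for word in words:
            index[word.lower()].add(image_id)
    return index

# Illustrative collection of image identifiers and their associated words.
collection = {
    "img-001": ["tulip", "flower", "yellow"],
    "img-002": ["rose", "flower", "red"],
}
word_to_image = build_word_to_image_index(collection)
```

with this shape, looking up a query word yields the candidate image identifiers directly, which is the reduction step the patent describes.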
- the multimodal query system may generate the collection of images from a collection of web pages that each contain one or more images.
- the multimodal query system may assign a unique image identifier to each image of a web page.
- the multimodal query system may then identify words associated with the image. For each associated word, the multimodal query system adds an entry that maps the word to the image identifier.
- the multimodal query system uses these entries when identifying images that are related to the query text.
- the multimodal query system may use conventional techniques to identify the images that are most textually related to the query text based on analysis of the associated words.
- the multimodal query system may generate from the collection of images an image-to-related-information index for use in selecting the identified images that are visually related to the query image.
- the image-to-related-information index may map each image to a visual feature vector of the image, a bitmap of the image, a web page that contains the image, and keywords of the web page that are associated with the image.
- for each image, the multimodal query system generates a visual feature vector of features (e.g., average RGB value) that represents the image.
- when determining whether an image of the collection is visually related to a query image, the multimodal query system generates a visual feature vector for the query image and compares it to the visual feature vectors of the image-to-related-information index.
- the multimodal query system may identify, from the web page that contains an image, keywords associated with the image and store an indication of those keywords in the image-to-related-information index.
- the multimodal query system uses the keywords associated with the selected images to formulate a unimodal or text-based search request for the multimodal query.
- the multimodal query system may initially search the collection of images to determine whether there is a duplicate image. If a duplicate image is found, then the multimodal query system may use the keywords associated with that image (e.g., from the image-to-related-information index) to formulate a search request based on the multimodal query. If no duplicate image is found, then the multimodal query system uses the query text to identify images and then selects from those identified images that are textually and visually related to the query image as described above. The multimodal query system may generate a signature-to-image index for identifying duplicate images by comparing signatures of the images of the collection to the signature of a query image.
- the multimodal query system may use various hashing algorithms to map an image to a signature that has a relatively high likelihood of being unique to that image within the collection (i.e., no collisions). To identify duplicate images, the multimodal query system generates a signature for the query image and determines whether the signature-to-image index contains an entry with the same signature.
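the duplicate check can be sketched as an exact-match lookup in a plain dictionary keyed by the signature; the sample signature value and identifier names are hypothetical:

```python
def add_image(signature_to_image, signature, image_id):
    """Record an image under its signature; distinct images that happen to
    collide share the same entry."""
    signature_to_image.setdefault(signature, []).append(image_id)

def find_duplicates(signature_to_image, query_signature):
    """Exact-match lookup: a hit means the collection likely holds a
    duplicate of the query image (collisions are assumed to be rare)."""
    return signature_to_image.get(query_signature, [])

# Illustrative index with one hypothetical 32-bit signature.
signature_to_image = {}
add_image(signature_to_image, 0xA5F00F5A, "img-001")
```

a miss on this lookup is what routes the query into the textual and visual filtering path described above.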
- FIG. 1 is a block diagram that illustrates the overall processing of the multimodal query system in one embodiment.
- the input to the multimodal query system is a multimodal query that includes a query image 101 and a query text 102 .
- the system initially generates a signature for the query image.
- in decision block 103, if the signature-to-image index contains an entry with a matching signature, then the collection contains a duplicate image and the system continues at block 106; else the system continues at block 104.
- the system identifies images that are textually related to the query text using the word-to-image index. Before identifying the images, the system may use various techniques to expand the query text, such as by adding to the query text synonyms of the original words of the query text.
- the output of block 104 is the identified images that are most textually related.
- the system selects from the identified textually related images those images that are visually related to the query image using the image-to-related-information index.
- the system may determine the visual distance between the visual feature vector of the query image and the visual feature vector of each image of the collection and select the images with the smallest visual distances as being most visually related.
- the output of block 105 is the selected visually related images.
- the system formulates a search request based on the images selected in block 105 or on the duplicate image as identified in block 103 .
- the system retrieves the keywords associated with the selected images or the duplicate image and generates a text-based search request from those keywords.
- the system submits the search request to a search engine service, a dictionary service, an encyclopedia service, or the like.
- the system then returns the search result provided by the search engine as the search result for the multimodal query.
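the FIG. 1 flow can be sketched as follows; the helper functions are injected stand-ins for the components described later, and all index and key names are illustrative:

```python
def multimodal_query(query_image, query_text, indexes, search, helpers):
    """Sketch of the FIG. 1 flow: duplicate check, then textual and visual
    filtering, then keyword-based search-request formulation."""
    signature = helpers["signature"](query_image)
    duplicates = indexes["signature_to_image"].get(signature)
    if duplicates is not None:                       # decision block 103
        related = duplicates
    else:
        candidates = helpers["by_text"](query_text, indexes)            # block 104
        related = helpers["by_visual"](query_image, candidates, indexes)  # block 105
    keywords = []                                    # block 106: gather keywords
    for image_id in related:
        keywords.extend(indexes["image_info"][image_id]["keywords"])
    request = " ".join(dict.fromkeys(keywords))      # dedupe, keep order
    return search(request)                           # block 107: submit

# Illustrative indexes and stand-in helpers (all names are hypothetical).
indexes = {
    "signature_to_image": {123: ["img-1"]},
    "image_info": {"img-1": {"keywords": ["yellow", "tulip"]}},
}
helpers = {
    "signature": lambda image: 123,                  # pretend hash of the image
    "by_text": lambda text, ix: [],
    "by_visual": lambda image, candidates, ix: candidates,
}
result = multimodal_query("raw-image-bytes", "flower", indexes, lambda r: r, helpers)
```

passing the search service in as a callable mirrors the patent's point that the formulated request may go to a search engine, dictionary, or encyclopedia service interchangeably.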
- FIG. 2 is a block diagram that illustrates components of the multimodal query system in one embodiment.
- the multimodal query system includes an indexing component 210 , a data store 220 , and a querying component 230 .
- the indexing component generates the indexes from a collection of web pages.
- the indexing component includes a generate indexes component 211 , a generate signature-to-image index component 212 , a generate image-to-related-information index component 213 , and a generate word-to-image index component 214 .
- the generate indexes component invokes the other components of the indexing component to generate the appropriate index.
- the data store 220 includes a web page store 221 , a signature-to-image index 222 , an image-to-related-information index 223 , and a word-to-image index 224 .
- the web page store contains a collection of web pages from which the indexes are generated.
- the indexes may be organized using various data structures such as hash tables, B-trees, ordered lists, and so on. In addition, the indexes may be represented by a single data structure or separate data structures.
- the querying component 230 includes a perform multimodal query component 231 , a calculate image signature component 232 , a find related images component 233 , an identify images by textual relatedness component 234 , and a select images by visual relatedness component 235 .
- the perform multimodal query component is invoked to perform a multimodal query on an input query image and an input query text.
- the component invokes the calculate image signature component to generate a signature for the query image for use in determining whether the collection of images contains a duplicate of the query image.
- the component also invokes the find related images component to find images that are related when no duplicate image has been found.
- the find related images component invokes the identify images by textual relatedness component and the select images by visual relatedness component to find the related images.
- the perform multimodal query component then formulates a text-based search request based on the keywords associated with the related images and submits the search request to a search engine service, a dictionary service, an encyclopedia service, or the like to generate the search result for the multimodal query.
- the computing devices on which the multimodal query system may be implemented may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives).
- the memory and storage devices are computer-readable media that may contain instructions that implement the multimodal query system.
- the data structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link.
- Various communications links may be used to connect components of the system, such as the Internet, a local area network, a wide area network, a point-to-point dial-up connection, a cell phone network, and so on.
- Embodiments of the multimodal query system may be implemented in various operating environments that include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, digital cameras, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on.
- the devices may include cell phones, personal digital assistants, smart phones, personal computers, programmable consumer electronics, digital cameras, and so on.
- the multimodal query system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices.
- program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types.
- functionality of the program modules may be combined or distributed as desired in various embodiments.
- FIG. 3 is a flow diagram that illustrates the processing of the perform multimodal query component in one embodiment.
- the component is passed a multimodal query that includes a query image and a query text.
- the query text includes one or more query words.
- the component invokes the calculate image signature component to calculate the signature of the query image.
- in decision block 302, if the signature-to-image index contains the calculated signature, then a duplicate image has been found and the component continues at block 305; else the component continues at block 303.
- in decision block 303, if query text was provided (e.g., not blank), then the component continues at block 304; else the component completes without performing the search.
- the component invokes the find related images component to find images related to the query image.
- the component extracts keywords from the image-to-related-information index for the related images or the duplicate image.
- the component formulates a search request based on the keywords and submits the search request to a search engine. The component then completes.
- FIG. 4 is a diagram that illustrates the generating of a signature of an image in one embodiment.
- Image 401 represents the image for which the signature is to be generated.
- the system converts the image to a gray level image as represented by image 402 .
- the system then divides the image into blocks (e.g., 8-by-8 blocks) as illustrated by image 403 .
- the system calculates the average intensity of each block to generate matrix 404 as indicated by the following equation:
- $I_{ij} = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} p_{ij}(x, y)$
- $I_{ij}$ is the average intensity for block ij, $p_{ij}(x, y)$ is the intensity of the pixel at position (x, y) within block ij, and X and Y are the pixel dimensions of each block.
- the system then performs a two-dimensional discrete cosine transform (“DCT”) on the matrix.
- the system discards the DC coefficient of the DCT matrix and selects 48 AC coefficients of the DCT matrix in a zigzag pattern, as illustrated by pattern 405, resulting in an AC coefficient vector 406.
- the system then performs a principal component analysis (“PCA”) to generate a 32-dimension feature vector 407 as illustrated by the following equation:
- $Y_n = P^{T} A_m$
- $Y_n$ represents the 32-dimension feature vector (n = 32)
- $A_m$ represents the vector of 48 AC coefficients (m = 48)
- $P$ represents an $m \times n$ transform matrix whose columns are the n orthonormal eigenvectors corresponding to the n largest eigenvalues of the covariance matrix of $A_m$
- the system generates a 32-bit hash value 408 from the 32-dimension feature vector by setting each of the 32 bits to 1 if the corresponding element of the feature vector is greater than 0, and to 0 otherwise.
- One skilled in the art will appreciate that many different algorithms may be used to generate a signature for an image.
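the FIG. 4 pipeline can be sketched as follows, assuming a grayscale image whose dimensions are divisible by 8; the PCA transform matrix would in practice be learned offline from the collection's AC-coefficient vectors, so any 48-by-32 projection serves only for illustration here:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    c = np.array([[np.cos(np.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
                  for k in range(n)]) * np.sqrt(2.0 / n)
    c[0] /= np.sqrt(2.0)
    return c

def zigzag_indices(n=8):
    """Visit an n-by-n matrix along its anti-diagonals in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order

def image_signature(gray, pca_transform):
    """8x8 block averages -> 2-D DCT -> 48 zigzag AC coefficients ->
    projection to 32 dimensions -> one sign bit per dimension."""
    h, w = gray.shape                       # dimensions divisible by 8 assumed
    blocks = gray.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    c = dct_matrix(8)
    coeffs = c @ blocks @ c.T               # two-dimensional DCT
    ac = np.array([coeffs[i, j] for i, j in zigzag_indices(8)[1:49]])  # drop DC
    y = pca_transform.T @ ac                # 48 -> 32 dimensions
    return sum(1 << k for k in range(32) if y[k] > 0)

# Trivial stand-in projection; real columns would be learned eigenvectors.
projection = np.eye(48)[:, :32]
gray = np.linspace(0.0, 255.0, 64 * 64).reshape(64, 64)
signature = image_signature(gray, projection)
```

because only the signs of the projected coefficients are kept, identical images always hash to the same 32-bit value, which is what the exact-match duplicate lookup relies on.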
- FIG. 5 is a flow diagram that illustrates the processing of the calculate image signature component in one embodiment.
- the component is passed an image and generates a signature for the image.
- the component converts the image into a gray level image.
- the component divides the image into blocks.
- the component calculates the average intensity of each block to generate an intensity matrix.
- the component performs a two-dimensional discrete cosine transform on the intensity matrix.
- the component extracts 48 AC coefficients from the DCT matrix.
- the component performs a PCA to generate a 32-dimension feature vector from the 48 AC coefficients.
- the component generates a 32-bit signature from the 32-dimension feature vector and then completes.
- FIG. 6 is a flow diagram that illustrates the processing of the find related images component in one embodiment.
- the component is passed a multimodal query containing a query image and a query text and returns an indication of images that are related to the multimodal query.
- the component invokes the identify images by textual relatedness component to identify images that are related to the query text.
- the component invokes the select images by visual relatedness component to select those identified images that are visually related to the query image. The component then returns the identifiers of the selected images as the related images.
- FIG. 7 is a flow diagram that illustrates the processing of the identify images by textual relatedness component in one embodiment.
- the component is passed a query text and returns the identification of images that are related to the query text as indicated by the word-to-image index.
- the component removes stop words (e.g., a, the, and an) from the query text.
- the component applies stemming rules, such as the Porter stemming algorithm, to generate the stems for the words of the query text. For example, the words flowers, flowering, and flowered may be transformed to their stem flower.
- the component expands the words of the query text to include synonyms and hyponyms using, for example, the Wordnet system.
- the word flower may be expanded to include bloom, blossom, heyday, efflorescence, flush, peony, lesser celandine, pilewort, Ranunculus ficaria, anemone, wildflower, and so on.
- the component searches the word-to-image index to locate images with associated words that are related to the expanded query text.
- the component ranks the images based on how well the associated words match the expanded query text.
- the component identifies the highest ranking images and returns the identified images.
- the component may treat words associated with each image as a document and use standard query techniques to find the documents that are most closely related to the expanded query text.
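one plausible sketch of this component, with a toy suffix-stripper standing in for the Porter stemmer and a hand-built synonym table standing in for WordNet (both are illustrative stand-ins, not the patent's implementations):

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "on"}

def crude_stem(word):
    """Toy stand-in for the Porter stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def expand_query(query_text, synonyms):
    """Stop-word removal, stemming, then synonym/hyponym expansion."""
    terms = {crude_stem(w) for w in query_text.lower().split()
             if w not in STOP_WORDS}
    for term in list(terms):
        terms.update(synonyms.get(term, ()))
    return terms

def identify_by_text(query_text, word_to_image, synonyms, top_k=10):
    """Rank images by how many expanded query terms match their words."""
    scores = {}
    for term in expand_query(query_text, synonyms):
        for image_id in word_to_image.get(term, ()):
            scores[image_id] = scores.get(image_id, 0) + 1
    return sorted(scores, key=lambda i: -scores[i])[:top_k]

# Illustrative index and a hand-built synonym table standing in for WordNet.
word_to_image = {"flower": {"img-1", "img-2"}, "tulip": {"img-1"},
                 "bloom": {"img-3"}}
synonyms = {"flower": {"bloom"}}
```

counting matched terms is the simplest ranking; treating each image's word list as a document and applying standard TF-IDF retrieval, as the text suggests, would be a drop-in refinement.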
- FIG. 8 is a flow diagram that illustrates the processing of the select images by visual relatedness component in one embodiment.
- the component is passed the query image and an indication of the images that were identified based on textual relatedness.
- the component selects those identified images that are visually related to the query image.
- the component extracts a feature vector for the query image.
- the feature vector for an image includes three features: a 64-element RGB color histogram feature, a 64-element HSV color histogram feature, and a 192-element Daubechies' wavelet coefficient feature.
- the component loops determining the distance between the feature vector of the query image and the feature vector of each image of the collection.
- the component selects the next image.
- in decision block 803, if all the images have already been selected, then the component continues at block 808; else the component continues at block 804.
- the component calculates the RGB distance between the selected image and the query image.
- the component calculates the HSV distance between the selected image and the query image.
- the component calculates the Daubechies' distance between the selected image and the query image.
- the component calculates the normalized distance between the selected image and the query image as represented by the following equation:
- $Dist_j = \mathcal{N}(\lVert F_{RGB}^{query} - F_{RGB}^{j} \rVert) + \mathcal{N}(\lVert F_{HSV}^{query} - F_{HSV}^{j} \rVert) + \mathcal{N}(\lVert F_{Daub}^{query} - F_{Daub}^{j} \rVert)$
- $F_{RGB}^{query}$, $F_{HSV}^{query}$, and $F_{Daub}^{query}$ are the feature vectors of the query image, $F_{RGB}^{j}$, $F_{HSV}^{j}$, and $F_{Daub}^{j}$ are the feature vectors of the selected image j, and $\mathcal{N}(\cdot)$ is a normalization operator.
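the distance computation can be sketched as follows; the patent does not pin down the normalization operator, so dividing each feature family's distances by their maximum across candidates is assumed here as one plausible choice, and short vectors stand in for the 64- and 192-element histogram features:

```python
import numpy as np

def visual_distances(query_feats, candidate_feats):
    """Sum per-feature Euclidean distances, normalizing each feature family
    across the candidates so RGB, HSV, and wavelet distances contribute
    comparably (division by the maximum is an assumed normalization)."""
    combined = 0.0
    for name in ("rgb", "hsv", "daub"):
        d = np.array([np.linalg.norm(query_feats[name] - c[name])
                      for c in candidate_feats])
        combined = combined + (d / d.max() if d.max() > 0 else d)
    return combined

def select_by_visual(query_feats, candidates, top_k=2):
    """Return the candidate identifiers with the smallest combined distance."""
    ids = list(candidates)
    dist = visual_distances(query_feats, [candidates[i] for i in ids])
    return [ids[k] for k in np.argsort(dist)[:top_k]]

# Short stand-in vectors; real features would be 64- and 192-element arrays.
query = {"rgb": np.zeros(3), "hsv": np.zeros(3), "daub": np.zeros(3)}
candidates = {
    "near": {k: np.full(3, 0.1) for k in ("rgb", "hsv", "daub")},
    "far": {k: np.ones(3) for k in ("rgb", "hsv", "daub")},
}
```

normalizing per family before summing keeps the 192-element wavelet distances from dominating the much shorter color-histogram distances.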
- FIGS. 9-14 are flow diagrams that illustrate the creation of the indexes.
- FIG. 9 is a flow diagram that illustrates the processing of the create indexes component in one embodiment.
- the component is passed a collection of web pages that contain images.
- the component invokes the generate signature-to-image index component.
- the component invokes the generate image-to-related-information index component.
- the component invokes the generate word-to-image index component. The component then completes.
- FIG. 10 is a flow diagram that illustrates the processing of the generate signature-to-image index component in one embodiment.
- the component is passed the images of the web pages; it calculates a signature for each image and stores a mapping of that signature to the image.
- the component selects the next image and assigns to it a unique image identifier.
- in decision block 1002, if all the images have already been selected, then the component returns; else the component continues at block 1003.
- the component invokes the calculate image signature component to calculate the signature for the selected image.
- the component stores an entry in the signature-to-image index that maps the signature to the image identifier and then loops to block 1001 to select the next image.
- FIG. 11 is a flow diagram that illustrates the processing of the generate image-to-related-information index component in one embodiment.
- the component is passed a collection of web pages and generates a mapping of the images of the web pages to the corresponding keywords.
- the component loops selecting each web page and image combination and identifies the keywords for the image of the web page (i.e., a web page can have multiple images).
- the component selects the next web page and image combination.
- in decision block 1102, if all the web page and image combinations have already been selected, then the component continues at block 1104; else the component continues at block 1103.
- the component invokes the select keywords for web page component and then loops to block 1101 to select the next web page and image combination.
- in blocks 1104-1108, the component loops, selecting the highest scored keywords of each web page for each image.
- the component selects the next web page and image combination.
- in decision block 1105, if all the web page and image combinations have already been selected, then the component returns; else the component continues at block 1106.
- the component invokes the score keywords of web page component.
- the component selects the highest scored keywords.
- the component stores an entry in the image-to-related-information index that maps the image identifier of the image to the keywords.
- the component stores other related information, such as the visual feature vector for the image and the identification of the web page, in the entry of the image-to-related-information index. The component then loops to block 1104 to select the next web page and image combination.
- FIG. 12 is a flow diagram that illustrates the processing of the select keywords for web page component in one embodiment.
- the component is passed a web page along with the identification of an image.
- the component identifies from that web page the keywords associated with the image.
- the component creates a document from the text of the web page that is related to the image. For example, the component may analyze the document object model (“DOM”) representation of the web page to identify text that surrounds the image.
- the component identifies phrases within the document such as all possible phrases of length four or less.
- the component removes non-boundary words (e.g., “a,” “the,” “to”) from the ends of the phrases.
- the component removes stop words from the phrases.
- the component counts the number of occurrences of each phrase within the document. The component then returns the phrases as the keywords.
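these steps can be sketched as follows; the stop-word and non-boundary lists are tiny illustrative stand-ins, and skipping windows that start or end on a non-boundary word is used as an equivalent of trimming the ends, since each trimmed phrase also occurs as its own shorter window:

```python
from collections import Counter

# Tiny illustrative word lists; a real system would use fuller ones.
NON_BOUNDARY = {"a", "the", "to", "is"}
STOP_WORDS = {"a", "the", "to", "is", "of"}

def candidate_keywords(text, max_len=4):
    """Count every phrase of up to max_len words whose ends are boundary
    words; phrases made entirely of stop words are discarded."""
    words = text.lower().split()
    counts = Counter()
    for length in range(1, max_len + 1):
        for start in range(len(words) - length + 1):
            phrase = words[start:start + length]
            if phrase[0] in NON_BOUNDARY or phrase[-1] in NON_BOUNDARY:
                continue                 # would be trimmed at the ends
            if all(w in STOP_WORDS for w in phrase):
                continue                 # nothing left after stop-word removal
            counts[" ".join(phrase)] += 1
    return counts

keyword_counts = candidate_keywords("the yellow tulip is a flower")
```

the resulting counts feed directly into the scoring component described next, where each surviving phrase is treated as a candidate keyword.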
- FIG. 13 is a flow diagram that illustrates the processing of the score keywords of web page component in one embodiment.
- the component is passed a web page, an image of the web page, and keywords, and it scores the importance of each keyword to the image.
- the component uses a term frequency by inverse document frequency (“TF-IDF”) score for each word of the collection of web pages.
- the component may calculate the term frequency by inverse document frequency score according to the following equation:
- tf ⁇ idf i represents the score for word i
- n id represents the number of occurrences of a word i on web page d
- n d represents the total number of words on web page d
- n i represents the number of pages that contains word i
- N represents the number of web pages in the collection of web pages.
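The TF-IDF score of equation (4) is a one-liner; the sketch below assumes the four counts have already been gathered from the collection.

```python
import math

def tf_idf(n_id, n_d, n_i, N):
    """TF-IDF per equation (4): term frequency of word i on page d
    times the log inverse document frequency over the collection."""
    return (n_id / n_d) * math.log(N / n_i)

# A word occurring 3 times on a 100-word page, on 10 of 1,000 pages.
score = tf_idf(3, 100, 10, 1000)
```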
- the component loops calculating a score for each phrase of the document.
- the component selects the next keyword, which can contain a single word or multiple words.
- in decision block 1302, if all the keywords have already been selected, then the component returns the scores for the keywords, else the component continues at block 1303.
- the component calculates a mutual information score of the selected keyword as represented by the following equation:

MI(P) = log( (Occu(P) · N(|P|)) / (Occu(prefix(P)) · Occu(suffix(P))) ) (5)

where
- MI(P) represents the mutual information score for keyword P,
- Occu(P) represents the count of occurrences of P on the web page,
- |P| represents the number of words P contains,
- N(|P|) represents the total number of keywords (i.e., phrases) with length less than |P|,
- prefix(P) represents the prefix of P with length |P|−1, and
- suffix(P) represents the suffix of P with length |P|−1.
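Equation (5) can be sketched directly from those definitions. The dictionaries `occu` (phrase occurrence counts on the page) and `num_phrases_of_len` (count of phrases per length) are hypothetical stand-ins for the statistics the component would have gathered.

```python
import math

def mi_score(phrase, occu, num_phrases_of_len):
    """Mutual information per equation (5): how strongly the phrase's
    prefix and suffix co-occur, relative to chance."""
    words = phrase.split()
    prefix = " ".join(words[:-1])  # phrase minus its last word
    suffix = " ".join(words[1:])   # phrase minus its first word
    return math.log(
        occu[phrase] * num_phrases_of_len[len(words)]
        / (occu[prefix] * occu[suffix])
    )

occu = {"yellow tulip": 4, "yellow": 8, "tulip": 6}
score = mi_score("yellow tulip", occu, {2: 100})
```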
- the component calculates the TF-IDF score for the selected keyword as the average of the TF-IDF score for the words of the keyword.
- the component calculates a visualization style score (“VSS”) to factor in the visual characteristics of the keyword as represented by the following equation:
VSS(P) =
- tf-idf_max, if P is in title, alt text, or meta;
- (1/4)·tf-idf_max, else if P is in bold;
- (1/8)·tf-idf_max, else if P is in a large font;
- 0, otherwise. (6)

- VSS(P) represents the VSS score for the keyword P, and tf-idf_max represents the maximum TF-IDF score of all keywords of the web page.
- the VSS is based on whether the keyword is in the title or in metadata and whether the keyword is in bold or in a large font.
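The piecewise definition of equation (6) maps directly onto a cascade of conditionals; the boolean flags below are assumed to come from the page's DOM analysis.

```python
def vss(in_title_alt_or_meta, in_bold, in_large_font, tf_idf_max):
    """Visualization style score per equation (6): keywords with more
    prominent visual styling get a larger share of tf-idf_max."""
    if in_title_alt_or_meta:
        return tf_idf_max
    if in_bold:
        return tf_idf_max / 4
    if in_large_font:
        return tf_idf_max / 8
    return 0.0
```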
- the component calculates a combined score for the selected keyword according to the following equation:
- FIG. 14 is a flow diagram that illustrates the processing of the generate word-to-image index in one embodiment.
- the component may input the image-to-related-information index and add an entry to the word-to-image index for each word of each keyword for each image.
- the component selects the next image.
- in decision block 1402, if all the images have already been selected, then the component returns, else the component continues at block 1403.
- the component selects the next keyword for the selected image.
- in decision block 1404, if all the keywords have already been selected, then the component loops to block 1401 to select the next image, else the component continues at block 1405.
- the component selects the next word of the selected keyword.
- in decision block 1406, if all the words of the selected keyword have already been selected, then the component loops to block 1403 to select the next keyword of the selected image, else the component continues at block 1407.
- in block 1407, the component adds an entry to the word-to-image index that maps the selected word to the image identifier of the selected image. The component then loops to block 1405 to select the next word of the selected keyword.
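The nested loops of FIG. 14 can be sketched as follows; the dict-of-sets index and the input shape (image id mapped to its keyword phrases) are illustrative assumptions.

```python
from collections import defaultdict

def build_word_to_image_index(image_to_keywords):
    """FIG. 14 sketch: map every word of every keyword of every image
    to the set of image identifiers it is associated with."""
    index = defaultdict(set)
    for image_id, keywords in image_to_keywords.items():  # blocks 1401-1402
        for keyword in keywords:                          # blocks 1403-1404
            for word in keyword.split():                  # blocks 1405-1406
                index[word].add(image_id)                 # block 1407
    return index

index = build_word_to_image_index(
    {"img-1": ["yellow tulip"], "img-2": ["tulip field"]}
)
```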
- the multimodal query system may consider images to be duplicates when they are identical and when they are of the same content but from different points of view. An example of different points of view would be pictures of the same building from different angles or different distances.
- the term “keyword” refers to a phrase of one or more words. For example, “yellow tulips” and “tulips” are both keywords. Accordingly, the invention is not limited except as by the appended claims.
Abstract
A method and system for generating a search request from a multimodal query that includes a query image and query text is provided. The multimodal query system identifies images of a collection that are textually related to the query image based on similarity between words associated with each image and the query text. The multimodal query system then selects those images of the identified images that are visually related to the query image. The multimodal query system may formulate a search request based on keywords of web pages that contain the selected images and submit that search request to a search engine service.
Description
- This application is a continuation application of U.S. Pat. No. 8,081,824, filed on Nov. 30, 2004, and issued on Dec. 20, 2011, entitled “GENERATING SEARCH REQUESTS FROM MULTIMODAL QUERIES,” which is a divisional application of U.S. Pat. No. 7,457,825, filed on Sep. 21, 2005, and issued on Nov. 25, 2008, entitled “GENERATING SEARCH REQUESTS FROM MULTIMODAL QUERIES,” which are incorporated herein in their entireties by reference.
- Many search engine services, such as Google and Overture, provide for searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request or query that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling and indexing” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service then ranks the web pages of the search result based on the closeness of each match, web page popularity (e.g., Google's PageRank), and so on. The search engine service may also generate a relevance score to indicate how relevant the information of the web page may be to the search request. The search engine service then displays to the user links to those web pages in an order that is based on their rankings.
- These search engine services may, however, not be particularly useful in certain situations. In particular, it can be difficult to formulate a suitable search request that effectively describes the needed information. For example, if a person sees a flower on the side of a road and wants to learn the identity of the flower, the person when returning home may formulate the search request of “picture of yellow tulip-like flower in Europe” (e.g., yellow tulip) in hopes of seeing a picture of the flower. Unfortunately, the search result may identify so many web pages that it may be virtually impossible for the person to locate the correct picture, assuming that the person can even accurately remember the details of the flower. If the person has a mobile device, such as a personal digital assistant (“PDA”) or cell phone, the person may be able to submit the search request while at the side of the road. Such mobile devices, however, have limited input and output capabilities, which make it difficult both to enter the search request and to view the search result.
- If the person, however, is able to take a picture of the flower, the person may then be able to use a Content Based Information Retrieval (“CBIR”) system to find a similar looking picture. Although the detection of duplicate images can be achieved when the image database of the CBIR system happens to contain a duplicate image, the image database will not contain a duplicate of the picture of the flower at the side of the road. If a duplicate image is not in the database, it can be prohibitively expensive computationally, if even possible, to find a “matching” image. For example, if the image database contains an image of a field of yellow tulips and the picture contains only a single tulip, then the CBIR system may not recognize the images as matching.
- A method and system for generating a search request from a multimodal query is provided. The multimodal query system inputs a multimodal query that includes a query image and query text. The multimodal query system provides a collection of images along with one or more words associated with each image. The multimodal query system identifies images of the collection that are textually related to the query image based on similarity between associated words and the query text. The multimodal query system then selects those images of the identified images that are visually related to the query image. The multimodal query system may formulate a search request based on keywords of the web pages that contain the selected images and submit that search request to a search engine service, a dictionary service, an encyclopedia service, or the like. Upon receiving the search result, the multimodal query system provides that search result as the search result for the multimodal query.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
-
FIG. 1 is a block diagram that illustrates the overall processing of the multimodal query system in one embodiment. -
FIG. 2 is a block diagram that illustrates components of the multimodal query system in one embodiment. -
FIG. 3 is a flow diagram that illustrates the processing of the perform multimodal query component in one embodiment. -
FIG. 4 is a diagram that illustrates the generating of a signature of an image in one embodiment. -
FIG. 5 is a flow diagram that illustrates the processing of the calculate image signature component in one embodiment. -
FIG. 6 is a flow diagram that illustrates the processing of the find related images component in one embodiment. -
FIG. 7 is a flow diagram that illustrates the processing of the identify images by textual relatedness component in one embodiment. -
FIG. 8 is a flow diagram that illustrates the processing of the select images by visual relatedness component in one embodiment. -
FIG. 9 is a flow diagram that illustrates the processing of the create indexes component in one embodiment. -
FIG. 10 is a flow diagram that illustrates the processing of the generate signature-to-image index component in one embodiment. -
FIG. 11 is a flow diagram that illustrates the processing of the generate image-to-related-information index component in one embodiment. -
FIG. 12 is a flow diagram that illustrates the processing of the select keywords for web page component in one embodiment. -
FIG. 13 is a flow diagram that illustrates the processing of the score keywords of web page component in one embodiment. -
FIG. 14 is a flow diagram that illustrates the processing of the generate word-to-image index in one embodiment. - A method and system for generating a search request from a multimodal query is provided. In one embodiment, the multimodal query system inputs a multimodal query that includes an image (i.e., query image) and verbal information (i.e., query text). For example, a multimodal query may include a picture of a flower along with the word “flower.” The verbal information may be input as text via a keyboard, as audio via a microphone, and so on. The multimodal query system provides a collection of images along with one or more words associated with each image. For example, each image of the collection may have associated words that describe the subject of the image. In the case of an image of a yellow tulip, the associated words may include yellow, tulip, lily, flower, and so on. The multimodal query system identifies images of the collection whose associated words are related to the query text. The identifying of images based on relatedness to the query text helps to reduce the set of images that may be related to the query image. The multimodal query system then selects those images of the identified images that are visually related to the query image. For example, the multimodal query system may use a content-based information retrieval (“CBIR”) system to determine which of the identified images are most visually similar to the query image. In one embodiment, the multimodal query system may return the selected images as the search result. For example, the multimodal query system may provide links to web pages that contain the selected images. In another embodiment, the multimodal query system may formulate a search request based on keywords of the web pages that contain the selected images and submit that search request to a search engine service, a dictionary service, an encyclopedia service, or the like.
For example, the keywords of the web pages that contain the selected images may include the phrases yellow tulip, tulipa, Liliaceae lily flower, Holland yellow flower, and so on, and the formulated search request may be “yellow tulip lily flower Holland.” Upon receiving the search result, the multimodal query system provides that search result as the search result for the multimodal query. In this way, the multimodal query system allows the multimodal query to specify needed information more precisely than is specified by a unimodal query (e.g., query image alone or query text alone).
- In one embodiment, the multimodal query system may generate from the collection of images a word-to-image index for use in identifying the images that are related to the query text. The word-to-image index maps images to their associated words. For example, the words tulip, flower, and yellow may map to the image of a field of yellow tulips. The multimodal query system may generate the collection of images from a collection of web pages that each contain one or more images. The multimodal query system may assign a unique image identifier to each image of a web page. The multimodal query system may then identify words associated with the image. For each associated word, the multimodal query system adds an entry that maps the word to the image identifier. The multimodal query system uses these entries when identifying images that are related to the query text. The multimodal query system may use conventional techniques to identify the images that are most textually related to the query text based on analysis of the associated words.
- In one embodiment, the multimodal query system may generate from the collection of images an image-to-related-information index for use in selecting the identified images that are visually related to the query image. The image-to-related-information index may map each image to a visual feature vector of the image, a bitmap of the image, a web page that contains the image, and keywords of the web page that are associated with the image. For each image, the multimodal query system generates a visual feature vector of features (e.g., average RGB value) that represents the image. When determining whether an image of the collection is visually related to a query image, the multimodal query system generates a visual feature vector for the query image and compares it to the visual feature vector of the image-to-related-information index. The multimodal query system may identify, from the web page that contains an image, keywords associated with the image and store an indication of those keywords in the image-to-related-information index. The multimodal query system uses the keywords associated with the selected images to formulate a unimodal or text-based search request for the multimodal query.
- In one embodiment, the multimodal query system may initially search the collection of images to determine whether there is a duplicate image. If a duplicate image is found, then the multimodal query system may use the keywords associated with that image (e.g., from the image-to-related-information index) to formulate a search request based on the multimodal query. If no duplicate image is found, then the multimodal query system uses the query text to identify images and then selects from those identified images that are textually and visually related to the query image as described above. The multimodal query system may generate a signature-to-image index for identifying duplicate images by comparing signatures of the images of the collection to the signature of a query image. The multimodal query system may use various hashing algorithms to map an image to a signature that has a relatively high likelihood of being unique to that image within the collection (i.e., no collisions). To identify duplicate images, the multimodal query system generates a signature for the query image and determines whether the signature-to-image index contains an entry with the same signature.
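The duplicate check reduces to a lookup in the signature-to-image index; the sketch below assumes a plain dictionary from 32-bit signature to image identifier, and the example signature value is hypothetical.

```python
def build_signature_index(images):
    """Map each image's signature to its image identifier. `images` is
    an iterable of (image_id, signature) pairs from the collection."""
    index = {}
    for image_id, signature in images:
        # The hash is chosen so that collisions within the collection
        # are unlikely; a collision here would simply overwrite.
        index[signature] = image_id
    return index

def find_duplicate(index, query_signature):
    """A query image is treated as a duplicate when its signature
    already appears in the signature-to-image index."""
    return index.get(query_signature)

idx = build_signature_index([("img-7", 0xA1B2C3D4)])
```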
-
FIG. 1 is a block diagram that illustrates the overall processing of the multimodal query system in one embodiment. The input to the multimodal query system is a multimodal query that includes a query image 101 and a query text 102. The system initially generates a signature for the query image. In decision block 103, if the signature-to-image index contains an entry with a matching signature, then the collection contains a duplicate image and the system continues at block 106, else the system continues at block 104. In block 104, the system identifies images that are textually related to the query text using the word-to-image index. Before identifying the images, the system may use various techniques to expand the query text, such as by adding to the query text synonyms of the original words of the query text. The use of the expanded query text may help improve the chances of identifying the most textually related images. The output of block 104 is the identified images that are most textually related. In block 105, the system selects from the identified textually related images those images that are visually related to the query image using the image-to-related-information index. The system may determine the visual distance between the visual feature vector of the query image and the visual feature vector of each image of the collection and select the images with the smallest visual distances as being most visually related. The output of block 105 is the selected visually related images. In block 106, the system formulates a search request based on the images selected in block 105 or on the duplicate image as identified in block 103. The system retrieves the keywords associated with the selected images or the duplicate image and generates a text-based search request from those keywords. In block 107, the system submits the search request to a search engine service, a dictionary service, an encyclopedia service, or the like.
The system then returns the search result provided by the search engine as the search result for the multimodal query. -
FIG. 2 is a block diagram that illustrates components of the multimodal query system in one embodiment. The multimodal query system includes an indexing component 210, a data store 220, and a querying component 230. The indexing component generates the indexes from a collection of web pages. The indexing component includes a generate indexes component 211, a generate signature-to-image index component 212, a generate image-to-related-information index component 213, and a generate word-to-image index component 214. The generate indexes component invokes the other components of the indexing component to generate the appropriate index. The data store 220 includes a web page store 221, a signature-to-image index 222, an image-to-related-information index 223, and a word-to-image index 224. The web page store contains a collection of web pages from which the indexes are generated. The indexes may be organized using various data structures, such as hash tables, B-trees, ordered lists, and so on. In addition, the indexes may be represented by a single data structure or separate data structures. The querying component 230 includes a perform multimodal query component 231, a calculate image signature component 232, a find related images component 233, an identify images by textual relatedness component 234, and a select images by visual relatedness component 235. The perform multimodal query component is invoked to perform a multimodal query on an input query image and an input query text. The component invokes the calculate image signature component to generate a signature for the query image for use in determining whether the collection of images contains a duplicate of the query image. The component also invokes the find related images component to find images that are related when no duplicate image has been found. The find related images component invokes the identify images by textual relatedness component and the select images by visual relatedness component to find the related images.
The perform multimodal query component then formulates a text-based search request based on the keywords associated with the related images and submits the search request to a search engine service, a dictionary service, an encyclopedia service, or the like to generate the search result for the multimodal query. - The computing devices on which the multimodal query system may be implemented may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the multimodal query system. In addition, the data structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used to connect components of the system, such as the Internet, a local area network, a wide area network, a point-to-point dial-up connection, a cell phone network, and so on.
- Embodiments of the multimodal query system may be implemented in various operating environments that include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, digital cameras, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on. The devices may include cell phones, personal digital assistants, smart phones, personal computers, programmable consumer electronics, digital cameras, and so on.
- The multimodal query system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
-
FIG. 3 is a flow diagram that illustrates the processing of the perform multimodal query component in one embodiment. The component is passed a multimodal query that includes a query image and a query text. The query text includes one or more query words. In block 301, the component invokes the calculate image signature component to calculate the signature of the query image. In decision block 302, if the signature-to-image index contains the calculated signature, then a duplicate image has been found and the component continues at block 305, else the component continues at block 303. In decision block 303, if query text was provided (e.g., not blank), then the component continues at block 304, else the component completes without performing the search. In block 304, the component invokes the find related images component to find images related to the query image. In block 305, the component extracts keywords from the image-to-related-information index for the related images or the duplicate image. In block 306, the component formulates a search request based on the keywords and submits the search request to a search engine. The component then completes. -
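The control flow of FIG. 3 can be sketched as a small dispatch function. This is an outline under stated assumptions: the index shapes are hypothetical dictionaries, `find_related` stands in for the find related images component, and the function returns the formulated request text rather than submitting it to a search engine.

```python
def perform_multimodal_query(query_signature, query_text,
                             sig_index, image_keywords, find_related):
    """FIG. 3 sketch: use a duplicate image's keywords if one exists,
    otherwise fall back to textually and visually related images."""
    if query_signature in sig_index:            # decision block 302
        images = [sig_index[query_signature]]   # duplicate image found
    elif not query_text:                        # decision block 303
        return None                             # nothing to search on
    else:
        images = find_related(query_text)       # block 304
    keywords = []                               # block 305: gather keywords
    for image_id in images:
        keywords.extend(image_keywords.get(image_id, []))
    return " ".join(keywords)                   # block 306: text-based request

request = perform_multimodal_query(
    "sig-1", "flower",
    {"sig-1": "img-1"}, {"img-1": ["yellow", "tulip"]},
    lambda text: [],
)
```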
FIG. 4 is a diagram that illustrates the generating of a signature of an image in one embodiment. Image 401 represents the image for which the signature is to be generated. Initially, the system converts the image to a gray level image as represented by image 402. The system then divides the image into blocks (e.g., 8-by-8 blocks) as illustrated by image 403. The system then calculates the average intensity of each block to generate matrix 404 as indicated by the following equation:

I_ij = (1/|B_ij|) · Σ_(x,y in B_ij) g(x, y) (1)

where I_ij is the average intensity for block ij, g(x, y) is the gray level of the pixel at (x, y), and x and y represent the pixels B_ij of block ij. The system then performs a two-dimensional discrete cosine transform (“DCT”) on the matrix. The system discards the DC coefficient of the DCT matrix and selects 48 AC coefficients of the DCT matrix in a zigzag pattern as illustrated by pattern 405, resulting in an AC coefficients vector 406. The system then performs a principal component analysis (“PCA”) to generate a 32-dimension feature vector 407 as illustrated by the following equation:

Y_n = P^T · A_m (2)

where Y_n represents the 32-dimension feature vector, A_m represents the 48 AC coefficients, and P represents an m×n transform matrix whose columns are the n orthonormal eigenvectors corresponding to the first n largest eigenvalues of the covariance matrix Σ_(A_m), and P^T·P = I_n. The system may train the transform matrix using a collection of sample web pages. Finally, the system generates a 32-bit hash value 408 from the 32-dimension feature vector by setting the value of each of the 32 bits to 1 if the corresponding element of the 32-dimension feature vector is greater than 0, and to 0 otherwise. One skilled in the art will appreciate that many different algorithms may be used to generate a signature for an image. -
FIG. 5 is a flow diagram that illustrates the processing of the calculate image signature component in one embodiment. The component is passed an image and generates a signature for the image. In block 501, the component converts the image into a gray level image. In block 502, the component divides the image into blocks. In block 503, the component calculates the average intensity of each block to generate an intensity matrix. In block 504, the component performs a two-dimensional discrete cosine transform on the intensity matrix. In block 505, the component extracts 48 AC coefficients from the DCT matrix. In block 506, the component performs a PCA to generate a 32-dimension feature vector from the 48 AC coefficients. In block 507, the component generates a 32-bit signature from the 32-dimension feature vector and then completes. -
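The core of the signature computation (FIGS. 4 and 5) can be sketched in outline. This is an illustrative reconstruction, not the patented implementation: it assumes the block-intensity matrix 404 has already been computed, uses a naive unnormalized DCT-II, and omits the PCA projection of equation (2), taking the sign bits directly from the AC coefficients.

```python
import math

def dct2(block):
    """Naive two-dimensional DCT-II of an 8x8 matrix (FIG. 5, block 504)."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[u][v] = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n)
            )
    return out

def zigzag_ac(matrix, count=48):
    """Walk the matrix along anti-diagonals in alternating (zigzag)
    order, skip the DC coefficient, and keep the first `count` AC
    coefficients (FIG. 5, block 505)."""
    n = len(matrix)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 else p[0]),
    )
    return [matrix[i][j] for i, j in order[1:count + 1]]

def signature_bits(features):
    """Set bit k to 1 when feature k is positive (hash value 408)."""
    sig = 0
    for k, value in enumerate(features):
        if value > 0:
            sig |= 1 << k
    return sig

flat_block_matrix = [[1.0] * 8 for _ in range(8)]  # a featureless image
ac = zigzag_ac(dct2(flat_block_matrix))            # 48 AC coefficients
```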
FIG. 6 is a flow diagram that illustrates the processing of the find related images component in one embodiment. The component is passed a multimodal query containing a query image and a query text and returns an indication of images that are related to the multimodal query. In block 601, the component invokes the identify images by textual relatedness component to identify images that are related to the query text. In block 602, the component invokes the select images by visual relatedness component to select those identified images that are visually related to the query image. The component then returns the identifiers of the selected images as the related images. -
FIG. 7 is a flow diagram that illustrates the processing of the identify images by textual relatedness component in one embodiment. The component is passed a query text and returns the identification of images that are related to the query text as indicated by the word-to-image index. In block 701, the component removes stop words (e.g., a, the, and an) from the query text. In block 702, the component applies stemming rules, such as the Porter stemming algorithm, to generate the stems for the words of the query text. For example, the words flowers, flowering, and flowered may be transformed to their stem flower. In block 703, the component expands the words of the query text to include synonyms and hyponyms using, for example, the WordNet system. For example, the word flower may be expanded to include bloom, blossom, heyday, efflorescence, flush, peony, lesser celandine, pilewort, Ranunculus ficaria, anemone, wildflower, and so on. In block 704, the component searches the word-to-image index to locate images with associated words that are related to the expanded query text. In block 705, the component ranks the images based on how well the associated words match the expanded query text. In block 706, the component identifies the highest ranking images and returns the identified images. The component may treat the words associated with each image as a document and use standard query techniques to find the documents that are most closely related to the expanded query text. -
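The query-text preparation of blocks 701-703 can be sketched as follows. This is a simplified stand-in: the crude suffix stripper below is not the Porter stemmer, and the synonym table is a hypothetical substitute for a WordNet lookup.

```python
STOP_WORDS = {"a", "an", "the"}
SYNONYMS = {"flower": ["bloom", "blossom"]}  # stand-in for a WordNet lookup

def stem(word):
    """Crude suffix stripping standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def expand_query(query_text):
    """Blocks 701-703: drop stop words, stem, then add synonyms."""
    terms = []
    for word in query_text.lower().split():
        if word in STOP_WORDS:
            continue
        root = stem(word)
        terms.append(root)
        terms.extend(SYNONYMS.get(root, []))
    return terms
```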
FIG. 8 is a flow diagram that illustrates the processing of the select images by visual relatedness component in one embodiment. The component is passed the query image and an indication of the images that were identified based on textual relatedness. The component selects those identified images that are visually related to the query image. In block 801, the component extracts a feature vector for the query image. In one embodiment, the feature vector for an image includes three features: a 64-element RGB color histogram feature, a 64-element HSV color histogram feature, and a 192-element Daubechies' wavelet coefficient feature. One skilled in the art will appreciate that any of a variety of well-known techniques can be used to generate a feature vector for an image. In blocks 802-807, the component loops determining the distance between the feature vector of the query image and the feature vector of each image of the collection. In block 802, the component selects the next image. In decision block 803, if all the images have already been selected, then the component continues at block 808, else the component continues at block 804. In block 804, the component calculates the RGB distance between the selected image and the query image. In block 805, the component calculates the HSV distance between the selected image and the query image. In block 806, the component calculates the Daubechies' distance between the selected image and the query image. In block 807, the component calculates the normalized distance between the selected image and the query image as represented by the following equation:

Dist(j) = w_RGB·N(|F_RGB^query − F_RGB^j|) + w_HSV·N(|F_HSV^query − F_HSV^j|) + w_Daub·N(|F_Daub^query − F_Daub^j|) (3)

where F_RGB^query, F_HSV^query, and F_Daub^query are the feature vectors of the query image, F_RGB^j, F_HSV^j, and F_Daub^j are the feature vectors of the selected image, and N(·) is a normalization operator. In one embodiment, the component uses the constant weights of w_RGB=0.3, w_HSV=0.5, and w_Daub=0.2. The component then loops to block 802 to select the next image. In block 808, the component selects the images with the smallest distances and returns the selected images. -
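The weighted distance of equation (3) can be sketched as follows. The normalization operator is an assumption here: each per-feature distance is divided by the largest distance observed for that feature; the feature names and dictionary shapes are also illustrative.

```python
import math

W_RGB, W_HSV, W_DAUB = 0.3, 0.5, 0.2  # weights from the description

def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_distance(query, image, max_dists):
    """Equation (3) sketch: weighted sum of per-feature distances, each
    normalized by the maximum distance seen for that feature."""
    return sum(
        w * l2(query[name], image[name]) / max_dists[name]
        for name, w in (("rgb", W_RGB), ("hsv", W_HSV), ("daub", W_DAUB))
    )

q = {"rgb": [0.0, 0.0], "hsv": [0.0, 0.0], "daub": [0.0, 0.0]}
img = {"rgb": [3.0, 4.0], "hsv": [0.0, 0.0], "daub": [0.0, 0.0]}
d = combined_distance(q, img, {"rgb": 5.0, "hsv": 1.0, "daub": 1.0})
```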
FIGS. 9-14 are flow diagrams that illustrate the creation of the indexes. FIG. 9 is a flow diagram that illustrates the processing of the create indexes component in one embodiment. The component is passed a collection of web pages that contain images. In block 901, the component invokes the generate signature-to-image index component. In block 902, the component invokes the generate image-to-related-information index component. In block 903, the component invokes the generate word-to-image index component. The component then completes. -
FIG. 10 is a flow diagram that illustrates the processing of the generate signature-to-image index component in one embodiment. The component is passed the images of the web pages, calculates a signature for each image, and stores a mapping of that signature to the image. In block 1001, the component selects the next image and assigns to it a unique image identifier. In decision block 1002, if all the images have already been selected, then the component returns, else the component continues at block 1003. In block 1003, the component invokes the calculate image signature component to calculate the signature for the selected image. In block 1004, the component stores an entry in the signature-to-image index that maps the signature to the image identifier and then loops to block 1001 to select the next image.
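The index construction of FIG. 10 might be sketched as below. The signature function here is an assumption: a hash of the raw image bytes stands in for the patent's calculate image signature component, purely for illustration.

```python
import hashlib

def image_signature(image_bytes):
    """Stand-in signature: a digest of the raw image bytes (an assumption;
    the patent's actual signature function is not reproduced here)."""
    return hashlib.md5(image_bytes).hexdigest()

def build_signature_to_image_index(images):
    """Mirror FIG. 10: assign each image a unique identifier (block 1001)
    and map its signature to that identifier (blocks 1003-1004).

    `images` is an iterable of raw image byte strings."""
    index = {}
    for image_id, data in enumerate(images):
        index.setdefault(image_signature(data), []).append(image_id)
    return index
```

Because duplicate images share a signature, looking up a query image's signature in this index directly yields its duplicates in the collection.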
FIG. 11 is a flow diagram that illustrates the processing of the generate image-to-related-information index component in one embodiment. The component is passed a collection of web pages and generates a mapping of the images of the web pages to the corresponding keywords. In blocks 1101-1103, the component loops selecting each web page and image combination and identifies the keywords for the image of the web page (i.e., a web page can have multiple images). In block 1101, the component selects the next web page and image combination. In decision block 1102, if all the web page and image combinations have already been selected, then the component continues at block 1104, else the component continues at block 1103. In block 1103, the component invokes the select keywords for web page component and then loops to block 1101 to select the next web page and image combination. In blocks 1104-1109, the component loops selecting the highest scored keywords of each web page for each image. In block 1104, the component selects the next web page and image combination. In decision block 1105, if all the web page and image combinations have already been selected, the component returns, else the component continues at block 1106. In block 1106, the component invokes the score keywords of web page component. In block 1107, the component selects the highest scored keywords. In block 1108, the component stores an entry in the image-to-related-information index that maps the image identifier of the image to the keywords. In block 1109, the component stores other related information, such as the visual feature vector for the image and the identification of the web page, in the entry of the image-to-related-information index. The component then loops to block 1104 to select the next web page and image combination.
FIG. 12 is a flow diagram that illustrates the processing of the select keywords for web page component in one embodiment. The component is passed a web page along with the identification of an image. The component identifies from that web page the keywords associated with the image. In block 1201, the component creates a document from the text of the web page that is related to the image. For example, the component may analyze the document object model ("DOM") representation of the web page to identify text that surrounds the image. In block 1202, the component identifies phrases within the document, such as all possible phrases of length four or less. In block 1203, the component removes non-boundary words (e.g., "a," "the," "to") from the ends of the phrases. In block 1204, the component removes stop words from the phrases. In block 1205, the component counts the number of occurrences of each phrase within the document. The component then returns the phrases as the keywords.
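The phrase-extraction steps of blocks 1202-1205 can be sketched as follows. The tokenization and the stop-word list are illustrative assumptions (the patent does not enumerate one), and the two removal passes of blocks 1203 and 1204 are collapsed into a single end-trimming pass.

```python
import re
from collections import Counter

# Illustrative stop/non-boundary word list; the patent does not enumerate one.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "and"}

def candidate_keywords(text, max_len=4):
    """Sketch of FIG. 12, blocks 1202-1205: enumerate phrases of up to
    `max_len` words, trim stop words from the phrase ends, and count
    occurrences of each surviving phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = words[i:i + n]
            # blocks 1203-1204 collapsed: strip stop words from both ends
            while phrase and phrase[0] in STOP_WORDS:
                phrase = phrase[1:]
            while phrase and phrase[-1] in STOP_WORDS:
                phrase = phrase[:-1]
            if phrase:
                counts[" ".join(phrase)] += 1
    return counts
```

The resulting phrase counts feed the scoring component of FIG. 13.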
FIG. 13 is a flow diagram that illustrates the processing of the score keywords of web page component in one embodiment. The component is passed a web page, an image of the web page, and keywords, and scores the importance of each keyword to the image. The component uses a term frequency by inverse document frequency ("TF-IDF") score for each word of the collection of web pages. The component may calculate the term frequency by inverse document frequency score according to the following equation:

$$tf\text{-}idf_i = \frac{n_{id}}{n_d}\,\log\frac{N}{n_i}$$

where $tf\text{-}idf_i$ represents the score for word i, $n_{id}$ represents the number of occurrences of word i on web page d, $n_d$ represents the total number of words on web page d, $n_i$ represents the number of pages that contain word i, and N represents the number of web pages in the collection of web pages. In blocks 1301-1307, the component loops calculating a score for each phrase of the document. In block 1301, the component selects the next keyword, which can contain a single word or multiple words. In decision block 1302, if all the keywords have already been selected, then the component returns the scores for the keywords, else the component continues at block 1303. In block 1303, the component calculates a mutual information score of the selected keyword as represented by the following equation:
where MI(P) represents the mutual information score for keyword P, Occu(P) represents the count of occurrences of P on the web page, |P| represents the number of words P contains, N(|P|) represents the total number of keywords (i.e., phrases) with length less than |P|, prefix(P) represents the prefix of P with length |P|−1, and suffix(P) represents the suffix of P with length |P|−1. In decision block 1304, if the mutual information score is greater than a threshold, then the component continues at block 1305, else the component loops to block 1301 to select the next keyword. If the mutual information score does not meet the threshold level, then the component considers the keyword to be unimportant and sets its score to 0. In block 1305, the component calculates the TF-IDF score for the selected keyword as the average of the TF-IDF scores of the words of the keyword. In block 1306, the component calculates a visualization style score ("VSS") to factor in the visual characteristics of the keyword as represented by the following equation:
where VSS(P) represents the VSS score for the keyword P and $tf\text{-}idf_{max}$ represents the maximum TF-IDF score of all keywords of the web page. The VSS is based on whether the keyword is in the title or in metadata and whether the keyword is in bold or in a large font. One skilled in the art will appreciate that other visual characteristics could be taken into consideration, such as the position of a keyword on a page, closeness to an image, and so on. In block 1307, the component calculates a combined score for the selected keyword according to the following equation:

$$\mathrm{Score}(P) = b_0 + b_1\,tf\text{-}idf(P) + b_2\,MI(P) + b_3\,VSS(P)$$

where the three component scores are $X = \{tf\text{-}idf, MI, VSS\}$ and the coefficients $b_0, \ldots, b_3$ are empirically determined. The component then loops to block 1301 to select the next keyword.
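The scoring of FIG. 13 can be sketched as follows. The TF-IDF term follows the standard form implied by the variable definitions in the text; the final combination is assumed to be a linear blend of the three scores, and the coefficient values below are placeholders, since the text says only that b0 through b3 are empirically determined. The mutual-information and visual-style terms are passed in as precomputed values.

```python
import math

def tf_idf(n_id, n_d, n_i, N):
    """Standard TF-IDF using the variables defined in the text: n_id
    occurrences of word i on page d, n_d words on page d, n_i pages
    containing word i, N pages in the collection."""
    return (n_id / n_d) * math.log(N / n_i)

def combined_score(tfidf, mi, vss, b=(0.0, 0.4, 0.3, 0.3)):
    """Assumed linear combination over X = {tf-idf, MI, VSS} with
    placeholder coefficients b0..b3 (block 1307)."""
    b0, b1, b2, b3 = b
    return b0 + b1 * tfidf + b2 * mi + b3 * vss
```

A word that appears on every page scores zero (its log term vanishes), which is the usual TF-IDF behavior for uninformative words.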
FIG. 14 is a flow diagram that illustrates the processing of the generate word-to-image index component in one embodiment. The component may input the image-to-related-information index and add an entry to the word-to-image index for each word of each keyword for each image. In block 1401, the component selects the next image. In decision block 1402, if all the images have already been selected, then the component returns, else the component continues at block 1403. In block 1403, the component selects the next keyword for the selected image. In decision block 1404, if all the keywords have already been selected, then the component loops to block 1401 to select the next image, else the component continues at block 1405. In block 1405, the component selects the next word of the selected keyword. In decision block 1406, if all the words of the selected keyword have already been selected, then the component loops to block 1403 to select the next keyword of the selected image, else the component continues at block 1407. In block 1407, the component adds an entry to the word-to-image index that maps the selected word to the image identifier of the selected image. The component then loops to block 1405 to select the next word of the selected keyword.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. For example, the multimodal query system may consider images to be duplicates when they are identical and when they are of the same content but from different points of view. An example of different points of view would be pictures of the same building from different angles or different distances.
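The nested loops of FIG. 14 build a standard inverted index; a minimal sketch:

```python
def build_word_to_image_index(image_to_keywords):
    """FIG. 14 sketch: for each image, each keyword, and each word of the
    keyword, add an entry mapping the word to the image identifier.

    `image_to_keywords` maps image identifiers to lists of keyword phrases,
    as produced by the image-to-related-information index."""
    index = {}
    for image_id, keywords in image_to_keywords.items():
        for keyword in keywords:
            for word in keyword.split():
                index.setdefault(word, set()).add(image_id)
    return index
```

Textual relatedness lookups then reduce to intersecting or uniting the image sets for the words of the query text.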
As used herein, the term "keyword" refers to a phrase of one or more words. For example, "yellow tulips" and "tulips" are both keywords.
- From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
Claims (20)
1. A method in a device for generating a search request for a multimodal query with a query image and query text, the query image being stored in electronic form, the method comprising:
providing access to a collection of images and associated words;
receiving a multimodal query that includes a query image and query text;
identifying images of the collection based on textual relatedness between a word associated with an image and the query text;
selecting images of the identified images based on visual relatedness between an identified image and the query image;
generating a search request based on keywords associated with the selected images;
submitting the generated search request to a search engine for identifying documents related to the multimodal query; and
providing an indication of the identified documents as a search result for the multimodal query.
2. The method of claim 1 wherein the selecting comprises extracting a feature vector for the identified image, determining the distance between the extracted feature vector and the feature vector of each image of the collection, and selecting the images based on the determined distance.
3. The method of claim 1 wherein the collection includes a collection of web pages with images and words.
4. The method of claim 1 wherein visual relatedness is based on similarity in color space and wavelet coefficients.
5. The method of claim 1 including:
before identifying images of the collection,
determining whether the query image is a duplicate of an image of the collection; and
when the query image is a duplicate of an image, generating a search request based on a keyword associated with that image.
6. The method of claim 5 wherein the query image is a duplicate when the images are identical.
7. The method of claim 5 wherein the query image is a duplicate when the images are of the same content but from different points of view.
8. The method of claim 5 wherein the collection includes signatures of the images and wherein images are duplicates when they have the same signature.
9. The method of claim 1 wherein the query text is derived from audio information.
10. A computer-readable storage device containing computer-executable instructions for controlling a computing device to find images related to a multimodal query, the instructions for performing a method comprising:
providing access to web pages with images, the web pages having words;
receiving a query image and query text of the multimodal query;
identifying images of the web pages based on textual relatedness between words of a web page and the query text;
selecting images of the identified images of the web pages based on visual relatedness between an identified image and the query image; and
generating a search request based on keywords associated with the selected images.
11. The computer-readable storage device of claim 10 including:
submitting the generated search request to a search engine for identifying documents related to the multimodal query; and
providing an indication of the identified documents as a search result for the multimodal query.
12. The computer-readable storage device of claim 10 wherein the selecting comprises extracting a feature vector for the query image, determining the distance between the extracted feature vector and the feature vector of each image of the collection, and selecting the images based on the determined distance.
13. The computer-readable storage device of claim 10 including:
before identifying web pages,
determining whether the query image is a duplicate of an image of a web page; and
when the query image is a duplicate of an image of a web page, generating a search request based on words of the web page that contains the duplicate image.
14. The computer-readable storage device of claim 10 wherein visual relatedness is based on similarity in color space and wavelet coefficients.
15. The computer-readable storage device of claim 13 wherein the query text is derived from audio information.
16. A computing device for generating a search request for a multimodal query with a query image and query text, comprising:
a memory storing computer-executable instructions of:
a component that identifies images of a collection of images based on textual relatedness between a word associated with an image and the query text;
a component that selects images of the identified images based on visual relatedness between an identified image and the query image; and
a component that generates a search request based on keywords associated with the selected images;
a processor that executes the computer-executable instructions stored in the memory.
17. The computing device of claim 16 including:
a component that submits the generated search request to a search engine for identifying documents related to the multimodal query; and
a component that provides an indication of the identified documents as a search result for the multimodal query.
18. The computing device of claim 17 wherein the component that selects extracts a feature vector for the identified image, determines the distance between the extracted feature vector and the feature vector of each image of the collection, and selects the images based on the determined distance.
19. The computing device of claim 17 including a component that before identifying images of the collection, determines whether the query image is a duplicate of an image of the collection and when the query image is a duplicate of an image, generates a search request based on a keyword associated with that image.
20. The computing device of claim 17 wherein the collection includes signatures of the images and wherein images are duplicates when they have the same signature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/332,248 US20120093371A1 (en) | 2005-09-21 | 2011-12-20 | Generating search requests from multimodal queries |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/233,352 US7457825B2 (en) | 2005-09-21 | 2005-09-21 | Generating search requests from multimodal queries |
US12/247,958 US8081824B2 (en) | 2005-09-21 | 2008-10-08 | Generating search requests from multimodal queries |
US13/332,248 US20120093371A1 (en) | 2005-09-21 | 2011-12-20 | Generating search requests from multimodal queries |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/247,958 Continuation US8081824B2 (en) | 2005-09-21 | 2008-10-08 | Generating search requests from multimodal queries |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120093371A1 true US20120093371A1 (en) | 2012-04-19 |
Family
ID=37885443
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/233,352 Expired - Fee Related US7457825B2 (en) | 2005-09-21 | 2005-09-21 | Generating search requests from multimodal queries |
US12/247,958 Expired - Fee Related US8081824B2 (en) | 2005-09-21 | 2008-10-08 | Generating search requests from multimodal queries |
US13/332,248 Abandoned US20120093371A1 (en) | 2005-09-21 | 2011-12-20 | Generating search requests from multimodal queries |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/233,352 Expired - Fee Related US7457825B2 (en) | 2005-09-21 | 2005-09-21 | Generating search requests from multimodal queries |
US12/247,958 Expired - Fee Related US8081824B2 (en) | 2005-09-21 | 2008-10-08 | Generating search requests from multimodal queries |
Country Status (1)
Country | Link |
---|---|
US (3) | US7457825B2 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080189633A1 (en) * | 2006-12-27 | 2008-08-07 | International Business Machines Corporation | System and Method For Processing Multi-Modal Communication Within A Workgroup |
US20110035406A1 (en) * | 2009-08-07 | 2011-02-10 | David Petrou | User Interface for Presenting Search Results for Multiple Regions of a Visual Query |
US20110125735A1 (en) * | 2009-08-07 | 2011-05-26 | David Petrou | Architecture for responding to a visual query |
US20110131235A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Actionable Search Results for Street View Visual Queries |
US20110131241A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Actionable Search Results for Visual Queries |
US20110129153A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Identifying Matching Canonical Documents in Response to a Visual Query |
US20110128288A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Region of Interest Selector for Visual Queries |
US20130301935A1 (en) * | 2011-01-28 | 2013-11-14 | Alibaba Group Holding Limited | Method and Apparatus of Identifying Similar Images |
CN103544216A (en) * | 2013-09-23 | 2014-01-29 | Tcl集团股份有限公司 | Information recommendation method and system combining image content and keywords |
US8805079B2 (en) | 2009-12-02 | 2014-08-12 | Google Inc. | Identifying matching canonical documents in response to a visual query and in accordance with geographic information |
US8811742B2 (en) | 2009-12-02 | 2014-08-19 | Google Inc. | Identifying matching canonical documents consistent with visual query structural information |
US8935246B2 (en) * | 2012-08-08 | 2015-01-13 | Google Inc. | Identifying textual terms in response to a visual query |
US9176986B2 (en) | 2009-12-02 | 2015-11-03 | Google Inc. | Generating a combination of a visual query and matching canonical document |
US11645323B2 (en) | 2020-02-26 | 2023-05-09 | Samsung Electronics Co., Ltd. | Coarse-to-fine multimodal gallery search system with attention-based neural network models |
Families Citing this family (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8563133B2 (en) * | 2004-06-08 | 2013-10-22 | Sandisk Corporation | Compositions and methods for modulation of nanostructure energy levels |
US20070005490A1 (en) * | 2004-08-31 | 2007-01-04 | Gopalakrishnan Kumar C | Methods and System for Distributed E-commerce |
US7873911B2 (en) * | 2004-08-31 | 2011-01-18 | Gopalakrishnan Kumar C | Methods for providing information services related to visual imagery |
US8370323B2 (en) * | 2004-08-31 | 2013-02-05 | Intel Corporation | Providing information services related to multimodal inputs |
US20060230073A1 (en) * | 2004-08-31 | 2006-10-12 | Gopalakrishnan Kumar C | Information Services for Real World Augmentation |
US8521737B2 (en) | 2004-10-01 | 2013-08-27 | Ricoh Co., Ltd. | Method and system for multi-tier image matching in a mixed media environment |
US8965145B2 (en) | 2006-07-31 | 2015-02-24 | Ricoh Co., Ltd. | Mixed media reality recognition using multiple specialized indexes |
US8838591B2 (en) | 2005-08-23 | 2014-09-16 | Ricoh Co., Ltd. | Embedding hot spots in electronic documents |
US9171202B2 (en) | 2005-08-23 | 2015-10-27 | Ricoh Co., Ltd. | Data organization and access for mixed media document system |
US9530050B1 (en) | 2007-07-11 | 2016-12-27 | Ricoh Co., Ltd. | Document annotation sharing |
US8510283B2 (en) | 2006-07-31 | 2013-08-13 | Ricoh Co., Ltd. | Automatic adaption of an image recognition system to image capture devices |
US9384619B2 (en) | 2006-07-31 | 2016-07-05 | Ricoh Co., Ltd. | Searching media content for objects specified using identifiers |
US8856108B2 (en) | 2006-07-31 | 2014-10-07 | Ricoh Co., Ltd. | Combining results of image retrieval processes |
US7812986B2 (en) | 2005-08-23 | 2010-10-12 | Ricoh Co. Ltd. | System and methods for use of voice mail and email in a mixed media environment |
US7702673B2 (en) | 2004-10-01 | 2010-04-20 | Ricoh Co., Ltd. | System and methods for creation and use of a mixed media environment |
US8176054B2 (en) | 2007-07-12 | 2012-05-08 | Ricoh Co. Ltd | Retrieving electronic documents by converting them to synthetic text |
US8156116B2 (en) | 2006-07-31 | 2012-04-10 | Ricoh Co., Ltd | Dynamic presentation of targeted information in a mixed media reality recognition system |
US8825682B2 (en) | 2006-07-31 | 2014-09-02 | Ricoh Co., Ltd. | Architecture for mixed media reality retrieval of locations and registration of images |
US9373029B2 (en) | 2007-07-11 | 2016-06-21 | Ricoh Co., Ltd. | Invisible junction feature recognition for document security or annotation |
US8949287B2 (en) | 2005-08-23 | 2015-02-03 | Ricoh Co., Ltd. | Embedding hot spots in imaged documents |
US8868555B2 (en) | 2006-07-31 | 2014-10-21 | Ricoh Co., Ltd. | Computation of a recongnizability score (quality predictor) for image retrieval |
US9405751B2 (en) | 2005-08-23 | 2016-08-02 | Ricoh Co., Ltd. | Database for mixed media document system |
US8600989B2 (en) | 2004-10-01 | 2013-12-03 | Ricoh Co., Ltd. | Method and system for image matching in a mixed media environment |
US8156115B1 (en) | 2007-07-11 | 2012-04-10 | Ricoh Co. Ltd. | Document-based networking with mixed media reality |
US7894854B2 (en) | 2004-10-26 | 2011-02-22 | Pantech & Curitel Communications, Inc. | Image/audio playback device of mobile communication terminal |
US9092458B1 (en) * | 2005-03-08 | 2015-07-28 | Irobot Corporation | System and method for managing search results including graphics |
US20070028189A1 (en) * | 2005-07-27 | 2007-02-01 | Microsoft Corporation | Hierarchy highlighting |
US7457825B2 (en) * | 2005-09-21 | 2008-11-25 | Microsoft Corporation | Generating search requests from multimodal queries |
US7647331B2 (en) * | 2006-03-28 | 2010-01-12 | Microsoft Corporation | Detecting duplicate images using hash code grouping |
US7860317B2 (en) * | 2006-04-04 | 2010-12-28 | Microsoft Corporation | Generating search results based on duplicate image detection |
US9892196B2 (en) * | 2006-04-21 | 2018-02-13 | Excalibur Ip, Llc | Method and system for entering search queries |
CN100530183C (en) * | 2006-05-19 | 2009-08-19 | 华为技术有限公司 | System and method for collecting watch database |
US9176984B2 (en) | 2006-07-31 | 2015-11-03 | Ricoh Co., Ltd | Mixed media reality retrieval of differentially-weighted links |
US8489987B2 (en) | 2006-07-31 | 2013-07-16 | Ricoh Co., Ltd. | Monitoring and analyzing creation and usage of visual content using image and hotspot interaction |
US9063952B2 (en) | 2006-07-31 | 2015-06-23 | Ricoh Co., Ltd. | Mixed media reality recognition with image tracking |
US8201076B2 (en) | 2006-07-31 | 2012-06-12 | Ricoh Co., Ltd. | Capturing symbolic information from documents upon printing |
US9020966B2 (en) | 2006-07-31 | 2015-04-28 | Ricoh Co., Ltd. | Client device for interacting with a mixed media reality recognition system |
US8676810B2 (en) | 2006-07-31 | 2014-03-18 | Ricoh Co., Ltd. | Multiple index mixed media reality recognition using unequal priority indexes |
KR100886767B1 (en) * | 2006-12-29 | 2009-03-04 | 엔에이치엔(주) | Method and system for providing serching service using graphical user interface |
US8291316B2 (en) * | 2007-05-30 | 2012-10-16 | Xerox Corporation | Production environment CRM information gathering system for VI applications |
US8571850B2 (en) * | 2007-09-13 | 2013-10-29 | Microsoft Corporation | Dual cross-media relevance model for image annotation |
US8457416B2 (en) * | 2007-09-13 | 2013-06-04 | Microsoft Corporation | Estimating word correlations from images |
US20090287680A1 (en) * | 2008-05-14 | 2009-11-19 | Microsoft Corporation | Multi-modal query refinement |
US20090313239A1 (en) * | 2008-06-16 | 2009-12-17 | Microsoft Corporation | Adaptive Visual Similarity for Text-Based Image Search Results Re-ranking |
US8538958B2 (en) * | 2008-07-11 | 2013-09-17 | Satyam Computer Services Limited Of Mayfair Centre | System and method for context map generation |
US8463053B1 (en) | 2008-08-08 | 2013-06-11 | The Research Foundation Of State University Of New York | Enhanced max margin learning on multimodal data mining in a multimedia database |
US8520979B2 (en) | 2008-08-19 | 2013-08-27 | Digimarc Corporation | Methods and systems for content processing |
US8452794B2 (en) * | 2009-02-11 | 2013-05-28 | Microsoft Corporation | Visual and textual query suggestion |
US8126897B2 (en) * | 2009-06-10 | 2012-02-28 | International Business Machines Corporation | Unified inverted index for video passage retrieval |
CN101576932B (en) * | 2009-06-16 | 2012-07-04 | 阿里巴巴集团控股有限公司 | Close-repetitive picture computer searching method and device |
JP2011053781A (en) * | 2009-08-31 | 2011-03-17 | Seiko Epson Corp | Image database creation device, image retrieval device, image database creation method and image retrieval method |
JP5664553B2 (en) * | 2009-10-16 | 2015-02-04 | 日本電気株式会社 | Person clothes feature extraction device, person search device, person clothes feature extraction method, person search method, and program |
US20110106798A1 (en) * | 2009-11-02 | 2011-05-05 | Microsoft Corporation | Search Result Enhancement Through Image Duplicate Detection |
US9710491B2 (en) * | 2009-11-02 | 2017-07-18 | Microsoft Technology Licensing, Llc | Content-based image search |
US8433140B2 (en) * | 2009-11-02 | 2013-04-30 | Microsoft Corporation | Image metadata propagation |
US20110191336A1 (en) * | 2010-01-29 | 2011-08-04 | Microsoft Corporation | Contextual image search |
US20110238679A1 (en) * | 2010-03-24 | 2011-09-29 | Rovi Technologies Corporation | Representing text and other types of content by using a frequency domain |
EP2591466B1 (en) * | 2010-07-06 | 2019-05-08 | Sparkup Ltd. | Method and system for book reading enhancement |
US8875007B2 (en) * | 2010-11-08 | 2014-10-28 | Microsoft Corporation | Creating and modifying an image wiki page |
US11423029B1 (en) | 2010-11-09 | 2022-08-23 | Google Llc | Index-side stem-based variant generation |
US8447767B2 (en) * | 2010-12-15 | 2013-05-21 | Xerox Corporation | System and method for multimedia information retrieval |
US9575994B2 (en) * | 2011-02-11 | 2017-02-21 | Siemens Aktiengesellschaft | Methods and devices for data retrieval |
US9031960B1 (en) | 2011-06-10 | 2015-05-12 | Google Inc. | Query image search |
US9058331B2 (en) | 2011-07-27 | 2015-06-16 | Ricoh Co., Ltd. | Generating a conversation in a social network based on visual search results |
KR20140093957A (en) | 2011-11-24 | 2014-07-29 | 마이크로소프트 코포레이션 | Interactive multi-modal image search |
WO2015152876A1 (en) * | 2014-03-31 | 2015-10-08 | Empire Technology Development Llc | Hash table construction for utilization in recognition of target object in image |
JP6316447B2 (en) * | 2014-05-15 | 2018-04-25 | 華為技術有限公司Huawei Technologies Co.,Ltd. | Object search method and apparatus |
US9990433B2 (en) | 2014-05-23 | 2018-06-05 | Samsung Electronics Co., Ltd. | Method for searching and device thereof |
US11314826B2 (en) | 2014-05-23 | 2022-04-26 | Samsung Electronics Co., Ltd. | Method for searching and device thereof |
US9454713B2 (en) | 2014-12-30 | 2016-09-27 | Ebay Inc. | Similar item detection |
WO2017049045A1 (en) * | 2015-09-16 | 2017-03-23 | RiskIQ, Inc. | Using hash signatures of dom objects to identify website similarity |
US9386037B1 (en) * | 2015-09-16 | 2016-07-05 | RiskIQ Inc. | Using hash signatures of DOM objects to identify website similarity |
US10346520B2 (en) * | 2016-04-26 | 2019-07-09 | RiskIQ, Inc. | Techniques for monitoring version numbers of web frameworks |
US10789287B2 (en) * | 2016-07-05 | 2020-09-29 | Baidu Usa Llc | Method and system for multi-dimensional image matching with content in response to a search query |
US11341459B2 (en) * | 2017-05-16 | 2022-05-24 | Artentika (Pty) Ltd | Digital data minutiae processing for the analysis of cultural artefacts |
EP3602321B1 (en) * | 2017-09-13 | 2023-09-13 | Google LLC | Efficiently augmenting images with related content |
CN107909054B (en) * | 2017-11-30 | 2021-05-04 | 任艳 | Similarity evaluation method and device for picture texts |
CN113590852B (en) * | 2021-06-30 | 2022-07-08 | 北京百度网讯科技有限公司 | Training method of multi-modal recognition model, multi-modal recognition method and device |
US11960528B1 (en) | 2022-09-30 | 2024-04-16 | Amazon Technologies, Inc. | Systems for determining image-based search results |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5873080A (en) * | 1996-09-20 | 1999-02-16 | International Business Machines Corporation | Using multiple search engines to search multimedia data |
US6233586B1 (en) * | 1998-04-01 | 2001-05-15 | International Business Machines Corp. | Federated searching of heterogeneous datastores using a federated query object |
US20020087577A1 (en) * | 2000-05-31 | 2002-07-04 | Manjunath Bangalore S. | Database building method for multimedia contents |
US20020168117A1 (en) * | 2001-03-26 | 2002-11-14 | Lg Electronics Inc. | Image search method and apparatus |
US6606417B1 (en) * | 1999-04-20 | 2003-08-12 | Microsoft Corporation | Method and system for searching for images based on color and shape of a selected image |
US6862713B1 (en) * | 1999-08-31 | 2005-03-01 | International Business Machines Corporation | Interactive process for recognition and evaluation of a partial search query and display of interactive results |
US7437349B2 (en) * | 2002-05-10 | 2008-10-14 | International Business Machines Corporation | Adaptive probabilistic query expansion |
US7872669B2 (en) * | 2004-01-22 | 2011-01-18 | Massachusetts Institute Of Technology | Photo-based mobile deixis system and related techniques |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5668897A (en) * | 1994-03-15 | 1997-09-16 | Stolfo; Salvatore J. | Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases |
US5852823A (en) * | 1996-10-16 | 1998-12-22 | Microsoft | Image classification and retrieval system using a query-by-example paradigm |
EP1104174A4 (en) * | 1998-06-09 | 2006-08-02 | Matsushita Electric Ind Co Ltd | Image encoder, image decoder, character checker, and data storage medium |
US6285995B1 (en) * | 1998-06-22 | 2001-09-04 | U.S. Philips Corporation | Image retrieval system using a query image |
US6711293B1 (en) * | 1999-03-08 | 2004-03-23 | The University Of British Columbia | Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image |
US6961463B1 (en) * | 2000-03-29 | 2005-11-01 | Eastman Kodak Company | Method of detecting duplicate pictures in an automatic albuming system |
US7046851B2 (en) * | 2000-11-08 | 2006-05-16 | California Institute Of Technology | Image and video indexing scheme for content analysis |
US6748398B2 (en) * | 2001-03-30 | 2004-06-08 | Microsoft Corporation | Relevance maximizing, iteration minimizing, relevance-feedback, content-based image retrieval (CBIR) |
JP3956360B2 (en) * | 2002-09-30 | 2007-08-08 | 株式会社リコー | Imaging apparatus and image processing method |
US20060161403A1 (en) * | 2002-12-10 | 2006-07-20 | Jiang Eric P | Method and system for analyzing data and creating predictive models |
US7379627B2 (en) * | 2003-10-20 | 2008-05-27 | Microsoft Corporation | Integrated solution to digital image similarity searching |
US7912291B2 (en) * | 2003-11-10 | 2011-03-22 | Ricoh Co., Ltd | Features for retrieval and similarity matching of documents from the JPEG 2000-compressed domain |
US7634472B2 (en) * | 2003-12-01 | 2009-12-15 | Yahoo! Inc. | Click-through re-ranking of images and other data |
US7853582B2 (en) * | 2004-08-31 | 2010-12-14 | Gopalakrishnan Kumar C | Method and system for providing information services related to multimodal inputs |
US7447337B2 (en) * | 2004-10-25 | 2008-11-04 | Hewlett-Packard Development Company, L.P. | Video content understanding through real time video motion analysis |
US8732025B2 (en) * | 2005-05-09 | 2014-05-20 | Google Inc. | System and method for enabling image recognition and searching of remote content on display |
US7457825B2 (en) * | 2005-09-21 | 2008-11-25 | Microsoft Corporation | Generating search requests from multimodal queries |
- 2005-09-21 US US11/233,352 patent/US7457825B2/en not_active Expired - Fee Related
- 2008-10-08 US US12/247,958 patent/US8081824B2/en not_active Expired - Fee Related
- 2011-12-20 US US13/332,248 patent/US20120093371A1/en not_active Abandoned
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5873080A (en) * | 1996-09-20 | 1999-02-16 | International Business Machines Corporation | Using multiple search engines to search multimedia data |
US6233586B1 (en) * | 1998-04-01 | 2001-05-15 | International Business Machines Corp. | Federated searching of heterogeneous datastores using a federated query object |
US6606417B1 (en) * | 1999-04-20 | 2003-08-12 | Microsoft Corporation | Method and system for searching for images based on color and shape of a selected image |
US6862713B1 (en) * | 1999-08-31 | 2005-03-01 | International Business Machines Corporation | Interactive process for recognition and evaluation of a partial search query and display of interactive results |
US20020087577A1 (en) * | 2000-05-31 | 2002-07-04 | Manjunath Bangalore S. | Database building method for multimedia contents |
US20020168117A1 (en) * | 2001-03-26 | 2002-11-14 | Lg Electronics Inc. | Image search method and apparatus |
US7437349B2 (en) * | 2002-05-10 | 2008-10-14 | International Business Machines Corporation | Adaptive probabilistic query expansion |
US7872669B2 (en) * | 2004-01-22 | 2011-01-18 | Massachusetts Institute Of Technology | Photo-based mobile deixis system and related techniques |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8589778B2 (en) * | 2006-12-27 | 2013-11-19 | International Business Machines Corporation | System and method for processing multi-modal communication within a workgroup |
US20080189633A1 (en) * | 2006-12-27 | 2008-08-07 | International Business Machines Corporation | System and Method For Processing Multi-Modal Communication Within A Workgroup |
US20110035406A1 (en) * | 2009-08-07 | 2011-02-10 | David Petrou | User Interface for Presenting Search Results for Multiple Regions of a Visual Query |
US20110125735A1 (en) * | 2009-08-07 | 2011-05-26 | David Petrou | Architecture for responding to a visual query |
US10534808B2 (en) | 2009-08-07 | 2020-01-14 | Google Llc | Architecture for responding to visual query |
US9135277B2 (en) | 2009-08-07 | 2015-09-15 | Google Inc. | Architecture for responding to a visual query |
US9087059B2 (en) | 2009-08-07 | 2015-07-21 | Google Inc. | User interface for presenting search results for multiple regions of a visual query |
US20110128288A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Region of Interest Selector for Visual Queries |
US9183224B2 (en) | 2009-12-02 | 2015-11-10 | Google Inc. | Identifying matching canonical documents in response to a visual query |
US20110131235A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Actionable Search Results for Street View Visual Queries |
US8805079B2 (en) | 2009-12-02 | 2014-08-12 | Google Inc. | Identifying matching canonical documents in response to a visual query and in accordance with geographic information |
US8811742B2 (en) | 2009-12-02 | 2014-08-19 | Google Inc. | Identifying matching canonical documents consistent with visual query structural information |
US9405772B2 (en) | 2009-12-02 | 2016-08-02 | Google Inc. | Actionable search results for street view visual queries |
US8977639B2 (en) | 2009-12-02 | 2015-03-10 | Google Inc. | Actionable search results for visual queries |
US9176986B2 (en) | 2009-12-02 | 2015-11-03 | Google Inc. | Generating a combination of a visual query and matching canonical document |
US20110131241A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Actionable Search Results for Visual Queries |
US20110129153A1 (en) * | 2009-12-02 | 2011-06-02 | David Petrou | Identifying Matching Canonical Documents in Response to a Visual Query |
US9087235B2 (en) | 2009-12-02 | 2015-07-21 | Google Inc. | Identifying matching canonical documents consistent with visual query structural information |
US9053386B2 (en) * | 2011-01-28 | 2015-06-09 | Alibaba Group Holding Limited | Method and apparatus of identifying similar images |
US20130301935A1 (en) * | 2011-01-28 | 2013-11-14 | Alibaba Group Holding Limited | Method and Apparatus of Identifying Similar Images |
CN104685501A (en) * | 2012-08-08 | 2015-06-03 | 谷歌公司 | Identifying textual terms in response to a visual query |
US9372920B2 (en) | 2012-08-08 | 2016-06-21 | Google Inc. | Identifying textual terms in response to a visual query |
US8935246B2 (en) * | 2012-08-08 | 2015-01-13 | Google Inc. | Identifying textual terms in response to a visual query |
CN103544216A (en) * | 2013-09-23 | 2014-01-29 | Tcl集团股份有限公司 | Information recommendation method and system combining image content and keywords |
US11645323B2 (en) | 2020-02-26 | 2023-05-09 | Samsung Electronics Co., Ltd. | Coarse-to-fine multimodal gallery search system with attention-based neural network models |
Also Published As
Publication number | Publication date |
---|---|
US20070067345A1 (en) | 2007-03-22 |
US20090041366A1 (en) | 2009-02-12 |
US7457825B2 (en) | 2008-11-25 |
US8081824B2 (en) | 2011-12-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7457825B2 (en) | Generating search requests from multimodal queries | |
US10216766B2 (en) | Large-scale image tagging using image-to-topic embedding | |
US7698332B2 (en) | Projecting queries and images into a similarity space | |
US7231381B2 (en) | Media content search engine incorporating text content and user log mining | |
US8185526B2 (en) | Dynamic keyword suggestion and image-search re-ranking | |
US9189554B1 (en) | Providing images of named resources in response to a search query | |
US8224849B2 (en) | Object similarity search in high-dimensional vector spaces | |
US7647331B2 (en) | Detecting duplicate images using hash code grouping | |
US7860317B2 (en) | Generating search results based on duplicate image detection | |
US7548936B2 (en) | Systems and methods to present web image search results for effective image browsing | |
US8429173B1 (en) | Method, system, and computer readable medium for identifying result images based on an image query | |
US8095478B2 (en) | Method and system for calculating importance of a block within a display page | |
EP1591921B1 (en) | Method and system for identifying page elements relatedness using link and page layout analysis | |
US8046370B2 (en) | Retrieval of structured documents | |
EP2368200B1 (en) | Interactively ranking image search results using color layout relevance | |
US8571850B2 (en) | Dual cross-media relevance model for image annotation | |
US20100017389A1 (en) | Content based image retrieval | |
US8589371B2 (en) | Learning retrieval functions incorporating query differentiation for information retrieval | |
US20090112830A1 (en) | System and methods for searching images in presentations | |
EP1202187A2 (en) | Image retrieval system and methods with semantic and feature based relevance feedback | |
WO2014050002A1 (en) | Query degree-of-similarity evaluation system, evaluation method, and program | |
US9977816B1 (en) | Link-based ranking of objects that do not include explicitly defined links | |
US20020178149A1 (en) | Content -based similarity retrieval system for image data | |
Clustering | Grouping of Questions From a Question Bank Using Partition-Based Clustering | |
Rahman et al. | An interactive and dynamic fusion-based image retrieval approach by CINDI |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0541 Effective date: 20141014 |
AS | Assignment |
Owner name: MICROSOFT CORPORATION, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, MING JING;MA, WEI-YING;XIE, XING;AND OTHERS;SIGNING DATES FROM 20051019 TO 20051024;REEL/FRAME:036168/0571 |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |