WO2011017558A1 - User interface for presenting search results for multiple regions of a visual query - Google Patents
- Publication number
- WO2011017558A1 (international application PCT/US2010/044604)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- visual
- search
- visual query
- results
- query
Classifications
- All entries fall under section G (Physics), class G06 (Computing; calculating or counting), subclass G06F (Electric digital data processing):
- G06F16/538: Information retrieval of still image data; querying; presentation of query results
- G06F16/5838: Retrieval of still image data using metadata automatically derived from the content, using colour
- G06F16/438: Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data; querying; presentation of query results
- G06F16/532: Information retrieval of still image data; querying; query formulation, e.g. graphical querying
- G06F3/048: Interaction techniques based on graphical user interfaces [GUI]
- G06F3/14: Digital output to display device; cooperation and interconnection of the display device with other functional units
Definitions
- The disclosed embodiments relate generally to presenting search results from a plurality of parallel search systems for processing a visual query.
- Text-based or term-based searching, wherein a user inputs a word or phrase into a search engine and receives a variety of results, is a useful tool for searching.
- However, term-based queries require that a user be able to input a relevant term.
- A user may wish to know information about an image. For example, a user might want to know the name of a person in a photograph, or a user might want to know the name of a flower or bird in a picture. Accordingly, a system that can receive a visual query and provide search results would be desirable.
- A computer-implemented method of processing a visual query includes performing the following steps on a server system having one or more processors and memory storing one or more programs for execution by the one or more processors.
- A visual query is received from a client system.
- The visual query is processed by sending it to a plurality of parallel search systems for simultaneous processing.
- Each of the plurality of search systems implements a distinct visual query search process of a plurality of visual query search processes.
- The server system receives a plurality of search results from one or more of the plurality of parallel search systems. It creates an interactive results document comprising one or more visual identifiers of respective sub-portions of the visual query. Each visual identifier has at least one user selectable link to at least one of the search results.
- The server system sends the interactive results document to the client system.
- At least one of the search results includes data related to the corresponding sub-portion of the visual query.
- The sending further comprises sending a subset of the plurality of search results in a search results list for presentation with the interactive results document.
- The method further comprises receiving a user selection of the at least one user selectable link and identifying a search result in the search results list corresponding to the selected link.
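A minimal sketch of how the interactive results document, its visual identifiers, and the accompanying results list might be modeled, assuming rectangular sub-portions and string result IDs. All class, field, and method names (`SearchResult`, `VisualIdentifier`, `InteractiveResultsDocument`, `resolve_selection`) are hypothetical and not taken from the patent; selecting a link simply looks up the linked IDs in the results list sent with the document.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    result_id: str
    source_system: str  # hypothetical labels, e.g. "facial_recognition", "ocr"
    title: str
    score: float


@dataclass
class VisualIdentifier:
    # Sub-portion of the visual query as (left, top, right, bottom) pixels.
    box: tuple[int, int, int, int]
    # IDs of the search results this identifier's link points to.
    linked_result_ids: list[str]


@dataclass
class InteractiveResultsDocument:
    identifiers: list[VisualIdentifier]
    # Subset of the search results sent as a list alongside the document.
    results_list: list[SearchResult] = field(default_factory=list)

    def resolve_selection(self, identifier_index: int) -> list[SearchResult]:
        """Return the results-list entries that the selected link points to."""
        wanted = set(self.identifiers[identifier_index].linked_result_ids)
        return [r for r in self.results_list if r.result_id in wanted]


doc = InteractiveResultsDocument(
    identifiers=[VisualIdentifier((10, 10, 90, 120), ["face-1"]),
                 VisualIdentifier((100, 40, 200, 80), ["ocr-1", "ocr-2"])],
    results_list=[SearchResult("face-1", "facial_recognition", "Jane Doe", 0.92),
                  SearchResult("ocr-1", "ocr", "STOP", 0.88),
                  SearchResult("ocr-2", "ocr", "Main St", 0.75)],
)
print([r.title for r in doc.resolve_selection(1)])  # ['STOP', 'Main St']
```

Keeping links as IDs rather than embedded objects lets the same result appear both in the results list and behind a visual identifier without duplication.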
- The visual identifiers comprise one or more bounding boxes around respective sub-portions of the visual query.
- The bounding boxes may be square or may outline the respective sub-portion of the visual query.
- Some bounding boxes include smaller bounding boxes inside them.
- Each of the bounding boxes includes a user selectable link to one or more search results, and the user selectable link has an activation region corresponding to the sub-portion of the visual query surrounded by the bounding box.
- A respective user selectable link has an activation region that corresponds to the sub-portion of the visual query associated with a corresponding visual identifier.
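Because some bounding boxes contain smaller ones, a tap can fall inside several activation regions at once. A minimal sketch of one way to resolve this, assuming axis-aligned rectangles and a "smallest containing box wins" policy; that policy and the helper names (`hit_test`, `contains`, `area`) are illustrative assumptions, not something the patent specifies.

```python
from typing import Optional

# Hypothetical convention: (left, top, right, bottom) in query-image pixels.
Box = tuple[int, int, int, int]


def area(box: Box) -> int:
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


def contains(box: Box, x: int, y: int) -> bool:
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


def hit_test(boxes: list[Box], x: int, y: int) -> Optional[int]:
    """Return the index of the activated bounding box, or None if no box is hit.

    When boxes are nested, the smallest box containing the point is chosen,
    so a tap inside an inner box activates the inner link (assumed policy).
    """
    candidates = [i for i, b in enumerate(boxes) if contains(b, x, y)]
    if not candidates:
        return None
    return min(candidates, key=lambda i: area(boxes[i]))


# An outer box around a book cover with a smaller box around its bar code.
boxes = [(0, 0, 300, 400), (200, 320, 290, 390)]
print(hit_test(boxes, 250, 350))  # 1 (inner bar-code box)
print(hit_test(boxes, 50, 50))    # 0 (outer cover box)
print(hit_test(boxes, 500, 50))   # None
```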
- When the selectable sub-portion contains text, the method further includes sending the text of the selectable sub-portion to a text-based query processing system.
- When the sub-portion of the visual query corresponding to a respective visual identifier contains text, the search results corresponding to that visual identifier include results from a term query search on at least one of the terms in the text.
- When the selectable sub-portion contains a person's face, the search results corresponding to the respective visual identifier include a name, a handle, contact information, account information, address information, the current location of a related mobile device associated with the person whose face is contained in the selectable sub-portion, other images of the person whose face is contained in the selectable sub-portion, and/or potential image matches for the person's face.
- When the selectable sub-portion contains a product, the search results corresponding to the respective visual identifier include product information, a product review, an option to initiate purchase of the product, an option to initiate a bid on the product, a list of similar products, and/or a list of related products.
- A respective visual identifier of the one or more visual identifiers is formatted for presentation in a visually distinctive manner in accordance with a type of recognized entity in the respective sub-portion of the visual query.
- The respective visual identifier may be formatted for presentation in a visually distinctive manner, such as by overlay color, overlay pattern, label background color, label background pattern, label font color, and border color.
- A respective visual identifier of the one or more visual identifiers comprises a label consisting of at least one term associated with the image in the respective sub-portion of the visual query.
- The label is formatted for presentation in the interactive results document on or near the respective sub-portion.
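One way to realize type-dependent formatting and label placement is a style table keyed by recognized entity type. A minimal sketch, assuming hypothetical entity-type names, hex RGBA color strings, and a "place the label just above the box, or inside its top edge if there is no room" rule; none of these specifics come from the patent.

```python
from dataclasses import dataclass

# Hypothetical style table keyed by recognized entity type; the attribute
# names mirror the formatting options listed above.
STYLES = {
    "person": {"overlay_color": "#ff000040", "border_color": "#ff0000",
               "label_background": "#ffdddd"},
    "product": {"overlay_color": "#0000ff40", "border_color": "#0000ff",
                "label_background": "#ddddff"},
    "text": {"overlay_color": "#00ff0040", "border_color": "#008000",
             "label_background": "#ddffdd"},
}
DEFAULT_STYLE = {"overlay_color": "#80808040", "border_color": "#808080",
                 "label_background": "#eeeeee"}


@dataclass
class Label:
    text: str
    x: int
    y: int
    style: dict


def make_label(entity_type: str, term: str, box: tuple[int, int, int, int],
               label_height: int = 16) -> Label:
    """Place a label on or near its sub-portion, styled by entity type."""
    left, top, _right, _bottom = box
    # Above the box when there is room, otherwise inside its top edge.
    y = top - label_height if top >= label_height else top
    return Label(term, left, y, STYLES.get(entity_type, DEFAULT_STYLE))


car = make_label("product", "Porsche", (40, 100, 240, 220))
pet = make_label("animal", "dolphin", (10, 5, 50, 50))
print(car.x, car.y, car.style["border_color"])  # 40 84 #0000ff
print(pet.y, pet.style is DEFAULT_STYLE)        # 5 True
```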
- Figure 1 is a block diagram illustrating a computer network that includes a visual query server system.
- Figure 2 is a flow diagram illustrating the process for responding to a visual query, in accordance with some embodiments.
- Figure 3 is a flow diagram illustrating the process for responding to a visual query with an interactive results document, in accordance with some embodiments.
- Figure 4 is a flow diagram illustrating the communications between a client and a visual query server system, in accordance with some embodiments.
- Figure 5 is a block diagram illustrating a client system, in accordance with some embodiments.
- Figure 6 is a block diagram illustrating a front end visual query processing server system, in accordance with some embodiments.
- Figure 7 is a block diagram illustrating a generic one of the parallel search systems utilized to process a visual query, in accordance with some embodiments.
- Figure 8 is a block diagram illustrating an OCR search system utilized to process a visual query, in accordance with some embodiments.
- Figure 9 is a block diagram illustrating a facial recognition search system utilized to process a visual query, in accordance with some embodiments.
- Figure 10 is a block diagram illustrating an image to terms search system utilized to process a visual query, in accordance with some embodiments.
- Figure 11 illustrates a client system with a screen shot of an exemplary visual query, in accordance with some embodiments.
- Figures 12A and 12B each illustrate a client system with a screen shot of an interactive results document with bounding boxes, in accordance with some embodiments.
- Figure 13 illustrates a client system with a screen shot of an interactive results document that is coded by type, in accordance with some embodiments.
- Figure 14 illustrates a client system with a screen shot of an interactive results document with labels, in accordance with some embodiments.
- Figure 15 illustrates a screen shot of an interactive results document and visual query displayed concurrently with a results list, in accordance with some embodiments.
- The phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
- Figure 1 is a block diagram illustrating a computer network that includes a visual query server system according to some embodiments.
- The computer network 100 includes one or more client systems 102 and a visual query server system 106.
- One or more communications networks 104 interconnect these components.
- The communications network 104 may be any of a variety of networks, including local area networks (LAN), wide area networks (WAN), wireless networks, wireline networks, the Internet, or a combination of such networks.
- The client system 102 includes a client application 108, which is executed by the client system, for receiving a visual query (e.g., visual query 1102 of Figure 11).
- A visual query is an image that is submitted as a query to a search engine or search system.
- The client application 108 is selected from the set consisting of a search application, a search engine plug-in for a browser application, and a search engine extension for a browser application.
- The client application 108 may be an “omnivorous” search box, which allows a user to drag and drop an image in any format into the search box to be used as the visual query.
- A client system 102 sends queries to and receives data from the visual query server system 106.
- The client system 102 may be any computer or other device that is capable of communicating with the visual query server system 106. Examples include, without limitation, desktop and notebook computers, mainframe computers, server computers, mobile devices such as mobile phones and personal digital assistants, network terminals, and set-top boxes.
- The visual query server system 106 includes a front end visual query processing server 110.
- The front end server 110 receives a visual query from the client 102 and sends the visual query to a plurality of parallel search systems 112 for simultaneous processing.
- The search systems 112 each implement a distinct visual query search process and access their corresponding databases 114 as necessary to process the visual query by their distinct search process.
- For example, a facial recognition search system 112-A accesses a facial image database 114-A to look for facial matches to the image query.
- The facial recognition search system 112-A returns one or more search results (e.g., names, matching faces, etc.) from the facial image database 114-A.
- The optical character recognition (OCR) search system 112-B converts any recognizable text in the visual query into text for return as one or more search results.
- An OCR database 114-B may be accessed to recognize particular fonts or text patterns, as explained in more detail with regard to Figure 8.
- Any number of parallel search systems 112 may be used.
- Some examples include a facial recognition search system 112-A, an OCR search system 112-B, an image-to-terms search system 112-C (which may recognize an object or an object category), a product recognition search system (which may be configured to recognize 2-D images such as book covers and CDs and may also be configured to recognize 3-D images such as furniture), a bar code recognition search system (which recognizes 1D and 2D style bar codes), a named entity recognition search system, landmark recognition (which may be configured to recognize particular famous landmarks like the Eiffel Tower and may also be configured to recognize a corpus of specific images such as billboards), place recognition aided by geo-location information provided by a GPS receiver in the client system 102 or mobile phone network, a color recognition search system, and a similar image search system (which searches for and identifies images similar to a visual query).
- The visual query server system 106 includes a facial recognition search system 112-A, an OCR search system 112-B, and at least one other query-by-image search system 112.
- The parallel search systems 112 each individually process the visual search query and return their results to the front end server system 110.
- The front end server 110 may perform one or more analyses on the search results, such as one or more of: aggregating the results into a compound document, choosing a subset of results to display, and ranking the results, as will be explained in more detail with regard to Figure 6.
- The front end server 110 communicates the search results to the client system 102.
- The client system 102 presents the one or more search results to the user.
- The results may be presented on a display, by an audio speaker, or by any other means used to communicate information to a user.
- The user may interact with the search results in a variety of ways.
- The user's selections, annotations, and other interactions with the search results are transmitted to the visual query server system 106 and recorded along with the visual query in a query and annotation database 116.
- Information in the query and annotation database can be used to improve visual query results.
- The information from the query and annotation database 116 is periodically pushed to the parallel search systems 112, which incorporate any relevant portions of the information into their respective individual databases 114.
- The computer network 100 optionally includes a term query server system 118.
- A term query is a query containing one or more terms, as opposed to a visual query, which contains an image.
- The term query server system 118 may be used to generate search results that supplement information produced by the various search engines in the visual query server system 106.
- The results returned from the term query server system 118 may be in any format.
- For example, they may include textual documents, images, video, etc. While the term query server system 118 is shown as a separate system in Figure 1, the visual query server system 106 may optionally include the term query server system 118.
- Figure 2 is a flow diagram illustrating a visual query server system method for responding to a visual query, according to certain embodiments of the invention.
- Each of the operations shown in Figure 2 may correspond to instructions stored in a computer memory or computer readable storage medium.
- The visual query server system receives a visual query from a client system (202).
- The client system may be a desktop computing device, a mobile device, or another similar device (204), as explained with reference to Figure 1.
- An example visual query on an example client system is shown in Figure 11.
- The visual query is an image document of any suitable format.
- For example, the visual query can be a photograph, a screen shot, a scanned image, or a frame or a sequence of multiple frames of a video (206).
- The visual query may also be a drawing produced by a content authoring program (736, Fig. 5).
- In some embodiments, the user “draws” the visual query, while in other embodiments the user scans or photographs the visual query.
- Some visual queries are created using an image generation application such as Acrobat, a photograph editing program, a drawing program, or an image editing program.
- For example, a visual query could come from a user taking a photograph of his friend on his mobile phone and then submitting the photograph as the visual query to the server system.
- The visual query could also come from a user scanning a page of a magazine, or taking a screen shot of a webpage on a desktop computer, and then submitting the scan or screen shot as the visual query to the server system.
- The visual query is submitted to the server system 106 through a search engine extension of a browser application, through a plug-in for a browser application, or by a search application executed by the client system 102.
- Visual queries may also be submitted by other application programs (executed by a client system) that support or generate images which can be transmitted to a remotely located server by the client system.
- The visual query can be a combination of text and non-text elements (208).
- For example, a query could be a scan of a magazine page containing images and text, such as a person standing next to a road sign.
- A visual query can include an image of a person's face, whether taken by a camera embedded in the client system or contained in a document scanned by or otherwise received by the client system.
- A visual query can also be a scan of a document containing only text.
- The visual query can also be an image of numerous distinct subjects, such as several birds in a forest, a person and an object (e.g., car, park bench, etc.), or a person and an animal (e.g., pet, farm animal, butterfly, etc.).
- Visual queries may have two or more distinct elements.
- For example, a visual query could include a barcode and an image of a product or product name on a product package.
- As another example, the visual query could be a picture of a book cover that includes the title of the book, cover art, and a bar code.
- In such cases, one visual query will produce two or more distinct search results corresponding to different portions of the visual query, as discussed in more detail below.
- The server system processes the visual query as follows.
- The front end server system sends the visual query to a plurality of parallel search systems for simultaneous processing (210).
- Each search system implements a distinct visual query search process, i.e., an individual search system processes the visual query by its own processing scheme.
- One of the search systems to which the visual query is sent for processing is an optical character recognition (OCR) search system.
- Another of the search systems to which the visual query is sent for processing is a facial recognition search system.
- The plurality of search systems running distinct visual query search processes includes at least: optical character recognition (OCR), facial recognition, and another query-by-image process other than OCR and facial recognition (212).
- The other query-by-image process is selected from a set of processes that includes but is not limited to product recognition, bar code recognition, object-or-object-category recognition, named entity recognition, and color recognition (212).
- Named entity recognition occurs as a post-process of the OCR search system, wherein the text result of the OCR is analyzed for famous people, locations, objects, and the like, and the terms identified as being named entities are then searched in the term query server system (118, Fig. 1).
- Images of famous landmarks, logos, people, album covers, trademarks, etc. are recognized by an image-to-terms search system.
- Alternatively, a distinct named entity query-by-image process separate from the image-to-terms search system is utilized.
- The object-or-object-category recognition system recognizes generic result types like “car.”
- This system also recognizes product brands, particular product models, and the like, and provides more specific descriptions, like “Porsche.”
- Some of the search systems could be special user-specific search systems. For example, particular versions of color recognition and facial recognition could be special search systems used by the blind.
- The front end server system receives results from the parallel search systems (214).
- The results are accompanied by a search score.
- Some of the search systems will find no relevant results. For example, if the visual query was a picture of a flower, the facial recognition search system and the bar code search system will not find any relevant results.
- In that case, a null or zero search score is received from that search system (216).
- If the front end server does not receive a result from a search system after a pre-defined period of time (e.g., 0.2, 0.5, 1, 2 or 5 seconds), it processes the received results as if that timed-out server had produced a null search score, and processes the received results from the other search systems.
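The fan-out and timeout behavior described above can be sketched with Python's standard `concurrent.futures` module. This assumes each search system is a plain callable returning `(results, score)`; that signature, the system names, and the 0.2-second deadline in the example are illustrative assumptions. A system that misses the deadline is recorded with no results and a null (`None`) score, which stays distinct from a system that answered with a zero score.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Optional

# Hypothetical signature for a search system 112: image bytes in, (results, score) out.
SearchSystem = Callable[[bytes], tuple[list[str], float]]


def fan_out(query: bytes, systems: dict[str, SearchSystem],
            timeout_s: float = 0.5) -> dict[str, tuple[list[str], Optional[float]]]:
    """Send the query to every system concurrently and collect (results, score)."""
    pool = ThreadPoolExecutor(max_workers=len(systems))
    futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
    done, _not_done = wait(futures.values(), timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on stragglers
    collected: dict[str, tuple[list[str], Optional[float]]] = {}
    for name, fut in futures.items():
        if fut in done and fut.exception() is None:
            collected[name] = fut.result()
        else:
            # Timed out (or raised): treat as a null score with no results.
            collected[name] = ([], None)
    return collected


def ocr(q: bytes) -> tuple[list[str], float]:
    return (["STOP"], 0.9)


def faces(q: bytes) -> tuple[list[str], float]:
    return ([], 0.0)  # nothing relevant found: zero score


def slow_products(q: bytes) -> tuple[list[str], float]:
    time.sleep(1.0)  # misses the 0.2 s deadline below
    return (["widget"], 0.8)


out = fan_out(b"<image bytes>",
              {"ocr": ocr, "faces": faces, "products": slow_products},
              timeout_s=0.2)
print(out)  # {'ocr': (['STOP'], 0.9), 'faces': ([], 0.0), 'products': ([], None)}
```

Threads suit this sketch because each search system would mostly wait on I/O; a production front end would more likely issue asynchronous RPCs with per-call deadlines.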
- One of the predefined criteria excludes void results.
- That is, a pre-defined criterion is that the results are not void.
- Another of the predefined criteria excludes results having a numerical score (e.g., for a relevance factor) that falls below a pre-defined minimum score.
- The plurality of search results are filtered (220).
- The results are filtered only if the total number of results exceeds a pre-defined threshold.
- All the results are ranked, but the results falling below a pre-defined minimum score are excluded.
- The content of the results is also filtered. For example, if some of the results contain private information or personal protected information, these results are filtered out.
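A minimal sketch of these filtering criteria, under one possible reading of the text: void results and results flagged as private are always removed, while the minimum-score cut applies only when more results than a pre-defined threshold were received. The `Result` type, the `private` flag, and the default values of `min_score` and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    system: str
    items: list[str]        # an empty list means a void result
    score: Optional[float]  # None means a null score
    private: bool = False   # hypothetical flag set by a content check


def filter_results(results: list[Result], min_score: float = 0.5,
                   threshold: int = 0) -> list[Result]:
    """Drop void and private results, apply the score cut if needed, then rank."""
    kept = [r for r in results if r.items and not r.private]
    if len(results) > threshold:
        kept = [r for r in kept if r.score is not None and r.score >= min_score]
    # Highest score first; a null score sorts as 0.0.
    return sorted(kept, key=lambda r: r.score or 0.0, reverse=True)


rs = [Result("ocr", ["STOP"], 0.9),
      Result("faces", [], None),
      Result("products", ["widget"], 0.3),
      Result("faces2", ["Jane"], 0.95, private=True),
      Result("terms", ["dolphin"], 0.7)]
print([r.system for r in filter_results(rs)])                # ['ocr', 'terms']
print([r.system for r in filter_results(rs, threshold=10)])  # ['ocr', 'terms', 'products']
```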
- The visual query server system creates a compound search result (222).
- The term query server system (118, Fig. 1) may augment the results from one of the parallel search systems with results from a term search, where the additional results are either links to documents or information sources, or text and/or images containing additional information that may be relevant to the visual query.
- For example, the compound search result may contain an OCR result and a link to a named entity in the OCR document (224).
- The OCR search system (112-B, Fig. 1) or the front end visual query processing server (110, Fig. 1) recognizes likely relevant words in the text. For example, it may recognize named entities such as famous people or places.
- The named entities are submitted as query terms to the term query server system (118, Fig. 1).
- The term query results produced by the term query server system are embedded in the visual query result as a “link.”
- Alternatively, the term query results are returned as separate links. For example, if a picture of a book cover were the visual query, it is likely that an object recognition search system will produce a high-scoring hit for the book.
- The term query results may be presented in a labeled group to distinguish them from the visual query results.
- The results may be searched individually, or a search may be performed using all the recognized named entities in the search query to produce particularly relevant additional search results.
- For example, if the visual query is a scanned travel brochure about Paris, the returned result may include links to the term query server system 118 for initiating a search on a term query “Notre Dame.”
- Compound search results include results from text searches for recognized famous images.
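The OCR-plus-named-entity flow can be illustrated with a small compound-result builder. This is a sketch only: it matches OCR text against a fixed, hypothetical gazetteer (`KNOWN_ENTITIES`) instead of a real named entity recognizer, and `TERM_QUERY_URL` is a made-up placeholder standing in for the term query server system 118. Only the URL encoding uses a real library call (`urllib.parse.quote_plus`).

```python
from urllib.parse import quote_plus

# Hypothetical gazetteer; a real system would use a named entity recognizer.
KNOWN_ENTITIES = {"notre dame", "eiffel tower", "louvre"}

# Hypothetical endpoint standing in for the term query server system 118.
TERM_QUERY_URL = "https://termsearch.example/search?q="


def find_entities(ocr_text: str, max_words: int = 3) -> list[str]:
    """Return known entities found in the OCR text, in order of appearance."""
    words = ocr_text.lower().replace(",", " ").replace(".", " ").split()
    found: list[str] = []
    for i in range(len(words)):
        for n in range(max_words, 0, -1):  # prefer longer phrases
            phrase = " ".join(words[i:i + n])
            if phrase in KNOWN_ENTITIES and phrase not in found:
                found.append(phrase)
                break
    return found


def compound_result(ocr_text: str) -> dict:
    """Bundle the OCR text with a term-query link for each named entity."""
    return {
        "ocr_text": ocr_text,
        "term_query_links": {e: TERM_QUERY_URL + quote_plus(e)
                             for e in find_entities(ocr_text)},
    }


r = compound_result("Visit Notre Dame and the Eiffel Tower.")
print(list(r["term_query_links"]))       # ['notre dame', 'eiffel tower']
print(r["term_query_links"]["notre dame"])
```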
- The visual query server system then sends at least one result to the client system (226).
- When the visual query processing server receives a plurality of search results from at least some of the plurality of search systems, it sends at least one of the plurality of search results to the client system.
- For some visual queries, only one search system will return relevant results.
- For example, for a visual query that contains only text, only the OCR server's results may be relevant.
- For some visual queries, only one result from one search system may be relevant.
- For example, only the product related to a scanned bar code may be relevant. In these instances, the front end visual processing server will return only the relevant search result(s).
- A plurality of search results may be sent to the client system, where the plurality of search results include search results from more than one of the parallel search systems (228). This may occur when more than one distinct image is in the visual query. For example, if the visual query were a picture of a person riding a horse, results for facial recognition of the person could be displayed along with object identification results for the horse. In some embodiments, all the results for a particular query-by-image search system are grouped and presented together. For example, the top N facial recognition results are displayed under a heading “facial recognition results” and the top N object recognition results are displayed together under a heading “object recognition results.” Alternatively, as discussed below, the search results from a particular image search system may be grouped by image region.
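Both presentation options, top-N results grouped under a heading per search system or grouped by image region, can be sketched with one helper. The tuple layout of a result, the heading strings, and the region IDs are hypothetical; the example uses the person-riding-a-horse scenario above.

```python
from collections import defaultdict

# Each result is (search_system, region_id, title, score); layout is hypothetical.
Result = tuple[str, str, str, float]

HEADINGS = {"face": "facial recognition results",
            "object": "object recognition results"}


def group_top_n(results: list[Result], n: int = 2,
                by: str = "system") -> dict[str, list[str]]:
    """Group results by search system (or by image region); keep the top n per group."""
    groups: dict[str, list[Result]] = defaultdict(list)
    for r in results:
        key = HEADINGS.get(r[0], r[0]) if by == "system" else r[1]
        groups[key].append(r)
    return {k: [r[2] for r in sorted(v, key=lambda r: r[3], reverse=True)[:n]]
            for k, v in groups.items()}


results = [("face", "rider", "Jane Doe", 0.9),
           ("face", "rider", "J. Smith", 0.4),
           ("object", "horse", "horse", 0.95),
           ("object", "horse", "pony", 0.6),
           ("object", "rider", "helmet", 0.5)]
print(group_top_n(results))
# {'facial recognition results': ['Jane Doe', 'J. Smith'],
#  'object recognition results': ['horse', 'pony']}
print(group_top_n(results, by="region"))
# {'rider': ['Jane Doe', 'helmet'], 'horse': ['horse', 'pony']}
```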
- The search results may include both OCR results and one or more image-match results (230).
- The user may wish to learn more about a particular search result. For example, if the visual query was a picture of a dolphin and the “image to terms” search system returns the terms “water,” “dolphin,” “blue,” and “Flipper,” the user may wish to run a text-based query term search on “Flipper.”
- The term query server system (118, Fig. 1) is then accessed, and the search on the selected term(s) is run.
- The corresponding search term results are displayed on the client system either separately or in conjunction with the visual query results (232).
- Alternatively, the front end visual query processing server (110, Fig. 1) automatically (i.e., without receiving any user command, other than the initial visual query) chooses one or more top potential text results for the visual query, runs those text results on the term query server system 118, and then returns those term query results along with the visual query result to the client system as a part of sending at least one search result to the client system (232).
- In the dolphin example above, the front end server runs a term query on “Flipper” and returns those term query results along with the visual query results to the client system.
- The results may be displayed as a compound search result (222), as explained above.
- In other embodiments, the results are part of a search result list instead of or in addition to a compound search result.
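The automatic variant, where the front end picks the top text results without a user command and runs them as term queries, can be sketched as follows. The term scores are made up for the dolphin example, and `fake_term_server` is a hypothetical stub standing in for the term query server system 118.

```python
from typing import Callable


def auto_term_queries(term_results: list[tuple[str, float]],
                      run_term_query: Callable[[str], list[str]],
                      k: int = 1) -> dict[str, list[str]]:
    """Without any user command, run the k highest-scoring terms as term queries."""
    ranked = sorted(term_results, key=lambda t: t[1], reverse=True)
    return {term: run_term_query(term) for term, _score in ranked[:k]}


# Made-up scores for the dolphin example.
terms = [("water", 0.4), ("dolphin", 0.7), ("blue", 0.2), ("Flipper", 0.9)]


def fake_term_server(query: str) -> list[str]:
    # Hypothetical stub for the term query server system.
    return [f"page about {query}"]


print(auto_term_queries(terms, fake_term_server))             # {'Flipper': ['page about Flipper']}
print(list(auto_term_queries(terms, fake_term_server, k=2)))  # ['Flipper', 'dolphin']
```

The returned term query results would then be sent along with the visual query results, either in a compound result or in a results list.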
- Figure 3 is a flow diagram illustrating the process for responding to a visual query with an interactive results document.
- the first three operations (202, 210, 214) are described above with reference to Figure 2.
- an interactive results document is created (302).
- the interactive results document includes one or more visual identifiers of respective sub-portions of the visual query.
- Each visual identifier has at least one user selectable link to at least one of the search results.
- a visual identifier identifies a respective sub-portion of the visual query.
- the interactive results document has only one visual identifier with one user selectable link to one or more results.
- a respective user selectable link to one or more of the search results has an activation region, and the activation region corresponds to the sub-portion of the visual query that is associated with a corresponding visual identifier.
- the visual identifier is a bounding box (304).
- the bounding box encloses a sub-portion of the visual query as shown in Figure 12A.
- the bounding box need not be a square or rectangular box shape but can be any sort of shape including circular, oval, conformal (e.g., to an object in, entity in or region of the visual query), irregular or any other shape as shown in Figure 12B.
- the bounding box outlines the boundary of an identifiable entity in a sub-portion of the visual query (306).
- each bounding box includes a user selectable link to one or more search results, where the user selectable link has an activation region corresponding to a sub-portion of the visual query surrounded by the bounding box.
- search results that correspond to the image in the outlined sub-portion are returned.
- the visual identifier is a label (307) as shown in Figure
- the label includes at least one term associated with the image in the respective sub-portion of the visual query.
- Each label is formatted for presentation in the interactive results document on or near the respective sub-portion.
- the labels are color coded.
- each respective visual identifier is formatted for presentation in a visually distinctive manner in accordance with a type of recognized entity in the respective sub-portion of the visual query. For example, as shown in Figure 13, bounding boxes around a product, a person, a trademark, and the two textual areas are each presented with distinct cross-hatching patterns, representing differently colored transparent bounding boxes.
- the visual identifiers are formatted for presentation in visually distinctive manners such as overlay color, overlay pattern, label background color, label background pattern, label font color, and border color.
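As a rough sketch of the data behind an interactive results document, the following Python models each visual identifier as a rectangular bounding box whose interior is the activation region of its links, with an overlay color chosen by entity type. All names, colors and link paths are hypothetical; non-rectangular or conformal shapes (Figure 12B) would need a polygon or mask hit test instead of the rectangle check used here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical style table: one overlay color per recognized entity type,
# so that, e.g., a product and a person are visually distinguishable.
OVERLAY_COLOR_BY_TYPE = {
    "product": "#1f77b4",
    "person": "#ff7f0e",
    "trademark": "#2ca02c",
    "text": "#d62728",
}
DEFAULT_OVERLAY_COLOR = "#7f7f7f"


@dataclass
class VisualIdentifier:
    """An axis-aligned bounding box around one sub-portion of the visual query."""
    entity_type: str
    x: int
    y: int
    width: int
    height: int
    label: str = ""
    result_links: List[str] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this box's activation region."""
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

    @property
    def overlay_color(self) -> str:
        return OVERLAY_COLOR_BY_TYPE.get(self.entity_type, DEFAULT_OVERLAY_COLOR)


def links_at(identifiers: List[VisualIdentifier], px: int, py: int) -> List[str]:
    """Collect the result links of every identifier whose activation region contains the point."""
    links: List[str] = []
    for ident in identifiers:
        if ident.contains(px, py):
            links.extend(ident.result_links)
    return links


document = [
    VisualIdentifier("person", 10, 10, 100, 200, "face", ["/results/face/1"]),
    VisualIdentifier("text", 150, 20, 80, 30, "sign", ["/results/ocr/1"]),
]
print(links_at(document, 50, 50))  # prints ['/results/face/1']
```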
- the user selectable link in the interactive results document is a link to a document or object that contains one or more results related to the corresponding sub-portion of the visual query (308).
- at least one search result includes data related to the corresponding sub-portion of the visual query.
- for example, suppose a visual query was a photograph of a bar code.
- the interactive results document may include a bounding box around only the bar code.
- when that bounding box is selected, the bar code search result is displayed.
- the bar code search result may include a single result, such as the name of the product corresponding to that bar code, or several results, such as a variety of places where that product can be purchased, reviewed, etc.
- when the sub-portion of the visual query corresponding to a respective visual identifier contains text comprising one or more terms, the search results corresponding to that visual identifier include results from a term query search on at least one of the terms in the text.
- when the sub-portion contains a person's face, the search results corresponding to the respective visual identifier include one or more of: name, handle, contact information, account information, address information, current location of a related mobile device associated with the person whose face is contained in the selectable sub-portion, other images of the person whose face is contained in the selectable sub-portion, and potential image matches for the person's face.
- when the sub-portion contains a product, the search results corresponding to the respective visual identifier include one or more of: product information, a product review, an option to initiate purchase of the product, an option to initiate a bid on the product, a list of similar products, and a list of related products.
- a respective user selectable link in the interactive results document includes anchor text, which is displayed in the document without having to activate the link.
- the anchor text provides information, such as a key word or term, related to the information obtained when the link is activated.
- Anchor text may be displayed as part of the label (307), or in a portion of a bounding box (304), or as additional information displayed when a user hovers a cursor over a user selectable link for a pre-determined period of time such as 1 second.
- a respective user selectable link in the interactive results document is a link to a search engine for searching for information or documents corresponding to a text-based query (sometimes herein called a term query).
- Activation of the link causes execution of the search by the search engine, where the query and the search engine are specified by the link (e.g., the search engine is specified by a URL in the link and the text-based search query is specified by a URL parameter of the link), with results returned to the client system.
- the link in this example may include anchor text specifying the text or terms in the search query.
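A link that runs a term query when activated can be built by placing the query text in a URL parameter, roughly as below. The endpoint `https://search.example.com/search`, the parameter name `q`, and the function name are placeholders, not details taken from the source; `urllib.parse.urlencode` is the standard-library call that percent-encodes the query.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint. The text above only says that the search engine is
# identified by a URL in the link and the query by a URL parameter; the
# parameter name "q" is also an assumption.
SEARCH_ENGINE_URL = "https://search.example.com/search"


def make_term_query_link(terms: List[str], engine_url: str = SEARCH_ENGINE_URL) -> Tuple[str, str]:
    """Return (href, anchor_text) for a link that runs a term query when activated."""
    query = " ".join(terms)
    href = f"{engine_url}?{urlencode({'q': query})}"
    return href, query  # the query terms double as the visible anchor text


href, anchor = make_term_query_link(["dolphin", "Flipper"])
print(f'<a href="{href}">{anchor}</a>')
# prints <a href="https://search.example.com/search?q=dolphin+Flipper">dolphin Flipper</a>
```

In real HTML output the anchor text would also need HTML escaping; that step is omitted to keep the sketch short.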
- the interactive results document produced in response to a visual query can include a plurality of links that correspond to results from the same search system.
- a visual query may be an image or picture of a group of people.
- the interactive results document may include bounding boxes around each person, which, when activated, return results from the facial recognition search system for each face in the group.
- a plurality of links in the interactive results document corresponds to search results from more than one search system (310). For example, if a picture of a person and a dog was submitted as the visual query, bounding boxes in the interactive results document may outline the person and the dog separately.
- the interactive results document contains an OCR result and an image match result (312).
- for example, if the visual query shows a person standing next to a sign, the interactive results document may include visual identifiers for the person and for the text in the sign.
- the interactive results document may include visual identifiers for photographs or trademarks in
- after the interactive results document has been created, it is sent to the client system (314).
- the interactive results document (e.g., document 1200, Figure 15) is sent in conjunction with a list of search results from one or more parallel search systems, as discussed above with reference to Figure 2.
- the interactive results document is displayed at the client system above or otherwise adjacent to a list of search results from one or more parallel search systems (315) as shown in Figure 15.
- the user will interact with the results document by selecting a visual identifier in the results document.
- the server system receives from the client system information regarding the user selection of a visual identifier in the interactive results document (316).
- the link is activated by selecting an activation region inside a bounding box.
- the link is activated by a user selection of a visual identifier of a sub-portion of the visual query, which is not a bounding box.
- the linked visual identifier is a hot button, a label located near the sub-portion, an underlined word in text, or other representation of an object or subject in the visual query.
- when the search results list is presented with the interactive results document (315) and the user selects a link in the interactive results document, the search result in the search results list corresponding to the selected link is identified.
- the cursor will jump or automatically move to the first result corresponding to the selected link.
- selecting a link in the interactive results document causes the search results list to scroll or jump so as to display at least a first result corresponding to the selected link.
- the results list is reordered such that the first result corresponding to the link is displayed at the top of the results list.
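The jump-to and reorder behaviors described above might look like this on the client. `SearchResult`, its `identifier_id` field, and both functions are hypothetical; moving all of the selected link's results to the top (rather than only the first one) is one possible design choice that still puts the first corresponding result at the top.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    title: str
    identifier_id: str  # hypothetical: id of the visual identifier this result belongs to


def scroll_target(results: List[SearchResult], identifier_id: str) -> Optional[int]:
    """Index of the first result for the selected link (where the list should jump), or None."""
    for index, result in enumerate(results):
        if result.identifier_id == identifier_id:
            return index
    return None


def reorder_for_selection(results: List[SearchResult], identifier_id: str) -> List[SearchResult]:
    """Move the selected link's results to the top, keeping their relative order (stable partition)."""
    selected = [r for r in results if r.identifier_id == identifier_id]
    others = [r for r in results if r.identifier_id != identifier_id]
    return selected + others


results = [
    SearchResult("OCR: sign text", "sign"),
    SearchResult("Face match 1", "face"),
    SearchResult("OCR: related page", "sign"),
    SearchResult("Face match 2", "face"),
]
print(scroll_target(results, "face"))  # prints 1
print([r.title for r in reorder_for_selection(results, "face")])
```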
- when the user selects the user selectable link (316), the visual query server system sends at least a subset of the results, related to a corresponding sub-portion of the visual query, to the client for display to the user (318).
- the user can select multiple visual identifiers concurrently and will receive a subset of results for all of the selected visual identifiers at the same time.
- search results corresponding to the user selectable links are preloaded onto the client prior to user selection of any of the user selectable links so as to provide search results to the user virtually instantaneously in response to user selection of one or more links in the interactive results document.
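Preloading can be sketched as a client-side cache keyed by visual identifier, which also makes concurrent selection of several identifiers a single local lookup. The class name and the shape of the preloaded mapping are assumptions for illustration, not a format given in the source.

```python
from typing import Dict, Iterable, List


class PreloadedResults:
    """Client-side cache of per-identifier results, filled before any link is selected.

    Assumes (hypothetically) that the server ships a mapping from
    visual-identifier id to result snippets along with the interactive
    results document.
    """

    def __init__(self, results_by_identifier: Dict[str, List[str]]) -> None:
        self._cache = {k: list(v) for k, v in results_by_identifier.items()}

    def select(self, identifier_ids: Iterable[str]) -> Dict[str, List[str]]:
        """Return the results for every selected identifier at once, without a server round trip."""
        return {i: list(self._cache.get(i, [])) for i in identifier_ids}


cache = PreloadedResults({"face": ["profile: J. Doe"], "sign": ["OCR text: Main St"]})
print(cache.select(["face", "sign"]))
```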
- FIG. 4 is a flow diagram illustrating the communications between a client and a visual query server system.
- the client 102 receives a visual query from a user/querier (402).
- visual queries can only be accepted from users who have signed up for or "opted in" to the visual query system.
- searches for facial recognition matches are only performed for users who have signed up for the facial recognition visual query system, while other types of visual queries are performed for anyone regardless of whether they have "opted in" to the facial recognition portion.
- the format of the visual query can take many forms.
- the visual query will likely contain one or more subjects located in sub-portions of the visual query document.
- the client system 102 performs type recognition pre-processing on the visual query (404).
- the client system 102 searches for particular recognizable patterns in this pre-processing step. For example, for some visual queries the client may recognize colors.
- the client may recognize that a particular sub-portion is likely to contain text (because that area is made up of small dark characters surrounded by light space, etc.).
- the client may contain any number of pre-processing type recognizers, or type recognition modules.
- the client will have a type recognition module (barcode recognition 406) for recognizing bar codes. It may do so by recognizing the distinctive striped pattern in a rectangular area.
- the client will have a type recognition module (face detection 408) for recognizing that a particular subject or sub-portion of the visual query is likely to contain a face.
- the recognized "type" is returned to the user for verification.
- the client system 102 may return a message stating "a bar code has been found in your visual query, are you interested in receiving bar code query results?"
- the message may even indicate the sub-portion of the visual query where the type has been found.
- this presentation is similar to the interactive results document discussed with reference to Figure 3. For example, it may outline a sub-portion of the visual query and indicate that the sub-portion is likely to contain a face, and ask the user if they are interested in receiving facial recognition results.
- after the client 102 performs the optional pre-processing of the visual query, the client sends the visual query to the visual query server system 106, specifically to the front end visual query processing server 110.
- if pre-processing produced relevant results, i.e., if one of the type recognition modules produced results above a certain threshold, indicating that the query or a sub-portion of the query is likely to be of a particular type (face, text, bar code, etc.), the client will pass along information regarding the results of the pre-processing. For example, the client may indicate that the face recognition module is 75% sure that a particular sub-portion of the visual query contains a face.
- the pre-processing results include one or more subject type values (e.g., bar code, face, text, etc.).
- the pre-processing results sent to the visual query server system include one or more of: for each subject type value in the pre-processing results, information identifying a sub-portion of the visual query corresponding to the subject type value, and for each subject type value in the pre-processing results, a confidence value indicating a level of confidence in the subject type value and/or the identification of a corresponding sub-portion of the visual query.
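A client might package its pre-processing results roughly as follows: one record per subject type value, with the sub-portion it refers to and a confidence value, keeping only detections above a threshold. The `Detection` type, the (x, y, width, height) region format, and the 0.5 default threshold are hypothetical; the source only speaks of "a certain threshold".

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Detection:
    subject_type: str                  # subject type value, e.g. "bar code", "face", "text"
    region: Tuple[int, int, int, int]  # sub-portion as (x, y, width, height)
    confidence: float                  # 0.0 to 1.0


def preprocessing_payload(detections: List[Detection], threshold: float = 0.5) -> List[dict]:
    """Keep detections whose confidence is above the threshold and serialize them
    so they can accompany the visual query sent to the server."""
    return [asdict(d) for d in detections if d.confidence > threshold]


detections = [
    Detection("face", (40, 30, 120, 120), 0.75),  # "75% sure" the region is a face
    Detection("bar code", (0, 0, 10, 10), 0.30),  # too uncertain to report
]
print(preprocessing_payload(detections))
```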
- the front end server 110 receives the visual query from the client system.
- the visual query received may contain the pre-processing information discussed above.
- the front end server sends the visual query to a plurality of parallel search systems (210). If the front end server 110 received pre-processing information regarding the likelihood that a sub-portion contained a subject of a certain type, the front end server may pass this information along to one or more of the parallel search systems. For example, it may pass on the information that a particular sub-portion is likely to be a face so that the facial recognition search system 112-A can process that subsection of the visual query first. Similarly, the same information (that a particular sub-portion is likely to be a face) may be used by the other parallel search systems to ignore that sub-portion or analyze other sub-portions first. In some embodiments, the front end server will not pass on the pre-processing information to the parallel search systems, but will instead use this information to augment the way in which it processes the results received from the parallel search systems.
- the front end server 110 receives a plurality of search results from the parallel search systems (214). The front end server may then perform a variety of ranking and filtering, and may create an interactive search result document as explained with reference to Figures 2 and 3. If the front end server 110 received pre-processing information regarding the likelihood that a sub-portion contained a subject of a certain type, it may filter and order by giving preference to those results that match the pre-processed recognized subject type. If the user indicated that a particular type of result was requested, the front end server will take the user's requests into account when processing the results.
- the front end server may filter out all other results if the user only requested bar code information, or the front end server will list all results pertaining to the requested type prior to listing the other results. If an interactive visual query document is returned, the server may pre-search the links associated with the type of result the user indicated interest in, while only providing links for performing related searches for the other subjects indicated in the interactive results document. Then the front end server 110 sends the search results to the client system (226).
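The preference given to a pre-processed or user-requested result type can be sketched as a stable partition of the result list, with an optional strict filter. This assumes, hypothetically, that each result is a dict with a `"type"` key; the function name is also illustrative.

```python
from typing import List, Optional


def order_by_preferred_type(
    results: List[dict],
    preferred_type: Optional[str],
    only_preferred: bool = False,
) -> List[dict]:
    """Give preference to results whose "type" matches preferred_type.

    preferred_type may come from client pre-processing or from an explicit user
    request. With only_preferred=True (e.g. the user asked only for bar code
    information) all other results are filtered out; otherwise matching results
    are listed first and the rest keep their original order.
    """
    if preferred_type is None:
        return list(results)
    matching = [r for r in results if r["type"] == preferred_type]
    if only_preferred:
        return matching
    return matching + [r for r in results if r["type"] != preferred_type]
```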
- the client 102 receives the results from the server system (412). When applicable, these results will include the results that match the type of result found in the preprocessing stage. For example, in some embodiments they will include one or more bar code results (414) or one or more facial recognition results (416). If the client's pre-processing modules had indicated that a particular type of result was likely, and that result was found, the found results of that type will be listed prominently.
- the user will select or annotate one or more of the results (418).
- the user may select one search result, may select a particular type of search result, and/or may select a portion of an interactive results document (420). Selection of a result is implicit feedback that the returned result was relevant to the query. Such feedback information can be utilized in future query processing operations.
- An annotation provides explicit feedback about the returned result that can also be utilized in future query processing operations. Annotations take the form of corrections of portions of the returned result (like a correction to a mis-OCRed word) or a separate annotation (either freeform or structured).
- the user's selection of one search result is a process that is referred to as a selection among interpretations.
- the user's selection of a particular type of search result, i.e., selecting the result "type" of interest from several different types of returned results (e.g., choosing the OCRed text of an article in a magazine rather than the visual results for the advertisements also on the same page), is a process that is referred to as disambiguation of intent.
- a user may similarly select particular linked words (such as recognized named entities) in an OCRed document as explained in detail with reference to Figure 8.
- the user may alternatively or additionally wish to annotate particular search results.
- This annotation may be done in freeform style or in a structured format (422).
- the annotations may be descriptions of the result or may be reviews of the result. For example, they may indicate the name of subject(s) in the result, or they could indicate "this is a good book" or "this product broke within a year of purchase."
- Another example of an annotation is a user-drawn bounding box around a sub-portion of the visual query and user-provided text identifying the object or subject inside the bounding box. User annotations are explained in more detail with reference to Figure 5.
- the user selections of search results and other annotations are sent to the server system (424).
- the front end server 110 receives the selections and annotations and further processes them (426). If the information was a selection of an object, sub-region or term in an interactive results document, further information regarding that selection may be requested, as appropriate. For example, if the selection was of one visual result, more information about that visual result would be requested. If the selection was a word (either from the OCR server or from the Image-to-Terms server) a textual search of that word would be sent to the term query server system 118. If the selection was of a person from a facial image recognition search system, that person's profile would be requested.
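Server-side handling of a selection, as described above, amounts to a dispatch on what was selected. The selection format (`kind` and `value` keys) and the three callbacks are hypothetical stand-ins for the term query server system, a profile lookup for a recognized person, and a detail lookup for a visual result.

```python
from typing import Any, Callable, Dict


def handle_selection(
    selection: Dict[str, str],
    term_search: Callable[[str], Any],
    fetch_profile: Callable[[str], Any],
    fetch_details: Callable[[str], Any],
) -> Any:
    """Route a user selection to the follow-up request it calls for."""
    handlers = {
        "word": term_search,             # word from OCR or image-to-terms -> textual search
        "person": fetch_profile,         # person from facial recognition -> profile
        "visual_result": fetch_details,  # one visual result -> more information about it
    }
    kind = selection["kind"]
    if kind not in handlers:
        raise ValueError(f"unknown selection kind: {kind!r}")
    return handlers[kind](selection["value"])
```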
- when the server system receives an annotation, the annotation is stored in a query and annotation database 116, explained with reference to Figure 5. Then the information from the annotation database 116 is periodically copied to individual annotation databases for one or more of the parallel server systems, as discussed below with reference to Figures 7-10.
- FIG. 5 is a block diagram illustrating a client system 102 in accordance with one embodiment of the present invention.
- the client system 102 typically includes one or more processing units (CPUs) 702, one or more network or other communications interfaces 704, memory 712, and one or more communication buses 714 for interconnecting these components.
- the client system 102 includes a user interface 705.
- the user interface 705 includes a display device 706 and optionally includes an input means such as a keyboard, mouse, or other input buttons 708.
- the display device 706 includes a touch sensitive surface 709, in which case the display 706/709 is a touch sensitive display.
- a physical keyboard is optional (e.g., a soft keyboard may be displayed when keyboard entry is needed).
- client systems use a microphone and voice recognition to supplement or replace the keyboard.
- the client 102 includes a GPS (Global Positioning System) receiver, or other location detection apparatus 707, for determining the location of the client system 102.
- some visual query search services require the client system 102 to provide the visual query server system with location information indicating the location of the client system 102.
- the client system 102 also includes an image capture device 710 such as a camera or scanner.
- Memory 712 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
- Memory 712 may optionally include one or more storage devices remotely located from the CPU(s) 702.
- Memory 712, or alternately the non-volatile memory device(s) within memory 712, comprises a non-transitory computer readable storage medium.
- memory 712 or the computer readable storage medium of memory 712 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 716 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 718 that is used for connecting the client system 102 to other computers via the one or more communication network interfaces 704 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- an image capture module 720 for processing a respective image captured by the image capture device/camera 710, where the respective image may be sent (e.g., by a client application module) as a visual query to the visual query server system;
- client application modules 722 for handling various aspects of querying by image, including but not limited to: a query-by-image submission module 724 for submitting visual queries to the visual query server system; optionally a region of interest selection module 725 that detects a selection (such as a gesture on the touch sensitive display 706/709) of a region of interest in an image and prepares that region of interest as a visual query; a results browser 726 for displaying the results of the visual query; and optionally an annotation module 728 with optional modules for structured annotation text entry 730 such as filling in a form or for freeform annotation text entry 732, which can accept annotations in a variety of formats, and an image region selection module 734 (sometimes referred to herein as a result selection module) which allows a user to select a particular sub-portion of an image for annotation;
- an optional content authoring application(s) 736 that allow a user to author a visual query by creating or editing an image rather than just capturing one via the image capture device 710; optionally, one or more such applications 736 may include instructions that enable a user to select a sub-portion of an image for use as a visual query;
- an optional local image analysis module 738 for pre-processing the visual query before it is sent to the visual query server system; the local image analysis may recognize particular types of images, or sub-regions within an image. Examples of image types that may be recognized by such modules 738 include one or more of: facial type (facial image recognized within visual query), bar code type (bar code recognized within visual query), and text type (text recognized within visual query); and
- client applications 740 such as an email application, a phone
- the application corresponding to an appropriate actionable search result can be launched or accessed when the actionable search result is selected.
- the image region selection module 734, which allows a user to select a particular sub-portion of an image for annotation, also allows the user to choose a search result as a "correct" hit without necessarily further annotating it.
- the user may be presented with a top N number of facial recognition matches and may choose the correct person from that results list.
- more than one type of result will be presented, and the user will choose a type of result.
- the image query may include a person standing next to a tree, but only the results regarding the person are of interest to the user. Therefore, the image region selection module 734 allows the user to indicate which type of image is the "correct" type, i.e., the type he is interested in receiving.
- the user may also wish to annotate the search result by adding personal comments or descriptive words using either the structured annotation text entry module 730 (for filling in a form) or the freeform annotation text entry module 732.
- the optional local image analysis module 738 is a portion of the client application (108, Fig. 1). Furthermore, in some embodiments the optional local image analysis module 738 includes one or more programs to perform local image analysis to pre-process or categorize the visual query or a portion thereof. For example, the client application 722 may recognize that the image contains a bar code, a face, or text, prior to submitting the visual query to a search engine. In some embodiments, when the local image analysis module 738 detects that the visual query contains a particular type of image, the module asks the user if they are interested in a corresponding type of search result.
- the local image analysis module 738 may detect a face based on its general characteristics (i.e., without determining which person's face) and provide immediate feedback to the user prior to sending the query on to the visual query server system. It may return a result like, "A face has been detected, are you interested in getting facial recognition matches for this face?" This may save time for the visual query server system (106, Fig. 1). For some visual queries, the front end visual query processing server (110, Fig. 1) only sends the visual query to the search system 112 corresponding to the type of image recognized by the local image analysis module 738.
- for other visual queries, the front end visual query processing server (110, Fig. 1) may send the visual query to all of the search systems 112A-N, but will rank results from the search system 112 corresponding to the type of image recognized by the local image analysis module 738.
- the manner in which local image analysis impacts on operation of the visual query server system depends on the configuration of the client system, or configuration or processing parameters associated with either the user or the client system.
- the actual content of any particular visual query and the results produced by the local image analysis may cause different visual queries to be handled differently at either or both the client system and the visual query server system.
- bar code recognition is performed in two steps, with analysis of whether the visual query includes a bar code performed on the client system at the local image analysis module 738. Then the visual query is passed to a bar code search system only if the client determines the visual query is likely to include a bar code. In other embodiments, the bar code search system processes every visual query.
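The routing options described above (send only to the matching search system, or send to all systems but prefer the matching one, with the bar code system optionally gated on a client-side detection) can be sketched as follows. The system names, the type-to-system mapping, and the `exclusive`/`gate` flags are hypothetical and cover only a subset of the parallel search systems.

```python
from typing import List, Optional

# Hypothetical subset of parallel search systems and type-to-system mapping.
ALL_SYSTEMS = ["facial_recognition", "ocr", "image_to_terms", "product_recognition", "bar_code"]
SYSTEM_FOR_TYPE = {"face": "facial_recognition", "text": "ocr", "bar code": "bar_code"}
# Systems that only receive a query when the client detected their type
# (the two-step bar code embodiment).
GATED_SYSTEMS = {"bar_code"}


def route_query(local_type: Optional[str], exclusive: bool = False, gate: bool = True) -> List[str]:
    """Return the search systems to query, most preferred first.

    exclusive=True: send only to the system matching the locally recognized type.
    Otherwise send to every (non-gated) system, listing the matching one first so
    its results can be ranked ahead of the others.
    """
    target = SYSTEM_FOR_TYPE.get(local_type) if local_type else None
    if target is not None and exclusive:
        return [target]
    systems = [s for s in ALL_SYSTEMS if not (gate and s in GATED_SYSTEMS and s != target)]
    if target is not None:
        systems = [target] + [s for s in systems if s != target]
    return systems


print(route_query("bar code"))
print(route_query("face", exclusive=True))  # prints ['facial_recognition']
```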
- the client system 102 includes additional client applications 740.
- FIG. 6 is a block diagram illustrating a front end visual query processing server system 110 in accordance with one embodiment of the present invention.
- the front end server 110 typically includes one or more processing units (CPUs) 802, one or more network or other communications interfaces 804, memory 812, and one or more
- Memory 812 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 812 may optionally include one or more storage devices remotely located from the CPU(s) 802. Memory 812, or alternately the non-volatile memory device(s) within memory 812, comprises a non-transitory computer readable storage medium. In some embodiments, memory 812 or the computer readable storage medium of memory 812 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 816 that includes procedures for handling various basic system services and for performing hardware dependent tasks
- a network communication module 818 that is used for connecting the front end server system 110 to other computers via the one or more communication network interfaces 804 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a query manager 820 for handling the incoming visual queries from the client system 102 and sending them to two or more parallel search systems; as described elsewhere in this document, in some special situations a visual query may be directed to just one of the search systems, such as when the visual query includes a client-generated instruction (e.g., "facial recognition search only");
- a results filtering module 822 for optionally filtering the results from the one or more parallel search systems and sending the top or "relevant" results to the client system 102 for presentation;
- a results ranking and formatting module 824 for optionally ranking the results from the one or more parallel search systems and for formatting the results for presentation;
- a results document creation module 826 for creating an interactive search results document; module 826 may include sub-modules, including but not limited to a bounding box creation module 828 and a link creation module 830;
- a label creation module 831 for creating labels that are visual identifiers of respective sub-portions of a visual query
- an annotation module 832 for receiving annotations from a user and sending them to an annotation database 116;
- an actionable search results module 838 for generating, in response to a visual query, one or more actionable search result elements, each configured to launch a client-side action;
- examples of actionable search result elements are buttons to initiate a telephone call, to initiate an email message, to map an address, to make a restaurant reservation, and to provide an option to purchase a product;
- a query and annotation database 116 which comprises the database itself 834 and an index to the database 836.
- the results ranking and formatting module 824 ranks the results returned from the one or more parallel search systems (112-A - 112-N, Fig. 1). As already noted above, for some visual queries, only the results from one search system may be relevant. In such an instance, only the relevant search results from that one search system are ranked. For some visual queries, several types of search results may be relevant. In these instances, in some embodiments, the results ranking and formatting module 824 ranks all of the results from the search system having the most relevant result (e.g., the result with the highest relevance score) above the results for the less relevant search systems. In other embodiments, the results ranking and formatting module 824 ranks a top result from each relevant search system above the remaining results.
- the results ranking and formatting module 824 ranks the results in accordance with a relevance score computed for each of the search results.
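The two multi-system ranking strategies described above can be contrasted in a short sketch, assuming each result carries a single relevance score. The tuple representation, the function names, and the scores in the usage lines are made up for illustration.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (result id, relevance score); higher score = more relevant


def _by_score(results: List[Result]) -> List[Result]:
    return sorted(results, key=lambda r: r[1], reverse=True)


def rank_best_system_first(results_by_system: Dict[str, List[Result]]) -> List[Result]:
    """All results of the system holding the single most relevant result come first,
    then the next system's results, and so on."""
    nonempty = [s for s in results_by_system if results_by_system[s]]
    nonempty.sort(key=lambda s: max(score for _, score in results_by_system[s]), reverse=True)
    ranked: List[Result] = []
    for system in nonempty:
        ranked.extend(_by_score(results_by_system[system]))
    return ranked


def rank_top_of_each_first(results_by_system: Dict[str, List[Result]]) -> List[Result]:
    """The top result from each system first, then every remaining result by score."""
    tops: List[Result] = []
    rest: List[Result] = []
    for results in results_by_system.values():
        if results:
            ordered = _by_score(results)
            tops.append(ordered[0])
            rest.extend(ordered[1:])
    return _by_score(tops) + _by_score(rest)


scores = {
    "ocr": [("ocr-a", 0.6), ("ocr-b", 0.5)],
    "face": [("face-b", 0.2), ("face-a", 0.9)],
    "product": [],
}
print([rid for rid, _ in rank_best_system_first(scores)])  # ['face-a', 'face-b', 'ocr-a', 'ocr-b']
print([rid for rid, _ in rank_top_of_each_first(scores)])  # ['face-a', 'ocr-a', 'ocr-b', 'face-b']
```

The first strategy keeps each system's results together; the second guarantees that every relevant system is represented near the top of the list.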
- augmented textual queries are performed in addition to the searching on parallel visual search systems.
- when textual queries are also performed, their results are presented in a manner visually distinct from the visual search system results.
- the results ranking and formatting module 824 also formats the results.
- the results are presented in a list format.
- the results are presented by means of an interactive results document.
- both an interactive results document and a list of results are presented.
- the type of query dictates how the results are presented. For example, if more than one searchable subject is detected in the visual query, then an interactive results document is produced, while if only one searchable subject is detected the results will be displayed in list format only.
- the results document creation module 826 is used to create an interactive search results document.
- the interactive search results document may have one or more detected and searched subjects.
- the bounding box creation module 828 creates a bounding box around one or more of the searched subjects.
- the bounding boxes may be rectangular boxes, or may outline the shape(s) of the subject(s).
- the link creation module 830 creates links to search results associated with their respective subject in the interactive search results document. In some embodiments, clicking within the bounding box area activates the corresponding link inserted by the link creation module.
- the query and annotation database 116 contains information that can be used to improve visual query results.
- the user may annotate the image after the visual query results have been presented.
- the user may annotate the image before sending it to the visual query search system. Pre-annotation may help visual query processing by focusing the results, or by running text-based searches on the annotated words in parallel with the visual query searches.
- annotated versions of a picture can be made public (e.g., when the user has given permission for publication by designating the image and annotation(s) as not private), so that they can be returned as a potential image match hit.
- the information from the query and annotation database 116 is periodically pushed to the parallel search systems 112, which incorporate relevant portions of the information (if any) into their respective individual databases 114.
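The periodic push from the query and annotation database 116 can be pictured as a batch that each parallel search system filters for relevance before storing. In this sketch, the `kind` tag used to judge relevance is hypothetical. Restricting the push to annotations the user made public is an assumption that links the two preceding points. A plain list stands in for each individual database 114.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    image_id: str
    text: str
    kind: str     # hypothetical relevance tag, e.g. "face" or "text"
    public: bool  # True if the user did not mark the image/annotation private


@dataclass
class SearchSystem:
    name: str
    relevant_kinds: set[str]
    # Stands in for this system's individual database 114.
    database: list[Annotation] = field(default_factory=list)

    def incorporate(self, batch: list[Annotation]) -> int:
        """Store only the relevant portion of a pushed batch; return its size."""
        relevant = [a for a in batch if a.kind in self.relevant_kinds]
        self.database.extend(relevant)
        return len(relevant)


def push_annotations(annotation_db: list[Annotation],
                     systems: list[SearchSystem]) -> None:
    """One periodic push (assumption: only public annotations are shared)."""
    shareable = [a for a in annotation_db if a.public]
    for system in systems:
        system.incorporate(shareable)


face = SearchSystem("facial recognition", {"face"})
ocr = SearchSystem("OCR", {"text"})
push_annotations([Annotation("img1", "Actress X", "face", True),
                  Annotation("img2", "Active Drink", "text", True),
                  Annotation("img3", "my sister", "face", False)],
                 [face, ocr])
```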
- FIG. 7 is a block diagram illustrating one of the parallel search systems utilized to process a visual query.
- Figure 7 illustrates a "generic" server system 112-N in accordance with one embodiment of the present invention.
- This server system is generic only in that it represents any one of the visual query search servers 112-N.
- the generic server system 112-N typically includes one or more processing units (CPU's) 502, one or more network or other communications interfaces 504, memory 512, and one or more communication buses for interconnecting these components.
- Memory 512 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 512 may optionally include one or more storage devices remotely located from the CPU(s) 502. Memory 512, or alternately the non-volatile memory device(s) within memory 512, comprises a non-transitory computer readable storage medium. In some embodiments, memory 512 or the computer readable storage medium of memory 512 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 516 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 518 that is used for connecting the generic server system 112-N to other computers via the one or more communication network interfaces 504 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a search application 520 specific to the particular server system; it may, for example, be a bar code search application, a color recognition search application, a product recognition search application, an object or object-category search application, or the like;
- an optional image database 524 for storing the images relevant to the particular search application, where the image data stored, if any, depends on the search process type;
- the ranking module may assign a relevancy score for each result from the search application, and if no results reach a pre-defined minimum score, may return a null or zero value score to the front end visual query processing server indicating that the results from this server system are not relevant;
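The minimum-score rule of the ranking module can be sketched as follows. The threshold value and the `(result, score)` pair representation are assumptions. `None` plays the role of the null value that tells the front end visual query processing server this system's results are not relevant.

```python
from typing import Optional

Scored = tuple[str, float]  # (result identifier, relevancy score)

MIN_SCORE = 0.3  # hypothetical pre-defined minimum score


def rank_or_null(scored: list[Scored],
                 min_score: float = MIN_SCORE) -> Optional[list[Scored]]:
    """Rank by relevancy score, or return None if no result reaches min_score."""
    if not any(score >= min_score for _, score in scored):
        return None  # signals "not relevant" to the front end server
    return sorted(scored, key=lambda r: r[1], reverse=True)
```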
- FIG. 8 is a block diagram illustrating an OCR search system 112-B utilized to process a visual query in accordance with one embodiment of the present invention.
- the OCR search system 112-B typically includes one or more processing units (CPU's) 602, one or more network or other communications interfaces 604, memory 612, and one or more communication buses 614 for interconnecting these components.
- Memory 612 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 612 may optionally include one or more storage devices remotely located from the CPU(s) 602. Memory 612, or alternately the non-volatile memory device(s) within memory 612, comprises a non-transitory computer readable storage medium. In some embodiments, memory 612 or the computer readable storage medium of memory 612 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 616 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 618 that is used for connecting the OCR search system 112-B to other computers via the one or more communication network interfaces 604 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- OCR (Optical Character Recognition)
- an optional spell check module 622 which improves the conversion of images of letters into characters by checking the converted words against a dictionary and replacing potentially mis-converted letters in words that otherwise match a dictionary word;
- an optional named entity recognition module 624 which searches for named entities within the converted text, sends the recognized named entities as terms in a term query to the term query server system (118, Fig. 1), and provides the results from the term query server system as links embedded in the OCRed text associated with the recognized named entities;
- an optional text match application 632 which improves the conversion of images of letters into characters by checking converted segments (such as converted sentences and paragraphs) against a database of text segments and replacing potentially mis-converted letters in OCRed text segments that otherwise match a text match application text segment; in some embodiments the text segment found by the text match application is provided as a link to the user (for example, if the user scanned one page of the New York Times, the text match application may provide a link to the entire posted article on the New York Times website);
- a results ranking and formatting module 626 for formatting the OCRed results for presentation and formatting optional links to named entities, and also optionally ranking any related results from the text match application;
- an annotation module for receiving annotation information from an annotation database (116, Fig. 1), determining if any of the annotation information is relevant to the OCR search system and incorporating any determined relevant portions of the annotation information into the respective annotation database 630.
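The optional spell check module can be illustrated with a simple rule. This sketch assumes a word "otherwise matches" a dictionary word when both have the same length and differ in exactly one letter. Production OCR correction would typically weigh likely character confusions (such as "l" read for "i"), which this sketch does not model. The function names are hypothetical.

```python
def correct_word(word: str, dictionary: set[str]) -> str:
    """Replace a likely mis-converted OCR word with a dictionary word.

    Assumption: a word "otherwise matches" a dictionary word when both have
    the same length and differ in exactly one letter position. The dictionary
    is assumed lowercase, candidates are tried in sorted order so the choice
    is deterministic, and corrected words are returned in lowercase.
    """
    lower = word.lower()
    if lower in dictionary:
        return word  # already a dictionary word: keep the OCR output as-is
    for candidate in sorted(dictionary):
        if (len(candidate) == len(lower)
                and sum(a != b for a, b in zip(candidate, lower)) == 1):
            return candidate
    return word  # no close dictionary match: leave the word unchanged


def spell_check(text: str, dictionary: set[str]) -> str:
    """Apply correct_word to each whitespace-separated word."""
    return " ".join(correct_word(w, dictionary) for w in text.split())


print(spell_check("Actlve drlnk", {"active", "drink", "united", "states"}))
# active drink
```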
- Figure 9 is a block diagram illustrating a facial recognition search system 112-A.
- the facial recognition search system 112-A typically includes one or more processing units (CPU's) 902, one or more network or other communications interfaces 904, memory 912, and one or more communication buses 914 for interconnecting these components.
- Memory 912 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
- Memory 912 may optionally include one or more storage devices remotely located from the CPU(s) 902.
- Memory 912, or alternately the non-volatile memory device(s) within memory 912, comprises a non-transitory computer readable storage medium.
- memory 912 or the computer readable storage medium of memory 912 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 916 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module that is used for connecting the facial recognition search system 112-A to other computers via the one or more communication network interfaces 904 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a facial recognition search application 920 for searching a facial image database 114-A for facial images that match the face(s) presented in the visual query, and for searching the social network database 922 for information regarding each match found in the facial image database 114-A;
- a facial image database 114-A for storing one or more facial images for a plurality of users; optionally, the facial image database includes facial images for people other than users, such as family members and others known by users and who have been identified as being present in images included in the facial image database 114-A; optionally, the facial image database includes facial images obtained from external sources, such as vendors of facial images that are legally in the public domain;
- a social network database 922 which contains information regarding users of the social network such as name, address, occupation, group memberships, social network connections, current GPS location of mobile device, share preferences, interests, age, hometown, personal statistics, work information, etc. as discussed in more detail with reference to Fig. 12A;
- a results ranking and formatting module 924 for ranking (e.g., assigning a relevance and/or match quality score to) the potential facial matches from the facial image database 114-A and formatting the results for presentation; in some embodiments, the ranking or scoring of results utilizes related information retrieved from the aforementioned social network database; in some embodiments, the search formatted results include the potential image matches as well as a subset of information from the social network database; and
- an annotation module 926 for receiving annotation information from an annotation database (116, Fig. 1), determining if any of the annotation information is relevant to the facial recognition search system and storing any determined relevant portions of the annotation information into the respective annotation database 928.
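One way the results ranking and formatting module 924 might use social network information when scoring potential facial matches is sketched below. The additive boost for people connected to the querying user, and its value, are hypothetical. The text says only that ranking may utilize related information from the social network database 922.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceMatch:
    person_id: str
    match_quality: float  # visual similarity score from database 114-A


def rank_face_matches(matches: list[FaceMatch],
                      connections: set[str],
                      social_boost: float = 0.2) -> list[FaceMatch]:
    """Order matches by match quality plus a boost for social connections."""
    def score(m: FaceMatch) -> float:
        bonus = social_boost if m.person_id in connections else 0.0
        return m.match_quality + bonus
    return sorted(matches, key=score, reverse=True)


MATCHES = [FaceMatch("actress_x", 0.7), FaceMatch("friend_y", 0.6)]
```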
- Figure 10 is a block diagram illustrating an image-to-terms search system 112-C.
- the image-to-terms search system recognizes objects (instance recognition) in the visual query. In other embodiments, the image-to-terms search system recognizes object categories (type recognition) in the visual query. In some embodiments, the image-to-terms search system recognizes both objects and object categories. The image-to-terms search system returns potential term matches for images in the visual query.
- the image-to-terms search system 112-C typically includes one or more processing units (CPU's) 1002, one or more network or other communications interfaces 1004, memory 1012, and one or more communication buses 1014 for interconnecting these components.
- Memory 1012 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 1012 may optionally include one or more storage devices remotely located from the CPU(s) 1002. Memory 1012, or alternately the non-volatile memory device(s) within memory 1012, comprises a non-transitory computer readable storage medium. In some embodiments, memory 1012 or the computer readable storage medium of memory 1012 stores the following programs, modules and data structures, or a subset thereof:
- an operating system 1016 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 1018 that is used for connecting the image-to-terms search system 112-C to other computers via the one or more communication network interfaces 1004 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a terms-to-image inverse index 1022 which stores the textual terms used by users when searching for images using a text based query search engine 1006;
- an annotation module 1026 for receiving annotation information from an annotation database (116, Fig. 1), determining if any of the annotation information is relevant to the image-to-terms search system 112-C and storing any determined relevant portions of the annotation information into the respective annotation database 1028.
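The terms-to-image inverse index 1022 can be turned around to suggest terms for images that match a visual query. In this sketch the index is a plain mapping from a search term to the image IDs users reached with it, with repeated IDs counting as repeated use. The function names are hypothetical. How query images are matched to indexed images is out of scope and assumed to happen upstream.

```python
from collections import Counter, defaultdict


def build_image_to_terms(terms_to_images: dict[str, list[str]]) -> dict[str, Counter]:
    """Invert term -> image IDs into image ID -> Counter of terms."""
    image_to_terms: dict[str, Counter] = defaultdict(Counter)
    for term, image_ids in terms_to_images.items():
        for image_id in image_ids:
            image_to_terms[image_id][term] += 1
    return dict(image_to_terms)


def terms_for_matches(image_to_terms: dict[str, Counter],
                      matched_images: list[str], k: int = 3) -> list[str]:
    """Return the k terms most used for the images matching the visual query."""
    totals: Counter = Counter()
    for image_id in matched_images:
        totals.update(image_to_terms.get(image_id, Counter()))
    return [term for term, _ in totals.most_common(k)]


INDEX = {"soda": ["img1", "img2"], "bottle": ["img1"], "beach": ["img3"]}
print(terms_for_matches(build_image_to_terms(INDEX), ["img1", "img2"], k=2))
# ['soda', 'bottle']
```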
- Figures 5-10 are intended more as functional descriptions of the various features which may be present in a set of computer systems than as a structural schematic of the embodiments described herein.
- items shown separately could be combined and some items could be separated.
- some items shown separately in these figures could be implemented on single servers and single items could be implemented by one or more servers.
- the actual number of systems used to implement visual query processing and how features are allocated among them will vary from one implementation to another.
- Each of the methods described herein may be governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of one or more servers or clients.
- the above identified modules or programs (i.e., sets of instructions)
- Each of the operations shown in Figures 5-10 may correspond to instructions stored in a computer memory or non-transitory computer readable storage medium.
- Figure 11 illustrates a client system 102 with a screen shot of an exemplary visual query 1102.
- the client system 102 shown in Figure 11 is a mobile device such as a cellular telephone, portable music player, or portable emailing device.
- the client system 102 includes a display 706 and one or more input means 708 such as the buttons shown in this figure.
- the display 706 is a touch sensitive display 709.
- soft buttons displayed on the display 709 may optionally replace some or all of the electromechanical buttons 708.
- Touch sensitive displays are also helpful in interacting with the visual query results as explained in more detail below.
- the client system 102 also includes an image capture mechanism such as a camera 710.
- Figure 11 illustrates a visual query 1102 which is a photograph or video frame of a package on a shelf of a store.
- the visual query is a two dimensional image having a resolution corresponding to the size of the visual query in pixels in each of two dimensions.
- the visual query 1102 in this example is a two
- the visual query 1102 includes background elements, a product package 1104, and a variety of types of entities on the package including an image of a person 1106, an image of a trademark 1108, an image of a product 1110, and a variety of textual elements 1112.
- the visual query 1102 is sent to the front end server 110, which sends the visual query 1102 to a plurality of parallel search systems (112A-N), receives the results and creates an interactive results document.
- Figures 12A and 12B each illustrate a client system 102 with a screen shot of an embodiment of an interactive results document 1200.
- the interactive results document 1200 includes one or more visual identifiers 1202 of respective sub-portions of the visual query 1102, which each include a user selectable link to a subset of search results.
- Figures 12A and 12B illustrate an interactive results document 1200 with visual identifiers that are bounding boxes 1202 (e.g., bounding boxes 1202-1, 1202-2, 1202-3).
- the user activates the display of the search results corresponding to a particular sub-portion by tapping on the activation region inside the space outlined by its bounding box 1202.
- the user would activate the search results corresponding to the image of the person by tapping on a bounding box 1306 (Figure 13) surrounding the image of the person.
- the selectable link is selected using a mouse or keyboard rather than a touch sensitive display.
- the first corresponding search result is displayed when a user previews a bounding box 1202 (i.e., when the user single clicks, taps once, or hovers a pointer over the bounding box).
- the user activates the display of a plurality of corresponding search results when the user selects the bounding box (i.e., when the user double clicks, taps twice, or uses another mechanism to indicate selection).
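The preview and selection behaviors differ only in how many results are shown. Below is a minimal sketch in which a hypothetical `Gesture` enumeration stands in for whatever input events a given client reports.

```python
from enum import Enum


class Gesture(Enum):
    SINGLE = "single click or single tap"
    HOVER = "pointer hover"
    DOUBLE = "double click or double tap"


def results_to_show(gesture: Gesture, results: list[str]) -> list[str]:
    """Preview (single tap/click, hover) shows only the first result;
    selection (double tap/click) shows the plurality of results."""
    if gesture is Gesture.DOUBLE:
        return list(results)
    return results[:1]
```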
- the visual identifiers are bounding boxes 1202 surrounding sub-portions of the visual query.
- Figure 12A illustrates bounding boxes 1202 that are square or rectangular.
- Figure 12B illustrates a bounding box 1202 that outlines the boundary of an identifiable entity in the sub-portion of the visual query, such as the bounding box 1202-3 for a drink bottle.
- a respective bounding box 1202 includes smaller bounding boxes 1202 within it.
- the bounding box identifying the package 1202-1 surrounds the bounding box identifying the trademark 1202-2 and all of the other bounding boxes 1202.
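Because bounding boxes can nest, a tap inside the trademark box 1202-2 also falls inside the package box 1202-1. The text does not say which box wins. The sketch below assumes that the innermost (smallest-area) box containing the tap point is activated. Boxes are given as hypothetical pixel tuples.

```python
from typing import Optional

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x0 <= x1, y0 <= y1


def area(box: Box) -> int:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def innermost_hit(boxes: dict[str, Box], x: int, y: int) -> Optional[str]:
    """Name of the smallest bounding box containing (x, y), or None."""
    hits = [name for name, (x0, y0, x1, y1) in boxes.items()
            if x0 <= x <= x1 and y0 <= y <= y1]
    return min(hits, key=lambda name: area(boxes[name]), default=None)


BOXES = {"package 1202-1": (0, 0, 200, 300),
         "trademark 1202-2": (20, 20, 80, 60)}
```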
- the interactive results document 1200 includes active hot links 1204 for some of the textual terms.
- Figure 12B shows an example where "Active Drink" and "United States" are displayed as hot links 1204.
- the search results corresponding to these terms are the results received from the term query server system 118, whereas the results corresponding to the bounding boxes are results from the query by image search systems.
- Figure 13 illustrates a client system 102 with a screen shot of an interactive results document 1200 that is coded by type of recognized entity in the visual query.
- the visual query of Figure 11 contains an image of a person 1106, an image of a trademark 1108, an image of a product 1110, and a variety of textual elements 1112.
- the interactive results document 1200 displayed in Figure 13 includes bounding boxes 1202 around a person 1306, a trademark 1308, a product 1310, and the two textual areas 1312.
- the bounding boxes of Figure 13 are each presented with separate cross-hatching which represents differently colored transparent bounding boxes 1202.
- the visual identifiers of the bounding boxes are formatted for presentation in visually distinctive manners such as overlay color, overlay pattern, label background color, label background pattern, label font color, and bounding box border color.
- the type coding for particular recognized entities is shown with respect to bounding boxes in Figure 13, but coding by type can also be applied to visual identifiers that are labels.
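Coding by type amounts to a lookup from the recognized entity type to a visually distinctive style that bounding boxes and labels can share. The palette, type names, and `Style` fields below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    overlay_color: str     # translucent fill over the sub-portion (RGBA hex)
    border_color: str
    label_font_color: str


# Hypothetical palette: one visually distinct style per recognized entity type.
STYLE_BY_TYPE = {
    "person": Style("#FF000040", "#FF0000", "#FFFFFF"),
    "trademark": Style("#0000FF40", "#0000FF", "#FFFFFF"),
    "product": Style("#00A00040", "#00A000", "#000000"),
    "text": Style("#FFA50040", "#FFA500", "#000000"),
}
DEFAULT_STYLE = Style("#80808040", "#808080", "#000000")


def style_for(entity_type: str) -> Style:
    """Same lookup for bounding boxes and labels, so type coding carries over."""
    return STYLE_BY_TYPE.get(entity_type, DEFAULT_STYLE)
```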
- Figure 14 illustrates a client device 102 with a screen shot of an interactive results document 1200 with labels 1402 being the visual identifiers of respective sub-portions of the visual query 1102 of Figure 11.
- the label visual identifiers 1402 each include a user selectable link to a subset of corresponding search results.
- the selectable link is identified by descriptive text displayed within the area of the label 1402.
- Some embodiments include a plurality of links within one label 1402. For example, in Figure 14, the label hovering over the image of a woman drinking includes a link to facial recognition results for the woman and a link to image recognition results for that particular picture (e.g., images of other products or advertisements using the same picture).
- the labels 1402 are displayed as partially transparent areas with text that are located over their respective sub-portions of the interactive results document.
- a respective label is positioned near but not located over its respective sub-portion of the interactive results document.
- the labels are coded by type in the same manner as discussed with reference to Figure 13.
- the user activates the display of the search results for a particular sub-portion associated with a label 1402 by tapping on the activation region inside the space outlined by the edges or periphery of the label 1402.
- the same previewing and selection functions discussed above with reference to the bounding boxes of Figures 12A and 12B also apply to the visual identifiers that are labels 1402.
- Figure 15 illustrates a screen shot of an interactive results document 1200 and the original visual query 1102 displayed concurrently with a results list 1500.
- the interactive results document 1200 is displayed by itself as shown in Figures 12-14. In other embodiments, the interactive results document 1200 is displayed
- the list of visual query results 1500 is concurrently displayed along with the original visual query 1102 and/or the interactive results document 1200.
- the type of client system and the amount of room on the display 706 may determine whether the list of results 1500 is displayed concurrently with the interactive results document 1200.
- the client system 102 receives (in response to a visual query submitted to the visual query server system) both the list of results 1500 and the interactive results document 1200, but only displays the list of results 1500 when the user scrolls below the interactive results document 1200. In some of these embodiments, the client system 102 displays the results
- the list of results 1500 is organized into categories 1502.
- Each category contains at least one result 1503.
- the category titles are highlighted to distinguish them from the results 1503.
- the categories 1502 are ordered according to their calculated category weight.
- the category weight is a combination of the weights of the highest N results in that category. As such, the category that has likely produced more relevant results is displayed first. In embodiments where more than one category 1502 is returned for the same recognized entity (such as the facial image recognition match and the image match shown in Figure 15) the category displayed first has a higher category weight.
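The category ordering can be sketched as follows. The text says only that a category weight is "a combination" of the weights of its highest N results. This sketch assumes that the combination is a plain sum, and the value of N is likewise an assumption.

```python
def category_weight(result_weights: list[float], n: int = 3) -> float:
    """Combine (here: sum) the weights of a category's highest-N results."""
    return sum(sorted(result_weights, reverse=True)[:n])


def order_categories(categories: dict[str, list[float]], n: int = 3) -> list[str]:
    """Category titles, heaviest category first."""
    return sorted(categories,
                  key=lambda title: category_weight(categories[title], n),
                  reverse=True)


CATEGORIES = {"product match": [0.9, 0.2, 0.1, 0.1],
              "facial recognition match": [0.6, 0.6, 0.5],
              "web results": [0.3]}
print(order_categories(CATEGORIES))
# ['facial recognition match', 'product match', 'web results']
```

The product match category holds the single best result (0.9) yet ranks second. Its top three weights sum to less than those of the facial recognition category, which is the effect of weighting by the top N results rather than by the top one.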
- when a selectable link in the interactive results document 1200 is selected by a user of the client system 102, the cursor automatically moves to the appropriate category 1502 or to the first result 1503 in that category.
- the list of results 1500 is reordered such that the category or categories relevant to the selected link are displayed first. This is accomplished, for example, by either coding the selectable links with information identifying the corresponding search results, or by coding the search results to indicate the corresponding selectable links or to indicate the corresponding result categories.
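Reordering the results list after a link is selected can be sketched as a stable sort. Here each category is assumed to record which selectable links (bounding boxes or labels) its results belong to, corresponding to the option of coding the search results to indicate their selectable links. The link identifiers are hypothetical.

```python
def reorder_for_link(categories: list[dict], selected_link: str) -> list[dict]:
    """Move categories tied to the selected link to the front.

    sorted() is stable and False sorts before True, so matching categories
    come first and both groups keep their existing (weight-based) order.
    """
    return sorted(categories, key=lambda c: selected_link not in c["links"])


RESULT_LIST = [
    {"title": "product match", "links": {"box-1202-1", "box-1202-3"}},
    {"title": "facial recognition match", "links": {"box-1306"}},
    {"title": "web results", "links": {"hotlink-active-drink"}},
]
```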
- the categories of the search results correspond to the query-by-image search systems that produced those search results.
- some of the categories are product match 1506, logo match 1508, facial recognition match 1510, and image match 1512.
- the original visual query 1102 and/or an interactive results document 1200 may be similarly displayed with a category title such as the query 1504.
- results from any term search performed by the term query server may also be displayed as a separate category, such as web results 1514.
- more than one entity in a visual query will produce results from the same query-by-image search system.
- the visual query could include two different faces that would return separate results from the facial recognition search system.
- the categories 1502 are divided by recognized entity rather than by search system.
- an image of the recognized entity is displayed in the recognized entity category header 1502 such that the results for that recognized entity are distinguishable from the results for another recognized entity, even though both results are produced by the same query-by-image search system.
- the product match category 1506 includes two product entities and as such has two entity categories 1502: a boxed product 1516 and a bottled product 1518, each of which has a plurality of corresponding search results 1503.
- the categories may be divided by recognized entities and type of query-by-image system. For example, in Figure 15, there are two separate entities that returned relevant results under the product match category.
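Dividing categories by both search system and recognized entity is a two-level grouping. In this sketch each result is a hypothetical `(system, entity, title)` tuple. Two products detected in one visual query produce two entity categories under the same product match system.

```python
from collections import defaultdict


def group_results(results: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group (system, entity, title) results as system -> entity -> titles."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for system, entity, title in results:
        grouped[system][entity].append(title)
    # Convert the nested defaultdicts to plain dicts for display.
    return {system: dict(entities) for system, entities in grouped.items()}


RESULTS = [("product match", "boxed product", "r1"),
           ("product match", "bottled product", "r2"),
           ("product match", "boxed product", "r3"),
           ("facial recognition match", "face 1", "r4")]
```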
- the results 1503 include thumbnail images.
- For example, as shown for the facial recognition match results in Figure 15, small versions (also called thumbnail images) of the pictures of the facial matches for "Actress X" and "Social Network Friend Y" are displayed along with some textual description, such as the name of the person in the image.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
- Processing Or Creating Images (AREA)
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10742686A EP2462518A1 (fr) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
BR112012002803A BR112012002803A2 (pt) | 2009-08-07 | 2010-08-05 | Computer-implemented method for processing a visual query, server system, and non-transitory computer-readable storage medium |
CN2010800451970A CN102667764A (zh) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
KR1020127006115A KR101670956B1 (ko) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
AU2010279334A AU2010279334A1 (en) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
CA2770186A CA2770186C (fr) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
JP2012523961A JP2013501976A (ja) | 2009-08-07 | 2010-08-05 | User interface for presenting search results for multiple regions of a visual query |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US23239709P | 2009-08-07 | 2009-08-07 | |
US61/232,397 | 2009-08-07 | ||
US26612209P | 2009-12-02 | 2009-12-02 | |
US61/266,122 | 2009-12-02 | ||
US12/850,513 US9087059B2 (en) | 2009-08-07 | 2010-08-04 | User interface for presenting search results for multiple regions of a visual query |
US12/850,513 | 2010-08-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011017558A1 (fr) | 2011-02-10 |
Family
ID=43544672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2010/044604 WO2011017558A1 (fr) | User interface for presenting search results for multiple regions of a visual query | 2009-08-07 | 2010-08-05 |
Country Status (8)
Country | Link |
---|---|
EP (1) | EP2462518A1 (fr) |
JP (2) | JP2013501976A (fr) |
KR (1) | KR101670956B1 (fr) |
CN (1) | CN102667764A (fr) |
AU (1) | AU2010279334A1 (fr) |
BR (1) | BR112012002803A2 (fr) |
CA (1) | CA2770186C (fr) |
WO (1) | WO2011017558A1 (fr) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102594896A (zh) * | 2012-02-23 | 2012-07-18 | 广州商景网络科技有限公司 | Electronic photo sharing method and system |
JP2014006680A (ja) * | 2012-06-25 | 2014-01-16 | Sony Corp | Video recording device, information processing system, information processing method, and recording medium |
WO2014035430A1 (fr) | 2012-08-31 | 2014-03-06 | Hewlett-Packard Development Company, L.P. | Active regions of an image with accessible links |
CN104462423A (zh) * | 2014-12-15 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Search method and device, and mobile terminal |
US9355317B2 (en) | 2011-12-14 | 2016-05-31 | Nec Corporation | Video processing system, video processing method, video processing device for mobile terminal or server and control method and control program thereof |
EP3188034A4 (fr) * | 2014-08-25 | 2017-07-05 | ZTE Corporation | Data processing method based on a display terminal |
US10089412B2 (en) | 2015-03-30 | 2018-10-02 | Yandex Europe Ag | Method of and system for processing a search query |
US10255240B2 (en) | 2014-03-27 | 2019-04-09 | Yandex Europe Ag | Method and system for processing a voice-based user-input |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
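CN103488769B (zh) * | 2013-09-27 | 2017-06-06 | 中国科学院自动化研究所 | Landmark information retrieval method based on multimedia data mining |
KR101588950B1 (ko) * | 2014-03-28 | 2016-01-26 | 주식회사 에스원 | Market share determination system, market share determination method, and computer-readable medium on which a program capable of executing the determination system is recorded |
CN105765577A (zh) * | 2014-09-29 | 2016-07-13 | 微软技术许可有限责任公司 | Customizable data service |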
US20160132567A1 (en) * | 2014-11-12 | 2016-05-12 | Microsoft Corporation | Multi-search and multi-task in search |
US10102565B2 (en) | 2014-11-21 | 2018-10-16 | Paypal, Inc. | System and method for content integrated product purchasing |
CN104536995B (zh) * | 2014-12-12 | 2016-05-11 | 北京奇虎科技有限公司 | Method and system for searching based on touch operations on a terminal interface |
KR102339461B1 (ko) * | 2014-12-18 | 2021-12-15 | 삼성전자 주식회사 | Method and apparatus for operating text-based content on an electronic device |
WO2016101768A1 (fr) * | 2014-12-26 | 2016-06-30 | 北京奇虎科技有限公司 | Terminal, and touch operation-based search method and device |
US10621676B2 (en) * | 2015-02-04 | 2020-04-14 | Vatbox, Ltd. | System and methods for extracting document images from images featuring multiple documents |
US10579330B2 (en) * | 2015-05-13 | 2020-03-03 | Microsoft Technology Licensing, Llc | Automatic visual display of audibly presented options to increase user efficiency and interaction performance |
JP2018523251A (ja) * | 2015-08-03 | 2018-08-16 | オランド エセ.ア. | System and method for searching for products in a catalog |
BR112018008266A2 (pt) * | 2015-10-25 | 2018-10-23 | Alva Alta Lda | Package recognizable by different types of means, and system and process for preparing edible products based on said recognizable packages |
US10528613B2 (en) * | 2015-11-23 | 2020-01-07 | Advanced Micro Devices, Inc. | Method and apparatus for performing a parallel search operation |
US9779293B2 (en) * | 2016-01-27 | 2017-10-03 | Honeywell International Inc. | Method and tool for post-mortem analysis of tripped field devices in process industry using optical character recognition and intelligent character recognition |
DE102016201373A1 (de) | 2016-01-29 | 2017-08-03 | Robert Bosch Gmbh | Method for recognizing objects, in particular three-dimensional objects |
EP3491626A1 (fr) * | 2016-07-26 | 2019-06-05 | Google LLC | Interactive geo-contextual navigation tool |
CN106484817B (zh) * | 2016-09-26 | 2020-06-26 | 广州致远电子有限公司 | Data search method and system |
US10346727B2 (en) * | 2016-10-28 | 2019-07-09 | Adobe Inc. | Utilizing a digital canvas to conduct a spatial-semantic search for digital visual media |
US10558857B2 (en) * | 2018-03-05 | 2020-02-11 | A9.Com, Inc. | Visual feedback of process state |
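CN109168069A (zh) * | 2018-09-03 | 2019-01-08 | 聚好看科技股份有限公司 | Method and device for displaying recognition results by region, and smart TV |
CN109189289B (zh) * | 2018-09-03 | 2021-12-24 | 聚好看科技股份有限公司 | Method and device for generating an icon based on a screenshot image |
TWI768232B (zh) * | 2019-08-07 | 2022-06-21 | 上銀科技股份有限公司 | Image determination system for a linear transmission device and image determination method thereof |
CN112417192B (zh) * | 2019-08-21 | 2024-08-30 | 上银科技股份有限公司 | Image determination system for a linear transmission device and image determination method thereof |
JP7379059B2 (ja) * | 2019-10-02 | 2023-11-14 | キヤノン株式会社 | Intermediate server device, information processing device, and communication method |
CN113297475B (zh) * | 2021-03-26 | 2024-10-22 | 淘宝(中国)软件有限公司 | Commodity object information search method and device, and electronic device |
CN114581360B (zh) * | 2021-04-01 | 2024-03-12 | 正泰集团研发中心(上海)有限公司 | Photovoltaic module label detection method, apparatus, device, and computer storage medium |
CN113901257B (zh) | 2021-10-28 | 2023-10-27 | 北京百度网讯科技有限公司 | Map information processing method, apparatus, device, and storage medium |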
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060253491A1 (en) * | 2005-05-09 | 2006-11-09 | Gokturk Salih B | System and method for enabling search and retrieval from image files based on recognized information |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09330336A (ja) * | 1996-06-11 | 1997-12-22 | Sony Corp | Information processing device |
US7016532B2 (en) * | 2000-11-06 | 2006-03-21 | Evryx Technologies | Image capture and identification system and process |
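JP2003150617A (ja) * | 2001-11-12 | 2003-05-23 | Olympus Optical Co Ltd | Image processing device and program |
JP2005165461A (ja) * | 2003-11-28 | 2005-06-23 | Nifty Corp | Information providing device and information providing program |
JP4413633B2 (ja) * | 2004-01-29 | 2010-02-10 | 株式会社ゼータ・ブリッジ | Information retrieval system, information retrieval method, information retrieval device, information retrieval program, image recognition device, image recognition method and image recognition program, and sales system |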
US7751805B2 (en) * | 2004-02-20 | 2010-07-06 | Google Inc. | Mobile image-based information retrieval system |
WO2006043319A1 (fr) * | 2004-10-20 | 2006-04-27 | Fujitsu Limited | Terminal and server |
US7809192B2 (en) * | 2005-05-09 | 2010-10-05 | Like.Com | System and method for recognizing objects from images and identifying relevancy amongst images and information |
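JP2007018166A (ja) * | 2005-07-06 | 2007-01-25 | Nec Corp | Information retrieval device, information retrieval system, information retrieval method, and information retrieval program |
JP2007018456A (ja) * | 2005-07-11 | 2007-01-25 | Nikon Corp | Information display device and information display method |
JP2007026316A (ja) * | 2005-07-20 | 2007-02-01 | Yamaha Motor Co Ltd | Image management device, computer program for image management, and recording medium on which the program is recorded |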
US8849821B2 (en) * | 2005-11-04 | 2014-09-30 | Nokia Corporation | Scalable visual search system simplifying access to network and device functionality |
US20080267504A1 (en) * | 2007-04-24 | 2008-10-30 | Nokia Corporation | Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search |
2010
- 2010-08-05 KR KR1020127006115A patent/KR101670956B1/ko active IP Right Grant
- 2010-08-05 JP JP2012523961A patent/JP2013501976A/ja active Pending
- 2010-08-05 EP EP10742686A patent/EP2462518A1/fr not_active Ceased
- 2010-08-05 AU AU2010279334A patent/AU2010279334A1/en not_active Abandoned
- 2010-08-05 BR BR112012002803A patent/BR112012002803A2/pt not_active IP Right Cessation
- 2010-08-05 CN CN2010800451970A patent/CN102667764A/zh active Pending
- 2010-08-05 WO PCT/US2010/044604 patent/WO2011017558A1/fr active Application Filing
- 2010-08-05 CA CA2770186A patent/CA2770186C/fr active Active
2014
- 2014-12-17 JP JP2014254890A patent/JP6025812B2/ja active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060253491A1 (en) * | 2005-05-09 | 2006-11-09 | Gokturk Salih B | System and method for enabling search and retrieval from image files based on recognized information |
Non-Patent Citations (1)
Title |
---|
Anagnostopoulos I. et al.: "Information fusion meta-search interface for precise photo acquisition on the web", Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on, June 16-19, 2003, Piscataway, NJ, USA, IEEE, 16 June 2003, pages 375-381, XP010654750, ISBN: 978-953-96769-6-2 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9355317B2 (en) | 2011-12-14 | 2016-05-31 | Nec Corporation | Video processing system, video processing method, video processing device for mobile terminal or server and control method and control program thereof |
CN102594896A (zh) * | 2012-02-23 | 2012-07-18 | 广州商景网络科技有限公司 | Electronic photo sharing method and system |
JP2014006680A (ja) * | 2012-06-25 | 2014-01-16 | Sony Corp | Video recording device, information processing system, information processing method, and recording medium |
WO2014035430A1 (fr) | 2012-08-31 | 2014-03-06 | Hewlett-Packard Development Company, L.P. | Active regions of an image with accessible links |
CN104583983A (zh) * | 2012-08-31 | 2015-04-29 | 惠普发展公司,有限责任合伙企业 | Active regions of an image with accessible links |
US20150242522A1 (en) * | 2012-08-31 | 2015-08-27 | Qian Lin | Active regions of an image with accessible links |
EP2891068A4 (fr) * | 2012-08-31 | 2016-01-20 | Hewlett Packard Development Co | Active regions of an image with accessible links |
US10210273B2 (en) | 2012-08-31 | 2019-02-19 | Hewlett-Packard Development Company, L.P. | Active regions of an image with accessible links |
US10255240B2 (en) | 2014-03-27 | 2019-04-09 | Yandex Europe Ag | Method and system for processing a voice-based user-input |
EP3188034A4 (fr) * | 2014-08-25 | 2017-07-05 | ZTE Corporation | Data processing method based on a display terminal |
CN104462423A (zh) * | 2014-12-15 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Search method and apparatus, and mobile terminal |
US10089412B2 (en) | 2015-03-30 | 2018-10-02 | Yandex Europe Ag | Method of and system for processing a search query |
Also Published As
Publication number | Publication date |
---|---|
BR112012002803A2 (pt) | 2019-09-24 |
KR101670956B1 (ko) | 2016-10-31 |
CN102667764A (zh) | 2012-09-12 |
CA2770186C (fr) | 2018-05-22 |
JP2013501976A (ja) | 2013-01-17 |
JP6025812B2 (ja) | 2016-11-16 |
AU2010279334A1 (en) | 2012-03-15 |
KR20120055627A (ko) | 2012-05-31 |
CA2770186A1 (fr) | 2011-02-10 |
EP2462518A1 (fr) | 2012-06-13 |
JP2015062141A (ja) | 2015-04-02 |
Similar Documents
Publication | Title |
---|---|
US20190012334A1 (en) | Architecture for Responding to Visual Query |
CA2770186C (fr) | User interface for presenting search results for multiple regions of a visual query |
US9087059B2 (en) | User interface for presenting search results for multiple regions of a visual query |
CA2781845C (fr) | Actionable search results for visual queries |
CA2770239C (fr) | Facial recognition with social network aiding |
US9183224B2 (en) | Identifying matching canonical documents in response to a visual query |
US20110128288A1 (en) | Region of Interest Selector for Visual Queries |
AU2016200659B2 (en) | Architecture for responding to a visual query |
AU2017200336A1 (en) | Facial recognition with social network aiding |
AU2013245488A1 (en) | Facial recognition with social network aiding |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 201080045197.0; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10742686; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2770186; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2012523961; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2010279334; Country of ref document: AU |
WWE | WIPO information: entry into national phase | Ref document number: 2010742686; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 20127006115; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2010279334; Country of ref document: AU; Date of ref document: 2010-08-05; Kind code of ref document: A |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112012002803; Country of ref document: BR |
ENP | Entry into the national phase | Ref document number: 112012002803; Country of ref document: BR; Kind code of ref document: A2; Effective date: 2012-02-07 |