KR102032038B1 - Recognize items depicted by images - Google Patents

Recognize items depicted by images

Info

Publication number
KR102032038B1
Authority
KR
South Korea
Prior art keywords
set
candidate
image
match
matches
Prior art date
Application number
KR1020177023364A
Other languages
Korean (ko)
Other versions
KR20170107039A (en)
Inventor
케빈 쉬
Wei Di
Vignesh Jagadeesh
Robinson Piramuthu
Original Assignee
eBay Inc.
Priority date
Filing date
Publication date
Priority to US 62/107,095 (US201562107095P)
Priority to US 14/973,582 (published as US20160217157A1)
Application filed by eBay Inc.
Priority to PCT/US2016/012691 (published as WO2016118339A1)
Publication of KR20170107039A
Application granted
Publication of KR102032038B1


Classifications

    • G: Physics
    • G06: Computing; Calculating; Counting
    • G06F: Electric digital data processing
    • G06F 16/5846: Information retrieval of still image data, characterised by metadata automatically derived from the content, using extracted text
    • G06F 16/5838: Information retrieval of still image data, characterised by metadata automatically derived from the content, using colour
    • G06F 16/5854: Information retrieval of still image data, characterised by metadata automatically derived from the content, using shape and object relationship
    • G06F 16/9535: Retrieval from the web; search customisation based on user profiles and personalisation

Abstract

Products (e.g., books) contain a significant amount of informative textual information that can be used to identify the item. The input query image is a picture of the product (e.g., a picture taken using a mobile phone). The picture may be taken at any angle and orientation and may include an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database can be a product database with product names, product images, product prices, product sales histories, or any suitable combination thereof. The retrieval is performed both by matching the query image against images in the database and by matching text extracted from the query image against text in the database.

Description

Recognize items depicted by images

This application claims priority to U.S. Provisional Application No. 62/107,095, filed January 23, 2015, entitled "Efficient Media Retrieval," and to U.S. Patent Application No. 14/973,582, filed December 17, 2015, entitled "Recognition of Items Depicted in Images," each of which is incorporated herein by reference in its entirety.

The subject matter disclosed herein relates generally to a computer system for identifying items depicted in images. In particular, the present disclosure addresses systems and methods relating to the efficient retrieval of data for items from a media database.

An item recognition engine may have a high success rate in recognizing items depicted in images when the query image is cooperative. Cooperative images are taken with proper lighting, the item faces the camera directly and is properly aligned, and the image does not depict objects other than the item. An item recognition engine may be unable to recognize an item depicted in an uncooperative image. Background art associated with the present disclosure includes, for example, US Patent Publication No. 2006/0147127.

Some embodiments are shown by way of example and not by way of limitation in the figures of the accompanying drawings.
FIG. 1 is a network diagram illustrating a network environment suitable for identifying items depicted in images, in accordance with some example embodiments.
FIG. 2 is a block diagram illustrating components of an identification server suitable for identifying an item depicted in an image, in accordance with some example embodiments.
FIG. 3 is a block diagram illustrating components of a device suitable for capturing an image of an item and communicating with a server configured to identify the item depicted in the image, in accordance with some example embodiments.
FIG. 4 illustrates reference images and uncooperative images of items, in accordance with some example embodiments.
FIG. 5 illustrates text extraction operations for identifying an item depicted in an image, in accordance with some example embodiments.
FIG. 6 illustrates an input image depicting an item and sets of proposed matches for the item, in accordance with some example embodiments.
FIG. 7 is a flowchart illustrating operations of a server performing a process of identifying an item in an image, in accordance with some example embodiments.
FIG. 8 is a flowchart illustrating operations of a server performing a process of automatically generating a sale listing for an item depicted in an image, in accordance with some example embodiments.
FIG. 9 is a flowchart illustrating operations of a server performing a process of providing results based on an item depicted in an image, in accordance with some example embodiments.
FIG. 10 is a block diagram illustrating an example of a software architecture that may be installed on a machine, in accordance with some example embodiments.
FIG. 11 shows a schematic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, in accordance with some example embodiments.

Example methods and systems relate to the identification of items depicted in images. The examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

Products (e.g., books or compact discs (CDs)) contain a significant amount of informative textual information that can be used to identify an item primarily from an image depicting the item. Portions of a product containing such textual information include the front cover, back cover, and spine of a book, and the front and back of a CD, digital video disc (DVD), or Blu-ray™ disc. Other parts of products that contain informative textual information are covers, packaging, and user manuals. Traditional optical character recognition (OCR) can be used when the text on the item is aligned with the edges of the image and the image quality is high. Cooperative images are taken with proper lighting, the item faces the camera directly and is properly aligned, and the image does not depict objects other than the item. Images lacking one or more of these features are referred to as "uncooperative." As an example, images taken in low light are uncooperative. As another example, an image that includes an occlusion blocking one or more portions of the depicted item is also uncooperative. Traditional OCR can fail when processing uncooperative images. Thus, the use of OCR at the word level can provide some information about potential matches, which can be supplemented by the use of direct image classification (e.g., using deep convolutional neural networks (CNNs)).

In some example embodiments, a picture (e.g., a picture taken using a mobile phone) is the input query image. The picture may be taken at any angle and orientation and may include an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database can be a product database with product names, product images, product prices, product sales histories, or any suitable combination thereof. The retrieval is performed both by matching the query image against images in the database and by matching text extracted from the query image against text in the database.

FIG. 1 is a network diagram illustrating a network environment 100 suitable for identifying items depicted in images, in accordance with some example embodiments. The network environment 100 includes e-commerce servers 120 and 140, an identification server 130, and devices 150A, 150B, and 150C, all communicatively coupled to one another via a network 170. The devices 150A, 150B, and 150C may be collectively referred to as "devices 150" or generically referred to as a "device 150." The e-commerce servers 120 and 140 and the identification server 130 may be part of a network-based system 110. Alternatively, the device 150 may connect to the identification server 130 directly or via a local network distinct from the network 170 used to connect to the e-commerce server 120 or 140. As described below with respect to FIGS. 10 and 11, the e-commerce servers 120 and 140, the identification server 130, and the devices 150 may each be implemented in whole or in part in a computer system.

The e-commerce servers 120 and 140 provide e-commerce applications to other machines (e.g., the devices 150) via the network 170. The e-commerce servers 120 and 140 may also be connected directly to, or integrated with, the identification server 130. In some example embodiments, one e-commerce server 120 and the identification server 130 are part of the network-based system 110, while another e-commerce server (e.g., the e-commerce server 140) is separate from the network-based system 110. The e-commerce applications may provide a way for users to buy and sell items directly from and to each other, to buy from and sell to the e-commerce application provider, or both.

A user 160 is also shown in FIG. 1. The user 160 may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 150 and the identification server 130), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human). The user 160 is not part of the network environment 100, but is associated with the device 150 and may be a user of the device 150. For example, the device 150 may be a sensor, a desktop computer, a vehicle computer, a tablet computer, a navigation device, a portable media device, or a smartphone belonging to the user 160.

In some example embodiments, the identification server 130 receives data regarding an item of interest to a user. For example, a camera attached to the device 150A can take an image of an item the user 160 wishes to sell and transmit the image over the network 170 to the identification server 130. The identification server 130 identifies the item based on the image. Information about the identified item can be sent to the e-commerce server 120 or 140, the device 150A, or any combination thereof. The information can be used by the e-commerce server 120 or 140 to aid in generating a listing of the item for sale. Similarly, the image may be of an item of interest to the user 160, and the information can be used by the e-commerce server 120 or 140 to aid in selecting listings of items to display to the user 160.

Any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software to be a special-purpose computer that performs the functions described herein for that machine, database, or device. For example, a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIGS. 10 and 11. As used herein, a "database" is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.

Network 170 may be any network that enables communication between or among machines, databases, and devices (eg, identification server 130 and device 150). Thus, network 170 may be a wired network, a wireless network (eg, a mobile or cellular network), or any suitable combination thereof. Network 170 may include one or more portions consisting of a private network, a public network (eg, the Internet), or any suitable combination thereof.

FIG. 2 is a block diagram illustrating components of the identification server 130, in accordance with some example embodiments. The identification server 130 is shown as including a communication module 210, a text identification module 220, an image identification module 230, a ranking module 240, a user interface (UI) module 250, a listing module 260, and a storage module 270, all configured to communicate with one another (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The communication module 210 is configured to send and receive data. For example, the communication module 210 may receive image data over the network 170 and send the received data to the text identification module 220 and the image identification module 230. As another example, the ranking module 240 may determine a best match for a depicted item, and an identifier for the item may be sent by the communication module 210 to the e-commerce server 120 via the network 170. The image data may be a two-dimensional image, a frame from a continuous video stream, a three-dimensional image, a depth image, an infrared image, a binocular image, or any suitable combination thereof.

The text identification module 220 is configured to generate a set of proposed matches for an item depicted in an input image based on text extracted from the input image. For example, the text extracted from the input image can be matched against text in a database, and the top n (e.g., top 5) matches reported as the proposed matches for the item.

The image identification module 230 is configured to generate a set of proposed matches for the item depicted in the input image using image matching techniques. For example, a CNN trained to distinguish between different media items can be used to report the probability of a match between the depicted item and one or more media items. For the purposes of this CNN, a media item is an item of media that can be depicted. For example, books, CDs, and DVDs are all media items. Purely electronic media, such as MP4 audio files, are also "media items" in this sense if they are associated with an image. For example, an electronic download version of a CD may be associated with the cover image of the CD, modified to include a marker indicating that the version is an electronic download. Accordingly, the trained CNN of the image identification module 230 may determine the probability of a particular image matching the downloadable version of the CD separately from the probability of the particular image matching the physical version of the CD.
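
As a rough illustration of how such a CNN could be applied at query time, the Python sketch below assumes a classifier has already been trained with one output class per catalog media item. The model checkpoint, catalog identifiers, and file name are hypothetical, and the ResNet backbone is only a stand-in for whatever CNN the module actually uses.

# A sketch only: the classifier, checkpoint file, and catalog identifiers below are
# illustrative assumptions, not details taken from the patent.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_candidate_scores(image_path, model, catalog_ids, top_n=5):
    """Return the top-n catalog items and match probabilities for a query image."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=1)[0]     # one probability per catalog class
    top = torch.topk(probs, k=min(top_n, len(catalog_ids)))
    return [(catalog_ids[int(i)], float(p)) for p, i in zip(top.values, top.indices)]

# Hypothetical usage, with one class per catalog item and a model trained elsewhere:
# catalog_ids = ["item-001", "item-002", ...]
# model = models.resnet50(num_classes=len(catalog_ids))
# model.load_state_dict(torch.load("media_item_classifier.pt"))
# model.eval()
# print(cnn_candidate_scores("query.jpg", model, catalog_ids))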

The ranking module 240 is configured to merge the set of proposed matches for the item generated by the text identification module 220 with the set of proposed matches for the item generated by the image identification module 230, and to rank the merged set. For example, the text identification module 220 and the image identification module 230 may each provide a score for each proposed match, and the ranking module 240 may combine the scores using weighting factors. The ranking module 240 may report the top-ranked proposed match as the identified item depicted in the image. The weights used by the ranking module 240 may be determined using an ordinal regression support vector machine (OR-SVM).

The user interface module 250 is configured to cause a user interface to be presented on one or more of the user devices 150A-150C. For example, the user interface module 250 may be implemented by a web server that provides hypertext markup language (HTML) files to a user device 150 via the network 170. The user interface may present the image received by the communication module 210, data retrieved from the storage module 270 regarding the item identified in the image by the ranking module 240, item listings generated or selected by the listing module 260, or any suitable combination thereof.

The listing module 260 is configured to generate an item listing for the item identified using the ranking module 240. For example, after a user uploads an image depicting an item and the item is successfully identified, the listing module 260 may create an item listing including an item image from an item catalog, an item title from the item catalog, a description from the item catalog, or any suitable combination thereof. The user may be prompted to confirm or modify the generated listing, or the generated listing may be published automatically in response to the identification of the depicted item. The listing may be sent to the e-commerce server 120 or 140 via the communication module 210. In some example embodiments, the listing module 260 is implemented in the e-commerce server 120 or 140, and the listing is generated in response to an identifier for the item sent from the identification server 130 to the e-commerce server 120 or 140.

The storage module 270 is configured to store and retrieve data generated and used by the text identification module 220, the image identification module 230, the ranking module 240, the user interface module 250, and the listing module 260. For example, the classifier used by the image identification module 230 can be stored by the storage module 270. Information regarding the identification of an item depicted in an image, generated by the ranking module 240, can also be stored by the storage module 270. The e-commerce server 120 or 140 can request the identification of an item in an image (e.g., by providing the image, an image identifier, or both), and the identification can be retrieved from storage by the storage module 270 and transmitted over the network 170 using the communication module 210.

FIG. 3 is a block diagram illustrating components of the device 150, in accordance with some example embodiments. The device 150 is shown as including an input module 310, a camera module 320, and a communication module 330, all configured to communicate with one another (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The input module 310 is configured to receive input from a user via a user interface. For example, the user may enter a username and password into the input module, configure a camera, select an image to use as the basis for a listing or an item search, or any suitable combination thereof.

Camera module 320 is configured to capture image data. For example, an image may be received from a camera, a depth image may be received from an infrared camera, and a pair of images may be received from a binocular camera.

The communication module 330 is configured to communicate data received by the input module 310 or the camera module 320 to the identification server 130, the e-commerce server 120, or the e-commerce server 140. For example, the input module 310 may receive a selection of an image taken with the camera module 320 and an indication that the image depicts an item the user (e.g., the user 160) wishes to sell. The communication module 330 may transmit the image and the indication to the e-commerce server 120. The e-commerce server 120 may send the image to the identification server 130 to request identification of the item depicted in the image, generate a listing template based on the category, and cause the listing template to be presented to the user via the communication module 330 and the input module 310.

FIG. 4 illustrates reference and uncooperative images of items, in accordance with some example embodiments. The first entry in each of the groups 410, 420, and 430 is a catalog image. The items depicted in the catalog images are well lit, face the camera directly, and are properly oriented. The remaining images of each group are images taken by users, with the items in various orientations and facing various directions. In addition, the non-catalog images depict background clutter.

FIG. 5 illustrates text extraction operations for identifying an item depicted in an image, in accordance with some example embodiments. Each row of FIG. 5 shows example operations performed on an input image. Components 510A and 510B show the input images for each row. Components 520A and 520B show the results of candidate extraction and orientation. That is, given the query image, text blocks are identified and oriented using a heuristic based on the Radon transform. Roughly collinear characters are identified as lines and passed through an OCR engine (e.g., Tesseract OCR) to obtain text output. As an example, components 530A and 530B show a subset of the obtained text output.

FIG. 6 illustrates an input image depicting a media item and sets of proposed matches for the item, in accordance with some example embodiments. Image 610 is the input image. Image 610 is oriented such that the text on the depicted media item is aligned with the image, but the media item is at an angle to the camera. The media item also reflects a light source, which obscures some of the text depicted in the image. The set of proposed matches 620 depicts the top five matches reported by the text identification module 220. The set of proposed matches 630 depicts the top five matches reported by the image identification module 230. The set of proposed matches 640 depicts the top five matches reported by the ranking module 240. In this case, the first entry in the set of proposed matches 640 is correctly reported by the identification server 130 as the match for the input image 610.

FIG. 7 is a flowchart illustrating operations of the identification server 130 in performing a process 700 of identifying an item in an image, in accordance with some example embodiments. The process 700 includes operations 710, 720, 730, 740, and 750. By way of example only and not limitation, the operations 710-750 are described as being performed by the modules 210-270.

In operation 710, the image identification module 230 accesses an image. For example, the image may have been captured by the device 150, transmitted to the identification server 130 via the network 170, received by the communication module 210 of the identification server 130, and passed by the communication module 210 to the image identification module 230. The image identification module 230 determines a score for each of a first set of candidate matches for the image in a database (operation 720). For example, a vector of locally aggregated descriptors (VLAD) can be used to identify and rank candidate matches in the database. In some example embodiments, the VLAD vocabulary is constructed by densely extracting speeded-up robust features (SURF) from a training set and clustering the descriptors using k-means with k = 256. In some example embodiments, the similarity metric is based on the L2 (Euclidean) distance between normalized VLAD descriptors.
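
A minimal Python sketch of this VLAD-based scoring is shown below. The patent describes densely extracted SURF features and k-means with k = 256; the sketch substitutes OpenCV SIFT (SURF is only available in opencv-contrib builds) and a much smaller k so it runs quickly, and all file names and the catalog dictionary are hypothetical.

# A sketch only: SIFT stands in for SURF, k is reduced from 256, and file names are hypothetical.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def local_descriptors(image_path, detector):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = detector.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, detector.descriptorSize()), np.float32)

def vlad_encode(descriptors, kmeans):
    """Sum residuals to the nearest cluster centre per cluster, then L2-normalize."""
    k, d = kmeans.cluster_centers_.shape
    vlad = np.zeros((k, d), dtype=np.float32)
    if len(descriptors):
        assignments = kmeans.predict(descriptors)
        for c in range(k):
            members = descriptors[assignments == c]
            if len(members):
                vlad[c] = (members - kmeans.cluster_centers_[c]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

detector = cv2.SIFT_create()

# Build the vocabulary from training images (the patent uses k = 256; smaller here).
training_desc = np.vstack([local_descriptors(p, detector) for p in ["train1.jpg", "train2.jpg"]])
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(training_desc)

# Rank catalog candidates by Euclidean distance between normalized VLAD descriptors.
query_vlad = vlad_encode(local_descriptors("query.jpg", detector), kmeans)
catalog = {"item-001": "catalog1.jpg", "item-002": "catalog2.jpg"}
distances = {item: np.linalg.norm(query_vlad - vlad_encode(local_descriptors(path, detector), kmeans))
             for item, path in catalog.items()}
print(sorted(distances.items(), key=lambda kv: kv[1]))   # smallest distance first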

In operation 730, the text identification module 220 accesses the image and extracts text from it. The text identification module 220 determines a score for each of a second set of candidate matches, based on the extracted text and text in the database. For example, a bag-of-words (BoW) algorithm can be used to identify and rank candidate matches in the database. Text may be extracted from the image in an orientation-agnostic manner. The extracted text is re-oriented to horizontal alignment through projection analysis: a Radon transform is computed, and the angle with the most sharply peaked projection is selected as the line angle. Individual lines of text are extracted using clustering of character centroids. Maximally stable extremal regions (MSERs) are identified as potential characters within each cluster. Character candidates are grouped into lines by merging regions of similar height when they are adjacent or when their baselines have nearby y-values. Unrealistic line candidates are excluded when the aspect ratio exceeds a threshold (e.g., when the length of the line exceeds 15 times its height).

The text of each identified line is passed through an OCR engine for text extraction. To account for the possibility that an extracted line of text is upside down, the text of each identified line is also rotated 180 degrees, and the rotated lines are passed through the OCR engine as well.
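
The following Python sketch illustrates one possible implementation of this line-extraction and two-orientation OCR step. It assumes the image has already been rotated so the text is roughly horizontal (e.g., by the projection analysis described above), uses OpenCV MSER regions as character candidates, and relies on pytesseract as the OCR engine; the grouping thresholds, file name, and helper function are illustrative choices rather than the patent's exact parameters.

# A sketch only: thresholds and the input file are illustrative, and pytesseract
# requires a local Tesseract installation.
import cv2
import pytesseract

def extract_line_text(gray):
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)                    # character candidates
    boxes = [tuple(map(int, b)) for b in bboxes]            # (x, y, w, h)

    # Group candidates of similar height with nearby baselines (y + h) into lines.
    lines, used = [], [False] * len(boxes)
    for i, (x, y, w, h) in enumerate(boxes):
        if used[i]:
            continue
        group, used[i] = [i], True
        for j, (x2, y2, w2, h2) in enumerate(boxes):
            if not used[j] and abs(h2 - h) < 0.3 * h and abs((y2 + h2) - (y + h)) < 0.5 * h:
                group.append(j)
                used[j] = True
        x0 = min(boxes[k][0] for k in group); y0 = min(boxes[k][1] for k in group)
        x1 = max(boxes[k][0] + boxes[k][2] for k in group); y1 = max(boxes[k][1] + boxes[k][3] for k in group)
        if y1 - y0 > 2 and x1 - x0 <= 15 * (y1 - y0):       # aspect-ratio filter
            lines.append((x0, y0, x1, y1))

    texts = []
    for (x0, y0, x1, y1) in lines:
        crop = gray[y0:y1, x0:x1]
        # OCR each line in both orientations, since an extracted line may be upside down.
        for candidate in (crop, cv2.rotate(crop, cv2.ROTATE_180)):
            text = pytesseract.image_to_string(candidate, config="--psm 7").strip()
            if text:
                texts.append(text)
    return texts

# Hypothetical usage, assuming the text is already roughly horizontal:
# print(extract_line_text(cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)))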

In operation 740, character n-grams are used for text matching. Non-alphabetic characters are discarded, and a sliding window of size N is run over each word of sufficient length. As an example with N = 3, the phrase "I like turtles" would be broken down into "lik", "ike", "tur", "urt", "rtl", "tle", and "les". In some example embodiments, case is ignored by converting all characters to lowercase.
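
A minimal Python sketch of this n-gram extraction, with N = 3 and lowercasing as described, is shown below.

# A sketch of the described character n-gram extraction.
import re

def char_ngrams(text, n=3):
    words = re.findall(r"[a-z]+", text.lower())     # lowercase, drop non-alphabetic characters
    return [w[i:i + n] for w in words if len(w) >= n for i in range(len(w) - n + 1)]

print(char_ngrams("I like turtles"))
# ['lik', 'ike', 'tur', 'urt', 'rtl', 'tle', 'les']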

The unnormalized histogram of n-grams for each document is referred to as f. In some example embodiments, the following scheme is used to calculate a normalized similarity score between a query and a document.

In this scheme, N1 and N2 denote functions that compute the L1 and L2 normalizations, respectively, and the gamma vector is a vector of inverse document frequency (idf) weights. For each unique n-gram g, the corresponding idf weight is the natural logarithm of the number of documents in the database divided by the number of documents containing the n-gram g. The final normalization applied is the L2 normalization.
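
The Python sketch below shows one plausible reading of this scheme: L1-normalize each document's n-gram histogram, weight it by the idf vector, apply a final L2 normalization, and score a query against a document by the dot product of the resulting vectors. It is a sketch consistent with the description above rather than a reproduction of the exact formula, and the example documents are made up.

# A sketch only: the exact formula in the original is not reproduced; the example text is made up.
import math
import re
from collections import Counter

def char_ngrams(text, n=3):                          # as in the previous sketch
    words = re.findall(r"[a-z]+", text.lower())
    return [w[i:i + n] for w in words if len(w) >= n for i in range(len(w) - n + 1)]

def idf_weights(documents_ngrams):
    n_docs = len(documents_ngrams)
    doc_freq = Counter(g for doc in documents_ngrams for g in set(doc))
    return {g: math.log(n_docs / df) for g, df in doc_freq.items()}

def ngram_vector(ngrams, idf):
    f = Counter(ngrams)                                                   # unnormalized histogram
    total = sum(f.values()) or 1
    weighted = {g: (c / total) * idf.get(g, 0.0) for g, c in f.items()}   # L1-normalize, then idf
    norm = math.sqrt(sum(v * v for v in weighted.values())) or 1.0        # final L2 normalization
    return {g: v / norm for g, v in weighted.items()}

def similarity(query_ngrams, doc_ngrams, idf):
    q, d = ngram_vector(query_ngrams, idf), ngram_vector(doc_ngrams, idf)
    return sum(q[g] * d.get(g, 0.0) for g in q)

docs = [char_ngrams("The Last Mogul"), char_ngrams("The Last Samurai")]   # catalog text
idf = idf_weights(docs)
print([similarity(char_ngrams("last mogul"), d, idf) for d in docs])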

In operation 750, the ranking module 240 identifies a likely match for the image based on the first set of scores and the second set of scores. For example, corresponding scores can be summed, weighted, or otherwise combined, and the candidate match with the best resulting score is identified as the likely match.

Operation 750 combines a set of similarity measures into a unified ranking, where each measure denotes a similarity derived from one feature type. The goal is to compute weights for the terms such that a correct query/reference match always receives a higher combined similarity than an incorrect one. Accordingly, an optimization can be carried out during a training process to learn the optimal weight vector w.

During operation 750, the individual similarity values (e.g., one for the OCR match and one for the VLAD match) are combined into a vector, and the integrated score is obtained by multiplying that vector by the weight vector w. In some example embodiments, the item with the best integrated score for the query image is taken as the matching item. In some example embodiments, when no item has an integrated score above a threshold, no item is found to match. In some example embodiments, the set of items with integrated scores above a threshold, the set of K items with the best integrated scores, or a suitable combination thereof, is selected for additional image matching using geometric features, as described below.
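
As an illustration of learning and applying such a weight vector, the Python sketch below uses the standard pairwise-difference trick with a linear SVM, which is one common way to realize a ranking SVM. It is a stand-in for the OR-SVM training mentioned above rather than the patent's exact procedure, and all scores, thresholds, and item identifiers are made up.

# A sketch only: training pairs, threshold, and catalog items are made up, and the
# pairwise linear SVM substitutes for the OR-SVM mentioned in the text.
import numpy as np
from sklearn.svm import LinearSVC

# Each row holds the similarity vector [S_ocr, S_vlad] for one query/reference pair.
correct   = np.array([[0.9, 0.7], [0.8, 0.6], [0.7, 0.9]])
incorrect = np.array([[0.6, 0.5], [0.7, 0.2], [0.3, 0.4]])

# Pairwise differences: correct minus incorrect should score positive, and vice versa.
X = np.vstack([correct - incorrect, incorrect - correct])
y = np.hstack([np.ones(len(correct)), -np.ones(len(incorrect))])
w = LinearSVC(fit_intercept=False, C=1.0).fit(X, y).coef_.ravel()

def integrated_score(s_ocr, s_vlad):
    return float(np.dot(w, [s_ocr, s_vlad]))

# Rank candidates, then keep those above a threshold (or the top K) for geometric checks.
candidates = {"item-001": (0.85, 0.75), "item-002": (0.40, 0.55)}
ranked = sorted(candidates.items(), key=lambda kv: integrated_score(*kv[1]), reverse=True)
threshold = 0.5
shortlist = [item for item, scores in ranked if integrated_score(*scores) >= threshold]
print(ranked, shortlist)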

The potential matches and the query image are scaled to a standard size (e.g., 256 × 256 pixels). Histogram of oriented gradients (HOG) values are determined for each scaled image using eight orientations, 8 × 8 pixels per cell, and 2 × 2 cells per block. For each potential match, a linear transformation matrix is identified that minimizes the error between the transformed query and the potentially matching image. The minimized errors are compared, and the potential match with the smallest minimized error is reported as the match.

One way to identify a linear transformation matrix that minimizes the error is to randomly generate a number (e.g., 100) of such transformation matrices and determine the error for each. If the minimum error is below a threshold, the corresponding matrix is used. If not, a new set of random transformation matrices is generated and evaluated. After a predetermined number of iterations, the matrix corresponding to the smallest identified error is used, and the method ends.
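
The Python sketch below illustrates this geometric-verification step under the stated parameters (256 x 256 scaling; HOG with eight orientations, 8 x 8 pixels per cell, and 2 x 2 cells per block) and the random-search strategy just described. The perturbation ranges, error threshold, and iteration counts are illustrative assumptions.

# A sketch only: the perturbation ranges, threshold, and iteration counts are assumptions.
import cv2
import numpy as np
from skimage.feature import hog

def hog_features(gray):
    return hog(gray, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def random_affine(rng, size=256):
    angle = rng.uniform(-15, 15)                      # small random rotation (degrees)
    scale = rng.uniform(0.9, 1.1)
    m = cv2.getRotationMatrix2D((size / 2, size / 2), angle, scale)
    m[:, 2] += rng.uniform(-10, 10, size=2)           # small random translation
    return m

def min_hog_error(query_gray, candidate_gray, n_matrices=100, n_rounds=3, threshold=5.0, seed=0):
    rng = np.random.default_rng(seed)
    query = cv2.resize(query_gray, (256, 256))
    candidate_feat = hog_features(cv2.resize(candidate_gray, (256, 256)))
    best = np.inf
    for _ in range(n_rounds):                         # regenerate matrices if none is good enough
        for _ in range(n_matrices):
            warped = cv2.warpAffine(query, random_affine(rng), (256, 256))
            best = min(best, np.linalg.norm(hog_features(warped) - candidate_feat))
        if best < threshold:
            break
    return best

# The potential match with the smallest minimized error would be reported as the match:
# errors = {item: min_hog_error(query_image, catalog_images[item]) for item in shortlist}
# print(min(errors, key=errors.get))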

FIG. 8 is a flowchart illustrating operations of a server performing a process 800 of automatically generating a sale listing for an item depicted in an image, in accordance with some example embodiments. The process 800 includes operations 810, 820, and 830. By way of example only and not limitation, the operations 810-830 are described as being performed by the identification server 130 and the e-commerce server 120.

In operation 810, the e-commerce server 120 receives an image. For example, the user 160 can take an image using the device 150 and upload it to the e-commerce server 120. In operation 820, the identification server 130 uses the process 700 to identify the item depicted in the image. For example, the e-commerce server 120 may transfer the image to the identification server 130 for identification. In some example embodiments, the e-commerce server 120 and the identification server 130 are integrated, and the e-commerce server 120 identifies the item in the image.

In operation 830, the e-commerce server 120 generates a listing describing the item as being for sale by the user 160. For example, if the user uploads a picture of a book titled "The Last Mogul," a listing may be created for "The Last Mogul." In some example embodiments, the generated listing includes a catalog image of the item, the item title, and a description of the item, all loaded from a product database. A user interface presented to the user may be used to select additional or default listing options (e.g., a price or starting price, a sale format (auction or fixed price), or shipping options).

FIG. 9 is a flowchart illustrating operations of a server performing a process 900 of providing results based on an item depicted in an image, in accordance with some example embodiments. The process 900 includes operations 910, 920, and 930. By way of example only and not limitation, the operations 910-930 are described as being performed by the identification server 130 and the e-commerce server 120.

In operation 910, the e-commerce server 120 or a search engine server receives an image. For example, the user 160 can take an image using the device 150 and upload it to the e-commerce server 120 or the search engine server. In operation 920, the identification server 130 uses the process 700 to identify the item depicted in the image. For example, the e-commerce server 120 may transfer the image to the identification server 130 for identification. In some example embodiments, the e-commerce server 120 and the identification server 130 are integrated, and the e-commerce server 120 identifies the item depicted in the image. Similarly, a search engine server (e.g., a server that locates documents, web pages, images, videos, or other files) may receive the image and use the identification server 130 to identify the media item depicted in the image.

In operation 930, the e-commerce server 120 or the search engine server provides the user with information regarding one or more items in response to receiving the image. The items are selected based on the identified item depicted in the image. For example, if the user uploads a picture of a book titled "The Last Mogul," sale listings for "The Last Mogul" listed through the e-commerce server 120 or 140 may be identified and provided to the user who supplied the image (e.g., sent over the network 170 to the device 150A for display to the user 160). As another example, if the user uploads a picture of "The Last Mogul" to a general search engine, web pages mentioning "The Last Mogul" can be identified, stores offering "The Last Mogul" for sale can be identified, video reviews of "The Last Mogul" can be identified, and one or more of these can be provided to the user (e.g., in a web page for display in a web browser of the user's device).

In accordance with various example embodiments, one or more of the methodologies described herein may facilitate identifying an item (e.g., a media item) depicted in an image. Moreover, one or more of the methodologies described herein may facilitate identifying items depicted in images more accurately than image classification alone or text classification alone. Furthermore, one or more of the methodologies described herein may facilitate identifying items depicted in images more quickly and with less computing power than previous methods.

When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in identifying items depicted in images. The effort expended by a user in ordering an item of interest may also be reduced by one or more of the methodologies described herein. For example, accurately identifying an item of interest to the user from an image can reduce the amount of time or effort the user spends in creating an item listing or in finding an item to purchase. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.

Software architecture

FIG. 10 is a block diagram 1000 illustrating an architecture of software 1002 that may be installed on any one or more of the devices described above. FIG. 10 is merely a non-limiting example of a software architecture, and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software 1002 may be implemented by hardware such as the machine 1100 of FIG. 11, which includes processors 1110, memory 1130, and input/output (I/O) components 1150. In this example architecture, the software 1002 can be conceptualized as a stack of layers, where each layer provides a particular functionality. For example, the software 1002 includes layers such as an operating system 1004, libraries 1006, frameworks 1008, and applications 1010. Operationally, according to some implementations, the applications 1010 invoke application programming interface (API) calls 1012 through the software stack and receive messages 1014 in response to the API calls 1012.

In various implementations, the operating system 1004 manages hardware resources and provides common services. The operating system 1004 includes, for example, a kernel 1020, services 1022, and drivers 1024. In some implementations, the kernel 1020 acts as an abstraction layer between the hardware and the other software layers. For example, the kernel 1020 provides memory management, processor management (e.g., scheduling), component management, networking, security settings, and other functionality. The services 1022 may provide other common services for the other software layers. The drivers 1024 are responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1024 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.

In some example implementations, the libraries 1006 provide a low-level common infrastructure that can be utilized by the applications 1010. The libraries 1006 may include system libraries 1030 (e.g., a C standard library) that can provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1006 may include API libraries 1032 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), and Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render two-dimensional (2D) and three-dimensional (3D) graphic content on a display), database libraries (e.g., SQLite to provide various relational database functions), web libraries (e.g., WebKit to provide web browsing functionality), and the like. The libraries 1006 can also include a wide variety of other libraries 1034 to provide many other APIs to the applications 1010.

According to some implementations, framework 1008 provides a high level common infrastructure that can be utilized by application 1010. For example, framework 1008 provides various graphical user interface (GUI) functions, high level resource management, high level location services, and the like. The framework 1008 may provide a broad spectrum of other APIs that may be utilized by the application 1010, some of which may be specific to a particular operating system or platform.

In an example embodiment, the applications 1010 include a home application 1050, a contacts application 1052, a browser application 1054, a book reader application 1056, a location application 1058, a media application 1060, a messaging application 1062, a game application 1064, and a broad assortment of other applications such as a third-party application 1066. According to some embodiments, the applications 1010 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 1010, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 1066 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, the third-party application 1066 can invoke the API calls 1012 provided by the mobile operating system 1004 to facilitate the functionality described herein.

Example Machine Architecture and Machine-readable Media

FIG. 11 is a block diagram illustrating components of a machine 1100, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 11 shows a schematic representation of the machine 1100 in the example form of a computer system, within which instructions 1116 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1100 to perform any one or more of the methodologies discussed herein may be executed. In alternative embodiments, the machine 1100 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1100 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1100 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smartphone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), another smart device, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1116, sequentially or otherwise, that specify actions to be taken by the machine 1100. Furthermore, while only a single machine 1100 is illustrated, the term "machine" shall also be taken to include a collection of machines 1100 that individually or jointly execute the instructions 1116 to perform any one or more of the methodologies discussed herein. In practice, particular embodiments of the machine 1100 may be better suited to the methodologies described herein. For example, any computing device with sufficient processing power may serve as the identification server 130, while an accelerometer, a camera, and cellular network connectivity are not directly related to the identification server 130 performing the image identification methods discussed herein. Accordingly, in some example embodiments, cost savings are realized by implementing the various described methodologies on machines 1100 that exclude additional functionality unnecessary to the performance of the tasks assigned to each machine 1100 (e.g., by implementing the identification server 130 in a server machine without the integrated sensors commonly found only on wearable or portable devices and without a directly connected display).

The machine 1100 may include processors 1110, memory 1130, and I/O components 1150, which can be configured to communicate with each other via a bus 1102. In an example embodiment, the processors 1110 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1112 and a processor 1114 that can execute the instructions 1116. The term "processor" is intended to include a multi-core processor that may comprise two or more independent processors (also referred to as "cores") that can execute instructions contemporaneously. Although FIG. 11 shows multiple processors, the machine 1100 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

The memory 1130 may include a main memory 1132, a static memory 1134, and a storage unit 1136, each accessible to the processors 1110 via the bus 1102. The storage unit 1136 may include a machine-readable medium 1138 on which are stored the instructions 1116 embodying any one or more of the methodologies or functions described herein. The instructions 1116 may also reside, completely or at least partially, within the main memory 1132, within the static memory 1134, within at least one of the processors 1110 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1100. Accordingly, in various implementations, the main memory 1132, the static memory 1134, and the processors 1110 are considered machine-readable media 1138.

As used herein, the term "memory" refers to a machine-readable medium 1138 able to store data temporarily or permanently, and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1138 is shown in an example embodiment to be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1116. The term "machine-readable medium" shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., the instructions 1116) for execution by a machine (e.g., the machine 1100), such that the instructions, when executed by one or more processors of the machine (e.g., the processors 1110), cause the machine 1100 to perform any one or more of the methodologies discussed herein. Accordingly, a "machine-readable medium" refers to a single storage apparatus or device, as well as "cloud-based" storage systems or storage networks that include multiple storage apparatus or devices. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memory (e.g., flash memory), optical media, magnetic media, other non-volatile memory (e.g., erasable programmable read-only memory (EPROM)), or any suitable combination thereof.

The I/O components 1150 include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. In general, it will be appreciated that the I/O components 1150 may include many other components that are not shown in FIG. 11. The I/O components 1150 are grouped according to functionality merely to simplify the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 1150 include output components 1152 and input components 1154. The output components 1152 include visual components (e.g., a plasma display panel (PDP), a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor), other signal generators, and so forth. The input components 1154 include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., a physical button, a touch screen that provides the location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In some further example embodiments, the I/O components 1150 include biometric components 1156, motion components 1158, environmental components 1160, or position components 1162, among a wide array of other components. For example, the biometric components 1156 include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 1158 include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, rotation sensor components (e.g., a gyroscope), and so forth. The environmental components 1160 include, for example, illumination sensor components (e.g., a photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., a barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensor components (e.g., machine olfaction detection sensors, gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1162 include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication can be implemented using a wide variety of technologies. The I/O components 1150 may include communication components 1164 operable to couple the machine 1100 to a network 1180 or devices 1170 via a coupling 1182 and a coupling 1172, respectively. For example, the communication components 1164 include a network interface component or another suitable device to interface with the network 1180. In further examples, the communication components 1164 include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1170 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).

Moreover, in some implementations, the communication components 1164 detect identifiers or include components operable to detect identifiers. For example, the communication components 1164 include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional barcodes such as Universal Product Code (UPC) barcodes, multi-dimensional barcodes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, Uniform Commercial Code Reduced Space Symbology (UCC RSS)-2D barcodes, and other optical codes), acoustic detection components (e.g., microphones to identify tagged audio signals), or any suitable combination thereof. In addition, a variety of information can be derived via the communication components 1164, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

Transmission medium

In various example embodiments, one or more portions of the network 1180 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 1180 or a portion of the network 1180 may include a wireless or cellular network, and the coupling 1182 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1182 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) technology including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long-range protocols, or other data transfer technologies.

In example embodiments, the instructions 1116 are transmitted or received over the network 1180 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1164) and utilizing any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, in other example embodiments, the instructions 1116 are transmitted or received to the devices 1170 using a transmission medium via the coupling 1172 (e.g., a peer-to-peer coupling). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1116 for execution by the machine 1100, and includes digital or analog communications signals or other intangible media to facilitate communication of such software. A transmission medium is an embodiment of a machine-readable medium.

Language

Throughout this specification, multiple instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more individual operations may be performed concurrently, and the operations need not be performed in the order illustrated. Structures and functions presented as separate components in the example configurations may be implemented as integrated structures or components. Similarly, structures and functions presented as a single component can be implemented as separate components. Various variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Accordingly, the detailed description is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term "or" may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The following enumerated examples define methods, machine-readable media, and systems (e.g., apparatus) of various illustrative embodiments discussed herein.

Example 1. A system comprising:

A memory having instructions embodied thereon; and

One or more processors configured by the instructions to perform operations comprising:

Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;

Accessing a first image depicting a first item;

Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records;

Recognizing text in the first image;

Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records;

Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches;

Identifying a top candidate match of the merged set of candidate matches.

system.
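For illustration only, a minimal Python sketch of the two-branch matching pipeline recited in example 1 follows. The parameters image_matcher, text_recognizer, and text_matcher are hypothetical placeholders for the image-similarity, text-recognition, and text-matching steps (they are not defined by this application), and summing per-candidate scores during the merge is an assumption carried over from example 5.

from collections import defaultdict

def recognize_item(first_image, records, image_matcher, text_recognizer, text_matcher):
    """Return the identifier of the top candidate record for the item depicted in first_image.

    records         : list of dicts with "id", "image_data", and "text_data" keys
    image_matcher   : callable(image, image_data) -> score, or None for no match
    text_recognizer : callable(image) -> recognized text string
    text_matcher    : callable(text, text_data) -> score, or None for no match
    """
    # First set of candidate matches: visual similarity against stored image data.
    first_set = {}
    for record in records:
        score = image_matcher(first_image, record["image_data"])
        if score is not None:
            first_set[record["id"]] = score

    # Second set of candidate matches: recognized text against stored text data.
    recognized_text = text_recognizer(first_image)
    second_set = {}
    for record in records:
        score = text_matcher(recognized_text, record["text_data"])
        if score is not None:
            second_set[record["id"]] = score

    # Merge the two candidate sets; scores for the same candidate are summed.
    merged = defaultdict(float)
    for candidate_set in (first_set, second_set):
        for record_id, score in candidate_set.items():
            merged[record_id] += score

    # Top candidate match: the candidate with the highest merged score.
    return max(merged, key=merged.get) if merged else None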

Example 2. The system of example 1, wherein

The first image is associated with a user account,

The operations further comprise creating a listing in an electronic marketplace, wherein the listing is associated with the user account and the listing is for the top candidate match.

system.

Example 3. The system of example 1 or 2, wherein

Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner,

Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.

system.
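As a rough illustration only, and not the specific technique used by the described embodiments, the orientation-agnostic text extraction of example 3 could be approximated with an off-the-shelf OCR engine that detects page orientation before recognizing text. The sketch below assumes the pytesseract Python bindings and a local Tesseract installation; treating each non-empty OCR line as one cluster of text is likewise an assumption.

import re
from PIL import Image
import pytesseract  # assumes the Tesseract OCR engine is installed locally

def extract_text_clusters(image_path):
    """Extract clusters of text from an image regardless of how the text is oriented."""
    image = Image.open(image_path)
    try:
        # Tesseract's orientation and script detection reports the clockwise
        # rotation needed to make the text upright, e.g. "Rotate: 90".
        osd = pytesseract.image_to_osd(image)
        rotation = int(re.search(r"Rotate: (\d+)", osd).group(1))
    except pytesseract.TesseractError:
        rotation = 0  # fall back to the image as-is if orientation detection fails
    if rotation:
        # PIL rotates counterclockwise for positive angles, so negate.
        image = image.rotate(-rotation, expand=True)
    text = pytesseract.image_to_string(image)
    # Treat each non-empty line of recognized text as one cluster.
    return [line.strip() for line in text.splitlines() if line.strip()]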

Example 4. The system of example 3, wherein

The fixed size N is 3

system.
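Example 4 fixes the N-gram size at N = 3, i.e., character trigrams. The following minimal sketch extracts fixed-size character N-grams and scores a record by trigram overlap; the Jaccard-style ratio is an illustrative choice, since the application does not specify a scoring formula.

def char_ngrams(text, n=3):
    """Return the set of fixed-size character N-grams (trigrams when n=3) in text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}

def trigram_score(recognized_text, record_text, n=3):
    """Score a record by the overlap of character N-grams with the recognized text."""
    query_grams = char_ngrams(recognized_text, n)
    record_grams = char_ngrams(record_text, n)
    if not query_grams or not record_grams:
        return 0.0
    return len(query_grams & record_grams) / len(query_grams | record_grams)

For instance, trigram_score("nikon d3300", "Nikon D3300 DSLR camera body") returns a nonzero score because the two strings share trigrams such as "nik", "d33", and "330".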

Example 5. The system of any of examples 1-4, wherein

Generating the first set of candidate matches comprises generating a first score corresponding to each candidate match in the first set of candidate matches,

Generating the second set of candidate matches comprises generating a second score corresponding to each candidate match in the second set of candidate matches,

Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and

Identifying the top candidate match of the merged set of candidate matches comprises identifying the candidate match in the merged set of candidate matches having the highest summed score.

system.
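A minimal sketch of the score merging in example 5, assuming each candidate set is represented as a dictionary mapping a candidate identifier to its score. The example only specifies summation for candidates present in both sets; retaining single-set candidates with their lone score is an assumption made here.

def merge_candidate_sets(first_scores, second_scores):
    """Merge two {candidate_id: score} dictionaries by summing scores per candidate."""
    merged = dict(first_scores)
    for candidate_id, score in second_scores.items():
        merged[candidate_id] = merged.get(candidate_id, 0.0) + score
    return merged

def top_candidate(merged_scores):
    """Return the candidate with the highest summed score, or None if there are no candidates."""
    return max(merged_scores, key=merged_scores.get) if merged_scores else None

For example, merging {"A": 0.8, "B": 0.3} with {"A": 0.5, "C": 0.9} yields {"A": 1.3, "B": 0.3, "C": 0.9}, so the top candidate match is "A" with a summed score of 1.3.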

Example 6. The system of any of examples 1-5, wherein

The operations further comprise:

Receiving the first image from a client device as part of a search request;

Identifying a set of results based on the top candidate match;

In response to the search request, providing the set of results to the client device

system.

Example 7. The system of example 6, wherein

The set of results includes a set of item listings of items for sale.

system.
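A brief sketch of the search flow in examples 6 and 7, assuming a recognizer callable (for instance, the recognize_item sketch above with its record database bound in) and a hypothetical listings_index lookup that returns item listings of items for sale; neither name comes from this application.

def handle_image_search(first_image, recognizer, listings_index):
    """Handle a search request containing an image and return a set of results."""
    top_match = recognizer(first_image)   # top candidate match for the depicted item
    if top_match is None:
        return []                         # no candidate match; return an empty result set
    return listings_index(top_match)      # item listings of items for sale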

Example 8. A computer-implemented method comprising:

Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;

Accessing a first image depicting a first item;

Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records;

Recognizing text in the first image;

Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records;

Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches;

Identifying a top candidate match of the merged set of candidate matches.

Computer-implemented method.

Example 9. The method of example 8, wherein

The first image is associated with a user account,

The method further comprises creating a listing in an electronic marketplace, wherein the listing is associated with the user account and the listing is for the top candidate match.

Computer-implemented method.

Example 10. The method of example 8 or 9, wherein

Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner;

Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.

Computer-implemented method.

Example 11. The method of example 10, wherein

The fixed size N is 3

Computer-implemented method.

Example 12. The method of any of examples 8-11, wherein

Generating the first set of candidate matches comprises generating a first score corresponding to each candidate match in the first set of candidate matches,

Generating the second set of candidate matches comprises generating a second score corresponding to each candidate match in the second set of candidate matches,

Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and

Identifying the top candidate match of the merged set of candidate matches includes identifying the candidate match in the merged set of candidate matches having the highest summed score.

Computer-implemented method.

Example 13. The method of any of examples 8-12, further comprising:

Receiving the first image from a client device as part of a search request;

Identifying a set of results based on the top candidate match;

In response to the search request, providing the set of results to the client device;

Computer-implemented method.

Example 14. The method of example 13, wherein

The set of results includes a set of item listings of items for sale.

Computer-implemented method.

Example 15. A machine-readable medium comprising instructions executable by one or more processors of a machine to cause the machine to perform the method of any of examples 8-14.

Claims (15)

  1. A system comprising:
    A memory having instructions embodied thereon; and
    One or more processors configured by the instructions to perform operations comprising:
    Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;
    Accessing a first image depicting a first item, wherein the first image is associated with creating a listing in an electronic marketplace;
    Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records, wherein each candidate match in the first set of candidate matches has a first score;
    Recognizing text in the first image;
    Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records, wherein each candidate match in the second set of candidate matches has a second score;
    Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches, wherein each candidate match within the merged set of candidate matches has a merged score generated from the first score and the second score of that candidate match;
    Identifying a top candidate match of the merged set of candidate matches based on the merged score of each candidate match within the merged set of candidate matches; and
    Generating a listing user interface that includes the top candidate match of the merged set of candidate matches as a selectable option.
    system.
  2. The system of claim 1, wherein
    The first image is associated with a user account,
    The operations further comprise creating the listing in the electronic marketplace using a selection of the option for the top candidate match of the merged set of candidate matches, the listing being associated with the user account.
    system.
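Purely as an illustration of claims 1 and 2, the listing user interface that offers the top candidate match as a selectable option, and the creation of a listing from the selected option, might be sketched as follows. The field names, prompt text, and the marketplace object with its create_listing method are hypothetical and are not taken from this application.

def build_listing_ui(top_match_record):
    """Build a listing user interface payload that offers the top candidate match as a selectable option."""
    return {
        "prompt": "Is this the item you want to list?",
        "options": [
            {
                "record_id": top_match_record["id"],
                "label": top_match_record["text_data"],
                "selectable": True,
            },
        ],
    }

def create_listing_from_selection(selected_option, user_account, marketplace):
    """Create a marketplace listing from the selected option, associated with the user account."""
    return marketplace.create_listing(
        record_id=selected_option["record_id"],
        user_account=user_account,
    )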
  3. The system of claim 1, wherein
    Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner,
    Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.
    system.

  4. The system of claim 3, wherein
    The fixed size N is 3
    system.
  5. The system of claim 1, wherein
    Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and
    Identifying the top candidate match of the merged set of candidate matches comprises identifying the candidate match in the merged set of candidate matches having the highest summed score.
    system.
  6. The system of claim 1, wherein
    The operations further comprise:
    Receiving the first image from a client device as part of a search request;
    Identifying a set of results based on the top candidate match;
    In response to the search request, providing the set of results to the client device
    system.
  7. The system of claim 6, wherein
    The set of results includes a set of item listings of items for sale.
    system.
  8. A computer-implemented method comprising:
    Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;
    Accessing a first image depicting a first item, wherein the first image is associated with creating a listing in an electronic marketplace;
    Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records, wherein each candidate match in the first set of candidate matches has a first score;
    Recognizing text in the first image;
    Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records, wherein each candidate match in the second set of candidate matches has a second score;
    Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches, wherein each candidate match within the merged set of candidate matches has a merged score generated from the first score and the second score of that candidate match;
    Identifying a top candidate match of the merged set of candidate matches based on the merged score of each candidate match in the merged set of candidate matches; and
    Generating a listing user interface that includes the top candidate match of the merged set of candidate matches as a selectable option.
    Computer-implemented method.
  9. The method of claim 8, wherein
    The first image is associated with a user account,
    The method further comprises creating the listing in the electronic marketplace using a selection of the option for the top candidate match of the merged set of candidate matches, the listing being associated with the user account.
    Computer-implemented method.
  10. The method of claim 8, wherein
    Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner;
    Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.
    Computer-implemented method.
  11. The method of claim 10, wherein
    The fixed size N is 3
    Computer-implemented method.
  12. The method of claim 8, wherein
    Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and
    Identifying the top candidate match of the merged set of candidate matches includes identifying the candidate match in the merged set of candidate matches having the highest summed score.
    Computer-implemented method.
  13. The method of claim 8, further comprising:
    Receiving the first image from a client device as part of a search request;
    Identifying a set of results based on the top candidate match;
    In response to the search request, providing the set of results to the client device;
    Computer-implemented method.

  14. The method of claim 13, wherein
    The set of results includes a set of item listings of items for sale.
    Computer-implemented method.
  15. A non-transitory machine-readable medium comprising instructions executable by one or more processors of a machine to cause the machine to perform the method of any one of claims 8-14.
KR1020177023364A 2015-01-23 2016-01-08 Recognize items depicted by images KR102032038B1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US201562107095P true 2015-01-23 2015-01-23
US62/107,095 2015-01-23
US14/973,582 US20160217157A1 (en) 2015-01-23 2015-12-17 Recognition of items depicted in images
US14/973,582 2015-12-17
PCT/US2016/012691 WO2016118339A1 (en) 2015-01-23 2016-01-08 Recognition of items depicted in images

Publications (2)

Publication Number Publication Date
KR20170107039A KR20170107039A (en) 2017-09-22
KR102032038B1 true KR102032038B1 (en) 2019-10-14

Family

ID=56417585

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020177023364A KR102032038B1 (en) 2015-01-23 2016-01-08 Recognize items depicted by images

Country Status (5)

Country Link
US (1) US20160217157A1 (en)
EP (1) EP3248142A4 (en)
KR (1) KR102032038B1 (en)
CN (1) CN107430691A (en)
WO (1) WO2016118339A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10424052B2 (en) * 2015-09-15 2019-09-24 Peking University Shenzhen Graduate School Image representation method and processing device based on local PCA whitening
CN106326902B (en) * 2016-08-30 2019-05-14 广西师范大学 Image search method based on conspicuousness structure histogram
US20180107682A1 (en) * 2016-10-16 2018-04-19 Ebay Inc. Category prediction from semantic image clustering
US20180137551A1 (en) * 2016-11-11 2018-05-17 Ebay Inc. Intelligent online personal assistant with image text localization
CN106777177A (en) * 2016-12-22 2017-05-31 百度在线网络技术(北京)有限公司 Search method and device
US10115016B2 (en) * 2017-01-05 2018-10-30 GM Global Technology Operations LLC System and method to identify a vehicle and generate reservation
US20190156403A1 (en) * 2017-11-17 2019-05-23 Ebay Inc. Rendering of object data based on recognition and/or location matching

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080267504A1 (en) * 2007-04-24 2008-10-30 Nokia Corporation Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search
JP4607633B2 (en) * 2005-03-17 2011-01-05 株式会社リコー Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method
US20110238659A1 (en) * 2010-03-29 2011-09-29 Ebay Inc. Two-pass searching for image similarity of digests of image-based listings in a network-based publication system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5404507A (en) * 1992-03-02 1995-04-04 At&T Corp. Apparatus and method for finding records in a database by formulating a query using equivalent terms which correspond to terms in the input query
US8775436B1 (en) * 2004-03-19 2014-07-08 Google Inc. Image selection for news search
US7809192B2 (en) * 2005-05-09 2010-10-05 Like.Com System and method for recognizing objects from images and identifying relevancy amongst images and information
US7949191B1 (en) * 2007-04-04 2011-05-24 A9.Com, Inc. Method and system for searching for information on a network in response to an image query sent by a user from a mobile communications device
US9495386B2 (en) * 2008-03-05 2016-11-15 Ebay Inc. Identification of items depicted in images
US7991646B2 (en) * 2008-10-30 2011-08-02 Ebay Inc. Systems and methods for marketplace listings using a camera enabled mobile device
US8478052B1 (en) * 2009-07-17 2013-07-02 Google Inc. Image classification
US8761512B1 (en) * 2009-12-03 2014-06-24 Google Inc. Query by image
US9323784B2 (en) * 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US9378290B2 (en) * 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor
US8935246B2 (en) * 2012-08-08 2015-01-13 Google Inc. Identifying textual terms in response to a visual query
US9830632B2 (en) * 2012-10-10 2017-11-28 Ebay Inc. System and methods for personalization and enhancement of a marketplace
US8635124B1 (en) * 2012-11-28 2014-01-21 Ebay, Inc. Message based generation of item listings

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4607633B2 (en) * 2005-03-17 2011-01-05 株式会社リコー Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method
US20080267504A1 (en) * 2007-04-24 2008-10-30 Nokia Corporation Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search
US20110238659A1 (en) * 2010-03-29 2011-09-29 Ebay Inc. Two-pass searching for image similarity of digests of image-based listings in a network-based publication system

Also Published As

Publication number Publication date
EP3248142A4 (en) 2017-12-13
EP3248142A1 (en) 2017-11-29
US20160217157A1 (en) 2016-07-28
WO2016118339A1 (en) 2016-07-28
CN107430691A (en) 2017-12-01
KR20170107039A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
US9836890B2 (en) Image based tracking in augmented reality systems
US20170206707A1 (en) Virtual reality analytics platform
WO2016044424A1 (en) Geolocation-based pictographs
US20170161382A1 (en) System to correlate video data and contextual data
US20160139662A1 (en) Controlling a visual device based on a proximity between a user and the visual device
US9659244B2 (en) Custom functional patterns for optical barcodes
US20150058239A1 (en) Item-based social discovery
US10242258B2 (en) Organizational data enrichment
Nguyen et al. Recognition of activities of daily living with egocentric vision: A review
US10453111B2 (en) Data mesh visualization
US20160125490A1 (en) Transferring authenticated sessions and states between electronic devices
US10055489B2 (en) System and method for content-based media analysis
US10198671B1 (en) Dense captioning with joint interference and visual context
KR20170077183A (en) Hierarchical deep convolutional neural network
US9436883B2 (en) Collaborative text detection and recognition
US9858492B2 (en) System and method for scene text recognition
KR20180006951A (en) Local Augmented Reality Sticky Object
US20160203525A1 (en) Joint-based item recognition
Zhao et al. Wearable device-based gait recognition using angle embedded gait dynamic images and a convolutional neural network
Hua et al. Introduction to the special issue on mobile vision
US10198635B2 (en) Systems and methods for associating an image with a business venue by using visually-relevant and business-aware semantics
US9904871B2 (en) Deep convolutional neural network prediction of image professionalism
US20180107644A1 (en) Correction of user input
US20190266390A1 (en) Automated avatar generation
US10346723B2 (en) Neural network for object detection in images

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant