KR102032038B1 - Recognize items depicted by images - Google Patents

Recognize items depicted by images

Info

Publication number
KR102032038B1
Authority
KR
South Korea
Prior art keywords
set
candidate
image
match
matches
Prior art date
Application number
KR1020177023364A
Other languages
Korean (ko)
Other versions
KR20170107039A (en)
Inventor
케빈 쉬
Wei Di
Vignesh Jagadeesh
Robinson Piramuthu
Original Assignee
eBay Inc.
Priority date
Filing date
Publication date
Priority to US 62/107,095 (US201562107095P)
Priority to US 14/973,582 (published as US20160217157A1)
Application filed by eBay Inc.
Priority to PCT/US2016/012691 (published as WO2016118339A1)
Publication of KR20170107039A
Application granted
Publication of KR102032038B1


Classifications

    • G: Physics
    • G06: Computing; Calculating; Counting
    • G06F: Electric digital data processing
    • G06F 16/5846: Information retrieval of still image data, characterised by metadata automatically derived from the content, using extracted text
    • G06F 16/5838: Information retrieval of still image data, characterised by metadata automatically derived from the content, using colour
    • G06F 16/5854: Information retrieval of still image data, characterised by metadata automatically derived from the content, using shape and object relationship
    • G06F 16/9535: Retrieval from the web; search customisation based on user profiles and personalisation

Abstract

Products (e.g., books) contain a significant amount of informative textual information that can be used to identify the item. The input query image is a picture of the product (e.g., a picture taken using a mobile phone). The picture may be taken at any angle and orientation and may include an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database can be a product database with product names, product images, product prices, product sales histories, or any suitable combination thereof. The retrieval is performed both by matching the query image against images in the database and by matching text extracted from the query image against text in the database.

Description

Recognize items depicted by images

This application claims priority to U.S. Provisional Application No. 62/107,095, filed January 23, 2015, entitled "Efficient Media Retrieval," and to U.S. Patent Application No. 14/973,582, filed December 17, 2015, entitled "Recognition of Items Depicted in Images," each of which is incorporated herein by reference in its entirety.

The subject matter disclosed herein relates generally to a computer system for identifying items depicted in images. In particular, the present disclosure addresses systems and methods relating to the efficient retrieval of data for items from a media database.

An item recognition engine may have a high success rate in recognizing items depicted in images when the query image is cooperative. Cooperative images are taken with proper lighting, the item faces the camera directly and is properly aligned, and the image does not depict objects other than the item. An item recognition engine may be unable to recognize an item depicted in an uncooperative image. Background art associated with the present disclosure includes, for example, US Patent Publication No. 2006/0147127.

Some embodiments are shown by way of example and not by way of limitation in the figures of the accompanying drawings.
FIG. 1 is a network diagram illustrating a network environment suitable for identifying items depicted in images, in accordance with some example embodiments.
FIG. 2 is a block diagram illustrating components of an identification server suitable for identifying an item depicted in an image, in accordance with some example embodiments.
FIG. 3 is a block diagram illustrating components of a device suitable for capturing an image of an item and communicating with a server configured to identify the item depicted in the image, in accordance with some example embodiments.
FIG. 4 illustrates reference images and uncooperative images of items, in accordance with some example embodiments.
FIG. 5 illustrates text extraction operations for identifying an item depicted in an image, in accordance with some example embodiments.
FIG. 6 illustrates an input image depicting an item and sets of proposed matches for the item, in accordance with some example embodiments.
FIG. 7 is a flowchart illustrating operations of a server performing a process of identifying an item in an image, in accordance with some example embodiments.
FIG. 8 is a flowchart illustrating operations of a server performing a process of automatically generating a sale listing for an item depicted in an image, in accordance with some example embodiments.
FIG. 9 is a flowchart illustrating operations of a server performing a process of providing results based on an item depicted in an image, in accordance with some example embodiments.
FIG. 10 is a block diagram illustrating an example of a software architecture that may be installed on a machine, in accordance with some example embodiments.
FIG. 11 shows a schematic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, in accordance with some example embodiments.

Example methods and systems relate to the identification of items depicted in images. The examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

Products (e.g., books or compact discs (CDs)) contain a significant amount of informative textual information that can be used to identify an item primarily from an image depicting the item. Portions of a product containing such textual information include the front cover, back cover, and spine of a book, and the front and back of a CD, digital video disc (DVD), or Blu-ray™ disc. Other parts of products that contain informative textual information are covers, packaging, and user manuals. Traditional optical character recognition (OCR) can be used when the text on the item is aligned with the edges of the image and the image quality is high. Cooperative images are taken with proper lighting, the item faces the camera directly and is properly aligned, and the image does not depict objects other than the item. Images lacking one or more of these features are referred to as "uncooperative." As an example, images taken in low light are uncooperative. As another example, an image that includes an occlusion blocking one or more portions of the depicted item is also uncooperative. Traditional OCR can fail when processing uncooperative images. Thus, the use of OCR at the word level can provide some information about potential matches, which can be supplemented by the use of direct image classification (e.g., using deep convolutional neural networks (CNNs)).

In some example embodiments, a picture (e.g., a picture taken using a mobile phone) is the input query image. The picture may be taken at any angle and orientation and may include an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database can be a product database with product names, product images, product prices, product sales histories, or any suitable combination thereof. The retrieval is performed both by matching the query image against images in the database and by matching text extracted from the query image against text in the database.

FIG. 1 is a network diagram illustrating a network environment 100 suitable for identifying items depicted in images, in accordance with some example embodiments. The network environment 100 includes e-commerce servers 120 and 140, an identification server 130, and devices 150A, 150B, and 150C, all communicatively coupled to one another via a network 170. The devices 150A, 150B, and 150C may be collectively referred to as "devices 150" or generically referred to as a "device 150." The e-commerce servers 120 and 140 and the identification server 130 may be part of a network-based system 110. Alternatively, the device 150 may connect to the identification server 130 directly or via a local network distinct from the network 170 used to connect to the e-commerce server 120 or 140. As described below with respect to FIGS. 10 and 11, the e-commerce servers 120 and 140, the identification server 130, and the devices 150 may each be implemented in whole or in part in a computer system.

The e-commerce servers 120 and 140 provide e-commerce applications to other machines (e.g., the devices 150) via the network 170. The e-commerce servers 120 and 140 may also be connected directly to, or integrated with, the identification server 130. In some example embodiments, one e-commerce server 120 and the identification server 130 are part of the network-based system 110, while another e-commerce server (e.g., the e-commerce server 140) is separate from the network-based system 110. The e-commerce applications may provide a way for users to buy and sell items directly from and to each other, to buy from and sell to the e-commerce application provider, or both.

A user 160 is also shown in FIG. 1. The user 160 may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 150 and the identification server 130), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human). The user 160 is not part of the network environment 100, but is associated with the device 150 and may be a user of the device 150. For example, the device 150 may be a sensor, a desktop computer, a vehicle computer, a tablet computer, a navigation device, a portable media device, or a smartphone belonging to the user 160.

In some example embodiments, the identification server 130 receives data regarding an item of interest to a user. For example, a camera attached to the device 150A can take an image of an item the user 160 wishes to sell and transmit the image over the network 170 to the identification server 130. The identification server 130 identifies the item based on the image. Information about the identified item can be sent to the e-commerce server 120 or 140, the device 150A, or any combination thereof. The information can be used by the e-commerce server 120 or 140 to aid in generating a listing of the item for sale. Similarly, the image may be of an item of interest to the user 160, and the information can be used by the e-commerce server 120 or 140 to aid in selecting listings of items to display to the user 160.

Any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software to be a special-purpose computer that performs the functions described herein for that machine, database, or device. For example, a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIGS. 10 and 11. As used herein, a "database" is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.

Network 170 may be any network that enables communication between or among machines, databases, and devices (eg, identification server 130 and device 150). Thus, network 170 may be a wired network, a wireless network (eg, a mobile or cellular network), or any suitable combination thereof. Network 170 may include one or more portions consisting of a private network, a public network (eg, the Internet), or any suitable combination thereof.

FIG. 2 is a block diagram illustrating components of the identification server 130, in accordance with some example embodiments. The identification server 130 is shown as including a communication module 210, a text identification module 220, an image identification module 230, a ranking module 240, a user interface (UI) module 250, a listing module 260, and a storage module 270, all configured to communicate with one another (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The communication module 210 is configured to send and receive data. For example, the communication module 210 may receive image data over the network 170 and send the received data to the text identification module 220 and the image identification module 230. As another example, the ranking module 240 may determine a best match for a depicted item, and an identifier for the item may be sent by the communication module 210 to the e-commerce server 120 via the network 170. The image data may be a two-dimensional image, a frame from a continuous video stream, a three-dimensional image, a depth image, an infrared image, a binocular image, or any suitable combination thereof.

The text identification module 220 is configured to generate a set of proposed matches for an item depicted in an input image based on text extracted from the input image. For example, the text extracted from the input image can be matched against text in a database, and the top n (e.g., top 5) matches reported as the proposed matches for the item.

The image identification module 230 is configured to generate a set of proposed matches for the item depicted in the input image using image matching techniques. For example, a CNN trained to distinguish between different media items can be used to report the probability of a match between the depicted item and one or more media items. For the purposes of this CNN, a media item is an item of media that can be depicted. For example, books, CDs, and DVDs are all media items. Purely electronic media, such as MP4 audio files, are also "media items" in this sense if they are associated with an image. For example, an electronic download version of a CD may be associated with the cover image of the CD, modified to include a marker indicating that the version is an electronic download. Accordingly, the trained CNN of the image identification module 230 may determine the probability of a particular image matching the downloadable version of the CD separately from the probability of the particular image matching the physical version of the CD.
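
As a rough illustration of how such a CNN could be applied at query time, the Python sketch below assumes a classifier has already been trained with one output class per catalog media item. The model checkpoint, catalog identifiers, and file name are hypothetical, and the ResNet backbone is only a stand-in for whatever CNN the module actually uses.

# A sketch only: the classifier, checkpoint file, and catalog identifiers below are
# illustrative assumptions, not details taken from the patent.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_candidate_scores(image_path, model, catalog_ids, top_n=5):
    """Return the top-n catalog items and match probabilities for a query image."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=1)[0]     # one probability per catalog class
    top = torch.topk(probs, k=min(top_n, len(catalog_ids)))
    return [(catalog_ids[int(i)], float(p)) for p, i in zip(top.values, top.indices)]

# Hypothetical usage, with one class per catalog item and a model trained elsewhere:
# catalog_ids = ["item-001", "item-002", ...]
# model = models.resnet50(num_classes=len(catalog_ids))
# model.load_state_dict(torch.load("media_item_classifier.pt"))
# model.eval()
# print(cnn_candidate_scores("query.jpg", model, catalog_ids))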

The ranking module 240 is configured to merge the set of proposed matches for the item generated by the text identification module 220 with the set of proposed matches for the item generated by the image identification module 230, and to rank the merged set. For example, the text identification module 220 and the image identification module 230 may each provide a score for each proposed match, and the ranking module 240 may combine the scores using weighting factors. The ranking module 240 may report the top-ranked proposed match as the identified item depicted in the image. The weights used by the ranking module 240 may be determined using an ordinal regression support vector machine (OR-SVM).

The user interface module 250 is configured to cause a user interface to be presented on one or more of the user devices 150A-150C. For example, the user interface module 250 may be implemented by a web server that provides hypertext markup language (HTML) files to a user device 150 via the network 170. The user interface may present the image received by the communication module 210, data retrieved from the storage module 270 regarding the item identified in the image by the ranking module 240, item listings generated or selected by the listing module 260, or any suitable combination thereof.

The listing module 260 is configured to generate an item listing for the item identified using the ranking module 240. For example, after a user uploads an image depicting an item and the item is successfully identified, the listing module 260 may create an item listing including an item image from an item catalog, an item title from the item catalog, a description from the item catalog, or any suitable combination thereof. The user may be prompted to confirm or modify the generated listing, or the generated listing may be published automatically in response to the identification of the depicted item. The listing may be sent to the e-commerce server 120 or 140 via the communication module 210. In some example embodiments, the listing module 260 is implemented in the e-commerce server 120 or 140, and the listing is generated in response to an identifier for the item sent from the identification server 130 to the e-commerce server 120 or 140.

The storage module 270 is configured to store and retrieve data generated and used by the text identification module 220, the image identification module 230, the ranking module 240, the user interface module 250, and the listing module 260. For example, the classifier used by the image identification module 230 can be stored by the storage module 270. Information regarding the identification of an item depicted in an image, generated by the ranking module 240, can also be stored by the storage module 270. The e-commerce server 120 or 140 can request the identification of an item in an image (e.g., by providing the image, an image identifier, or both), and the identification can be retrieved from storage by the storage module 270 and transmitted over the network 170 using the communication module 210.

FIG. 3 is a block diagram illustrating components of the device 150, in accordance with some example embodiments. The device 150 is shown as including an input module 310, a camera module 320, and a communication module 330, all configured to communicate with one another (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The input module 310 is configured to receive input from a user via a user interface. For example, the user may enter a username and password into the input module, configure a camera, select an image to use as the basis for a listing or an item search, or any suitable combination thereof.

Camera module 320 is configured to capture image data. For example, an image may be received from a camera, a depth image may be received from an infrared camera, and a pair of images may be received from a binocular camera.

The communication module 330 is configured to communicate data received by the input module 310 or the camera module 320 to the identification server 130, the e-commerce server 120, or the e-commerce server 140. For example, the input module 310 may receive a selection of an image taken with the camera module 320 and an indication that the image depicts an item the user (e.g., the user 160) wishes to sell. The communication module 330 may transmit the image and the indication to the e-commerce server 120. The e-commerce server 120 may send the image to the identification server 130 to request identification of the item depicted in the image, generate a listing template based on the category, and cause the listing template to be presented to the user via the communication module 330 and the input module 310.

FIG. 4 illustrates reference and uncooperative images of items, in accordance with some example embodiments. The first entry in each of the groups 410, 420, and 430 is a catalog image. The items depicted in the catalog images are well lit, face the camera directly, and are properly oriented. The remaining images of each group are images taken by users, with the items in various orientations and facing various directions. In addition, the non-catalog images depict background clutter.

FIG. 5 illustrates text extraction operations for identifying an item depicted in an image, in accordance with some example embodiments. Each row of FIG. 5 shows example operations performed on an input image. Components 510A and 510B show the input images for each row. Components 520A and 520B show the results of candidate extraction and orientation. That is, given the query image, text blocks are identified and oriented using a heuristic based on the Radon transform. Roughly collinear characters are identified as lines and passed through an OCR engine (e.g., Tesseract OCR) to obtain text output. As an example, components 530A and 530B show a subset of the obtained text output.

FIG. 6 illustrates an input image depicting a media item and sets of proposed matches for the item, in accordance with some example embodiments. Image 610 is the input image. Image 610 is oriented such that the text on the depicted media item is aligned with the image, but the media item is at an angle to the camera. The media item also reflects a light source, which obscures some of the text depicted in the image. The set of proposed matches 620 depicts the top five matches reported by the text identification module 220. The set of proposed matches 630 depicts the top five matches reported by the image identification module 230. The set of proposed matches 640 depicts the top five matches reported by the ranking module 240. In this case, the first entry in the set of proposed matches 640 is correctly reported by the identification server 130 as the match for the input image 610.

FIG. 7 is a flowchart illustrating operations of the identification server 130 in performing a process 700 of identifying an item in an image, in accordance with some example embodiments. The process 700 includes operations 710, 720, 730, 740, and 750. By way of example only and not limitation, the operations 710-750 are described as being performed by the modules 210-270.

In operation 710, the image identification module 230 accesses an image. For example, the image may have been captured by the device 150, transmitted to the identification server 130 via the network 170, received by the communication module 210 of the identification server 130, and passed by the communication module 210 to the image identification module 230. The image identification module 230 determines a score for each of a first set of candidate matches for the image in a database (operation 720). For example, a vector of locally aggregated descriptors (VLAD) can be used to identify and rank candidate matches in the database. In some example embodiments, the VLAD vocabulary is constructed by densely extracting speeded-up robust features (SURF) from a training set and clustering the descriptors using k-means with k = 256. In some example embodiments, the similarity metric is based on the L2 (Euclidean) distance between normalized VLAD descriptors.
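
A minimal Python sketch of this VLAD-based scoring is shown below. The patent describes densely extracted SURF features and k-means with k = 256; the sketch substitutes OpenCV SIFT (SURF is only available in opencv-contrib builds) and a much smaller k so it runs quickly, and all file names and the catalog dictionary are hypothetical.

# A sketch only: SIFT stands in for SURF, k is reduced from 256, and file names are hypothetical.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def local_descriptors(image_path, detector):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = detector.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, detector.descriptorSize()), np.float32)

def vlad_encode(descriptors, kmeans):
    """Sum residuals to the nearest cluster centre per cluster, then L2-normalize."""
    k, d = kmeans.cluster_centers_.shape
    vlad = np.zeros((k, d), dtype=np.float32)
    if len(descriptors):
        assignments = kmeans.predict(descriptors)
        for c in range(k):
            members = descriptors[assignments == c]
            if len(members):
                vlad[c] = (members - kmeans.cluster_centers_[c]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

detector = cv2.SIFT_create()

# Build the vocabulary from training images (the patent uses k = 256; smaller here).
training_desc = np.vstack([local_descriptors(p, detector) for p in ["train1.jpg", "train2.jpg"]])
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(training_desc)

# Rank catalog candidates by Euclidean distance between normalized VLAD descriptors.
query_vlad = vlad_encode(local_descriptors("query.jpg", detector), kmeans)
catalog = {"item-001": "catalog1.jpg", "item-002": "catalog2.jpg"}
distances = {item: np.linalg.norm(query_vlad - vlad_encode(local_descriptors(path, detector), kmeans))
             for item, path in catalog.items()}
print(sorted(distances.items(), key=lambda kv: kv[1]))   # smallest distance first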

In operation 730, the text identification module 220 accesses the image and extracts text from it. The text identification module 220 determines a score for each of a second set of candidate matches, based on the extracted text and text in the database. For example, a bag-of-words (BoW) algorithm can be used to identify and rank candidate matches in the database. Text may be extracted from the image in an orientation-agnostic manner. The extracted text is re-oriented to horizontal alignment through projection analysis: a Radon transform is computed, and the angle with the most sharply peaked projection is selected as the line angle. Individual lines of text are extracted using clustering of character centroids. Maximally stable extremal regions (MSERs) are identified as potential characters within each cluster. Character candidates are grouped into lines by merging regions of similar height when they are adjacent or when their baselines have nearby y-values. Unrealistic line candidates are excluded when the aspect ratio exceeds a threshold (e.g., when the length of the line exceeds 15 times its height).

The text of each identified line is passed through an OCR engine for text extraction. To account for the possibility that an extracted line of text is upside down, the text of each identified line is also rotated 180 degrees, and the rotated lines are passed through the OCR engine as well.
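
The following Python sketch illustrates one possible implementation of this line-extraction and two-orientation OCR step. It assumes the image has already been rotated so the text is roughly horizontal (e.g., by the projection analysis described above), uses OpenCV MSER regions as character candidates, and relies on pytesseract as the OCR engine; the grouping thresholds, file name, and helper function are illustrative choices rather than the patent's exact parameters.

# A sketch only: thresholds and the input file are illustrative, and pytesseract
# requires a local Tesseract installation.
import cv2
import pytesseract

def extract_line_text(gray):
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)                    # character candidates
    boxes = [tuple(map(int, b)) for b in bboxes]            # (x, y, w, h)

    # Group candidates of similar height with nearby baselines (y + h) into lines.
    lines, used = [], [False] * len(boxes)
    for i, (x, y, w, h) in enumerate(boxes):
        if used[i]:
            continue
        group, used[i] = [i], True
        for j, (x2, y2, w2, h2) in enumerate(boxes):
            if not used[j] and abs(h2 - h) < 0.3 * h and abs((y2 + h2) - (y + h)) < 0.5 * h:
                group.append(j)
                used[j] = True
        x0 = min(boxes[k][0] for k in group); y0 = min(boxes[k][1] for k in group)
        x1 = max(boxes[k][0] + boxes[k][2] for k in group); y1 = max(boxes[k][1] + boxes[k][3] for k in group)
        if y1 - y0 > 2 and x1 - x0 <= 15 * (y1 - y0):       # aspect-ratio filter
            lines.append((x0, y0, x1, y1))

    texts = []
    for (x0, y0, x1, y1) in lines:
        crop = gray[y0:y1, x0:x1]
        # OCR each line in both orientations, since an extracted line may be upside down.
        for candidate in (crop, cv2.rotate(crop, cv2.ROTATE_180)):
            text = pytesseract.image_to_string(candidate, config="--psm 7").strip()
            if text:
                texts.append(text)
    return texts

# Hypothetical usage, assuming the text is already roughly horizontal:
# print(extract_line_text(cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)))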

In operation 740, character n-grams are used for text matching. Non-alphabetic characters are discarded, and a sliding window of size N is run over each word of sufficient length. As an example with N = 3, the phrase "I like turtles" would be broken down into "lik", "ike", "tur", "urt", "rtl", "tle", and "les". In some example embodiments, case is ignored by converting all characters to lowercase.
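
A minimal Python sketch of this n-gram extraction, with N = 3 and lowercasing as described, is shown below.

# A sketch of the described character n-gram extraction.
import re

def char_ngrams(text, n=3):
    words = re.findall(r"[a-z]+", text.lower())     # lowercase, drop non-alphabetic characters
    return [w[i:i + n] for w in words if len(w) >= n for i in range(len(w) - n + 1)]

print(char_ngrams("I like turtles"))
# ['lik', 'ike', 'tur', 'urt', 'rtl', 'tle', 'les']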

The unnormalized histogram of n-grams for each document is referred to as f. In some example embodiments, the following scheme is used to calculate a normalized similarity score between a query and a document.

In this scheme, N1 and N2 denote functions that compute the L1 and L2 normalizations, respectively, and the gamma vector is a vector of inverse document frequency (idf) weights. For each unique n-gram g, the corresponding idf weight is the natural logarithm of the number of documents in the database divided by the number of documents containing the n-gram g. The final normalization applied is the L2 normalization.
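
The Python sketch below shows one plausible reading of this scheme: L1-normalize each document's n-gram histogram, weight it by the idf vector, apply a final L2 normalization, and score a query against a document by the dot product of the resulting vectors. It is a sketch consistent with the description above rather than a reproduction of the exact formula, and the example documents are made up.

# A sketch only: the exact formula in the original is not reproduced; the example text is made up.
import math
import re
from collections import Counter

def char_ngrams(text, n=3):                          # as in the previous sketch
    words = re.findall(r"[a-z]+", text.lower())
    return [w[i:i + n] for w in words if len(w) >= n for i in range(len(w) - n + 1)]

def idf_weights(documents_ngrams):
    n_docs = len(documents_ngrams)
    doc_freq = Counter(g for doc in documents_ngrams for g in set(doc))
    return {g: math.log(n_docs / df) for g, df in doc_freq.items()}

def ngram_vector(ngrams, idf):
    f = Counter(ngrams)                                                   # unnormalized histogram
    total = sum(f.values()) or 1
    weighted = {g: (c / total) * idf.get(g, 0.0) for g, c in f.items()}   # L1-normalize, then idf
    norm = math.sqrt(sum(v * v for v in weighted.values())) or 1.0        # final L2 normalization
    return {g: v / norm for g, v in weighted.items()}

def similarity(query_ngrams, doc_ngrams, idf):
    q, d = ngram_vector(query_ngrams, idf), ngram_vector(doc_ngrams, idf)
    return sum(q[g] * d.get(g, 0.0) for g in q)

docs = [char_ngrams("The Last Mogul"), char_ngrams("The Last Samurai")]   # catalog text
idf = idf_weights(docs)
print([similarity(char_ngrams("last mogul"), d, idf) for d in docs])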

In operation 750, the ranking module 240 identifies a likely match for the image based on the first set of scores and the second set of scores. For example, corresponding scores can be summed, weighted, or otherwise combined, and the candidate match with the best resulting score is identified as the likely match.

Operation 750 combines a set of similarity measures into a unified ranking, where each measure denotes a similarity derived from one feature type. The goal is to compute weights for the terms such that a correct query/reference match always receives a higher combined similarity than an incorrect one. Accordingly, an optimization can be carried out during a training process to learn the optimal weight vector w.

During operation 750, the individual similarity values (e.g., one for the OCR match and one for the VLAD match) are combined into a vector, and the integrated score is obtained by multiplying that vector by the weight vector w. In some example embodiments, the item with the best integrated score for the query image is taken as the matching item. In some example embodiments, when no item has an integrated score above a threshold, no item is found to match. In some example embodiments, the set of items with integrated scores above a threshold, the set of K items with the best integrated scores, or a suitable combination thereof, is selected for additional image matching using geometric features, as described below.
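
As an illustration of learning and applying such a weight vector, the Python sketch below uses the standard pairwise-difference trick with a linear SVM, which is one common way to realize a ranking SVM. It is a stand-in for the OR-SVM training mentioned above rather than the patent's exact procedure, and all scores, thresholds, and item identifiers are made up.

# A sketch only: training pairs, threshold, and catalog items are made up, and the
# pairwise linear SVM substitutes for the OR-SVM mentioned in the text.
import numpy as np
from sklearn.svm import LinearSVC

# Each row holds the similarity vector [S_ocr, S_vlad] for one query/reference pair.
correct   = np.array([[0.9, 0.7], [0.8, 0.6], [0.7, 0.9]])
incorrect = np.array([[0.6, 0.5], [0.7, 0.2], [0.3, 0.4]])

# Pairwise differences: correct minus incorrect should score positive, and vice versa.
X = np.vstack([correct - incorrect, incorrect - correct])
y = np.hstack([np.ones(len(correct)), -np.ones(len(incorrect))])
w = LinearSVC(fit_intercept=False, C=1.0).fit(X, y).coef_.ravel()

def integrated_score(s_ocr, s_vlad):
    return float(np.dot(w, [s_ocr, s_vlad]))

# Rank candidates, then keep those above a threshold (or the top K) for geometric checks.
candidates = {"item-001": (0.85, 0.75), "item-002": (0.40, 0.55)}
ranked = sorted(candidates.items(), key=lambda kv: integrated_score(*kv[1]), reverse=True)
threshold = 0.5
shortlist = [item for item, scores in ranked if integrated_score(*scores) >= threshold]
print(ranked, shortlist)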

The potential matches and the query image are scaled to a standard size (e.g., 256 × 256 pixels). Histogram of oriented gradients (HOG) values are determined for each scaled image using eight orientations, 8 × 8 pixels per cell, and 2 × 2 cells per block. For each potential match, a linear transformation matrix is identified that minimizes the error between the transformed query and the potentially matching image. The minimized errors are compared, and the potential match with the smallest minimized error is reported as the match.

One way to identify a linear transformation matrix that minimizes the error is to randomly generate a number (e.g., 100) of such transformation matrices and determine the error for each. If the minimum error is below a threshold, the corresponding matrix is used. If not, a new set of random transformation matrices is generated and evaluated. After a predetermined number of iterations, the matrix corresponding to the smallest identified error is used, and the method ends.
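
The Python sketch below illustrates this geometric-verification step under the stated parameters (256 x 256 scaling; HOG with eight orientations, 8 x 8 pixels per cell, and 2 x 2 cells per block) and the random-search strategy just described. The perturbation ranges, error threshold, and iteration counts are illustrative assumptions.

# A sketch only: the perturbation ranges, threshold, and iteration counts are assumptions.
import cv2
import numpy as np
from skimage.feature import hog

def hog_features(gray):
    return hog(gray, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def random_affine(rng, size=256):
    angle = rng.uniform(-15, 15)                      # small random rotation (degrees)
    scale = rng.uniform(0.9, 1.1)
    m = cv2.getRotationMatrix2D((size / 2, size / 2), angle, scale)
    m[:, 2] += rng.uniform(-10, 10, size=2)           # small random translation
    return m

def min_hog_error(query_gray, candidate_gray, n_matrices=100, n_rounds=3, threshold=5.0, seed=0):
    rng = np.random.default_rng(seed)
    query = cv2.resize(query_gray, (256, 256))
    candidate_feat = hog_features(cv2.resize(candidate_gray, (256, 256)))
    best = np.inf
    for _ in range(n_rounds):                         # regenerate matrices if none is good enough
        for _ in range(n_matrices):
            warped = cv2.warpAffine(query, random_affine(rng), (256, 256))
            best = min(best, np.linalg.norm(hog_features(warped) - candidate_feat))
        if best < threshold:
            break
    return best

# The potential match with the smallest minimized error would be reported as the match:
# errors = {item: min_hog_error(query_image, catalog_images[item]) for item in shortlist}
# print(min(errors, key=errors.get))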

FIG. 8 is a flowchart illustrating operations of a server performing a process 800 of automatically generating a sale listing for an item depicted in an image, in accordance with some example embodiments. The process 800 includes operations 810, 820, and 830. By way of example only and not limitation, the operations 810-830 are described as being performed by the identification server 130 and the e-commerce server 120.

In operation 810, the e-commerce server 120 receives an image. For example, the user 160 can take an image using the device 150 and upload it to the e-commerce server 120. In operation 820, the identification server 130 uses the process 700 to identify the item depicted in the image. For example, the e-commerce server 120 may transfer the image to the identification server 130 for identification. In some example embodiments, the e-commerce server 120 and the identification server 130 are integrated, and the e-commerce server 120 identifies the item in the image.

In operation 830, the e-commerce server 120 generates a listing describing the item as being for sale by the user 160. For example, if the user uploads a picture of a book titled "The Last Mogul," a listing may be created for "The Last Mogul." In some example embodiments, the generated listing includes a catalog image of the item, the item title, and a description of the item, all loaded from a product database. A user interface presented to the user may be used to select additional or default listing options (e.g., a price or starting price, a sale format (auction or fixed price), or shipping options).

FIG. 9 is a flowchart illustrating operations of a server performing a process 900 of providing results based on an item depicted in an image, in accordance with some example embodiments. The process 900 includes operations 910, 920, and 930. By way of example only and not limitation, the operations 910-930 are described as being performed by the identification server 130 and the e-commerce server 120.

In operation 910, the e-commerce server 120 or a search engine server receives an image. For example, the user 160 can take an image using the device 150 and upload it to the e-commerce server 120 or the search engine server. In operation 920, the identification server 130 uses the process 700 to identify the item depicted in the image. For example, the e-commerce server 120 may transfer the image to the identification server 130 for identification. In some example embodiments, the e-commerce server 120 and the identification server 130 are integrated, and the e-commerce server 120 identifies the item depicted in the image. Similarly, a search engine server (e.g., a server that locates documents, web pages, images, videos, or other files) may receive the image and use the identification server 130 to identify the media item depicted in the image.

In operation 930, the e-commerce server 120 or the search engine server provides the user with information regarding one or more items in response to receiving the image. The items are selected based on the identified item depicted in the image. For example, if the user uploads a picture of a book titled "The Last Mogul," sale listings for "The Last Mogul" listed through the e-commerce server 120 or 140 may be identified and provided to the user who supplied the image (e.g., sent over the network 170 to the device 150A for display to the user 160). As another example, if the user uploads a picture of "The Last Mogul" to a general search engine, web pages mentioning "The Last Mogul" can be identified, stores offering "The Last Mogul" for sale can be identified, video reviews of "The Last Mogul" can be identified, and one or more of these can be provided to the user (e.g., in a web page for display in a web browser of the user's device).

In accordance with various example embodiments, one or more of the methodologies described herein may facilitate identifying an item (e.g., a media item) depicted in an image. Moreover, one or more of the methodologies described herein may facilitate identifying items depicted in images more accurately than image classification alone or text classification alone. Furthermore, one or more of the methodologies described herein may facilitate identifying items depicted in images more quickly and with less computing power than previous methods.

When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in identifying items depicted in images. The effort expended by a user in ordering an item of interest may also be reduced by one or more of the methodologies described herein. For example, accurately identifying an item of interest to the user from an image can reduce the amount of time or effort the user spends in creating an item listing or in finding an item to purchase. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.

Software architecture

FIG. 10 is a block diagram 1000 illustrating an architecture of software 1002 that may be installed on any one or more of the devices described above. FIG. 10 is merely a non-limiting example of a software architecture, and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software 1002 may be implemented by hardware such as the machine 1100 of FIG. 11, which includes processors 1110, memory 1130, and input/output (I/O) components 1150. In this example architecture, the software 1002 can be conceptualized as a stack of layers, where each layer provides a particular functionality. For example, the software 1002 includes layers such as an operating system 1004, libraries 1006, frameworks 1008, and applications 1010. Operationally, according to some implementations, the applications 1010 invoke application programming interface (API) calls 1012 through the software stack and receive messages 1014 in response to the API calls 1012.

In various implementations, the operating system 1004 manages hardware resources and provides common services. The operating system 1004 includes, for example, a kernel 1020, services 1022, and drivers 1024. In some implementations, the kernel 1020 acts as an abstraction layer between the hardware and the other software layers. For example, the kernel 1020 provides memory management, processor management (e.g., scheduling), component management, networking, security settings, and other functionality. The services 1022 may provide other common services for the other software layers. The drivers 1024 are responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1024 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.

In some example implementations, the libraries 1006 provide a low-level common infrastructure that can be utilized by the applications 1010. The libraries 1006 may include system libraries 1030 (e.g., a C standard library) that can provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1006 may include API libraries 1032 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), and Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render two-dimensional (2D) and three-dimensional (3D) graphic content on a display), database libraries (e.g., SQLite to provide various relational database functions), web libraries (e.g., WebKit to provide web browsing functionality), and the like. The libraries 1006 can also include a wide variety of other libraries 1034 to provide many other APIs to the applications 1010.

According to some implementations, framework 1008 provides a high level common infrastructure that can be utilized by application 1010. For example, framework 1008 provides various graphical user interface (GUI) functions, high level resource management, high level location services, and the like. The framework 1008 may provide a broad spectrum of other APIs that may be utilized by the application 1010, some of which may be specific to a particular operating system or platform.

In an example embodiment, the applications 1010 include a home application 1050, a contacts application 1052, a browser application 1054, a book reader application 1056, a location application 1058, a media application 1060, a messaging application 1062, a game application 1064, and a broad assortment of other applications such as a third-party application 1066. According to some embodiments, the applications 1010 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 1010, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 1066 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, the third-party application 1066 can invoke the API calls 1012 provided by the mobile operating system 1004 to facilitate the functionality described herein.

Example Machine Architecture and Machine-readable Media

FIG. 11 is a block diagram illustrating components of a machine 1100, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 11 shows a schematic representation of the machine 1100 in the example form of a computer system, within which instructions 1116 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1100 to perform any one or more of the methodologies discussed herein may be executed. In alternative embodiments, the machine 1100 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1100 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1100 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smartphone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), another smart device, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1116, sequentially or otherwise, that specify actions to be taken by the machine 1100. Furthermore, while only a single machine 1100 is illustrated, the term "machine" shall also be taken to include a collection of machines 1100 that individually or jointly execute the instructions 1116 to perform any one or more of the methodologies discussed herein. In practice, particular embodiments of the machine 1100 may be better suited to the methodologies described herein. For example, any computing device with sufficient processing power may serve as the identification server 130, while an accelerometer, a camera, and cellular network connectivity are not directly related to the identification server 130 performing the image identification methods discussed herein. Accordingly, in some example embodiments, cost savings are realized by implementing the various described methodologies on machines 1100 that exclude additional functionality unnecessary to the performance of the tasks assigned to each machine 1100 (e.g., by implementing the identification server 130 in a server machine without the integrated sensors commonly found only on wearable or portable devices and without a directly connected display).

The machine 1100 may include processors 1110, memory 1130, and I/O components 1150, which can be configured to communicate with each other via a bus 1102. In an example embodiment, the processors 1110 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1112 and a processor 1114 that can execute the instructions 1116. The term "processor" is intended to include a multi-core processor that may comprise two or more independent processors (also referred to as "cores") that can execute instructions contemporaneously. Although FIG. 11 shows multiple processors, the machine 1100 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

The memory 1130 may include a main memory 1132, a static memory 1134, and a storage unit 1136, each accessible to the processors 1110 via the bus 1102. The storage unit 1136 may include a machine-readable medium 1138 on which are stored the instructions 1116 embodying any one or more of the methodologies or functions described herein. The instructions 1116 may also reside, completely or at least partially, within the main memory 1132, within the static memory 1134, within at least one of the processors 1110 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1100. Accordingly, in various implementations, the main memory 1132, the static memory 1134, and the processors 1110 are considered machine-readable media 1138.

As used herein, the term "memory" refers to a machine-readable medium 1138 able to store data temporarily or permanently, and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1138 is shown in an example embodiment to be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1116. The term "machine-readable medium" shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., the instructions 1116) for execution by a machine (e.g., the machine 1100), such that the instructions, when executed by one or more processors of the machine (e.g., the processors 1110), cause the machine 1100 to perform any one or more of the methodologies discussed herein. Accordingly, a "machine-readable medium" refers to a single storage apparatus or device, as well as "cloud-based" storage systems or storage networks that include multiple storage apparatus or devices. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memory (e.g., flash memory), optical media, magnetic media, other non-volatile memory (e.g., erasable programmable read-only memory (EPROM)), or any suitable combination thereof.

The I/O components 1150 include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. In general, it will be appreciated that the I/O components 1150 may include many other components that are not shown in FIG. 11. The I/O components 1150 are grouped according to functionality merely to simplify the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 1150 include output components 1152 and input components 1154. The output components 1152 include visual components (e.g., a plasma display panel (PDP), a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor), other signal generators, and so forth. The input components 1154 include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., a physical button, a touch screen that provides the location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In some further example embodiments, the I/O components 1150 include biometric components 1156, motion components 1158, environmental components 1160, or position components 1162, among a wide array of other components. For example, the biometric components 1156 include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 1158 include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, rotation sensor components (e.g., a gyroscope), and so forth. The environmental components 1160 include, for example, illumination sensor components (e.g., a photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., a barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensor components (e.g., machine olfaction detection sensors, gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1162 include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication can be implemented using a wide variety of technologies. The I/O components 1150 may include communication components 1164 operable to couple the machine 1100 to a network 1180 or devices 1170 via a coupling 1182 and a coupling 1172, respectively. For example, the communication components 1164 include a network interface component or another suitable device to interface with the network 1180. In further examples, the communication components 1164 include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1170 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).

Moreover, in some implementations, the communication components 1164 detect identifiers or include components operable to detect identifiers. For example, the communication components 1164 include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional barcodes such as Universal Product Code (UPC) barcodes, multi-dimensional barcodes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, Uniform Commercial Code Reduced Space Symbology (UCC RSS)-2D barcodes, and other optical codes), acoustic detection components (e.g., microphones to identify tagged audio signals), or any suitable combination thereof. In addition, a variety of information can be derived via the communication components 1164, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

Transmission medium

In various example embodiments, one or more portions of the network 1180 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 1180 or a portion of the network 1180 may include a wireless or cellular network, and the coupling 1182 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1182 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) technology including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long-range protocols, or other data transfer technologies.

In example embodiments, the instructions 1116 are transmitted or received over the network 1180 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1164) and utilizing any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, in other example embodiments, the instructions 1116 are transmitted or received to the devices 1170 using a transmission medium via the coupling 1172 (e.g., a peer-to-peer coupling). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1116 for execution by the machine 1100, and includes digital or analog communications signals or other intangible media to facilitate communication of such software. A transmission medium is an embodiment of a machine-readable medium.

Language

Throughout this specification, multiple instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more individual operations may be performed concurrently, and the operations need not be performed in the order illustrated. Structures and functions presented as separate components in the example configurations may be implemented as integrated structures or components. Similarly, structures and functions presented as a single component can be implemented as separate components. Various variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Accordingly, the detailed description is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term "or" may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The following enumerated examples define methods, machine-readable media, and systems (e.g., apparatus) of various illustrative embodiments discussed herein.

Example 1. A system comprising:

A memory having instructions embodied thereon; and

One or more processors configured by the instructions to perform operations comprising:

Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;

Accessing a first image depicting a first item;

Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records;

Recognizing text in the first image;

Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records;

Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches;

Identifying a top candidate match of the merged set of candidate matches.

system.
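For illustration only, a minimal Python sketch of the two-branch matching pipeline recited in example 1 follows. The parameters image_matcher, text_recognizer, and text_matcher are hypothetical placeholders for the image-similarity, text-recognition, and text-matching steps (they are not defined by this application), and summing per-candidate scores during the merge is an assumption carried over from example 5.

from collections import defaultdict

def recognize_item(first_image, records, image_matcher, text_recognizer, text_matcher):
    """Return the identifier of the top candidate record for the item depicted in first_image.

    records         : list of dicts with "id", "image_data", and "text_data" keys
    image_matcher   : callable(image, image_data) -> score, or None for no match
    text_recognizer : callable(image) -> recognized text string
    text_matcher    : callable(text, text_data) -> score, or None for no match
    """
    # First set of candidate matches: visual similarity against stored image data.
    first_set = {}
    for record in records:
        score = image_matcher(first_image, record["image_data"])
        if score is not None:
            first_set[record["id"]] = score

    # Second set of candidate matches: recognized text against stored text data.
    recognized_text = text_recognizer(first_image)
    second_set = {}
    for record in records:
        score = text_matcher(recognized_text, record["text_data"])
        if score is not None:
            second_set[record["id"]] = score

    # Merge the two candidate sets; scores for the same candidate are summed.
    merged = defaultdict(float)
    for candidate_set in (first_set, second_set):
        for record_id, score in candidate_set.items():
            merged[record_id] += score

    # Top candidate match: the candidate with the highest merged score.
    return max(merged, key=merged.get) if merged else None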

Example 2. The system of example 1, wherein

The first image is associated with a user account,

The operations further comprise creating a listing in an electronic marketplace, wherein the listing is associated with the user account and the listing is for the top candidate match.

system.

Example 3. The system of example 1 or 2, wherein

Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner,

Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.

system.
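As a rough illustration only, and not the specific technique used by the described embodiments, the orientation-agnostic text extraction of example 3 could be approximated with an off-the-shelf OCR engine that detects page orientation before recognizing text. The sketch below assumes the pytesseract Python bindings and a local Tesseract installation; treating each non-empty OCR line as one cluster of text is likewise an assumption.

import re
from PIL import Image
import pytesseract  # assumes the Tesseract OCR engine is installed locally

def extract_text_clusters(image_path):
    """Extract clusters of text from an image regardless of how the text is oriented."""
    image = Image.open(image_path)
    try:
        # Tesseract's orientation and script detection reports the clockwise
        # rotation needed to make the text upright, e.g. "Rotate: 90".
        osd = pytesseract.image_to_osd(image)
        rotation = int(re.search(r"Rotate: (\d+)", osd).group(1))
    except pytesseract.TesseractError:
        rotation = 0  # fall back to the image as-is if orientation detection fails
    if rotation:
        # PIL rotates counterclockwise for positive angles, so negate.
        image = image.rotate(-rotation, expand=True)
    text = pytesseract.image_to_string(image)
    # Treat each non-empty line of recognized text as one cluster.
    return [line.strip() for line in text.splitlines() if line.strip()]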

Example 4. The system of example 3, wherein

The fixed size N is 3

system.
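Example 4 fixes the N-gram size at N = 3, i.e., character trigrams. The following minimal sketch extracts fixed-size character N-grams and scores a record by trigram overlap; the Jaccard-style ratio is an illustrative choice, since the application does not specify a scoring formula.

def char_ngrams(text, n=3):
    """Return the set of fixed-size character N-grams (trigrams when n=3) in text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}

def trigram_score(recognized_text, record_text, n=3):
    """Score a record by the overlap of character N-grams with the recognized text."""
    query_grams = char_ngrams(recognized_text, n)
    record_grams = char_ngrams(record_text, n)
    if not query_grams or not record_grams:
        return 0.0
    return len(query_grams & record_grams) / len(query_grams | record_grams)

For instance, trigram_score("nikon d3300", "Nikon D3300 DSLR camera body") returns a nonzero score because the two strings share trigrams such as "nik", "d33", and "330".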

Example 5. The system of any of examples 1-4, wherein

Generating the first set of candidate matches comprises generating a first score corresponding to each candidate match in the first set of candidate matches,

Generating the second set of candidate matches comprises generating a second score corresponding to each candidate match in the second set of candidate matches,

Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and

Identifying the top candidate match of the merged set of candidate matches comprises identifying the candidate match in the merged set of candidate matches having the highest summed score.

system.
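A minimal sketch of the score merging in example 5, assuming each candidate set is represented as a dictionary mapping a candidate identifier to its score. The example only specifies summation for candidates present in both sets; retaining single-set candidates with their lone score is an assumption made here.

def merge_candidate_sets(first_scores, second_scores):
    """Merge two {candidate_id: score} dictionaries by summing scores per candidate."""
    merged = dict(first_scores)
    for candidate_id, score in second_scores.items():
        merged[candidate_id] = merged.get(candidate_id, 0.0) + score
    return merged

def top_candidate(merged_scores):
    """Return the candidate with the highest summed score, or None if there are no candidates."""
    return max(merged_scores, key=merged_scores.get) if merged_scores else None

For example, merging {"A": 0.8, "B": 0.3} with {"A": 0.5, "C": 0.9} yields {"A": 1.3, "B": 0.3, "C": 0.9}, so the top candidate match is "A" with a summed score of 1.3.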

Example 6. The system of any of examples 1-5, wherein

The operations further comprise:

Receiving the first image from a client device as part of a search request;

Identifying a set of results based on the top candidate match;

In response to the search request, providing the set of results to the client device

system.

Example 7. The system of example 6, wherein

The set of results includes a set of item listings of items for sale.

system.
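A brief sketch of the search flow in examples 6 and 7, assuming a recognizer callable (for instance, the recognize_item sketch above with its record database bound in) and a hypothetical listings_index lookup that returns item listings of items for sale; neither name comes from this application.

def handle_image_search(first_image, recognizer, listings_index):
    """Handle a search request containing an image and return a set of results."""
    top_match = recognizer(first_image)   # top candidate match for the depicted item
    if top_match is None:
        return []                         # no candidate match; return an empty result set
    return listings_index(top_match)      # item listings of items for sale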

Example 8. A computer-implemented method comprising:

Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;

Accessing a first image depicting a first item;

Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records;

Recognizing text in the first image;

Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records;

Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches;

Identifying a top candidate match of the merged set of candidate matches.

Computer-implemented method.

Example 9. The method of example 8, wherein

The first image is associated with a user account,

The method further comprises creating a listing in an electronic marketplace, wherein the listing is associated with the user account and the listing is for the top candidate match.

Computer-implemented method.

Example 10. The method of example 8 or 9, wherein

Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner;

Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.

Computer-implemented method.

Example 11. The method of example 10, wherein

The fixed size N is 3

Computer-implemented method.

Example 12. The method of any of examples 8-11, wherein

Generating the first set of candidate matches comprises generating a first score corresponding to each candidate match in the first set of candidate matches,

Generating the second set of candidate matches comprises generating a second score corresponding to each candidate match in the second set of candidate matches,

Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and

Identifying the top candidate match of the merged set of candidate matches includes identifying the candidate match in the merged set of candidate matches having the highest summed score.

Computer-implemented method.

Example 13. The method of any of examples 8-12, further comprising:

Receiving the first image from a client device as part of a search request;

Identifying a set of results based on the top candidate match;

In response to the search request, providing the set of results to the client device;

Computer-implemented method.

Example 14. The method of example 13, wherein

The set of results includes a set of item listings of items for sale.

Computer-implemented method.

Example 15. A machine-readable medium comprising instructions executable by one or more processors of a machine to cause the machine to perform the method of any of examples 8-14.

Claims (15)

  1. A system comprising:
    A memory having instructions embodied thereon; and
    One or more processors configured by the instructions to perform operations comprising:
    Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;
    Accessing a first image depicting a first item, wherein the first image is associated with creating a listing in an electronic marketplace;
    Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records, wherein each candidate match in the first set of candidate matches has a first score;
    Recognizing text in the first image;
    Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records, wherein each candidate match in the second set of candidate matches has a second score;
    Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches, wherein each candidate match within the merged set of candidate matches has a merged score generated from the first score and the second score of that candidate match;
    Identifying a top candidate match of the merged set of candidate matches based on the merged score of each candidate match within the merged set of candidate matches; and
    Generating a listing user interface that includes the top candidate match of the merged set of candidate matches as a selectable option.
    system.
  2. The system of claim 1, wherein
    The first image is associated with a user account,
    The operations further comprise creating the listing in the electronic marketplace using a selection of the option for the top candidate match of the merged set of candidate matches, the listing being associated with the user account.
    system.
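Purely as an illustration of claims 1 and 2, the listing user interface that offers the top candidate match as a selectable option, and the creation of a listing from the selected option, might be sketched as follows. The field names, prompt text, and the marketplace object with its create_listing method are hypothetical and are not taken from this application.

def build_listing_ui(top_match_record):
    """Build a listing user interface payload that offers the top candidate match as a selectable option."""
    return {
        "prompt": "Is this the item you want to list?",
        "options": [
            {
                "record_id": top_match_record["id"],
                "label": top_match_record["text_data"],
                "selectable": True,
            },
        ],
    }

def create_listing_from_selection(selected_option, user_account, marketplace):
    """Create a marketplace listing from the selected option, associated with the user account."""
    return marketplace.create_listing(
        record_id=selected_option["record_id"],
        user_account=user_account,
    )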
  3. The system of claim 1, wherein
    Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner,
    Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.
    system.

  4. The system of claim 3, wherein
    The fixed size N is 3
    system.
  5. The system of claim 1, wherein
    Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and
    Identifying the top candidate match of the merged set of candidate matches comprises identifying the candidate match in the merged set of candidate matches having the highest summed score.
    system.
  6. The system of claim 1, wherein
    The operations further comprise:
    Receiving the first image from a client device as part of a search request;
    Identifying a set of results based on the top candidate match;
    In response to the search request, providing the set of results to the client device
    system.
  7. The system of claim 6, wherein
    The set of results includes a set of item listings of items for sale.
    system.
  8. A computer-implemented method comprising:
    Storing a plurality of records for a plurality of corresponding items, each record of the plurality of records comprising text data and image data for an item corresponding to the record;
    Accessing a first image depicting a first item, wherein the first image is associated with creating a listing in an electronic marketplace;
    Generating a first set of candidate matches for the first item from the plurality of items based on the first image and the image data of the plurality of records, wherein each candidate match in the first set of candidate matches has a first score;
    Recognizing text in the first image;
    Generating a second set of candidate matches for the first item from the plurality of items based on the recognized text and the text data of the plurality of records, wherein each candidate match in the second set of candidate matches has a second score;
    Merging the first set of candidate matches and the second set of candidate matches into a merged set of candidate matches, wherein each candidate match within the merged set of candidate matches has a merged score generated from the first score and the second score of that candidate match;
    Identifying a top candidate match of the merged set of candidate matches based on the merged score of each candidate match in the merged set of candidate matches; and
    Generating a listing user interface that includes the top candidate match of the merged set of candidate matches as a selectable option.
    Computer-implemented method.
  9. The method of claim 8, wherein
    The first image is associated with a user account,
    The method further comprises creating the listing in the electronic marketplace using a selection of the option for the top candidate match of the merged set of candidate matches, the listing being associated with the user account.
    Computer-implemented method.
  10. The method of claim 8, wherein
    Recognizing the text includes extracting a cluster of text in an orientation-agnostic manner;
    Generating the second set of candidate matches includes matching character N-grams of a fixed size N in the cluster of text.
    Computer-implemented method.
  11. The method of claim 10, wherein
    The fixed size N is 3
    Computer-implemented method.
  12. The method of claim 8, wherein
    Merging the first set of candidate matches and the second set of candidate matches into the merged set of candidate matches comprises, for each candidate match included in both the first set of candidate matches and the second set of candidate matches, summing the first score and the second score corresponding to the candidate match, and
    Identifying the top candidate match of the merged set of candidate matches includes identifying the candidate match in the merged set of candidate matches having the highest summed score.
    Computer-implemented method.
  13. The method of claim 8, further comprising:
    Receiving the first image from a client device as part of a search request;
    Identifying a set of results based on the top candidate match;
    In response to the search request, providing the set of results to the client device;
    Computer-implemented method.

  14. The method of claim 13, wherein
    The set of results includes a set of item listings of items for sale.
    Computer-implemented method.
  15. A non-transitory machine-readable medium comprising instructions executable by one or more processors of a machine to cause the machine to perform the method of any one of claims 8-14.
KR1020177023364A 2015-01-23 2016-01-08 Recognize items depicted by images KR102032038B1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US201562107095P true 2015-01-23 2015-01-23
US62/107,095 2015-01-23
US14/973,582 US20160217157A1 (en) 2015-01-23 2015-12-17 Recognition of items depicted in images
US14/973,582 2015-12-17
PCT/US2016/012691 WO2016118339A1 (en) 2015-01-23 2016-01-08 Recognition of items depicted in images

Publications (2)

Publication Number Publication Date
KR20170107039A KR20170107039A (en) 2017-09-22
KR102032038B1 true KR102032038B1 (en) 2019-10-14

Family

ID=56417585

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020177023364A KR102032038B1 (en) 2015-01-23 2016-01-08 Recognize items depicted by images

Country Status (5)

Country Link
US (1) US20160217157A1 (en)
EP (1) EP3248142A4 (en)
KR (1) KR102032038B1 (en)
CN (1) CN107430691A (en)
WO (1) WO2016118339A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10424052B2 (en) * 2015-09-15 2019-09-24 Peking University Shenzhen Graduate School Image representation method and processing device based on local PCA whitening
CN106326902B (en) * 2016-08-30 2019-05-14 广西师范大学 Image search method based on conspicuousness structure histogram
US20180107682A1 (en) * 2016-10-16 2018-04-19 Ebay Inc. Category prediction from semantic image clustering
US20180137551A1 (en) * 2016-11-11 2018-05-17 Ebay Inc. Intelligent online personal assistant with image text localization
CN106777177A (en) * 2016-12-22 2017-05-31 百度在线网络技术(北京)有限公司 Search method and device
US10115016B2 (en) * 2017-01-05 2018-10-30 GM Global Technology Operations LLC System and method to identify a vehicle and generate reservation
US20190156403A1 (en) * 2017-11-17 2019-05-23 Ebay Inc. Rendering of object data based on recognition and/or location matching

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080267504A1 (en) * 2007-04-24 2008-10-30 Nokia Corporation Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search
JP4607633B2 (en) * 2005-03-17 2011-01-05 株式会社リコー Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method
US20110238659A1 (en) * 2010-03-29 2011-09-29 Ebay Inc. Two-pass searching for image similarity of digests of image-based listings in a network-based publication system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5404507A (en) * 1992-03-02 1995-04-04 At&T Corp. Apparatus and method for finding records in a database by formulating a query using equivalent terms which correspond to terms in the input query
US8775436B1 (en) * 2004-03-19 2014-07-08 Google Inc. Image selection for news search
US7809192B2 (en) * 2005-05-09 2010-10-05 Like.Com System and method for recognizing objects from images and identifying relevancy amongst images and information
US7949191B1 (en) * 2007-04-04 2011-05-24 A9.Com, Inc. Method and system for searching for information on a network in response to an image query sent by a user from a mobile communications device
US9495386B2 (en) * 2008-03-05 2016-11-15 Ebay Inc. Identification of items depicted in images
US7991646B2 (en) * 2008-10-30 2011-08-02 Ebay Inc. Systems and methods for marketplace listings using a camera enabled mobile device
US8478052B1 (en) * 2009-07-17 2013-07-02 Google Inc. Image classification
US8761512B1 (en) * 2009-12-03 2014-06-24 Google Inc. Query by image
US9323784B2 (en) * 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US9378290B2 (en) * 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor
US8935246B2 (en) * 2012-08-08 2015-01-13 Google Inc. Identifying textual terms in response to a visual query
US9830632B2 (en) * 2012-10-10 2017-11-28 Ebay Inc. System and methods for personalization and enhancement of a marketplace
US8635124B1 (en) * 2012-11-28 2014-01-21 Ebay, Inc. Message based generation of item listings

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4607633B2 (en) * 2005-03-17 2011-01-05 株式会社リコー Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method
US20080267504A1 (en) * 2007-04-24 2008-10-30 Nokia Corporation Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search
US20110238659A1 (en) * 2010-03-29 2011-09-29 Ebay Inc. Two-pass searching for image similarity of digests of image-based listings in a network-based publication system

Also Published As

Publication number Publication date
EP3248142A4 (en) 2017-12-13
EP3248142A1 (en) 2017-11-29
US20160217157A1 (en) 2016-07-28
WO2016118339A1 (en) 2016-07-28
CN107430691A (en) 2017-12-01
KR20170107039A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
US9836890B2 (en) Image based tracking in augmented reality systems
US20170206707A1 (en) Virtual reality analytics platform
WO2016044424A1 (en) Geolocation-based pictographs
US20170161382A1 (en) System to correlate video data and contextual data
US20160139662A1 (en) Controlling a visual device based on a proximity between a user and the visual device
US9659244B2 (en) Custom functional patterns for optical barcodes
US20150058239A1 (en) Item-based social discovery
US10242258B2 (en) Organizational data enrichment
Nguyen et al. Recognition of activities of daily living with egocentric vision: A review
US10453111B2 (en) Data mesh visualization
US20160125490A1 (en) Transferring authenticated sessions and states between electronic devices
US10055489B2 (en) System and method for content-based media analysis
US10198671B1 (en) Dense captioning with joint interference and visual context
KR20170077183A (en) Hierarchical deep convolutional neural network
US9436883B2 (en) Collaborative text detection and recognition
US9858492B2 (en) System and method for scene text recognition
KR20180006951A (en) Local Augmented Reality Sticky Object
US20160203525A1 (en) Joint-based item recognition
Zhao et al. Wearable device-based gait recognition using angle embedded gait dynamic images and a convolutional neural network
Hua et al. Introduction to the special issue on mobile vision
US10198635B2 (en) Systems and methods for associating an image with a business venue by using visually-relevant and business-aware semantics
US9904871B2 (en) Deep convolutional neural network prediction of image professionalism
US20180107644A1 (en) Correction of user input
US20190266390A1 (en) Automated avatar generation
US10346723B2 (en) Neural network for object detection in images

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant