KR20180111979A

KR20180111979A - Semantic category classification

Info

Publication number: KR20180111979A
Application number: KR1020187026111A
Authority: KR
Inventors: Mingkuan Liu
Original assignee: eBay Inc.
Priority date: 2016-02-11
Filing date: 2017-02-10
Publication date: 2018-10-11
Also published as: US20170235824A1; US10599701B2; US20200218750A1; CN108701118A; WO2017139575A1; CN108701118B; US11227004B2

Abstract

According to an example embodiment, large-scale category classification based on sequence semantic embedding and parallel learning is described. In one example, one or more closest matches are identified by comparing (i) a publication semantic vector corresponding to at least part of a publication, the publication semantic vector being based on a first machine learning model that projects the at least part of the publication into a semantic vector space, and (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories.

Description

Semantic category classification

Cross-reference to related application

This application claims the benefit of priority of U.S. Provisional Application No. 62/293,922, filed February 11, 2016, the entirety of which is incorporated herein by reference.

Embodiments of the present invention relate generally to a large-scale category classification and recommendation system (CatReco) based on sequence semantic embedding and parallel learning.

Properly categorizing publications (e.g., products and/or services) in a publication corpus is important for helping a system recommend publications in response to user queries. Publication descriptions are used to index publications in the system so that potential users can locate them through their queries.

The various accompanying drawings merely illustrate example embodiments of the present invention and are not intended to limit its scope.
FIG. 1 is a block diagram illustrating a networked system, in accordance with some example embodiments.
FIG. 2 is a block diagram illustrating the listing system of FIG. 1 in greater detail, in accordance with an example embodiment.
FIGS. 3A and 3B are user interfaces of the listing system, used to provide a listing title and select a category for that listing title, in accordance with an example embodiment.
FIG. 4 shows a simple example of matching a source semantic vector to its nearest target semantic vector.
FIG. 5A shows a flow diagram of using Sequence Semantic Embedding (SSE) to provide at least one CatReco to a user.
FIG. 5B shows a flow diagram of using SSE to provide a recall set of leaf category (LeafCat) identifiers (IDs) as a service, in accordance with an example embodiment.
FIG. 6A shows a flow diagram of a runtime process for performing runtime classification for the basic SSE CatReco service, in accordance with an example embodiment.
FIG. 6B shows a flow diagram of an offline process for precomputing the semantic vectors of the targets for the basic SSE CatReco service, in accordance with an example embodiment.
FIG. 6C shows a flow diagram for performing runtime classification for the basic SSE CatReco service, in accordance with another example embodiment.
FIG. 6D shows a flow diagram for performing the basic SSE CatReco service, including online and offline components, in accordance with an example embodiment.
FIG. 7 shows a flow diagram of a method for training an SSE model for the basic SSE CatReco service, in accordance with an example embodiment.
FIG. 8 shows a flow diagram of a method for deriving labeled training data used to train the SSE model of the basic SSE CatReco service, in accordance with an example embodiment.
FIG. 9 shows a flow diagram for training an SSE model for the basic SSE CatReco service, in accordance with another example embodiment.
FIG. 10 shows a flow diagram for performing an SSE-statistical language modeling (SLM)-gradient boosting machine (GBM) runtime process to generate CatRecos, in accordance with an example embodiment.
FIG. 11 shows a flow diagram for performing an SSE-SLM re-ranking runtime process, in accordance with an example embodiment.
FIG. 12 shows a flow diagram for performing a first part of an SSE-SLM-GBM offline training process, in accordance with an example embodiment.
FIG. 13 shows a flow diagram for performing a second part of the SSE-SLM-GBM offline training process, in accordance with an example embodiment.
FIG. 14 is a block diagram illustrating an example of a software architecture that may be installed on a machine, in accordance with some example embodiments.
FIG. 15 shows a schematic diagram of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, in accordance with an example embodiment.
FIG. 16 illustrates an example method for comparing and identifying relevant categories of a publication.

The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the terms used.

The following description includes systems, methods, techniques, instruction sequences, and computing machine program products that embody example embodiments of the disclosure. In the following description, for purposes of explanation, numerous specific details are set forth to provide an understanding of various embodiments of the inventive subject matter. It will be evident to those skilled in the art, however, that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.

In a publication corpus, a very large-scale category taxonomy has been established to organize billions of different publications (product offers) at a fine granularity. Category classification systems are often used to help sellers categorize publication listings based on a few title keywords.

Various embodiments describe a parallel learning framework that automatically extracts very large volumes of labeled data (e.g., billions of samples) from unsupervised user logs and uses them to train supervised machine learning models.
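As a rough, single-process illustration of the label-extraction step, the sketch below turns raw listing logs into deduplicated (title, LeafCat ID) training pairs. The log layout (session ID, listing title, chosen LeafCat ID), the text normalization, and the `min_count` noise filter are hypothetical assumptions; the source does not specify how the framework filters or distributes its log processing.

```python
from collections import Counter


def extract_labeled_pairs(logs, min_count=2):
    """Turn raw listing logs into sorted (title, LeafCat ID) training pairs.

    Each log record is assumed (hypothetically) to be a tuple of
    (session_id, listing_title, chosen_leafcat_id).
    """
    counts = Counter()
    for _session_id, title, leafcat_id in logs:
        # Normalize case and collapse whitespace so near-duplicate titles merge.
        normalized = " ".join(title.lower().split())
        if normalized:
            counts[(normalized, leafcat_id)] += 1
    # Keep only pairs seen at least `min_count` times (a simple noise filter).
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Toy logs with made-up LeafCat IDs.
logs = [
    ("s1", "Apple iPhone 7 32GB", 101),
    ("s2", "apple iphone 7  32gb", 101),
    ("s3", "Nike Air Max 90", 202),
]
print(extract_labeled_pairs(logs))
```

Here the two spellings of the iPhone title merge into one pair observed twice, while the single Nike record falls below the assumed `min_count` and is dropped.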

In an example embodiment, a sequence semantic embedding (SSE) method is used to encode a listing title (e.g., the title keywords of a publication to be listed) and a category tree path into semantic vector representations, as a <source sequence, target sequence> pair. The vector distance between the source and target semantic vector representations can be used as a similarity measure to obtain a classification recall candidate set. The classification recall candidate set may represent a number of LeafCats in the category tree, identified by their LeafCat IDs.
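A minimal sketch of the recall step follows, assuming the source (listing title) and target (category tree path) semantic vectors have already been produced by trained SSE encoders, which are not shown. Cosine similarity is used here as the vector-distance measure; that choice, the function names, and the toy vectors and LeafCat IDs are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_candidates(source_vec, target_vecs, k=3):
    """Return the k (LeafCat ID, similarity) pairs closest to the source vector.

    source_vec:  semantic vector of the listing title (from an SSE encoder).
    target_vecs: dict mapping LeafCat ID -> precomputed semantic vector of
                 that category's tree path.
    """
    scored = [(cosine(source_vec, vec), leafcat) for leafcat, vec in target_vecs.items()]
    # Highest similarity first; ties broken by LeafCat ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(leafcat, score) for score, leafcat in scored[:k]]


# Toy 3-dimensional vectors; real SSE vectors would be much larger.
targets = {
    101: [0.9, 0.1, 0.0],
    202: [0.0, 1.0, 0.0],
    303: [0.7, 0.7, 0.1],
}
title_vec = [1.0, 0.2, 0.0]
print(recall_candidates(title_vec, targets, k=2))
```

With these toy vectors, LeafCat 101 is recalled first and 303 second, because their target vectors point in nearly the same direction as the title vector; at corpus scale an approximate nearest-neighbor index would typically replace the linear scan.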

In another embodiment, a language model is trained for every category (e.g., LeafCat) so that the classification recall candidate set can be re-ranked using a GBM ensemble of signals from the sentence embedding similarity score (derived using SSE) and the language model perplexity score (derived using statistical language modeling (SLM)). The CatReco results produced by this combined SSE-SLM-GBM approach are substantially better than those of various other approaches. For example, in benchmark tests using more than 370,000 samples covering more than 19,000 different LeafCats, system response time improved by more than 10x (e.g., from ~200 ms to ~20 ms), and classification error was reduced by 24.8% for the top-1 CatReco, by 31.12% for the top-3 CatRecos, and by 54.52% for the top-10 CatRecos.
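The re-ranking idea can be sketched as fusing the two per-candidate signals into a single score. In the described system a trained GBM learns this combination; the sketch below deliberately substitutes a fixed linear score with hand-picked, hypothetical weights, so it illustrates the data flow rather than the actual model. Perplexity enters through its logarithm with a negative weight, because a lower perplexity means the title is more typical of that category.

```python
import math


def fused_score(sse_similarity, lm_perplexity, w_sim=4.0, w_ppl=-1.0, bias=0.0):
    """Linear stand-in for the GBM ensemble: higher is better.

    The weights are hypothetical hand-picked values, not learned ones.
    """
    return bias + w_sim * sse_similarity + w_ppl * math.log(lm_perplexity)


def rerank(candidates, **weights):
    """Re-rank (leafcat_id, sse_similarity, lm_perplexity) tuples by fused score."""
    ranked = sorted(candidates, key=lambda c: fused_score(c[1], c[2], **weights), reverse=True)
    return [leafcat for leafcat, _sim, _ppl in ranked]


# Recall set from the SSE step, each with a perplexity from that category's LM.
candidates = [
    (101, 0.90, 400.0),
    (303, 0.85, 30.0),
    (202, 0.40, 25.0),
]
print(rerank(candidates))
```

In this toy example, SSE similarity alone would rank LeafCat 101 first, but its much higher perplexity pushes it to last place after fusion, giving the order 303, 202, 101. A learned GBM can capture non-linear interactions between the signals that this linear stand-in cannot.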

The accuracy of CatReco, and especially the accuracy of the top-1 recommended leaf category (LeafCat), can directly affect the overall experience of users (e.g., buyers and/or sellers), because several important pieces of information about a publication, such as seller tags, listing fees, and product matching, depend on the publication's LeafCat. In addition, the accuracy of identifying the top-1 recommended LeafCat is often a bottleneck in Business-to-Consumer (B2C) automated categorization flows. The accuracy of the publication system's top-1 CatReco can directly affect gross merchandise volume (GMV), which represents the total dollar value of merchandise sold through a particular marketplace during a particular time period.
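For reference, top-k classification error is commonly defined as the fraction of samples whose correct category does not appear among the first k recommendations, so top-1 error is simply the miss rate of the single best recommendation. The sketch below assumes that definition; the source does not state exactly how its benchmark error figures were computed, and the toy IDs are made up.

```python
def top_k_error(predictions, gold, k):
    """Fraction of samples whose correct LeafCat is absent from the top-k list.

    predictions[i] is a ranked list of LeafCat IDs for sample i;
    gold[i] is the correct LeafCat ID for sample i.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal length")
    misses = sum(1 for ranked, truth in zip(predictions, gold) if truth not in ranked[:k])
    return misses / len(gold)


predictions = [[101, 202, 303], [202, 101, 303], [303, 202, 101], [404, 505, 606]]
gold = [101, 101, 101, 101]
for k in (1, 2, 3):
    print(k, top_k_error(predictions, gold, k))
```

For these four toy samples the error falls from 0.75 at k=1 to 0.5 at k=2 and 0.25 at k=3, which shows why top-1 accuracy is the hardest target to improve.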

Referring to FIG. 1, an example embodiment of a high-level client-server-based network architecture 100 is shown. A networked system 102, in the example form of a network-based publication system or payment system, provides server-side functionality via a network 104 (e.g., the Internet or a wide area network (WAN)) to one or more client devices 110. FIG. 1 illustrates, for example, a web client 112 (e.g., a browser, such as Internet Explorer developed by Microsoft Corporation of Redmond, Washington), a client application 114, and a programmatic client 116 executing on the client device 110.

The client device 110 may comprise, but is not limited to, a mobile phone, desktop computer, laptop, personal digital assistant (PDA), smartphone, tablet, ultrabook, netbook, multiprocessor system, microprocessor-based or programmable consumer electronics device, game console, set-top box, or any other communication device that a user may use to access the networked system 102. In some embodiments, the client device 110 may include a display module (not shown) to display information (e.g., in the form of user interfaces). In further embodiments, the client device 110 may include one or more of a touch screen, accelerometer, gyroscope, camera, microphone, Global Positioning System (GPS) device, and so forth. The client device 110 may be a user's device used to perform transactions involving digital publications within the networked system 102.
In one embodiment, the networked system 102 is a network-based marketplace that, in response to product listing requests, publishes publications comprising lists of products available on the network-based marketplace and manages payments for transactions in that marketplace. One or more portions of the network 104 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a WAN, a wireless WAN, a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, another type of network, or a combination of two or more such networks.

Each of the client devices 110 may include one or more applications (also referred to as "apps") such as, but not limited to, a web browser, a messaging application, an electronic mail (email) application, and a publication system application (also referred to as a marketplace application). In some embodiments, if the publication system application is included in a given one of the client devices 110, that application is configured to provide the user interface and at least some of the functionality locally, and to communicate with the networked system 102, as needed, for data or processing capabilities that are not available locally (e.g., access to a database of publications available for sale, user authentication, payment method verification). Conversely, if the publication system application is not included in the client device 110, the client device 110 may use its web browser to access the publication system (or a variant thereof) hosted on the networked system 102.

One or more users 106 may be a person, a machine, or another means of interacting with the client device 110. In example embodiments, the user 106 is not part of the network architecture 100 but may interact with the network architecture 100 via the client device 110 or other means. For instance, the user provides input (e.g., touch screen input or alphanumeric input) to the client device 110, and the input is communicated to the networked system 102 via the network 104. In this instance, the networked system 102, in response to receiving the input from the user, communicates information to the client device 110 via the network 104 to be presented to the user. In this way, the user can interact with the networked system 102 using the client device 110.

An application program interface (API) server 120 and a web server 122 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 140. The application servers 140 may host one or more publication systems 142 and payment systems 144, each of which may comprise one or more modules or applications and each of which may be embodied as hardware, software, firmware, or any combination thereof. The application servers 140 are, in turn, shown coupled to one or more database servers 124 that facilitate access to one or more information storage repositories or databases 126. In an example embodiment, the databases 126 are storage devices that store information (e.g., publications or listings) to be posted to the publication system 142. The databases 126 may also store digital publication information, in accordance with example embodiments.

Additionally, a third-party application 132, executing on a third-party server 130, is shown as having programmatic access to the networked system 102 via the programmatic interface provided by the API server 120. For example, the third-party application 132, using information retrieved from the networked system 102, supports one or more features or functions on a web system hosted by the third party. The third-party web system, for example, provides one or more promotional, marketplace, or payment functions that are supported by the relevant applications of the networked system 102.

The publication system 142 may provide a number of publication functions and services to users 106 who access the networked system 102. Likewise, the payment system 144 may provide a number of functions to perform or facilitate payments and transactions. Although the publication system 142 and the payment system 144 are both shown in FIG. 1 as forming part of the networked system 102, in alternative embodiments each system 142, 144 may form part of a separate service that is distinct from the networked system 102. In some embodiments, the payment system 144 may form part of the publication system 142.

The listing system 150 provides functionality operable to perform various aspects of listing publications for sale using user-selected data. In various embodiments, a seller may list a publication (using the listing system 150) by providing a title or description of the publication being listed. The title may be referred to as the listing title and is used by the listing system 150 (or other components within the publication system 142) to provide CatRecos for the listed publication. In other embodiments, the listing system 150 may access user-selected data from the databases 126, the third-party servers 130, the publication system 142, and other sources. In some example embodiments, the listing system 150 analyzes user data to personalize results to user preferences. As the user adds more content to a category, the listing system 150 can further refine the personalization. In some example embodiments, the listing system 150 communicates with the publication system 142 (e.g., to access publication listings) and the payment system 144. In other embodiments, the listing system 150 is part of the publication system 142.

Further, while the client-server-based network architecture 100 shown in FIG. 1 employs a client-server architecture, the present inventive subject matter is of course not limited to such an architecture and could equally well find application in, for example, a distributed or peer-to-peer architecture system. The various publication system 142, payment system 144, and listing system 150 could also be implemented as standalone software programs that do not necessarily have networking capabilities.

The web client 112 may access the various publication and payment systems 142 and 144 via the web interface supported by the web server 122. Similarly, the programmatic client 116 accesses the various services and functions provided by the publication and payment systems 142 and 144 via the programmatic interface provided by the API server 120. The programmatic client 116 may, for example, be a seller application (e.g., the Turbo Lister application developed by eBay® Inc. of San Jose, California) that enables sellers to author and manage listings on the networked system 102 in an offline manner and to perform batch-mode communications between the programmatic client 116 and the networked system 102.


FIG. 2 is a block diagram illustrating the listing system 150 of FIG. 1 in more detail, in accordance with an example embodiment. Here, the listing system 150 includes a listing server 200 that performs back-end processes related to the listing of publications. The listing system 150 includes, among other components, a category recommendation (CatReco) component 202. A user device 204 may be used directly by a user to list a publication for sale by interacting with a listing user interface 206 to provide details of the publication to be listed. The listing user interface 206 communicates this information to the listing server 200. This process may be interactive in nature. For example, certain inputs entered by the user via the listing user interface 206 are transmitted to the listing server 200, at which point the listing server 200 provides feedback that can prompt the user to alter or add to the listing information provided.

For the purposes of this disclosure, the discussion is limited to the CatReco aspect of the listing server 200, as implemented by the CatReco component 202. In one embodiment, a user may enter a title or other text via the listing user interface 206, and that input may then be passed to the CatReco component 202. The CatReco component 202 may then provide an ordered list of suggested categories for the publication listing, from which the user may select via the listing user interface 206. In another example embodiment, a user (e.g., a B2C seller) may upload a list of publications to be listed by the listing system 150. The list includes a listing title and a category (based on the seller's taxonomy) for each entry. The CatReco component 202 may then automatically map each entry's category (based on the seller's taxonomy) to a category based on the taxonomy of the publication system 142. The seller may provide the seller's taxonomy in the inventory list the seller supplies (e.g., entries with listing titles and categories), or may provide a copy of the seller's taxonomy for uploading to the publication system 142.

Various embodiments of the CatReco component 202 (a combination of components of the listing system 150 and the publication system 142) leverage SSE together with SLM re-ranking and GBM methods to provide accurate, robust, and fast category recommendations for publications being listed.

The listing user interface 206 may take many forms. In one example embodiment, the listing user interface 206 is a web page executed by a web browser on the user device 204. In another example embodiment, the listing user interface 206 is a mobile application installed on a mobile device. FIGS. 3A and 3B illustrate examples of user interfaces generated by the listing user interface 206 to list publications and to select categories for the listed publications.

The listing server 200 can also be accessed by a third-party service 208 via a listing API 210. An example of a third-party service 208 is a web system that aids sellers in the listing process by listing publications on their behalf. The listing API 210 may be specifically designed to interact with the listing server 200 and may be distributed to multiple third-party services 208.

Once the user has selected a category for the listing (due at least in part to the CatReco component 202), or the listing system has automatically mapped the category from the seller's taxonomy to the taxonomy of the publication system 142, the listing server 200 sends the publication listing to the item management server 212. The item management server 212 manages the process of publishing the listing by storing it in the listing database 214. This may be achieved through a distributed architecture such as Hadoop.

The model server 216 may then obtain listing information from the listing database 214 to perform offline training that creates and/or modifies the models (including the LeafCat models) used by the CatReco component 202 when recommending categories to users. As described above, a language model is trained for every category (e.g., every LeafCat) so that the candidate recall set for classification can be re-ranked with a GBM ensemble of signals from the sentence-embedding similarity score (derived using SSE modeling) and the language-model perplexity score (derived using SLM). In various embodiments, the model server 216 provides the functionality for training the various models used to compute SSE-SLM-GBM CatReco results. In some embodiments, the model server 216 may obtain information for performing offline training of the SSE model.

In various embodiments, SSE is used to encode a sequence of symbols (such as a phrase, sentence, or paragraph) into a continuous, multi-dimensional vector space in which semantically similar sequences have closer representations. This SSE approach automatically captures the deep latent semantic meaning of a listing title and projects it into a shared multi-dimensional vector space.

Deep learning has recently shown great promise in natural language processing (NLP). NLP researchers are exploring various ways to encode sequences of symbols (e.g., phrases, sentences, paragraphs, and documents) into a multi-dimensional vector space called a semantic space, in which semantically similar sequences have closer representations. This research has led to vector-space representations of whole sentences rather than only of individual words. In general, a phrase or sentence conveys contextual information better than a single word does. In various embodiments, this research on sentence embedding is leveraged to recommend categories for the publications that sellers list on the publication system.

In an exemplary embodiment, SSE is used to embed the deep latent semantic meaning of a given listing title and project it into a shared semantic vector space. A vector space is a set of objects called vectors. A vector space is characterized by its dimension, which is the number of independent directions in the space. A semantic vector space can represent phrases and sentences and can capture their meaning for NLP tasks.

Similarly, using a different projection function, SSE is used to embed the deep latent semantic meaning of a given category tree path (i.e., from the top level down to the leaf level) and project it into the same shared semantic vector space. This SSE approach captures contextual information and deep semantic meaning from listing titles, enabling CatRecos to handle many word-level mismatches, such as synonyms, typos, compound words, and split words.

In an exemplary embodiment, at system runtime, an incoming listing title is projected into the shared semantic vector space, and the listing system recommends the LeafCat whose SSE representation is closest, taken from an offline, pre-computed list of SSE representations for the leaf categories of the taxonomy used by the listing system. In another exemplary embodiment, at system runtime, the incoming listing title is projected into the shared semantic vector space, and the listing system recommends a set of leaf categories that is used as input to other services of the listing system to produce CatReco results and scores. These other services may include, for example, an SLM re-ranking service or a GBM fusion prediction service.

Various deep semantic models are trained to project semantically similar phrases to vectors that are close together and semantically different phrases to vectors that are far apart. During training, if a listing title T should be classified into LeafCat C₁, the projected semantic vectors of T and C₁ should be as close as possible, i.e., ||SSE(T) - SSE(C₁)|| should be minimized. Conversely, for any other leaf category Cₙ, the projected semantic vectors should be as far apart as possible, i.e., ||SSE(T) - SSE(Cₙ)|| should be maximized. During training, the cosine similarity between semantic vectors may be computed, so the semantic relevance between two vectors can be measured by their cosine similarity.
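The paragraph above states the training goal (pull SSE(T) toward SSE(C₁), push it away from every SSE(Cₙ)) and says cosine similarity is computed, but it does not give the loss function. One common way to turn that goal into a differentiable objective, assumed here rather than taken from the source, is a softmax over cosine similarities scaled by a smoothing factor `gamma` (a hypothetical hyperparameter), with the loss being the negative log-probability of the correct leaf category:

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax_ranking_loss(title_vec, positive_vec, negative_vecs, gamma=10.0):
    """-log P(C_1 | T), where P is a softmax over gamma * cosine(SSE(T), SSE(C)).

    Minimizing this raises cosine(T, C_1) relative to every cosine(T, C_n),
    i.e. it pulls the title toward its true leaf category and away from the others.
    The softmax form and gamma are assumptions, not the source's stated loss.
    """
    sims = [cosine(title_vec, positive_vec)] + [cosine(title_vec, v) for v in negative_vecs]
    logits = [gamma * s for s in sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


# Toy vectors reused from the FIG. 4 example: T = title, Y1 = girls' T-shirts, Y2 = boys' T-shirts.
T, Y1, Y2 = [0.1, 2.3, 3.0], [0.1, 2.2, 3.0], [0.5, 2.6, 2.3]
loss_if_y1_correct = softmax_ranking_loss(T, Y1, [Y2])
loss_if_y2_correct = softmax_ranking_loss(T, Y2, [Y1])
```

The loss is lower when Y1 is treated as the correct category, because T already lies closer to Y1; during training, gradients of this loss would adjust the parameters of the source and target SSE models across many labeled pairs.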

In various embodiments, machine learning is used to maximize the similarity between a source (X), e.g., a listing title, and a target (Y), e.g., a leaf category in the category tree, in order to generate CatRecos. The SSE model may be based on a deep neural network (DNN) and/or a convolutional neural network (CNN). A DNN is an artificial neural network with multiple hidden layers between its input and output layers. A DNN may apply a deep learning architecture to a recurrent neural network. A CNN consists of one or more convolutional layers with fully connected layers on top (e.g., layers like those of a typical artificial neural network). A CNN also uses tied weights and pooling layers. Both DNNs and CNNs can be trained with the standard backpropagation algorithm. FIGS. 7 to 9 provide example flowcharts for training the SSE model. The trained SSE model is used at runtime by the basic SSE CatReco service, as shown in FIG. 6D.
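To make the DNN variant concrete, the sketch below shows only the forward pass of two small feed-forward towers, one for sources and one for targets, mirroring the separate source and target SSE models described in this section. The layer sizes, the tanh activation, and the random initialization are illustrative assumptions; the weights are untrained, and backpropagation training is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Uniform init scaled by 1/sqrt(fan_in); an illustrative choice, not from the source.
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class Tower:
    """Feed-forward DNN: input vector -> hidden tanh layers -> dense semantic vector."""

    def __init__(self, sizes: Sequence[int], seed: int):
        rng = random.Random(seed)
        self.layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

    def __call__(self, x: Sequence[float]) -> List[float]:
        h = list(x)
        for weights, bias in self.layers:
            h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
                 for row, b in zip(weights, bias)]
        return h


# Hypothetical sizes: 64 input features (e.g. hashed letter trigrams),
# two hidden layers (32 and 16 units), and an 8-dimensional semantic vector.
source_tower = Tower([64, 32, 16, 8], seed=0)
target_tower = Tower([64, 32, 16, 8], seed=1)

x = [1.0 if i % 7 == 0 else 0.0 for i in range(64)]  # a made-up sparse input
src_vec = source_tower(x)
tgt_vec = target_tower(x)
```

In a trained system, the parameters of both towers would be fit by backpropagation so that matching title and category-path pairs land close together in the output space.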

The SSE model requires a large amount of labeled data for model training, and obtaining that much labeled data through manual labeling is prohibitively expensive. This limitation can be overcome by automatically deriving clean labeled training data with a parallel learning approach that leverages the online behavior of millions of sellers. The parallel learning approach may be implemented with two layers of filters that automatically derive clean training data (pairs of listing titles and leaf categories) for SSE training. FIG. 8 illustrates an exemplary method of identifying pairs of labeled training data.

In an exemplary embodiment, given a set of keywords (from a query or listing title provided by a user such as a seller), the CatReco operation provides an ordered list of relevant leaf categories from the category tree that represents the taxonomy used by the publication system. Based on the seller's keywords, the publication system recommends the leaf categories relevant to them, along with some notion of rank or score for each recommended leaf category. CatReco is often one of the first steps of the selling flow (i.e., listing a publication) in the listing system.

According to various embodiments, SSE is used to classify listing titles into the categories used by the publication system. A category may correspond to a LeafCat (also called a category node) in the category tree of the publication system's taxonomy. This approach is scalable, reliable, and inexpensive, and using SSE for CatRecos in the listing system has several advantages.

First, automatically generating training data through a parallel learning framework for SSE-based CatReco reduces the cost of labeling training data manually. The automatically generated training data comes from a parallel learning framework that leverages the behavior of millions of sellers on the publication system, together with other information available offline, to ensure highly accurate labels.

Second, SSE-based CatReco removes the dependency on the KNN (k-nearest neighbor) recall set; for example, the KNN recall set can be replaced by an SSE recall set.

Third, the out-of-vocabulary (OOV) problem can be addressed by training the SSE model at the sub-word/character level instead of the word level. This lets CatReco handle a very large vocabulary while naturally handling compound words, split words, typos, and the like. Because SSE encodes the context of the entire sequence, modeling at the sub-word level does not lose contextual semantic information.

Fourth, the CatReco system responds quickly at runtime, because the semantic vector representations of all category tree paths (e.g., 16,000 leaf categories) can be pre-computed offline, and a K-dimensional (KD) tree algorithm with logarithmic-time lookup can quickly identify the best-matching category tree path.

Finally, the semantic vector representation of any new LeafCat can be computed directly within seconds, without retraining any model. This makes the SSE-based CatReco system highly scalable, especially when the category tree is updated frequently.

In various embodiments, the taxonomy used by the publication system is represented as a category tree; in other embodiments, other taxonomy structures may be used. Although the examples above describe CatRecos generated by a publication system, it will be appreciated that various embodiments may be implemented in other types of online systems and are not limited to publication systems.

In other exemplary embodiments, SSE may be used to map a source (X) to a target (Y) for other NLP tasks; its use is not limited to mapping a listing title (e.g., the source) to a category tree (e.g., the target) to identify one or more leaf categories. The following table lists examples of various NLP tasks with their associated sources and targets. In Table 1 below, the source

FIG. 3A shows a user interface 300 for listing a publication according to an exemplary embodiment. Field 310 is a text field in which the seller provides a title for the listing. In the example shown in FIG. 3A, the seller provides the title "Clash of the titans movie" to describe the publication to be listed. A listing title is often a general description of the listing (and may include descriptions of attributes of the publication). The publication system may use the title to identify one or more relevant categories for listing the publication. User interface element 320 presents the relevant categories to the seller. In this particular embodiment, the top three categories are presented so that the user can select the category in which to list the publication. Each category represents a leaf of the category tree in the exemplary embodiment. In FIG. 3A, the seller selects the first category, "DVDs & Movies > DVDs & Blu-ray Discs". FIG. 3B shows an example of a publication listing on the publication system in the category "DVDs & Movies > DVDs & Blu-ray Discs".

When SSE is applied to map a particular <source, target> pair, the parameters of the SSE source model and the SSE target model are optimized so that relevant <source, target> pairs have a smaller distance between their vector representations. The following formula can be used to compute the minimum distance.
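Using the symbols defined below, the objective can be sketched as follows. This is a reconstruction from those definitions, not necessarily the exact original formula; dist is an unspecified vector distance (assumed here, e.g., 1 - sim(X, Y) with cosine similarity), and the minimization is over the parameters of the two SSE models:

$$\mathrm{SrcVec} = \mathrm{SrcMod}(\mathrm{SrcSeq}), \qquad \mathrm{TgtVec} = \mathrm{TgtMod}(\mathrm{TgtSeq})$$

$$\min_{\mathrm{SrcMod},\ \mathrm{TgtMod}} \ \operatorname{dist}\big(\mathrm{SrcVec}, \mathrm{TgtVec}\big) \quad \text{over relevant } \langle \mathrm{SrcSeq}, \mathrm{TgtSeq} \rangle \text{ pairs}$$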

where:

SrcSeq = the source sequence;

TgtSeq = the target sequence;

SrcMod = the source SSE model;

TgtMod = the target SSE model;

SrcVec = the continuous vector representation of the source sequence (also called the semantic vector of the source); and

TgtVec = the continuous vector representation of the target sequence (also called the semantic vector of the target).

The source SSE model encodes the source sequence into a continuous vector representation, and the target SSE model encodes the target sequence into a continuous vector representation. In an exemplary embodiment, each vector has about 100 dimensions.

FIG. 4 shows an example 400 of a listing title provided by a seller. The listing title 410 shown in FIG. 4 is "hello kitty T-shirt", and three dimensions are shown. Also shown are two leaf nodes 451 and 452 of a category tree 450 with a root node 453. The source SSE model generates the semantic vector of the source (X) 420, represented by the vector [0.1, 2.3, 3.0]. The target SSE model generates the semantic vectors of the targets (Y1 and Y2) 430 and 440. Y1, for leaf node 451 "Clothing, Shoes, Accessories > Girls > T-shirts", is represented by the vector [0.1, 2.2, 3.0], and Y2, for leaf node 452 "Clothing, Shoes, Accessories > Boys > T-shirts", is represented by the vector [0.5, 2.6, 2.3]. Based on the dimension values of the source and target semantic vectors, the listing title "hello kitty T-shirt" is closer to "Clothing, Shoes, Accessories > Girls > T-shirts" (leaf node 451) than to "Clothing, Shoes, Accessories > Boys > T-shirts" (leaf node 452). The example in FIG. 4 is deliberately simple, with only three dimensions.
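The FIG. 4 comparison can be checked numerically. The text only says the title is "closer" to Y1, so the sketch below computes both Euclidean distance and cosine similarity (the relevance measure named elsewhere in this description) for the three-dimensional vectors given above; the dictionary layout and function names are illustrative.

```python
import math

# Vectors from the FIG. 4 example (3 dimensions for illustration only).
X = [0.1, 2.3, 3.0]  # source: "hello kitty T-shirt"
TARGETS = {
    "Clothing, Shoes, Accessories > Girls > T-shirts": [0.1, 2.2, 3.0],  # Y1, leaf node 451
    "Clothing, Shoes, Accessories > Boys > T-shirts": [0.5, 2.6, 2.3],   # Y2, leaf node 452
}


def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


distances = {cat: euclidean(X, v) for cat, v in TARGETS.items()}
similarities = {cat: cosine(X, v) for cat, v in TARGETS.items()}

closest_by_distance = min(distances, key=distances.get)
closest_by_cosine = max(similarities, key=similarities.get)

for cat in TARGETS:
    print(f"{cat}: L2 = {distances[cat]:.4f}, cosine = {similarities[cat]:.4f}")
```

Both measures select Y1 (the girls' T-shirt leaf): X differs from Y1 only by 0.1 in the second dimension, whereas it differs from Y2 in all three.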

In other embodiments, any number of dimensions may be used. In an exemplary embodiment, the semantic vectors (their dimension values) are stored in a KD tree structure. A KD tree is a space-partitioning data structure for organizing points in a K-dimensional space, and it can be used to perform nearest-neighbor lookups: given a query point in the space, a nearest-neighbor lookup identifies the stored point closest to it.
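A minimal pure-Python KD tree illustrates the nearest-neighbor lookup described above. It is a sketch under stated assumptions: the labels are hypothetical (the FIG. 4 toy vectors are reused), distances are Euclidean, and vectors are L2-normalized first so that the Euclidean nearest neighbor coincides with the highest cosine similarity. A production system would more likely use an existing library implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class KDNode:
    point: Tuple[float, ...]
    label: str
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(items: List[Tuple[Sequence[float], str]], depth: int = 0) -> Optional[KDNode]:
    """Build a KD tree from (vector, label) pairs, splitting on axis depth % k at the median."""
    if not items:
        return None
    k = len(items[0][0])
    axis = depth % k
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return KDNode(tuple(point), label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], query: Sequence[float], best=None):
    """Return (distance, label) of the stored vector closest to query (Euclidean)."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def unit(v: Sequence[float]) -> List[float]:
    """L2-normalize so that the Euclidean nearest neighbor has the highest cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


index = build_kdtree([
    (unit([0.1, 2.2, 3.0]), "Girls > T-shirts"),  # Y1 from FIG. 4
    (unit([0.5, 2.6, 2.3]), "Boys > T-shirts"),   # Y2 from FIG. 4
])
best_distance, best_label = nearest(index, unit([0.1, 2.3, 3.0]))  # X from FIG. 4
```

Each query descends one branch per level and only backtracks into the other branch when the splitting plane lies within the current best distance, which is where the logarithmic average lookup cost comes from.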

FIG. 5A is a flowchart 500 illustrating a runtime classification process for matching a listing title to the category taxonomy of the system according to an exemplary embodiment. The category taxonomy used by the listing system 150 may be represented by a category tree, in which each leaf may represent a category. Flowchart 500 includes operations 510, 520, 530, and 540.

At operation 510, the listing system 150 receives the listing title of a publication. At operation 520, SSE is used to map the listing title to the category taxonomy used by the listing system 150 for listing publications. At operation 530, at least one relevant category is identified from the category taxonomy used by the publication system for listing publications. At operation 540, the at least one identified relevant category is provided to a device for presentation to the user.

FIG. 5B is a flowchart 501 illustrating a runtime classification process for matching a listing title to the category taxonomy of the system according to an exemplary embodiment, in which leaf category (LeafCat) identifiers (IDs) are identified as an SSE recall set. Flowchart 501 includes operations 510, 520, 535, and 545. At operation 510, the listing system 150 receives the listing title of a publication. At operation 520, SSE is used to map the listing title to the category taxonomy used by the listing system 150 for listing publications. At operation 535, a set of relevant categories is identified from the category taxonomy used by the publication system for listing publications. The relevant categories may be the top N categories for the received listing title; for example, N = 50. At operation 545, the recall set of LeafCat IDs is provided to a service of the listing system 150, such as an SLM re-ranking service or a GBM fusion prediction service.
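Operations 535 and 545 amount to a top-N selection over the pre-computed LeafCat vectors. The sketch below is a hedged illustration: the LeafCat IDs, the random 100-dimensional vectors, and the function name `sse_recall_set` are hypothetical, and it uses a linear scan with `heapq.nlargest` rather than a KD tree. The returned (ID, score) pairs are what would be handed to a downstream re-ranking service.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sse_recall_set(title_vec: Sequence[float],
                   leafcat_vecs: Dict[int, Sequence[float]],
                   n: int = 50) -> List[Tuple[int, float]]:
    """Top-n (LeafCat ID, cosine similarity) pairs for a title vector, best first.

    heapq.nlargest keeps only n candidates while scanning all M LeafCats.
    """
    scored = ((leafcat_id, cosine(title_vec, vec)) for leafcat_id, vec in leafcat_vecs.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical data: 1,000 random 100-dimensional LeafCat vectors keyed by made-up IDs.
rng = random.Random(42)
LEAFCAT_VECS = {10000 + i: [rng.gauss(0, 1) for _ in range(100)] for i in range(1000)}
title_vec = [rng.gauss(0, 1) for _ in range(100)]

recall_set = sse_recall_set(title_vec, LEAFCAT_VECS, n=50)  # N = 50, as in the example above
```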

FIG. 6A is a flowchart 600 illustrating in more detail operation 520, which uses SSE to map the listing title to the category taxonomy used by the publication system for listing publications. Mapping operation 520 includes operations 610, 620, and 630.

At operation 610, the pre-computed semantic vectors of the target (Y) (i.e., computed with the target SSE model) are obtained. These pre-computed vectors form a semantic vector space. In one embodiment, the vectors for the category taxonomy entries of the target system are computed using the target SSE model. The pre-computed semantic vectors of the target (Y) are computed offline, as described in more detail with reference to FIG. 6B. The target SSE model encodes a target sequence into a continuous vector representation.

At operation 620, the semantic vector representation of the source (X) is projected into the shared semantic vector space. The semantic vector space formed by the semantic vectors of the target (Y) is combined with the semantic vector of the source (X) to form the shared semantic vector space. The source SSE model is used to create the semantic vector representation of the listing title.

At operation 630, the target (Y) semantic vector representation (of a taxonomy entry) that is closest to the source (X) semantic vector representation (of the listing title) in the shared semantic vector space is identified. The taxonomy entry may represent a LeafCat. In an exemplary embodiment, the semantic relevance sim(X, Y) is computed using a cosine similarity function. Example sub-operations of operations 620 and 630 are described with reference to FIG. 6C.

As shown in FIG. 6B, flowchart 601 includes operations 611 to 614, which produce the pre-computed semantic vectors of the target (Y) that are obtained at operation 610.

At operation 611, the targets are accessed. In an exemplary embodiment, the targets are the paths of the category tree of the listing system 150, each path representing the route to a LeafCat. In an exemplary embodiment, the category tree paths are accessed from a database of the listing system 150.

At operation 612, word hashing is performed on the category tree paths of the targets. In an exemplary embodiment, word hashing uses letter trigrams. Letter-trigram-based word hashing takes the original phrase (e.g., the root-to-leaf path), pre-processes it (e.g., marks word boundaries with #), and then extracts the letter trigrams. Word hashing can produce a compact representation of a large vocabulary; for example, a vocabulary of 500,000 words can be reduced to 30,000 letter trigrams. Word hashing makes the listing system 150 (or another system) robust to misspellings, inflections, compound words, split words, and the like. Unseen words can also be generalized using word hashing.
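Letter-trigram word hashing as described above can be sketched in a few lines. The `#` boundary marking follows the text; splitting phrases on runs of letters and digits (which drops `>`, `&`, and commas from category paths) is an assumed pre-processing step, and the function names are illustrative.

```python
import re
from collections import Counter
from typing import List


def letter_trigrams(word: str) -> List[str]:
    """Letter trigrams of a single word after marking its start and end with '#'."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def word_hash(phrase: str) -> Counter:
    """Bag of letter trigrams for a phrase such as a listing title or category tree path.

    Splitting on runs of letters/digits is an assumption, not specified in the source.
    """
    bag: Counter = Counter()
    for word in re.findall(r"[a-z0-9]+", phrase.lower()):
        bag.update(letter_trigrams(word))
    return bag


print(letter_trigrams("good"))
print(word_hash("DVDs & Movies > DVDs & Blu-ray Discs").most_common(3))
```

A misspelling such as "kity" still shares three of its four trigrams with "kitty", which is why this representation tolerates typos. In a full system each distinct trigram would map to one input dimension of the SSE model; that vocabulary mapping is not shown.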

At operation 613, the target SSE model is used to generate the semantic vectors of the targets (also called semantic vector representations).

At operation 614, the semantic vectors of the targets are stored in a KD tree in a memory device. In an exemplary embodiment, the dimension values of the target semantic vectors are stored in the KD tree structure. In an exemplary embodiment, the target semantic vectors include one vector for each LeafCat in the category tree. A leaf category may be represented as a leaf node, as shown in FIG. 4. One example category tree for the listing system 150 includes more than 19,000 category tree paths (e.g., routes to leaves).

Because the target semantic vectors are pre-computed, the semantic vector of a source representing a listing title can be mapped very quickly. In various embodiments, the target sequence vectors are pre-computed before runtime (as shown by operation 601 of FIG. 6B), and the source sequence vector is computed at runtime and then compared with the target sequence vectors. FIG. 6C shows a flowchart 670 that combines the offline process of computing the target sequence vectors with the runtime process, so that a listing title can be mapped to the category taxonomy used by the listing system 150 by using the SSE models and computing the semantic relevance (e.g., with a cosine function) between the source semantic vector and the target semantic vectors.

As shown in FIG. 6C, flowchart 670 includes offline operation 601 and runtime operations 610, 620, and 630. Offline operation 601 is shown in FIG. 6B, and runtime operations 610, 620, and 630 are shown in FIG. 6A. Operation 620 (projecting the semantic vector representation of the source (X) into the shared semantic vector space) includes sub-operations 615 and 616. Operation 630 (identifying the target (Y) semantic vector representation, of a taxonomy entry, that is closest to the source (X) semantic vector representation, of the listing title, in the shared semantic vector space) includes sub-operations 617 and 618.

As shown in FIG. 6C, the semantic vector representations of the targets are computed offline at operation 601, and the pre-computed target semantic vectors are obtained at operation 610.

At operation 615, word hashing is performed on the source; in an exemplary embodiment, the source is the listing title. At operation 616, the semantic vector of the source is generated using the source SSE model. In an exemplary embodiment, operations 615 and 616 together project the semantic vector representation of the source into the shared semantic vector space (as shown at operation 620).

At operation 617, the relevance similarity sim(X, Y) is estimated. At operation 618, the best-matched category Y (represented as a target semantic vector) with the shortest distance to X (represented as a source semantic vector) is identified. In an exemplary embodiment, operations 617 and 618 together identify the target Y semantic vector representation (of a categorization entry) closest to the source X semantic vector representation (of the listing title) in the shared semantic vector space (as shown at operation 630).

As described above, in various embodiments the mapping can be performed by learning the semantic relevance sim(X, Y) between the source sequence vector and the target sequence vector. In an exemplary embodiment, this semantic similarity (also called semantic relevance) is measured by the cosine similarity function sim(X, Y). In some embodiments, X is the source sentence sequence (derived from the seller's title), and Y is the target sentence sequence (derived from the category tree of the listing system 150). The cosine similarity function is evaluated on vectors in the shared semantic vector space. Generally, the category Y that best matches X is the one with the highest similarity score with X. The source and target sequences are represented by computed vectors, each having multiple dimensions.
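For reference, the standard cosine similarity definition and the resulting best-match rule are shown below. The bold vectors $\mathbf{x}$ and $\mathbf{y}$ (the semantic vectors of X and Y) and the candidate set $\mathcal{C}$ of category entries are notation introduced here for illustration.

```latex
\operatorname{sim}(X, Y) = \cos(\mathbf{x}, \mathbf{y})
  = \frac{\mathbf{x}^{\top}\mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert},
\qquad
\hat{Y} = \arg\max_{Y \in \mathcal{C}} \operatorname{sim}(X, Y)
```

For example, with $\mathbf{x} = (1, 0)$ and $\mathbf{y} = (1, 1)$, $\operatorname{sim}(X, Y) = 1 / (1 \cdot \sqrt{2}) \approx 0.707$. The score lies in $[-1, 1]$ regardless of vector length, so only the direction of the semantic vectors matters.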

FIG. 6D shows a flowchart 680 illustrating an SSE runtime classification process according to an exemplary embodiment. The SSE runtime classification process shown in FIG. 6D classifies listing titles by mapping them to the category taxonomy of the listing system 150. As described above, the flowchart 680 can be used for a number of operations that map a source to a target, such as those identified in Table 1 above. The basic SSE runtime classification process shown in FIG. 6D may also be referred to as the basic SSE categorization recommendation (CatReco) service 680. The basic SSE CatReco service 680 is an SSE runtime decoding process that produces a recall set and similarity scores. In an exemplary embodiment, the recall set is the set of the top N leaf nodes in the category tree. The CatReco component 202 (shown in FIG. 2) may use the recall set and similarity scores to generate an SSE-SLM-GBM CatReco result. Generation of the SSE-SLM-GBM CatReco result is described below with reference to FIG. 10.

The flowchart 680 includes operations 611-618 and 510. Operations 611-614 were described with reference to FIG. 6B, operations 615-618 with reference to FIG. 6C, and operation 510 with reference to FIG. 5.

Operations 611-614 describe an offline process that computes the sequence semantic vectors of the targets and stores them in a KD tree structure. The KD tree is accessed during runtime, so that the sequence semantic vector of the source can be projected into the shared semantic vector space that holds the targets' sequence semantic vectors. The relevance similarity sim(X, Y) is estimated (at operation 617), and the best-matched category Y with the shortest distance to X is identified (at operation 618). The best-matched category Y may be referred to as the top-1 category in the category tree for the listing's title. In various embodiments, the top N categories are identified so that N categories can be presented to the user. For example, in FIG. 3A, the top three categories are presented to a user (e.g., a listing seller) in the user interface 300.
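A KD tree answers Euclidean nearest-neighbor queries, while the text ranks categories by cosine similarity. The two agree when every vector is normalized to unit length, because then ‖x − y‖² = 2 − 2·cos(x, y), so the smallest Euclidean distance corresponds to the highest cosine similarity. The pure-Python sketch below assumes that normalization and returns only the single nearest target. It illustrates the data structure and is not the listing system's implementation. The category IDs are hypothetical.

```python
import math


def unit(vec):
    """Normalize to unit length so Euclidean nearest == highest cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


def build_kdtree(points, depth=0):
    """points: list of (vector, label). Returns nested dict nodes, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared_distance, (vector, label)) of the closest stored point."""
    if node is None:
        return best
    vec, _label = node["point"]
    dist2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    if best is None or dist2 < best[0]:
        best = (dist2, node["point"])
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical target vectors, built offline once.
categories = {
    "baby_monitors": (0.9, 0.1, 0.0),
    "cameras": (0.1, 0.9, 0.2),
    "strollers": (0.2, 0.1, 0.9),
}
tree = build_kdtree([(unit(v), cid) for cid, v in categories.items()])

# Runtime query with a hypothetical source vector.
query = unit((0.8, 0.2, 0.0))
dist2, (vec, label) = nearest(tree, query)
cosine = 1.0 - dist2 / 2.0  # valid only because both vectors have unit length
```

Returning the top N instead of the top 1 would keep a bounded heap of the N best candidates in place of the single `best` tuple.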

In an exemplary embodiment, the target deep SSE model 613A and the source deep SSE model 616A are trained using the SSE model training process described with reference to FIGS. 7 to 9.

In an exemplary embodiment, the CatReco operation is used to classify a listing title provided by a user into a LeafCat. Classifying listing titles can be difficult when there are many categories. CatReco operations are used in many of the selling flows of the listing system 150. In an exemplary embodiment, the listing system 150 may have more than 19,000 different categories in the United States. The listing system therefore works to improve two things. The first is the accuracy of selecting the most relevant of those 19,000+ categories from a set of user-provided keywords. The second is the response time for generating CatReco results and presenting them to the listing seller.

FIGS. 7 to 9 illustrate the training process for the SSE models used by the basic SSE CatReco service 680 according to an exemplary embodiment. An embodiment of the SSE runtime classification process is shown in FIG. 6D. When that process performs CatReco operations by mapping listing titles to category tree paths of the listing system 150, it may be referred to as the basic SSE CatReco service 680. FIG. 7 shows a flowchart 700 for training the SSE models according to an exemplary embodiment. FIG. 8 shows a flowchart 800 for actively identifying the labeled training data pairs used by the SSE training (shown in FIG. 7) according to an exemplary embodiment. FIG. 9 shows an example of an SSE model training process that includes the various operations and components shown in FIGS. 7 and 8.

Referring to FIG. 7, a source SSE model and a target SSE model are trained. Operations 710A, 720A, 730A, and 740A train the source SSE model, and operations 710B, 720B, 730B, and 740B train the target SSE model. At operation 701, labeled training data pairs (listing title, category tree path) are provided to train both models. In an exemplary embodiment, the labeled training data pairs are identified using the flowchart 800 shown in FIG. 8.

At operation 710A, the raw sentence sequence of the source listing title X is received. The source listing title X may be the word sequence provided by the listing seller. At operation 720A, word hashing is performed on the source listing title X. When the vocabulary is very large, hashing is performed at the sub-word level. In various embodiments, 3-gram word hashing is performed.
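Sub-word hashing can be sketched as follows, assuming the letter-trigram scheme popularized by DSSM-style models. In that scheme each word is wrapped in `#` boundary markers and split into overlapping three-character pieces. The text states only that 3-gram word hashing is used, so this exact scheme and the toy titles are assumptions.

```python
from collections import Counter


def letter_trigrams(word):
    """Split one word into overlapping letter trigrams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_title(title):
    """Represent a whole title as a bag (count) of letter trigrams."""
    counts = Counter()
    for word in title.split():
        counts.update(letter_trigrams(word))
    return counts


def to_dense(counts, vocab):
    """Map a trigram bag onto a fixed-size count vector; unseen trigrams are dropped."""
    vec = [0] * len(vocab)
    for tri, c in counts.items():
        if tri in vocab:
            vec[vocab[tri]] += c
    return vec


# Build a trigram vocabulary from hypothetical titles, then hash a new title.
titles = ["Wireless Video Baby Monitor", "Baby Safety & Health"]
vocab = {}
for t in titles:
    for tri in hash_title(t):
        vocab.setdefault(tri, len(vocab))
vector = to_dense(hash_title("baby monitor"), vocab)
```

The input dimension is bounded by the number of distinct trigrams rather than the number of distinct words. This is why sub-word hashing helps when the vocabulary is very large.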

In an exemplary embodiment, the convolutional layer, the max-pooling layer, and the semantic layer are neural network layers. Each of these layers may be configured with a number of nodes (e.g., 500 nodes, as shown in FIG. 9). In other embodiments, the number of nodes may vary with the data size or be configured differently. At operation 730A, keywords and concepts are identified from the source listing title X using convolution and max pooling.
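Convolution followed by max pooling can be sketched with toy sizes in place of the 500 nodes of FIG. 9. Each output node applies one weight row to a sliding window of concatenated word vectors and keeps the maximum activation over all positions. Zero padding at the edges, the tanh activation, and the hand-set weights (which a real model would learn) are assumptions for illustration.

```python
import math


def conv_max_pool(word_vecs, weights, window=3):
    """Sliding-window convolution over word vectors, then max pooling per output node.

    word_vecs: list of equal-length float lists (one per word, e.g. trigram vectors).
    weights:   one row per output node, each of length window * dim.
    Returns one pooled value per output node.
    """
    dim = len(word_vecs[0])
    pad = [0.0] * dim
    half = window // 2
    padded = [pad] * half + list(word_vecs) + [pad] * half  # zero padding (assumption)
    pooled = [-math.inf] * len(weights)
    for start in range(len(word_vecs)):
        # Concatenate the vectors of the words inside this window.
        ctx = [x for vec in padded[start:start + window] for x in vec]
        for k, row in enumerate(weights):
            activation = math.tanh(sum(w * x for w, x in zip(row, ctx)))
            pooled[k] = max(pooled[k], activation)  # max pooling over positions
    return pooled
```

Max pooling keeps, for each node, the strongest local feature anywhere in the title. This is how salient keywords can dominate regardless of their position.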

At operation 740A, a deep neural network (DNN) is used to extract the semantic vector representation of the source listing title X. The DNN uses one or more neural network layers to project the input sequence into a semantic vector space.
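The projection step can be sketched as a stack of dense layers applied to the pooled vector. The tanh activation and the layer shapes are assumptions. The text says only that one or more neural network layers project the input into the semantic vector space.

```python
import math


def semantic_layer(pooled, weights, bias):
    """Dense layer: y_k = tanh(sum_j W[k][j] * pooled[j] + b[k])."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]


def project(pooled, layers):
    """Apply dense tanh layers in sequence; the final output is the semantic vector.

    layers: list of (weights, bias) pairs; shapes are hypothetical.
    """
    h = pooled
    for weights, bias in layers:
        h = semantic_layer(h, weights, bias)
    return h
```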

At operation 710B, the raw sentence sequence of the target category tree path Y is received. In an exemplary embodiment, the listing system 150 is used to list publications and may include more than 19,000 category tree paths Y, or LeafCats. At operation 720B, word hashing may be performed on the target category tree path Y. When the vocabulary is very large, hashing is performed at the sub-word level. In various embodiments, 3-gram word hashing is performed.

In an exemplary embodiment, the convolutional layer, the max-pooling layer, and the semantic layer are neural network layers. Each of these layers may be configured with a number of nodes (e.g., 500 nodes, as shown in FIG. 9). In other embodiments, the number of nodes may vary with the data size or be configured differently. At operation 730B, keywords and concepts are identified from the target category tree path Y using convolution and max pooling.

At operation 740B, a deep neural network (DNN) is used to extract the semantic vector representation of the target category tree path Y. The DNN uses one or more neural network layers to project the input sequence into a semantic vector space.

At operation 750, the semantic vector distance between X and Y is used to measure the similarity between the semantic vector representation of the source listing title X and that of the target category tree path Y. In an exemplary embodiment, this semantic relevance, denoted by the function sim(X, Y), is measured by cosine similarity.

Once both the source SSE model and the target SSE model have been trained, the semantic vector representations of all entries in the target category taxonomy can be precomputed with the target SSE model. When a new publication listing from a seller then needs to be mapped, the semantic vector representation of its listing title is projected into a shared semantic vector space. That space also holds the semantic vector representations of the category tree paths from the category taxonomy of the listing system 150. In an exemplary embodiment, the correct mapping for the listing title is the category tree path whose semantic vector representation is closest to that of the listing title.

As described above, when SSE is applied to map a specific <source sequence, target sequence> pair, the parameters of the SSE source model and the SSE target model are optimized so that related <source, target> pairs have a smaller distance between their vector representations. The following formula can be used to calculate the minimum distance:

where

SrcSeq = source sequence;

TgtSeq = target sequence;

SrcMod = source SSE model;

TgtMod = target SSE model;

TgtVec = continuous vector representation of the target sequence (also referred to as the target semantic vector).
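The stated goal is that related <source, target> pairs should end up close together. As a hedged illustration of that goal only, the sketch computes the mean cosine distance (1 − sim) over labeled pairs. It takes any pair of encoder functions standing in for SrcMod and TgtMod. A training loop would adjust the model parameters to lower this value. DSSM-style systems often train with a softmax over sampled negative targets instead, so this objective and the toy encoder are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; an all-zero vector is treated as unrelated (similarity 0)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pair_distance_loss(pairs, src_mod, tgt_mod):
    """Mean cosine distance between encoded sources and their paired targets.

    pairs:   list of (src_seq, tgt_seq) training pairs.
    src_mod, tgt_mod: callables mapping a sequence to a vector; hypothetical
                      stand-ins for the source and target SSE models.
    """
    total = 0.0
    for src_seq, tgt_seq in pairs:
        total += 1.0 - cosine(src_mod(src_seq), tgt_mod(tgt_seq))
    return total / len(pairs)


# Hypothetical bag-of-words encoder over a tiny fixed vocabulary.
def toy_encoder(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in ("baby", "monitor", "camera")]


loss = pair_distance_loss([("video baby monitor", "baby monitor")], toy_encoder, toy_encoder)
```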

The trained SSE modules are used to implement runtime classification. In various embodiments, the SSE models are trained offline on training data such as labeled training data pairs. In some embodiments, the labeled training data is derived automatically. In an exemplary embodiment, each labeled training sample is a <source sequence, target sequence> pair. In an exemplary embodiment, the source sequence is the title of a publication listing. The target sequence is a LeafCat, expressed as its category tree path in the category taxonomy used by the listing system 150.

In general, good natural language processing and machine learning methods require labeled training data (i.e., supervised learning). Training the SSE modules with millions of labeled training samples increases the accuracy of the mapping results. In various embodiments, the SSE models are trained using publication listings previously uploaded to the listing system 150. These existing listings allow the SSE models to be trained quickly on relevant data. For example, a company such as eBay Inc. of San Jose, California, has access to billions of previously uploaded publication listings, with the sellers' taxonomy information recorded in a data warehouse. These listings can be processed to mine, join, and filter millions of labeled training samples based on eBay's past transaction data.

FIG. 8 shows a flowchart 800 of a method of deriving labeled training data according to an exemplary embodiment. At operation 810, historical data for listing titles stored in the data warehouse of the publishing system 142 is accessed. This historical data relates to previous publication listings posted by sellers and may be stored in the data warehouse by the publishing system 142. In various embodiments, the historical data for previous publication listings includes the listing titles and the categories that the listing sellers selected during the listing process.

At operation 820, the LeafCats of the category tree stored in the database of the publishing system are accessed. In an exemplary embodiment, the LeafCats may include more than 19,000 entries.

The training data includes listing titles and the category tree paths of LeafCats, based on the category taxonomy of the listing system 150.

At operation 830, the listing titles for each LeafCat within a specific time period (e.g., every eight weeks) are identified, using the data accessed at operations 810 and 820.

The training data is then filtered using filter A at operation 840 and filter B at operation 850. With filter A (operation 840), the listing system 150 checks whether the seller's category selection matches the first recommendation from the CatReco component 202 of the listing system 150. If it matches, the listing system 150 applies filter B (operation 850) and checks whether the listing's miscategorization (miscat) score is low. A low score often indicates that the listing publication has been miscategorized into the wrong LeafCat. An example of a low score is 50. If a listing publication passes both filters A and B, its (listing title, category tree path) pair is treated as a pure training sample.
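The two filters can be sketched as follows. The record fields are hypothetical names. The text is ambiguous about the direction of the miscat-score check, so the sketch assumes that filter B rejects listings whose score is low (at or below a hypothetical threshold of 50), because a low score is described as a sign of miscategorization.

```python
from dataclasses import dataclass


@dataclass
class ListingRecord:
    """Hypothetical shape of one historical listing row from the data warehouse."""
    title: str
    seller_category_path: str  # category the seller chose
    top_catreco_path: str      # first recommendation from CatReco at listing time
    miscat_score: float        # miscategorization score attached to the listing


# Hypothetical threshold: the text gives 50 only as an example of a "low" score.
LOW_MISCAT_SCORE = 50


def passes_filter_a(rec):
    """Filter A: the seller's choice agrees with CatReco's first recommendation."""
    return rec.seller_category_path == rec.top_catreco_path


def passes_filter_b(rec):
    """Filter B (assumed direction): drop listings whose score is low."""
    return rec.miscat_score > LOW_MISCAT_SCORE


def pure_training_pairs(records):
    """Keep (listing title, category tree path) pairs that pass both filters."""
    return [
        (r.title, r.seller_category_path)
        for r in records
        if passes_filter_a(r) and passes_filter_b(r)
    ]
```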

At operation 860, the labeled training data pairs (listing title, category tree path) are identified. In various embodiments, this is an automated process that identifies the labeled training pairs the training process uses to implement active learning. The method shown in flowchart 800 can be automated so that labeled training data pairs are identified periodically. Those pairs are then used to actively train the SSE models through machine learning, as shown by the flowchart 900 of FIG. 9.

FIG. 9 shows the SSE model training process, which in an exemplary embodiment combines the flowchart 700 (shown in FIG. 7) and the flowchart 800 (shown in FIG. 8). An important goal of the SSE model training process is to obtain an optimized source SSE model and an optimized target SSE model. Together, these models should minimize, over all training sample pairs, the distance between the continuous vector representation of the source sequence and that of the target sequence. In various embodiments, machine learning is used to optimize the source and target SSE models to achieve this distance-minimization goal.

FIG. 9 shows a flowchart 900 according to an exemplary embodiment. The method shown in FIG. 9 includes operations 710A-740A, 710B-740B, and 750 (described with reference to FIG. 7) and operations 810-860 (described with reference to FIG. 8).

According to FIG. 9, pure training pairs (listing title, category tree path) are used to train the source SSE model and the target SSE model. The pairs are called pure training pairs because the process that generates them filters out miscategorized pairs using filters A and B (at operations 840 and 850). As an example, the listing title is "Video Monitor, Motorola - Wireless Video Baby Monitor - White," and the category tree path is "Baby > Baby Safety & Health > Baby Monitors." The listing title of the training pair is the input to the source SSE model, and its category tree path is the input to the target SSE model. In an exemplary embodiment, the similarity score is the semantic relevance, measured by cosine similarity, between two vectors from the training data. These are the source semantic vector of the listing title and the target semantic vector of the category tree path. In an exemplary embodiment, the machine learning system in the CatReco component 202 uses the pure training pairs (identified at operation 860) to train the source and target SSE models.

In an exemplary embodiment, the basic SSE runtime classification process 680 provided by the basic SSE CatReco service (as shown in FIG. 6D) uses the target deep SSE model 613A and the source deep SSE model 616A. Both models are trained by the SSE model training process 900 (shown in FIG. 9).

FIG. 9 illustrates training the source and target SSE models with one labeled training pair (listing title, category tree path). Its listing title is "Video Monitor, Motorola - Wireless Video Baby Monitor," and its category tree path is "Baby > Baby Safety & Health > Baby Monitors." The same SSE model training process can also train on other types of labeled training pairs. For example, other types of operations, such as those shown in Table 1 above, can use labeled training pairs of the form (listing title, product type tree path) or (category tree path, product type tree path).

In various embodiments, the CatReco component 202 of the listing system 150 (shown in FIG. 2) can use the basic SSE CatReco service in combination with two other services. These are the SLM provided by the SLM re-ranking service 1110 and the GBM fusion prediction service 1030. In an exemplary embodiment, the SLM re-ranking service 1110 and the GBM fusion prediction service 1030 may also be performed by the listing system 150. A high-level block diagram of the SSE-SLM-GBM approach according to an exemplary embodiment is shown in FIG. 10. The flowchart 1000 shown in FIG. 10 illustrates the process of generating scored SSE-SLM-GBM CatReco results.

In various exemplary embodiments, the SLM is used to improve the accuracy of the recommendations provided by the CatReco component 202. The SLM is a data-driven modeling approach that estimates the likelihood of a given text input, such as a sentence, a listing title, or a search query. The SLM can leverage vast amounts of unsupervised text data (e.g., text that is unlabeled and has no explicit structure). In an exemplary embodiment, the SLM trains a language model for each LeafCat from unsupervised listing titles. The sentence log probability (SLP) of a new listing title is then evaluated with the language model of the appropriate LeafCat. This can be repeated for each candidate LeafCat. In various embodiments, the suggested categories are re-ranked after the basic SSE CatReco service 680 has generated the similarity scores and the SSE recall set. The recall set may represent the top N categories generated by the basic SSE CatReco service 680.
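A per-LeafCat language model can be sketched as follows, assuming a word-bigram model with add-one smoothing. The text does not specify the model order or the smoothing method, and the category names and titles are hypothetical. The sketch computes the sentence log probability (SLP) of a title and the perplexity derived from it, PP = exp(−SLP / number of predicted tokens).

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram language model trained on one LeafCat's titles."""

    def __init__(self, titles):
        self.histories = Counter()
        self.bigrams = Counter()
        vocab = set()
        for title in titles:
            tokens = ["<s>"] + title.lower().split() + ["</s>"]
            vocab.update(tokens)
            self.histories.update(tokens[:-1])  # counts of each word as a history
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def sentence_log_prob(self, title):
        """SLP: natural-log probability of the whole title under this model."""
        tokens = ["<s>"] + title.lower().split() + ["</s>"]
        slp = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            num = self.bigrams[(prev, word)] + 1
            den = self.histories[prev] + self.vocab_size
            slp += math.log(num / den)
        return slp

    def perplexity(self, title):
        """PP = exp(-SLP / number of predicted tokens: the words plus the end marker)."""
        n = len(title.split()) + 1
        return math.exp(-self.sentence_log_prob(title) / n)


# Hypothetical per-LeafCat models and one new listing title.
models = {
    "baby_monitors": BigramLM(["wireless video baby monitor", "audio baby monitor"]),
    "cameras": BigramLM(["digital camera lens", "video camera tripod"]),
}
title = "video baby monitor"
scores = {cat: lm.sentence_log_prob(title) for cat, lm in models.items()}
```

A higher (less negative) SLP means the title reads more like the titles already listed in that LeafCat.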

In particular, in an exemplary embodiment, the SLM re-ranking service 1110 evaluates only the categories in the top N leaf categories identified by the basic SSE CatReco service 680. This can be far more efficient than running the SLM algorithm on every possible category (e.g., more than 19,000 leaf categories).
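Restricting the SLM to the recall set can be sketched as follows. Ranking purely by SLP is a simplification. The document's re-ranking also draws on the LLP and the expected perplexity, and the final ordering comes from GBM fusion. `slp_fn` is a hypothetical hook for whatever per-LeafCat language model is used.

```python
def rerank_recall_set(title, recall_set, slp_fn, top_k=3):
    """Score only the N candidate LeafCats returned by the SSE stage.

    recall_set: list of (leafcat_id, sse_similarity) pairs from the SSE stage.
    slp_fn:     callable (leafcat_id, title) -> sentence log probability under that
                LeafCat's language model (hypothetical hook).
    Returns (leafcat_id, sse_similarity, slp) tuples sorted by SLP, highest first.
    """
    # Cost grows with N (the recall-set size), not with the full taxonomy size.
    scored = [(cat, sse, slp_fn(cat, title)) for cat, sse in recall_set]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]
```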

In addition, in an exemplary embodiment, GBM is used to fuse various scores and data together, as described below, combining the predictions of several estimators to further refine the suggested categories.

According to FIG. 10, the title of a publication listing is received at operation 1001. The title is provided to the basic SSE CatReco service 680 and the SLM re-ranking service 1110.

The basic SSE CatReco service 680 identifies the top N LeafCats for the listing title, which are defined in the SSE recall set 1010 of LeafCat IDs. The SSE recall set 1010 is provided as input to the SLM re-ranking service 1110. In an exemplary embodiment, the SLM re-ranking service 1110 includes two components: the SLM runtime classification stage 1110A (shown in FIG. 11) and the SLM training stage 1110B (shown in FIG. 12).

Rather than running a K-nearest-neighbor (KNN) algorithm on the input text string (e.g., the title of a publication listing) to identify a set of leaf categories, the system uses the basic SSE CatReco service 680 to identify that set (i.e., the top N LeafCats). This set of leaf categories, defined by the SSE recall set 1010 of LeafCat IDs, is then re-ranked by the SLM re-ranking service 1110. The re-ranking is based on the SLM algorithm performed on the input, the combined SLM 1232 for each LeafCat, the log likelihood probability (LLP) 1212 for each LeafCat, and the expected perplexity and its standard deviation (also referred to as the expected PPL and PPL_Std) 1236 for each LeafCat. LLP and PPL are described in more detail with reference to FIG. 11.

At operation 1030, the GBM fusion prediction service 1030 receives several inputs: the SSE recall set of LeafCat IDs 1010, the LLP 1212 for each LeafCat, the expected PPL and PPL_Std 1236 for each LeafCat, and the output of the SLM re-ranking service 1110 (i.e., the re-ranked set of LeafCats). The GBM fusion prediction service 1030 then fuses these inputs to compute an ordered list of recommended LeafCats with corresponding scores. The result of the GBM fusion prediction is shown at 1040.
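The text does not say which GBM implementation or feature encoding is used. The code below is an assumption-laden sketch of what "fusing" can mean. It fits least-squares gradient boosting with one-split trees (stumps) to per-candidate feature rows, then scores new candidates. The feature rows (SSE similarity and SLP) and the training labels are hypothetical. A production system would more likely use a dedicated GBM library.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to the mean residual
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_gbm(X, y, rounds=50, lr=0.5):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared error
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        pred = [p + lr * (lv if row[j] <= t else rv) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)


# Hypothetical fused features per candidate LeafCat: [SSE similarity, SLP].
train_X = [[0.92, -6.2], [0.40, -11.0], [0.85, -7.0], [0.35, -12.5]]
train_y = [1.0, 0.0, 1.0, 0.0]  # 1 = candidate matched the seller's category
fusion = fit_gbm(train_X, train_y, rounds=20)
candidates = {"baby_monitors": [0.90, -6.5], "cameras": [0.38, -11.8]}
ranked = sorted(candidates, key=lambda c: predict(fusion, candidates[c]), reverse=True)
```

Boosting lets the fused score pick up whichever signal separates good and bad candidates best, without hand-tuning weights between the SSE and SLM scores.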

FIG. 11 is a diagram illustrating the SLM runtime classification stage 1110A of the SLM re-ranking service 1110 according to an exemplary embodiment.

Referring to FIG. 11, a listing title input 1001 is provided to the basic SSE CatReco service 680. The basic SSE CatReco service 680 generates an SSE recall set 1010 of LeafCat identifiers (IDs), which is provided as input to the SLM runtime classification stage 1110A.

The SLM runtime classification stage 1110A accesses three per-LeafCat resources: the LLP 1212, the combined SLM 1232, and the expected PPL and PPL_Std 1236. More specifically, the LLP 1212 for each LeafCat is precomputed offline, stored in a file, and loaded into memory at runtime. The combined SLM 1232 for each LeafCat is that LeafCat's SLM, which is pretrained offline and loaded into memory at runtime. Likewise, the expected PPL and PPL_Std 1236 for each LeafCat are precomputed offline during the model training process, stored in a file, and loaded into memory at runtime. The precomputation of these three resources is described in more detail with reference to FIG. 12.

In the SLM runtime classification stage 1110A, a deep signal is computed to measure how far a given listing deviates from its assigned leaf category. Suppose the runtime publication listing title is T, the seller has placed it in category C, and the runtime perplexity of the publication is computed as PP(T). The deviation signal is then computed as:

$$\mathrm{Deviation\_PP}(C, T) = \frac{PP(T)}{\mathrm{Mean\_PP}(C) + \alpha \cdot \mathrm{STD\_PP}(C)}$$

where α is a tunable parameter (set to 2.0 in an example embodiment).
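The deviation computation can be sketched in Python as follows. This is a minimal illustration that assumes Mean_PP(C) and STD_PP(C) have already been loaded from the precomputed files. The function and argument names are hypothetical. With α = 2.0, the function reduces to the SLM PPL deviation percentile CurPPL / (PPL_Mean + 2 * PPL_Std) used at operation 1160.

```python
def deviation_pp(pp_title: float, mean_pp: float, std_pp: float, alpha: float = 2.0) -> float:
    """Deviation_PP(C, T) = PP(T) / (Mean_PP(C) + alpha * STD_PP(C)).

    pp_title: runtime perplexity PP(T) of title T under category C's combined SLM
    mean_pp:  precomputed expected perplexity Mean_PP(C)
    std_pp:   precomputed perplexity standard deviation STD_PP(C)
    alpha:    tunable weight on the standard deviation (2.0 in the example)
    """
    denominator = mean_pp + alpha * std_pp
    if denominator <= 0:
        raise ValueError("Mean_PP(C) + alpha * STD_PP(C) must be positive")
    return pp_title / denominator


# Hypothetical values: PP(T) = 150, Mean_PP(C) = 50, STD_PP(C) = 25.
print(deviation_pp(150.0, 50.0, 25.0))  # 1.5
```

A value above 1.0 means the title's perplexity exceeds the category's expected perplexity by more than α standard deviations. This hints that the listing may not fit the assigned category.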

Finally, Mean_PP(C), STD_PP(C), PP(T), and Deviation_PP(C, T) can be fed as deep features into the GBM model. They are combined with shallow features such as price, condition, and CatReco score to build an ensemble model.

At operation 1120, the LLP for each candidate LeafCat ID is identified from the LLP 1212 for each LeafCat. The candidate LeafCat IDs are taken from the SSE recall set of LeafCat IDs.

At operation 1130, the SLP for each candidate LeafCat ID is identified using the combined SLM 1232 for each LeafCat. The candidate LeafCat IDs are taken from the SSE recall set of LeafCat IDs.

At operation 1140, the outputs of operation 1120 (the LLP identified for the candidate LeafCat IDs) and operation 1130 (the SLP identified for the candidate LeafCat IDs) are used to compute an SLM ranking score. The SLM ranking score is used as input to operation 1150, where an SLM voting score is computed from it. At operation 1150, an SLM ranking score for the listing title is generated.

In an example embodiment, the SLM ranking score (SRS) for each LeafCat is computed as a weighted sum of the individual SLP score and LPP score, for example SRS = SLP + 1.8 * LPP. In an example embodiment, the SLM voting score is computed as 1 divided by the sum of 1 and the difference between the maximum SRS over the leaf categories and the individual SRS. For example, SLM voting score = 1 / (1 + Max_SRS - SRS).
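The two scoring formulas above can be sketched in Python as follows. The sketch assumes SLP and LPP are available as per-candidate log-probability values keyed by LeafCat ID. The dictionary layout and function names are hypothetical. Max_SRS is taken over the candidate set, so the top-ranked LeafCat always receives a voting score of 1.0. Lower-ranked candidates receive smaller scores the further their SRS falls below the maximum.

```python
from typing import Dict


def slm_ranking_scores(
    slp: Dict[str, float], lpp: Dict[str, float], weight: float = 1.8
) -> Dict[str, float]:
    """SRS = SLP + weight * LPP for every candidate LeafCat ID."""
    return {cat: slp[cat] + weight * lpp[cat] for cat in slp}


def slm_voting_scores(srs: Dict[str, float]) -> Dict[str, float]:
    """SLM voting score = 1 / (1 + Max_SRS - SRS) for every candidate."""
    max_srs = max(srs.values())
    return {cat: 1.0 / (1.0 + max_srs - score) for cat, score in srs.items()}


# Hypothetical log-probabilities for two candidate leaf categories.
srs = slm_ranking_scores({"A": -20.0, "B": -25.0}, {"A": -2.0, "B": -1.0})
votes = slm_voting_scores(srs)
ranked = sorted(srs, key=srs.get, reverse=True)  # ["A", "B"]
```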

At operation 1160, two inputs are used to compute an SLM PPL deviation percentile: the SLP identified for the candidate LeafCat IDs, and the expected PPL and PPL_Std for the SSE recall set of LeafCat IDs. Operation 1160 generates an SLM perplexity deviation signal for the listing title. The perplexity deviation signal may be referred to as a deep feature. In an example embodiment, SLM PPL deviation percentile = CurPPL / (PPL_Mean + 2 * PPL_Std). CurPPL denotes the current perplexity and is computed at runtime. It is the PPL value of the new incoming listing title evaluated against the SLM of the candidate LeafCat. In the equations of this description, "PPL_Mean" is also written as Mean_PP, and "PPL_Std" as STD_PP.

During the SLM runtime re-ranking stage 1110A, the SSE first generates the recall set of candidate LeafCat IDs. The title of the requested publication listing is then evaluated against the combined SLM 1232 of each candidate LeafCat, and the SLP, PPL, and PPL_Deviation values are computed at runtime. The LLP, PPL, SLP, and PPL_Deviation values are used to re-rank the entire recalled LeafCat candidate set.

In an example embodiment, the sentence PPL can be computed as follows. Suppose a sentence S consists of a sequence of N words {w_1, w_2, ..., w_N}. The perplexity of S is computed as:

$$PP(S) = P(w_1, w_2, \ldots, w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}$$
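Here is a minimal Python sketch of this perplexity computation. It assumes the language model has already produced the natural-log conditional probability of each word in S. The function name is hypothetical. Working in log space, exp(-(1/N) * sum of ln P), is equivalent to the product form and avoids numerical underflow when many small probabilities are multiplied.

```python
import math
from typing import Sequence


def sentence_perplexity(token_log_probs: Sequence[float]) -> float:
    """PP(S) = exp(-(1/N) * sum_i ln P(w_i | w_1, ..., w_{i-1})).

    token_log_probs holds the natural-log conditional probability the
    language model assigns to each of the N words of S.
    """
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("sentence must contain at least one word")
    return math.exp(-sum(token_log_probs) / n)


# Four words, each assigned probability 0.25: perplexity is approximately 4.0.
pp = sentence_perplexity([math.log(0.25)] * 4)
```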

For a given LeafCat C, there may be M sentences (taken from listing titles) that serve as a tuning set, denoted S_1, S_2, ..., S_M. The perplexity of each title sentence can be computed with the equation above. The expected perplexity and the associated standard deviation for the given LeafCat can then be obtained by the following equations. All Mean_PP and STD_PP values can be precomputed and stored for use at runtime.

$$\mathrm{Mean\_PP}(C) = \frac{1}{M} \sum_{i=1}^{M} PP(S_i)$$

$$\mathrm{STD\_PP}(C) = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(PP(S_i) - \mathrm{Mean\_PP}(C)\right)^2}$$
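The per-category statistics can be sketched as below. The sketch assumes the perplexities PP(S_1), ..., PP(S_M) of the tuning-set titles have already been computed. It uses the population standard deviation (dividing by M). That choice is an assumption; a sample estimator dividing by M - 1 would also be defensible. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def expected_perplexity(tuning_pps: Sequence[float]) -> Tuple[float, float]:
    """Return (Mean_PP(C), STD_PP(C)) over the M tuning-set titles of category C."""
    m = len(tuning_pps)
    if m == 0:
        raise ValueError("tuning set for the category is empty")
    mean_pp = sum(tuning_pps) / m
    variance = sum((pp - mean_pp) ** 2 for pp in tuning_pps) / m  # population form
    return mean_pp, math.sqrt(variance)


# Hypothetical tuning-set perplexities for one LeafCat.
mean_pp, std_pp = expected_perplexity([40.0, 50.0, 60.0])  # (50.0, about 8.165)
```

These two numbers are what would be written to the per-LeafCat file and loaded into memory at runtime.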

FIG. 12 is a diagram illustrating an SLM training stage 1110B according to an example embodiment. In an example embodiment, the SLM training stage 1110B is part of the SLM re-ranking service 1110. The SLM training stage 1110B accesses a database 1202 of publication information, which may include listing titles, search queries, product names, and the like. Various searches can be run against this database to identify information related to the specific LeafCat for which an SLM is being built.

Here, four searches are specified:

(1) at operation 1204, the number of listings for the LeafCat over the most recent X period (e.g., 8 weeks);
(2) at operation 1206, the product names of all publications in the LeafCat;
(3) at operation 1208, the queries performed on the LeafCat over the most recent X period; and
(4) at operation 1210, the listing titles for the LeafCat over the most recent X period.

The result of each search is used differently. The listing count from operation 1204 is used at operation 1212 to generate a log prior probability (LPP) for the LeafCat. This process is described in more detail below.
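The description does not spell out how listing counts are turned into an LPP. One plausible reading, sketched below as an assumption, takes the log of each LeafCat's relative listing frequency over the recent X period. Optional additive smoothing keeps a category with no recent listings from producing log(0). The function name and the smoothing parameter are hypothetical.

```python
import math
from typing import Dict


def log_prior_probabilities(
    listing_counts: Dict[str, int], smoothing: float = 1.0
) -> Dict[str, float]:
    """LPP(C) = ln((n_C + k) / (sum of n + k * number_of_categories)).

    listing_counts maps each LeafCat ID to its listing count over the recent
    X period; smoothing (k) > 0 keeps empty categories from yielding log(0).
    """
    total = sum(listing_counts.values()) + smoothing * len(listing_counts)
    if total <= 0:
        raise ValueError("no listings to estimate priors from")
    return {
        cat: math.log((count + smoothing) / total)
        for cat, count in listing_counts.items()
    }


# Hypothetical counts; without smoothing, priors are 0.75 and 0.25.
lpp = log_prior_probabilities({"shoes": 3, "hats": 1}, smoothing=0.0)
```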

The product names of all publications in the LeafCat, accessed at operation 1206, are first normalized at operation 1214 by corpus text normalization (e.g., correcting typos or alternative spellings). They are then used to build an SLM 1216 for structured data, corresponding to the structured data of the leaf category.

The queries performed on the LeafCat over the most recent X period, accessed at operation 1208, are first normalized at operation 1218 by corpus text normalization (e.g., correcting typos or alternative spellings). They are then used to build an SLM 1220 for the LeafCat.
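Operations 1214, 1218, and 1226 are described only as correcting typos and alternative spellings. A minimal sketch of such a normalization step follows. The lowercase-and-tokenize rule and the SPELLING_VARIANTS table are illustrative assumptions, not the described implementation.

```python
import re
from typing import Dict, List

# Hypothetical variant table; a production system would curate or learn this.
SPELLING_VARIANTS: Dict[str, str] = {"colour": "color", "grey": "gray", "blk": "black"}


def normalize_text(text: str, variants: Dict[str, str] = SPELLING_VARIANTS) -> List[str]:
    """Lowercase, split on non-alphanumerics, and map known variants to one form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [variants.get(token, token) for token in tokens]


tokens = normalize_text("Grey Wool Sweater, Colour: BLK")
# ["gray", "wool", "sweater", "color", "black"]
```

The same function would be applied to product names, queries, and listing titles, so that all SLMs for a LeafCat share one vocabulary.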

The listing titles for the LeafCat over the most recent X period are accessed at operation 1210. These titles first pass through filter A 1222 and filter B 1224, which narrow them down to the most relevant ones. For example, filter A 1222 keeps listings whose seller-selected category matches the top CatReco for the listing (based on a categorization algorithm). Filter B 1224, for example, keeps listings with a low likelihood of being miscategorized. It does so by comparing each listing's miscategorization score to a threshold (e.g., 60 out of 100, where 100 represents the highest likelihood that a listing is miscategorized). This process is somewhat recursive: the miscategorization score is derived from the runtime process of the SLM re-ranking service 1110 for the leaf category, which is itself trained in the stage shown in FIG. 12. At operation 1226, corpus text normalization may then be applied to the filtered results. The normalized results are used in two ways. First, an SLM 1228 for each LeafCat title may be generated as part of the training set. Separately, the remaining results may be used in the tuning set.
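The two filters can be sketched as a single pass over the listings, as below. The Listing fields, the 0 to 100 miscategorization scale, and the function name are assumptions for illustration. The default threshold of 60 follows the example above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Listing:
    title: str
    seller_category: str          # category chosen by the seller
    top_catreco_category: str     # top category recommended by CatReco
    miscategorization_score: float  # 0-100; higher = more likely miscategorized


def filter_training_listings(
    listings: Iterable[Listing], threshold: float = 60.0
) -> List[Listing]:
    """Keep listings that pass filter A and then filter B."""
    kept = []
    for listing in listings:
        # Filter A: the seller's choice must match the top CatReco.
        if listing.seller_category != listing.top_catreco_category:
            continue
        # Filter B: keep only listings unlikely to be miscategorized.
        if listing.miscategorization_score >= threshold:
            continue
        kept.append(listing)
    return kept
```

Calling it with threshold=35.0 gives the stricter filtering used to label GBM training data in FIG. 13.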

At operation 1230, three models may then be interpolated to generate the combined SLM 1232 for the LeafCat: the SLM 1216 for structured data (corresponding to the structured data of the leaf category), the SLM 1220 for each LeafCat, and the training SLM 1228 for each LeafCat.
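The description states only that the three SLMs are interpolated at operation 1230. Linear interpolation is a common way to do this and is sketched below as an assumption. The combined conditional probability is a weighted sum of the component probabilities, with weights that sum to 1. The weights and the callable model interface are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A component model maps (word, history) to P(word | history).
LanguageModel = Callable[[str, Tuple[str, ...]], float]


def interpolate(models: Sequence[LanguageModel], weights: Sequence[float]) -> LanguageModel:
    """Return a model whose probability is the weighted sum of the components."""
    models, weights = tuple(models), tuple(weights)
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(word: str, history: Tuple[str, ...]) -> float:
        return sum(w * m(word, history) for w, m in zip(weights, models))

    return combined
```

Because the weights are non-negative and sum to 1, the combined model remains a valid probability distribution whenever each component is one. In practice the weights would typically be tuned on held-out data such as the tuning set.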

On the tuning-set side, operation 1234 evaluates the PPL and PPL_Std for each listing of the LeafCat. It uses the combined SLM 1232 for the LeafCat and the output of the corpus text normalization at operation 1226. The result is the expected PPL and PPL_Std 1236 for each LeafCat title. This process is repeated for each leaf category.

FIG. 13 is a diagram illustrating a GBM training model process 1300 according to an example embodiment. In the offline unsupervised GBM training model process 1300, a set of bootstrap-labeled training data 1320 can be derived without manual labeling. This is done by checking how the CatRecos were selected and the associated miscategorization scores. Once the labeled training data 1320 is obtained, a GBM feature input file 1360 can be prepared from the outputs of the SLM re-ranking service 1110 and the basic SSE CatReco service 680. More specifically, the SLM re-ranking service 1110 generates an SLM perplexity deviation signal 1330 and an SLM ranking score 1340 for the training data. The basic SSE CatReco service 680 generates an SSE similarity score 1350 for the training data. The GBM model can then be trained with a GBM training process. At operation 1370, the GBM feature file is used for GBM training, which produces the GBM model 1380 along with its metadata.
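The format of the GBM feature input file 1360 is not specified. As an illustration only, the sketch below writes one CSV row per labeled training example containing the three signals named above. The column names, the CSV format, and the label convention (1 = not miscategorized) are assumptions.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical column names for the three signals produced per example.
FEATURE_COLUMNS = ("ppl_deviation", "slm_ranking_score", "sse_similarity")


def gbm_feature_csv(rows: Iterable[Mapping[str, float]]) -> str:
    """Serialize labeled examples as CSV: label first, then the feature columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["label", *FEATURE_COLUMNS])
    for row in rows:
        writer.writerow([row["label"], *(row[col] for col in FEATURE_COLUMNS)])
    return buffer.getvalue()


text = gbm_feature_csv(
    [{"label": 1, "ppl_deviation": 0.8, "slm_ranking_score": -23.6, "sse_similarity": 0.91}]
)
```

Shallow features such as price or condition could be appended as further columns in the same way.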

Referring to FIG. 13, the labeled training data 1320 is obtained through operations 1302, 1304, 1306, and 1308. At operation 1302, the listing titles for the most recent X period for each LeafCat are retrieved from a database 1301. In an example embodiment, the most recent X period may be the most recent 8 weeks. Two layers of filters are then applied. At operation 1304, filter A takes the output of operation 1302 and keeps only listings whose seller category selection matches the top choice of the CatReco algorithm. The result is passed to operation 1306, where filter B keeps only listings whose miscategorization score is below a second predetermined threshold (e.g., 35 out of 100, meaning a low likelihood that the listing is miscategorized). At operation 1308, listing titles that satisfy both filter layers A and B are labeled as not miscategorized.

As described above, the perplexity deviation signal 1330 and the SLM ranking score 1340 can be derived from the SLM re-ranking service 1110 for each portion of the labeled training data 1320. Likewise, the SSE similarity score 1350 can be derived from the basic CatReco service 680 for each portion of the labeled training data 1320.

Modules, components, and logic

Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium) or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems, or one or more hardware modules of a computer system, may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein. The computer systems may be, for example, standalone, client, or server computer systems; the hardware modules may be, for example, a processor or a group of processors.

In some embodiments, a hardware module may be implemented mechanically, electronically, or by any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. Such a hardware module may be a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware modules become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions, and are no longer general-purpose processors. It will be appreciated that cost and time considerations may drive the choice between two options: implementing a hardware module in dedicated, permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software).

Accordingly, the phrase "hardware module" should be understood to encompass a tangible entity. Such an entity may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, "hardware-implemented module" refers to a hardware module. In embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, suppose a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor. The general-purpose processor may then be configured as different special-purpose processors (e.g., comprising different hardware modules) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware module at one instance of time and a different hardware module at another.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as communicatively coupled. Where multiple hardware modules exist at the same time, communication may be achieved through signal transmission between or among two or more of them (e.g., over appropriate circuits and buses). Where multiple hardware modules are configured or instantiated at different times, they may communicate, for example, by storing and retrieving information in memory structures to which they all have access. For example, one hardware module may perform an operation and store its output in a memory device to which it is communicatively coupled. Another hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communication with input or output devices and can operate on a resource (e.g., a collection of information).

The various operations of the example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, "processor-implemented module" refers to a hardware module implemented using one or more processors.

Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also support performance of the relevant operations in a "cloud computing" environment or as "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors). These operations may be accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).

The performance of certain operations may be distributed among the processors, which may reside within a single machine or be deployed across a number of machines. In some example embodiments, the processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented modules may be distributed across a number of geographic locations.

Machine and software architecture

In some embodiments, the modules, methods, applications, and so forth described in conjunction with FIGS. 1 to 6 are implemented in the context of a machine and an associated software architecture. The sections below describe representative software and machine (e.g., hardware) architectures suitable for use with the disclosed embodiments.

Software architectures are used in conjunction with hardware architectures to create devices and machines tailored to particular purposes. For example, a particular hardware architecture coupled with a particular software architecture will create a mobile device, such as a mobile phone or tablet device. A slightly different hardware and software architecture may yield a smart device for use in the "internet of things." Yet another combination produces a server computer for use within a cloud computing architecture. Not every combination of software and hardware architectures is presented here, because those skilled in the art can readily understand from this disclosure how to implement the invention in different contexts.

Software architecture

FIG. 14 is a block diagram 1400 illustrating a representative software architecture 1402 that may be used with the various hardware architectures described herein. FIG. 14 is merely a non-limiting example of a software architecture, and many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1402 may execute on hardware such as the machine 1500 of FIG. 15, which includes, among other things, processors 1510, memory 1530, and I/O components 1550. A representative hardware layer 1404 is illustrated and can represent, for example, the machine 1500 of FIG. 15. The representative hardware layer 1404 comprises one or more processing units 1406 with associated executable instructions 1408. The executable instructions 1408 represent the executable instructions of the software architecture 1402, including implementations of the methods, modules, and so forth of FIGS. 1 to 13. The hardware layer 1404 also includes memory and/or storage modules 1410, which also hold the executable instructions 1408. The hardware layer 1404 may also comprise other hardware, indicated by 1412. This represents any other hardware of the hardware layer 1404, such as the other hardware illustrated as part of the machine 1500.

In the example architecture of FIG. 14, the software 1402 may be conceptualized as a stack of layers, each of which provides particular functionality. For example, the software 1402 may include layers such as an operating system 1414, libraries 1416, frameworks/middleware 1418, applications 1420, and a presentation layer 1444. Operationally, the applications 1420 or other components within the layers may invoke API calls 1424 through the software stack. In response, they may receive responses, returned values, and so forth, illustrated as messages 1426. The layers illustrated are representative, and not all software architectures have all layers. For example, some mobile or special-purpose operating systems may not provide a frameworks/middleware layer 1418, while others may. Other software architectures may include additional or different layers.

The operating system 1414 may manage hardware resources and provide common services. The operating system 1414 may include, for example, a kernel 1428, services 1430, and drivers 1432. The kernel 1428 may act as an abstraction layer between the hardware and the other software layers. For example, the kernel 1428 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on. The services 1430 may provide other common services for the other software layers. The drivers 1432 may be responsible for controlling or interfacing with the underlying hardware. Depending on the hardware configuration, the drivers 1432 may include, for instance, display drivers, camera drivers, Bluetooth® drivers, and flash memory drivers. They may also include serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.

The libraries 1416 may provide a common infrastructure that may be utilized by the applications 1420 and/or other components and/or layers. The libraries 1416 typically provide functionality that allows other software modules to perform tasks more easily than by interfacing directly with the underlying operating system 1414 functionality (e.g., the kernel 1428, services 1430, or drivers 1432). The libraries 1416 may include system libraries 1434 (e.g., the C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1416 may include API libraries 1436 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG-4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like. The libraries 1416 may also include a wide variety of other libraries 1438 to provide many other APIs to the applications 1420 and other software components/modules.

The frameworks 1418 (also sometimes referred to as middleware) may provide a higher-level common infrastructure that may be utilized by the applications 1420 or other software components/modules. For example, the frameworks 1418 may provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks 1418 may provide a broad spectrum of other APIs that may be utilized by the applications 1420 and/or other software components/modules, some of which may be specific to a particular operating system or platform.

The applications 1420 include built-in applications 1440 and/or third-party applications 1442. Examples of representative built-in applications 1440 may include, but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, and/or a game application. The third-party applications 1442 may include any of the built-in applications as well as a broad assortment of other applications. In a specific example, a third-party application 1442 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, the third-party application 1442 may invoke the API calls 1424 provided by the mobile operating system, such as the operating system 1414, to facilitate the functionality described herein.

The applications 1420 may use built-in operating system functions (e.g., the kernel 1428, services 1430, and/or drivers 1432), libraries (e.g., the system libraries 1434, API libraries 1436, and other libraries 1438), and/or frameworks/middleware 1418 to create user interfaces for interacting with users of the system. Alternatively, or additionally, in some systems, interactions with a user may occur through a presentation layer, such as the presentation layer 1444. In these systems, the application/module "logic" can be separated from the aspects of the application/module that interact with the user.

Some software architectures utilize virtual machines. In the example of FIG. 14, this is illustrated by a virtual machine 1448. A virtual machine creates a software environment in which applications/modules can execute as if they were executing on a hardware machine (such as the machine of FIG. 15, for example). A virtual machine is hosted by a host operating system (the operating system 1414 in FIG. 14) and typically, although not always, has a virtual machine monitor 1446. The virtual machine monitor manages the operation of the virtual machine as well as the interface with the host operating system (i.e., the operating system 1414). A software architecture executes within the virtual machine, including, for example, an operating system 1450, libraries 1452, frameworks/middleware 1454, applications 1456, and/or a presentation layer 1458. These layers of software architecture executing within the virtual machine 1448 can be the same as the corresponding layers previously described or may be different.

Example Machine Architecture and Machine-Readable Medium

FIG. 15 is a block diagram illustrating components of a machine 1500, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 15 shows a diagrammatic representation of the machine 1500 in the example form of a computer system. Within the machine 1500, instructions 1516 (e.g., software, a program, an application, an applet, an app, or other executable code) may be executed to cause the machine 1500 to perform any one or more of the methodologies discussed herein. For example, the instructions may cause the machine to execute the flow diagram of FIG. 14. Additionally or alternatively, the instructions may implement FIGS. 5A through 13, and so forth. The instructions transform the general, non-programmed machine into a particular machine programmed to carry out the described and illustrated functions in the manner described. In alternative embodiments, the machine 1500 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1500 may operate as a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1500 may comprise, but is not limited to, the following devices, provided they are capable of executing, sequentially or otherwise, the instructions 1516 that specify actions to be taken by the machine 1500:
- a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, or a netbook;
- a set-top box (STB), a PDA, or an entertainment media system;
- a cellular telephone, a smartphone, a mobile device, or a wearable device (e.g., a smart watch);
- a smart home device (e.g., a smart appliance), other smart devices, or a web appliance;
- a network router, a network switch, a network bridge, or any other machine capable of executing the instructions 1516.

Further, while only a single machine 1500 is illustrated, the term "machine" shall also be taken to include a collection of machines 1500 that individually or jointly execute the instructions 1516 to perform any one or more of the methodologies discussed herein.

The machine 1500 may include processors 1510, memory/storage 1530, and I/O components 1550, which may be configured to communicate with each other via a bus 1502. In an example embodiment, the processors 1510 may include, for example, a processor 1512 and a processor 1514 that may execute the instructions 1516. The processors 1510 may be any of the following or any suitable combination thereof: a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, a radio-frequency integrated circuit (RFIC), or another processor. The term "processor" is intended to include multi-core processors, which may comprise two or more independent processors (sometimes referred to as "cores") that may execute instructions contemporaneously. Although FIG. 15 shows multiple processors, the machine 1500 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof. The memory/storage 1530 may include a memory 1532, such as a main memory or other memory storage, and a storage unit 1536, both accessible to the processors 1510 via the bus 1502. The storage unit 1536 and the memory 1532 store the instructions 1516 embodying any one or more of the methodologies or functions described herein. During execution by the machine 1500, the instructions 1516 may also reside, completely or partially, within the memory 1532, within the storage unit 1536, within at least one of the processors 1510 (e.g., within the processor's cache memory), or within any suitable combination thereof. Accordingly, the memory 1532, the storage unit 1536, and the memory of the processors 1510 are examples of machine-readable media.

As used herein, "machine-readable medium" means a device able to store instructions and data temporarily or permanently. It may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., erasable programmable read-only memory (EEPROM)), or any suitable combination thereof. The term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1516. The term "machine-readable medium" shall also be taken to include any medium, or combination of multiple media, capable of storing instructions (e.g., the instructions 1516) for execution by a machine (e.g., the machine 1500). When such instructions are executed by one or more processors of the machine 1500 (e.g., the processors 1510), they cause the machine 1500 to perform any one or more of the methodologies described herein.

Accordingly, a "machine-readable medium" refers to a single storage apparatus or device, as well as to "cloud-based" storage systems or storage networks that include multiple storage apparatus or devices.

The I/O components 1550 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1550 included in a particular machine will depend on the type of machine. For example, a portable machine such as a mobile phone will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1550 may include many other components that are not shown in FIG. 15. The I/O components 1550 are grouped according to functionality merely to simplify the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 1550 may include output components 1552 and input components 1554.

The output components 1552 may include the following:
- visual components (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT));
- acoustic components (e.g., speakers);
- haptic components (e.g., a vibratory motor or resistance mechanisms);
- other signal generators, and so forth.

The input components 1554 may include the following:
- alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components);
- point-based input components (e.g., a mouse, a trackball, a joystick, a motion sensor, or another pointing instrument);
- tactile input components (e.g., a physical button, a touch screen that provides the location and/or force of touches or touch gestures, or other tactile input components);
- audio input components (e.g., a microphone), and the like.

In further example embodiments, the I/O components 1550 may include biometric components 1556, motion components 1558, environmental components 1560, or position components 1562, among a wide array of other components. These groups may include the following:
- Biometric components 1556 may include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), to measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), and to identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like.
- Motion components 1558 may include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, rotation sensor components (e.g., a gyroscope), and so forth.
- Environmental components 1560 may include, for example:
  - illumination sensor components (e.g., a photometer);
  - temperature sensor components (e.g., one or more thermometers that detect ambient temperature);
  - humidity sensor components;
  - pressure sensor components (e.g., a barometer);
  - acoustic sensor components (e.g., one or more microphones that detect background noise);
  - proximity sensor components (e.g., infrared sensors that detect nearby objects);
  - gas sensors (e.g., gas detection sensors that detect concentrations of hazardous gases for safety or measure pollutants in the atmosphere);
  - other components that may provide indications, measurements, or signals corresponding to the surrounding physical environment.
- Position components 1562 may include location sensor components (e.g., a GPS receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication may be implemented using a wide variety of technologies. The I/O components 1550 may include communication components 1564 operable to couple the machine 1500 to the network 104 via a coupling 1582, or to devices 1570 via a coupling 1572. For example, the communication components 1564 may include a network interface component or another suitable device for interfacing with the network 104. In further examples, the communication components 1564 may include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components that provide communication via other modalities. The devices 1570 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).

Moreover, the communication components 1564 may detect identifiers or include components operable to detect identifiers. For example, the communication components 1564 may include any of the following:
- radio frequency identification (RFID) tag reader components;
- NFC smart tag detection components;
- optical reader components, for example an optical sensor that detects:
  - one-dimensional bar codes such as Universal Product Code (UPC) bar codes;
  - multi-dimensional bar codes such as Quick Response (QR) code, Aztec code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, and UCC RSS-2D bar code;
  - other optical codes;
- acoustic detection components (e.g., microphones that identify tagged audio signals).

In addition, a variety of information may be derived via the communication components 1564, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detection of an NFC beacon signal that may indicate a particular location, and so forth.

Transmission Medium

In various example embodiments, one or more portions of the network 104 may be any of the following, or a combination of two or more of them:
- an ad hoc network, an intranet, an extranet, or a VPN;
- a LAN, a WLAN, a WAN, a WWAN, or a MAN;
- the Internet or a portion of the Internet;
- a portion of the public switched telephone network (PSTN) or a plain old telephone service (POTS) network;
- a cellular telephone network, a wireless network, or a Wi-Fi® network;
- another type of network.

For example, the network 104 or a portion of the network 104 may include a wireless or cellular network, and the coupling 1582 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1582 may implement any of a variety of data transfer technologies, such as:
- Single Carrier Radio Transmission Technology (1xRTT);
- Evolution-Data Optimized (EVDO) technology;
- General Packet Radio Service (GPRS) technology;
- Enhanced Data rates for GSM Evolution (EDGE) technology;
- Third Generation Partnership Project (3GPP) technology, including 3G;
- fourth generation wireless (4G) networks;
- Universal Mobile Telecommunications System (UMTS);
- High Speed Packet Access (HSPA);
- Worldwide Interoperability for Microwave Access (WiMAX);
- the Long Term Evolution (LTE) standard;
- other standards defined by various standard-setting organizations, other long-range protocols, or other data transfer technologies.

The instructions 1516 may be transmitted or received over the network 104 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1564) and any one of a number of well-known transfer protocols (e.g., the hypertext transfer protocol (HTTP)). Similarly, the instructions 1516 may be transmitted to or received from the devices 1570 using a transmission medium via the coupling 1572 (e.g., a peer-to-peer coupling). The term "transmission medium" shall be taken to include any intangible medium capable of storing, encoding, or carrying the instructions 1516 for execution by the machine 1500, and includes digital or analog communications signals or other intangible media that facilitate communication of such software. A transmission medium is one embodiment of a machine-readable medium.

Example Method

FIG. 16 illustrates an example method 1600 of identifying a relevant set of categories for a publication. The method 1600 includes the following operations:
- operation 1610: accessing a request to add a publication to a publication corpus;
- operation 1620: identifying a relevant set of categories for the publication;
- operation 1630: displaying the relevant set of categories for the publication.

At operation 1610, one or more processors access a request from a user device to add a publication to the publication corpus and to identify a relevant set of categories for the publication. For example, in FIG. 2, one or more processors in a server of the listing system 150 access the request from the user device 204. FIG. 3B is an example of a publication added by a request from a user device.

At operation 1620, the one or more processors identify one or more closest matches by comparing two kinds of vectors:
- (i) a publication semantic vector corresponding to at least a portion of the publication. The publication semantic vector is based on a first machine learning model that projects that portion of the publication into a semantic vector space.
- (ii) a plurality of category vectors, one for each category of a plurality of categories. The category vectors are based on a second machine learning model that projects the plurality of categories into the same semantic vector space. The plurality of categories is a taxonomy of the publications in the publication corpus.

FIG. 4 is an example of identifying the closest matches.
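The comparison in operation 1620 can be sketched as a nearest-neighbor ranking in the shared semantic vector space. The Python sketch below makes two assumptions that the text above does not specify. First, cosine similarity is used as the comparison measure. Second, the publication vector and category vectors have already been produced by the first and second machine learning models. The names `cosine_similarity` and `closest_categories`, the parameter `k`, and the toy vectors are all hypothetical. For clarity, the sketch uses an exhaustive scan over every category; this is not claimed to be how the described system handles a large taxonomy.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors in the same semantic vector space."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def closest_categories(
    publication_vector: Sequence[float],
    category_vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every category vector against the publication vector and keep the top k."""
    scored = [
        (category, cosine_similarity(publication_vector, vector))
        for category, vector in category_vectors.items()
    ]
    # Highest similarity first; ties are broken by category name so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for the two models' outputs.
    category_vectors = {
        "phones": [0.9, 0.1, 0.0],
        "phone cases": [0.7, 0.6, 0.0],
        "shoes": [0.0, 0.1, 0.95],
    }
    publication_vector = [0.8, 0.3, 0.05]
    for name, score in closest_categories(publication_vector, category_vectors, k=2):
        print(f"{name}: {score:.3f}")
```

The sketch returns the top k scored categories rather than a single best match. This fits the idea of presenting several candidate categories to the user as a relevant set, as described for operation 1630.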

At operation 1630, the one or more closest matches are caused to be displayed on the user device as the relevant set of categories of the publication corpus. For example, in FIG. 2, one or more processors in a server of the listing system 150 cause the closest matches to be displayed on the user device 204. FIG. 3A is an example of displaying the closest matches.

Language

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and the operations need not be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term "invention." This term is used merely for convenience and is not intended to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term "or" may be construed in either an inclusive or an exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance.

Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The following numbered examples are embodiments.

1. A method comprising: accessing, by one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication; identifying, by the one or more processors, one or more closest matches between (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, and (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.
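The matching step of example 1 can be illustrated with a minimal sketch. It assumes that the first and second machine learning models have already produced the publication semantic vector and the category vectors, and it uses cosine similarity as the closeness measure. The function names, the toy three-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not details taken from this disclosure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_categories(pub_vec: Sequence[float],
                       category_vecs: Dict[str, Sequence[float]],
                       k: int = 3) -> List[Tuple[str, float]]:
    """Return the k categories whose vectors are closest to the publication vector.

    pub_vec is assumed to come from the (hypothetical) first model and
    category_vecs from the (hypothetical) second model, both in the same space.
    """
    scored = [(name, cosine(pub_vec, vec)) for name, vec in category_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for real model outputs.
    categories = {
        "Phones": [0.9, 0.1, 0.0],
        "Cases": [0.6, 0.8, 0.0],
        "Shoes": [0.0, 0.1, 0.99],
    }
    publication = [0.8, 0.2, 0.0]
    for name, score in closest_categories(publication, categories, k=2):
        print(f"{name}: {score:.3f}")
```

The returned list is what would be shown to the user as the suggested set of related categories; a production system with many categories would typically replace the linear scan with an approximate nearest-neighbour index.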

2. The method of example 1, wherein the category is a leaf category.

3. The method of example 1 or example 2, wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.
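Example 3 treats a category as a path through the category tree rather than as a single node. The sketch below assumes the tree is stored as nested dictionaries (a hypothetical representation) and counts the root's children as depth 1; it enumerates every category path at least two levels below the root. Joined into strings such as "Electronics > Phones", such paths could serve as the text that the second model projects into the semantic vector space, although the disclosure does not specify that format.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical representation: each key is a category name, each value its child subtree.
Tree = Dict[str, dict]


def category_paths(tree: Tree, min_depth: int = 2,
                   prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every root-to-node path whose depth is >= min_depth (root's children = depth 1)."""
    for name, children in tree.items():
        path = prefix + (name,)
        if len(path) >= min_depth:
            yield path
        # Descend regardless of the current depth so deeper paths are also emitted.
        yield from category_paths(children, min_depth, path)


def path_text(path: Tuple[str, ...]) -> str:
    """Join a path into one string, e.g. as input text for a category model."""
    return " > ".join(path)


if __name__ == "__main__":
    tree = {
        "Electronics": {"Phones": {"Smartphones": {}}, "Cameras": {}},
        "Fashion": {"Shoes": {}},
    }
    for p in category_paths(tree):
        print(path_text(p))
```

Top-level categories such as "Electronics" are skipped because they sit only one level below the root.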

4. The method of any one of examples 1 to 3, wherein the at least a portion of the publication includes a title of the publication.

5. The method of example 1, wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.
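Example 5 states only that the training data is derived automatically from publications already in the corpus. One plausible approach, shown here as an assumption rather than as the disclosed procedure, is to treat each existing (title, category) pair as a positive example and to sample a different category at random as a negative, yielding triples suitable for a ranking loss. The names `make_training_triples` and `negatives_per_item` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def make_training_triples(listings: Sequence[Tuple[str, str]],
                          categories: Sequence[str],
                          negatives_per_item: int = 1,
                          seed: int = 0) -> List[Tuple[str, str, str]]:
    """Build (title, positive_category, negative_category) triples.

    listings: (title, category) pairs taken from previously added publications.
    categories: the full category list to sample negatives from.
    A seeded generator keeps the output reproducible.
    """
    rng = random.Random(seed)
    triples = []
    for title, positive in listings:
        candidates = [c for c in categories if c != positive]
        if not candidates:
            continue  # no valid negative exists for this listing
        for _ in range(negatives_per_item):
            triples.append((title, positive, rng.choice(candidates)))
    return triples


if __name__ == "__main__":
    listings = [("Apple iPhone 6 16GB", "Cell Phones"), ("Nike Air Max", "Shoes")]
    cats = ["Cell Phones", "Shoes", "Cameras"]
    for t in make_training_triples(listings, cats, negatives_per_item=2):
        print(t)
```

Uniform random negatives are the simplest choice; harder negatives (for example, sibling categories in the tree) are a common refinement but are not described in this disclosure.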

6. The method of any one of examples 1 to 5, wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.
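Subword modelling lets a model represent a word it never saw during training through the pieces it shares with known words. The sketch below uses character n-grams with boundary markers, in the style popularized by fastText. This particular scheme, the n-gram range of 3 to 5, and the helper names are assumptions, since the disclosure does not specify how the subword or subcharacter units are formed.

```python
from typing import Iterable, List, Set


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Character n-grams of a word padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def build_ngram_vocab(words: Iterable[str]) -> Set[str]:
    """Collect every n-gram seen in the training vocabulary."""
    vocab: Set[str] = set()
    for w in words:
        vocab.update(char_ngrams(w))
    return vocab


def oov_coverage(word: str, known_ngrams: Set[str]) -> float:
    """Fraction of a word's n-grams already known; 0.0 means nothing to build on."""
    grams = char_ngrams(word)
    if not grams:
        return 0.0
    return sum(g in known_ngrams for g in grams) / len(grams)


if __name__ == "__main__":
    known = build_ngram_vocab(["iphone"])
    print(oov_coverage("iphones", known))  # unseen word, mostly known pieces
    print(oov_coverage("zzz", known))      # unrelated word, no shared pieces
```

Here the unseen word "iphones" still shares two thirds of its n-grams with "iphone", whereas a purely word-level vocabulary would map it to a single unknown token.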

7. The method of any one of examples 1 to 6, further comprising adding a new category to the plurality of categories without retraining the second machine learning model on the new category, wherein the one or more closest matches that are identified include the new category.
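Because categories are embedded by a model rather than learned as fixed output classes, a new category can be supported by projecting its name with the already-trained second model and adding the resulting vector to the set that publications are compared against. The sketch below illustrates this under stated assumptions: `toy_embed` is a hypothetical stand-in for the trained second model (hashed character trigrams, L2-normalized), and `CategoryIndex` is a hypothetical container, not an API from this disclosure.

```python
import hashlib
import math
from typing import Callable, Dict, List

DIM = 64  # hypothetical embedding size


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a frozen embedding model: hashed character trigrams, L2-normalized."""
    vec = [0.0] * dim
    padded = f"<{text.lower()}>"
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class CategoryIndex:
    """Holds category vectors; the embedding model itself is never updated."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.vectors: Dict[str, List[float]] = {}

    def add_category(self, name: str) -> None:
        # No retraining: just project the new name with the existing model.
        self.vectors[name] = self.embed(name)

    def closest(self, query_vec: List[float], k: int = 1) -> List[str]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = sorted(
            ((sum(a * b for a, b in zip(query_vec, v)), name)
             for name, v in self.vectors.items()),
            reverse=True,
        )
        return [name for _, name in scored[:k]]


if __name__ == "__main__":
    index = CategoryIndex(toy_embed)
    for name in ["Shoes", "Laptops"]:
        index.add_category(name)
    index.add_category("Drones")  # added later, model unchanged
    print(index.closest(toy_embed("drones"), k=1))
```

A query vector built from the text "drones" matches the newly added category even though nothing was retrained. In a real system the query vector would come from the first model applied to the publication.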

8. A computer comprising: a storage device storing instructions; and one or more hardware processors configured by the instructions to perform operations comprising: accessing, by the one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication; identifying, by the one or more processors, one or more closest matches between (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, and (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.

9. The computer of example 8, wherein the category is a leaf category.

10. The computer of example 8 or example 9, wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.

11. The computer of any one of examples 8 to 10, wherein the at least a portion of the publication includes a title of the publication.

12. The computer of any one of examples 8 to 11, wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.

13. The computer of any one of examples 8 to 12, wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.

14. The computer of example 8, wherein the operations further comprise adding a new category to the plurality of categories without retraining the second machine learning model on the new category, and wherein the one or more closest matches that are identified include the new category.

15. A hardware machine-readable device storing instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising: accessing, by the one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication; identifying, by the one or more processors, one or more closest matches between (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, and (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.

16. The computer of example 15, wherein the category is a leaf category.

17. The computer of example 15 or example 16, wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.

18. The computer of any one of examples 15 to 17, wherein the at least a portion of the publication includes a title of the publication.

19. The computer of any one of examples 15 to 18, wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.

20. The computer of any one of examples 15 to 19, wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.

21. A machine-readable medium carrying machine-readable instructions that, when executed by one or more processors of a machine, cause the machine to perform the method of any one of examples 1 to 7.

Claims

1. A method comprising:
accessing, by one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication;
identifying, by the one or more processors, one or more closest matches by comparing (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, with (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and
causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.

2. The method according to claim 1,
wherein the category is a leaf category.

3. The method according to claim 1,
wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.

4. The method according to claim 1,
wherein the at least a portion of the publication includes a title of the publication.

5. The method according to claim 1,
wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.

6. The method according to claim 1,
wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.

7. The method according to claim 1,
further comprising adding a new category to the plurality of categories without retraining the second machine learning model on the new category,
wherein the one or more closest matches that are identified include the new category.

8. A computer comprising:
a storage device storing instructions; and
one or more hardware processors configured by the instructions to perform operations comprising:
accessing, by the one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication;
identifying, by the one or more processors, one or more closest matches by comparing (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, with (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and
causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.

9. The computer according to claim 8,
wherein the category is a leaf category.

10. The computer according to claim 8,
wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.

11. The computer according to claim 8,
wherein the at least a portion of the publication includes a title of the publication.

12. The computer according to claim 8,
wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.

13. The computer according to claim 8,
wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.

14. The computer according to claim 8,
wherein the operations further comprise adding a new category to the plurality of categories without retraining the second machine learning model on the new category, and
wherein the one or more closest matches that are identified include the new category.

15. A hardware machine-readable device storing instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
accessing, by the one or more processors, a request from a user device to add a publication to a publication corpus and to identify a set of related categories of the publication;
identifying, by the one or more processors, one or more closest matches by comparing (i) a publication semantic vector corresponding to at least a portion of the publication, the publication semantic vector being based on a first machine learning model that projects the at least a portion of the publication into a semantic vector space, with (ii) a plurality of category vectors corresponding to respective categories from a plurality of categories, the plurality of category vectors being based on a second machine learning model that projects the plurality of categories into the semantic vector space, the plurality of categories being a taxonomy of publications in the publication corpus; and
causing the one or more closest matches to be displayed on the user device as the set of related categories of the publication corpus.

16. The computer according to claim 15,
wherein the category is a leaf category.

17. The computer according to claim 15,
wherein the category is a category path at least two tree levels below a root level in a category tree of the plurality of categories.

18. The computer according to claim 15,
wherein the at least a portion of the publication includes a title of the publication.

19. The computer according to claim 15,
wherein at least one of the first machine learning model and the second machine learning model is trained on data automatically derived from publications previously added to the publication corpus.

20. The computer according to claim 15,
wherein at least one of the first machine learning model and the second machine learning model is trained at one or more of a subword level and a subcharacter level to reduce out-of-vocabulary terms at runtime.

21. A machine-readable medium carrying machine-readable instructions that, when executed by one or more processors of a machine, cause the machine to perform the method of any one of claims 1 to 7.