KR20170026264A

KR20170026264A - Product search method and system

Info

Publication number: KR20170026264A
Application number: KR1020160109420A
Authority: KR
Inventors: 전재영; 박준철; 장윤훈; 최형원
Original assignee: 옴니어스 주식회사
Priority date: 2015-08-26
Filing date: 2016-08-26
Publication date: 2017-03-08
Also published as: KR101801846B1

Abstract

Disclosed are a product search method and system that recognize a product in a query image registered by a user, search for product images containing products similar to the recognized product, and provide them to the user. A product search method according to an embodiment includes: a query image receiving step in which a communication unit receives a query image from a user device; a candidate region extraction step in which a candidate region detection unit applies the query image to a pre-trained object detection model to extract one or more candidate regions in which a product is estimated to be present; a feature extraction step in which a feature extraction unit applies a candidate image, which is the image inside the one or more candidate regions, to a pre-trained feature extraction model to extract features of the candidate image for each attribute; and a search step in which a search unit searches a product image database for product images similar to the candidate image, based on the extracted features. The feature extraction model has a structure in which the lower layers form a single unified network and the upper layers form a plurality of classification networks separated by attribute.

Description

{Product search method and system}

The present invention relates to a product search method and system that recognize a product in a query image registered by a user, search for product images containing products similar to the recognized product, and provide those product images to the user.

As communication technology has advanced, many kinds of Internet shopping malls have appeared. An Internet shopping mall is a place of business where goods are traded over the Internet. To purchase a product at such a mall, a user searches for similar products either by entering text related to the product, such as its name or manufacturer, or by selecting a category from those the mall provides.

Because products have conventionally been searched based on text entered by the user or a category selected by the user, there has been a limit to searching for products that are difficult to classify by keyword or category. This is because searching for a specific object by text requires clear knowledge of the attribute information under which that object is classified.

For this reason, when searching for an object whose attribute information the user does not know, it is far more convenient to use an image search method such as that of Google (Google.com). Google's image search service is a service model that uses the image itself as the search query and returns images that are similar or related to it.

Image search methods used in such a service model include, for example, methods that use an image's tag values and methods that use an image's hash value. More recently, methods have also been used that arrange the pixel values of an image into a matrix for similarity analysis and then search for similar values using principal component analysis, clustering, machine learning, and the like.

However, these existing image search methods are not effective for fashion product images. Users who arrive intending to buy a fashion product do not want to find images similar to a fashion product image; they want to find fashion products that are similar or related to the product shown in that image. When an existing image search method is applied to a fashion product image, the search also takes in pixel information from irrelevant regions such as the background, which has made it difficult to find the fashion product images the user wants.

In addition, users consider a very large number of attributes when perceiving a fashion product. At a basic level they consider its color, brand, and type; more specifically, they consider its fit, pattern, design details, material, and so on. Because a fashion product itself has so many attributes, when the pixel values of a fashion product image are arranged into a matrix and classified by neural network learning, a single fashion product image having various attributes must be classified in several different ways. This creates practical problems in implementing image search with neural network learning.

Moreover, when a separate classification network is trained for each attribute of a fashion product and the trained classification networks are then applied to image search, considerable processing time is required.

Korean Registered Patent No. 10-1191172 (Title of invention: Method, apparatus and computer-readable recording medium for managing images of an image database; registration date: October 9, 2012)

An object of the present invention is to provide a product search method and system that train a neural network whose lower layers form a single unified network and whose upper layers form a plurality of classification networks separated by attribute, and that search for product images using the trained neural network.

Another object of the present invention is to provide a product search method and system that recognize a product in a query image registered by a user, search for product images containing products similar to the recognized product, and provide those product images to the user. More specifically, the object is to provide a product search method and system that estimate the bounding box of a fashion product image so that only fashion-related attribute information is learned and searched.

A further object of the present invention is to provide a product search method and system that can classify the various attribute information of a fashion product and improve the speed of image search without sacrificing its accuracy.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, a product search method according to an embodiment includes: a query image receiving step in which a communication unit receives a query image from a user device; a candidate region extraction step in which a candidate region detection unit applies the query image to a pre-trained object detection model to extract, from the query image, one or more candidate regions in which a product is estimated to be present; a feature extraction step in which a feature extraction unit applies a candidate image, which is the image inside the one or more candidate regions, to a pre-trained feature extraction model to extract features of the candidate image for each attribute; and a search step in which a search unit searches a product image database for product images similar to the candidate region image, based on the extracted features. The feature extraction model has a structure in which the lower layers form a single unified network and the upper layers form a plurality of classification networks separated by attribute.
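By way of illustration only, the structure of the feature extraction model described above (a shared lower network feeding one branch per attribute) might be sketched as follows. The attribute names, layer sizes, random weights, and the use of single dense layers are assumptions made for this sketch; the specification does not fix any of them at this point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute names and layer sizes, chosen only for illustration.
ATTRIBUTES = ("color", "pattern", "material")
INPUT_DIM, SHARED_DIM, FEATURE_DIM = 64, 32, 8

# Lower level: a single unified network shared by every attribute.
W_shared = rng.normal(size=(INPUT_DIM, SHARED_DIM))

# Upper level: one separate branch per attribute.
W_heads = {name: rng.normal(size=(SHARED_DIM, FEATURE_DIM)) for name in ATTRIBUTES}


def relu(x):
    return np.maximum(x, 0.0)


def extract_features(candidate_image_vector):
    """Return one feature vector per attribute for a flattened candidate image."""
    # The shared representation is computed once and reused by every branch.
    shared = relu(candidate_image_vector @ W_shared)
    return {name: relu(shared @ W) for name, W in W_heads.items()}


features = extract_features(rng.normal(size=INPUT_DIM))
```

Because the shared part runs once per candidate image, only the small per-attribute branches are repeated, which is the intuition behind avoiding one full network per attribute.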

The plurality of classification networks are pre-trained, for each attribute, on low-resolution features (coarse features) having a resolution lower than a specific resolution and on high-resolution features (fine features) having a resolution equal to or higher than the specific resolution.

The feature extraction step includes: a low-resolution feature extraction step in which the feature extraction unit applies the candidate image to the feature extraction model pre-trained on the low-resolution features to extract a low-resolution feature for each attribute; and a high-resolution feature extraction step in which the feature extraction unit applies the candidate image to the feature extraction model pre-trained on the high-resolution features to extract a high-resolution feature for each attribute.

The search step includes: a low-resolution search step in which the search unit searches the product image database for product images similar to the candidate image, based on the low-resolution features extracted in the low-resolution feature extraction step; and a high-resolution search step in which the search unit searches, within the similar product images retrieved in the low-resolution search step, for product images similar to the candidate image, based on the high-resolution features extracted in the high-resolution feature extraction step.
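The two-stage search described above can be sketched, under stated assumptions, as a shortlist-then-rerank procedure. The Euclidean distance, the function names, and the `shortlist` and `top_k` parameters are hypothetical choices for this example; the specification does not name a distance measure at this point.

```python
import numpy as np


def nearest(query, vectors, k):
    """Indices of the k rows of `vectors` closest to `query` (Euclidean distance)."""
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:k]


def coarse_to_fine_search(q_coarse, q_fine, db_coarse, db_fine, shortlist=100, top_k=10):
    """Search the whole database with coarse features, then re-rank with fine features."""
    # Low-resolution search step: shortlist over the entire product image database.
    candidates = nearest(q_coarse, db_coarse, shortlist)
    # High-resolution search step: compare fine features only within the shortlist.
    order = nearest(q_fine, db_fine[candidates], top_k)
    return candidates[order]
```

In this sketch the more expensive fine-feature comparison touches at most `shortlist` database entries rather than all of them.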

Other specific details of the present invention are included in the detailed description and the drawings.

Because a product is recognized in the query image registered by the user, and product images containing products similar to the recognized product are retrieved and provided to the user, a wider variety of product images can be provided to the user than when product images are searched by text or category alone.

Because the search is executed merely by registering a query image, user convenience can be improved.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram illustrating the configuration of a product image search system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of a user device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of an image search apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the structure of a deep neural network (DNN).
FIG. 5 is a diagram illustrating the structure of a convolutional neural network (CNN).
FIG. 6 is a diagram illustrating a convolution calculation process.
FIG. 7 is a diagram illustrating a subsampling process.
FIG. 8 is a diagram illustrating a query image and product images retrieved based on the query image.
FIG. 9 is a flowchart illustrating a product image search method according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating the configuration of an image search apparatus according to another embodiment of the present invention.
FIG. 11 is a diagram for explaining the training process of the object detection model used by the candidate region extraction unit of FIG. 10.
FIG. 12 is a diagram illustrating the structure of the feature extraction model used by the feature extraction unit of FIG. 10.
FIG. 13 is a diagram for explaining the operation of the search unit of FIG. 10.
FIG. 14 is a diagram illustrating product images retrieved based on the features of each candidate region of a query image.
FIG. 15 is a diagram illustrating product images retrieved based on the features of a candidate region selected from among the candidate regions of a query image.
FIG. 16 is a flowchart illustrating a product image search method according to another embodiment of the present invention.
FIG. 17 is a graph showing the results of an experiment measuring the time required for the forward pass of feature extraction for feature extraction models having different structures.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention belongs is fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

The terminology used in this specification is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used in the specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components other than those mentioned. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the mentioned components and every combination of one or more of them. Although "first", "second", and the like are used to describe various components, these components are of course not limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may of course be a second component within the technical idea of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a product image search system (1) according to an embodiment of the present invention.

Referring to FIG. 1, the product image search system (1) according to an embodiment of the present invention includes a user device (100) and an image search apparatus (200).

The user device (100) receives a query image from a user. The query image may be a two-dimensional color image containing a fashion product (hereinafter referred to as a "product") such as clothing, shoes, a bag, or an accessory. The query image may be one distributed to the user device (100) from another device (not shown), one acquired through a camera (not shown) provided in the user device (100), or one acquired through the screen capture function of the user device (100).

The user device (100) transmits the query image input by the user to the image search apparatus (200) over a wired or wireless network (300), and receives from the image search apparatus (200) product images containing products similar to the product in the query image. The product images provided by the image search apparatus (200) may be sorted according to a predetermined criterion and output through the user device (100). For example, the product images may be sorted by their similarity to the product in the query image. The user device (100) may include a wired or wireless communication device, examples of which include a personal computer (PC), a smartphone, and a tablet PC. The configuration of the user device (100) is described in more detail below with reference to FIG. 2.
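As an illustrative sketch of sorting product images by similarity to the query, the example below ranks hypothetical product feature vectors by cosine similarity. The cosine measure, the function names, and the sample vectors are assumptions for this example; the text above only states that results may be sorted by similarity.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sort_by_similarity(query_feature, product_features):
    """Return product ids ordered from most to least similar to the query."""
    scores = {pid: cosine_similarity(query_feature, f) for pid, f in product_features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-dimensional features for three product images.
query = np.array([1.0, 0.0])
products = {
    "a": np.array([0.0, 1.0]),
    "b": np.array([1.0, 0.1]),
    "c": np.array([1.0, 1.0]),
}
ranking = sort_by_similarity(query, products)
```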

The image search apparatus (200) provides a product image search service to the user device (100). Specifically, the image search apparatus (200) receives a query image from the user device (100), recognizes a product in the received query image (label prediction), and extracts features of the recognized product. According to an embodiment, the image search apparatus (200) performs this recognition and feature extraction based on deep learning. The image search apparatus (200) then uses the recognized product and the extracted features to search a product image database (see '280' in FIG. 3) for product images containing products similar to the product recognized in the query image, and transmits the retrieved product images to the user device (100). The configuration of the image search apparatus (200) is described in more detail below with reference to FIG. 3.

FIG. 2 is a diagram illustrating the configuration of the user device (100) according to an embodiment of the present invention.

Referring to FIG. 2, the user device (100) includes an input unit (110), an output unit (120), a communication unit (130), a storage unit (140), a power supply unit (150), and a control unit (160).

The input unit (110) receives commands or information from the user. For example, the input unit (110) receives a selection command for selecting a query image. To this end, the input unit (110) may include input means consisting of a touch pad, a keypad, a button, a switch, a jog wheel, or a combination thereof. The touch pad may be stacked on the display (not shown) of the output unit (120), described below, to form a touch screen.

The output unit (120) outputs command processing results and various information to the user. For example, the output unit (120) outputs the product images received from the image search apparatus (200), which acts as the server. To this end, the output unit (120) may include a display and a speaker, although these are not shown in the drawing. The display may be provided as a flat panel display, a flexible display, an opaque display, a transparent display, electronic paper (E-paper), or in any other form well known in the art to which the present invention belongs. In addition to the display and the speaker, the output unit (120) may further include any form of output means well known in the art.

The communication unit (130) communicates with the image search apparatus (200) over the wired or wireless network (300). To this end, the communication unit (130) supports a wired communication scheme and/or a wireless communication scheme that supports the TCP/IP protocol or the UDP protocol. Examples of wireless communication schemes include WiBro (Wireless Broadband Internet), Wi-Fi, ZigBee, Bluetooth (for example, Bluetooth 4.0), Ultra Wide Band (UWB), Near Field Communication (NFC), third-generation mobile communication (3G), fourth-generation mobile communication (4G), and fifth-generation mobile communication (5G), but are not necessarily limited to these.

The storage unit (140) stores the data, programs, applications, and the like needed for the user device (100) to operate. The storage unit (140) may include non-volatile memory, volatile memory, built-in memory, removable external memory, a hard disk, an optical disk, a magneto-optical disk, or any form of computer-readable recording medium well known in the art to which the present invention belongs. Examples of external memory include an SD (Secure Digital) card, a mini SD card, and a micro SD card.

The power supply unit (150) supplies power to each component of the user device (100). According to one embodiment, the power supply unit (150) may be implemented so that it can be mechanically and electrically detached from the user device (100). A detached power supply unit (150) can be replaced with a spare power supply unit (not shown). According to another embodiment, the power supply unit (150) may be implemented as an integral part of the user device (100). In this case, the power supply unit (150) can be charged with power supplied from a separately provided charging device (not shown), using either wired or wireless power transmission technology.

The control unit (160) connects and controls the other components in the user device (100). For example, the control unit (160) constructs a user interface for displaying the query image and/or product images and displays it through the display.

FIG. 3 is a diagram illustrating the configuration of the image search apparatus (200) according to an embodiment of the present invention.

Referring to FIG. 3, the image search apparatus (200) according to an embodiment of the present invention includes a communication unit (210), a feature extraction unit (230), a search unit (240), a similarity calculation unit (250), an image sorting unit (260), a query image database (270), and a product image database (280).

The communication unit (210) handles communication with the user device (100). For example, the communication unit (210) receives a query image (see '310' in FIG. 8) from the user device (100). As another example, the communication unit (210) transmits the product images retrieved based on the query image (see '320' in FIG. 8) to the user device (100). To this end, the communication unit (210) may support a wired communication scheme and/or a wireless communication scheme.

The feature extraction unit (230) extracts features of the query image. According to an embodiment, the feature extraction unit (230) extracts the features of the query image through a model trained on the basis of deep learning.

Deep learning is defined as a set of machine learning algorithms that attempt a high level of abstraction (the task of summarizing the key content or functions in large amounts of data or complex material) through a combination of several nonlinear transformation techniques. Broadly, deep learning can be seen as a field of machine learning that teaches computers the way people think.

Given some data, much research is under way on how to represent it in a form a computer can understand (for example, expressing the pixel information of an image as a column vector) and how to apply that representation to learning, that is, how to devise better representation techniques and how to build models that learn them. As a result of these efforts, a variety of deep learning techniques have been developed. Examples include deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and deep belief networks (DBN).

A deep neural network (DNN) is an artificial neural network (ANN) with a plurality of hidden layers between an input layer and an output layer.

FIG. 4 is a diagram illustrating the structure of a deep neural network. In FIG. 4, each circle represents one perceptron. A perceptron consists of several input values, one processor, and one output value. The processor multiplies each input value by its weight and then sums all the weighted input values. The processor then passes the sum to an activation function and outputs a single output value. If a specific value is desired as the output of the activation function, the weights multiplied by the input values can be modified and the output value recalculated using the modified weights. In FIG. 4, each perceptron may use a different activation function. Each perceptron takes the outputs passed from the previous layer as its inputs and then obtains its output using the activation function. That output is passed on as input to the next layer. Through this process, a number of output values are finally obtained.
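The perceptron computation described above (a weighted sum followed by an activation function, with each layer feeding the next) can be sketched as follows. The sigmoid activation, the example weights, and the function names are assumptions for illustration; no bias term is included because the description above does not mention one.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def perceptron(inputs, weights, activation=sigmoid):
    """Multiply each input by its weight, sum the products, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


def layer(inputs, weight_rows, activation=sigmoid):
    """One layer: every perceptron receives the same inputs from the previous layer."""
    return [perceptron(inputs, row, activation) for row in weight_rows]


# Hypothetical network: two inputs, a hidden layer of two perceptrons, one output.
hidden = layer([1.0, 2.0], [[0.5, -0.25], [0.0, 0.0]])
output = perceptron(hidden, [1.0, 1.0])
```

Changing a weight and calling the functions again corresponds to recalculating the output with modified weights, as the paragraph describes.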

Returning to the description of deep learning techniques, a convolutional neural network (CNN) is a type of multilayer perceptron designed to require minimal preprocessing. A convolutional neural network consists of one or more convolutional layers with ordinary artificial neural network layers stacked on top, and additionally makes use of weights and pooling layers. Owing to this structure, a convolutional neural network can take full advantage of input data having a two-dimensional structure. A convolutional neural network can also be trained by standard backpropagation. Compared with other feedforward artificial neural network techniques, convolutional neural networks tend to be easier to train and have the advantage of using fewer parameters.

A convolutional neural network extracts features from an input image by alternately performing convolution and subsampling on the input image. FIG. 5 is a diagram illustrating the structure of a convolutional neural network. Referring to FIG. 5, the convolutional neural network includes a plurality of convolution layers, a plurality of subsampling layers (also called local pooling layers or max-pooling layers), and a fully-connected layer. A convolution layer performs convolution on the input image. A subsampling layer extracts local maximum values from its input and maps them to a two-dimensional image; it enlarges the local region that is covered and performs subsampling.

A convolution layer requires information such as the kernel size, the number of kernels to be used (that is, the number of maps to be generated), and the weight table to be applied in the convolution operation. Consider, for example, an input image of size 32×32, a kernel of size 5×5, and 20 kernels. When a 5×5 kernel is applied to a 32×32 input image, the kernel cannot be applied to the two outermost pixels at the top, bottom, left, and right of the input image. This is because, as the convolution calculation illustrated in FIG. 6 shows, when the kernel is placed over the input image and the convolution is performed, the result ('-8' in FIG. 6) becomes the value of the pixel that corresponds to the center element of the kernel among the input-image pixels covered by the kernel. Accordingly, performing convolution with a 5×5 kernel on a 32×32 input image produces a 28×28 map (32 - 5 + 1 = 28 along each side). Since 20 kernels were assumed, the first convolution layer (see 'C1-layer' in FIG. 5) generates a total of 20 maps of size 28×28.
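
The 32×32 to 28×28 size reduction can be checked with a short sketch. The function names are illustrative assumptions, and the kernel is applied without flipping, as is customary in convolutional neural networks, so strictly speaking the operation is a cross-correlation.

```python
def conv_output_size(input_size, kernel_size):
    """Side length of the map when the kernel must fit entirely inside the image."""
    return input_size - kernel_size + 1


def convolve_valid(image, kernel):
    """Slide a square kernel over the image and sum the element-wise products."""
    k = len(kernel)
    out_h = conv_output_size(len(image), k)
    out_w = conv_output_size(len(image[0]), k)
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        result.append(row)
    return result


# A 32x32 image of ones and one 5x5 kernel of ones (placeholder values).
image = [[1] * 32 for _ in range(32)]
kernel = [[1] * 5 for _ in range(5)]
feature_map = convolve_valid(image, kernel)
```

Repeating `convolve_valid` with 20 different kernels would yield the 20 maps of size 28×28 described for the first convolution layer.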

A subsampling layer requires information on the size of the subsampling kernel and on whether the maximum or the minimum of the values within the kernel region is to be selected. FIG. 7 is a diagram illustrating the subsampling process. Referring to FIG. 7, the subsampling kernel has a size of 2×2 and is set to select the maximum of the values contained in the kernel region. Applying a 2×2 kernel to an 8×8 input image yields a 4×4 output image. That is, the output image is reduced to 1/2 of the input image along each side.
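
A minimal sketch of the 2×2 maximum subsampling described above follows. It assumes non-overlapping kernel regions (a stride equal to the kernel size), which is consistent with the 8×8 to 4×4 example but is not stated explicitly in this document; the function name is illustrative.

```python
def max_pool(image, size=2):
    """Keep the maximum of each non-overlapping size x size region."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


# An 8x8 input whose pixel at (r, c) holds the value r * 8 + c.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
pooled = max_pool(image, size=2)
```

Selecting the minimum instead of the maximum would only require replacing `max` with `min` in the inner expression.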

Returning again to the description of deep learning techniques, a recurrent neural network (RNN) is a neural network in which the connections between the units of the artificial neural network form a directed cycle. Unlike a feedforward neural network, a recurrent neural network can use the memory inside the network to process arbitrary inputs.

A deep belief network (DBN) is a generative graphical model used in machine learning; in deep learning, it refers to a deep neural network composed of multiple layers of latent variables. It is characterized in that there are connections between layers but no connections between units within a layer.

Because it is a generative model, a deep belief network can be used for pre-training: initial weights are learned through pre-training, after which the weights can be fine-tuned by backpropagation or another discriminative algorithm. This property is very useful when training data is scarce, because the less training data there is, the more strongly the initial weight values influence the resulting model. Pre-trained initial weights are closer to the optimal weights than arbitrarily set initial weights, which improves both the performance and the speed of the fine-tuning stage.

Referring again to FIG. 3, the feature extraction unit 230 may extract features of the query image using a model trained with the convolutional neural network technique among the deep learning techniques described above. The feature extraction unit 230 then recognizes the product in the query image based on the extracted features. For example, it recognizes the category of the product in the query image (bag, shoes, shirt, pants). The category recognition result is provided to the search unit 240 described below, and the features extracted from the query image are provided to the similarity calculation unit 250 described below.

In addition, the feature extraction unit 230 may also extract features from the product images stored in the product image database 280.

According to one embodiment, feature extraction for the product images may be performed only once, as a batch. In this case, the features extracted from each product image may be stored mapped to that product image.

According to another embodiment, feature extraction for the product images may be performed each time feature extraction for a query image is completed. The product images stored in the product image database may also be updated periodically, and feature extraction may be performed on a product image whenever a new product image is stored in the product image database.

The search unit 240 searches the product image database 280 for product images containing a product similar to the product recognized in the query image. Specifically, the search unit 240 searches the product image database 280 for product images containing a product whose category is the same as or similar to the category of the product recognized in the query image. The product images found by the search unit 240 are provided to the similarity calculation unit 250 described below.

The similarity calculation unit 250 calculates, for each product image found by the search unit 240, its similarity to the query image. Specifically, the similarity calculation unit 250 calculates the similarity between the features extracted from the query image and the features of each retrieved product image. Examples of similarity calculation methods include the Hamming distance and cosine similarity. The similarity value calculated for each product image is provided to the image sorting unit 260 described below.
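
The two similarity measures named above can be sketched as follows. The representation of features as equal-length bit strings (for the Hamming distance) and as lists of numbers (for cosine similarity) is an assumption made for illustration; note that a smaller Hamming distance means greater similarity, whereas a larger cosine similarity means greater similarity.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```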

The image sorting unit 260 sorts the product images found by the search unit 240 based on the similarity values provided by the similarity calculation unit 250. For example, the image sorting unit 260 sorts the product images in descending order of similarity value. The sorted product images are transmitted to the user device 100 through the communication unit 210.

According to one embodiment, all of the sorted product images are provided to the user device 100.

According to another embodiment, product images whose similarity value is equal to or greater than a reference value are selected from the sorted product images, and only the selected product images are transmitted to the user device 100.

According to yet another embodiment, the top N product images are selected from the sorted product images, and only the selected product images are transmitted to the user device 100. As one example, N may be set in advance by an administrator (not shown) of the server 200, and the set value may be changed by the user through the user device 100. As another example, N may be determined automatically according to the user's purchase frequency or purchase history. For example, N may be set to a small value for a user with a low purchase frequency and to a large value for a user with a high purchase frequency.
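
The sorting and the two selection embodiments described above may be sketched as follows. The dictionary of similarity values keyed by image identifier and the function names are hypothetical and serve only to illustrate the idea.

```python
def rank_images(similarities):
    """Sort (image_id, similarity) pairs in descending order of similarity."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


def select_above_reference(ranked, reference_value):
    """Keep only images whose similarity is at or above the reference value."""
    return [(image_id, s) for image_id, s in ranked if s >= reference_value]


def select_top_n(ranked, n):
    """Keep only the N most similar images."""
    return ranked[:n]


# Hypothetical similarity values for three product images.
similarities = {"a": 0.2, "b": 0.9, "c": 0.5}
ranked = rank_images(similarities)
```

With these helpers, sending every sorted image corresponds to transmitting `ranked` unchanged, while the other two embodiments correspond to `select_above_reference` and `select_top_n`.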

The query image database 270 stores query images received from the user device 100. The query images may be classified and stored per user. The query images classified per user may be used to analyze the types of products that each user prefers.

The product image database 280 stores product images. As one example, a product image may be uploaded by an administrator of the server 200. As another example, product images may be collected automatically from another device (not shown) or another server (not shown) linked to the server 200.

Features are extracted from each uploaded or collected product image, and the category of the product in the product image is recognized based on the extracted features. The features extracted from a product image may be mapped to that product image, and the product image with its mapped features may be classified by the recognized category and stored in the product image database 280. Examples of product image categories include clothing, shoes, bags, and accessories. These categories may be further divided into subcategories. For example, the category 'clothing' may be further divided into subcategories such as shirts, skirts, pants, and dresses, and the category 'shoes' may be further divided into subcategories such as sneakers, dress shoes, boots, and slippers.

The server 200 according to an embodiment of the present invention has been described above with reference to FIGS. 3 to 8. Although the foregoing example describes the case where the server 200 includes all of the feature extraction unit 230, the similarity calculation unit 250, and the image sorting unit 260, one or more of these components may instead be provided in the user device 100.

The server 200 may further include one or more other components in addition to those shown in FIG. 3. For example, the server 200 may further include at least one of: an image collecting unit for collecting product images from other external servers linked to the server 200; a user information database unit for storing users' personal information or purchase history; an analysis unit for analyzing users' purchase history; and a condition setting unit for automatically setting various conditions related to the product image search service based on the analysis result of the analysis unit.

FIG. 9 is a diagram illustrating a fashion product image search method according to an embodiment of the present invention.

First, the server 200 receives a query image from the user device 100 (S410).

Next, the server 200 extracts features from the query image using a model trained by deep learning (S420). The model may be a model trained with the convolutional neural network technique. However, the model is not necessarily limited to one trained with the convolutional neural network technique; a model trained with another type of deep learning technique may of course be used.

Next, the server 200 recognizes the product in the query image based on the features extracted from the query image (S430). Specifically, the server 200 recognizes the category of the product in the query image based on the features extracted from the query image.

Then, the server 200 searches the product image database 280 for product images containing a product similar to the product recognized in the query image (S440). Specifically, the server 200 searches the product image database 280 for product images whose category is the same as or similar to the category of the product recognized in the query image.

Next, the server 200 calculates, for each product image retrieved from the product image database 280, its similarity to the query image (S450). Step S450 may include calculating the similarity between the features extracted from the query image and the features extracted from each product image. Examples of methods for calculating the similarity include the Hamming distance and cosine similarity.

Next, the server 200 sorts the product images based on the similarity value calculated for each product image (S460). Step S460 may include sorting the product images in descending order of similarity value.

Next, the server 200 selects a predetermined number of product images from the sorted product images (S470). According to one embodiment, step S470 includes selecting product images whose similarity value is equal to or greater than a reference value. According to another embodiment, step S470 includes selecting the top N product images by similarity value. Here, N may be set by the administrator of the server 200 or by the user, or may be determined automatically in proportion to the user's purchase frequency.

The product images selected in this way are transmitted to the user device 100 through the wired/wireless network 300 (S470). The transmitted product images are displayed, together with the query image, on the display of the user device 100. For example, the query image 310 and the retrieved product images 320 are displayed on the display as shown in FIG. 8.

The fashion product image search method according to an embodiment of the present invention has been described above with reference to FIG. 9. Some of the steps shown in FIG. 9 may be omitted. For example, step S470 may be omitted.

FIG. 10 is a diagram illustrating the configuration of an image search apparatus 500 according to another embodiment of the present invention.

Referring to FIG. 10, the image search apparatus 500 according to another embodiment of the present invention includes a communication unit 510, a candidate region extraction unit 520, a feature extraction unit 530, a search unit 540, a similarity calculation unit 550, an image sorting unit 560, a query image database 570, and a product image database 580. Because the components of the image search apparatus 500 shown in FIG. 10 are identical or nearly identical to those of the server 200 shown in FIG. 3, redundant description is omitted and the following description focuses on the differences.

The candidate region extraction unit 520 extracts one or more candidate regions from the query image.

According to one embodiment, when the user designates a region within the query image, for example by a touch operation, the candidate region extraction unit 520 extracts the designated region as a candidate region.

According to another embodiment, the candidate region extraction unit 520 automatically extracts candidate regions of the query image in which a product is estimated to be present. To this end, the candidate region extraction unit 520 extracts the candidate regions from the query image using an object detection model. Examples of object detection models include the R-CNN (Regions with Convolutional Neural Networks, or Region-based Convolutional Neural Networks) model, the Fast R-CNN model, and the YOLO (You Only Look Once) model.

The object detection model 521 used by the candidate region extraction unit 520 may be trained in advance using input images and the ground-truth images of those input images. The training process of the object detection model 521 used by the candidate region extraction unit 520 will now be described in more detail with reference to FIG. 11.

Referring to FIG. 11, the object detection model 521 generates a plurality of bounding boxes in regions of the input image 810 in which a product is estimated to be present. FIG. 11 shows the case where a total of four bounding boxes have been generated in the input image 820.

Next, the object detection model 521 compares the input image 820, in which the bounding boxes have been generated, with the ground-truth image 830 of the input image 810. The ground-truth image 830 is an image in which a bounding box has been generated for each product, and it includes the position information of each bounding box and the product information of the product within each bounding box. The product information includes attribute information of the product. Examples of attribute information include the item type, design details, pattern, and material.

The object detection model 521 refers to the information (for example, position and product) on the plurality of bounding boxes in the ground-truth image 830 and regenerates a plurality of bounding boxes in the input image 840. FIG. 11 shows the case where a total of three bounding boxes have been regenerated in the input image 840. Comparing the input image 820 with the input image 840, it can be seen that the positions of the bounding boxes in the input image 840 are closer to the positions of the bounding boxes in the ground-truth image 830 than those in the input image 820.

Referring again to FIG. 10, the candidate region extraction unit 520 extracts candidate regions from the query image using the object detection model 521 that has been sufficiently trained in the manner described above. The process of extracting candidate regions from the query image is described in detail below.

First, the candidate region extraction unit 520 extracts feature maps, which are an intermediate stage of the trained object detection model 521. Next, it computes a weighted sum of the feature maps that strongly influence the classification, producing a heat map of the image locations on which the model mainly relies. Then, a post-processing step such as thresholding or connected component extraction is applied to the generated heat map to generate bounding boxes.
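
The three steps above (weighted sum of feature maps, thresholding, connected component extraction) can be sketched as follows. This is an illustrative sketch under stated assumptions: feature maps are small two-dimensional lists, the weights and threshold are supplied by the caller, components are 4-connected, and boxes are reported as (top, left, bottom, right) in heat-map coordinates. None of these details is specified in this document.

```python
def heat_map(feature_maps, weights):
    """Weighted sum of equally sized feature maps, computed cell by cell."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fm[r][c] for w, fm in zip(weights, feature_maps))
         for c in range(cols)]
        for r in range(rows)
    ]


def bounding_boxes(heat, threshold):
    """Threshold the heat map and box each 4-connected region of hot cells."""
    rows, cols = len(heat), len(heat[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heat[r][c] < threshold:
                continue
            # Flood fill from this cell to collect one connected component.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and heat[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

In practice the boxes would still have to be scaled from feature-map coordinates back to the coordinates of the query image; that step is omitted here.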

If several kinds of items are present in the query image, a bounding box is generated for each item. In addition, the object detection model 521 may use not only a classification model for classifying item types but also a classification model for design detail types, a classification model for pattern types, and a classification model for material types, in order to estimate the approximate location in the query image of the features relating to design details, patterns, or materials.

The feature extraction unit 530 extracts features from each of the plurality of candidate regions extracted from the query image. More specifically, for each of the candidate regions extracted from the query image, the feature extraction unit 530 extracts a low-resolution (coarse) feature and a high-resolution (fine) feature for each attribute. To this end, the feature extraction unit 530 uses a feature extraction model. The structure of the feature extraction model will now be described in detail with reference to FIG. 12.

Referring to FIG. 12, the feature extraction model consists of a unified network 531 and a plurality of classification networks 532 separated by attribute.

The unified network 531 corresponds to the plurality of convolution layers (C1-layer to C3-layer) and the plurality of subsampling layers (MP1-layer to MP2-layer) of the convolutional neural network shown in FIG. 5. When the lower layers are configured as a unified network in this way, the features of different attributes can be shared in the lower layers.

The plurality of classification networks 532 correspond to the fully-connected layer of the convolutional neural network shown in FIG. 5. According to an embodiment, the plurality of classification networks 532 include an item classification network, a design detail classification network, a pattern classification network, and a material classification network. The item classification network extracts a low-resolution (coarse) feature for the item and then extracts a high-resolution (fine) feature. The design detail classification network extracts a low-resolution feature for the design details and then a high-resolution feature. The pattern classification network extracts a low-resolution feature for the pattern and then a high-resolution feature. The material classification network extracts a low-resolution feature for the material and then a high-resolution feature.
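
The shared-trunk, per-attribute-head structure can be illustrated with a deliberately simplified sketch. The assumptions are significant: the unified network is replaced by a single linear layer with a ReLU instead of convolution and subsampling layers, each classification network is reduced to two small linear layers, the fine feature is computed from the coarse feature, and all class, function, and weight names are hypothetical.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def linear(values, weight_rows):
    """Fully connected layer without bias: one output per weight row."""
    return [sum(v * w for v, w in zip(values, row)) for row in weight_rows]


class FeatureExtractionModel:
    """Shared lower layers followed by one separate head per attribute."""

    def __init__(self, trunk_weights, head_weights):
        self.trunk_weights = trunk_weights  # shared by all attributes
        self.head_weights = head_weights    # attribute -> (coarse_rows, fine_rows)

    def extract(self, candidate):
        # The shared representation is computed once for the candidate image.
        shared = relu(linear(candidate, self.trunk_weights))
        features = {}
        for attribute, (coarse_rows, fine_rows) in self.head_weights.items():
            coarse = relu(linear(shared, coarse_rows))
            fine = relu(linear(coarse, fine_rows))
            features[attribute] = {"coarse": coarse, "fine": fine}
        return features


# Toy weights for two of the four attributes (placeholder values).
model = FeatureExtractionModel(
    trunk_weights=[[1.0, 0.0], [0.0, 1.0]],
    head_weights={
        "item": ([[1.0, 1.0]], [[2.0], [3.0]]),
        "pattern": ([[1.0, -1.0]], [[1.0]]),
    },
)
features = model.extract([1.0, 2.0])
```

The point of the sketch is only that `shared` is computed once and reused by every head, which mirrors how the lower layers share features across attributes.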

Referring again to FIG. 10, the feature extraction unit 530 recognizes at least one of the item, design details, pattern, and material of the product in each candidate region, based on the features extracted for each attribute from each of the plurality of candidate regions.

Meanwhile, the candidate region extraction unit 520 and the feature extraction unit 530 described above may operate in the same manner on the product images stored in the product image database 580. That is, the candidate region extraction unit 520 extracts a plurality of candidate regions from a product image, and the feature extraction unit 530 extracts a low-resolution (coarse) feature and a high-resolution (fine) feature for each attribute from each of the candidate regions extracted from the product image. The candidate regions extracted from the product image and the low-resolution and high-resolution features extracted for each attribute from each candidate region may be mapped to the corresponding product image and stored in the product image database 580.

The search unit 540 searches the product image database 580 for product images containing a product similar to the item (or design details, pattern, or material) of the product recognized in each candidate region of the query image. The operation of the search unit 540 will now be described in more detail with reference to FIG. 13.

Suppose that a candidate region containing a T-shirt has been extracted from the query image shown in FIG. 13, and that the image within the extracted candidate region (hereinafter, the 'candidate image') has been applied to the feature extraction model, so that a low-resolution feature (coarse feature) and a high-resolution feature (fine feature) have been extracted for the item of the candidate image.

The search unit 540 then searches for product images similar to the candidate image using the binary value of the item's low-resolution feature. For example, when the binary value of the item's low-resolution feature is '10100', the search unit 540 searches, among the product images stored in the product image database 580, for those whose item hash table contains a value equal to '10100' or a value similar to '10100' (for example, a value whose Hamming distance from '10100' is N or less, where N may vary with the type or number of product images stored in the product image database). Here, the item hash table is a table that stores the item features associated with a given product image.
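
The first-stage lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `hamming_distance`, `coarse_search`, and `item_hash_table`, the image IDs, and the stored codes other than '10100' are all hypothetical, and the item hash table is assumed to be a plain mapping from product image ID to a bit string.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def coarse_search(query_code, item_hash_table, max_distance):
    """Return IDs of product images whose stored coarse item code lies
    within max_distance (the N of the text) of the query code."""
    return [
        image_id
        for image_id, code in item_hash_table.items()
        if hamming_distance(query_code, code) <= max_distance
    ]


# Hypothetical item hash table: product image ID -> coarse item code.
item_hash_table = {
    "img_a": "10100",  # identical to the query code
    "img_b": "10101",  # differs from the query code in one bit
    "img_c": "01011",  # differs from the query code in every bit
}

# With N = 1, img_a and img_b are kept and img_c is discarded.
matches = coarse_search("10100", item_hash_table, max_distance=1)
```

A linear scan is used here only for clarity; the text does not say how the hash tables are indexed.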

Next, the search unit 540 uses the binary value of the item's high-resolution feature to search again, within the product images retrieved using the item's low-resolution feature, for product images similar to the candidate image. In other words, the search unit first retrieves product images similar to the candidate image using the item's low-resolution feature (the first search), and then uses the item's high-resolution feature to retrieve similar product images from within the first-search results (the second search).

Although not shown in FIG. 13, according to another embodiment, the search unit 540 may use, in the second search, the binary value of the high-resolution feature of the design detail, of the pattern, or of the material. In this case, the search unit 540 may refer to the design detail hash table, the pattern hash table, or the material hash table of the stored product images instead of their item hash table.

As described above, performing the first search for product images similar to the candidate image on the basis of the item's low-resolution feature limits the search range and can therefore increase search speed. In addition, because the second search within the first-search results can be based on the high-resolution feature of the item, design detail, pattern, or material, the likelihood of providing the search results the user wants is higher than when the second search uses only the item's high-resolution feature.
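
The coarse-to-fine search can be sketched as below, under stated assumptions: each database record is a hypothetical dictionary holding per-attribute coarse and fine bit strings, both stages use a Hamming-distance threshold (the text specifies this only for the first stage), and the function and field names are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Count differing bit positions of two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def two_stage_search(query, database, coarse_n, fine_n, fine_attribute="item"):
    """First search on the coarse item code, then a second search on the
    fine code of the chosen attribute, restricted to the first results."""
    first = [
        rec for rec in database
        if hamming(query["coarse"]["item"], rec["coarse"]["item"]) <= coarse_n
    ]
    return [
        rec["id"] for rec in first
        if hamming(query["fine"][fine_attribute],
                   rec["fine"][fine_attribute]) <= fine_n
    ]


# Hypothetical query and database records (all codes are made up).
query = {"coarse": {"item": "10100"},
         "fine": {"item": "11001100", "pattern": "00001111"}}
database = [
    {"id": "tee_striped", "coarse": {"item": "10100"},
     "fine": {"item": "11001100", "pattern": "00001111"}},
    {"id": "tee_plain", "coarse": {"item": "10100"},
     "fine": {"item": "11001101", "pattern": "11110000"}},
    {"id": "boots", "coarse": {"item": "01011"},
     "fine": {"item": "11001100", "pattern": "00001111"}},
]

by_item = two_stage_search(query, database, coarse_n=1, fine_n=1)
by_pattern = two_stage_search(query, database, coarse_n=1, fine_n=1,
                              fine_attribute="pattern")
```

In this toy data, the boots record is removed by the first search even though its fine codes match, and choosing the pattern attribute for the second search separates the striped shirt from the plain one.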

The product image search process described above may proceed either without or with user intervention. Both cases are described in detail with reference to FIGS. 14 and 15.

FIG. 14 is a diagram for explaining the case where the user does not intervene in the product image search process, and FIG. 15 is a diagram for explaining the case where the user does intervene.

If a first candidate region 631, a second candidate region 632, a third candidate region 633, and a fourth candidate region 634 have been extracted from the query image 630 shown in FIG. 14, the search unit 540 retrieves product images 640 containing products similar to the item, design detail, material, or pattern of the product recognized in each of the candidate regions 631, 632, 633, and 634.

Specifically, the search unit 540 retrieves product images 641 containing a product that is similar or identical to the item (for example, sunglasses), design detail, pattern, or material of the product recognized in the first candidate region 631 of the query image 630. Likewise, it retrieves product images 642 for the product recognized in the second candidate region 632 (for example, a striped T-shirt), product images 643 for the product recognized in the third candidate region 633 (for example, skinny jeans), and product images 644 for the product recognized in the fourth candidate region 634 (for example, boots).

Alternatively, before the search unit 540 searches for product images based on the attributes of the products recognized in the candidate regions of the query image 630, the candidate region extraction result for the query image 630 may be provided to the user device 100 and displayed on it. When the user then selects a candidate region, information on the selected candidate region is transmitted to the image search apparatus 500, and the search unit 540 of the image search apparatus 500 retrieves product images containing a product whose attributes are similar or identical to those of the product recognized in the selected candidate region. For example, if the user selects the second candidate region 632 from among the candidate regions 631, 632, 633, and 634 of the query image 630 shown in FIG. 15, the search unit 540 retrieves only the product images 642 that are similar or identical to the item (for example, a striped T-shirt), design detail, pattern, or material of the product recognized in the second candidate region 632.

In this case, the retrieved product images 642 may include product images 642a, 642b, and 642c that contain only a striped T-shirt and/or a product image 642d that contains a person wearing a striped T-shirt. Providing product images of a person wearing a striped T-shirt, in addition to product images of the T-shirt alone, lets the user see complete outfits built around the striped T-shirt. The user therefore does not need to search separately for ways to style the striped T-shirt, which improves user convenience.

Referring again to FIG. 10, the product images retrieved by the search unit 540 are provided to the similarity calculation unit 550 described below.

The similarity calculation unit 550 calculates the similarity between the query image and each of the product images retrieved by the search unit 540. For example, the similarity calculation unit 550 calculates the similarity between the features extracted from each candidate region of the query image and the features extracted from each candidate region of a product image. Examples of similarity measures include the Hamming distance and cosine similarity.
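
The two measures named above can be sketched as follows. The function names are hypothetical, and converting the Hamming distance into a similarity score (fraction of matching bits) is an assumption made here so that a larger value means "more similar" for both measures; the text does not specify this conversion.

```python
import math


def hamming_similarity(a: str, b: str) -> float:
    """Fraction of matching bits between two equal-length binary codes
    (1.0 means identical). Equals 1 - hamming_distance / code_length."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two real-valued feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)
```

The Hamming-based score suits the binary codes used in the hash tables, while cosine similarity applies to real-valued feature vectors before binarization.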

The image sorting unit 560 sorts the product images retrieved by the search unit 540 based on the similarity values provided by the similarity calculation unit 550. For example, the image sorting unit 560 sorts the product images in descending order of similarity. The sorted product images are provided to the user device 100 through the communication unit 510.

Either all of the sorted product images are transmitted to the user device 100, or only product images selected from among them are transmitted. For example, only the product images whose similarity value is at or above a reference value may be selected, or the top M product images may be selected. M may be set manually by an administrator or user of the image search apparatus 500, or set automatically according to the user's purchase frequency or purchase history.
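
The sorting and selection described above can be sketched as below. The function name `rank_and_select`, the image IDs, and the similarity scores are hypothetical; how M would be derived from purchase history is not specified in the text and is not modeled.

```python
def rank_and_select(scored_images, threshold=None, top_m=None):
    """Sort (image_id, similarity) pairs in descending order of similarity,
    then optionally keep only scores >= threshold and/or the top M."""
    ranked = sorted(scored_images, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    if top_m is not None:
        ranked = ranked[:top_m]
    return ranked


# Hypothetical similarity scores for four retrieved product images.
scored = [("img_1", 0.91), ("img_2", 0.55), ("img_3", 0.78), ("img_4", 0.83)]

all_sorted = rank_and_select(scored)                   # send everything
above_reference = rank_and_select(scored, threshold=0.8)
top_two = rank_and_select(scored, top_m=2)
```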

FIG. 16 is a flowchart illustrating a fashion product image search method according to another embodiment of the present invention.

First, the image search apparatus 500 receives a query image from the user device 100 (S710).

Next, the image search apparatus 500 applies the query image to the object detection model 521 trained by deep learning, and extracts from the query image one or more candidate regions in which a product is estimated to be present (S720). Examples of the object detection model include the R-CNN model, the Fast R-CNN model, and the YOLO model. These models may be trained using training data that includes input images and the ground-truth images of those input images. Alternatively, they may be trained using training data that includes input images.

Next, the image search apparatus 500 applies the candidate image, which is the image within a candidate region, to the feature extraction model trained by deep learning, and extracts a feature for each attribute of the candidate region (S730).

According to an embodiment, step S730 includes applying the candidate image to the pre-trained feature extraction model to extract a low-resolution feature (coarse feature) for each attribute of the candidate image, and applying the candidate image to the pre-trained feature extraction model to extract a high-resolution feature (fine feature) for each attribute of the candidate image.

As described above with reference to FIG. 12, the feature extraction model has a structure in which the lower layers form a unified network and the upper layers form a plurality of classification networks. Because the lower layers form a single unified network, their parameters can be shared. Because only the upper layers are split into a plurality of classification networks, low-resolution and high-resolution features can be output separately for the item, design detail, pattern, and material. In addition, the average time required for the feature extraction forward pass can be reduced, which improves the speed of product image search. The improvement in search speed is described below with reference to FIG. 17.
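
The shared-trunk, multi-head structure can be illustrated with the toy sketch below. It is not the patented network: a single dense layer with random weights stands in for the shared convolutional layers, each attribute head is reduced to two linear projections (one coarse, one fine) followed by sign binarization, and all names and dimensions are hypothetical. The point it shows is only that the shared part is computed once per candidate image and reused by every attribute head.

```python
import random

ATTRIBUTES = ("item", "design_detail", "pattern", "material")


def random_matrix(rows, cols, rng):
    """rows x cols matrix of random weights (stand-in for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(x, w):
    """Multiply row vector x (length rows) by matrix w (rows x cols)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def binarize(values):
    """Turn real-valued outputs into a bit string usable as a hash code."""
    return "".join("1" if v > 0 else "0" for v in values)


def init_model(in_dim=8, shared_dim=6, coarse_bits=5, fine_bits=12, seed=0):
    rng = random.Random(seed)
    return {
        "shared": random_matrix(in_dim, shared_dim, rng),  # unified network
        "heads": {  # one classification network per attribute
            name: {
                "coarse": random_matrix(shared_dim, coarse_bits, rng),
                "fine": random_matrix(shared_dim, fine_bits, rng),
            }
            for name in ATTRIBUTES
        },
    }


def extract_features(model, x):
    # Shared lower layers: evaluated once, with ReLU activation.
    shared = [max(v, 0.0) for v in matvec(x, model["shared"])]
    # Attribute-specific upper layers: each reuses the shared activations.
    return {
        name: {
            "coarse": binarize(matvec(shared, head["coarse"])),
            "fine": binarize(matvec(shared, head["fine"])),
        }
        for name, head in model["heads"].items()
    }


model = init_model()
features = extract_features(model, [0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, 0.4])
```

Here `features["pattern"]["fine"]` would be the kind of code compared against a pattern hash table in the second search.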

Next, the image search apparatus 500 searches the product image database 580 for product images similar to the candidate image, based on the features extracted from the candidate image (S740).

According to an embodiment, step S740 includes a first search step of searching the product image database 580 for product images similar to the candidate image based on the low-resolution feature extracted from the candidate image, and a second search step of searching within the product images retrieved in the first search step for product images similar to the candidate image based on the high-resolution feature extracted from the candidate image.

The first search step, which uses the low-resolution feature of the candidate image, retrieves product images that are roughly similar to the candidate image. The second search step, which uses the high-resolution feature of the candidate image, retrieves product images that are more closely similar to it.

The first search step may use the low-resolution feature for the item of the candidate image. The second search step may use not only the high-resolution feature for the item of the candidate image but also the high-resolution feature for its design detail, pattern, or material. Which attribute's high-resolution feature is used in the second search step may be selected by the user.

Next, the image search apparatus 500 calculates the similarity between the query image and each of the retrieved product images (S750). Step S750 includes calculating the similarity between the features extracted from each candidate region of the query image and the features extracted from each candidate region of the product image.

Next, the image search apparatus 500 sorts the product images based on the calculated similarity values (S760). Step S760 may include sorting the product images in descending order of similarity.

Next, the image search apparatus 500 selects a predetermined number of product images from among the sorted product images and transmits the selected product images to the user device (S770).

According to one embodiment, product images whose similarity value is at or above a reference value may be selected in step S770.

According to another embodiment, the top M product images by similarity value may be selected in step S770. M may be set by an administrator or user of the image search apparatus 500, or determined automatically in proportion to the user's purchase frequency.

FIG. 17 is a graph showing the results of an experiment measuring the time required for the feature extraction forward pass in feature extraction models with different structures.

Three models were tested: a first, a second, and a third feature extraction model. The first feature extraction model consists of one single classification network. The second feature extraction model consists of a unified network and a plurality of classification networks (item, material, pattern). The third feature extraction model consists of a plurality of single classification networks (item, material, pattern). The second feature extraction model is the feature extraction model shown in FIG. 12.

The experiment used 10,000 test images. The average time required for the feature extraction forward pass was measured as 18 ms for the first feature extraction model, 27 ms for the second feature extraction model, and 54 ms for the third feature extraction model.

That is, the second feature extraction model is about twice as fast as the third feature extraction model. This confirms that using the second feature extraction model, in which the lower layers form a unified network and the upper layers form a plurality of classification networks separated by attribute (item, material, pattern), helps improve image search speed.
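
The comparison can be checked directly from the reported averages. The short sketch below uses only the three measured times given above; the dictionary keys are illustrative labels for the first, second, and third models.

```python
# Average forward-pass times reported for the three models, in milliseconds.
forward_ms = {"single_network": 18, "shared_trunk_heads": 27, "separate_networks": 54}

# Speedup of the shared-trunk model over three separate networks.
speedup_vs_separate = forward_ms["separate_networks"] / forward_ms["shared_trunk_heads"]

# Cost of the shared-trunk model relative to one single-attribute network.
cost_vs_single = forward_ms["shared_trunk_heads"] / forward_ms["single_network"]
```

The first ratio is 2.0, matching the stated twofold improvement, and the second is 1.5, so handling three attributes with shared lower layers costs 1.5 times one single network rather than 3 times.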

The steps of a method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: User device
200: Image search device
210: Communication unit
230: Feature extraction unit
240: Search unit
250: Similarity calculation unit
260: Image sorting unit
270: Query image database
280: Product image database

Claims

A product search method comprising:
a query image receiving step of receiving, by a communication unit, a query image from a user device;
a candidate region extraction step of extracting at least one candidate region in which a product is estimated to exist in the query image by applying the query image to a previously learned object detection model;
a feature extraction step of applying a candidate image, which is an image in the at least one candidate region, to a previously learned feature extraction model, and extracting features of the candidate image for each attribute; and
a search step of searching, by a search unit, a product image database for product images similar to the candidate image based on the extracted features,
wherein the feature extraction model has a structure in which the lower layer is composed of a unified network and the upper layer is composed of a plurality of classification networks separated by attribute.

The method according to claim 1,
wherein the plurality of classification networks
are trained, for each attribute, on a coarse feature having a resolution lower than a specific resolution and a fine feature having a resolution higher than the specific resolution.

The method of claim 2,
wherein the feature extraction step comprises:
a low-resolution feature extraction step of extracting a low-resolution feature for each attribute by applying the candidate image to the feature extraction model trained on the low-resolution features; and
a high-resolution feature extraction step of extracting a high-resolution feature for each attribute by applying the candidate image to the feature extraction model trained on the high-resolution features;
And displaying the product image.

The method of claim 3,
wherein the search step comprises:
a low-resolution search step of searching the product image database for product images similar to the candidate image, based on the low-resolution feature extracted in the low-resolution feature extraction step; and
a high-resolution search step of searching, within the similar product images retrieved in the low-resolution search step, for product images similar to the candidate image, based on the high-resolution feature extracted in the high-resolution feature extraction step;
And displaying the product image.