KR101768521B1

KR101768521B1 - Method and system providing informational data of object included in image

Info

Publication number: KR101768521B1
Application number: KR1020160054198A
Authority: KR
Inventors: 송철환; 민재식; 김영관; 김재명
Original assignee: 네이버 주식회사
Priority date: 2016-05-02
Filing date: 2016-05-02
Publication date: 2017-08-17

Abstract

Disclosed are a method and a system for providing information data on an object included in an image. The method comprises the steps of: receiving an image as a query for a search; detecting an object corresponding to a specific item in the image by using a deep learning model trained on the specific item, and extracting features of the detected object; retrieving, among data related to the specific item, data corresponding to the features of the object; and providing the data corresponding to the features of the object as a search result for the image.

Description

The present invention relates to a method and system for providing information data on an object included in an image.

The following description relates to techniques for finding and providing information data associated with an object included in an image.

Portals and professional or personal websites on the Internet provide information windows of various kinds in which information data such as advertisements can be posted and promoted.

Information data delivered over the Internet, alongside media such as TV, newspapers, and radio, forms a huge revenue market in terms of function and effectiveness. Moreover, portals and websites that provide various kinds of information derive a growing share of their income from information data.

Information data posted on Internet websites is provided as banners, videos, or other forms that users can perceive visually or aurally. Such information data can be divided into data that is simply posted for promotion in an information window designated on a main page or subpage of a website, and search-related products that serve information data for a corresponding search term.

As an example of a technology for providing information data, Korean Patent Laid-Open Publication No. 10-2010-0004312, "Method of Linking Website Advertisements Through the Internet," discloses a technique in which an advertiser, after concluding an advertising contract with one website for a desired advertisement, selects from the many websites affiliated with that website for advertisement linkage, so that the desired advertisement is linked simultaneously across the desired websites.

Provided are an information providing method and system that can detect an object included in any image, without constraints, and search for information data related to the detected object.

Provided are an information providing method and system that can find, in an image, an object matching an item of a specific topic by using an object detection algorithm based on a CNN (Convolutional Neural Network).

Provided are an information providing method and system that can classify the category of an object detected in an image and extract features of specific attributes by applying a hierarchical classification model and an attribute-learning-based CNN.

Provided are an information providing method and system that can find related information in a large-scale DB more accurately and quickly through indexing and search based on the sparse characteristics of an object detected in an image.

Provided are an information providing method and system that can provide search results by applying dimensionality reduction and quantization to high-dimensional CNN-based deep features.

Provided is a computer-implemented method comprising: receiving an image as a query for a search; detecting an object corresponding to a specific item in the image by using a deep learning model trained on the specific item, and then extracting a feature of the detected object; searching, among information data related to the specific item, for information data corresponding to the feature of the object as information data related to the object; and providing the information data corresponding to the feature of the object as a search result for the image.

Provided is a computer-implemented system comprising: an image receiving unit that receives an image as a query for a search; an object detection unit that detects an object corresponding to a specific item in the image by using a deep learning model trained on the specific item, and then extracts a feature of the detected object; an information search unit that searches, among information data related to the specific item, for information data corresponding to the feature of the object as information data related to the object; and an information providing unit that provides the information data corresponding to the feature of the object as a search result for the image.

By using an object detection algorithm based on a CNN (Convolutional Neural Network), an object matching an item of a specific topic can be found accurately in any image without constraints, which in turn improves classification performance for information retrieval.

By applying a hierarchical classification model and an attribute-learning-based CNN, the category of an object detected in an image can be classified and features of specific attributes can be extracted.

Related information can be found more accurately and quickly in a large-scale DB through high-speed indexing and search based on the sparse characteristics of an object detected in an image.

Search quality can be improved by applying dimensionality reduction and quantization to high-dimensional CNN-based deep features and using the result to rank related information.
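The dimensionality reduction and quantization mentioned above can be illustrated with a minimal sketch. The text here does not specify which reduction or quantization technique is used, so the random projection, the uniform 8-bit scalar quantizer, and every name below (`make_projection`, `reduce_dim`, `quantize`) are assumptions chosen only for illustration.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build a fixed random projection matrix (assumed technique, not from the source)."""
    rng = random.Random(seed)
    scale = 1.0 / (out_dim ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_dim(feature, projection):
    """Project a high-dimensional feature onto fewer dimensions."""
    return [sum(w * x for w, x in zip(row, feature)) for row in projection]


def quantize(vector, lo, hi, levels=256):
    """Uniformly quantize each value in [lo, hi] to an integer code in [0, levels - 1]."""
    step = (hi - lo) / (levels - 1)
    codes = []
    for v in vector:
        clipped = min(max(v, lo), hi)  # clamp out-of-range values
        codes.append(round((clipped - lo) / step))
    return codes


if __name__ == "__main__":
    deep_feature = [0.1 * i for i in range(16)]   # stand-in for a CNN feature
    projection = make_projection(in_dim=16, out_dim=4)
    compact = quantize(reduce_dim(deep_feature, projection), lo=-4.0, hi=4.0)
    print(compact)  # four small integer codes that are cheap to store and compare
```

Compact integer codes of this kind trade some precision for much smaller storage and faster comparison, which is the usual motivation for quantizing features before ranking.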

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the internal configuration of an electronic device and a server in an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of an annotation tool for constructing a dataset in an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of components that a processor of a server according to an embodiment of the present invention may include.
FIG. 5 is a flowchart illustrating an example of a method that a server according to an embodiment of the present invention can perform.
FIG. 6 is a diagram illustrating an example of a process of regenerating a pre-trained model for category classification in an embodiment of the present invention.
FIG. 7 is a diagram illustrating examples of localization and attribute learning classification results for an image in an embodiment of the present invention.
FIG. 8 is a diagram illustrating an example of a CNN-based image feature classification process in an embodiment of the present invention.
FIG. 9 is a diagram for explaining image features having sparse characteristics in an embodiment of the present invention.
FIG. 10 is a diagram illustrating an example of an indexing process that uses the sparse characteristics of image features in an embodiment of the present invention.
FIG. 11 is a diagram illustrating an example of a dimensionality reduction and quantization process in an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 includes a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. FIG. 1 is an example for describing the invention, and the number of electronic devices and the number of servers are not limited to those shown in FIG. 1.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed terminals or mobile terminals implemented as computer devices. Examples include smartphones, mobile phones, navigation devices, computers, notebooks, digital broadcasting terminals, PDAs (Personal Digital Assistants), PMPs (Portable Multimedia Players), and tablet PCs. For example, electronic device 1 (110) may communicate with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 through the network 170 using a wireless or wired communication scheme.

The communication scheme is not limited and may include not only schemes that use a communication network the network 170 may include (for example, a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network) but also short-range wireless communication between devices. For example, the network 170 may include any one or more of a PAN (personal area network), a LAN (local area network), a CAN (campus area network), a MAN (metropolitan area network), a WAN (wide area network), a BBN (broadband network), the Internet, and the like. The network 170 may also include any one or more network topologies, including but not limited to a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 through the network 170 to provide commands, code, files, content, services, and the like.

For example, the server 160 may provide a file for installing an application to electronic device 1 (110) connected through the network 170. In this case, electronic device 1 (110) may install the application using the file provided by the server 160. Electronic device 1 (110) may also access the server 150 under the control of its operating system (OS) or of at least one program (for example, a browser or the installed application) and receive the services or content that the server 150 provides. For example, when electronic device 1 (110) transmits a service request message to the server 150 through the network 170 under the control of the application, the server 150 may transmit code corresponding to the service request message to electronic device 1 (110), and electronic device 1 (110) may provide content to the user by composing and displaying a screen according to the code under the control of the application.

In particular, the server 150 may receive an image as a query, find an object included in the image, search for information data related to that object, and provide the search result to electronic device 1 (110). For example, the server 150 may receive an image as a query from electronic device 1 (110) through the network 170, search for information data that matches or is similar to an object included in the image, and provide the search result to electronic device 1 (110). As another example, the server 150 may recognize as a query an image that electronic device 1 (110) has requested access to, or an image included in a requested document, search for information data that matches or is similar to an object included in the image, and provide the search result to electronic device 1 (110) together with the requested image or document.

FIG. 2 is a block diagram illustrating the internal configuration of an electronic device and a server in an embodiment of the present invention. FIG. 2 describes the internal configuration of electronic device 1 (110) as an example of one electronic device and of the server 150 as an example of one server. The other electronic devices 120, 130, and 140 and the server 160 may have the same or a similar internal configuration.

Electronic device 1 (110) and the server 150 may include memories 211 and 221, processors 212 and 222, communication modules 213 and 223, and input/output interfaces 214 and 224. The memories 211 and 221 are computer-readable recording media and may include RAM (random access memory), ROM (read only memory), and a permanent mass storage device such as a disk drive. The memories 211 and 221 may also store an operating system and at least one program code (for example, code for a browser or a video call application installed and run on electronic device 1 (110)). These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In other embodiments, the software components may be loaded into the memories 211 and 221 through the communication modules 213 and 223 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories 211 and 221 based on a program (for example, the application described above) installed from files that developers or a file distribution system that distributes application installation files (for example, the server 160 described above) provide through the network 170.

The processors 212 and 222 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 212 and 222 by the memories 211 and 221 or the communication modules 213 and 223. For example, the processors 212 and 222 may be configured to execute instructions received according to program code stored in a recording device such as the memories 211 and 221.

The communication modules 213 and 223 may provide functions that allow electronic device 1 (110) and the server 150 to communicate with each other through the network 170, and functions for communicating with another electronic device (for example, electronic device 2 (120)) or another server (for example, the server 160). For example, a request (for example, a request for a video call service) generated by the processor 212 of electronic device 1 (110) according to program code stored in a recording device such as the memory 211 may be delivered to the server 150 through the network 170 under the control of the communication module 213. Conversely, control signals, commands, content, files, and the like provided under the control of the processor 222 of the server 150 may pass through the communication module 223 and the network 170 and be received by electronic device 1 (110) through its communication module 213. For example, control signals or commands of the server 150 received through the communication module 213 may be delivered to the processor 212 or the memory 211, and content or files may be stored in a storage medium that electronic device 1 (110) may further include.

The input/output interfaces 214 and 224 may be means for interfacing with an input/output device 215. For example, an input device may include a device such as a keyboard or a mouse, and an output device may include a device such as a display for showing a communication session of an application. As another example, the input/output interface 214 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. As a more specific example, when the processor 212 of electronic device 1 (110) processes instructions of a computer program loaded in the memory 211, a service screen or content composed using data provided by the server 150 or electronic device 2 (120) may be displayed on the display through the input/output interface 214.

In other embodiments, electronic device 1 (110) and the server 150 may include more components than those shown in FIG. 2. However, there is no need to clearly illustrate most prior-art components. For example, electronic device 1 (110) may be implemented to include at least some of the input/output devices 215 described above, or may further include other components such as a transceiver, a GPS (Global Positioning System) module, a camera, various sensors, and a database. As a more specific example, when electronic device 1 (110) is a smartphone, the various components that a smartphone generally includes, such as an acceleration sensor, a gyro sensor, a camera, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator, may further be included in electronic device 1 (110).

Embodiments of the present invention relate to a technique that, based on a deep learning method, takes an unconstrained natural image as a query and searches for and recommends information data associated with the image.

In the present invention, after an object included in an image is detected and classified (localization), information data related to the detected object can be searched quickly and accurately in a large-scale DB, and search results can be provided in order of match or similarity to the object.

To this end, the present invention applies a CNN-based localization method to images and, when training it, may provide a fine-tuning method suited to a specific topic (e.g., fashion). In addition, for better search, a hierarchical classification model and an attribute-learning-based CNN may be applied to category classification and feature extraction. Data on the Internet (e.g., shopping-related data) is fundamentally vast. Fast search is therefore very important, and for this purpose a very simple and robust high-speed indexing and search method based on the concept of sparsity may be provided.
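One common way to exploit sparse features for fast search is an inverted index keyed by the nonzero dimensions. The sketch below illustrates that general idea only; the text here does not describe the actual indexing scheme, so the data layout, the dot-product scoring, and the names `build_index` and `search` are assumptions.

```python
from collections import defaultdict


def build_index(db):
    """Map each nonzero feature dimension to the items that activate it.

    db: dict of item_id -> {dimension: value}, storing nonzero entries only.
    """
    index = defaultdict(list)
    for item_id, feature in db.items():
        for dim, value in feature.items():
            index[dim].append((item_id, value))
    return index


def search(index, query, top_k=3):
    """Score only items sharing a nonzero dimension with the query (dot product)."""
    scores = defaultdict(float)
    for dim, q_value in query.items():
        for item_id, value in index.get(dim, ()):
            scores[item_id] += q_value * value
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical product features; ids and values are made up.
    db = {
        "dress-01": {3: 0.9, 17: 0.4},
        "boots-07": {17: 0.8, 42: 0.5},
        "tote-12": {88: 1.0},
    }
    index = build_index(db)
    print(search(index, {17: 1.0, 42: 1.0}))
```

Because items with no overlapping nonzero dimension are never visited, the cost of a query depends on the length of the posting lists it touches rather than on the total size of the DB.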

In this specification, "image" may broadly mean any type of image, including an image captured or generated on a mobile terminal or other electronic device, an image in a form that can be distributed or posted on the Internet (e.g., shopping malls, SNS, news), and an image included in a document such as a web page.

The following embodiments are described using an image that includes a fashion item as a representative example, but the present invention is not limited thereto.

For image-based fashion search, first, the target must be found accurately regardless of any complex background present in the image. Second, the found target must be classified accurately to determine exactly which category it belongs to. Third, more robust image features must be extracted so that similar products appear in the search results. Fourth, because items such as fashion have a huge DB, matches must be found within it quickly and accurately.

First, the server 150 may construct in advance a dataset for image-based fashion search, and the guiding principle in constructing the dataset is to represent the characteristics of fashion categories well. To this end, the attributes of fashion are broadly divided into three representative attributes, each of which is further divided into detailed attributes, to construct the dataset. Such an attribute-based dataset can be applied to deep learning, in particular to the CNN (Convolutional Neural Network), one branch of deep learning. For example, to train the CNN model, images containing fashion items, such as photos of people wearing clothes, can be obtained from various websites such as online shopping malls, web engines, and SNS. As datasets for detecting fashion items (objects), a dataset for localization learning and a dataset for attribute learning can be constructed. For example, the dataset for localization learning can be collected separately for clothes, shoes, bags, and so on, and this form is intended for applying a CNN-based object detection algorithm. Each fashion item included in an image has one label and a corresponding ROI (region of interest). To construct such a dataset, the server 150 may provide a web-based annotation tool. For example, as shown in FIG. 3, the server 150 may provide a tool that supports various forms of annotation for a fashion item 301 included in an image 300, and by providing this annotation tool on the web it can allow several people to build the dataset at the same time. Next, the dataset for attribute learning can be applied to the ROI detected in an image. In the present invention, the dataset for attribute learning can be constructed based on various color, texture, and category attributes that represent the attributes of fashion well. The category attribute is basically organized as a hierarchy, with detailed categories under each upper level. The category classified in localization is the upper category, and it can be divided into the respective detailed fashion categories on that basis.
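The two kinds of annotation described above can be pictured as simple records: one label and ROI per fashion item for localization, and color, texture, and detailed category for attribute learning. The following is a minimal sketch; the field names, the example category tree, and the `validate` helper are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiAnnotation:
    """Localization record: one upper-level label plus its region of interest."""
    label: str    # upper category, e.g. "clothes", "shoes", "bags"
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class AttributeAnnotation:
    """Attribute-learning record attached to a detected ROI."""
    color: str
    texture: str
    category: str  # detailed category under the ROI's upper category


# Hypothetical hierarchy: upper category -> detailed categories.
CATEGORY_TREE = {
    "clothes": ["dress", "jacket"],
    "shoes": ["sneakers", "boots"],
    "bags": ["backpack", "tote"],
}


def validate(roi, attrs):
    """Check that the box is well formed and the detailed category fits the upper one."""
    x_min, y_min, x_max, y_max = roi.box
    if not (x_min < x_max and y_min < y_max):
        raise ValueError("ROI box must have positive width and height")
    if attrs.category not in CATEGORY_TREE.get(roi.label, []):
        raise ValueError(f"{attrs.category!r} is not a detailed category of {roi.label!r}")
    return True


if __name__ == "__main__":
    roi = RoiAnnotation(label="shoes", box=(40, 310, 180, 420))
    attrs = AttributeAnnotation(color="black", texture="leather", category="boots")
    print(validate(roi, attrs))
```

Keeping the upper category on the ROI record and the detailed category on the attribute record mirrors the hierarchy in the text, where localization yields the upper category and attribute learning refines it.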

Based on a dataset that reflects the characteristics of fashion categories as described above, the server 150 can accurately find the object corresponding to a fashion item in an image and can accurately classify the color, texture, and category of the object found.

Specific embodiments of a system and method for providing image-based fashion search results are described below.

FIG. 4 is a diagram illustrating an example of components that a processor of a server according to an embodiment of the present invention may include, and FIG. 5 is a flowchart illustrating an example of a method that a server according to an embodiment of the present invention can perform.

As shown in FIG. 4, the processor 222 of the server 150 may include, as components, an image receiving unit 410, an object detection unit 420, an information search unit 430, and an information providing unit 440. The processor 222 and its components may control the server 150 to perform steps S510 to S540 included in the method of FIG. 5. The processor 222 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program included in the memory 221. The components of the processor 222 may also be representations of the different functions that the processor 222 performs according to control commands provided by the operating system or the at least one program. For example, the image receiving unit 410 may be used as a functional representation of the processor 222 receiving an image according to such a control command.
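How the four components hand data to one another can be sketched as a single function whose steps follow S510 to S540. This is only an illustration of the data flow: the function name `search_by_image` and the three callables passed in are hypothetical stand-ins, and the real detection, feature extraction, and retrieval logic is not shown.

```python
def search_by_image(image, detect, extract, retrieve):
    """Illustrative data flow for steps S510 to S540.

    image:    the query image (S510, received by the caller)
    detect:   callable returning the objects found in the image (S520)
    extract:  callable returning a feature for one detected object (S520)
    retrieve: callable returning information data matching a feature (S530)
    """
    results = []
    for obj in detect(image):
        feature = extract(image, obj)
        results.append({"object": obj, "items": retrieve(feature)})
    return results  # S540: returned to the requester as the search result


if __name__ == "__main__":
    # Stub components so the sketch runs end to end; all values are made up.
    def detect(image):
        return [{"label": "shoes", "box": (40, 310, 180, 420)}]

    def extract(image, obj):
        return [0.0, 0.7, 0.0, 0.2]

    def retrieve(feature):
        return ["boots-07"] if feature[1] > 0.5 else []

    print(search_by_image("street.jpg", detect, extract, retrieve))
```

Passing the components in as callables keeps the sketch independent of any particular model or index, which matches the text's description of the units as functional representations of the processor.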

In step S510, the image receiving unit 410 may receive an image as a query for a search. Here, the image receiving unit 410 may receive, as input, an image containing an object corresponding to a fashion item. As one example, the image receiving unit 410 may receive the image directly from electronic device 1 (110) over the network. As another example, the image receiving unit 410 may receive the image from an internal or external database system. For example, when a document or content to be served to electronic device 1 (110) contains an image, the image receiving unit 410 may receive that image from the database system. The image input as a query may be a natural image without any constraints. For example, any kind of image can be used as a query, including street images, shop images, images taken with a mobile phone, and images included in web pages such as news articles or SNS posts.

In step S520, the object detection unit 420 may detect at least one object corresponding to a fashion item in the image received as a query, using a CNN learning model trained on a dataset of items of a specific topic (for example, fashion items), and may then extract features of the detected object.

The object detection unit 420 may perform automatic localization of the objects included in the image so that the user is not required to perform any interaction to select a fashion item present in the image. In the present invention, the object detection unit 420 applies a localization method based on a CNN-based object detection algorithm, and this algorithm can be trained by fine-tuning to detect the objects to be classified in the image. As one example, the object detection unit 420 may find objects matching items of a specific topic in the image using a classification model based on R-CNN (Regions with Convolutional Neural Network features). The pre-trained model for object detection is a CNN learning model trained on an image dataset, and a classification model for fine-tuning that makes the CNN learning model better suited to fashion may be regenerated and applied.

Localization in the object detection unit 420 refers to the process of determining the position and the predefined label of an object corresponding to a fashion item to be found in the image. The object detection unit 420 applies a CNN-based object detection algorithm for localization; because this algorithm computes candidate ROIs inside the CNN, localization can be performed more quickly. To improve object classification performance, the pre-trained model can be restructured to better suit fashion items. In other words, a pre-trained model with detailed categories can be regenerated for fine-tuning, and an existing pre-trained model can be re-trained using a fashion dataset. For example, referring to FIG. 6, the initial pre-trained model 601 is a CNN model trained in advance, and in the next stage it is trained with a dataset of fashion categories to generate a pre-trained model 602. Finally, the fashion training set is reorganized once more and the previously generated fashion-specific pre-trained model 602 is applied to generate a final model 603 for performing localization.

By applying a CNN-based object detection algorithm trained on a dataset of fashion categories to the image received as a query, the object detection unit 420 may perform localization and attribute learning for the fashion items (objects) included in the image. FIG. 7 illustrates an example of a localization result for an image. When an image 700 is input, the object detection unit 420 finds the positions (ROIs) 701, 702, and 703 of the fashion objects present in the image 700 based on the CNN-based object detection algorithm, and classifies the object attributes 710 corresponding to each position. Here, the position of an object is the result of localization, and the object attributes 710 are the result of attribute learning for the object and may include results for color, texture, and category.

The object detection unit 420 may extract features of an object detected in the image through attribute learning and classify its category. By applying CNNs modeled on specific attributes of fashion items, namely color, texture, and category, the object detection unit 420 can accurately extract and classify the features of the object detected in the image. Attribute learning defines various attributes suited to specific fashion categories and applies them to training. Datasets are constructed for the color, texture, and category attributes, and data having the characteristics of each attribute are trained by applying the CNN algorithm. For the color and texture attributes, one common learning model each can be generated regardless of category. The classification structure for fashion items is hierarchical because, in terms of category attributes, fashion categories share common characteristics with one another. For example, a one-piece dress has the attributes of both tops and skirts. The more finely fashion categories are divided into subcategories, the more often they share common characteristics. The structure is therefore made hierarchical so that the distinguishing characteristics of each category are preserved.

The classification result from localization corresponds to the upper level, and the category attribute corresponds to the lower level. Referring to FIG. 8, when an image 800 is input, the object detection unit 420 may apply, in order, a CNN model for localization (upper level) 801 and CNN models for attribute learning (lower level) 802. When objects corresponding to fashion items are present in the image 800, the object detection unit 420 can find their upper-level fashion labels and positions (ROIs) in the localization process through the CNN model (upper level) 801. The object detection unit 420 then applies, from among the attribute learning CNN models (lower level) 802, the category attribute model matching the label of each object detected in the image 800 together with the color and texture attribute models, and finally obtains three classification results (color, texture, and category) for each object. In addition to the final classification results, three features (color, texture, and category) can each be extracted from an intermediate stage of the classification with the attribute learning CNN models (lower level) 802, namely from the fully connected layer immediately preceding the soft-max layer (classifier) of the CNN (deep features).
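The two-level flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Detection`, `Attributes`, and `classify_objects`, and the idea of passing each model in as a plain callable, are assumptions made for the example. The localization model returns a label and an ROI per object; the label selects the category attribute model, while the color and texture models are shared across categories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical ROI format: (x, y, width, height)


@dataclass
class Detection:
    label: str  # upper-level fashion label from localization
    roi: Box    # position of the object in the image


@dataclass
class Attributes:
    color: str
    texture: str
    category: str


# Any callable taking (image, roi) and returning a class name stands in for a CNN.
Classifier = Callable[[object, Box], str]


def classify_objects(
    image: object,
    localize: Callable[[object], List[Detection]],
    color_model: Classifier,
    texture_model: Classifier,
    category_models: Dict[str, Classifier],
) -> List[Tuple[Detection, Attributes]]:
    """Apply the upper-level model, then the lower-level attribute models."""
    results = []
    for det in localize(image):
        # The category model is chosen by the label found during localization.
        category_model = category_models[det.label]
        attrs = Attributes(
            color=color_model(image, det.roi),      # shared across categories
            texture=texture_model(image, det.roi),  # shared across categories
            category=category_model(image, det.roi),
        )
        results.append((det, attrs))
    return results
```

Keeping one category model per upper-level label mirrors the hierarchical structure in the text, where only the category attribute depends on the label.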

Accordingly, by applying a CNN learning model trained on a dataset of fashion items, the object detection unit 420 can find objects corresponding to fashion items in the image and accurately extract their features.

Referring again to FIG. 5, in step S530, the information search unit 430 may search for information data related to an object detected in the image, retrieving, from among the information data in the DB, the information data corresponding to the features (color, texture, and category) extracted in step S520. The information search unit 430 may retrieve information data that matches or is similar to the object included in the image using a search scheme that exploits the sparsity of the features (deep features) extracted through the CNN learning model in step S520.

An enormous amount of fashion-related data exists worldwide, and it continues to grow rapidly. A technique for searching such a large-scale DB quickly is therefore essential. The difficulty of searching a large-scale DB stems mainly from the large number of DB entries and the high dimensionality of the image features.

CNN-based image features (deep features) are inherently high-dimensional. Deep features carry semantic concepts, and within the CNN structure they become increasingly semantic toward the higher layers. This means that such a feature has highly discriminative values for the categories on which the network was trained. In the present invention, the non-linear ReLU (rectification) function may be applied to the hidden layers of the CNN. The ReLU function can help improve training speed and accuracy in deep learning and can be expressed as max(0, x), meaning that positive values from the previous layer are kept and all negative values are set to 0. A deep feature with this property is sparse: most of the values of the vector are 0 (sparse coding).
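The effect of max(0, x) on sparsity can be illustrated with a small NumPy sketch. The pre-activation values here are synthetic (drawn from a normal distribution with a negative mean, an assumption chosen only so that most inputs are negative); they are not taken from the patent's model.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x): positives are kept, negatives become 0."""
    return np.maximum(0.0, x)


rng = np.random.default_rng(0)

# Synthetic stand-in for the inputs to a 4096-dimensional fully connected layer.
pre_activation = rng.normal(loc=-1.0, scale=1.0, size=4096)

feature = relu(pre_activation)

# Fraction of bins that are exactly zero after rectification.
sparsity = float(np.mean(feature == 0.0))
print(f"zero bins: {sparsity:.1%}")
```

Every negative input maps to exactly 0, so the fraction of zero bins equals the fraction of negative pre-activations.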

FIG. 9 is a diagram for explaining the characteristics of a deep feature having the sparse property. a) shows the overall distribution, including zeros, across 10000 features (8058 out of 0000), and b) shows the distribution of only the non-zero values (8058 out of 0000). c) shows the distribution of the values of a 4096-dimensional vector (most of which can be seen to be 0), and d) shows a scaled view of the non-zero part of c). That is, at 4096 dimensions, most of the bins are filled with 0. This means that if a bin has a value greater than 0, that bin can be regarded as carrying a highly semantic or discriminative value. This property of deep features can be exploited for indexing in search. Finally, to convert the high-dimensional vector into word-like terms, the approach of Equations 1 and 2 can be applied.

In Equation 1, V denotes a vector representing a feature (deep feature) (color, texture, or category) extracted through the CNN learning model, and Sort(V) denotes sorting each vector in descending order.

In Equation 2, the non-zero values among the vector values sorted by Equation 1 are selected by applying a threshold of a given value K (ranking or value), and the index j of each bin used in V is then applied as the actual index, Indexing_(j) (index visual word).
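The procedure described for Equations 1 and 2 can be sketched as below, under the assumption that K is used as a rank threshold (keep at most K bins). The function name `index_words` is hypothetical. Because the features are taken after ReLU, "non-zero" and "greater than zero" coincide, which the sketch relies on.

```python
import numpy as np


def index_words(v, k):
    """Return up to k bin indices j of the largest non-zero values of v, largest first."""
    v = np.asarray(v, dtype=float)
    # Equation 1 (as described): sort the vector values in descending order.
    order = np.argsort(-v, kind="stable")
    # Keep only bins whose value is non-zero (post-ReLU values are never negative).
    nonzero = order[v[order] > 0]
    # Equation 2 (as described): apply the rank threshold K; each surviving bin
    # index j serves as an index visual word.
    return [int(j) for j in nonzero[:k]]


v = [0.0, 0.9, 0.0, 0.2, 0.7, 0.0]
print(index_words(v, 2))  # bins 1 and 4 hold the two largest values
```

When fewer than K bins are non-zero, the sketch simply returns fewer words rather than padding with zero-valued bins.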

FIG. 10 illustrates an indexing process that uses the sparsity of deep features. Referring to FIG. 10, for the search, the information search unit 430 sorts in descending order (S2) the vector V (1) representing the feature (deep feature) (color, texture, or category) extracted through the CNN learning model (S1), and applies a threshold of a given value K to the non-zero values among the sorted vector values to generate words for indexing (index visual words) (S3). If a Ranking K constraint is imposed, K words for indexing are generated. Next, the information search unit 430 may retrieve the images corresponding to the words generated in step S3 (S4). In other words, for the image received as a query, the information search unit 430 generates words for indexing from the features (deep features) extracted through the CNN learning model, and can quickly select candidate images from a large-scale DB by a voting scheme that uses inverted indexing.
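A minimal sketch of voting over an inverted index follows. The function names and the in-memory dictionary are assumptions for illustration; the patent does not specify the data structure. Each index visual word maps to the set of DB images containing it, and each query word casts one vote for every image it maps to.

```python
from collections import Counter, defaultdict


def build_inverted_index(db_words):
    """Map each index visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in db_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def vote(index, query_words, top_n=10):
    """Count, per DB image, how many query words it shares; return the top candidates."""
    votes = Counter()
    for word in query_words:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common(top_n)


# Hypothetical DB: image id -> index visual words (bin indices).
db = {"img_a": [1, 4, 3], "img_b": [4, 7], "img_c": [9]}
inverted = build_inverted_index(db)
print(vote(inverted, [1, 4]))
```

Only images that share at least one word with the query are ever touched, which is what makes the candidate selection fast on a large DB.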

Referring again to FIG. 5, in step S540, the information providing unit 440 may provide, as a search result for the image received as a query, the information data corresponding to the features of the object detected in that image. The information data provided as a search result may be an image corresponding to the features of the detected object, or content that includes such an image. Here, the information providing unit 440 may re-rank the candidate images selected in step S530. Because the voting scheme based on words for indexing (index visual words) can lose the original feature values, the voting results need to be refined. As one example, to improve search quality, the information providing unit 440 may perform re-ranking by applying dimension reduction and quantization to the high-dimensional features (deep features).

FIG. 11 illustrates the reduction and quantization process. The information providing unit 440 may reduce the image feature vector used for the search with a dimension reduction algorithm (for example, the PCA (Principal Component Analysis) algorithm). Referring to FIG. 11, a vector representing an image feature can be represented in a coordinate system through a dimension reduction method such as PCA projection (S5). Because the features extracted from the learning model are themselves high-dimensional, and the search slows down as the number of candidate images grows, the dimensionality of the image feature can be chosen by selecting the number of eigenvalues of the feature vectors. For example, the number of eigenvalues may be chosen experimentally so that they account for about 80% of the total magnitude of the eigenvalues, and the image feature is reduced to that many dimensions. As shown in FIG. 11, the selected eigenvectors can be expressed as new coordinate axes (S6), and quantization can be performed with respect to those axes (S7). The quantization criterion is usually the mean: a feature value is converted to 1 if it is greater than the mean and to 0 otherwise.
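The reduction and quantization steps can be sketched with NumPy as follows. The function names are hypothetical, and the final Hamming-distance ordering is an assumption: the text states that the quantized features are used for re-ranking but does not name the distance measure.

```python
import numpy as np


def fit_reduction(features, energy=0.8):
    """Pick the leading eigenvectors covering about `energy` of the eigenvalue sum."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # re-order to descending
    eigvals = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(ratio, energy)) + 1      # smallest n reaching `energy`
    axes = eigvecs[:, :n]                            # new coordinate axes (S6)
    thresholds = ((features - mean) @ axes).mean(axis=0)  # per-axis mean
    return mean, axes, thresholds


def encode(features, mean, axes, thresholds):
    """Project onto the selected axes (S5) and binarize against the mean (S7)."""
    projected = (features - mean) @ axes
    return (projected > thresholds).astype(np.uint8)


def rerank(query_code, candidate_codes):
    """Assumed re-ranking rule: order candidates by Hamming distance to the query."""
    distances = np.count_nonzero(candidate_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable")


rng = np.random.default_rng(0)
# Synthetic 16-dimensional features with unequal variance per dimension.
db_features = rng.normal(size=(200, 16)) * np.linspace(4.0, 0.25, 16)
mean, axes, thresholds = fit_reduction(db_features)
codes = encode(db_features, mean, axes, thresholds)
ranking = rerank(codes[0], codes)
```

Binary codes keep the comparison cheap, so the more expensive full-precision features do not have to be compared against every candidate.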

Accordingly, the information providing unit 440 may re-rank the candidate images selected in step S530 by applying dimension reduction and quantization to the features extracted through the CNN learning model, and may provide the search results according to the re-ranked order.

An example of a service scenario based on the information providing method described above is as follows.

1) Using electronic device 1 (110), the user takes a photograph containing a fashion item (for example, a street fashion photograph or an airport fashion photograph) and sends it to the server 150, thereby requesting a search for that photograph.

2) The server 150 receives the image for which a search was requested from electronic device 1 (110) and detects the objects included in the image by performing automatic localization on it. Here, the server 150 may detect, among the objects included in the image, those corresponding to fashion items using a CNN learning model fine-tuned for the fashion domain.

3) Through CNN learning models built on specific fashion attributes, the server 150 extracts and classifies detailed object features: which fashion category the object detected in the image belongs to, and what color and texture it has.

4) The server 150 retrieves, from the fashion information in the DB, the fashion information corresponding to the fashion attributes of the features extracted in step 3). Here, by exploiting the sparsity of the fashion attribute features, the server 150 can quickly and accurately retrieve, from a large-scale DB, data on fashion items that match or are similar to the object included in the image.

5) The server 150 provides the fashion information found in step 4) to electronic device 1 (110) as the search result for the image received in step 1).

Accordingly, for an image that the user has submitted for search, the server 150 can analyze the fashion items included in the image and recommend to the user information about matching or similar fashion items.

Another form of service scenario is as follows.

1) Documents on the web (for example, celebrity news) are collected, and the images included in those documents are analyzed to gather information about the fashion items in the images (fashion information such as color, texture, and category). Here, person information included in the documents (for example, celebrity names) may be collected as well, so that the fashion information can be stored in a database in association with the person information.

2) When the user accesses a web document that contains an image, the objects included in the image can be detected by performing automatic localization on it. Here, the server 150 may detect, among the objects included in the image, those corresponding to fashion items using a CNN learning model fine-tuned for the fashion domain.

4) The server 150 retrieves, from the fashion information in the DB, the fashion information corresponding to the fashion attributes of the features extracted in step 3). Here, by exploiting the sparsity of the fashion attribute features, the server 150 can quickly and accurately retrieve, from a large-scale DB, data on fashion items that match or are similar to the object included in the image. The server 150 may additionally retrieve the person information associated with those fashion items based on the information built in the database construction of step 1).

5) On the web document accessed by the user, the server 150 displays information about fashion items that match or are similar to the fashion items in the image included in the document, together with the person information associated with those fashion items.

Accordingly, when a document that the user requests or that is to be provided to the user contains an image with a fashion item, the server 150 can search for fashion items that match or are similar to the one in the image and provide them as recommendations within the document. Here, by additionally providing the person information associated with the retrieved fashion items, the server 150 can offer information not only about the fashion items but also about the people who have worn them.

As described above, according to embodiments of the present invention, using a CNN-based object detection algorithm makes it possible to accurately find objects matching items of a specific topic in any image without constraints, which in turn improves the classification performance for information search. According to embodiments of the present invention, a hierarchical classification model and attribute learning based CNNs can be applied to classify the category of an object detected in an image and to extract features of specific attributes. According to embodiments of the present invention, related information can be found more accurately and quickly in a large-scale DB through fast indexing and search based on the sparsity of the features of the detected object. Furthermore, according to embodiments of the present invention, search quality can be improved by applying dimension reduction and quantization to the high-dimensional CNN-based deep features and using the result to rank related information.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

Claims

A computer-implemented method comprising:
receiving an image as a query for search;
detecting an object corresponding to a specific item in the image using a deep learning model trained for the specific item, and extracting a feature of the detected object;
retrieving, from among information data related to the specific item, information data corresponding to the feature of the object as information data related to the object; and
providing the information data corresponding to the feature of the object as a search result for the image,
wherein the extracting comprises:
detecting a position (ROI, region of interest) and a predefined label for the object corresponding to the specific item, and then extracting the feature for the position using an attribute model matching the label,
wherein the retrieving comprises:
applying a threshold of a given value to a vector value representing the feature to select at least a portion of the vector value, thereby generating a word for indexing the feature, and then retrieving the information data corresponding to the word, and
wherein the providing comprises:
selecting a number of eigenvalues for the vector value representing the feature to reduce the dimension of the feature, and re-ranking the information data corresponding to the word through the dimension reduction.
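
The retrieval and re-ranking steps recited in the claim above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the token format produced by `feature_to_word`, the in-memory inverted index, the reading of "selecting a number of eigenvalues" as a PCA-style projection, and the power-iteration routine that stands in for a linear algebra library.

```python
import math
from collections import defaultdict


def feature_to_word(vec, threshold=0.5):
    """Select the components at or above `threshold` and encode their
    positions as one token (hypothetical encoding, e.g. "0-2")."""
    return "-".join(str(i) for i, v in enumerate(vec) if v >= threshold)


def build_index(items, threshold=0.5):
    """Inverted index: word -> list of item ids whose feature maps to it."""
    index = defaultdict(list)
    for item_id, vec in items.items():
        index[feature_to_word(vec, threshold)].append(item_id)
    return index


def search(index, query_vec, threshold=0.5):
    """Retrieve candidate item ids that share the query's index word."""
    return list(index.get(feature_to_word(query_vec, threshold), []))


def principal_axes(rows, k, iters=200):
    """Mean and up to k leading covariance eigenvectors of `rows`, found by
    power iteration with deflation (assumed stand-in for a PCA routine)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    axes = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b]
                         for a in range(d) for b in range(d))
        if eigenvalue < 1e-12:
            break  # remaining directions carry no variance
        axes.append(v)
        # Deflate: remove the component just found before the next round.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, axes


def project(vec, mean, axes):
    """Coordinates of `vec` in the reduced space spanned by `axes`."""
    return [sum((vec[j] - mean[j]) * axis[j] for j in range(len(vec)))
            for axis in axes]


def rerank(candidates, items, query_vec, k=1):
    """Order candidates by squared distance to the query in reduced space."""
    if len(candidates) < 2:
        return list(candidates)
    mean, axes = principal_axes([items[c] for c in candidates], k)
    q = project(query_vec, mean, axes)

    def distance(item_id):
        p = project(items[item_id], mean, axes)
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return sorted(candidates, key=distance)
```

In this sketch, `search` narrows the catalogue to items sharing the query's index word, and `rerank` then orders only those candidates in the lower-dimensional space, so the eigenvector computation runs on a small candidate set rather than the whole catalogue.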

The method according to claim 1,
wherein the extracting comprises:
using a convolutional neural network (CNN) learning model to which a data set for localization learning and attribute learning of the specific item is applied.

The method according to claim 1,
wherein the extracting comprises:
extracting a feature of at least one of a color, a texture, and a category of the object.

The method according to claim 1,
wherein the extracting comprises:
detecting a position (ROI, region of interest) of the object included in the image through an upper learning model for localization learning; and
extracting a feature of at least one of a color, a texture, and a category for the detected position through a lower learning model for attribute learning.
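
The two-stage extraction in the claim above (an upper model that localizes the object, then a lower model that extracts attributes for the detected position) can be sketched as follows. This is only an illustration of the control flow: `stub_localizer` and `stub_color_model` are hypothetical stand-ins for trained models such as the CNN of claim 2, and the `Detection` type, the `"top"` label, and the pixel-grid image format are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels
WHITE: Pixel = (255, 255, 255)


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive
    label: str                      # predefined label, e.g. "top"


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the region of interest out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_features(
    image: Image,
    localizer: Callable[[Image], List[Detection]],
    attribute_models: Dict[str, Callable[[Image], dict]],
) -> List[Tuple[Detection, dict]]:
    """Stage 1 localizes objects; stage 2 runs the attribute model that
    matches each detection's label on the cropped region."""
    results = []
    for det in localizer(image):
        model = attribute_models.get(det.label)
        if model is None:
            continue  # no attribute model registered for this label
        results.append((det, model(crop(image, det.box))))
    return results


def stub_localizer(image: Image) -> List[Detection]:
    """Hypothetical upper model: bounding box of all non-white pixels."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, px in enumerate(row) if px != WHITE]
    if not coords:
        return []
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    box = (min(rows), min(cols), max(rows) + 1, max(cols) + 1)
    return [Detection(box, "top")]


def stub_color_model(region: Image) -> dict:
    """Hypothetical lower model: name of the strongest mean color channel."""
    pixels = [px for row in region for px in row]
    means = [sum(px[ch] for px in pixels) / len(pixels) for ch in range(3)]
    return {"color": ("red", "green", "blue")[means.index(max(means))]}
```

Keying `attribute_models` by label mirrors the wording of claim 1, where the attribute model applied to a position is the one matching the detected label; swapping the stubs for real models would not change `extract_features`.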

delete

The method according to claim 1,
wherein the receiving comprises:
receiving, from an electronic device, a search request for the image together with the image, and
wherein the providing comprises:
providing the electronic device with the information data corresponding to the feature of the object.

The method according to claim 1,
wherein the receiving comprises:
receiving, when an image is included in a document to be provided to an electronic device, the image included in the document, and
wherein the providing comprises:
providing the information data corresponding to the feature of the object through the document when the document is provided to the electronic device.

The method according to claim 1,
wherein the providing comprises:
when the specific item is a fashion item, providing information on a fashion item corresponding to the feature of the object as the search result for the image.

The method according to claim 1,
further comprising:
for documents on the web, collecting information on fashion items from the images included in each document and linking the collected information with person information included in the document to build a database,
wherein the providing comprises:
when the specific item is a fashion item, providing information on the fashion item corresponding to the feature of the object, together with the person information, as the search result for the image.

A computer-implemented system comprising:
an image receiving unit configured to receive an image as a query for search;
an object detection unit configured to detect an object corresponding to a specific item in the image using a deep learning model trained for the specific item, and to extract a feature of the detected object;
an information retrieval unit configured to retrieve, from among information data related to the specific item, information data corresponding to the feature of the object as information data related to the object; and
an information providing unit configured to provide the information data corresponding to the feature of the object as a search result for the image,
wherein the object detection unit
detects a position (ROI, region of interest) and a predefined label for the object corresponding to the specific item, and then extracts the feature for the position using an attribute model matching the label,
wherein the information retrieval unit
applies a threshold of a given value to a vector value representing the feature to select at least a portion of the vector value, thereby generating a word for indexing the feature, and then retrieves the information data corresponding to the word, and
wherein the information providing unit
selects a number of eigenvalues for the vector value representing the feature to reduce the dimension of the feature, and re-ranks the information data corresponding to the word through the dimension reduction.

The system according to claim 13,
wherein the object detection unit
uses a convolutional neural network (CNN) learning model to which a data set for localization learning and attribute learning of the specific item is applied.

The system according to claim 13,
wherein the object detection unit
extracts a feature of at least one of a color, a texture, and a category of the object.

The system according to claim 13,
wherein the object detection unit
detects a position (ROI, region of interest) of the object included in the image through an upper learning model for localization learning, and then extracts a feature of at least one of a color, a texture, and a category for the detected position through a lower learning model for attribute learning.

delete

A computer-readable recording medium storing a program for executing the method according to any one of claims 1 to 4 and 9 to 12.