KR102185571B1

KR102185571B1 - An apparatus for identifying purchase intent, a method therefor and a computer readable recording medium on which a program for carrying out the method is recorded

Info

Publication number: KR102185571B1
Application number: KR1020190008391A
Authority: KR
Inventors: 권성근
Original assignee: 경일대학교산학협력단
Priority date: 2019-01-22
Filing date: 2019-01-22
Publication date: 2020-12-02
Also published as: KR20200091291A

Abstract

An apparatus for identifying purchase intent according to an embodiment of the present invention includes: a camera unit that continuously captures images of a customer; a motion recognition unit that detects, from the images of the customer, a product gaze motion, which is a motion in which the customer gazes at a product; an emotion recognition unit that determines the customer's purchase intent for the product by analyzing changes in the intensity of the customer's emotions through a face image, which is the face portion of the images of the customer; and an information processing unit that, when the customer is determined to have purchase intent, identifies the product at which the customer is gazing.

Description

An apparatus for identifying purchase intent, a method therefor and a computer readable recording medium on which a program for carrying out the method is recorded

The present invention relates to purchase intent identification technology and, more particularly, to an apparatus for identifying purchase intent from the behavior and facial expressions of a customer in a store, a method therefor, and a computer-readable recording medium on which a program for carrying out the method is recorded.

In the past, conversion rates and efficiency were difficult to measure in marketing, but they can now be measured and improved easily with a variety of online tools. Focusing on inbound marketing online is therefore natural, and this emphasis is expected to grow. From a marketing perspective, however, offline channels clearly remain an important factor.

Korean Patent Application Publication No. 2018-0047794, published May 10, 2018 (Title: O2O marketing platform system based on visitor behavior patterns in offline stores and method for constructing the same)

An object of the present invention is to provide an apparatus capable of identifying purchase intent from the behavior and facial expressions of a customer visiting a store, a method therefor, and a computer-readable recording medium on which a program for carrying out the method is recorded.

To achieve the above object, an apparatus for identifying purchase intent according to a preferred embodiment of the present invention includes: a camera unit that continuously captures images of a customer; a motion recognition unit that detects, from the images of the customer, a product gaze motion, which is a motion in which the customer gazes at a product; an emotion recognition unit that determines the customer's purchase intent for the product by analyzing changes in the intensity of the customer's emotions through a face image, which is the face portion of the images of the customer; and an information processing unit that, when the customer is determined to have purchase intent, identifies the product at which the customer is gazing.

The emotion recognition unit selects, from among a plurality of emotions of the customer, at least one emotion whose intensity is equal to or greater than a predetermined first value, and determines that the customer has purchase intent when the intensity of the selected emotion varies, for a predetermined period from the time the product gaze motion is detected, within a range that is equal to or greater than the first value and equal to or less than a predetermined second value greater than the first value.
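As an illustrative sketch only, the band rule described above might be expressed in Python as follows. The function names, the dictionary layout of `tracks`, and the convention that the first sample of each series is the intensity at the moment the gaze is detected are assumptions introduced here; the specification does not fix a sampling scheme or concrete threshold values.

```python
from typing import Dict, List, Sequence


def select_emotions(intensities_at_gaze: Dict[str, float], first: float) -> List[str]:
    """Return the emotions whose intensity at gaze time is at least `first`."""
    return [name for name, value in intensities_at_gaze.items() if value >= first]


def stays_in_band(series: Sequence[float], first: float, second: float) -> bool:
    """True if every sample of the observation period lies in [first, second]."""
    if second <= first:
        raise ValueError("the second value must be greater than the first value")
    return len(series) > 0 and all(first <= v <= second for v in series)


def has_intent_by_band(
    tracks: Dict[str, Sequence[float]], first: float, second: float
) -> bool:
    """Apply the band rule to per-emotion intensity series.

    `tracks` (hypothetical layout) maps an emotion name to the intensities
    sampled from gaze detection through the predetermined period.
    """
    at_gaze = {name: series[0] for name, series in tracks.items() if series}
    selected = select_emotions(at_gaze, first)
    return any(stays_in_band(tracks[name], first, second) for name in selected)
```

For example, with an assumed first value of 0.3 and second value of 0.7, a joy series of 0.4, 0.5, 0.45 would satisfy the rule, whereas a series that climbs to 0.9 would not.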

The emotion recognition unit determines that the customer has purchase intent when the intensity of at least one of the plurality of emotions of the customer rises by a predetermined third value or more after the time the product gaze motion is detected, compared with before that time.
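The rise rule above could be sketched as follows. Comparing the mean intensity of a window before the gaze with the mean of a window after it is an assumption made for this example; the specification only states that the intensity rises by the third value or more, and the name `has_intent_by_rise` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def has_intent_by_rise(
    before: Sequence[float], after: Sequence[float], third: float
) -> bool:
    """True if the mean intensity after gaze detection exceeds the mean
    intensity before it by at least `third` (assumed comparison method)."""
    if not before or not after:
        return False  # nothing to compare against
    return mean(after) - mean(before) >= third
```

Under these assumptions, an emotion that averages 0.15 before the gaze and 0.55 after it clears a third value of 0.3.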

The apparatus further includes a communication unit for communicating with an employee device, and the information processing unit generates promotion information for the identified product and transmits the generated promotion information to the employee device through the communication unit.

The apparatus further includes a display unit for screen display, and the information processing unit generates promotion information for the identified product and displays the generated promotion information through the display unit.

To achieve the above object, a method for identifying purchase intent according to a preferred embodiment of the present invention includes: continuously capturing, by a camera unit, images of a customer; detecting, by a motion recognition unit, from the images of the customer, a product gaze motion, which is a motion in which the customer gazes at a product; determining, by an emotion recognition unit, the customer's purchase intent for the product by analyzing changes in the intensity of the customer's emotions through a face image, which is the face portion of the images of the customer; and identifying, by an information processing unit, the product at which the customer is gazing when the customer is determined to have purchase intent.
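The order of the steps in the method above can be illustrated with a short Python sketch. Everything here is hypothetical scaffolding: `Frame`, `detect_gaze`, `has_intent`, and `locate_product` merely stand in for the camera output, the motion recognition unit, the emotion recognition unit, and the information processing unit, whose internals the specification describes separately.

```python
from typing import Any, Callable, Optional, Sequence

Frame = Any  # placeholder for whatever the camera unit produces


def identify_purchase_intent(
    frames: Sequence[Frame],
    detect_gaze: Callable[[Frame], bool],
    has_intent: Callable[[Sequence[Frame], int], bool],
    locate_product: Callable[[Frame], str],
) -> Optional[str]:
    """Walk the captured frames in order and return the gazed-at product
    for the first gaze event judged to show purchase intent, else None."""
    for t, frame in enumerate(frames):
        if not detect_gaze(frame):      # motion recognition step
            continue
        if has_intent(frames, t):       # emotion recognition step
            return locate_product(frame)  # information processing step
    return None
```

The three callables are injected so that each unit can be replaced independently, which mirrors the block structure of the apparatus.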

A computer-readable recording medium is also provided, on which a program for carrying out the method for identifying purchase intent according to the above-described preferred embodiment of the present invention is recorded.

The present invention identifies a customer's purchase intent from the customer's facial expression while gazing at a product and can provide a customized response strategy accordingly, thereby enabling an efficient marketing strategy.

FIG. 1 is a block diagram illustrating a system for identifying purchase intent according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an identification device for identifying purchase intent according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the internal configuration of an emotion recognition unit according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for identifying purchase intent according to an embodiment of the present invention.
FIGS. 5 to 7 are graphs illustrating a method for identifying purchase intent according to an embodiment of the present invention.

Before the detailed description of the present invention, it should be noted that the terms and words used in this specification and the claims below should not be construed as limited to their ordinary or dictionary meanings. They should instead be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals wherever possible in the accompanying drawings. Detailed descriptions of well-known functions and configurations that could obscure the subject matter of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size.

First, a system for identifying purchase intent according to an embodiment of the present invention is described in detail. FIG. 1 is a block diagram illustrating a system for identifying purchase intent according to an embodiment of the present invention. Referring to FIG. 1, the system for identifying purchase intent according to an embodiment of the present invention includes an identification device 100 and an employee device 400.

The identification device 100 is a device that includes a camera function for capturing the interior of a store and is installed inside the store. The employee device 400 is a device carried by an employee.

The identification device 100 captures images of a customer in the store and determines whether the customer intends to purchase a specific product. The identification device 100 can determine the purchase intent for the product from changes in the intensity of the emotions shown on the customer's face at the time the customer gazes at the product. If the customer is determined to have purchase intent for a specific product, the identification device 100 can provide the employee device 400 with promotion information that can promote the sale of the product, while enabling the employee to recognize the customer and the product.

Next, the identification device 100 for identifying purchase intent according to an embodiment of the present invention is described in more detail. FIG. 2 is a block diagram illustrating an identification device for identifying purchase intent according to an embodiment of the present invention.

Referring to FIG. 2, the identification device 100 according to an embodiment of the present invention includes a communication unit 110, a camera unit 120, an input unit 130, a display unit 140, an audio unit 150, a storage unit 160, and a control unit 200.

The communication unit 110 is a means for communicating with, for example, the employee device 400. The communication unit 110 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal, and an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency. The communication unit 110 may also include a modem that modulates transmitted signals and demodulates received signals.

The camera unit 120 captures images and includes an image sensor. The image sensor receives light reflected from a subject and converts it into an electrical signal, and may be implemented based on a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS), or the like. The camera unit 120 may further include an analog-to-digital converter, and may convert the electrical signal output from the image sensor into a digital sequence and output it to the control unit 200.

In particular, the camera unit 120 includes a 3D sensor. The 3D sensor is a sensor for obtaining the three-dimensional coordinates of each pixel of an image in a non-contact manner. When the camera unit 120 captures an image, the 3D sensor measures the actual distance from a predetermined reference point of the camera unit 120 (for example, the focal point) to the captured object, detects the three-dimensional coordinates of each pixel of the captured image, and transmits the detected three-dimensional coordinates to the control unit 200 together with the captured image. The 3D sensor may be any of various types of sensors using laser, infrared, visible light, or the like. Examples of such a 3D sensor include: a laser 3D scanner using any one of TOF (Time of Flight), phase shift, and online waveform analysis; a laser 3D scanner using optical triangulation; an optical 3D scanner using white light or modulated light; a handheld real-time photo or optical 3D scanner; an optical or laser full-body scanner using pattern projection or line scanning; a photographic scanner using photogrammetry; and a real-time scanner using Kinect Fusion.
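Of the listed measurement principles, time of flight is the simplest to illustrate: the sensor emits a pulse and infers distance from the round-trip time, d = c · t / 2. The following Python sketch shows only that relationship; the function name is hypothetical, and the specification does not state which principle the embodiment actually uses.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the object for a measured round-trip time.

    The pulse travels to the object and back, so the one-way
    distance is half of the total path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0
```

A round trip of 20 nanoseconds corresponds to roughly 3 meters, which is on the order of the distances involved inside a store.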

The input unit 130 receives a user's key operation for controlling the identification device 100, generates an input signal, and transmits it to the control unit 200. The input unit 130 may include various keys for controlling the identification device 100. When the display unit 140 is a touch screen, the functions of these keys may be performed on the display unit 140, and when all functions can be performed with the touch screen alone, the input unit 130 may be omitted.

The display unit 140 is for screen display and can visually present the menus of the identification device 100, input data, function setting information, and various other information to the user. The display unit 140 also outputs screens of the identification device 100 such as a boot screen, a standby screen, and a menu screen. The display unit 140 may be formed of a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), or the like. The display unit 140 may also be implemented as a touch screen, in which case it includes a touch sensor that detects the user's touch input. The touch sensor may be a touch sensing sensor of a capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. In addition to these sensors, any kind of sensor device capable of sensing the contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor detects the user's touch input, generates a detection signal, and transmits it to the control unit 200. In particular, when the display unit 140 is a touch screen, some or all of the functions of the input unit 130 may be performed through the display unit 140.

The audio unit 150 includes a speaker (SPK) for outputting audio signals such as voice and a microphone (MIKE) for collecting audio signals such as voice according to an embodiment of the present invention. That is, under the control of the control unit 200, the audio unit 150 may output an audio signal through the speaker (SPK) or transmit an audio signal input through the microphone (MIKE) to the control unit 200.

The storage unit 160 stores the programs and data required for the operation of the identification device 100. The various data stored in the storage unit 160 may be deleted, changed, or added according to the user's operation.

The control unit 200 controls the overall operation of the identification device 100 and the signal flow between its internal blocks, and performs a data processing function. The control unit 200 also basically controls the various functions of the identification device 100. Examples of the control unit 200 include a central processing unit (CPU) and a digital signal processor (DSP). The control unit 200 includes a motion recognition unit 210, an emotion recognition unit 220, and an information processing unit 230. The operation of the control unit 200, including the motion recognition unit 210, the emotion recognition unit 220, and the information processing unit 230, is described in more detail below.

Next, the emotion recognition unit 220 according to an embodiment of the present invention is described in more detail. The emotion recognition unit 220 is preferably implemented as a convolutional neural network (CNN). However, the emotion recognition unit 220 is not limited to a convolutional neural network: any artificial neural network (ANN) trained by machine learning to receive a face image and output probabilities for a predetermined number of facial expressions may be applied to the embodiments of the present invention, regardless of its type. FIG. 3 is a diagram illustrating the internal configuration of an emotion recognition unit according to an embodiment of the present invention.

Referring to FIG. 3, the emotion recognition unit 220 includes a plurality of layers. The output of any one layer forms the next layer through a plurality of weighted operations. Here, the weight (w) determines the strength of the connection between layers.

The emotion recognition unit 220 includes an input layer (INL), convolution layers (CL), pooling layers (PL), a fully-connected layer (FCL), and an output layer (OUL). The input layer INL is a matrix of a predetermined size, and each element of the input layer matrix corresponds to a pixel of the input image. Although FIG. 3 shows two convolution layers (CL: CLF, CLS) and two pooling layers (PL: PLF, PLS) alternating, the present invention is not limited thereto, and those of ordinary skill in the art will understand that the number and arrangement order of the convolution layers CL and pooling layers PL may vary with the design of the artificial neural network. Each convolution layer CL and pooling layer PL consists of a plurality of feature maps, shown as solid-line rectangles, and each feature map is a matrix of a predetermined size. The value of each element of a feature map matrix is calculated by applying a convolution operation or a pooling (subsampling) operation using a kernel to the previous layer. Here, a kernel is a matrix of a predetermined size, shown as a dotted-line rectangle, and the value of each element of the kernel matrix is a weight (w).

The fully-connected layer FCL includes a plurality of nodes (or sigmoids: C1, C2, C3, ..., Cn), and the output layer OUL includes a plurality of output nodes (O1, O2, O3, ..., O6). The results of the operations of the fully-connected layer FCL are likewise weighted by weights (w) and input to the output nodes (O1, O2, O3, ..., O6) of the output layer OUL. Each of the output nodes (O1, O2, O3, ..., O6) corresponds to a predetermined emotion. For example, these emotions include joy, sadness, fear, surprise, disgust, and anger.

As one example, the first output node O1 corresponds to joy among the emotions derived from a face image, and the first output value, which is the output of the first output node O1, represents the probability that the emotion derived from the face image is joy. For example, a first output value of 0.35 indicates a 35% probability that the emotion derived from the face image is joy.

As another example, the second output node O2 corresponds to sadness among the emotions derived from a face image, and the second output value, which is the output of the second output node O2, represents the probability that the emotion derived from the face image is sadness. For example, a second output value of 0.10 indicates a 10% probability that the emotion derived from the face image is sadness.

As yet another example, the sixth output node O6 corresponds to anger among the emotions derived from a face image, and the sixth output value, which is the output of the sixth output node O6, represents the probability that the emotion derived from the face image is anger. For example, a sixth output value of 0.02 indicates a 2% probability that the emotion derived from the face image is anger.
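To illustrate how six output values can be read as per-emotion probabilities, the following Python sketch normalizes raw output scores with a softmax. The use of softmax is an assumption made for this example (the specification does not name the normalization and elsewhere mentions sigmoid nodes), and `EMOTIONS`, `softmax`, and `emotion_probabilities` are hypothetical names; only the order of the six emotions follows the description above.

```python
import math
from typing import Dict, List, Sequence

# Order follows output nodes O1 to O6 as listed in the description.
EMOTIONS = ("joy", "sadness", "fear", "surprise", "disgust", "anger")


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw scores to non-negative values that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def emotion_probabilities(scores: Sequence[float]) -> Dict[str, float]:
    """Pair each of the six emotions with its normalized probability."""
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion")
    return dict(zip(EMOTIONS, softmax(scores)))
```

With such a mapping, a value such as 0.35 for joy can be read directly as the intensity that the threshold rules compare against.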

Each of the plurality of layers (INL, CL, PL, FCL, OUL) includes a plurality of operations. A weight (w) is applied to each of these operations, and the weighted result is passed to the next layer. That is, the operation result of the previous layer becomes the input of the next layer.

As described above, the input layer INL is a feature map, that is, a matrix of a predetermined size. The elements of the input layer matrix correspond to pixels: each element may hold, for example, the pixel value of the corresponding pixel of the face image, and the pixel value may be entered into the element as binary data.

A convolution operation using each of a plurality of kernels (W) is then performed on the input layer matrix, and the results are entered into the plurality of feature maps of the first convolution layer CLF. Here, each of the kernels (W) is a matrix of a predetermined size whose elements are weights (w). Each of the feature maps of the first convolution layer CLF is also a matrix of a predetermined size.
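A single kernel applied to a single input matrix can be sketched in plain Python as follows. The example assumes no padding, a stride of 1, and the sliding-window form without kernel flipping that most CNN implementations call convolution; none of these details are fixed by the specification, and `conv2d_valid` is a hypothetical name.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> List[List[float]]:
    """Slide `kernel` over `image` and return one feature map.

    Each output element is the sum of element-wise products between
    the kernel weights and the image patch currently under the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

Applying several different kernels to the same input yields the several feature maps that make up one convolution layer.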

Next, a pooling operation (subsampling) using a plurality of kernels (W) is performed on the feature maps of the first convolution layer CLF. Each of these kernels (W) is likewise a matrix of a predetermined size whose elements are weights (w). The results of the pooling operation are entered into the plurality of feature maps of the first pooling layer PLF, each of which is also a matrix of a predetermined size.
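Because the description speaks of pooling kernels whose elements are weights, the sketch below implements subsampling as a weighted sum over non-overlapping windows; with uniform weights it reduces to average pooling. The non-overlapping stride and the name `subsample` are assumptions made for illustration, and other pooling variants (such as max pooling) would equally fit the general description.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def subsample(feature_map: Matrix, kernel: Matrix) -> List[List[float]]:
    """Reduce `feature_map` by taking a weighted sum over each
    non-overlapping window the size of `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) // kh
    out_w = len(feature_map[0]) // kw
    return [
        [
            sum(
                feature_map[r * kh + i][c * kw + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

A 2 x 2 kernel filled with 0.25 halves each dimension of the feature map while averaging the four values in every window.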

Subsequently, a convolution operation is performed on the feature maps of the first pooling layer PLF using kernels (W), each a matrix of a predetermined size whose elements are weights (w), to form the second convolution layer CLS, which consists of a plurality of feature maps. Next, a pooling operation (subsampling) is performed on the feature maps of the second convolution layer CLS using kernels (W), each a matrix of weights (w), to form the second pooling layer PLS, which consists of a plurality of feature maps. Each feature map of the second pooling layer PLS is likewise a matrix of a predetermined size.

Then, a convolution operation is performed on the plurality of feature maps of the second pooling layer (PLS) using a plurality of kernels (W). Each of these kernels (W) is also a matrix of a predetermined size whose elements are weights (w). The fully connected layer (FCL) is generated from the results of this convolution operation; in other words, the results are input to the plurality of nodes (C1 to Cn).

Each of the plurality of nodes (C1 to Cn) of the fully connected layer (FCL) performs a predetermined operation, using a transfer function or the like, on its input from the second pooling layer (PLS), applies a weight (w) to the result, and inputs it to each node of the output layer (OUL). The plurality of output nodes (O1 to O6) of the output layer (OUL) then perform a predetermined operation on the values input from the fully connected layer (FCL) and output the resulting output values. As described above, each of the output nodes (O1 to O6) corresponds to an emotion derived from the face image, and the output value of each output node is the probability value of that emotion as derived from the face image.

As described above, each of the plurality of layers of the emotion recognition unit 220 consists of a plurality of operations, and the result of each operation in a layer is weighted by a weight (w) and input to the subsequent layer. Accordingly, when a face image is input, the emotion recognition unit 220 performs a plurality of weighted operations on each pixel of the face image and outputs the result. As a result of these operations, the output value of each of the output nodes (O1 to O6) is finally a probability value corresponding to a predetermined emotion. For example, the output values of the output nodes (O1 to O6) are the probability values of joy, sadness, fear, surprise, disgust, and anger, respectively.
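The layer sequence described above (input, convolution, pooling, convolution, pooling, fully connected, output) can be illustrated with a minimal pure-Python sketch. This is not the patented implementation: the 16x16 input size, 3x3 kernels, average pooling, tanh transfer function, softmax output, random untrained weights, and the flattening step between the second pooling layer and the fully connected nodes are all assumptions made for illustration.

```python
import math
import random

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation form) of one map with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def avg_pool(feature_map, size=2):
    """Non-overlapping average pooling (subsampling); assumed pooling type."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [[sum(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)) / (size * size)
             for c in range(out_w)]
            for r in range(out_h)]

def softmax(values):
    """Turn raw output-node values into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rand_kernel(rng, size=3):
    """Hypothetical untrained kernel W whose elements are random weights w."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]

def forward(face, rng, n_maps=2, n_hidden=8, n_out=6):
    clf = [conv2d(face, rand_kernel(rng)) for _ in range(n_maps)]  # first convolutional layer
    plf = [avg_pool(m) for m in clf]                               # first pooling layer
    cls = [conv2d(m, rand_kernel(rng)) for m in plf]               # second convolutional layer
    pls = [avg_pool(m) for m in cls]                               # second pooling layer
    flat = [v for m in pls for row in m for v in row]
    # Fully connected nodes C1..Cn: weighted sum followed by a transfer function
    hidden = [math.tanh(sum(rng.uniform(-0.5, 0.5) * v for v in flat))
              for _ in range(n_hidden)]
    # Output nodes O1..O6: weighted sum, then softmax to obtain probabilities
    logits = [sum(rng.uniform(-0.5, 0.5) * h for h in hidden) for _ in range(n_out)]
    return softmax(logits)

# Synthetic 16x16 "face image" with pixel values in [0, 1)
face = [[((r * 16 + c) % 7) / 7 for c in range(16)] for r in range(16)]
probs = forward(face, random.Random(0))
```

With untrained weights the six probabilities carry no meaning; the sketch only shows how the data passes from one layer to the next and that the output values form a probability distribution over the six emotions.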

The emotion recognition unit 220 may be trained using, as training data, face images (FIMG) whose emotions are known. When a face image (FIMG) serving as training data is input, the emotion recognition unit 220 sets expected values according to the known emotion of the face image. Here, the expected values may be the probabilities of each of the plurality of emotions expected to be detected from the face image (FIMG). For example, when the face image (FIMG) expresses joy, the expected values may be set to 0.600, 0.001, 0.001, 0.396, 0.001, and 0.001 for joy, sadness, fear, surprise, disgust, and anger, respectively.

Then, the emotion recognition unit 220 applies the plurality of operations of the plurality of layers (INL, CL, PL, FL, OUL) to the pixel values of the training face image (FIMG) and finally outputs the probability of each of the plurality of emotions for that face image. Here, the output nodes (O1 to O6) correspond to joy, sadness, fear, surprise, disgust, and anger, respectively, and it is assumed that their output values are 0.400, 0.100, 0.163, 0.070, 0.233, and 0.034.

Thus, before sufficient training has been performed, that is, before training is complete, the output values of the emotion recognition unit 220 may differ from the expected values. The emotion recognition unit 220 therefore compares the output values with the expected values and corrects the weights (w) through a back-propagation algorithm so that the difference between the expected values and the output values is minimized.
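The gap between the expected values and the output values can be made concrete with the example figures given above. The sketch below is an illustration only: the text does not name a loss function, so the squared-error measure and the per-node error signal (output minus expected) are assumptions, chosen because they are a common starting point for back-propagation.

```python
EMOTIONS = ["joy", "sadness", "fear", "surprise", "disgust", "anger"]

# Example values quoted in the description for a face image expressing joy
expected = [0.600, 0.001, 0.001, 0.396, 0.001, 0.001]
output = [0.400, 0.100, 0.163, 0.070, 0.233, 0.034]

def squared_error(expected, output):
    """Sum of squared differences between expected and output values (assumed loss)."""
    return sum((e - o) ** 2 for e, o in zip(expected, output))

def output_deltas(expected, output):
    """Per-node error signal, output minus expected, that would be propagated backwards."""
    return [o - e for e, o in zip(expected, output)]

loss = squared_error(expected, output)          # approximately 0.2372
deltas = output_deltas(expected, output)
# Emotion whose output node is furthest from its expected value
worst = max(zip(EMOTIONS, deltas), key=lambda pair: abs(pair[1]))[0]
```

For these figures the largest discrepancy is on the surprise node (0.070 against an expected 0.396), so under this assumed loss the weights feeding that node would receive the largest correction.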

In this way, the emotion recognition unit 220 performs a training procedure in which training data with set expected values is input and, once the output values are derived, the weights (w) are corrected through the back-propagation algorithm.

The emotion recognition unit 220 repeats the above training procedure using a plurality of training data. In particular, when any one item of training data is input and the difference between the expected values and the output values is within a predetermined value while the output values no longer change, the emotion recognition unit 220 determines that training is complete and ends the training procedure.
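The stopping condition described above combines two checks: the outputs are close enough to the expected values, and the outputs have stopped changing between iterations. A minimal sketch follows; the function name and both thresholds (`tolerance`, `stability`) are hypothetical, since the text only speaks of a "predetermined value".

```python
def training_converged(expected, output, previous_output,
                       tolerance=0.05, stability=1e-3):
    """Return True when training can stop for this training sample.

    tolerance: assumed maximum allowed |expected - output| per emotion.
    stability: assumed maximum allowed change in output since the previous iteration.
    """
    within = all(abs(e - o) <= tolerance for e, o in zip(expected, output))
    stable = all(abs(o - p) <= stability for o, p in zip(output, previous_output))
    return within and stable

expected = [0.600, 0.001, 0.001, 0.396, 0.001, 0.001]
early = [0.400, 0.100, 0.163, 0.070, 0.233, 0.034]   # pre-training outputs from the text
late = [0.590, 0.005, 0.005, 0.390, 0.005, 0.005]    # hypothetical outputs after training

done_early = training_converged(expected, early, early)
done_late = training_converged(expected, late, late)
```

Here `done_early` is false because the joy output differs from its expected value by 0.2, while `done_late` is true because every output is within the tolerance and unchanged from the previous iteration.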

When the training procedure is finished, the emotion recognition unit 220 can use a face image (FIMG) whose emotion is unknown as input data and determine the intensities of a plurality of emotions from that face image. That is, the emotion recognition unit 220 applies the plurality of operations of the plurality of layers (INL, CL, PL, FL, OUL) to the pixel values of the input face image (FIMG) and finally outputs the probability of each of the plurality of emotions for that face image. Here, the output nodes (O1 to O6) correspond to joy, sadness, fear, surprise, disgust, and anger, respectively, and the output value of each output node is the probability that the face image (FIMG) expresses the corresponding emotion. The present invention uses the probability of each emotion as the intensity of that emotion. For example, if the output values of the output nodes (O1 to O6) are 0.800, 0.010, 0.010, 0.010, 0.160, and 0.010, the emotion recognition unit 220 recognizes these values as the intensities of joy, sadness, fear, surprise, disgust, and anger, respectively, in the face image (FIMG).

Next, a method for identifying purchase intent according to an embodiment of the present invention will be described. FIG. 4 is a flowchart illustrating the method for identifying purchase intent according to an embodiment of the present invention. FIGS. 5 to 7 are graphs illustrating the method for identifying purchase intent according to an embodiment of the present invention.

Referring to FIG. 4, in step S110 the control unit 200 of the identification device 100 continuously photographs customers in the store through the camera unit 120 and stores the captured images in the storage unit 160. During this photographing, in step S120 the motion recognition unit 210 of the control unit 200 checks whether a product gaze motion, which is a motion in which the customer stares at a specific product, is detected in the captured images.

When the motion recognition unit 210 detects a product gaze motion, in step S130 the emotion recognition unit 220 analyzes the change in the customer's emotion through the face portion of the customer's images captured by the camera unit 120 and determines the customer's purchase intention for the product.

When the motion recognition unit 210 detects a product gaze motion, the emotion recognition unit 220 extracts the face portion from the customer's images that have been captured through the camera unit 120 and stored in the storage unit 160, starting a predetermined period before the detection time, and analyzes the change in the intensity of the customer's emotion through the face images thus extracted. The emotion recognition unit 220 extracts a face image from the customer's images at a predetermined interval (for example, once per second) to analyze the change in the intensity of the customer's emotion. For example, the intensities of the customer's emotions obtained when the emotion recognition unit 220 analyzes, through a convolutional neural network (CNN), the face image extracted at one point in time are as shown in FIG. 5. As shown, the intensities of the customer's emotions (joy, sadness, fear, surprise, disgust, and anger) analyzed through the CNN of the emotion recognition unit 220 may be 0.0264, 0.3321, 0.1223, 0.0977, 0.3852, 0.0263, and 0.0264.
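The periodic extraction of face images around the detection time can be sketched as a list of sampling timestamps. The function and parameter names (`sample_times`, `pre_roll`, `duration`) and the example durations are hypothetical; the text specifies only that sampling starts a predetermined period before the detection time and proceeds at a predetermined interval such as once per second.

```python
def sample_times(detect_time, pre_roll, duration, period=1.0):
    """Timestamps (in seconds) at which face images are extracted from stored footage.

    detect_time: time at which the product gaze motion was detected.
    pre_roll:    assumed length of stored footage to analyze before detect_time.
    duration:    assumed length of footage to analyze after detect_time.
    period:      sampling interval; once per second is the example given in the text.
    """
    start = detect_time - pre_roll
    count = int((pre_roll + duration) / period) + 1
    return [start + k * period for k in range(count)]

# Gaze detected at t = 10 s; analyze 3 s before and 5 s after, once per second
times = sample_times(detect_time=10.0, pre_roll=3.0, duration=5.0)
```

Each timestamp would then be used to pull one frame from the storage unit, crop the face region, and feed it to the trained network to obtain one set of emotion intensities per sample.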

According to one embodiment, the emotion recognition unit 220 selects, from among the customer's plurality of emotions, at least one emotion whose intensity is equal to or greater than a predetermined first value. If the intensity of the selected emotion remains, for a predetermined period from the time the product gaze motion was detected, within the range that is equal to or greater than the first value and equal to or less than a predetermined second value greater than the first value, the emotion recognition unit 220 determines that the customer has a purchase intention.

For example, referring to FIGS. 5 and 6, assume that the first value is 0.30 and the second value is 0.40. The emotion recognition unit 220 may then select an emotion whose intensity is 0.30 or more. In this case, according to FIG. 5, at least one of the emotions 'joy' and 'surprise' may be selected. Assume that the emotion recognition unit 220 selects surprise. Then, as in the graph of FIG. 6, if the intensity of surprise varies within the range from the first value of 0.30 to the second value of 0.40 for a predetermined period from the time Ts at which the product gaze motion was detected, the emotion recognition unit 220 determines that the customer has a purchase intention. As in the graph of FIG. 6, a state in which the customer's emotion varies within a predetermined range that is equal to or greater than the first value and equal to or less than the second value is called an arousal state, which is a state of being interested in some object. Accordingly, the present invention treats this arousal state as indicating that the customer intends to purchase the product.
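The selection and range test of this embodiment can be sketched as follows, using the first value of 0.30 and second value of 0.40 from the example. The function names and the intensity series are hypothetical illustrations, and taking the first sample of each series as the intensity at detection time Ts is an assumption.

```python
def select_emotions(intensities, first_value):
    """Emotions whose intensity is at least the first value."""
    return [name for name, value in intensities.items() if value >= first_value]

def stays_in_range(series, first_value, second_value):
    """True if every sample lies within [first_value, second_value]."""
    return all(first_value <= v <= second_value for v in series)

def has_purchase_intent(series_by_emotion, first_value=0.30, second_value=0.40):
    """series_by_emotion maps an emotion to its intensities sampled from Ts onward."""
    at_detection = {name: s[0] for name, s in series_by_emotion.items()}
    selected = select_emotions(at_detection, first_value)
    return any(stays_in_range(series_by_emotion[name], first_value, second_value)
               for name in selected)

# Hypothetical intensity series, one sample per second starting at Ts
observed = {
    "joy": [0.31, 0.55, 0.20],             # selected, but leaves the range
    "surprise": [0.31, 0.35, 0.38, 0.33],  # selected and stays within 0.30 to 0.40
    "anger": [0.03, 0.02, 0.02],           # below the first value, not selected
}
intent = has_purchase_intent(observed)
intent_joy_only = has_purchase_intent({"joy": observed["joy"]})
```

In this sketch a single selected emotion that holds within the range is enough to report purchase intent, which matches the "at least one emotion" wording of the embodiment.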

In contrast, referring to FIGS. 5 and 7, assume again that the first value is 0.30 and the second value is 0.40. The emotion recognition unit 220 may then select an emotion whose intensity is 0.30 or more. In this case, according to FIG. 5, at least one of the emotions 'joy' and 'surprise' may be selected. Assume that the emotion recognition unit 220 selects surprise. Then, as in the graph of FIG. 7, because the intensity of surprise changes to less than the first value of 0.30 or to 0.40 or more during a predetermined period from the time Ts at which the product gaze motion was detected, the emotion recognition unit 220 determines that the customer has no purchase intention. As also shown in FIG. 7, a state in which the intensity of the emotion changes abruptly beyond the predetermined range is defined as an excited state, and a state in which the intensity of the emotion is low and does not change significantly is defined as a relaxed state. In the excited state, the emotional state is changing because of some external factor rather than interest in the product, and is therefore unrelated to the product. In the relaxed state, it can be determined that the customer is not actually interested in shopping.
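The three states named above (arousal, excited, relaxed) can be separated with a simple rule over the sampled intensities. This is a simplified sketch under stated assumptions: the text does not give exact criteria for "abrupt" change or "low" intensity, so here a series that stays entirely below the first value is treated as relaxed and any other series that leaves the range is treated as excited.

```python
def classify_state(series, first_value=0.30, second_value=0.40):
    """Classify an emotion-intensity series sampled from detection time Ts onward."""
    if all(first_value <= v <= second_value for v in series):
        return "arousal"   # interest in the product: purchase intent
    if all(v < first_value for v in series):
        return "relaxed"   # consistently low intensity: not interested in shopping
    return "excited"       # intensity leaves the range: attributed to external factors

# Hypothetical series for the selected emotion "surprise"
steady = classify_state([0.31, 0.35, 0.38, 0.33])
spiking = classify_state([0.31, 0.62, 0.18, 0.55])
flat_low = classify_state([0.05, 0.04, 0.06])
```

Only the arousal state would lead to a positive purchase-intent determination in this embodiment; the other two outcomes correspond to the no-intent cases illustrated by FIG. 7.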

According to another embodiment, the emotion recognition unit 220 determines that the customer has a purchase intention if the intensity of at least one of the customer's plurality of emotions rises, after the time the product gaze motion was detected, by a predetermined third value or more compared with before that time. For example, referring to FIG. 6, assume that the third value is 0.30. Then, as in the graph of FIG. 6, if the intensity of surprise after the time Ts at which the product gaze motion was detected has risen by 0.30 or more compared with before the time Ts, the emotion recognition unit 220 may determine that the customer has a purchase intention.
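The rise test of this embodiment compares intensities before and after the detection time Ts against the third value. The sketch below assumes one way of making that comparison, the mean intensity before Ts against the peak intensity after Ts; the text does not specify which statistics are compared, and the sample values are hypothetical.

```python
def rose_after_detection(before, after, third_value=0.30):
    """True if intensity rose by at least third_value after the gaze was detected.

    before: intensity samples taken before Ts (assumed non-empty).
    after:  intensity samples taken after Ts (assumed non-empty).
    """
    baseline = sum(before) / len(before)  # assumed reference level: mean before Ts
    peak = max(after)                     # assumed comparison level: peak after Ts
    return peak - baseline >= third_value

# Hypothetical "surprise" intensities sampled once per second
rise_detected = rose_after_detection(before=[0.05, 0.05], after=[0.20, 0.38])
rise_missing = rose_after_detection(before=[0.05, 0.05], after=[0.12, 0.20])
```

In the first case the intensity climbs from 0.05 to 0.38, exceeding the third value of 0.30, so purchase intent would be reported; in the second the rise is only 0.15.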

Referring again to FIG. 4, if it is determined in step S140, as described above, that the customer has a purchase intention, the information processing unit 230 specifies, in step S150, the product at which the customer stared.

Then, in step S160, the information processing unit 230 generates promotion information for the product and provides the generated promotion information. Here, the promotion information includes product identification information, which identifies the product at which the customer stared, and a customer identification image, which is an image that allows the customer to be identified. The promotion information also includes information about the product, discount information, and the like for promoting the sale of the product.

According to one embodiment, the information processing unit 230 may display the promotion information through the display unit 140. According to another embodiment, the information processing unit 230 may transmit the promotion information to the employee device 400 through the communication unit 110.
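The promotion information and its two delivery paths can be sketched as a small record plus a dispatch function. All names here (`PromotionInfo`, its fields, `deliver`, and the callback parameters) are hypothetical; the text specifies only what the promotion information contains and that it may be shown on the display unit or sent to the employee device.

```python
from dataclasses import dataclass

@dataclass
class PromotionInfo:
    product_id: str            # product identification information
    customer_image: bytes      # customer identification image
    product_details: str = ""  # information about the product
    discount: str = ""         # discount information

def deliver(info, display=None, employee_device=None):
    """Pass the promotion information to whichever outputs are available.

    display and employee_device are assumed callables standing in for the
    display unit 140 and for transmission via the communication unit 110.
    """
    delivered = []
    if display is not None:
        display(info)
        delivered.append("display")
    if employee_device is not None:
        employee_device(info)
        delivered.append("employee_device")
    return delivered

shown, sent = [], []
info = PromotionInfo(product_id="SKU-001", customer_image=b"\x00",
                     product_details="sample product", discount="10% off")
routes = deliver(info, display=shown.append, employee_device=sent.append)
```

Keeping the record separate from the delivery function lets either embodiment, or both, be used without changing how the promotion information is generated.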

Accordingly, an employee who views the promotion information displayed on the display unit 140, or who views it through the employee device 400, can explain the product information, discount information, and the like to the customer. That is, the present invention can promote product sales through this kind of customer service.

Meanwhile, the methods according to the embodiments of the present invention described above may be implemented as programs readable by various computer means and recorded on a computer-readable recording medium. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. For example, the recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not limiting. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made under the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

100: recognition device
110: communication unit 120: camera unit
130: input unit 140: display unit
150: audio unit 160: storage unit
200: control unit 210: motion recognition unit
220: emotion recognition unit 230: information processing unit
400: employee device

Claims

A device for identifying purchase intent, comprising:
A camera unit that continuously captures an image of a customer;
A motion recognition unit that detects, from the image of the customer, a product gaze motion, which is a motion in which the customer stares at a product;
An emotion recognition unit that determines the customer's purchase intention for the corresponding product by analyzing a change in the intensity of the customer's emotion through a face image, which is the face part of the customer's image; and
An information processing unit that specifies the product the customer is staring at when, as a result of the determination, the customer is determined to have a purchase intention,
The emotion recognition unit
An input layer including a feature map, which is a matrix of a predetermined size having elements in units of pixels,
A first convolutional layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A first pooling layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A second convolutional layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A second pooling layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A fully connected layer including a plurality of nodes,
An output layer including a plurality of output nodes
Including,
Each of the plurality of output nodes corresponds to an emotion derived from the face image,
The emotion recognition unit
Inputting the pixel value of the face image into the matrix of the input layer,
The result of the convolution operation performed by performing a convolution operation using a plurality of kernels on the matrix of the input layer is input into a plurality of feature maps of the first convolution layer,
A pooling operation using a plurality of kernels is performed on a plurality of feature maps of the first convolutional layer, and the result of the performed pooling operation is input to a plurality of feature maps of the first pooling layer,
A convolution operation using a plurality of kernels is performed on a plurality of feature maps of the first pooling layer, and the result of the performed convolution operation is input to a plurality of feature maps of the second convolutional layer,
A pooling operation using a plurality of kernels is performed on a plurality of feature maps of the second convolutional layer, and the result of the performed pooling operation is input to a plurality of nodes of the fully connected layer,
The plurality of nodes of the fully connected layer performs an operation using a transfer function on an input from the second pooling layer, applies a weight to the performed operation, and inputs it to a plurality of output nodes of the output layer,
The plurality of output nodes of the output layer perform a predetermined operation on the value input from the fully connected layer, and output an output value as a result of the operation,
The output value of each of the plurality of output nodes is a probability value of each of a plurality of emotions derived from the face image,
The emotion recognition unit
Recognize the output value as the strength of each of a plurality of emotions of the customer,
Selecting at least one emotion having an intensity equal to or greater than a predetermined first value from among the plurality of emotions of the customer,
The intensity of the selected emotion
For a predetermined period from the time when the product staring motion is detected
If it changes while maintaining an arousal state, as distinct from a relaxed state and an excited state, within a range equal to or greater than a predetermined first value and equal to or less than a predetermined second value greater than the first value,
It is determined that the customer has an intention to purchase,
The intensity of at least one of the customer’s emotions
Than before the time when the above product gaze action was detected
After the point in time when the above product gaze action is detected
If it rises above the predetermined third value,
Characterized in that it is determined that the customer has a purchase intention
A device for identifying purchase intent.

delete

The device of claim 1,
Wherein the device
Further includes a communication unit for communicating with an employee device,
The information processing unit
Generate promotion information for the specified product,
Characterized in that transmitting the generated promotion information to the employee device through the communication unit
A device for identifying purchase intent.

The device of claim 1,
Wherein the device
Further includes a display unit for displaying a screen,
The information processing unit
Generate promotion information for the specified product,
Displaying the generated promotion information through the display unit
A device for identifying purchase intent.

In a method for identifying purchase intent,
Continuously capturing an image of a customer by a camera unit;
Detecting, by a motion recognition unit, a product gaze motion, which is an action for the customer to stare at the product, from the image of the customer;
Determining, by an emotion recognition unit, a change in the intensity of a customer's emotion through a facial image, which is a face part of the customer's image, to determine the customer's purchase intention for the corresponding product; And
Specifying, by an information processing unit, the product the customer is staring at when, as a result of the determination, the customer is determined to have a purchase intention,
The emotion recognition unit
An input layer including a feature map, which is a matrix of a predetermined size having elements in units of pixels,
A first convolutional layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A first pooling layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A second convolutional layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A second pooling layer including a plurality of feature maps, each of which is a matrix of a predetermined size,
A fully connected layer including a plurality of nodes,
An output layer including a plurality of output nodes
Including,
Each of the plurality of output nodes corresponds to an emotion derived from the face image,
The step of determining the purchase intention
The emotion recognition unit
Inputting the pixel value of the face image into the matrix of the input layer,
The result of the convolution operation performed by performing a convolution operation using a plurality of kernels on the matrix of the input layer is input into a plurality of feature maps of the first convolution layer,
The result of a pooling operation performed using a plurality of kernels on the plurality of feature maps of the first convolutional layer is input to the plurality of feature maps of the first pooling layer,
The result of the convolution operation performed by performing a convolution operation using a plurality of kernels on the plurality of feature maps of the first pooling layer is input to the plurality of feature maps of the second convolutional layer,
The result of a pooling operation performed using a plurality of kernels on the plurality of feature maps of the second convolutional layer is input to the plurality of nodes of the fully connected layer,
The plurality of nodes of the fully connected layer performs an operation using a transfer function on the input from the second pooling layer, applies a weight to the performed operation, and inputs it to the plurality of output nodes of the output layer,
The plurality of output nodes of the output layer perform a predetermined operation on the values input from the fully connected layer and, as a result of the operation, output the output value of each of the plurality of output nodes, which is the probability value of each of a plurality of emotions derived from the face image,
The emotion recognition unit
Recognizing the output value of each of the plurality of output nodes as the strength of each of the plurality of emotions of the customer,
Selecting at least one emotion having an intensity equal to or greater than a predetermined first value from among the plurality of emotions of the customer,
The intensity of the selected emotion
For a predetermined period from the time when the product staring motion is detected
Within a range equal to or greater than the first value and equal to or less than a predetermined second value greater than the first value
If it changes while maintaining an arousal state, as distinct from a relaxed state and an excited state,
It is determined that the customer has an intention to purchase,
The intensity of at least one of the customer’s emotions
Than before the time when the above product gaze action was detected
After the point in time when the above product gaze action is detected
If it rises above the predetermined third value,
Characterized in that it is determined that the customer has a purchase intention
A method for identifying purchase intent.

A computer-readable recording medium on which a program for performing the method for identifying purchase intent according to claim 6 is recorded.