KR20220095752A - Automatic Scrolling Apparatus Using Gaze Tracking

Info

Publication number: KR20220095752A
Application number: KR1020200187590A
Authority: KR
Inventors: 정새보미
Original assignee: 주식회사 유니트리아
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2022-07-07

Abstract

Disclosed is an automatic scrolling device using gaze tracking. One aspect of this embodiment provides an automatic scrolling device that automatically scrolls an output screen by analyzing a user's face image, the device including: a communication unit that receives, from an external source, a learning model that derives a scroll control signal from a face image; a photographing unit that captures the user's face image; an input/output unit that outputs content; and a control unit that uses the learning model to analyze the user's concentration on the screen from the face image captured by the photographing unit and controls whether the output screen output by the input/output unit is scrolled. The present invention thus provides a device that scrolls the screen a user is viewing according to tracking of the user's gaze.

Description

Automatic Scrolling Apparatus Using Gaze Tracking

The present invention relates to a device that tracks a user's gaze and scrolls automatically according to the direction of that gaze.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

In modern society, the use of smartphones has grown explosively with the development of communication technology, and smartphones have become correspondingly widespread.

A smartphone equipped with a touch panel uses the touch panel as its input device. The smartphone can detect the various input events that the user enters on the touch panel and perform the corresponding functions.

A smartphone can display various screens that present a large number of pages or images (for example, application screens such as a browser, an e-book reader, or a gallery), and can detect events for controlling scrolling on those screens. The smartphone scrolls the displayed screen in response to a scroll event. Scrolling refers to moving the displayed screen up and down or left and right.

As smartphone usage grows, users often operate a smartphone while lying down, which is inconvenient. Smartphone holders and similar accessories have been developed to relieve this inconvenience, but the user still has to touch the smartphone to scroll, which remains inconvenient.

An object of one embodiment of the present invention is to provide a device that recognizes the user's eyes and tracks the gaze so that the screen the user is viewing is scrolled according to the user's gaze.

According to one aspect of the present invention, there is provided an automatic scrolling device that automatically scrolls an output screen by analyzing a user's face image, the device comprising: a communication unit that receives, from an external source, a learning model that derives a scroll control signal from a face image; a photographing unit that captures the user's face image; an input/output unit that outputs content; and a control unit that uses the learning model to analyze the user's concentration on the screen from the face image captured by the photographing unit and controls whether the output screen output by the input/output unit is scrolled.

According to one aspect of the present invention, the automatic scrolling device further comprises a memory unit that stores the learning model received by the communication unit.

According to one aspect of the present invention, the control unit analyzes the user's concentration on the screen by predicting, from the face image, the movement of the user's pupils or the movement of each of the user's facial features.

According to one aspect of the present invention, when the movement of the user's pupils relatively increases or the movement of the user's facial features relatively decreases, the control unit scrolls the output screen output by the input/output unit in one direction according to the learning model.

According to one aspect of the present invention, there is provided an automatic scrolling learning server that generates a learning model enabling an output screen to be scrolled automatically by analyzing a user's face image, the server comprising: a database that stores, as big data, face images of humans viewing a screen together with the humans' inputs for scroll control; a control unit that learns from the human face images and scroll-control inputs in the database and generates a learning model that receives a face image and outputs a scroll control signal; and a communication unit that transmits the learning model generated by the control unit.

According to one aspect of the present invention, the database stores human face images for each situation in which scroll control is performed.

According to one aspect of the present invention, the control unit analyzes the user's concentration on the screen from the human face images in the database.

As described above, according to one aspect of the present invention, the user's gaze is tracked and the screen the user is viewing is scrolled according to that gaze, so the screen can be scrolled as the user wishes simply by looking at it, without the user operating the screen.

FIG. 1 is a diagram illustrating an automatic scrolling system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of an automatic scrolling device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of an automatic scrolling learning server according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a process in which the automatic scrolling learning server or the automatic scrolling device detects a user's eyes from an image according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a process in which the automatic scrolling learning server or the automatic scrolling device detects a pupil according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example in which the automatic scrolling device scrolls according to the user's gaze according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the drawings, like reference numerals are used for like elements.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" should be understood as not excluding in advance the presence or possible addition of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In addition, each configuration, process, step, or method included in each embodiment of the present invention may be shared among embodiments to the extent that they do not technically contradict one another.

FIG. 1 is a diagram illustrating an automatic scrolling system according to an embodiment of the present invention.

Referring to FIG. 1, an automatic scrolling system 100 according to an embodiment of the present invention includes an automatic scrolling device 110 (hereinafter abbreviated as the "device") and an automatic scrolling learning server 120 (hereinafter abbreviated as the "server").

The device 110 analyzes the user's concentration on the screen from a face image of the user looking at the device, and scrolls the screen it is outputting based on the analyzed concentration.

The device 110 has a touch panel as its input means and is implemented as a device that outputs information such as text, images, or video. It may be, for example, a smartphone or a tablet PC, but is not necessarily limited thereto.

The device 110 captures face images of the user viewing the information it outputs and analyzes from them the user's concentration on the screen. To capture these images, the device 110 either has a built-in image-capturing component or is connected to an external device. The device 110 acquires a plurality of face images of the user within a preset interval and extracts the user's facial features and the pupils of the eyes from each image. The device 110 receives from the server 120, and stores, a learning model (or algorithm) that extracts human facial features and pupils from an image, and uses that learning model to analyze the concentration on the screen from the images. The user's emotion or concentration can be derived from the movement of the facial features or pupils. Movements or changes in size of the user's eyes, eyebrows, pupils, cheeks, or mouth are cues that reveal a person's emotions. For example, fear or sadness can be derived from movements or size changes of the eyes and eyebrows, and joy or disgust (displeasure) can be derived from movements of the cheeks or mouth. When the eyes and eyebrows move and the corners of the mouth rise, the device 110 can infer that the user feels joy; when the eyes stay closed for a certain period or the movement of the facial features is minimal, the device 110 can infer that the user is bored. The user's concentration on the screen varies with the emotion the user has while viewing the screen (content) provided by the device 110. That is, the device 110 extracts the movement of the facial features, derives the user's emotion from it, and analyzes how strongly the user is concentrating according to the derived emotion. Likewise, when the user concentrates on the provided screen, the pupils either rest on one part of the screen or move only slightly (within a preset reference value) from one part of the screen to another. In this way, the device 110 uses the learning model to derive the user's emotion from the movement of the facial features or to analyze the user's concentration on the screen from the movement of the pupils.

The device 110 scrolls the screen it is outputting according to the user's concentration on the screen. After predicting the movement of the user's pupils or the overall change in facial expression (the concentration on the screen), the device 110 automatically scrolls the output screen according to the movement of the pupils. Even without receiving user input through the touch panel, the device 110 scrolls the output screen according to the analyzed gaze of the user. The detailed structure is described later with reference to FIG. 2.

The server 120 generates a learning model by learning the relationship between the user's concentration and scrolling, based on face images of humans looking at a device and the humans' inputs for scroll control. The server 120 learns from a large number of human face images and scroll-control inputs. The server 120 analyzes the person's concentration on the screen (and changes in it) by analyzing, within the face images, the movement of the pupils or the overall facial expression (the movement of each facial feature and the eyebrows). Alongside this, the server 120 analyzes the scroll-control inputs that each person providing face images makes on the device. A scroll-control input includes, at each point in time, the scroll position, whether the scroll is moving, and the direction in which the scroll is to move. The server 120 matches this information across the large volume of data and derives a learning model relating changes in the face image of a person viewing the device to scroll control. That is, the server 120 derives a learning model that, when a human face image is input, analyzes the person's concentration on the screen in the image and outputs a scroll control signal. The server 120 provides this learning model to the device 110 so that the device 110 can scroll the screen using it. The specific configuration and operation of the server 120 are described later with reference to FIGS. 3 to 5.

FIG. 2 is a diagram illustrating the configuration of an automatic scrolling device according to an embodiment of the present invention.

Referring to FIG. 2, the device 110 according to an embodiment of the present invention includes a communication unit 210, a photographing unit 220, a control unit 230, a memory unit 240, and an input/output unit 250.

The communication unit 210 receives from the server 120 a learning model that derives scrolling control from an image of the user.

The photographing unit 220 captures face images of the user using the device 110. The photographing unit 220 may be built into the device 110 or may be an external component connected to the device 110. The photographing unit 220 acquires the user's face images so that the control unit 230 can analyze the user's concentration on the screen. To enable this analysis, the photographing unit 220 captures a plurality of face images of the user within a preset interval.

Based on the received learning model, the control unit 230 analyzes the images captured by the photographing unit 220 and controls scrolling of the screen output by the input/output unit 250.

Based on the received learning model, the control unit 230 analyzes the user's concentration on the screen from the images captured by the photographing unit 220. Using the learning model, the control unit 230 predicts the movement of the pupils or the overall facial expression (the movement of each facial feature and the eyebrows) in the images and analyzes the concentration on the screen from that prediction. When the user is concentrating on the screen output by the input/output unit 250, the pupils either rest on one part of the screen or move only slightly (within a preset reference value) from one part of the screen to another. In contrast, when the user has finished checking all the content on the output screen or is not concentrating, the pupils do not rest on one part of the screen, or their amount of movement relatively increases (to the preset reference value or more). In addition, when the user concentrates while viewing the content provided on the screen, the user moves some or all of the facial features and shows emotion in response to the content. In contrast, when the user has checked all the content provided on the screen or has lost concentration, the user shows boredom or becomes distracted, and the movement of some or all of the facial features is minimal. The control unit 230 determines whether the movement of some or all of the facial features is at or above a preset reference value, and thereby determines whether the user is concentrating on the provided screen and showing a particular emotion. To make use of these characteristics, the control unit 230 extracts the user's face and facial features from the images captured by the photographing unit 220 and analyzes their movement in order to analyze the user's concentration on the screen.
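The threshold comparisons described above can be sketched in a few lines. This is a minimal illustration rather than the learning model of the embodiment: the function names, the per-frame displacement measure, and the numeric limits standing in for the "preset reference values" are all assumptions.

```python
from math import hypot


def mean_displacement(points):
    """Average frame-to-frame distance travelled by one tracked point."""
    if len(points) < 2:
        return 0.0
    steps = [hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return sum(steps) / len(steps)


def should_scroll(pupil_track, feature_track,
                  pupil_limit=3.0, feature_limit=1.0):
    """Return True when the user appears to have stopped concentrating.

    pupil_limit and feature_limit are hypothetical reference values
    (pixels per frame); the source does not state concrete numbers.
    """
    # Pupil movement at or above its reference value: no longer fixated.
    pupil_restless = mean_displacement(pupil_track) >= pupil_limit
    # Facial-feature movement below its reference value: expression is flat.
    face_still = mean_displacement(feature_track) < feature_limit
    return pupil_restless or face_still
```

Each track is a list of (x, y) positions taken from consecutive frames. In the embodiment the decision is produced by the learning model; fixed limits are used here only to make the two comparisons concrete.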

The memory unit 240 stores the learning model received from the server 120.

The input/output unit 250 receives input from the user for outputting content, and outputs content according to the user's input.

FIG. 3 is a diagram illustrating the configuration of an automatic scrolling learning server according to an embodiment of the present invention.

Referring to FIG. 3, the server 120 according to an embodiment of the present invention includes a communication unit 310, a control unit 320, and a database 330.

The communication unit 310 transmits the learning model generated by the control unit 320 to the device 110.

The control unit 320 learns from the many human face images and scroll-control inputs stored in the database 330 and generates a learning model that receives a face image, analyzes the person's concentration on the screen (and changes in it), and outputs a scroll control signal from that analysis.

The control unit 320 extracts the user's face and the facial features within the face from the many human face images stored in the database 330. A method by which the control unit 320 extracts the user's face and facial features is illustrated in FIGS. 4 and 5.

FIG. 4 is a diagram illustrating a process in which the automatic scrolling learning server or the automatic scrolling device detects a user's eyes from an image according to an embodiment of the present invention, and FIG. 5 is a diagram illustrating a process in which the automatic scrolling learning server or the automatic scrolling device detects a pupil according to an embodiment of the present invention.

The control unit 320 configures an artificial-intelligence-based neural network for extracting the user's face and facial features from an image.

The artificial-intelligence-based neural network may use a feature-extraction-based classification model such as a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a Long Short-Term Memory (LSTM) network.

The feature extraction model consists broadly of three processes: a face recognition process that recognizes the user's face in an image using a machine-learning-based algorithm; a landmark extraction process that extracts landmarks to detect the positions of the eyes and facial features in the recognized face; and an eye movement tracking process that extracts the positions of the pupil and the center of the eye from the eye landmarks through image processing.

First, for face recognition, the control unit 320 uses a face recognition algorithm based on HOG (Histogram of Oriented Gradients) features and an SVM, which performs face recognition efficiently with high speed, low resource requirements, and high accuracy. The HOG-based face detection method using an SVM extracts the features of the object of interest (that is, the user's face region) from input images, uses the extracted feature data to train an SVM and generate a classifier, and then searches for the object of interest by using that classifier to extract, from an input image, objects similar to the features of the object of interest. Because the object of interest is found automatically, the time and effort required to search for it manually over a wide area can be reduced.
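The two ingredients named above, a histogram of oriented gradients and a linear SVM decision, can be sketched as follows. This is a simplified single-cell illustration in plain Python, not the detector of the embodiment: block normalization and the sliding detection window are omitted, and the function names, the nine-bin layout, and any weights passed to the scorer are assumptions.

```python
from math import atan2, degrees, hypot


def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences on interior pixels only (border pixels are skipped).
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = degrees(atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def linear_svm_score(features, weights, bias):
    """Decision value of a linear SVM; a positive value means 'face' in this sketch."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

In a full detector, the histograms of many cells would be concatenated into one feature vector per window, and the weights and bias would come from training the SVM on labelled face and non-face windows.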

Next, the landmark extraction process uses a facial landmark extraction algorithm to identify the positions of the facial features in the face recognized by the face recognition process. Here, the facial landmark extraction algorithm is a CNN (Convolutional Neural Network). In outline, a CNN divides the image into small patches and extracts a particular feature from each patch through a small network. The network then moves to the next patch and extracts the feature again in the same way, reusing the same network. Additional networks that move across the image and extract other features are created, and the procedure above is repeated so that the patches are fed to each network one at a time. Finally, all the extracted features are combined to classify the image. Through these operations, the control unit 320 extracts the facial features from the face.
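The patch-by-patch operation described above corresponds to a convolution layer: one small kernel is applied to every patch, and each additional kernel extracts a different feature. A minimal sketch in plain Python follows; the kernel values in the usage note are hand-picked for illustration, whereas a trained CNN would learn them, and the function names are assumptions.

```python
def convolve2d(image, kernel):
    """Slide one shared kernel over every patch (valid mode, stride 1).

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def feature_maps(image, kernels):
    """Apply several kernels to the same image; each yields one feature map."""
    return [convolve2d(image, k) for k in kernels]
```

For example, `feature_maps(image, [[[-1, 0], [0, 1]], [[1, 1], [1, 1]]])` returns one map that responds to diagonal intensity changes and one that sums each 2x2 patch. A landmark network would stack several such layers and end with a layer that regresses landmark coordinates.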

After extracting the facial features, the control unit 320 extracts the pupil from the extracted eye region, as shown in FIG. 5, through masking based on a threshold value.

That is, the control unit 320 may set the threshold value so as to cope with environmental variations in the input face image, such as lighting, shooting angle, or shooting location. Based on the set threshold value, the control unit 320 can distinguish the pupil from the non-pupil portion of the eye region.
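Threshold-based masking of the eye region can be sketched as follows. The source does not say how the threshold is chosen; this sketch assumes one simple option, a fixed fraction of the region's mean brightness, so that the threshold follows the lighting. The function name and the `darkness_ratio` parameter are hypothetical.

```python
def pupil_center(eye_region, darkness_ratio=0.5):
    """Return the (x, y) centroid of the pixels masked as pupil, or None.

    eye_region is a 2-D list of grayscale values. A pixel counts as pupil
    when it is darker than darkness_ratio times the mean brightness of the
    region (an assumed rule, not taken from the source).
    """
    pixels = [value for row in eye_region for value in row]
    threshold = darkness_ratio * sum(pixels) / len(pixels)
    dark = [(x, y)
            for y, row in enumerate(eye_region)
            for x, value in enumerate(row)
            if value < threshold]
    if not dark:
        return None  # nothing dark enough to be treated as a pupil
    cx = sum(x for x, _ in dark) / len(dark)
    cy = sum(y for _, y in dark) / len(dark)
    return cx, cy
```

Because the threshold scales with the mean brightness, halving every pixel value (a darker scene) leaves the detected center unchanged in this sketch.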

Referring again to FIG. 3, the control unit 320 uses a motion prediction model to predict the movement of the pupils and changes in facial expression based on the extracted facial features. The control unit 320 predicts the movement of the pupils or of the facial features as a whole using an RNN (Recurrent Neural Network) or an LSTM (Long Short-Term Memory) network. Because the photographing unit 220 in the device 110 captures a plurality of images within a preset interval (time), the control unit 320 extracts the facial features from the plurality of temporally consecutive images and predicts the movement of the extracted facial features and pupils.
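The recurrent idea can be illustrated with a single-unit Elman RNN run over a sequence of per-frame pupil displacements. This is a sketch only: the weights are untrained placeholders supplied by the caller, the one-dimensional input is a simplification, and the function name is an assumption. An LSTM would add gating on top of the same recurrence.

```python
from math import tanh


def rnn_predict_next(sequence, w_in, w_rec, w_out, b_hidden=0.0, b_out=0.0):
    """Run a one-unit Elman RNN over a 1-D sequence and return its next-step output."""
    hidden = 0.0
    for value in sequence:
        # The hidden state carries information from earlier frames forward.
        hidden = tanh(w_in * value + w_rec * hidden + b_hidden)
    return w_out * hidden + b_out
```

In practice the weights would be learned from the time-ordered landmark and pupil positions extracted from consecutive frames, and the input at each step would be a vector rather than a single number.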

The control unit 320 learns by matching human face images with the scroll-control inputs made by those humans, and generates a learning model for use by the device 110. The control unit 320 learns by matching the predicted movements of the facial features and the pupil with the human scroll-control inputs. As in the example above, a user may enter an upward or downward scrolling input into the device 110 while moving the pupils rather than holding them at a fixed point, or while showing boredom in his or her facial expression. Alternatively, a user may enter no scrolling input into the device 110 while holding the pupils at a fixed point, or while viewing the provided content and moving the facial features to show a happy or sad expression. The control unit 320 learns by matching these various movements of the pupil and facial features with the scrolling inputs that accompany them. The control unit 320 learns by matching the large volume of such data stored in the database 330. In this way, the control unit 320 generates a learning model that outputs a scroll control signal when it receives a human face image. The learning model derives the scroll control signal from the input image using scroll behavior as labels, such as the scroll position, whether the scroll moves, and the direction in which the scroll is to be moved. An example of such a learning model is shown in FIG. 6.
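
One way to picture the matching of face images with scroll inputs is as the construction of labeled training pairs. The sketch below labels each captured frame with the scroll input that follows it within a short time window, or with "none" when the user enters nothing. The window length, the label names, and the timestamp-based pairing are assumptions for illustration; the source does not specify how the pairing is carried out.

```python
from bisect import bisect_left


def label_frames(frame_times, scroll_events, window=0.5):
    """Label each frame with the next scroll input within `window` seconds.

    frame_times:   capture times of the face images, in seconds.
    scroll_events: list of (time, direction) sorted by time,
                   where direction is "up" or "down".
    Frames not followed by a scroll input within the window get "none".
    """
    event_times = [t for t, _ in scroll_events]
    labels = []
    for ft in frame_times:
        idx = bisect_left(event_times, ft)  # first event at or after this frame
        if idx < len(scroll_events) and event_times[idx] - ft <= window:
            labels.append(scroll_events[idx][1])
        else:
            labels.append("none")
    return labels


# Hypothetical capture times and user scroll inputs.
frames = [0.0, 0.4, 1.0, 2.0, 2.3]
events = [(0.6, "down"), (2.4, "up")]

labels = label_frames(frames, events)
training_pairs = list(zip(frames, labels))
```

Each pair would then be joined with the features extracted from the corresponding frame and passed to a supervised learner, with the "none" label covering the cases where the user keeps watching without scrolling.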

FIG. 6 is a diagram illustrating an example in which an automatic scrolling device according to an embodiment of the present invention scrolls according to the user's gaze. It illustrates the device 110 receiving the learning model generated by the server 120 through the process described above and using it for analysis.

For example, when the user, while watching the screen (input/output unit) of the device 110, moves the pupils from top to bottom as in FIG. 6A, the device 110 extracts the pupils from the image and predicts their movement. Although not shown in FIG. 6, the facial expression can of course likewise be predicted from the movement of the facial features.

When a top-to-bottom movement of the user's pupils is predicted as in FIG. 6A, the device 110 may derive, based on the learning model, a control signal that scrolls down as in FIG. 6B. In accordance with this control signal, the screen output by the input/output unit 250 may scroll down.
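
The step from a predicted pupil movement to a scroll control signal can be illustrated with a simple hand-written rule. In the embodiment this mapping is derived by the learning model; the fixed dead zone, the signal names, and the sign convention (positive displacement means the pupil moves down the image) used below are assumptions that stand in for the learned behavior.

```python
def scroll_signal(predicted_dy, dead_zone=0.05):
    """Map a predicted vertical pupil displacement to a scroll control signal.

    predicted_dy: normalized displacement; positive means downward (assumed).
    dead_zone:    displacements smaller than this are treated as a steady gaze.
    """
    if predicted_dy > dead_zone:
        return "scroll_down"
    if predicted_dy < -dead_zone:
        return "scroll_up"
    return "hold"


# Hypothetical predictions for three situations.
signals = [scroll_signal(dy) for dy in (0.30, -0.20, 0.01)]
```

The dead zone keeps small eye movements, such as those made while reading a line of text, from triggering a scroll.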

Referring again to FIG. 3, the control unit 320 controls the communication unit 310 to transmit the learning model generated in this way to the device 110.

So that the control unit 320 can generate the learning model, the database 330 stores (as big data) a large number of face images of humans watching a screen, together with those humans' scroll-control inputs. In particular, so that the control unit 320 can generate the learning model smoothly, the database 330 stores human face images from the various situations in which scroll control is performed.

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to describe, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within the equivalent scope should be construed as falling within the scope of rights of the present embodiment.

100: automatic scrolling system
110: automatic scrolling device
120: automatic scrolling learning server
210, 310: communication unit
220: photographing unit
230, 320: control unit
240: memory unit
250: input/output unit
330: database

Claims

1. An automatic scrolling device for automatically scrolling an output screen by analyzing a user's face image, the device comprising:
a communication unit for receiving, from the outside, a learning model for deriving a scroll control signal from a face image;
a photographing unit for photographing a user's face image;
an input/output unit for outputting content; and
a control unit for controlling whether the output screen output by the input/output unit is scrolled, by using the learning model to analyze the user's degree of concentration on the screen from the user's face image captured by the photographing unit.

2. The automatic scrolling device of claim 1, further comprising a memory unit for storing the learning model received by the communication unit.

3. The automatic scrolling device of claim 1, wherein the control unit analyzes the user's degree of concentration on the screen by predicting, from the face image, the movement of the user's pupils or the movement of each of the user's facial features.

4. The automatic scrolling device of claim 3, wherein, when the movement of the user's pupils relatively increases or the movement of each of the user's facial features relatively decreases, the control unit scrolls the output screen output by the input/output unit in one direction according to the learning model.

5. An automatic scrolling learning server that analyzes a user's face image and generates a learning model enabling automatic scrolling of an output screen, the server comprising:
a database for storing, as big data, face images of humans watching a screen and those humans' inputs for scroll control;
a control unit for learning from the human face images and the human scroll-control inputs in the database, and generating a learning model that receives a face image and outputs a scroll control signal; and
a communication unit for transmitting the learning model generated by the control unit.

6. The automatic scrolling learning server of claim 5, wherein the database stores human face images for each scroll-control situation.

7. The automatic scrolling learning server of claim 5, wherein the control unit analyzes the degree of concentration on the user's screen from the human face images in the database.