KR20190100982A

KR20190100982A - Deep learning based gaze detection system and method for automobile drivers

Info

Publication number: KR20190100982A
Application number: KR1020180014084A
Authority: KR
Inventors: 박강령; 윤효식; 알리 나크비 리즈완
Original assignee: 동국대학교 산학협력단
Priority date: 2018-02-05
Filing date: 2018-02-05
Publication date: 2019-08-30
Also published as: KR102017766B1

Abstract

The present invention relates to a technology of tracking a gaze of a vehicle driver. More specifically, provided are an apparatus of tracking a gaze of a vehicle driver, which classifies a gaze area of a vehicle driver from an image obtained from a camera, and a method thereof. According to one embodiment of the present invention, based on deep learning, the gaze tracking accuracy of the vehicle driver can be improved by considering head movements and eye movements of the driver without requiring an initial user calibration step.

Description

DEEP LEARNING BASED GAZE DETECTION SYSTEM AND METHOD FOR AUTOMOBILE DRIVERS

The present invention relates to gaze tracking for vehicle drivers and, more particularly, to a driver gaze tracking apparatus and method that classify the driver's gaze area from an image acquired by a camera.

Conventional gaze tracking systems have mainly been used in indoor desktop-monitor environments, where gaze is tracked from eye movement alone while the head is kept as still as possible. In a vehicle, however, the driver moves the head to look at the side mirrors, the rear-view mirror, and so on, and because conventional indoor methods are not based on head movement, their gaze tracking accuracy drops. Earlier in-vehicle gaze tracking studies addressed this by tracking gaze from head movement, but accuracy then drops when the driver looks at an object by moving only the eyes without moving the head. To address this, methods using the pupil center and the center of the corneal reflection produced on the cornea by the illuminator have been used. In a vehicle, however, varying external light, reflections on the surface of eyeglasses, and optical blur caused by driver movement prevent the pupil center and the corneal reflection center from being detected accurately.

The background art of the present invention is disclosed in Korean Patent No. 10-1395819.

An object of the present invention is to provide a deep learning based apparatus and method for tracking a vehicle driver's gaze that take both the head movement and the eye movement of the driver into account without requiring an initial user calibration step.

According to one aspect of the present invention, a deep learning based driver gaze tracking device is provided.

A deep learning based driver gaze tracking device according to an embodiment of the present invention may include: a driver image input unit that receives a driver image; a feature region detector that tracks facial feature identification points in the input driver image and, based on the detected points, extracts a face region image, a left eye region image, and a right eye region image; a deep learning feature extractor that outputs a deep learning feature set for each region image using the face region image, the left eye region image, and the right eye region image; and a driver gaze tracking unit that classifies the driver's gaze area using the output feature set and the stored feature sets of the gaze areas.

According to another aspect of the present invention, a deep learning based driver gaze tracking method and a computer program for executing the method are provided.

A deep learning based driver gaze tracking method, and a computer program for executing it, according to an embodiment of the present invention may include: receiving a driver face image; detecting facial feature identification points by tracking them in the driver face image; extracting a face region image, a left eye region image, and a right eye region image based on the facial feature identification points; generating a feature set of feature values by running a deep learning based convolutional neural network (CNN) on each of the three extracted images; and classifying the driver's gaze area using the feature set.

According to an embodiment of the present invention, the gaze tracking accuracy for a vehicle driver can be improved with deep learning by taking the driver's head movement and eye movement into account, without requiring an initial user calibration step.

According to an embodiment of the present invention, when an autonomous vehicle switches from autonomous driving to driver control, the driver's gaze can be tracked accurately to determine whether the driver is drowsy, distracted, or looking at the road, which provides accurate driver state information that supports the switch.

According to an embodiment of the present invention, accuracy is higher than with methods that use the pupil center and the corneal reflection center, and the driver's gaze can be tracked accurately even when the driver's head is rotated excessively.

FIGS. 1 to 9 are diagrams for describing a deep learning based driver gaze tracking device according to an embodiment of the present invention.
FIGS. 10 to 13 are diagrams for describing a deep learning based driver gaze tracking method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments, so particular embodiments are illustrated in the drawings and described in detail in the written description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the subject matter of the invention. In addition, singular expressions used in this specification and the claims should generally be interpreted to mean "one or more" unless stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIGS. 1 to 9 are diagrams for describing a deep learning based driver gaze tracking device according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning based driver gaze tracking device 100 includes a driver image input unit 110, a feature region detector 120, a deep learning feature extractor 130, and a driver gaze tracking unit 140.

The driver image input unit 110 receives an image of the driver.

Referring to FIG. 2, the driver image input unit 110 may receive a driver image captured with a near-infrared (NIR) illuminator (for example, an 850 nm NIR LED) and an NIR camera (for example, a USB camera with an NIR band-pass filter) located near the vehicle's instrument panel. The driver image input unit 110 may also receive a visible-light driver image captured by a visible-light camera.

The feature region detector 120 tracks facial feature identification points in the input driver image and, based on the detected points, extracts a face region image, a left eye region image, and a right eye region image. The feature region detector 120 may detect 68 facial feature identification points using, for example, the Dlib facial feature tracker. The facial feature identification points can be used to delimit and represent salient regions such as the eyes, eyebrows, nose, mouth, and jaw line.

Referring to FIG. 3, the feature region detector 120 extracts the two-dimensional coordinates of the 68 facial feature identification points. Based on these coordinates, the feature region detector 120 delimits the face region (facial feature identification points 1 to 68), the left eye region (points 36 to 41), and the right eye region (points 42 to 47), and extracts the face region image, the left eye region image, and the right eye region image. To delimit the face region, the left eye region, or the right eye region in the driver image, the feature region detector 120 may additionally use a Haar cascade detector, a detector based on HOG and a linear SVM, or a deep learning based algorithm.
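The region extraction described above can be sketched as a bounding-box crop around the selected landmark points. This is a minimal sketch, not the patented implementation: it assumes the 68 points are already available as a `(68, 2)` NumPy array of `(x, y)` coordinates, that the eye index ranges are 0-based positions in that array, and the names `crop_region`, `LEFT_EYE`, `RIGHT_EYE`, `FACE`, and the `margin` parameter are hypothetical.

```python
import numpy as np

# Index ranges into a (68, 2) landmark array. Treating the eye ranges from the
# text as 0-based array positions is an assumption made for this sketch.
LEFT_EYE = range(36, 42)    # points 36..41
RIGHT_EYE = range(42, 48)   # points 42..47
FACE = range(0, 68)         # all 68 points


def crop_region(image, landmarks, indices, margin=0):
    """Crop the axis-aligned box that encloses the selected landmarks."""
    pts = landmarks[list(indices)]
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    h, w = image.shape[:2]
    # Clamp the box to the image and make the upper bounds inclusive.
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1) + 1, w), min(int(y1) + 1, h)
    return image[y0:y1, x0:x1]
```

Calling `crop_region` three times with `FACE`, `LEFT_EYE`, and `RIGHT_EYE` yields the three region images that the later stages consume.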

The deep learning feature extractor 130 outputs a feature set for each region image using the face region image, the left eye region image, and the right eye region image.

Referring to FIG. 4, the deep learning feature extractor 130 includes a deep learning unit 132, a feature value extractor 134, a feature value normalizer 136, and a feature set generator 138.

The deep learning unit 132 learns face gaze tracking using a CNN algorithm. The deep learning unit 132 uses the VGG-Face network, which is trained on the 2.6 million face images of 2,622 people covered by the VGG-Face-16 model. The accuracy of the VGG-Face-16 model has been evaluated on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) databases, and the deep learning unit 132 fine-tunes this model using training data from a self-collected database (DDGC-DB1).

As shown in FIG. 5, a dedicated database (DDGC-DB1) was built for the driver gaze tracking system in a vehicle environment. To capture the effect of different kinds of external light on driver gaze detection, the data in DDGC-DB1 were acquired at various times of day: morning, midday, and evening. For CNN training and validation, the deep learning unit 132 uses 224×224-pixel face region images, left eye region images, and right eye region images from DDGC-DB1. After CNN training, the deep learning unit 132 extracts a set of 4096 feature values from each of the face region image, the left eye region image, and the right eye region image as the response of the second fully connected layer, and then determines the gaze area among 17 areas based on the minimum distance. Referring to FIG. 6, the 17 gaze areas may be designated as the areas that drivers looked at five or more times in a driver gaze area experiment.

Referring to FIGS. 7 and 8, the deep learning unit 132 may use a CNN structure with 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. For example, the first convolutional layer applies 64 filters of size 3×3 to a 224×224×3 input (width, height, and number of channels) to obtain a 224×224×64 feature map. After passing through the ReLU layer that follows the convolution, a 224×224×64 feature map may pass through a max pooling layer with a 2×2 kernel and a 2×2 stride, which outputs 112×112×64.

All 13 convolutional layers use a 3×3 kernel, 1×1 padding, and a 1×1 stride; only the number of filters changes, taking the values 64, 128, 256, and 512. A ReLU layer follows each convolutional layer, and a max pooling layer is applied after ReLU-1_2, ReLU-2_2, ReLU-3_3, ReLU-4_3, and ReLU-5_3. Each max pooling layer has a 2×2 kernel, a 2×2 stride, and 0×0 padding. For example, the 224×224×64 ReLU-1_2 layer feeds the 112×112×64 Pool-1 layer, the 112×112×128 ReLU-2_2 layer feeds the 56×56×128 Pool-2 layer, the 56×56×256 ReLU-3_3 layer feeds the 28×28×256 Pool-3 layer, the 28×28×512 ReLU-4_3 layer feeds the 14×14×512 Pool-4 layer, and the 14×14×512 ReLU-5_3 layer feeds the 7×7×512 Pool-5 layer. The input image thus passes through 13 convolutional layers, 13 ReLU layers, and 5 pooling layers to produce a final 7×7×512 feature map. The three fully connected layers that follow output feature vectors of 4096×1, 4096×1, and 17×1, respectively. The third fully connected layer uses the softmax function of Equation (1) below.
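The layer sizes quoted above follow from the usual output-size rule for convolution and pooling, floor((size + 2·padding − kernel) / stride) + 1. The sketch below only reproduces that arithmetic; it builds no network, and the helper names `conv_out`, `pool_out`, `feature_map_shapes`, and the `BLOCKS` table (2+2+3+3+3 = 13 convolutional layers, a grouping inferred from the ReLU layer names in the text) are assumptions for illustration.

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    """Spatial size after a convolution (3x3 kernel, padding 1, stride 1)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, pad=0, stride=2):
    """Spatial size after max pooling (2x2 kernel, stride 2, padding 0)."""
    return (size + 2 * pad - kernel) // stride + 1


# (number of convolutional layers, number of filters) for each of the 5 blocks.
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def feature_map_shapes(size=224):
    """Return the (width, height, channels) shape after each pooling layer."""
    shapes = []
    for n_conv, filters in BLOCKS:
        for _ in range(n_conv):
            size = conv_out(size)   # 3x3 / pad 1 / stride 1 keeps the size
        size = pool_out(size)       # 2x2 / stride 2 halves the size
        shapes.append((size, size, filters))
    return shapes
```

With a 224×224 input, the five shapes returned are the Pool-1 to Pool-5 sizes listed in the text, ending in 7×7×512.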

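$$\sigma(r)_j = \frac{e^{r_j}}{\sum_{n=1}^{K} e^{r_n}} \qquad (1)$$

where $r$ is the array of output neurons, $r_j$ is its $j$-th element, and $K$ is the number of output neurons.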
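The softmax applied to the 17 outputs of the third fully connected layer can be written in a few lines. This is a generic, numerically stable softmax, not code from the patent; subtracting the maximum before exponentiating is a standard trick that leaves the result unchanged, and the function name `softmax` is chosen for this sketch.

```python
import numpy as np


def softmax(r):
    """Map an array of output-neuron activations to probabilities."""
    r = np.asarray(r, dtype=float)
    e = np.exp(r - r.max())   # shift by the maximum for numerical stability
    return e / e.sum()
```

The outputs are non-negative and sum to 1, so each of the 17 values can be read as the score of one gaze area.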

To counter the low recognition accuracy that overfitting causes in CNN-based recognition systems, the deep learning unit 132 may use data augmentation and dropout. For dropout, connections between the first and second fully connected layers are randomly dropped with a probability of 50%. The deep learning unit 132 extracts three separate feature sets (three sets of 4096 features) from the face region image, the left eye region image, and the right eye region image, and normalizes the extracted feature values by min-max scaling. For each of the 17 gaze areas, the deep learning unit 132 stores the normalized feature values obtained from the face region image, the left eye region image, and the right eye region image as a set.

The feature value extractor 134 converts the face region image, the left eye region image, and the right eye region image, feeds them to the deep learning based convolutional neural network (CNN), and extracts the feature values of each image. Here, the face region image, the left eye region image, and the right eye region image may be resized by bilinear interpolation to 224×224-pixel images that include a 50-pixel margin and then used as inputs. The feature value extractor 134 extracts three separate sets of feature values (4096 values each) from the face region image, the left eye region image, and the right eye region image.

The feature set generator 136 normalizes the feature values extracted from the face region image, the left eye region image, and the right eye region image, and generates a feature set containing the three normalized groups of feature values. Min-max scaling may be used for the normalization.
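Min-max scaling maps each value to (x − min) / (max − min), so that a feature vector spans the range 0 to 1. The sketch below assumes the minimum and maximum are taken over each 4096-value vector separately; the text does not say whether they are computed per vector or over the training data, and the names `min_max_scale`, `build_feature_set`, and the `eps` guard are hypothetical.

```python
import numpy as np


def min_max_scale(features, eps=1e-12):
    """Rescale a feature vector linearly to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(), features.max()
    # eps avoids division by zero when all values are equal.
    return (features - lo) / max(hi - lo, eps)


def build_feature_set(face, left_eye, right_eye):
    """Normalize the three per-region vectors and bundle them as one set."""
    return {
        "face": min_max_scale(face),
        "left_eye": min_max_scale(left_eye),
        "right_eye": min_max_scale(right_eye),
    }
```

Keeping the three vectors separate, rather than concatenating them, matches the later step in which one Euclidean distance is computed per region.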

Referring again to FIG. 1, the driver gaze tracking unit 140 classifies the driver's gaze area using the Euclidean distances calculated between the output feature set and the stored feature sets of the 17 gaze areas.

Referring to FIG. 9, the driver gaze tracking unit 140 includes a gaze area distance calculator 142 and a driver gaze classifier 144.

The gaze area distance calculator 142 calculates the Euclidean distances between the feature set output by feeding the face region image, the left eye region image, and the right eye region image to the deep learning based CNN and the feature sets stored in advance, from training data, for the 17 gaze areas.

The driver gaze classifier 144 combines the three calculated Euclidean distances by score level fusion and, among the 17 gaze areas, selects the one with the minimum combined score as the driver's gaze area. For score level fusion, the weighted sum rule of Equation (2) and the weighted product rule of Equation (3) below are compared, and the final weights are determined from the training data.

$$WS = \sum_{i=1}^{m} w_i d_i \qquad (2)$$

where $WS$ is the weighted sum, $w_i$ is the weight, $m$ is 3, and $d_i$ is the Euclidean distance.

$$WP = \prod_{i=1}^{m} d_i^{w_i} \qquad (3)$$

where $WP$ is the weighted product, $w_i$ is the weight, $m$ is 3, and $d_i$ is the Euclidean distance.
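The distance calculation and score level fusion can be sketched together as a nearest-template classifier. This is an illustrative sketch under stated assumptions: the stored feature sets are held in an array of shape `(17, 3, F)` (one face, left eye, and right eye vector per gaze area), the weighted product is taken as the product of the distances raised to their weights, and the names `weighted_sum`, `weighted_product`, and `classify_gaze_area` are hypothetical. The weights in practice would be chosen from training data, as the text states.

```python
import numpy as np


def weighted_sum(d, w):
    """Weighted sum rule: sum over the three regions of w_i * d_i."""
    return np.sum(w * d, axis=-1)


def weighted_product(d, w):
    """Weighted product rule (assumed form): product of d_i ** w_i."""
    return np.prod(d ** w, axis=-1)


def classify_gaze_area(query, stored, w, rule=weighted_sum):
    """Return the index of the gaze area with the minimum fused score.

    query:  (3, F) array, one normalized vector per region image.
    stored: (Z, 3, F) array, one stored feature set per gaze area.
    w:      (3,) array of fusion weights.
    """
    # One Euclidean distance per region and per gaze area: shape (Z, 3).
    d = np.linalg.norm(stored - query, axis=-1)
    return int(np.argmin(rule(d, w)))
```

Passing `rule=weighted_product` switches the fusion rule without changing the distance step, which makes it straightforward to compare the two rules on the same data.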

FIGS. 10 to 13 are diagrams for describing a deep learning based driver gaze tracking method according to an embodiment of the present invention.

Referring to FIG. 10, in step S1010 the driver gaze tracking device 100 receives a driver face image. The driver gaze tracking device 100 may use a compact gaze tracking device combined with an infrared camera.

In step S1020, the driver gaze tracking device 100 detects facial feature identification points by tracking them in the acquired driver face image.

In step S1030, the driver gaze tracking device 100 extracts a face region image, a left eye region image, and a right eye region image based on the facial feature identification points.

In step S1040, the driver gaze tracking device 100 runs the deep learning based CNN on each of the three extracted images (the face region image, the left eye region image, and the right eye region image) to generate a feature set of feature values.

Referring to FIG. 11, in step S1110 the driver gaze tracking device 100 learns face gaze tracking using the deep learning based CNN.

In step S1120, the driver gaze tracking device 100 converts the face region image, the left eye region image, and the right eye region image, feeds them to the CNN, and extracts the feature values of each image.

In step S1130, the driver gaze tracking device 100 normalizes the extracted feature values and generates a feature set containing the normalized feature values.

Referring again to FIG. 10, in step S1050 the driver gaze tracking device 100 calculates, for each region image, the Euclidean distance between the generated feature set and the previously stored feature sets of the 17 gaze areas.

In step S1060, the driver gaze tracking device 100 classifies the driver's gaze area using the Euclidean distance of each region image. The driver gaze tracking device 100 may combine the Euclidean distances calculated for the region images by score level fusion and, among the 17 gaze areas, select the one with the minimum combined score as the driver's gaze area. The score level fusion may use the weighted sum or the weighted product rule.

Referring to FIG. 12, the accuracy of the driver gaze tracking method according to the present invention was measured as the strictly correct estimation rate (SCER) and the loosely correct estimation rate (LCER). SCER is the accuracy for the target area itself, and LCER is the accuracy when the target area and its adjacent areas are all counted as correct.
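The two rates can be computed from the true and predicted gaze areas once the adjacency between areas is known. In the sketch below, the `neighbors` mapping (each area to the set of areas adjacent to it) and the function name `scer_lcer` are assumptions for illustration; the actual adjacency of the 17 areas is defined by the layout in the figures and is not reproduced here.

```python
def scer_lcer(true_areas, predicted_areas, neighbors):
    """Return (SCER, LCER) as fractions between 0 and 1.

    SCER counts a prediction as correct only if it equals the true area.
    LCER also accepts a prediction that falls in an area adjacent to it.
    """
    pairs = list(zip(true_areas, predicted_areas))
    strict = sum(1 for t, p in pairs if p == t)
    loose = sum(1 for t, p in pairs if p == t or p in neighbors.get(t, ()))
    return strict / len(pairs), loose / len(pairs)
```

Because every strictly correct prediction is also loosely correct, LCER is never lower than SCER, which is consistent with the results reported in the text.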

The deep learning based CNN algorithm according to the present invention was evaluated by cross-validation. The data were divided into two groups: training used 16,310 images in group 1 and 16,280 images in group 2, and testing used 3,256 images in group 1 and 3,262 images in group 2. The accuracy for each area is shown in FIG. 12. The final accuracy, averaged over the 17 gaze areas, is an SCER of 92.8% and an LCER of 99.6%.

Referring to FIG. 13, the driver gaze tracking method according to the present invention was also applied to the public Columbia gaze database, CAVE-DB. CAVE-DB contains 5,880 images of 56 people with varying head poses and gaze directions. With 21 gaze directions for each of 5 head poses, it covers 105 gaze directions in total, of which images for 13 gaze directions were selected.

For CAVE-DB, accuracy was again measured as the strictly correct estimation rate (SCER) and the loosely correct estimation rate (LCER). The accuracy for each area is shown in FIG. 13. The highest accuracy, averaged over the 13 gaze areas, is an SCER of 88.9% and an LCER of 98.8%.

The deep learning based driver gaze tracking method according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as optical or metal wire or a waveguide, that carries a carrier wave transmitting signals that specify program instructions, data structures, and the like. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The present invention has been described above with reference to its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive sense and not for purposes of limitation. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

A deep learning based driver gaze tracking device, comprising:
a driver image input unit configured to receive a driver image;
a feature region detector configured to detect facial feature identification points by tracking them in the input driver image, and to extract a face region image, a left eye region image, and a right eye region image based on the detected facial feature identification points;
a deep learning feature extractor configured to output a deep learning feature set for each region image by using the face region image, the left eye region image, and the right eye region image; and
a driver gaze tracking unit configured to classify a gaze area of the driver using the output feature sets and stored gaze area feature sets.

The device of claim 1,
wherein the deep learning feature extractor comprises:
a deep learning unit configured to learn face gaze tracking using a deep learning based convolutional neural network (CNN);
a feature value extracting unit configured to convert the face region image, the left eye region image, and the right eye region image, input the converted images to the CNN, and extract feature values of each image; and
a feature set generator configured to normalize the extracted feature values and generate a feature set including the normalized feature values.

The device of claim 1,
wherein the deep learning unit
converts the face region image, the left eye region image, and the right eye region image into images of 224 × 224 pixels each, and extracts 4096 feature values from each image.

The device of claim 3,
wherein the deep learning unit
uses a CNN structure with 13 convolutional layers, 5 pooling layers, and 3 fully connected layers,
wherein the kernel size used in the 13 convolutional layers is 3 × 3, the padding is 1 × 1, and the stride is 1 × 1, with only the number of filters varying among 64, 128, 256, and 512, and wherein each pooling layer has a kernel size of 2 × 2, a stride of 2 × 2, and a padding of 0 × 0.
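The layer parameters recited above determine how the 224 × 224 input shrinks through the network. The following is a minimal sketch of that arithmetic, not the patented implementation. The grouping of the 13 convolutional layers into five blocks of 2, 2, 3, 3, and 3 layers (as in VGG-16) is an assumption; the claim states only the layer counts and parameters. The names `out_size`, `CONVS_PER_BLOCK`, and `trace_spatial_size` are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: VGG-16-style grouping of the 13 convolutional layers.
# The claim gives only the totals (13 conv, 5 pooling), not the grouping.
CONVS_PER_BLOCK = [2, 2, 3, 3, 3]


def trace_spatial_size(input_size=224):
    """Return the feature-map size after each of the five conv+pool blocks."""
    sizes = []
    size = input_size
    for n_convs in CONVS_PER_BLOCK:
        for _ in range(n_convs):
            # 3x3 kernel, stride 1, padding 1: spatial size is preserved.
            size = out_size(size, kernel=3, stride=1, padding=1)
        # 2x2 pooling, stride 2, padding 0: spatial size is halved.
        size = out_size(size, kernel=2, stride=2, padding=0)
        sizes.append(size)
    return sizes


print(trace_spatial_size())  # [112, 56, 28, 14, 7]
```

Because each convolution preserves the spatial size, the final 7 × 7 map follows from the five pooling layers alone and does not depend on the assumed grouping.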

The device of claim 4,
wherein, in the deep learning unit,
the three fully connected layers output feature maps of 4096 × 1, 4096 × 1, and 17 × 1, respectively.
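To give a sense of scale for the three fully connected layers, the sketch below counts their weights and biases. It assumes that the last convolutional block produces a 7 × 7 × 512 feature map that is flattened before the first fully connected layer; the claim does not state this input size. The 17 outputs of the last layer match the 17 gaze areas recited in the claims. The names `dense_params`, `FLATTENED`, and `fc_param_counts` are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# ASSUMPTION: the final pooled feature map is 7 x 7 x 512 and is flattened.
FLATTENED = 7 * 7 * 512

# Output sizes of the three fully connected layers, as recited in the claim.
FC_OUTPUTS = [4096, 4096, 17]


def fc_param_counts(n_in=FLATTENED, outputs=FC_OUTPUTS):
    """Return the parameter count of each fully connected layer in order."""
    counts = []
    for n_out in outputs:
        counts.append(dense_params(n_in, n_out))
        n_in = n_out  # the next layer consumes this layer's output
    return counts


print(fc_param_counts())  # [102764544, 16781312, 69649]
```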

The device of claim 2,
wherein the feature set generator
normalizes the feature values extracted from the face region image, the left eye region image, and the right eye region image using a min-max scaling method.
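Min-max scaling maps each value v to (v - min) / (max - min), so the scaled values lie in [0, 1]. The sketch below applies it to a single feature vector. Whether the minimum and maximum are taken per vector or over the training data is not stated in the claim, so the per-vector choice here is an assumption, as is the handling of a constant vector. The name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Scale a list of feature values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # ASSUMPTION: a constant vector is mapped to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

In the claimed device the same scaling would be applied to each of the three 4096-value vectors (face, left eye, right eye).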

The device of claim 1,
wherein the driver gaze tracking unit comprises:
a gaze area distance calculator configured to calculate, for each region image, a Euclidean distance between the output feature set and each of the feature sets for 17 gaze areas stored in advance using training data; and
a driver gaze classification unit configured to combine the three calculated Euclidean distances by a score level fusion method and to determine, among the 17 gaze areas, the gaze area having the minimum combined score as the driver gaze area.
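The classification step recited above can be sketched as a nearest-template search: for every stored gaze area, compute one Euclidean distance per region (face, left eye, right eye), fuse the three distances into one score, and pick the area with the smallest score. This is an illustrative sketch under assumptions, not the patented implementation. The dictionary layout, the region keys, and the names `euclidean` and `classify_gaze_area` are hypothetical, and the toy example uses 2 gaze areas with 2-value features instead of 17 areas with 4096-value features.

```python
import math

REGIONS = ("face", "left_eye", "right_eye")  # hypothetical keys


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_gaze_area(query, stored, fuse):
    """Return (index of best gaze area, list of fused scores).

    query  : dict mapping each region to its feature vector
    stored : list with one such dict per gaze area (from training data)
    fuse   : callable combining the three distances into one score
    """
    scores = []
    for area in stored:
        distances = [euclidean(query[r], area[r]) for r in REGIONS]
        scores.append(fuse(distances))
    # The gaze area with the minimum combined score is selected.
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example with an unweighted sum as the fusion rule.
query = {"face": [0.0, 0.0], "left_eye": [1.0, 0.0], "right_eye": [0.0, 1.0]}
stored = [
    {"face": [3.0, 4.0], "left_eye": [1.0, 0.0], "right_eye": [0.0, 1.0]},
    {"face": [0.0, 0.0], "left_eye": [1.0, 0.0], "right_eye": [0.0, 2.0]},
]
best, scores = classify_gaze_area(query, stored, fuse=sum)
print(best, scores)  # 1 [5.0, 1.0]
```

Passing the fusion rule as a callable keeps the distance computation independent of the particular fusion method.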

The device of claim 7,
wherein the score level fusion method uses a weighted sum or weighted product method.
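The two fusion rules named above can be sketched as follows. A weighted sum computes the sum of w_i · d_i over the three distances, and a weighted product computes the product of d_i raised to the power w_i. The weight values in the example are purely illustrative; the claim does not specify them. The names `weighted_sum` and `weighted_product` are hypothetical.

```python
import math


def weighted_sum(scores, weights):
    """Fuse scores as sum(w_i * s_i)."""
    return sum(w * s for w, s in zip(weights, scores))


def weighted_product(scores, weights):
    """Fuse scores as prod(s_i ** w_i)."""
    return math.prod(s ** w for w, s in zip(weights, scores))


# ASSUMPTION: illustrative distances and weights for the
# face, left eye, and right eye regions.
distances = [1.0, 2.0, 4.0]
weights = [0.5, 0.25, 0.25]

print(weighted_sum(distances, weights))                # 2.0
print(round(weighted_product(distances, weights), 4))  # 1.6818
```

One property worth noting: with the weighted product, a single distance of zero drives the fused score to zero regardless of the other two regions, whereas the weighted sum still reflects them.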

A deep learning based driver gaze tracking method, comprising:
receiving a driver's face image;
detecting facial feature identification points by tracking them in the driver's face image;
extracting a face region image, a left eye region image, and a right eye region image based on the facial feature identification points;
generating a feature set including feature values by applying a deep learning based convolutional neural network (CNN) to each of the three extracted images, namely the face region image, the left eye region image, and the right eye region image; and
classifying the driver's gaze area using the feature set.

The method of claim 9,
wherein generating the feature set including feature values by applying the deep learning based CNN to each of the three images comprises:
learning face gaze tracking using the deep learning based CNN;
converting the face region image, the left eye region image, and the right eye region image, inputting the converted images to the CNN, and extracting feature values of each image; and
normalizing the extracted feature values and generating a feature set including the normalized feature values.

The method of claim 9,
wherein classifying the driver's gaze area using the feature set comprises:
calculating, for each region image, a Euclidean distance between the feature set and each of the 17 pre-stored gaze area feature sets; and
classifying the driver's gaze area using the Euclidean distance of each region image.

The method of claim 11,
wherein classifying the driver's gaze area using the Euclidean distance of each region image comprises
combining the Euclidean distances of the region images by a score level fusion method, and determining, among the 17 gaze areas, the gaze area having the minimum combined score as the driver gaze area.

The method of claim 12,
wherein the score level fusion method
uses a weighted sum or weighted product method.

A computer program recorded on a computer-readable recording medium, the program executing the deep learning based driver gaze tracking method of any one of claims 9 to 13.