KR20220013834A

KR20220013834A - Method and apparatus of image processing

Info

Publication number: KR20220013834A
Application number: KR1020200093297A
Authority: KR
Inventors: 강동우
Original assignee: 삼성전자주식회사
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2022-02-04

Abstract

According to an embodiment of the present invention, a method and device for image processing are provided to obtain an image frame including the face of a user, detect the face area of the user in an image frame, classify a type of occlusion for the face area according to whether an occlusion exists in at least some of the face area, track the user's eyes based on the type of occlusion, and output information about the face area including the tracked eye position.

Description

Image processing method and apparatus {METHOD AND APPARATUS OF IMAGE PROCESSING}

The following embodiments relate to an image processing method and apparatus.

Camera-based eye tracking may be used in many fields, for example, glasses-free 3D super multi-view displays based on viewpoint tracking and/or head-up displays (HUDs). Its performance may depend on the image quality of the camera and/or on the eye tracking method. Camera-based eye tracking works well on a bare face, but may become less stable when part of the face is covered, for example when the user wears sunglasses or a hat. Considering actual driving environments, there is a need for an eye tracking method that operates stably even when part of the face is covered.

The background art described above was possessed or acquired by the inventor in the course of deriving the present disclosure, and is not necessarily known art disclosed to the general public before the filing of the present application.

According to an embodiment, an image processing method includes: acquiring an image frame including a user's face; detecting the user's face region in the image frame; classifying a type of occlusion for the face region according to whether occlusion exists in at least a part of the face region; tracking the user's eyes based on the type of occlusion; and outputting information on the face region including the tracked eye position.

The type of occlusion may include at least one of: a first type corresponding to a bare face in which no occlusion exists; a second type in which at least one of the forehead, the eyebrows, and at least a part of the eyes in the user's face region is covered; a third type in which the eye region of the user's face region is covered; and a fourth type in which at least one of the mouth and the nose in the user's face region is covered.

The type of occlusion may include at least one of: type A corresponding to a bare face in which no occlusion exists; and type B in which the eye region of the user's face region is covered by sunglasses.

The tracking of the user's eyes may include tracking the user's eyes by aligning a different number of vertices to the shape of at least a part of the face region, based on the type of occlusion.

The tracking of the user's eyes may include: when the type of occlusion is the first type, tracking the user's eyes by aligning a first number of vertices to a partial shape of the face region; and when the type of occlusion is any one of the second to fourth types, tracking the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the face region.
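
The type-dependent choice of alignment described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the enum member names and the vertex counts 11 and 98 are assumptions, since the text only requires the second number to be greater than the first.

```python
from enum import Enum

class OcclusionType(Enum):
    BARE_FACE = 1    # first type: no occlusion
    UPPER_FACE = 2   # second type: forehead, eyebrows, or part of the eyes covered
    EYES = 3         # third type: eye region covered
    NOSE_MOUTH = 4   # fourth type: mouth and/or nose covered

# Hypothetical counts; the source only requires SECOND > FIRST.
FIRST_NUM_VERTICES = 11
SECOND_NUM_VERTICES = 98

def alignment_plan(occlusion):
    """Return (target shape, vertex count) for an OcclusionType."""
    if occlusion is OcclusionType.BARE_FACE:
        return ("partial", FIRST_NUM_VERTICES)
    return ("whole", SECOND_NUM_VERTICES)
```

For a bare face the sketch returns the smaller vertex set on a partial shape; every occluded type falls through to the larger set on the whole face.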

When the type of occlusion is the first type, the tracking of the user's eyes may include: reducing the size of a detection box for detecting the face region so that it corresponds to the shapes of the eyes and nose in the face region; cropping the shapes of the eyes and nose with the reduced detection box; and tracking the user's eyes by aligning the first number of vertices to the cropped shapes of the eyes and nose.
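
The box reduction and cropping steps can be sketched with NumPy as below. The function names and the percentages that locate the eye and nose band inside the face box are illustrative assumptions; the text does not give concrete proportions.

```python
import numpy as np

def shrink_box_to_eyes_nose(box, top_pct=20, bottom_pct=70, side_pct=10):
    """Shrink a face box (x, y, w, h) to a sub-box around the eyes and nose.

    The percentages are illustrative assumptions, not values from the source.
    """
    x, y, w, h = box
    dx = w * side_pct // 100
    y0 = h * top_pct // 100
    y1 = h * bottom_pct // 100
    return (x + dx, y + y0, w - 2 * dx, y1 - y0)

def crop(frame, box):
    """Cut the box (x, y, w, h) out of a frame stored as a (rows, cols) array."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

frame = np.zeros((480, 640), dtype=np.uint8)   # stand-in for a camera frame
face_box = (200, 100, 200, 240)
eye_nose_box = shrink_box_to_eyes_nose(face_box)
patch = crop(frame, eye_nose_box)
```

Integer arithmetic keeps the box coordinates exact, so the cropped patch has exactly the height and width of the reduced box.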

When the type of occlusion is any one of the second to fourth types, the tracking of the user's eyes may include: aligning the second number of vertices to the overall shape of the face region by a detection box for detecting the face region; and tracking the user's eyes by estimating the position of the user's pupils in the face region from the aligned second number of vertices.
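
The text does not say how the pupil position is derived from the aligned vertices. One simple possibility, shown here purely as an assumption, is to take the mean of the vertices that outline each eye. The function name and the index layout are hypothetical.

```python
import numpy as np

def estimate_pupils(vertices, left_eye_idx, right_eye_idx):
    """Estimate pupil centers as the mean of the eye-contour vertices.

    This centroid rule is an assumption; the source only states that the
    pupil position is estimated from the aligned vertices.
    """
    v = np.asarray(vertices, dtype=float)
    return v[left_eye_idx].mean(axis=0), v[right_eye_idx].mean(axis=0)

# Toy layout: four contour vertices per eye.
vertices = [[0, 0], [2, 0], [2, 2], [0, 2],
            [10, 0], [12, 0], [12, 2], [10, 2]]
left_pupil, right_pupil = estimate_pupils(vertices, [0, 1, 2, 3], [4, 5, 6, 7])
```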

The tracking of the user's eyes may further include: re-detecting the face region according to a determination that tracking of the user's eyes by the aligned vertices has failed; and repeatedly performing, based on the re-detected face region, the classifying of the type of occlusion and the tracking of the user's eyes.

The classifying of the type of occlusion may include classifying the type of occlusion for the face region using a classifier trained in advance by machine learning to distinguish the part of the face region in which occlusion exists.

According to an embodiment, an image processing method of a head-up display (HUD) apparatus includes: acquiring an image frame; determining whether a user in the image frame is wearing sunglasses; tracking the user's eyes using different types of eye trackers according to the determination of whether sunglasses are worn; and outputting information on the face region including the tracked eye position.

The tracking of the user's eyes may include: tracking the user's eyes using a first type of eye tracker according to a determination that sunglasses are not worn; and tracking the user's eyes using a second type of eye tracker according to a determination that sunglasses are worn.

The first type of eye tracker may track the user's eyes by aligning a first number of vertices to a partial shape of the user's face region, and the second type of eye tracker may track the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the user's face region.

The tracking of the user's eyes using the first type of eye tracker may include: reducing the size of a detection box for detecting the user's face region so that it corresponds to the shapes of the eyes and nose in the face region; cropping the shapes of the eyes and nose with the reduced detection box; and tracking the user's eyes by aligning the first number of vertices to the cropped shapes of the eyes and nose.

The tracking of the user's eyes using the second type of eye tracker may include: aligning the second number of vertices to the overall shape of the face region by a detection box for detecting the user's face region; and tracking the user's eyes by estimating the position of the user's pupils in the face region from the aligned second number of vertices.

The determining of whether sunglasses are worn may include determining whether sunglasses are worn using a detector trained in advance to detect whether the user's eye region is covered by sunglasses.

According to an embodiment, an image processing apparatus includes: a sensor that acquires an image frame including a user's face; and a processor that detects the user's face region in the image frame, classifies a type of occlusion for the face region according to whether occlusion exists in at least a part of the face region, tracks the user's eyes based on the type of occlusion, and outputs information on the face region including the tracked eye position.

The processor may track the user's eyes by aligning a different number of vertices to the shape of at least a part of the face region, based on the type of occlusion.

When the type of occlusion is the first type, the processor may track the user's eyes by aligning a first number of vertices to a partial shape of the face region; when the type of occlusion is any one of the second to fourth types, the processor may track the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the face region.

The image processing apparatus may include at least one of a head-up display (HUD) device, a 3D digital information display (DID), a navigation device, a 3D mobile device, a smartphone, a smart TV, and a smart vehicle.

FIG. 1 is a flowchart illustrating an image processing method according to an embodiment.
FIG. 2 is a diagram for explaining a method of classifying a type of occlusion for a face region according to an embodiment.
FIG. 3 is a flowchart illustrating a method of tracking a user's eyes based on a type of occlusion according to an embodiment.
FIG. 4 is a diagram for explaining how an image processing apparatus aligns different numbers of vertices based on a type of occlusion according to an embodiment.
FIG. 5 is a flowchart illustrating a method of tracking a user's eyes according to another embodiment.
FIG. 6 is a diagram for explaining an image processing method according to another embodiment.
FIG. 7 is a flowchart illustrating an image processing method according to another embodiment.
FIG. 8 is a block diagram of an image processing apparatus according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various changes may be made to the embodiments, so the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes for the embodiments should be understood to fall within the scope of the rights.

The terms used in the embodiments are used for the purpose of description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to preclude the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the drawing, and overlapping descriptions thereof are omitted. In describing the embodiments, a detailed description of related known technology is omitted where it is determined that it would unnecessarily obscure the gist of the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and do not limit the essence, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, it may be directly connected or joined to that other component, but it should be understood that a further component may also be "connected", "coupled", or "joined" between them.

A component included in one embodiment and a component having a common function are described using the same name in other embodiments. Unless stated otherwise, a description given for one embodiment may also apply to other embodiments, and detailed descriptions are omitted to the extent that they overlap.

FIG. 1 is a flowchart illustrating an image processing method according to an embodiment. Referring to FIG. 1, an image processing apparatus according to an embodiment may output information on a face region, including the position of the user's eyes, by performing steps 110 to 150.

The embodiments described below may be used to track the user's eyes with an infrared camera or an RGB camera and to output the coordinates of the eyes when using, for example, a glasses-free 3D monitor, a glasses-free 3D tablet or smartphone, or a vehicle 3D head-up display (HUD). The embodiments may be implemented, for example, as a software algorithm on a chip in a monitor, as an app on a tablet or smartphone, or as a hardware eye tracking device. The embodiments may be applied to, for example, autonomous vehicles, intelligent vehicles, smartphones, and mobile devices.

In step 110, the image processing apparatus acquires an image frame including the user's face. The image frame may include, for example, a color image frame and/or an infrared image frame. The image frame may correspond to, for example, an image of a driver captured by a camera, an image sensor, a vision sensor, or the like installed inside a vehicle.

In step 120, the image processing apparatus detects the user's face region in the image frame acquired in step 110. The image processing apparatus may detect the face region by, for example, setting a scan region using a detector trained in advance by machine learning to detect face regions. The detector may be trained to detect not only the user's bare face but also the face of a user wearing sunglasses. The scan region of the detector may be expressed, for example, in the form of a detection box, described later.

For example, when the image frame acquired in step 110 is the first image frame acquired by the image processing apparatus, that is, the initial image frame, the image processing apparatus may detect the user's face region included in that frame. When the image frame acquired in step 110 is not the initial image frame, the image processing apparatus may instead obtain or receive a face region detected from one of at least one previous image frame. For example, if the current image frame was acquired at time t, the at least one previous frame may have been acquired at an earlier time (for example, time t-1) or at earlier times (for example, times t-1, t-2, and t-3).
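
The distinction between the initial frame and later frames can be sketched as below. The function and parameter names are hypothetical, and `fake_detector` is only a stand-in for the trained face detector.

```python
def face_region_for_frame(frame, previous_region, detect_face):
    """Detect on the initial frame; otherwise reuse the region from a previous frame."""
    if previous_region is None:          # initial image frame
        return detect_face(frame)
    return previous_region               # region obtained from frame t-1 (or earlier)

detector_calls = []

def fake_detector(frame):
    # Stand-in for a trained face detector; returns a fixed box (x, y, w, h).
    detector_calls.append(frame)
    return (0, 0, 10, 10)

first = face_region_for_frame("frame_t0", None, fake_detector)
second = face_region_for_frame("frame_t1", first, fake_detector)
```

In the sketch the detector runs only once, on the initial frame; the second frame reuses the earlier region.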

In step 130, the image processing apparatus classifies the type of occlusion for the face region according to whether occlusion exists in at least a part of the face region detected in step 120. The image processing apparatus may classify the type of occlusion using, for example, a classifier trained in advance by machine learning to distinguish the part of the face region in which occlusion exists. The types of occlusion may include, for example, a first type corresponding to a bare face in which no occlusion exists; a second type in which at least one of the forehead, the eyebrows, and at least a part of the eyes in the user's face region is covered by a hat or the like; a third type in which the eye region of the user's face region is covered by sunglasses or the like; and a fourth type in which at least one of the mouth and the nose in the user's face region is covered by a mask or the like.

In step 130, when, for example, the eyebrow, eye, nose, and mouth regions are all detected in the face region detected in step 120, the image processing apparatus may classify the type of occlusion as the first type, corresponding to a bare face. When at least a part of the eye region, the nose region, and the mouth region are detected but any one of the forehead, the eyebrows, and at least a part of the eyes is not detected, the image processing apparatus may classify the type of occlusion as the second type, in which the eyebrows and/or eyes are covered by a hat. When the nose and mouth regions are detected but the eye region is not, the image processing apparatus may classify the type of occlusion as the third type, in which the eyes are covered by sunglasses. In addition, when the eyebrow and eye regions are detected but at least one of the nose and mouth regions is not, the image processing apparatus may classify the type of occlusion as the fourth type, in which the nose and/or mouth are covered by a mask.
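
The detection-based rules in the preceding paragraph can be written as a small rule table. This is a simplified stand-in for the trained classifier, under two assumptions: the set of part names is hypothetical, and "eyes" in the set means that at least part of the eye region was detected.

```python
def classify_occlusion(detected):
    """Map a set of detected facial parts to an occlusion type (1 to 4).

    Returns None when no rule matches. The part names are assumptions.
    """
    def has(*parts):
        return all(p in detected for p in parts)

    if has("eyebrows", "eyes", "nose", "mouth"):
        return 1                      # bare face
    if has("nose", "mouth") and "eyes" not in detected:
        return 3                      # eyes covered, e.g. by sunglasses
    if has("eyes", "nose", "mouth"):
        return 2                      # eyebrows missing, e.g. covered by a hat
    if has("eyebrows", "eyes"):
        return 4                      # nose and/or mouth missing, e.g. mask
    return None
```

The order of the checks matters: the bare-face rule is tested first, so the later branches only see faces with at least one part missing.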

Alternatively, according to an embodiment, the type of occlusion may be divided into at least one of type A, corresponding to a bare face in which no occlusion exists, and type B, in which the eye region of the user's face region is covered by sunglasses. A method of tracking the user's eyes when the type of occlusion is divided into these two types, bare face and covered by sunglasses, is described in detail with reference to FIGS. 6 and 7 below.

In step 140, the image processing apparatus tracks the user's eyes based on the type of occlusion classified in step 130. For example, when no occlusion exists in the face region, the image processing apparatus may find the center of the pupil in the user's eye using image features of the region corresponding to the shapes of the eyes and nose, and track the user's eyes according to the pupil center. In contrast, when the user's eyes are covered by sunglasses, a hat, or the like, the eyes themselves cannot be tracked, so the method of tracking the user's eyes may vary according to the type of occlusion. The image processing apparatus may track the user's eyes by aligning a different number of vertices to the shape of at least a part of the face region, based on the type of occlusion classified in step 130. The image processing apparatus may, for example, move and align a different number of vertices with respect to the shape of at least a part of the face region, based on image information within the face region. Here, the "vertices" may be feature points corresponding to key points that represent facial features such as the eyes, nose, mouth, and face outline. The vertices may be displayed, for example, as dots (ㆍ) as shown in FIG. 4, as asterisks (*), or in various other forms.

The image processing apparatus may recognize the positions of a plurality of feature parts corresponding to the user's eyebrows, eyes, nose, mouth, and the like from the face region of the image frame by, for example, the Supervised Descent Method (SDM), which aligns vertices to the shape in the image using descent vectors learned from an initial shape configuration; the Active Shape Model (ASM), which aligns vertices based on the shape and a principal component analysis (PCA) of the shape; the Active Appearance Model (AAM); or Constrained Local Models (CLM).
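
As a rough illustration of the SDM option, the cascade update commonly written as x ← x + R·φ(x) + b can be sketched with NumPy. The regressors (R, b) are assumed to have been trained already, and the feature function here is a toy placeholder rather than a real image descriptor; neither comes from the source.

```python
import numpy as np

def sdm_align(x0, features, regressors):
    """Apply a cascade of learned descent steps: x <- x + R @ phi(x) + b."""
    x = np.asarray(x0, dtype=float).copy()
    for R, b in regressors:
        x = x + R @ features(x) + b
    return x

# Toy check: phi(x) = x - target, so one step with R = -I and b = 0
# moves the vertices exactly onto the target.
target = np.array([3.0, 4.0, 5.0, 6.0])       # two vertices, flattened (x1, y1, x2, y2)

def toy_features(x):
    return x - target

toy_regressors = [(-np.eye(4), np.zeros(4))]
aligned = sdm_align(np.zeros(4), toy_features, toy_regressors)
```

In a real system φ would be a local image descriptor sampled around each vertex, and each stage of the cascade would have its own (R, b) pair.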

The image processing apparatus may move a different number of vertices to the positions of the recognized feature parts and align them, based on the type of occlusion classified in step 130. For example, when the image frame is the initial image frame, the vertices before alignment may correspond to the average positions of the feature parts of a plurality of users. When the image frame is not the initial image frame, the vertices before alignment may correspond to the vertices aligned on the basis of the previous image frame. How the image processing apparatus tracks the user's eyes is described in detail with reference to FIGS. 2 to 4 below.
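
The choice of starting vertices can be sketched as follows. Storing the average shape in unit-square coordinates and scaling it into the detection box is an assumption made for illustration; the text only states that the starting vertices are the average positions or the previously aligned vertices.

```python
import numpy as np

def initial_vertices(mean_shape, box, previous_vertices=None):
    """Return the vertex positions from which alignment starts."""
    if previous_vertices is not None:            # not the initial frame
        return np.array(previous_vertices, dtype=float)
    x, y, w, h = box                             # initial frame: place the mean shape in the box
    return np.asarray(mean_shape, dtype=float) * [w, h] + [x, y]

mean_shape = [[0.5, 0.5], [0.0, 1.0]]            # assumed normalized to the unit square
start = initial_vertices(mean_shape, (10, 20, 100, 200))
```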

According to an embodiment, the image processing apparatus may check whether tracking of the eyes by the aligned vertices has succeeded. On determining that tracking of the user's eyes has failed, the image processing apparatus may re-detect the face region and, based on the re-detected face region, repeat step 130 of classifying the type of occlusion and step 140 of tracking the user's eyes until tracking succeeds. The operation of the image processing apparatus when tracking of the user's eyes fails is described in detail with reference to FIG. 5 below.
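
The failure-handling loop can be sketched as below. The callables, the `max_retries` bound, and the convention that a failed track returns `None` are all assumptions for illustration; the actual flow is the one described for FIG. 5.

```python
def track_with_redetection(frame, region, detect_face, classify, track, max_retries=2):
    """Classify and track; on failure, re-detect the face region and try again."""
    for _ in range(max_retries + 1):
        occlusion = classify(frame, region)       # step 130
        eyes = track(frame, region, occlusion)    # step 140; None signals failure
        if eyes is not None:
            return eyes, region
        region = detect_face(frame)               # tracking failed: re-detect
    return None, region

STALE, FRESH = (0, 0, 1, 1), (40, 30, 100, 120)
redetections = []

def fake_detect(frame):
    redetections.append(frame)
    return FRESH

def fake_classify(frame, region):
    return 1

def fake_track(frame, region, occlusion):
    return None if region == STALE else (90, 70)   # fails on the stale region

eyes, region = track_with_redetection("frame", STALE, fake_detect, fake_classify, fake_track)
```

Here the first attempt fails on the stale region carried over from an earlier frame, a single re-detection supplies a fresh region, and the second attempt succeeds.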

In step 150, the image processing apparatus outputs information on the face region including the eye position tracked in step 140. In addition to the position of the eye or pupil tracked in step 140, the image processing apparatus may output information such as a viewpoint based on the pupil position and the user's facial expression. The information on the face region may include, for example, the positions of the pupils and nose included in the scan region, a viewpoint based on the pupil position, and the user's facial expression shown in the scan region. The image processing apparatus may output the information on the face region explicitly or implicitly. Explicitly outputting the information may include, for example, displaying the position of the pupils included in the face region and/or the expression shown in the face region on the screen of a display panel, and/or outputting it as audio. Implicitly outputting the information may include, for example, adjusting the image displayed on a head-up display (HUD) according to the position of the pupils included in the face region or a viewpoint based on the pupil position, or providing a service corresponding to the expression shown in the face region.

In operation 150, the image processing apparatus may reproduce an image corresponding to the eye position tracked in operation 140 by performing light field rendering based on that eye position. According to an embodiment, when the type of occlusion classified in operation 130 is any one of the second to fourth types, the image processing apparatus may output an image in which a glow effect is applied to the face region including the eye position tracked in operation 140. For example, when the eyes are covered, as in the second or third type, the eye position is an estimate and its precision may be low. Accordingly, when the type of occlusion is the second or third type, the image processing apparatus may increase precision by displaying low-crosstalk 3D content for the estimated eye position and applying the glow effect.

FIG. 2 is a diagram illustrating a method of classifying the type of occlusion of a face region according to an embodiment, and FIG. 3 is a flowchart illustrating a method of tracking a user's eyes based on the type of occlusion according to an embodiment.

Referring to FIGS. 2 and 3, the image processing apparatus according to an embodiment may classify the type of occlusion of the face region using a pre-trained classifier. The classifier may be, for example, one trained by aligning different numbers of vertices, depending on the type of occlusion, to at least a partial shape of each face region stored in a training image DB, and then learning SIFT features extracted from each of the aligned vertices. The classifier may be, for example, a support vector machine (SVM) classifier.

For example, the image processing apparatus may classify the type of occlusion of a face region 210 of a bare face, in which no occlusion exists, as a first type. The image processing apparatus may classify the type of occlusion of a face region 220, in which at least part of the forehead, eyebrows, and eyes is covered by a hat, as a second type. The image processing apparatus may classify the type of occlusion of a face region 230, in which the eye region is covered by sunglasses, as a third type. In addition, the image processing apparatus may classify the type of occlusion of a face region 240, in which the mouth, or the mouth and nose, is covered by a mask, as a fourth type.
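
For illustration only, the four occlusion types described above may be represented as an enumeration. The following Python sketch is not part of the disclosed embodiments; the names `OcclusionType` and `eyes_may_be_hidden` are hypothetical and chosen only for this example.

```python
from enum import Enum


class OcclusionType(Enum):
    """Four occlusion types from the text; the member names are illustrative."""
    BARE_FACE = 1   # first type: no occlusion (face region 210)
    HAT = 2         # second type: forehead, eyebrows, part of the eyes covered (220)
    SUNGLASSES = 3  # third type: eye region covered (230)
    MASK = 4        # fourth type: mouth, or mouth and nose, covered (240)


def eyes_may_be_hidden(occlusion: OcclusionType) -> bool:
    """True for the types in which the eyes themselves may be covered."""
    return occlusion in (OcclusionType.HAT, OcclusionType.SUNGLASSES)
```

A classifier such as the SVM mentioned above would output one of these four labels per detected face region.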

When the type of occlusion is the first type, the image processing apparatus may track the user's eyes by aligning a first number of vertices to a partial shape of the face region 210, as in operation 310. In this case, the image processing apparatus may reduce the size of the detection box that detects the face region so that it corresponds to the shapes of the eyes and nose within the face region, as with detection box 250. The image processing apparatus may crop the shapes of the eyes and nose using the reduced detection box 250. The image processing apparatus may track the user's eyes and nose by aligning the first number (for example, 11) of vertices to the cropped shapes of the eyes and nose (260). A method by which the image processing apparatus aligns the first number of vertices is described in detail below with reference to diagram 410 of FIG. 4.
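
As a minimal sketch of the box reduction and cropping described above, the following Python code shrinks a full-face box to a horizontal band and crops that band from an image stored as a list of rows. The percentages that define the band (25% and 70% of the face height) are assumptions made for this example and do not come from the disclosure; `shrink_box` and `crop` are hypothetical helper names.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int]


def shrink_box(face_box: Box, top_pct: int = 25, bottom_pct: int = 70) -> Box:
    """Shrink a full-face box to a band assumed to cover the eyes and nose.

    The band limits are given as integer percentages of the face height.
    """
    left, top, right, bottom = face_box
    height = bottom - top
    return (left, top + height * top_pct // 100, right, top + height * bottom_pct // 100)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Return the pixels of `image` that fall inside `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

In practice the reduced box would be derived from the detected face rather than from fixed percentages.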

In operation 310, the image processing apparatus may track the user's eyes by checking the alignment result of the face region corresponding to the user's eyes and nose using the 11 vertices. The image processing apparatus may verify the alignment result of the face region corresponding to the eyes and nose within the scan region, based on image information in the scan region set by detection box 250. The image processing apparatus may verify whether the scan region corresponds to the eyes and nose using, for example, a verifier based on scale-invariant feature transform (SIFT) features. Here, a 'SIFT feature' may be obtained in the following two steps. First, from the image data of the scan region, the image processing apparatus may extract candidate vertices at which image brightness is a local maximum or minimum in the scale space formed by an image pyramid, and may filter out low-contrast vertices to select the vertices to be used for image matching. Second, the image processing apparatus may obtain an orientation component from the gradient of the area surrounding each selected vertex, reset the region of interest around the obtained orientation component, detect the size of the vertex, and generate a descriptor. The descriptor corresponds to the SIFT feature.

For example, the image processing apparatus may include a separate module that, when the classified type of occlusion is the first type, tracks the user's eyes and nose by aligning the first number of vertices to the shapes of the eyes and nose cropped by the reduced detection box 250. This module may be referred to as a 'bare face tracker.'

In contrast, when the type of occlusion is any one of the second to fourth types, the image processing apparatus may track the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the face region, as in operation 320.

In operation 320, the image processing apparatus may align the second number (for example, 98) of vertices to the overall shape of the face region using the detection box that detects the face region, and may estimate the position of the user's pupils in the face region from the aligned second number of vertices. When the type of occlusion is any one of the second to fourth types, the image processing apparatus may track the entire face (270). A method by which the image processing apparatus aligns the second number of vertices is described in detail below with reference to diagram 430 of FIG. 4.
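
The choice between operations 310 and 320 can be summarized in a short Python sketch. It is offered only as an illustration: the function name `alignment_plan` and the region labels are hypothetical, while the counts 11 and 98 are the example values given in the text.

```python
from typing import Tuple

FIRST_COUNT = 11   # example first number of vertices (eyes and nose)
SECOND_COUNT = 98  # example second number of vertices (whole face)


def alignment_plan(occlusion_type: int) -> Tuple[str, int]:
    """Map an occlusion type (1 to 4) to the region to align and the vertex count."""
    if occlusion_type == 1:
        # Operation 310: partial shape of the face region
        return ("eyes_and_nose", FIRST_COUNT)
    if occlusion_type in (2, 3, 4):
        # Operation 320: overall shape of the face region
        return ("whole_face", SECOND_COUNT)
    raise ValueError(f"unknown occlusion type: {occlusion_type}")
```

Keeping the two counts as named constants reflects the statement that the second number is greater than the first without fixing either to a particular value.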

In addition, the image processing apparatus according to an embodiment may include a separate module that, for example, when the classified type of occlusion is the third type, aligns the second number of vertices to the overall shape of the face region and estimates the position of the user's pupils in the face region from the aligned second number of vertices. This module may be referred to as a 'sunglasses tracker.'

FIG. 4 is a diagram illustrating a method by which the image processing apparatus according to an embodiment aligns different numbers of vertices based on the type of occlusion. FIG. 4 shows, for example, diagram 410 of the vertices aligned when the type of occlusion is classified as the first type, and diagram 430 of the vertices aligned when the type of occlusion is classified as the third type. For convenience, the method of aligning vertices is described below for the third type, which is one of the second to fourth types, but vertices may be aligned in the same manner when the type of occlusion is the second or fourth type.

For example, when the classified type of occlusion is the first type, the image processing apparatus may track the user's eyes by aligning the first number (for example, 11) of vertices to the partial shape of the face region corresponding to the eyes and nose, as in diagram 410. The 11 vertices may be, for example, three vertices corresponding to the right eye, three vertices corresponding to the left eye, one vertex between the eyebrows, and four vertices corresponding to the tip and contour of the nose, as in diagram 410, but are not limited thereto.

Alternatively, when the classified type of occlusion is the third type, the image processing apparatus may track the user's eyes by aligning the second number (for example, 98) of vertices to the overall shape of the face region, as in diagram 430. The 98 vertices may be, for example, 19 vertices corresponding to the shapes of the user's right eyebrow and right eye, 19 vertices corresponding to the shapes of the left eyebrow and left eye, 12 vertices corresponding to the shape of the nose, 15 vertices corresponding to the shape of the mouth, and 33 vertices corresponding to the shape of the face outline, as in diagram 430, but are not limited thereto. The image processing apparatus may track the user's eyes by estimating the position of the user's pupils in the face region from the 98 aligned vertices.
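
The example vertex layouts above can be checked arithmetically: 3 + 3 + 1 + 4 = 11 and 19 + 19 + 12 + 15 + 33 = 98. The following Python sketch records both layouts and derives an index range for each group. The group names and the ordering of the groups within the vertex array are assumptions made for this example; the disclosure gives only the counts.

```python
from typing import Dict

# Per-group vertex counts taken from the example in the text.
BARE_FACE_LAYOUT: Dict[str, int] = {
    "right_eye": 3,
    "left_eye": 3,
    "between_brows": 1,
    "nose": 4,
}
FULL_FACE_LAYOUT: Dict[str, int] = {
    "right_brow_and_eye": 19,
    "left_brow_and_eye": 19,
    "nose": 12,
    "mouth": 15,
    "face_outline": 33,
}


def index_ranges(layout: Dict[str, int]) -> Dict[str, range]:
    """Assign each group a contiguous index range, in dictionary order."""
    ranges = {}
    start = 0
    for name, count in layout.items():
        ranges[name] = range(start, start + count)
        start += count
    return ranges
```

Such index ranges make it straightforward to pick out, for instance, only the vertices that belong to the eyes when reading off a pupil estimate.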

FIG. 5 is a flowchart illustrating a method of tracking a user's eyes according to another embodiment. Referring to FIG. 5, when tracking of the user's eyes fails, the image processing apparatus according to an embodiment may re-detect the face region through operations 510 to 550 so that tracking of the user's eyes succeeds.

For convenience, the method of tracking the eyes is described below for the case in which the type of occlusion is classified as the first type, but the same method may be applied when the type of occlusion is classified as any of the second to fourth types.

In operation 510, the image processing apparatus may align the first number of vertices to a partial shape of the face region. The image processing apparatus may recognize, from the face region of the image frame, positions corresponding to, for example, the shapes of the user's eyes and nose, and may align the first number of vertices by moving them to the recognized positions.

In operation 520, the image processing apparatus may verify the result of aligning the first number of vertices in operation 510. Using image information in the eye and/or nose region of the image frame, the image processing apparatus may verify whether the region to which the vertices are aligned corresponds to the actual eyes and/or nose.

In operation 530, the image processing apparatus may determine whether the user's eyes are being tracked according to the verification result of operation 520.

When tracking of the user's eyes succeeds in operation 530, the image processing apparatus may output the eye position in operation 540. The eye position may correspond to, for example, two-dimensional or three-dimensional eye position coordinates.

When tracking of the user's eyes fails in operation 530, the image processing apparatus may re-detect the face region from the image frame in operation 550. The image processing apparatus may then track the eyes by performing operations 510 to 530 on the face region re-detected in operation 550.
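
The flow of operations 510 to 550 can be sketched as a loop, shown here in Python purely as an illustration. The callables `detect_face`, `align_and_locate`, and `verify` are hypothetical placeholders for the detection, alignment, and verification described above, and the `max_attempts` bound is an assumption added so that the loop terminates; the disclosure does not specify a limit.

```python
from typing import Any, Callable, Optional, Tuple

Point = Tuple[float, float]


def track_with_redetection(
    frame: Any,
    detect_face: Callable[[Any], Any],
    align_and_locate: Callable[[Any, Any], Point],
    verify: Callable[[Any, Point], bool],
    max_attempts: int = 3,
) -> Optional[Point]:
    """Return a verified eye position, or None if every attempt fails."""
    face = detect_face(frame)  # initial face detection
    for _ in range(max_attempts):
        eye = align_and_locate(frame, face)  # operation 510: align vertices
        if verify(frame, eye):               # operations 520 and 530
            return eye                       # operation 540: output position
        face = detect_face(frame)            # operation 550: re-detect face
    return None
```

Passing the three stages in as callables keeps the control flow separate from any particular detector or verifier.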

FIG. 6 is a diagram illustrating an image processing method according to another embodiment. FIG. 6 shows a process in which the image processing apparatus according to an embodiment outputs eye coordinates according to the type of occlusion using a bare face tracker 620 and a sunglasses tracker 640.

In operation 605, the image processing apparatus may acquire an image frame.

In operation 610, the image processing apparatus may detect the user's face region in the image frame acquired in operation 605. The image processing apparatus may detect the entire face region using, for example, a detection box or scan window that sets a scan region in the image frame. According to an embodiment, the image processing apparatus may use the detection box or scan window to detect the region of the face corresponding to the eyes and nose, or the region corresponding to sunglasses.

In operation 615, the image processing apparatus may classify the type of occlusion of the face region detected in operation 610.

For example, suppose that the face region detected in operation 610 is a bare face, so that the type of occlusion is classified as the first type in operation 615. In this case, the image processing apparatus may track the eyes in the bare face and output the eye coordinates by performing operations 621 to 625 with the bare face tracker 620.

More specifically, in operation 621, the image processing apparatus may track the user's eyes and nose by aligning the first number (for example, 11) of vertices to the shapes of the eyes and nose cropped from the face region by the reduced detection box.

In operation 623, the image processing apparatus may verify, using the SIFT features corresponding to the 11 vertices used in operation 621, whether the eyes, or the eyes and nose, have been tracked correctly. In operation 623, the image processing apparatus may, for example, give the SIFT features of the eyes a higher weight than those of the nose when verifying whether the eyes have been tracked correctly. The image processing apparatus may perform the verification using the support vector machine (SVM) described above.
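
To illustrate the idea of weighting the eyes more heavily than the nose during verification, the following Python sketch combines per-region match scores into a weighted average and compares it with a threshold. This is a simplified stand-in, not the SVM-based verifier of the embodiment: the score range of 0 to 1, the 2:1 weights, and the 0.6 threshold are all assumptions chosen for this example.

```python
from typing import Dict, Optional

# Hypothetical weights: the eye score counts twice as much as the nose score.
DEFAULT_WEIGHTS: Dict[str, float] = {"eyes": 2.0, "nose": 1.0}


def weighted_score(scores: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-region match scores (each assumed in [0, 1])."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


def is_tracked(scores: Dict[str, float],
               weights: Optional[Dict[str, float]] = None,
               threshold: float = 0.6) -> bool:
    """Accept the tracking result when the weighted score reaches the threshold."""
    return weighted_score(scores, weights) >= threshold
```

With these weights, a strong eye match can outweigh a weak nose match, whereas a weak eye match is not rescued by a strong nose match.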

When it is verified in operation 623 that the eyes have been tracked correctly, the image processing apparatus may output, in operation 625, the eye coordinates corresponding to the tracked eye position. When it is verified in operation 623 that the eyes have not been tracked correctly, the image processing apparatus may re-detect the face region through operation 610. The image processing apparatus may then reclassify the type of occlusion in the re-detected face region and track the user's eyes.

Alternatively, suppose that the face region detected in operation 610 is a face whose eye region is covered by sunglasses, so that the type of occlusion is classified as the third type in operation 615. In this case, the image processing apparatus may estimate the eye position in the face wearing sunglasses and output the eye coordinates by performing operations 641 to 645 with the sunglasses tracker 640.

More specifically, in operation 641, the image processing apparatus may track the user's eyes by aligning the second number (for example, 98) of vertices to the overall shape of the face region and estimating the position of the user's pupils in the face region from the aligned second number of vertices. In operation 641, the image processing apparatus may estimate the position of the user's eyes, and more specifically the pupils, through, for example, machine learning using a database containing a large number of images. In this case, the image processing apparatus may improve precision by weighting the estimated eye position.

In operation 643, the image processing apparatus may verify whether the position of the user's eyes (or pupils) estimated using the second number of vertices in operation 641 is accurate. In operation 643, the image processing apparatus may, for example, exclude the vertices corresponding to the sunglasses and use the vertices corresponding to the remaining region to verify whether the estimated position of the user's pupils is accurate.
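
The exclusion of sunglasses vertices in operation 643 may be sketched as follows in Python. The example assumes that each aligned vertex carries a confidence value between 0 and 1 and that the indices of the vertices lying on the sunglasses are known; both assumptions, as well as the 0.5 threshold and the function names, are hypothetical and serve only to illustrate the filtering step.

```python
from typing import List, Sequence, Set


def visible_indices(total: int, occluded: Set[int]) -> List[int]:
    """Indices of vertices that are not covered by the sunglasses."""
    return [i for i in range(total) if i not in occluded]


def verify_estimate(confidences: Sequence[float],
                    occluded: Set[int],
                    threshold: float = 0.5) -> bool:
    """Verify the estimate using only the vertices outside the occluded region."""
    keep = visible_indices(len(confidences), occluded)
    if not keep:
        return False  # nothing left to verify against
    mean = sum(confidences[i] for i in keep) / len(keep)
    return mean >= threshold
```

Because the occluded vertices are dropped before averaging, low confidence under the lenses does not by itself cause the estimate to be rejected.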

When it is verified in operation 643 that the estimated position of the user's eyes (or pupils) is accurate, the image processing apparatus may output, in operation 645, the eye coordinates from the estimated eye position. When it is verified in operation 643 that the estimated position of the user's eyes (or pupils) is not accurate, the image processing apparatus may re-detect the face region through operation 610. The image processing apparatus may then reclassify the type of occlusion in the re-detected face region and track the user's eyes.

FIG. 7 is a flowchart illustrating an image processing method according to another embodiment. FIG. 7 shows a process in which a 3D HUD device of a vehicle, which is an example of the image processing apparatus according to an embodiment, tracks the user's eyes and outputs the tracking result through operations 710 to 740.

In operation 710, the HUD device acquires an image frame. The image frame may be, for example, a color image frame or an infrared image frame.

In operation 720, the HUD device determines whether the user in the image frame is wearing sunglasses. The HUD device may make this determination using, for example, a detector trained in advance to detect whether the user's eye region is covered by sunglasses.

In operation 730, the HUD device tracks the user's eyes using a different type of eye tracker depending on the determination made in operation 720 as to whether sunglasses are worn.

For example, upon determining in operation 720 that the user is not wearing sunglasses, the HUD device may track the user's eyes using a first type of eye tracker. The first type of eye tracker may track the user's eyes by, for example, aligning a first number of vertices to a partial shape of the user's face region.

More specifically, upon determining in operation 720 that the user is not wearing sunglasses, the HUD device may, in operation 730, use the first type of eye tracker to reduce the size of the detection box that detects the user's face region so that it corresponds to the shapes of the eyes and nose within the face region. Using the first type of eye tracker, the HUD device may crop the shapes of the eyes and nose with the reduced detection box and then track the user's eyes by aligning the first number of vertices to the cropped shapes of the eyes and nose.

Alternatively, upon determining in operation 720 that the user is wearing sunglasses, the HUD device may track the user's eyes using a second type of eye tracker. The second type of eye tracker may track the user's eyes by, for example, aligning a second number of vertices, greater than the first number, to the overall shape of the user's face region.

More specifically, upon determining in operation 720 that the user is wearing sunglasses, the HUD device may, in operation 730, use the second type of eye tracker to align the second number of vertices to the overall shape of the face region using the detection box that detects the user's face region. The HUD device may track the user's eyes by estimating the position of the user's pupils in the face region from the aligned second number of vertices.

In operation 740, the HUD device outputs information on the face region including the tracked eye position.

FIG. 8 is a block diagram of an image processing apparatus according to an embodiment. Referring to FIG. 8, the image processing apparatus 800 according to an embodiment includes a sensor 810 and a processor 830. The image processing apparatus 800 may further include a memory 850, a communication interface 870, and a display panel 890. The sensor 810, the processor 830, the memory 850, the communication interface 870, and the display panel 890 may communicate with one another through a communication bus 805.

The sensor 810 acquires an image frame including the user's face. The sensor 810 may be, for example, an image sensor that captures an input image under infrared illumination, a vision sensor, or an infrared camera. The image frame may include, for example, an image of the user's face or an image of a user driving a vehicle.

The processor 830 detects the user's face region in the image frame. The image frame may be, for example, one acquired by the sensor 810 or one received from outside the image processing apparatus 800 through the communication interface 870.

The processor 830 classifies the type of occlusion of the face region according to whether occlusion exists in at least part of the face region. The processor 830 tracks the user's eyes based on the type of occlusion. The processor 830 outputs information on the face region including the tracked eye position.

The processor 830 may, for example, output the information on the face region including the tracked eye position to the display panel 890, or output it to the outside of the image processing apparatus 800 through the communication interface 870.

There may be one processor 830 or a plurality of processors 830. The processor 830 may perform at least one of the methods described above with reference to FIGS. 1 to 7, or an algorithm corresponding to at least one of those methods. The processor 830 may execute a program and control the image processing apparatus 800. Program code executed by the processor 830 may be stored in the memory 850. The processor 830 may be configured as, for example, a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processing unit (NPU).

The memory 850 may store the image frame acquired by the sensor 810, the user's face region detected from the image frame by the processor 830, the type of occlusion of the face region classified by the processor 830, the coordinates of the user's eyes tracked by the processor 830, and the like. The memory 850 may also store the information on the face region output by the processor 830. The memory 850 may be a volatile memory or a non-volatile memory.

The communication interface 870 may receive an image frame from outside the image processing apparatus 800. The communication interface 870 may output the face region detected by the processor 830 and/or the information on the face region including the position of the user's eyes tracked by the processor 830. The communication interface 870 may receive an image frame captured outside the image processing apparatus 800, or information from various sensors received from outside the image processing apparatus 800.

The display panel 890 may display the processing result of the processor 830, such as the information on the user's face region including the eye position tracked by the processor 830. For example, when the image processing apparatus 800 is embedded in a vehicle, the display panel 890 may be configured as a head-up display (HUD) installed in the vehicle.

The image processing apparatus 800 may include, for example, a head-up display (HUD) device, a 3D digital information display (DID), a navigation device, a 3D mobile device, a smartphone, a smart TV, and a smart vehicle, but is not necessarily limited thereto. The 3D mobile device may be understood to encompass, for example, a display device for displaying augmented reality (AR), virtual reality (VR), and/or mixed reality (MR), a head mounted display (HMD), and a face mounted display (FMD).

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on the foregoing. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of a system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

An image processing method comprising:
obtaining an image frame including a user's face;
detecting a face region of the user from the image frame;
classifying a type of occlusion for the face region according to whether occlusion exists in at least a part of the face region;
tracking the user's eyes based on the type of occlusion; and
outputting information on the face region including a position of the tracked eyes.

The image processing method of claim 1,
wherein the type of occlusion comprises at least one of:
a first type corresponding to a bare face in which the occlusion does not exist;
a second type in which at least one of a forehead, an eyebrow, and at least a part of an eye in the face region of the user is covered;
a third type in which an eye region in the face region of the user is covered; and
a fourth type in which at least one of a mouth and a nose in the face region of the user is covered.

The image processing method of claim 1,
wherein the type of occlusion comprises at least one of:
a type A corresponding to a bare face in which the occlusion does not exist; and
a type B in which an eye region in the face region of the user is covered by sunglasses.

The image processing method of claim 1,
wherein tracking the user's eyes comprises:
tracking the user's eyes by aligning different numbers of vertices to at least a partial shape of the face region based on the type of occlusion.

4. The method of claim 3,
wherein tracking the user's eyes comprises:
tracking the user's eyes by aligning a first number of vertices to a partial shape of the face region when the type of occlusion is the first type; and
tracking the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the face region when the type of occlusion is any one of the second to fourth types.

The image processing method of claim 5,
wherein, when the type of occlusion is the first type, tracking the user's eyes comprises:
reducing a size of a detection box for detecting the face region so that the detection box corresponds to shapes of the eyes and nose in the face region;
cropping the shapes of the eyes and nose using the reduced detection box; and
tracking the user's eyes by aligning the first number of vertices to the cropped shapes of the eyes and nose.

The image processing method of claim 5,
wherein, when the type of occlusion is any one of the second to fourth types, tracking the user's eyes comprises:
aligning the second number of vertices to the overall shape of the face region using a detection box for detecting the face region; and
tracking the user's eyes by estimating a position of the user's pupils in the face region from the aligned second number of vertices.

4. The method of claim 3,
wherein tracking the user's eyes further comprises:
re-detecting the face region upon determining that tracking of the user's eyes by the aligned vertices has failed; and
repeating the classifying of the type of occlusion and the tracking of the user's eyes based on the re-detected face region.

The image processing method of claim 1,
wherein classifying the type of occlusion comprises:
classifying the type of occlusion for the face region using a classifier trained in advance by machine learning to distinguish a region of the face region in which the occlusion exists.

An image processing method of a head-up display (HUD) device, the method comprising:
obtaining an image frame;
determining whether a user in the image frame is wearing sunglasses;
tracking the user's eyes using different types of eye trackers according to the determination of whether the sunglasses are worn; and
outputting information on a face region of the user including a position of the tracked eyes.

The image processing method of the HUD device of claim 10,
wherein tracking the user's eyes comprises:
tracking the user's eyes using a first type of eye tracker in response to a determination that the sunglasses are not worn; and
tracking the user's eyes using a second type of eye tracker in response to a determination that the sunglasses are worn.

The image processing method of the HUD device of claim 11,
wherein the first type of eye tracker tracks the user's eyes by aligning a first number of vertices to a partial shape of the user's face region, and
the second type of eye tracker tracks the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the user's face region.

The image processing method of the HUD device of claim 11,
wherein tracking the user's eyes using the first type of eye tracker comprises:
reducing a size of a detection box for detecting the user's face region so that the detection box corresponds to shapes of the eyes and nose in the face region;
cropping the shapes of the eyes and nose using the reduced detection box; and
tracking the user's eyes by aligning a first number of vertices to the cropped shapes of the eyes and nose.

The image processing method of the HUD device of claim 11,
wherein tracking the user's eyes using the second type of eye tracker comprises:
aligning a second number of vertices to the overall shape of the face region using a detection box for detecting the user's face region; and
tracking the user's eyes by estimating a position of the user's pupils in the face region from the aligned second number of vertices.

The image processing method of the HUD device of claim 10,
wherein determining whether the sunglasses are worn comprises:
determining whether the sunglasses are worn using a detector trained in advance to detect whether the user's eye region is covered by the sunglasses.

A computer program stored in a computer-readable recording medium, the computer program being combined with hardware to execute the method of any one of claims 1 to 15.

An image processing apparatus comprising:
a sensor configured to acquire an image frame including a user's face; and
a processor configured to detect a face region of the user from the image frame, classify a type of occlusion for the face region according to whether occlusion exists in at least a part of the face region, track the user's eyes based on the type of occlusion, and output information on the face region including a position of the tracked eyes.

The image processing apparatus of claim 17,
wherein the processor is configured to track the user's eyes by aligning different numbers of vertices to at least a partial shape of the face region based on the type of occlusion.

The image processing apparatus of claim 18,
wherein the processor is configured to:
track the user's eyes by aligning a first number of vertices to a partial shape of the face region when the type of occlusion is the first type; and
track the user's eyes by aligning a second number of vertices, greater than the first number, to the overall shape of the face region when the type of occlusion is any one of the second to fourth types.

The image processing apparatus of claim 17,
wherein the image processing apparatus comprises at least one of a head-up display (HUD) device, a 3D digital information display (DID), a navigation device, a 3D mobile device, a smartphone, a smart TV, and a smart vehicle.