KR20200106111A

KR20200106111A - Face landmark detection apparatus and method using gaussian landmark map with regression scheme

Info

Publication number: KR20200106111A
Application number: KR1020190022152A
Authority: KR
Inventors: 이상윤; 이용주; 배한별; 전태재
Original assignee: Yonsei University Industry-Academic Cooperation Foundation (연세대학교 산학협력단)
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2020-09-11
Also published as: WO2020175729A1

Abstract

The present invention provides an apparatus and method for detecting facial feature points. The apparatus includes: a facial area detection unit that receives an input image containing a face, detects the facial area, and outputs a facial area map; a feature point prediction unit that detects feature points in the facial area map according to a predetermined algorithm and outputs them as predicted feature points; a Gaussian feature map generation unit that generates a Gaussian feature map whose pixel values follow a Gaussian distribution centered on the predicted feature points in the facial area map; and a feature point acquisition unit that receives the facial area map and the Gaussian feature map and detects the locations of the feature points in the facial area map, taking the pixel values of the Gaussian feature map into account, according to a pre-trained pattern estimation scheme.

Description

Face feature point detection device and method using Gaussian feature point map and regression technique {FACE LANDMARK DETECTION APPARATUS AND METHOD USING GAUSSIAN LANDMARK MAP WITH REGRESSION SCHEME}

The present invention relates to an apparatus and method for detecting facial feature points and, more particularly, to an apparatus and method that improve the accuracy of facial feature point detection by using a Gaussian feature point map and a regression technique.

Two-dimensional facial image processing spans many research areas, including face detection, face recognition, facial expression prediction, and facial feature point detection. Among these, facial feature points are the representation that most concisely captures the characteristics of a face, so they are widely used as base information for other facial image processing tasks such as face recognition, emotion recognition, and pose recognition. In other words, many techniques in facial image processing perform their application-specific processing on the premise that the facial feature points have been detected accurately. Accurate facial feature point detection is therefore very important in the field of facial image processing.

Recently, learning-based facial feature point detection methods such as deep learning have been proposed; these methods use a pre-trained artificial neural network that learns to detect features by itself. Artificial neural networks are currently applied in many ways across industries such as speech, video, and data mining. In the image domain, the convolutional neural network (CNN), a representative artificial neural network that can extract features invariant to scale, rotation, and distortion, is the most commonly used and achieves high performance. With a CNN, facial feature point detection performs well even though the shapes of the eyes, nose, mouth, and other facial parts vary from person to person. In addition, advances in CPUs and computing power allow learning-based feature point detection with artificial neural networks to operate effectively in real time.

However, to use a learning-based facial feature point detection method effectively, the artificial neural network must first be trained on a very large amount of data. The data sets available for training fall far short of covering the full diversity of facial expressions and poses, which limits the performance of learning-based facial feature point detection using artificial neural networks. To address this shortage of training data, it has been proposed to enlarge the training set by applying modifications such as mirroring, lighting changes, and random patches to a given data set. However, because these modifications are small, the resulting improvement in training performance is limited.

Korean Patent Publication No. 10-2015-0127381 (published on November 17, 2015)

An object of the present invention is to provide an apparatus and method for detecting facial feature points that can detect facial feature points accurately.

Another object of the present invention is to provide an apparatus and method for detecting facial feature points that can be trained effectively even with a small amount of training data.

To achieve the above objects, an apparatus for detecting facial feature points according to an embodiment of the present invention includes: a face region detection unit that receives an input image containing a face, detects the face region, and outputs a face region map; a feature point prediction unit that detects feature points in the face region map according to a predetermined algorithm and outputs them as predicted feature points; a Gaussian feature map generation unit that generates a Gaussian feature map whose pixel values follow a Gaussian distribution centered on the predicted feature points in the face region map; and a feature point acquisition unit that receives the face region map and the Gaussian feature map and detects the positions of the feature points in the face region map, taking the pixel values of the Gaussian feature map into account, according to a pre-trained pattern estimation method.

The feature point acquisition unit may obtain, based on the pixel values of the Gaussian feature map, correction values for correcting the positions of the predicted feature points, and may correct the positions of the predicted feature points using the obtained correction values, thereby estimating the positions of the feature points in the face region map.

The feature point acquisition unit may be implemented as a ResNet (Residual Network) comprising a plurality of convolution boxes, each of which includes a plurality of convolution layers and at least one adder that adds the operation result passed from the previous convolution box. Within each convolution box, at least one of the convolution layers may differ in size from the others.

The feature point prediction unit may obtain a random forest comprising a plurality of decision trees generated by randomly sampling the data of the face region map according to a tree algorithm, predict the facial feature points by applying a regression tree classifier to the features extracted by the random forest, and obtain the predicted feature points by repeating this prediction a predetermined number of times, each repetition correcting the previously predicted facial feature points.

The face region detection unit may detect the face region using a HOG (Histogram of Oriented Gradients) + SVM (Support Vector Machine) method.

To achieve the above objects, a method for detecting facial feature points according to another embodiment of the present invention includes: receiving an input image containing a face, detecting the face region, and outputting a face region map; detecting feature points in the face region map according to a predetermined algorithm and outputting them as predicted feature points; generating a Gaussian feature map whose pixel values follow a Gaussian distribution centered on the predicted feature points in the face region map; and receiving the face region map and the Gaussian feature map and detecting the positions of the feature points in the face region map, taking the pixel values of the Gaussian feature map into account, according to a pre-trained pattern estimation method.

Accordingly, the apparatus and method for detecting facial feature points according to embodiments of the present invention predict feature points with a tree algorithm in the face region found by a face detector, generate a Gaussian feature map from the predicted feature points, and feed it together with the detected face region into an artificial neural network that corrects the positions of the predicted feature points. Because the feature points are detected in this way, the network can be trained effectively even with a small amount of training data and can detect facial feature points accurately.

FIG. 1 shows a schematic structure of an apparatus for detecting facial feature points according to an embodiment of the present invention.
FIGS. 2 and 3 show examples of predicted feature points extracted by the feature point prediction unit of FIG. 1 using a tree algorithm.
FIG. 4 shows an example of the structure of the correction value estimation unit of FIG. 1.
FIG. 5 shows a method of detecting facial feature points according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings illustrating preferred embodiments of the present invention and to the contents described in them.

Hereinafter, the present invention will be described in detail through preferred embodiments with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the described embodiments. To describe the present invention clearly, parts irrelevant to the description are omitted, and like reference numerals in the drawings denote like members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 shows a schematic structure of an apparatus for detecting facial feature points according to an embodiment of the present invention, FIGS. 2 and 3 show examples of predicted feature points extracted by the feature point prediction unit of FIG. 1 using a tree algorithm, and FIG. 4 shows an example of the structure of the correction value estimation unit of FIG. 1.

Referring to FIG. 1, the facial feature point detection apparatus according to this embodiment includes an image acquisition unit 100, a face region detection unit 200, a feature point prediction unit 300, a Gaussian feature map generation unit 400, a feature map merging unit 500, a correction value estimation unit 600, and a feature point acquisition unit 700.

The image acquisition unit 100 acquires an input image containing a face and passes it to the face region detection unit 200. The input image is a two-dimensional image captured with an ordinary camera device or the like; in some cases it may be a video whose frames are a sequence of consecutive two-dimensional images.

The face region detection unit 200 receives from the image acquisition unit 100 the input image in which facial feature points are to be detected, and detects the face region in the input image in a predetermined manner. That is, the face region detection unit 200 separates the face region from the background region of the input image, detects only the face region, and outputs a face region map.

Various methods of detecting a face region in an image are known; here, as an example, it is assumed that the face region is detected using the HOG (Histogram of Oriented Gradients) + SVM (Support Vector Machine) method. HOG computes the gradient magnitude and orientation of every pixel within cells of a fixed size and builds a histogram from these values; the face region is then detected by using these histograms as the feature vector of an SVM. However, in this embodiment the face region detection unit 200 may also use a face region detection method other than HOG+SVM.
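
To make the HOG step concrete, the sketch below computes per-cell, magnitude-weighted orientation histograms with NumPy. It is an illustration under assumptions, not the embodiment's detector: the 8-pixel cells, the 9 unsigned orientation bins, the centred-difference gradient, and the name hog_cell_histograms are choices made here, and the block normalisation and SVM stage of a full HOG+SVM detector are omitted.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by
    gradient magnitude. Assumption: 8-pixel cells and 9 bins over [0, 180);
    block normalisation and the SVM classifier are omitted."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # centred difference along x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # centred difference along y
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, bins))
    for i in range(h_cells):
        for j in range(w_cells):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # accumulate each pixel's magnitude into its orientation bin
            hist[i, j] = np.bincount(bin_idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=bins)
    return hist

# The flattened histograms would form the feature vector scored by an SVM
# in a sliding window over the image (not shown).
```

For example, a 16 × 16 image whose right half is bright produces gradient energy only in the 0° bin of each cell, because the only edge is vertical.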

The feature point prediction unit 300 receives the face region map detected by the face region detection unit 200, detects feature points in it in a predetermined manner, and outputs them as predicted feature points. Various methods of detecting feature points in a face image are known; here, as an example, it is assumed that the feature point prediction unit 300 detects at least one predicted feature point using a tree algorithm.

Facial feature point detection using a tree algorithm is a recently proposed regression-based facial feature point detection method. It belongs to the hand-crafted algorithms, which detect feature points by defining variables for feature point extraction and performing operations on those variables. The tree algorithm obtains a random forest by randomly sampling the data to generate a plurality of decision trees, and extracts features by a majority vote over the results of the decision trees in the random forest.
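
The random-forest idea described above (bootstrap sampling, many trees, majority vote) can be sketched as follows. This is a generic binary classifier written for illustration only: it uses depth-1 trees (decision stumps) and a random half of the input features per tree to stay short, and the names fit_forest and predict_forest are hypothetical. It does not reproduce the pixel-difference features or the regression trees used for landmark prediction.

```python
import numpy as np

def fit_stump(X, y):
    """Pick the (feature, threshold, polarity) that best separates labels
    y in {0, 1}. A depth-1 tree keeps this sketch short (an assumption)."""
    best = (0, 0.0, 1, -1.0)  # feature, threshold, polarity, accuracy
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            pred = (X[:, f] > t).astype(int)
            for pol in (1, 0):
                p = pred if pol == 1 else 1 - pred
                acc = (p == y).mean()
                if acc > best[3]:
                    best = (f, t, pol, acc)
    return best[:3]

def predict_stump(stump, X):
    f, t, pol = stump
    pred = (X[:, f] > t).astype(int)
    return pred if pol == 1 else 1 - pred

def fit_forest(X, y, n_trees=15, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))  # bootstrap sample of rows
        feats = rng.choice(X.shape[1], size=max(1, X.shape[1] // 2),
                           replace=False)      # random feature subset
        f, t, pol = fit_stump(X[idx][:, feats], y[idx])
        forest.append((feats[f], t, pol))      # map back to original feature
    return forest

def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])  # (trees, samples)
    return (votes.mean(axis=0) > 0.5).astype(int)            # majority vote
```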

Although the at least one predicted feature point detected with the tree algorithm shows good detection performance in many cases, it can exhibit a very large error (big failure) for some face images. In particular, large errors can occur depending on the pose of the subject's face in the image.

Referring to FIG. 2, it can be seen that predicted feature points detected by the tree algorithm are placed in areas unrelated to the facial contour and the eyes, nose, and mouth; that is, the errors can be large.

Accordingly, the feature point prediction unit 300 applies a regression tree classifier to the features extracted by the random forest so that the prediction converges quickly to the target value. Because the regression tree classifier converges quickly, the extracted predicted feature points can be refined by repeating the prediction a plurality of times in a cascade. Repeating the detection in this way improves the accuracy of the at least one predicted feature point.
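
The cascade refinement can be illustrated with a small training loop in which each stage learns to predict the remaining offset between the current landmark estimates and the ground truth, and the next stage starts from the corrected estimates. Two substitutions keep the sketch short and are assumptions, not the described method: each stage is an ordinary least-squares regressor rather than a regression tree ensemble, and the shape-indexed image features are simulated by the hypothetical function simulated_features, which in a real system would sample pixels around the current estimates.

```python
import numpy as np

def simulated_features(true_shapes, shapes, rng, noise=0.05):
    # HYPOTHETICAL stand-in for shape-indexed pixel features: a noisy, scaled
    # copy of the remaining offset. Real features come from image pixels.
    return 0.8 * (true_shapes - shapes) + rng.normal(0.0, noise, shapes.shape)

def train_cascade(true_shapes, init_shapes, n_stages=5, seed=0):
    """Cascaded regression: each stage fits a least-squares map from features
    to the residual (true - current) and applies it to every training shape.
    Shapes are flattened (x1, y1, ..., xL, yL) rows. Assumption: linear stages
    replace the regression trees of the described embodiment."""
    rng = np.random.default_rng(seed)
    shapes = init_shapes.copy()
    stages = []
    for _ in range(n_stages):
        feats = simulated_features(true_shapes, shapes, rng)
        X = np.hstack([feats, np.ones((len(feats), 1))])  # add bias column
        W, *_ = np.linalg.lstsq(X, true_shapes - shapes, rcond=None)
        shapes = shapes + X @ W  # refine the previous stage's estimates
        stages.append(W)
    return stages, shapes
```

With 300 training faces of 5 landmarks each (10 coordinates) and initial estimates perturbed by unit-variance noise, the mean absolute error after the cascade should fall well below its initial value; later stages contribute less because the simulated features become dominated by noise as the residual shrinks.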

As shown in FIG. 3, when the tree algorithm is refined with the regression tree classifier by repeating it a plurality of times in a cascade, the positional accuracy of the predicted feature points can be greatly improved.

Even so, the tree algorithm can still produce errors. To correct them, the facial feature point detection apparatus according to this embodiment uses the predicted feature points from the feature point prediction unit 300 as base data for detecting accurate facial feature points.

The Gaussian feature map generation unit 400 generates a Gaussian feature map with a Gaussian distribution centered on the predicted feature points extracted by the feature point prediction unit 300. It assigns pixel values so that the points around each of the at least one predicted feature point take different values according to a Gaussian distribution. That is, it generates and outputs a Gaussian feature map in which the pixel at each predicted feature point has a predetermined maximum value and pixel values decrease gradually with increasing distance from the predicted feature point.
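
A minimal NumPy sketch of the Gaussian feature map: every pixel receives peak · exp(-d² / (2σ²)), where d is its distance to a predicted feature point, so the predicted point itself holds the maximum value. The standard deviation of 3 pixels, the peak value of 1.0, and combining several landmarks into one map with a per-pixel maximum are assumptions; the description does not fix them, and one map per landmark would be an equally plausible reading.

```python
import numpy as np

def gaussian_landmark_map(shape, landmarks, sigma=3.0, peak=1.0):
    """Render a map in which each predicted landmark (x, y) has value `peak`
    and values decay as a 2-D Gaussian of the distance to it.
    Assumption: overlapping Gaussians are combined with a per-pixel maximum."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) coordinate grids
    heat = np.zeros((h, w))
    for x, y in landmarks:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        heat = np.maximum(heat, peak * np.exp(-d2 / (2.0 * sigma ** 2)))
    return heat
```

One pixel away from the peak by exactly σ the value is exp(-0.5) ≈ 0.61, so the map gives the network a soft region of interest rather than a single hard point.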

The feature map merging unit 500 receives the face region map obtained by the face region detection unit 200 and the Gaussian feature map generated by the Gaussian feature map generation unit 400, merges them in a predetermined manner, and passes the resulting merged map to the correction value estimation unit 600.

The feature map merging unit 500 merges the Gaussian feature map into the face region map before passing it to the correction value estimation unit 600 so that, when estimating feature points, the correction value estimation unit 600 can refer to the pixel values of the Gaussian feature map as pixel feature weights.

The feature map merging unit 500 may generate the merged map by combining the face region map and the Gaussian feature map in various ways, but it may also simply combine their data and pass it to the correction value estimation unit 600. That is, the feature map merging unit 500 may be configured to pass the face region map, composed of two-dimensional vector values, and the corresponding Gaussian feature map to the correction value estimation unit 600 one after the other, without any separate processing.
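
Under the simplest reading of this paragraph, merging amounts to stacking the two maps along a channel axis so that the network receives one multi-channel input. The sketch assumes this channel concatenation and a channels-last (H, W, C) layout; merge_maps is a hypothetical helper name.

```python
import numpy as np

def merge_maps(face_map, gaussian_map):
    """Stack the face-region image (H, W) or (H, W, C) and the Gaussian map
    (H, W) along the last axis, giving one (H, W, C + 1) network input.
    Assumption: simple channel concatenation, channels-last layout."""
    if face_map.ndim == 2:
        face_map = face_map[..., None]  # grayscale -> single channel
    if face_map.shape[:2] != gaussian_map.shape:
        raise ValueError("face map and Gaussian map must share height and width")
    return np.concatenate([face_map, gaussian_map[..., None]], axis=-1)
```

An RGB face crop merged with one Gaussian map therefore becomes a four-channel input, which is what the first convolution layer of the correction value estimation unit would have to accept.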

In this case, the correction value estimation unit 600 may be configured to receive the face region map and the Gaussian feature map directly from the face region detection unit 200 and the Gaussian feature map generation unit 400, and the feature map merging unit 500 may be omitted.

The correction value estimation unit 600 receives the merged map from the feature map merging unit 500 and, according to a pre-trained pattern estimation method, extracts from the merged map correction values that compensate the position of each of the at least one predicted feature point. In particular, in this embodiment the correction value estimation unit 600 extracts the correction values for the predicted feature points in the face region map by referring, according to the learned pattern estimation method, to the pixel feature weights contained in the Gaussian feature map of the merged map.

The correction value estimation unit 600 may be implemented as a pre-trained artificial neural network; for example, as shown in FIG. 4, it may be implemented as a ResNet (introduced in "Deep Residual Learning for Image Recognition"). ResNet is an artificial neural network that uses residual learning to preserve gradient values as far as possible even in deep network structures, and is known to achieve more precise detection performance.

In FIG. 4, (a) shows the schematic structure and operation of the correction value estimation unit 600 implemented as a ResNet, and (b) shows an example of a convolution box (Conv Box) of (a).

A ResNet is basically built on a convolutional neural network, but it groups a predetermined number of convolution layers (conv) into convolution boxes (Conv Box) and connects the convolution boxes (CLB1 to CLB4) through adders (AD) via shortcut connections. During training, each convolution box (CLBi) therefore learns while also taking into account the features extracted by the adjacent preceding convolution box (CLBi-1). As a result, even when the network is very deep, pattern estimation is not distorted and good performance can be achieved.
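
The role of the adder AD can be shown with a NumPy forward pass of one bottleneck convolution box: the box computes F(x) with a 1 × 1, a 3 × 3, and a 1 × 1 convolution and then adds its own input x through the shortcut. The channel widths, the ReLU placement, and the function names are assumptions following the common ResNet design, not details taken from FIG. 4; conv2d_same computes a stride-1, zero-padded cross-correlation as CNN libraries do.

```python
import numpy as np

def conv2d_same(x, k):
    """Stride-1, zero-padded convolution (cross-correlation, as in CNN
    libraries). x: (H, W, Cin), k: (kh, kw, Cin, Cout) -> (H, W, Cout)."""
    kh, kw = k.shape[:2]
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w = x.shape[:2]
    out = np.zeros((h, w, k.shape[3]))
    for dy in range(kh):
        for dx in range(kw):
            # (H, W, Cin) @ (Cin, Cout) applies one kernel tap to every pixel
            out += xp[dy:dy + h, dx:dx + w] @ k[dy, dx]
    return out

def relu(z):
    return np.maximum(z, 0.0)

def residual_box(x, k1, k3, k1b):
    """Assumed bottleneck box: 1x1 reduce -> 3x3 -> 1x1 expand, then the
    adder sums the box output with its input (identity shortcut)."""
    f = relu(conv2d_same(x, k1))
    f = relu(conv2d_same(f, k3))
    f = conv2d_same(f, k1b)
    return relu(f + x)  # adder (AD) followed by ReLU
```

Setting the last kernel to zero makes F(x) vanish, so the box reduces to ReLU(x): the input passes through the box unchanged apart from the final ReLU, which is the mechanism behind the claim that very deep stacks of such boxes remain trainable.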

Referring to FIG. 4(a), the correction value estimation unit 600 according to this embodiment may include one convolution layer CL1, a plurality of convolution box layers CBL1 to CBL4, two pooling layers PL1 and PL2, and a fully connected layer FCL. The convolution layer CL1 receives as input the merged map containing the face region map and the Gaussian feature map and performs a convolution operation using 64 kernels of size 7 × 7. The first pooling layer PL1 is a max pooling layer: it slides a 3 × 3 window over the feature map produced by the convolution of CL1 and takes the maximum of the values in the window. Each of the convolution box layers CBL1 to CBL4 contains a plurality of convolution layers (conv), as shown in (b), arranged in a bottleneck architecture in which the convolution layers have different sizes, which increases the number of feature maps while reducing the number of internal parameters.
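
The parameter saving of the bottleneck architecture is easy to check by counting convolution weights. The channel widths below (256 channels in and out, reduced to 64 inside the box) are those of the standard ResNet-50 first stage and are an assumption here, since the description does not state them; biases and batch-normalisation parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    # weights of a k x k convolution; biases and batch-norm are ignored
    return k * k * c_in * c_out

# Assumed bottleneck box (ResNet-50 style): 256 -> 64 -> 64 -> 256 channels
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))
# Comparison: two 3x3 convolutions kept at 256 channels throughout
basic = 2 * conv_params(3, 256, 256)

print(bottleneck, basic)  # 69632 1179648
```

Under these assumptions the bottleneck box needs about 6% of the weights of two full-width 3 × 3 layers while still emitting 256 feature maps.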

The second pooling layer PL2 is an average pooling layer that slides a 7 × 7 window over the feature map output by the last convolution box layer CLB4 and takes the average of the values in the window.

The fully connected layer FCL then performs classification on the feature map averaged by the second pooling layer PL2 and outputs the result. The correction values output here may be displacement information (distance and position) for moving the predicted feature points.
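
To see how the layers of FIG. 4(a) fit together, the sketch below traces the spatial size of the feature maps from the input to the fully connected output using out = floor((n + 2p - k) / s) + 1. The 224 × 224 input, the strides and paddings, the channel widths, and the 68-landmark output (an x and a y correction per landmark) are assumptions borrowed from a standard ResNet-50 layout and a common 68-point annotation scheme; the description specifies only the 7 × 7 kernels of CL1 and PL2, the 3 × 3 window of PL1, and the 64 kernels of CL1. CL1 would also take the merged map (for example 3 image channels plus 1 Gaussian channel) rather than a plain RGB image.

```python
def out_size(n, k, s, p):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def trace_shapes(size=224, n_landmarks=68):
    # Assumption: ResNet-50-style strides, paddings and channel widths;
    # only the kernel sizes of CL1, PL1 and PL2 come from the description.
    trace = []
    n = out_size(size, 7, 2, 3)
    trace.append(("CL1 conv 7x7/2", n, 64))
    n = out_size(n, 3, 2, 1)
    trace.append(("PL1 max 3x3/2", n, 64))
    for name, stride, ch in [("CBL1", 1, 256), ("CBL2", 2, 512),
                             ("CBL3", 2, 1024), ("CBL4", 2, 2048)]:
        n = out_size(n, 3, stride, 1)  # downsampling by a strided 3x3 conv
        trace.append((name, n, ch))
    n = out_size(n, 7, 1, 0)
    trace.append(("PL2 avg 7x7", n, 2048))
    trace.append(("FCL", 1, 2 * n_landmarks))  # one (dx, dy) per landmark
    return trace

for layer, side, channels in trace_shapes():
    print(f"{layer:16s} {side:4d} x {side:<4d} x {channels}")
```

The trace (112, 56, 56, 28, 14, 7, 1) shows why a 7 × 7 average pooling window fits this layout: the last convolution box emits 7 × 7 maps, which PL2 reduces to a single value per channel before the fully connected layer.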

However, the present invention is not limited thereto; in some cases, the correction value estimation unit may be implemented with another artificial neural network, such as a CNN.

The feature point acquisition unit 700 receives the predicted feature point positions from the feature point prediction unit 300 and corrects them according to the correction values received from the correction value estimation unit 600 to obtain the feature points, that is, the position information of the feature points.
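
The correction step itself is a per-landmark vector addition. The sketch assumes that the fully connected layer emits a flat vector of 2L numbers read as one (dx, dy) offset per landmark in pixel units; this layout and the name apply_corrections are assumptions made for illustration.

```python
import numpy as np

def apply_corrections(predicted, fc_output):
    """predicted: (L, 2) landmark (x, y) positions from the tree-based
    predictor. fc_output: flat vector of length 2L, read (an assumption)
    as per-landmark (dx, dy) offsets. Returns corrected positions."""
    offsets = np.asarray(fc_output, dtype=float).reshape(-1, 2)
    if offsets.shape != np.shape(predicted):
        raise ValueError("need one (dx, dy) pair per predicted landmark")
    return np.asarray(predicted, dtype=float) + offsets
```

For example, predicted points (10, 12) and (30, 40) with network output [1.5, -2, 0, 3] become (11.5, 10) and (30, 43).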

Although the correction value estimation unit 600 and the feature point acquisition unit 700 are shown above as separate components for convenience of explanation, the correction value estimation unit 600 may be included in the feature point acquisition unit 700.

Conventionally, when a facial feature point detection apparatus includes an artificial neural network such as the correction value estimation unit 600, the network is generally trained to extract facial feature points from the face region map. That is, the artificial neural network may be configured to extract facial feature points directly from the face region map, without the feature point prediction unit 300 extracting predicted feature points and without the Gaussian feature map generation unit 400 generating a Gaussian feature map.

However, when the artificial neural network is configured to extract facial feature points directly from the face region map, there is not enough training data, so it is difficult to train the network to extract facial feature points. To overcome this problem, if the positions of the predicted feature points extracted by the feature point prediction unit 300 are input to the network together with the face region map, the network can be trained to detect feature points around the predicted feature points. When the network detects feature points with reference to the predicted feature points in this way, detection performance improves and training time can be reduced. However, as described above, the predicted feature points may contain errors; when they do, the network detects feature points on the basis of those errors, so the detected feature points may also contain errors.

Therefore, in this embodiment, the Gaussian feature map generation unit 400 prevents the artificial neural network from placing excessive pixel feature weight on the predicted feature points. It generates a Gaussian feature map whose pixel feature weights follow a Gaussian distribution centered on each predicted feature point, and provides this map to the network. The network then pays more attention to the predicted feature points without concentrating on them excessively. Because it also spreads attention over the area around each predicted feature point when extracting feature points, feature point detection performance improves.
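
The Gaussian weighting described above can be sketched as follows. This is a minimal illustration, not the patented implementation. The function name `gaussian_landmark_map`, the parameter `sigma`, and the rule of keeping the per-pixel maximum where the Gaussians of neighboring feature points overlap are all assumptions. The text does not specify any of them.

```python
import math

def gaussian_landmark_map(height, width, landmarks, sigma=2.0):
    """Hypothetical sketch: one H x W map with a 2D Gaussian around each landmark.

    landmarks: list of (x, y) predicted feature point positions in pixels.
    The value is 1.0 at a landmark and decays with squared distance from it.
    Where Gaussians of different landmarks overlap, the per-pixel maximum is
    kept (an assumption; summing would be another option).
    """
    two_sigma_sq = 2.0 * sigma * sigma
    gmap = [[0.0] * width for _ in range(height)]
    for cx, cy in landmarks:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-d2 / two_sigma_sq)
                if value > gmap[y][x]:
                    gmap[y][x] = value
    return gmap
```

For example, `gaussian_landmark_map(7, 7, [(3, 3)], sigma=1.0)` puts 1.0 at row 3, column 3 and exp(-0.5), about 0.61, at each of its four direct neighbors. `sigma` controls how far attention spreads around a predicted point. As `sigma` shrinks, the map approaches a single marked pixel, which is the over-concentration the text aims to avoid.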

In addition, the artificial neural network is implemented as the correction value estimation unit 600 to improve the training process and extraction accuracy. It therefore does not extract feature points directly from the pixel feature weights in the Gaussian feature map. Instead, it obtains correction values for correcting the positions of the predicted feature points in the face region map. However, in some cases, the feature point acquisition unit 700 may be implemented as an artificial neural network that extracts feature points directly, and the correction value estimation unit 600 may be omitted.
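
The two formulations contrasted above can be written compactly as follows. The notation is introduced here for illustration and does not appear in the original:

- $M$ is the face region map.
- $G$ is the Gaussian feature map.
- $\oplus$ is the merge performed by the feature map merging unit 500.
- $\mathbf{p}^{\mathrm{pred}}_k$ is the $k$-th of $K$ predicted feature points.
- $f_\theta$ and $g_\theta$ are networks with parameters $\theta$.

```latex
\begin{aligned}
\text{direct extraction:}\quad & \hat{\mathbf{p}}_k = f_\theta(M)_k, \qquad k = 1,\dots,K \\
\text{correction (unit 600):}\quad & (\Delta_1,\dots,\Delta_K) = g_\theta(M \oplus G), \qquad
\hat{\mathbf{p}}_k = \mathbf{p}^{\mathrm{pred}}_k + \Delta_k
\end{aligned}
```

One plausible reading of why the correction form helps training: $g_\theta$ only has to learn a small residual $\Delta_k$ around a point that is already approximately right, rather than the absolute position.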

Meanwhile, the facial feature point detection apparatus according to this embodiment may further include a learning unit 800. The learning unit 800 trains the correction value estimation unit 600, which is implemented as an artificial neural network. It is used only during training and may be omitted once training is complete.

When the correction value estimation unit 600 is trained, the facial feature point detection apparatus receives an input image for which ground truth for the feature points has been obtained in advance. It then obtains the feature points acquired by the feature point acquisition unit 700 and passes them to the learning unit 800. The learning unit 800 receives the ground truth for the input image and determines the error between the positions of the acquired feature points and the ground-truth positions according to a predetermined loss function. It then backpropagates this error to the correction value estimation unit 600 to train it. Training may be performed by updating the weights of the layers of the artificial neural network that constitutes the correction value estimation unit 600.
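
The training loop described above can be sketched in miniature, under two assumptions that are not from the original:

- The "predetermined loss function" is taken to be the mean squared Euclidean distance between acquired and ground-truth feature points.
- The neural network of unit 600 is replaced by one free `(dx, dy)` correction per feature point.

The sketch shows only how the loss gradient drives the corrections toward the ground truth. Both function names are hypothetical.

```python
def landmark_mse(pred, gt):
    """Assumed loss: mean squared Euclidean distance between point lists."""
    n = len(pred)
    return sum((px - gx) ** 2 + (py - gy) ** 2
               for (px, py), (gx, gy) in zip(pred, gt)) / n

def train_offsets(predicted, ground_truth, lr=0.1, steps=100):
    """Toy stand-in for training the correction value estimation unit 600.

    Each feature point gets a free (dx, dy) correction instead of a network
    output. The gradient of landmark_mse with respect to each correction is
    2 * (p + d - g) / n. Backpropagation would deliver this same quantity to
    the network's output layer.
    """
    n = len(predicted)
    offsets = [[0.0, 0.0] for _ in range(n)]
    for _ in range(steps):
        for k, ((px, py), (gx, gy)) in enumerate(zip(predicted, ground_truth)):
            ex = px + offsets[k][0] - gx          # residual error in x
            ey = py + offsets[k][1] - gy          # residual error in y
            offsets[k][0] -= lr * 2.0 * ex / n    # gradient-descent update
            offsets[k][1] -= lr * 2.0 * ey / n
    corrected = [(px + dx, py + dy) for (px, py), (dx, dy) in zip(predicted, offsets)]
    return offsets, corrected
```

With two points and `lr=0.1`, each step shrinks every residual by a factor of 0.9. After 100 steps the corrected points essentially coincide with the ground truth. A real network would instead share its weights across images, so the learned correction would have to generalize.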

Referring to the rightmost image in FIG. 4(a), the learning unit 800 can train the correction value estimation unit 600 by computing a loss and backpropagating it. The loss is the difference between the ground-truth feature points, shown in red, and the predicted feature points obtained by the tree algorithm, shown in blue, together with the previously extracted feature points, shown in yellow. As described above, when the feature point acquisition unit 700 is implemented as an artificial neural network that extracts feature points directly, the learning unit 800 may instead backpropagate the error to the feature point acquisition unit 700 to train it.

FIG. 5 shows a facial feature point detection method according to an embodiment of the present invention.

The facial feature point detection method of FIG. 5 is described with reference to FIGS. 1 to 4. First, an input image in which feature points are to be detected is obtained (S10). The input image is a 2D image captured with an ordinary camera device or the like. It may also be a video whose frames are a sequence of consecutive 2D images.

When the input image is obtained, the face region is separated from the background region and only the face region is extracted, yielding a face region map (S20). For example, the face region map may be extracted using the HOG (Histogram of Oriented Gradients) + SVM (Support Vector Machine) method.
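
The two ingredients of the HOG + SVM method mentioned above can be sketched as follows:

- an orientation histogram of image gradients for one cell;
- the linear decision function of a trained SVM.

The sketch is deliberately simplified and is not a complete detector. Full HOG also uses interpolated voting, block normalization, and a sliding window over an image pyramid, and all of these are omitted here. The function names are hypothetical. The text does not say which implementation was used. dlib's `get_frontal_face_detector()` is one publicly available HOG + linear-SVM face detector.

```python
import math

def cell_orientation_histogram(img, x0, y0, cell=8, bins=9):
    """Simplified HOG cell: unsigned-gradient orientation histogram.

    img: 2D list of grayscale values. Uses central differences and skips the
    image border. Each pixel adds its gradient magnitude to one of `bins`
    orientation bins spanning 0 to 180 degrees (nearest bin, no interpolation).
    """
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    bin_width = 180.0 / bins
    for y in range(max(y0, 1), min(y0 + cell, h - 1)):
        for x in range(max(x0, 1), min(x0 + cell, w - 1)):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def linear_svm_score(features, weights, bias):
    """Decision value of a trained linear SVM; a positive value means 'face'."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

In a detector, the histograms of all cells in a window are concatenated into one feature vector. `linear_svm_score` is then evaluated for each window position, and windows with a positive score are kept as face candidates.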

When the face region map is obtained, feature points are predicted from it by a predetermined method and output as predicted feature points (S30). For example, the predicted feature points may be obtained with a tree algorithm. In particular, they may be refined iteratively several times in a cascade using regression tree classifiers.
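
The cascaded refinement described above can be illustrated with a minimal inference-only sketch. It uses depth-1 regression trees ("stumps") on pixel-difference features measured relative to the current shape estimate. This is an assumed simplification in the spirit of cascaded regression trees. It is not the authors' method, and it does not show how the random forest or the trees are trained. `pixel`, `RegressionStump`, and `cascade_refine` are hypothetical names.

```python
def pixel(img, x, y):
    """Nearest-pixel lookup on a 2D list image, clamped to the border."""
    h, w = len(img), len(img[0])
    return img[min(max(int(round(y)), 0), h - 1)][min(max(int(round(x)), 0), w - 1)]

class RegressionStump:
    """Hypothetical depth-1 regression tree on a shape-indexed feature.

    The feature is the intensity difference between two offsets relative to
    landmark `anchor` of the current shape. Each leaf stores one (dx, dy)
    increment per landmark.
    """
    def __init__(self, anchor, off_a, off_b, threshold, left_delta, right_delta):
        self.anchor, self.off_a, self.off_b = anchor, off_a, off_b
        self.threshold = threshold
        self.left_delta, self.right_delta = left_delta, right_delta

    def __call__(self, img, shape):
        ax, ay = shape[self.anchor]
        a = pixel(img, ax + self.off_a[0], ay + self.off_a[1])
        b = pixel(img, ax + self.off_b[0], ay + self.off_b[1])
        return self.left_delta if a - b < self.threshold else self.right_delta

def cascade_refine(img, init_shape, stages, shrinkage=1.0):
    """Refine landmarks stage by stage; each stage sums its trees' increments.

    All trees in a stage see the shape produced by the previous stage, so
    later stages correct the earlier estimate.
    """
    shape = [[float(x), float(y)] for x, y in init_shape]
    for trees in stages:
        increment = [[0.0, 0.0] for _ in shape]
        for tree in trees:
            for k, (dx, dy) in enumerate(tree(img, shape)):
                increment[k][0] += shrinkage * dx
                increment[k][1] += shrinkage * dy
        for k in range(len(shape)):
            shape[k][0] += increment[k][0]
            shape[k][1] += increment[k][1]
    return [(x, y) for x, y in shape]
```

For example, on an image whose intensity increases from left to right, a stump that moves the landmark one pixel right whenever its right neighbor is brighter pushes an initial estimate at x = 2 to x = 5 over three stages.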

When the predicted feature points are obtained, a Gaussian feature map is generated (S40). In this map, the points around each predicted feature point have pixel values that follow a Gaussian distribution centered on that predicted feature point. The generated Gaussian feature map is then merged with the face region map (S50).
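
The merge in step S50 can be sketched as follows, assuming it is a channel concatenation. This is a common way to feed an extra map to a convolutional network, but the text only says that the two maps are merged. Under this assumption, the Gaussian feature map becomes one additional input channel next to the channels of the face region map. `merge_maps` is a hypothetical name.

```python
def merge_maps(face_channels, gaussian_map):
    """Stack the Gaussian feature map as an extra channel (assumed merge rule).

    face_channels: list of C channels, each an H x W 2D list (C = 3 for RGB).
    gaussian_map:  one H x W 2D list.
    Returns a new list of C + 1 channels; the inputs are copied, not aliased.
    """
    h, w = len(gaussian_map), len(gaussian_map[0])
    for ch in face_channels:
        if len(ch) != h or any(len(row) != w for row in ch):
            raise ValueError("face region map and Gaussian map must share H x W")
    merged = [[row[:] for row in ch] for ch in face_channels]
    merged.append([row[:] for row in gaussian_map])
    return merged
```

Requiring identical height and width reflects that each Gaussian value must line up with the face-map pixel it weights.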

The correction value estimation unit 600 is implemented as an artificial neural network trained according to a predetermined pattern estimation technique. It receives the merged map of the face region map and the Gaussian feature map and estimates correction values for correcting the positions of the predicted feature points (S60).

When the correction values are estimated, they are applied to the positions of the predicted feature points to correct those positions. This yields the detected positions of the feature points (S70).
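
Steps S60 and S70 can be sketched as follows. The trained correction value estimation unit 600 is represented by an arbitrary callable `estimator`. This callable is hypothetical: it takes the merged map and returns one `(dx, dy)` correction per predicted feature point. Each correction is then added to the corresponding predicted position.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def detect_landmarks(merged_map: object,
                     predicted: Sequence[Point],
                     estimator: Callable[[object], List[Point]]) -> List[Point]:
    """S60: estimate corrections from the merged map; S70: apply them.

    `estimator` is a stand-in for the trained unit 600 and must return
    exactly one (dx, dy) correction per predicted feature point.
    """
    corrections = estimator(merged_map)  # S60
    if len(corrections) != len(predicted):
        raise ValueError("one correction per predicted feature point is required")
    # S70: final position = predicted position + estimated correction
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(predicted, corrections)]
```

For example, a stub estimator returning `[(0.5, -1.0), (0.0, 2.0)]` moves the predicted points `(10.0, 10.0)` and `(20.0, 5.0)` to `(10.5, 9.0)` and `(20.0, 7.0)`.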

The method according to the present invention may be implemented as a computer program stored in a medium for execution on a computer. The computer-readable medium may be any available medium that can be accessed by a computer and may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Examples include ROM (read-only memory), RAM (random access memory), CD-ROM (compact disc read-only memory), DVD-ROM (digital video disc read-only memory), magnetic tape, floppy disks, and optical data storage devices.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely exemplary. Those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

100: image acquisition unit 200: face region detection unit
300: feature point prediction unit 400: Gaussian feature map generation unit
500: feature map merging unit 600: correction value estimation unit
700: feature point acquisition unit 800: learning unit

Claims

A facial feature point detection apparatus comprising:
a face region detection unit configured to receive an input image including a face, detect a face region, and output a face region map;
a feature point prediction unit configured to detect feature points in the face region map according to a predetermined algorithm and output them as predicted feature points;
a Gaussian feature map generation unit configured to generate a Gaussian feature map having pixel values that follow a Gaussian distribution centered on the predicted feature points in the face region map; and
a feature point acquisition unit configured to receive the face region map and the Gaussian feature map and, according to a previously learned pattern estimation method, detect the positions of the feature points in the face region map in consideration of the pixel values of the Gaussian feature map.

The facial feature point detection apparatus of claim 1, wherein the feature point acquisition unit obtains correction values for correcting the positions of the predicted feature points based on the pixel values of the Gaussian feature map, and estimates the positions of the feature points in the face region map by correcting the positions of the predicted feature points using the obtained correction values.

The facial feature point detection apparatus of claim 2, wherein the feature point acquisition unit is implemented as a ResNet (Residual Network) including a plurality of convolutional boxes, each convolutional box including a plurality of convolutional layers and at least one adder that sums the calculation results applied from the previous convolutional box, and
wherein at least one of the plurality of convolutional layers included in each of the plurality of convolutional boxes has a different size.

The facial feature point detection apparatus of claim 1, wherein the feature point prediction unit obtains a random forest including a plurality of decision trees generated by randomly sampling the data of the face region map according to a tree algorithm, predicts facial feature points using a regression tree classifier on the features extracted by the random forest, and obtains the predicted feature points by repeating the prediction a predetermined number of times so as to correct the previously predicted facial feature points.

The facial feature point detection apparatus of claim 1, wherein the face region detection unit detects the face region by a Histogram of Oriented Gradients (HOG) + Support Vector Machine (SVM) method.

A facial feature point detection method comprising:
receiving an input image including a face, detecting a face region, and outputting a face region map;
detecting feature points in the face region map according to a predetermined algorithm and outputting them as predicted feature points;
generating a Gaussian feature map having pixel values that follow a Gaussian distribution centered on the predicted feature points in the face region map; and
receiving the face region map and the Gaussian feature map and, according to a previously learned pattern estimation method, detecting the positions of the feature points in the face region map in consideration of the pixel values of the Gaussian feature map.

The method of claim 6, wherein detecting the positions of the feature points comprises:
obtaining correction values for correcting the positions of the predicted feature points based on the pixel values of the Gaussian feature map; and
estimating the positions of the feature points in the face region map by correcting the positions of the predicted feature points using the obtained correction values.

The method of claim 6, wherein outputting the predicted feature points comprises:
obtaining a random forest including a plurality of decision trees generated by randomly sampling data of the face region map according to a tree algorithm;
predicting facial feature points using a regression tree classifier on the features extracted by the random forest; and
obtaining the predicted feature points by repeating the obtaining of the random forest and the predicting of the facial feature points a predetermined number of times so as to correct the previously predicted facial feature points.

The method of claim 6, wherein outputting the face region map comprises detecting the face region by a Histogram of Oriented Gradients (HOG) + Support Vector Machine (SVM) method.