KR20170121664A

KR20170121664A - Method and apparatus for multiple image information generation and the processing for optimal recognition performance in a deep learning

Info

Publication number: KR20170121664A
Application number: KR1020160114314A
Authority: KR
Inventors: 노용만; 김형일; 김대회; 위삼 알하즈 바다르
Original assignee: 한국과학기술원
Priority date: 2016-04-25
Filing date: 2016-09-06
Publication date: 2017-11-02
Also published as: KR101873645B1

Abstract

Disclosed are a method and apparatus for generating and processing multiple image information for optimal performance in a deep learning structure. A facial color feature learning apparatus according to an embodiment of the present invention includes: a multiplexing unit which generates color images for a plurality of predetermined colors from a facial image and processes each generated color image with a predetermined processing technique to generate image information for each color image; a selecting unit which selects one or more pieces of image information from the image information for the generated color images; and a fusion unit which fuses the features of the selected pieces of image information to generate a fused feature. Accordingly, the present invention can achieve robust recognition performance for input facial images that vary widely.

Description

The present invention relates to an apparatus and method for generating and processing multiple image information for optimal performance in a deep learning structure.

In face recognition and facial expression recognition, variations in the input face image, such as changes in illumination, pose, and resolution, are known to make recognition difficult.

In environments where input face images vary widely, deep learning based recognition methods have recently attracted attention for their very strong performance. To obtain such performance, the latent features of the input data must be sufficiently learned by the deep learning structure, and existing studies indicate that this requires a deep learning structure with a large number of layers.

In other words, the deep learning structure must be designed to suit the environment of the input image; otherwise, recognition performance degrades.

However, a deep learning structure with a very large number of layers is known to be very difficult to train and to require long training time, and the design of such a structure depends heavily on the developer's a priori knowledge. In addition, the larger the deep learning structure, the more floating point parameters it has, so it demands substantial resources in terms of storage and computation.

Therefore, a design method that can achieve optimal performance with a fixed deep learning structure is required, and research on making deep learning structures lightweight is also under way.

The present invention proposes a technique for multiplexing and processing input data in order to achieve robust recognition performance, in a fixed deep learning structure, for input face images that vary widely.

Embodiments of the present invention provide a method and apparatus capable of multiplexing and processing input data in order to achieve robust recognition performance, in a fixed deep learning structure, for input face images that vary widely.

Embodiments of the present invention also provide a method and apparatus that allow a fixed deep learning structure, including one with a small number of layers, to be trained sufficiently on input images with various changes, thereby improving the recognition performance of the deep learning structure.

Embodiments of the present invention also provide a method and apparatus that enable effective and efficient design in terms of computational complexity and storage capacity, since optimal performance can be obtained even in a deep learning structure having a small number of layers.

A facial color feature learning apparatus according to an embodiment of the present invention includes: a multiplexing processing unit that generates, for a face image, color images for a plurality of predetermined colors and generates image information for each of the color images by processing each generated color image with a predetermined processing technique; a selection unit that selects at least one piece of image information from the image information for the generated color images; and a fusion unit that fuses the features of the selected at least one piece of image information to generate a fused feature.

Furthermore, the facial color feature learning apparatus according to an embodiment of the present invention may further include a feature extraction unit that extracts a feature of each of the selected at least one piece of image information, and the fusion unit may generate the fused feature by fusing the features extracted by the feature extraction unit.

The feature extraction unit may extract the feature of each of the at least one piece of image information by learning deep convolutional neural network (DCNN) features or deep neural network (DNN) features of each piece of image information.

The multiplexing processing unit may generate the image information for each of the color images through processing using at least one preprocessing technique among illumination correction, pose correction, image alignment, and super-resolution.

The fusion unit may generate the fused feature using a concatenation technique that fuses the features of the at least one piece of image information into a single vector, or a fusion technique based on weights learned by a deep neural network.

The fusion unit may generate the fused feature by training a deep neural network (DNN) on the features of the at least one piece of image information.

The DNN may consist of a plurality of fully connected layers.

A facial color feature learning method according to an embodiment of the present invention includes: generating, for a face image, color images for a plurality of predetermined colors; generating image information for each of the color images by processing each generated color image with a predetermined processing technique; selecting at least one piece of image information from the image information for the generated color images; and generating a fused feature by fusing the features of the selected at least one piece of image information.

Furthermore, the facial color feature learning method according to an embodiment of the present invention may further include extracting a feature of each of the selected at least one piece of image information, and the generating of the fused feature may generate the fused feature by fusing the extracted features of the at least one piece of image information.

The extracting of the feature may extract the feature of each of the at least one piece of image information by learning deep convolutional neural network (DCNN) features or deep neural network (DNN) features of each piece of image information.

The generating of the image information may generate the image information for each of the color images through processing using at least one preprocessing technique among illumination correction, pose correction, image alignment, and super-resolution.

The generating of the fused feature may generate the fused feature using a concatenation technique that fuses the features of the at least one piece of image information into a single vector, or a fusion technique based on weights learned by a deep neural network.

The generating of the fused feature may generate the fused feature by training a deep neural network (DNN) on the features of the at least one piece of image information.

According to embodiments of the present invention, input data can be multiplexed and processed in order to achieve robust recognition performance, in a fixed deep learning structure, for input face images that vary widely.

In addition, according to embodiments of the present invention, a fixed deep learning structure, including one with a small number of layers, can be trained sufficiently on input images with various changes, thereby improving the recognition performance of the deep learning structure.

In addition, according to embodiments of the present invention, optimal performance can be obtained even in a deep learning structure having a small number of layers, which enables effective and efficient design in terms of computational complexity and storage capacity.

FIG. 1 is a conceptual diagram illustrating the framework of a facial color feature learning apparatus according to an embodiment of the present invention.
FIG. 2 shows an example of multiple image information.
FIG. 3 is an exemplary diagram for explaining the process of extracting multiple features from multiple input images.
FIG. 4 is an exemplary diagram for explaining the process of generating a fused feature from the extracted multiple features.
FIG. 5 is a flowchart of a facial color feature learning method according to an embodiment of the present invention.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. The same reference numerals in the drawings denote the same members.

The gist of the embodiments of the present invention is to improve the recognition performance of a deep learning structure by multiplexing and processing the input data, so that a fixed deep learning structure, including one with a small number of layers, achieves robust recognition performance for input images with various changes.

Because the embodiments of the present invention can obtain optimal performance even in a deep learning structure having a small number of layers, they enable effective and efficient design in terms of computational complexity and storage capacity.

FIG. 1 is a conceptual diagram illustrating the framework of a facial color feature learning apparatus according to an embodiment of the present invention, and shows the configuration of an apparatus that performs facial color feature learning using multiple image information generation and processing in a deep learning structure.

Referring to FIG. 1, the facial color feature learning apparatus 100 according to an embodiment of the present invention includes a multiplexing processing unit 110, a selection unit 120, a feature extraction unit 130, and a fusion unit 140.

Here, the multiplexing processing unit 110 multiplexes and processes the input image, the selection unit 120 selects from the multiple information, the feature extraction unit 130 is an extractor of deep learning based features, and the fusion unit 140 fuses the extracted multiple features.

Each component of the facial color feature learning apparatus 100 is described below.

1. Input image multiplexing and processing unit (multiplexing processing unit)

The multiplexing processing unit 110 is a module for exploiting multiple information of the input image in order to obtain optimal performance in a limited deep learning structure, and is based on generating a plurality of color representations through several color models.

Here, the multiplexing processing unit 110 may use the RGB, YCbCr, YIQ, XYZ, HSV, RIQ, RQCr, YQCr, and La*b* color models to generate, for an input image such as a face image, a color image in each of the RGB, YCbCr, YIQ, XYZ, HSV, RIQ, RQCr, YQCr, and La*b* color spaces.
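The color multiplexing described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it covers only RGB, YIQ, YCbCr, and RQCr, it uses the standard NTSC YIQ matrix and the full-range ITU-R BT.601 (JPEG) YCbCr matrix, and it assumes that the hybrid RQCr space is assembled from the R channel of RGB, the Q channel of YIQ, and the Cr channel of YCbCr, which the source does not define. The function names are hypothetical.

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix (rows: Y, I, Q).
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523,  0.312],
])

# Full-range ITU-R BT.601 RGB -> YCbCr matrix (rows: Y, Cb, Cr), as used by JPEG.
RGB_TO_YCBCR = np.array([
    [ 0.299,     0.587,     0.114   ],
    [-0.168736, -0.331264,  0.5     ],
    [ 0.5,      -0.418688, -0.081312],
])


def rgb_to_yiq(rgb):
    """Convert an (H, W, 3) float RGB image in [0, 1] to YIQ."""
    return np.asarray(rgb, dtype=np.float64) @ RGB_TO_YIQ.T


def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) float RGB image in [0, 1] to YCbCr, chroma centered at 0.5."""
    ycbcr = np.asarray(rgb, dtype=np.float64) @ RGB_TO_YCBCR.T
    ycbcr[..., 1:] += 0.5  # shift Cb and Cr into [0, 1]
    return ycbcr


def multiplex_colors(rgb):
    """Return a dict of color images generated from one RGB face image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    yiq = rgb_to_yiq(rgb)
    ycbcr = rgb_to_ycbcr(rgb)
    # Assumption: RQCr = R (from RGB), Q (from YIQ), Cr (from YCbCr).
    rqcr = np.stack([rgb[..., 0], yiq[..., 2], ycbcr[..., 2]], axis=-1)
    return {"RGB": rgb, "YIQ": yiq, "YCbCr": ycbcr, "RQCr": rqcr}
```

The remaining color models listed in the text (XYZ, HSV, RIQ, YQCr, La*b*) would be added to the returned dictionary in the same way.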

Furthermore, the multiplexing processing unit 110 may process the generated multi-color images using preprocessing techniques adapted to the environment, such as illumination correction, pose correction, image alignment, and super-resolution.

For example, the multiplexing processing unit 110 may generate multi-color images that are robust to illumination changes by preprocessing each image channel with an illumination correction method such as histogram equalization, differential operator based filtering, Weber faces, or local binary patterns.
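Of the illumination correction methods listed, histogram equalization is the simplest to illustrate. The sketch below applies it independently to each channel, assuming each channel has been quantized to 8 bits; the function names are hypothetical and the source does not specify an implementation.

```python
import numpy as np


def equalize_channel(channel, levels=256):
    """Histogram-equalize one 2-D uint8 channel with a lookup table."""
    hist = np.bincount(channel.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest occupied level
    total = channel.size
    if total == cdf_min:
        # Constant channel: nothing to spread out.
        return channel.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[channel]


def equalize_image(image):
    """Equalize every channel of an (H, W, C) uint8 image independently."""
    channels = [equalize_channel(image[..., c]) for c in range(image.shape[-1])]
    return np.stack(channels, axis=-1)
```

Equalizing per channel matches the text's "each image channel" wording; for color spaces with a separate luminance channel, one might instead equalize only that channel, which is a design choice the source leaves open.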

That is, the multiplexing processing unit 110 transforms the input face image into various color spaces to generate multi-color images and then processes them with preprocessing techniques, thereby generating multiple image information for the input image, as in the example shown in FIG. 2.

2. Multiple information selection unit (selection unit)

The selection unit 120 is a module that selects the necessary information from the multiple image information processed by the multiplexing processing unit 110.

Here, the selection unit 120 may select the necessary information based on an information combination suited to the limited number of layers, for example the color combination, the number of selected color channels, and the image information generated by the various processing techniques.

For example, when the environment varies widely, little training data is available, and the deep learning structure is relatively shallow with about two to three layers, the selection unit 120 may select a large number of pieces of image information so that all available information is used.

Here, the selection unit 120 may select the optimal combination of information by introducing various objective functions.
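The source does not say which objective function or search strategy is used. As one possible reading, the sketch below performs greedy forward selection: it repeatedly adds the candidate image information that most increases a caller-supplied objective and stops when no candidate improves it. The function `greedy_select`, its parameters, and the idea of a greedy search are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def greedy_select(candidates: Sequence[str],
                  objective: Callable[[Tuple[str, ...]], float],
                  max_inputs: int) -> Tuple[str, ...]:
    """Greedily pick up to max_inputs candidates that maximize objective(subset)."""
    selected: Tuple[str, ...] = ()
    best = float("-inf")
    while len(selected) < max_inputs:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score every one-element extension of the current selection.
        scored = [(objective(selected + (c,)), c) for c in remaining]
        top_score, top = max(scored)
        if top_score <= best:
            break  # no extension improves the objective
        selected, best = selected + (top,), top_score
    return selected
```

In practice the objective might be validation accuracy of the fixed network trained on the candidate subset, or a cheaper class-separability measure; either would be passed in as `objective`.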

3. Deep learning based feature extractor (feature extraction unit)

The feature extraction unit 130 is a module that extracts features from the information selected by the selection unit 120; a deep convolutional neural network (DCNN) composed of various convolutional layers, a deep neural network (DNN) composed of various fully connected layers, or the like may be adopted.

This deep learning structure may be trained with the same form of input, the same multiplexing and processing method, and the same multiple information selection method.

That is, as shown in FIG. 3, the feature extraction unit 130 designs an independent feature extractor (deep learning based feature extractor) for each of the multiple pieces of image information selected as input (input image 1 to input image n), and obtains complementary and discriminative features (feature 1 to feature n) from the respective feature extractors.
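The arrangement of one independent extractor per selected input can be sketched as below. Only the forward pass is shown, with a small fully connected network standing in for the DCNN or DNN; the class `FCExtractor`, the layer sizes, and the random initialization are assumptions, and training is omitted.

```python
import numpy as np


class FCExtractor:
    """A two-layer fully connected extractor with its own, unshared parameters."""

    def __init__(self, in_dim, hidden_dim, feat_dim, rng):
        self.w1 = rng.normal(0.0, 0.01, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.01, (hidden_dim, feat_dim))
        self.b2 = np.zeros(feat_dim)

    def __call__(self, image):
        x = np.asarray(image, dtype=np.float64).reshape(-1)  # flatten the input
        h = np.maximum(x @ self.w1 + self.b1, 0.0)           # ReLU hidden layer
        return h @ self.w2 + self.b2                         # feature vector


def extract_all(inputs, extractors):
    """Apply the extractor registered under each name to the matching input."""
    return {name: extractors[name](image) for name, image in inputs.items()}
```

Keeping a separate parameter set per input is what makes the extractors independent; each would be trained on its own color image before the features are passed to the fusion unit.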

In other words, the feature extraction unit 130 obtains, from the information of each selected color space, information capturing the features in that color space through DCNN training or DNN training.

4. Multi-feature fusion unit (fusion unit)

The fusion unit 140 is a module that fuses the feature vectors extracted by the deep learning based feature extractor, that is, the feature extraction unit 130, so that discriminative power is maximized; it fuses the extracted feature vectors and outputs or generates a fused feature, or fused feature vector.

Here, the fusion unit 140 may fuse the extracted features using a method such as a concatenation technique that fuses the independent features extracted by the feature extraction unit 130 into a single vector, or fusion based on weights learned by a deep neural network.

That is, as shown in FIG. 4, the fusion unit 140 fuses the features (feature 1 to feature n) learned from the selected color spaces to generate a fused feature.

Here, a fused feature combining various color features can be obtained by concatenating all feature vectors from the various color spaces in order to achieve a high face recognition rate. However, the discriminative capabilities of the feature vectors from the various color spaces differ, so simply concatenating all feature vectors is not guaranteed to yield maximum performance. In addition, the increased dimension may also cause a dimensionality problem.

In the present invention, the fused feature may be generated or obtained by training a deep neural network (DNN) composed of a plurality of fully connected layers.

Accordingly, the facial color feature learning framework of the present invention learns and gathers the feature vectors learned in the color spaces selected from among the various color spaces, so that the final output feature vector contains only the essential content, which can improve the discriminative power of the learned features.

The concatenated feature vector can be mapped by the DNN parameters to a low-dimensional mixed feature space, and the color features mixed from the various color spaces can describe the input information in a compressed form. Furthermore, discriminative feature vectors can be emphasized by the learned weights of the DNN.
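The two fusion options can be sketched as follows: plain concatenation, and concatenation followed by fully connected layers that map the result to a lower dimension. Only the forward pass is shown; the weights are assumed to have been learned elsewhere, and the function names and the use of ReLU between layers are assumptions rather than details from the source.

```python
import numpy as np


def concat_fusion(features):
    """Fuse per-input feature vectors by concatenating them into one vector."""
    return np.concatenate([np.ravel(f) for f in features])


def dnn_fusion(features, layers):
    """Map the concatenated vector through fully connected layers.

    layers is a list of (weight, bias) pairs; weight has shape (in_dim, out_dim).
    ReLU is applied after every layer except the last.
    """
    x = concat_fusion(features)
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x
```

With n features of dimension d, concatenation yields an n*d vector, whereas the fully connected layers can reduce it to any chosen output size, which is how the learned fusion addresses the dimensionality concern raised in the text.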

As described above, the apparatus according to an embodiment of the present invention can achieve robust recognition performance, in a limited deep learning structure, for widely varying input face images, based on a framework that performs multiple information extraction and processing, multiple information selection, and feature extraction and fusion by a deep learning structure.

Here, the present invention generates multi-color information and processes the multiple information in order to extract multiple information, and uses an information selection method that maximizes discriminative power for the multiple information input; the information obtained in this way can be used in training an optimal deep learning structure.

In addition, the apparatus according to an embodiment of the present invention can form a high-performance deep learning structure with relatively few layers by means of the multiple information input. Here, the number of layers of the deep learning structure may be determined by the number and types of the multiple images and the processing methods.

Therefore, by exploiting multiple information for the input image, the present invention enables optimal training of a deep learning structure having a limited number of layers, which in turn enables high-performance recognition.

In addition, since the apparatus according to an embodiment of the present invention can obtain optimal performance even in a deep learning structure having a small number of layers, it enables effective and efficient design in terms of computational complexity and storage capacity.

In particular, by training a deep learning structure having a limited or small number of layers with the proposed method, the present invention enables effective and efficient design in terms of parameter storage and the computation required for recognition. That is, it enables high-speed, low-power design.

FIG. 5 is a flowchart of a facial color feature learning method according to an embodiment of the present invention, showing the operations performed by the apparatus of FIGS. 1 to 4 described above.

Referring to FIG. 5, the facial color feature learning method according to an embodiment of the present invention generates multi-color images for an input image, for example a face image (S510).

Here, step S510 may use the RGB, YCbCr, YIQ, XYZ, HSV, RIQ, RQCr, YQCr, and La*b* color models to generate a color image of the input image in each of the RGB, YCbCr, YIQ, XYZ, HSV, RIQ, RQCr, YQCr, and La*b* color spaces.

Once the multi-color images are generated in step S510, each generated multi-color image is preprocessed using a preprocessing technique such as illumination correction, pose correction, image alignment, or super-resolution (S520).

Here, step S520 may generate multi-color images that are robust to illumination changes by preprocessing each image channel with an illumination correction method such as histogram equalization, differential operator based filtering, Weber faces, or local binary patterns.

At least one piece of image information is selected from the multi-color images processed in step S520 (S530).

Here, step S530 may select the necessary image information based on an information combination suited to the limited number of layers, for example the color combination, the number of selected color channels, and the image information generated by the various processing techniques.

Of course, when the environment varies widely, little training data is available, and the deep learning structure is relatively shallow with about two to three layers, step S530 may select a large number of pieces of image information, and it may select the optimal information combination by introducing various objective functions.

Once the image information is selected in step S530, the feature of each piece of selected image information is extracted (S540).

Here, step S540 may extract the feature of each piece of selected image information using a deep convolutional neural network (DCNN) composed of various convolutional layers, a deep neural network (DNN) composed of various fully connected layers, or the like.

Once the features of the selected image information are extracted in step S540, a fused feature is generated by fusing the extracted feature vectors so that discriminative power is maximized (S550).

Here, step S550 may generate the fused feature using a method such as a concatenation technique that fuses the extracted independent features into a single vector, or fusion based on weights learned by a deep neural network.

Step S550 may generate the fused feature by concatenating all of the extracted feature vectors. The fused feature may also be generated or obtained by training a DNN composed of a plurality of fully connected layers.
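The flow of steps S510 to S550 can be summarized as a single function that chains five pluggable stages. This is only an organizational sketch: the function name and its parameters are hypothetical, and each stage is supplied by the caller rather than implemented here.

```python
def learn_face_color_feature(face_rgb, to_color_images, preprocess,
                             select, extractors, fuse):
    """Chain the five steps of FIG. 5 using caller-supplied stages.

    to_color_images: image -> dict of name -> color image        (S510)
    preprocess:      color image -> processed image information  (S520)
    select:          dict of processed information -> list of names (S530)
    extractors:      dict of name -> feature extractor callable  (S540)
    fuse:            list of features -> fused feature           (S550)
    """
    color_images = to_color_images(face_rgb)                              # S510
    processed = {name: preprocess(img) for name, img in color_images.items()}  # S520
    chosen = select(processed)                                            # S530
    features = [extractors[name](processed[name]) for name in chosen]     # S540
    return fuse(features)                                                 # S550
```

Passing the stages as arguments keeps the network structure fixed while the multiplexing, preprocessing, and selection choices vary, which mirrors the text's emphasis on adapting the input rather than the network.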

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

Claims

1. A facial color feature learning apparatus comprising:
a multiplexing processing unit that generates color images for a plurality of predetermined colors from a facial image and generates image information for each of the color images by processing each generated color image with a predetermined processing technique;
a selecting unit that selects at least one piece of image information from among the image information generated for the color images; and
a fusion unit that generates a fusion feature by fusing features of each of the selected at least one piece of image information.

2. The apparatus of claim 1, further comprising:
a feature extracting unit that extracts a feature of each of the selected at least one piece of image information,
wherein the fusion unit generates the fusion feature by fusing the features of each of the at least one piece of image information extracted by the feature extracting unit.

3. The apparatus of claim 2,
wherein the feature extracting unit extracts the feature of each of the at least one piece of image information by learning deep convolutional neural network (DCNN) or deep neural network (DNN) features of each of the at least one piece of image information.

4. The apparatus of claim 1,
wherein the multiplexing processing unit generates the image information for each of the color images through processing that uses at least one preprocessing technique among illumination correction, pose correction, image alignment, and super resolution.

5. The apparatus of claim 1,
wherein the fusion unit generates the fusion feature using either a concatenation technique that fuses the features of each of the at least one piece of image information into a single vector, or a fusion technique that applies weights through deep neural network learning.

6. The apparatus of claim 1,
wherein the fusion unit generates the fusion feature by training a deep neural network (DNN) on the features of each of the at least one piece of image information.

7. The apparatus of claim 6,
wherein the DNN comprises a plurality of fully connected layers.

8. A facial color feature learning method comprising:
generating color images for a plurality of predetermined colors from a facial image;
generating image information for each of the color images by processing each generated color image with a predetermined processing technique;
selecting at least one piece of image information from among the image information generated for the color images; and
generating a fusion feature by fusing features of each of the selected at least one piece of image information.

9. The method of claim 8, further comprising:
extracting a feature of each of the selected at least one piece of image information,
wherein the generating of the fusion feature generates the fusion feature by fusing the features of each of the at least one piece of image information extracted by the feature extracting unit.

10. The method of claim 9,
wherein the extracting of the feature extracts the feature of each of the at least one piece of image information by learning deep convolutional neural network (DCNN) or deep neural network (DNN) features of each of the at least one piece of image information.

11. The method of claim 8,
wherein the generating of the image information generates the image information for each of the color images through processing that uses at least one preprocessing technique among illumination correction, pose correction, image alignment, and super resolution.

12. The method of claim 8,
wherein the generating of the fusion feature generates the fusion feature using either a concatenation technique that fuses the features of each of the at least one piece of image information into a single vector, or a fusion technique that applies weights through deep neural network learning.

13. The method of claim 8,
wherein the generating of the fusion feature generates the fusion feature by training a deep neural network (DNN) on the features of each of the at least one piece of image information.

14. The method of claim 13,
wherein the DNN comprises a plurality of fully connected layers.