KR20210048622A

KR20210048622A - Apparatus and method for recognizing gender using image reconstruction based on deep learning

Info

Publication number: KR20210048622A
Application number: KR1020190131908A
Authority: KR
Inventors: 박강령; 백나래
Original assignee: 동국대학교 산학협력단
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2021-05-04
Also published as: KR102299360B1

Abstract

The present invention relates to a gender recognition technology, and more particularly, to a gender recognition technology using deep learning-based image restoration. According to an embodiment of the present invention, it is possible to accurately recognize the gender of a person displayed in a low-resolution input image. The gender recognition technology includes: a reconstruction unit for generating a restored image by removing noise from an input image using an image restoration convolutional neural network, and generating a reconstructed image by performing image super-resolution reconstruction through a deep convolutional neural network; a normalizer which normalizes the reconstructed image and an infrared image; a score calculator for calculating a score by inputting the reconstructed image and the infrared image into a gender recognition neural network; a final score calculator for calculating a final score by combining the score of the reconstructed image and the score of the infrared image; and a gender recognition unit for generating the gender information according to the final score.

Description

Gender recognition apparatus and method using deep learning-based image restoration {APPARATUS AND METHOD FOR RECOGNIZING GENDER USING IMAGE RECONSTRUCTION BASED ON DEEP LEARNING}

The present invention relates to gender recognition technology and, more particularly, to gender recognition technology using deep learning-based image restoration.

Existing gender recognition systems achieve high performance by performing gender recognition on high-resolution face images captured frontally at close range.

However, gender recognition is increasingly required in long-distance or outdoor environments, such as with surveillance cameras, rather than indoors. Gender recognition is difficult in long-distance or outdoor environments, and because existing gender recognition systems use only visible-light images, they are strongly affected by factors such as background, clothing, hairstyle, and accessories.

The background art of the present invention is disclosed in Korean Registered Patent Publication No. 10-1827538.

The present invention provides a gender recognition apparatus and method that accurately recognize the gender of a person appearing in a low-resolution input image using deep learning-based image restoration.

According to one aspect of the present invention, a gender recognition apparatus is provided.

A gender recognition apparatus according to an embodiment of the present invention may include: a reconstruction unit that generates a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generates a reconstructed image by performing image super-resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR); a normalization unit that normalizes the reconstructed image and an infrared image; a score calculation unit that calculates scores by inputting the reconstructed image and the infrared image into a gender recognition neural network; a final score calculation unit that calculates a final score by combining the score of the reconstructed image and the score of the infrared image; and a gender recognition unit that generates gender information according to the final score.

The gender recognition neural network may be a neural network trained using a residual learning scheme with a shortcut structure.

The normalization unit may normalize the reconstructed image and the infrared image to a preset rectangular (non-square) size.

The image restoration convolutional neural network may learn a residual image containing unnecessary information and generate the restored image by subtracting the residual image from the input image.

The deep convolutional neural network may learn shape information of a person and generate the reconstructed image by adding the shape information to the restored image.
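The two residual relationships described here (subtracting a learned residual to obtain the restored image, then adding learned shape information to obtain the reconstructed image) can be illustrated with a toy calculation. This is a minimal sketch, not the networks of the invention: the residual and shape values are hard-coded stand-ins for what trained networks would predict, and the function names `subtract_residual` and `add_shape` are hypothetical.

```python
def subtract_residual(image, residual):
    """Restoration step: restored = input - predicted residual."""
    return [[p - r for p, r in zip(img_row, res_row)]
            for img_row, res_row in zip(image, residual)]


def add_shape(restored, shape):
    """Super-resolution step: reconstructed = restored + predicted shape detail."""
    return [[p + s for p, s in zip(img_row, shp_row)]
            for img_row, shp_row in zip(restored, shape)]


# Toy 2x2 grayscale patch; every value below is made up for illustration.
input_image = [[10, 12], [11, 13]]
predicted_residual = [[1, -1], [0, 2]]  # stand-in for the restoration CNN output
predicted_shape = [[0, 3], [2, 0]]      # stand-in for the VDSR output

restored = subtract_residual(input_image, predicted_residual)
reconstructed = add_shape(restored, predicted_shape)
print(restored)       # [[9, 13], [11, 11]]
print(reconstructed)  # [[9, 16], [13, 11]]
```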

According to another aspect of the present invention, a gender recognition method performed by a gender recognition apparatus is provided.

A gender recognition method according to an embodiment of the present invention may include: generating a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generating a reconstructed image by performing image super-resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR); normalizing the reconstructed image and an infrared image; calculating scores by inputting the reconstructed image and the infrared image into a gender recognition neural network; calculating a final score by combining the score of the reconstructed image and the score of the infrared image; and generating gender information according to the final score.

The step of normalizing the reconstructed image and the infrared image may be a step of normalizing the reconstructed image and the infrared image to a preset rectangular size.

According to yet another aspect of the present invention, there is provided a computer program that executes the gender recognition method and is recorded on a computer-readable recording medium.

According to an embodiment of the present invention, the gender of a person appearing in a low-resolution input image can be recognized accurately.

In addition, according to an embodiment of the present invention, a gender recognition function can be provided that is robust to the blur and noise that arise depending on the capture environment.

In addition, according to an embodiment of the present invention, using an infrared image can prevent the drop in gender recognition performance caused by fine details of the visible-light image such as background, clothing, hairstyle, and accessories.

FIG. 1 is a block diagram schematically illustrating a gender recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of the image restoration convolutional neural network used by the gender recognition apparatus according to an embodiment of the present invention.
FIG. 3 is a table illustrating the structure of the image restoration convolutional neural network used by the gender recognition apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the structure of the deep convolutional neural network used by the gender recognition apparatus according to an embodiment of the present invention.
FIG. 5 is a table illustrating the structure of the deep convolutional neural network used by the gender recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a restored image and a reconstructed image generated by the reconstruction unit of the gender recognition apparatus according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the structure of the gender recognition neural network of the gender recognition apparatus according to an embodiment of the present invention.
FIG. 8 is a table illustrating the structure of the gender recognition neural network of the gender recognition apparatus according to an embodiment of the present invention.
FIG. 9 is a graph illustrating the gender recognition performance of the gender recognition apparatus according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method by which the gender recognition apparatus according to an embodiment of the present invention recognizes gender.
FIG. 11 is a table showing the EER obtained when the gender recognition apparatus according to an embodiment of the present invention recognizes gender on several image sets.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the invention unnecessarily. In addition, singular expressions used in this specification and the claims should generally be construed to mean "one or more" unless stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a block diagram schematically illustrating a gender recognition apparatus according to an embodiment of the present invention; FIG. 2 is a diagram, and FIG. 3 a table, illustrating the structure of the image restoration convolutional neural network used by the gender recognition apparatus; FIG. 4 is a diagram, and FIG. 5 a table, illustrating the structure of the deep convolutional neural network used by the gender recognition apparatus; FIG. 6 is a diagram illustrating a restored image and a reconstructed image generated by the reconstruction unit of the gender recognition apparatus; FIG. 7 is a diagram, and FIG. 8 a table, illustrating the structure of the gender recognition neural network of the gender recognition apparatus; and FIG. 9 is a graph illustrating the gender recognition performance of the gender recognition apparatus.

Referring to FIG. 1, a gender recognition apparatus according to an embodiment of the present invention includes a reconstruction unit 110, a normalization unit 120, a score calculation unit 130, a final score calculation unit 140, and a gender recognition unit 150.

The reconstruction unit 110 receives an input image, which is a visible-light image, and performs image reconstruction to generate a reconstructed image. For example, the reconstruction unit 110 may generate a restored image by removing noise from the input image using an image restoration convolutional neural network (image restoration CNN). If the reconstruction process is applied directly to the input image, boundary artifacts occur and degrade gender recognition performance. Accordingly, the reconstruction unit 110 according to an embodiment of the present invention performs noise removal before the reconstruction process and can thereby improve gender recognition performance. The image restoration convolutional neural network used by the reconstruction unit 110 includes seven convolution layers, as shown in FIG. 2. As shown in FIG. 3, the first through sixth convolution layers each include 64 filters, and the seventh convolution layer includes 3 filters. Every convolution layer has a 3x3 kernel size and a 1x1 stride. In addition, the first and seventh convolution layers may have 1x1 padding, the second and sixth convolution layers 2x2 padding, the third and fifth convolution layers 3x3 padding, and the fourth convolution layer 4x4 padding.
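The padding schedule of the seven layers can be checked against the standard convolution output-size formula. The sketch below is a plain-Python illustration under one loudly stated assumption: the text lists padding but no dilation, and the code assumes dilation equal to padding, which is one reading under which every 3x3, stride-1 layer keeps the spatial size unchanged. The names `conv_output_size` and `trace_sizes` are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0, dilation=1):
    """Standard convolution output-size formula along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Padding of layers 1..7 as listed in the text.
PADDINGS = [1, 2, 3, 4, 3, 2, 1]


def trace_sizes(size, dilations):
    """Return the feature-map size before layer 1 and after each of the 7 layers."""
    sizes = [size]
    for pad, dil in zip(PADDINGS, dilations):
        sizes.append(conv_output_size(sizes[-1], padding=pad, dilation=dil))
    return sizes


# ASSUMPTION (not stated in the text): dilation equals padding in every layer.
same_size = trace_sizes(100, dilations=PADDINGS)
# For comparison: with no dilation the listed paddings would enlarge the map.
growing = trace_sizes(100, dilations=[1] * 7)

print(same_size)  # [100, 100, 100, 100, 100, 100, 100, 100]
print(growing)    # [100, 100, 102, 106, 112, 116, 118, 118]
```

Under the dilation assumption the restored image has the same height and width as the input, which is what the subsequent subtraction of a residual image from the input requires.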

The reconstruction unit 110 generates a reconstructed image by performing image super-resolution reconstruction on the restored image through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR). As shown in FIGS. 4 and 5, the deep convolutional neural network includes 20 convolution layers, each with a 3x3 kernel, a 1x1 stride, and 1x1 padding; the first through nineteenth convolution layers may each include 64 filters, and the twentieth convolution layer may include 3 filters. The deep convolutional neural network may learn shape information of a person and generate the reconstructed image by adding the shape information to the restored image.
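To give a sense of scale for the 20-layer configuration, the following sketch counts learnable parameters and the receptive field of a stack of 3x3, stride-1 convolutions. It is only an illustration under assumptions that the text does not state: a 3-channel input and one bias term per filter. The helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel=3, bias=True):
    """Weights (plus optional biases) of one convolution layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def network_params(in_channels, filters):
    """Total parameters of a plain stack of convolution layers."""
    total = 0
    for out_channels in filters:
        total += conv_params(in_channels, out_channels)
        in_channels = out_channels
    return total


def receptive_field(num_layers, kernel=3):
    """Receptive field of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel - 1)


# Layers 1-19 have 64 filters and layer 20 has 3 filters, as in the text.
VDSR_FILTERS = [64] * 19 + [3]

# ASSUMPTION: 3-channel input and biased layers.
print(network_params(3, VDSR_FILTERS))  # 668227
print(receptive_field(20))              # 41
```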

That is, the reconstruction unit 110 may input an input image such as 610 of FIG. 6 into the image restoration convolutional neural network to generate a restored image such as 620. The reconstruction unit 110 may then input the restored image into the deep convolutional neural network to generate a reconstructed image such as 630.

The reconstruction unit 110 transmits the reconstructed image to the normalization unit 120.

The normalization unit 120 receives the reconstructed image from the reconstruction unit 110, receives an infrared image generated by an infrared camera, and normalizes the reconstructed image and the infrared image to a specified size. For example, taking the proportions of the human body into account, the normalization unit 120 may normalize the reconstructed image not to a square but to a 1:2.27 size (197 x 447 pixels). A square image loses much of the information that distinguishes men from women, such as body shape and body proportions; by normalizing to a rectangular image instead, the gender recognition apparatus can prevent a drop in recognition performance. The normalization unit 120 transmits the normalized reconstructed image and infrared image to the score calculation unit 130.
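The size normalization can be sketched with a nearest-neighbor resize in plain Python. The text does not say which interpolation method is used, so nearest-neighbor is an assumption chosen only for brevity, as is the reading of 197 as the width and 447 as the height; `resize_nearest` is a hypothetical helper.

```python
# Target size from the text; width/height assignment is an assumption.
TARGET_WIDTH, TARGET_HEIGHT = 197, 447


def resize_nearest(image, width=TARGET_WIDTH, height=TARGET_HEIGHT):
    """Resize a 2-D list of pixels with nearest-neighbor sampling.

    ASSUMPTION: the interpolation method is not specified in the text.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


# Made-up 40x90 grayscale image standing in for a cropped body region.
small = [[(x + y) % 256 for x in range(40)] for y in range(90)]
normalized = resize_nearest(small)

print(len(normalized[0]), len(normalized))          # 197 447
print(round(TARGET_HEIGHT / TARGET_WIDTH, 2))       # 2.27
```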

The score calculation unit 130 calculates a score for each image by inputting the reconstructed image and the infrared image separately into the gender recognition neural network. The gender recognition neural network is a neural network trained using a residual learning scheme with a shortcut structure. In general, a convolutional neural network loses more information the deeper it becomes; the gender recognition neural network used by the score calculation unit 130 adopts the shortcut structure to retain information that has not yet been lost to the convolution filters. For example, as shown in FIGS. 7 and 8, the gender recognition neural network includes the shortcut structure, and conv2 through conv5 have a bottleneck structure comprising convolution filters in the order 1x1, 3x3, 1x1. The first 1x1 convolution filter serves to reduce the dimension, and the dimension is then expanded again through the subsequent 3x3 and 1x1 convolution filters. As described above, the bottleneck structure uses the 1x1 convolution filter to perform channel reduction and thereby reduces the amount of computation. In addition, the gender recognition neural network may be a neural network trained from scratch on reconstructed images and infrared images. Accordingly, the gender recognition neural network can extract detailed information about a person from the visible-light-based reconstructed image, and can extract supplementary information from the infrared image, which is less affected by environmental factors such as lighting, background, clothing, and accessories.
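The computational saving of a 1x1, 3x3, 1x1 bottleneck can be made concrete by counting convolution weights. The sketch below compares one bottleneck block with a plain block of two 3x3 convolutions at the same width. The channel counts (256 reduced to 64 and expanded back to 256) are an assumption borrowed from common residual-network configurations; the text does not give the channel widths, and the function names are hypothetical.

```python
def conv_weights(in_ch, out_ch, kernel):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


def bottleneck_weights(channels, reduced):
    """1x1 reduce -> 3x3 at the reduced width -> 1x1 expand."""
    return (conv_weights(channels, reduced, 1)
            + conv_weights(reduced, reduced, 3)
            + conv_weights(reduced, channels, 1))


def plain_block_weights(channels):
    """Two 3x3 convolutions at full width, for comparison."""
    return 2 * conv_weights(channels, channels, 3)


# ASSUMPTION: 256 channels reduced to 64; widths are not stated in the text.
bottleneck = bottleneck_weights(256, 64)
plain = plain_block_weights(256)

print(bottleneck)                    # 69632
print(plain)                         # 1179648
print(round(plain / bottleneck, 1))  # 16.9
```

Because the multiply-accumulate count per output position scales with the weight count, the same ratio applies to the computation for a fixed feature-map size.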

The score calculation unit 130 transmits the respective scores for the reconstructed image and the infrared image to the final score calculation unit 140.

The final score calculation unit 140 calculates a final score by combining the score of the reconstructed image and the score of the infrared image. For example, the final score calculation unit 140 may calculate the final score as in Equation 1 or Equation 2 below.

Here, WS and WP are the final scores, S_vis is the score of the reconstructed image, S_ir is the score of the infrared image, and W is a preset weight.
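Equations 1 and 2 themselves are not reproduced in this text. As an illustration only, the sketch below shows two commonly used score-fusion rules, a weighted sum and a weighted product, that are consistent with the symbol names WS, WP, S_vis, S_ir, and W. The exact functional forms, including the use of (1 - W) as the complementary weight, are assumptions and may differ from the equations of the invention.

```python
def weighted_sum(s_vis, s_ir, w):
    """ASSUMED form of a weighted-sum fusion: WS = W*S_vis + (1 - W)*S_ir."""
    return w * s_vis + (1.0 - w) * s_ir


def weighted_product(s_vis, s_ir, w):
    """ASSUMED form of a weighted-product fusion: WP = S_vis**W * S_ir**(1 - W)."""
    return (s_vis ** w) * (s_ir ** (1.0 - w))


# Made-up scores and weight, for illustration only.
s_vis, s_ir, w = 0.8, 0.6, 0.5
ws = weighted_sum(s_vis, s_ir, w)
wp = weighted_product(s_vis, s_ir, w)
print(round(ws, 4))  # 0.7
print(round(wp, 4))  # 0.6928
```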

The final score calculation unit 140 transmits the final score to the gender recognition unit 150.

The gender recognition unit 150 outputs gender information indicating that the person appearing in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold. Performance can be expressed by the Type I error, the rate at which a male is misjudged as female, and the Type II error, the rate at which a female is misjudged as male. Type I and Type II errors are normally in a trade-off relationship: as the Type I error increases the Type II error decreases, and as the Type I error decreases the Type II error increases. The error at the point where the Type I error and the Type II error are equal is called the equal error rate (EER). The gender recognition unit 150 may use the threshold at the point where the EER is obtained as the threshold for gender recognition. That is, the gender recognition unit 150 may use the threshold at which the EER, corresponding to the dotted line in FIG. 9, is obtained.
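Selecting the threshold at the EER point can be sketched as a search over candidate thresholds for the one where the two error rates are closest. This is a minimal illustration on made-up scores, not the evaluation procedure of the invention; the function names and the choice to report the mean of the two rates as the EER are assumptions.

```python
def error_rates(male_scores, female_scores, threshold):
    """Scores >= threshold are classified as male, otherwise female."""
    # Type I error: males misjudged as female.
    type1 = sum(s < threshold for s in male_scores) / len(male_scores)
    # Type II error: females misjudged as male.
    type2 = sum(s >= threshold for s in female_scores) / len(female_scores)
    return type1, type2


def eer_threshold(male_scores, female_scores):
    """Return (threshold, EER) where Type I and Type II errors are closest."""
    candidates = sorted(set(male_scores) | set(female_scores))

    def gap(t):
        type1, type2 = error_rates(male_scores, female_scores, t)
        return abs(type1 - type2)

    best = min(candidates, key=gap)
    type1, type2 = error_rates(male_scores, female_scores, best)
    return best, (type1 + type2) / 2


# Made-up final scores for illustration.
male = [0.9, 0.8, 0.7, 0.4]
female = [0.1, 0.2, 0.3, 0.6]
threshold, eer = eer_threshold(male, female)
print(threshold, eer)  # 0.6 0.25
```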

FIG. 10 is a flowchart illustrating a method by which a gender recognition apparatus according to an embodiment of the present invention recognizes gender. Each step described below is performed by a respective functional unit of the gender recognition apparatus, but for a concise and clear description, the gender recognition apparatus is referred to collectively as the subject of each step.

Referring to FIG. 10, in step 1010 the gender recognition apparatus generates a restored image by removing noise from the input image, which is a visible-light image, using an image restoration convolutional neural network. The image restoration convolutional neural network includes seven convolution layers, as shown in FIG. 2. As shown in FIG. 3, the first through sixth convolution layers of the image restoration convolutional neural network may each include 64 filters, and the seventh convolution layer may include 3 filters. Every convolution layer has a 3x3 kernel size and a 1x1 stride. In addition, the first and seventh convolution layers may have 1x1 padding, the second and sixth convolution layers 2x2 padding, the third and fifth convolution layers 3x3 padding, and the fourth convolution layer 4x4 padding. The image restoration convolutional neural network may be a network that learns a residual image containing the unnecessary information present in the low-resolution image and subtracts that residual image from the low-resolution image.

In step 1020, the gender recognition apparatus generates a reconstructed image by performing image super-resolution reconstruction on the restored image through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR). As shown in FIGS. 4 and 5, the deep convolutional neural network includes 20 convolution layers, each with a 3x3 kernel, a 1x1 stride, and 1x1 padding; the first through nineteenth convolution layers may each include 64 filters, and the twentieth convolution layer may include 3 filters.

In step 1030, the gender recognition apparatus may normalize the reconstructed image to a predetermined rectangular size. For example, taking the proportions of the human body into account, the gender recognition apparatus may normalize the reconstructed image not to a square but to a 1:2.27 size (197 x 447 pixels). A square image loses much of the information that distinguishes men from women, such as body shape and body proportions; by normalizing to a rectangular image instead, the gender recognition apparatus can prevent a drop in recognition performance.

In step 1040, the gender recognition apparatus calculates a score by inputting the reconstructed image into the gender recognition neural network. The gender recognition neural network is a neural network trained using a residual learning scheme with a shortcut structure. For example, as shown in FIGS. 7 and 8, the gender recognition neural network includes the shortcut structure, and conv2 through conv5 have a bottleneck structure comprising convolution filters in the order 1x1, 3x3, 1x1. The first 1x1 convolution filter serves to reduce the dimension, and the dimension is then expanded again through the subsequent 3x3 and 1x1 convolution filters. As described above, the bottleneck structure uses the 1x1 convolution filter to perform channel reduction and thereby reduces the amount of computation. In addition, the gender recognition neural network may be a neural network trained from scratch on reconstructed images and infrared images.

In step 1050, the gender recognition apparatus normalizes the infrared image to a predetermined rectangular size. For example, the gender recognition apparatus may normalize the infrared image to a 1:2.27 size (197 x 447 pixels).

In step 1060, the gender recognition apparatus calculates a score by inputting the infrared image into the gender recognition neural network.

In step 1070, the gender recognition apparatus calculates a final score by performing score fusion on the score of the reconstructed image and the score of the infrared image.

In step 1080, the gender recognition apparatus generates gender information according to the final score. For example, the gender recognition apparatus may output gender information indicating that the person appearing in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold.
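The order of steps 1010 through 1080 can be summarized as a small orchestration function. This is a structural sketch only: every stage is passed in as a callable, the stubs in the usage example are trivial placeholders rather than trained networks, a single `score` callable is assumed for both image types, and all names are hypothetical.

```python
def recognize_gender(visible, infrared, restore, super_resolve,
                     normalize, score, fuse, threshold):
    """Run the flow of steps 1010-1080 with caller-supplied stages."""
    restored = restore(visible)               # step 1010: noise removal
    reconstructed = super_resolve(restored)   # step 1020: super-resolution
    s_vis = score(normalize(reconstructed))   # steps 1030-1040
    s_ir = score(normalize(infrared))         # steps 1050-1060
    final = fuse(s_vis, s_ir)                 # step 1070: score fusion
    return "male" if final >= threshold else "female"  # step 1080


# Placeholder stages (NOT real networks): identity transforms, mean as score.
def identity(x):
    return x


def mean_score(x):
    return sum(x) / len(x)


def average(a, b):
    return (a + b) / 2


label_high = recognize_gender([0.9, 0.7], [0.6, 0.8], identity, identity,
                              identity, mean_score, average, threshold=0.5)
label_low = recognize_gender([0.1, 0.3], [0.2, 0.4], identity, identity,
                             identity, mean_score, average, threshold=0.5)
print(label_high, label_low)  # male female
```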

FIG. 11 is a table showing the equal error rate (EER) obtained when a gender recognition apparatus according to an embodiment of the present invention recognizes gender for multiple images.

Referring to FIG. 11, when the gender recognition apparatus recognizes gender on the datasets SYSU-MM01 and DBGender-DB2, the EER is 5.27% and 10.98%, respectively. This confirms that the gender recognition apparatus according to an embodiment of the present invention can increase the accuracy of gender recognition by generating a reconstructed image after removing noise from a low-resolution image, and by recognizing gender through score fusion of the reconstructed image and the infrared image.
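For reference, the EER is the error rate at the operating point where the two kinds of misclassification are equally frequent. The following sketch approximates it by sweeping the decision threshold over the observed scores; the scores in the example are made up for illustration and are not the data behind FIG. 11, and `equal_error_rate` is a hypothetical helper.

```python
def equal_error_rate(male_scores, female_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A sample is classified as male when its score is >= the threshold.
    """
    best_gap = None
    best_eer = None
    for threshold in sorted(set(male_scores) | set(female_scores)):
        # Males scored below the threshold are misclassified as female.
        male_error = sum(s < threshold for s in male_scores) / len(male_scores)
        # Females scored at or above the threshold are misclassified as male.
        female_error = sum(s >= threshold for s in female_scores) / len(female_scores)
        gap = abs(male_error - female_error)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (male_error + female_error) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up final scores for four male and four female samples.
    males = [0.9, 0.8, 0.7, 0.3]
    females = [0.1, 0.2, 0.4, 0.6]
    print(f"EER: {equal_error_rate(males, females):.2%}")
```

A lower EER indicates better separation between the two classes, so the values reported above can be compared directly across datasets.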

The gender recognition method described above may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, so that it can be used there.

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or as operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although the operations are illustrated in a specific order in the drawings, it should not be understood that the operations must be executed in the specific order shown or in sequential order, or that all illustrated operations must be executed, to obtain a desired result. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of various components in the embodiments described above should not be understood as being required, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

The present invention has been described above with reference to its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered from a descriptive point of view rather than a limiting one. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

A gender recognition apparatus comprising:
a reconstruction unit that generates a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generates a reconstructed image by performing image super-resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR);
a normalization unit that normalizes the reconstructed image and an infrared image;
a score calculator that calculates a score by inputting the reconstructed image and the infrared image into a gender recognition neural network;
a final score calculator that calculates a final score by combining the score of the reconstructed image and the score of the infrared image; and
a gender recognition unit that generates gender information according to the final score.

The gender recognition apparatus of claim 1,
wherein the gender recognition neural network is a neural network trained using residual learning with a shortcut structure.

The gender recognition apparatus of claim 1,
wherein the normalization unit normalizes the reconstructed image and the infrared image to a preset rectangular size.

The gender recognition apparatus of claim 1,
wherein the image restoration convolutional neural network learns a residual image including unnecessary information, and generates the restored image by subtracting the residual image from the input image.

The gender recognition apparatus of claim 1,
wherein the deep convolutional neural network learns shape information of a person, and generates the reconstructed image by adding the shape information to the restored image.

A gender recognition method by which a gender recognition apparatus recognizes gender, the method comprising:
generating a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generating a reconstructed image by performing image super-resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR);
normalizing the reconstructed image and an infrared image;
calculating a score by inputting the reconstructed image and the infrared image into a gender recognition neural network;
calculating a final score by combining the score of the reconstructed image and the score of the infrared image; and
generating gender information according to the final score.

The gender recognition method of claim 6,
wherein the gender recognition neural network is a neural network trained using residual learning with a shortcut structure.

The gender recognition method of claim 6,
wherein the normalizing of the reconstructed image and the infrared image comprises normalizing the reconstructed image and the infrared image to a preset rectangular size.

The gender recognition method of claim 6,
wherein the image restoration convolutional neural network learns a residual image including unnecessary information, and generates the restored image by subtracting the residual image from the input image.

The gender recognition method of claim 6,
wherein the deep convolutional neural network learns shape information of a person, and generates the reconstructed image by adding the shape information to the restored image.

A computer program recorded on a computer-readable recording medium for executing the gender recognition method of any one of claims 6 to 10.