KR102299360B1

KR102299360B1 - Apparatus and method for recognizing gender using image reconstruction based on deep learning

Info

Publication number: KR102299360B1
Application number: KR1020190131908A
Authority: KR
Inventors: 박강령; 백나래
Original assignee: 동국대학교 산학협력단
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2021-09-08
Also published as: KR20210048622A

Abstract

The present invention relates to gender recognition technology and, more particularly, to gender recognition technology using deep learning-based image restoration. According to an embodiment of the present invention, the gender of a person appearing in a low-resolution input image can be recognized accurately.

Description

Apparatus and method for gender recognition using deep learning-based image restoration

The present invention relates to gender recognition technology and, more particularly, to gender recognition technology using deep learning-based image restoration.

Existing gender recognition systems achieve high performance by performing gender recognition on high-resolution face images captured frontally at close range.

However, gender recognition is also required in long-distance or outdoor environments, such as with surveillance cameras, rather than only in indoor environments. Gender recognition is difficult in long-distance or outdoor environments, and because existing gender recognition systems use only visible light images, they are strongly affected by factors such as background, clothing, hairstyle, and accessories.

The background art of the present invention is disclosed in Korean Patent No. 10-1827538.

The present invention provides a gender recognition apparatus and method that accurately recognize the gender of a person appearing in a low-resolution input image by using deep learning-based image restoration.

According to one aspect of the present invention, a gender recognition apparatus is provided.

A gender recognition apparatus according to an embodiment of the present invention may include: a reconstruction unit that generates a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generates a reconstructed image by performing image super resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR); a normalization unit that normalizes the reconstructed image and an infrared image; a score calculation unit that calculates scores by inputting the reconstructed image and the infrared image into a gender recognition neural network; a final score calculation unit that calculates a final score by combining the score of the reconstructed image and the score of the infrared image; and a gender recognition unit that generates gender information according to the final score.

The gender recognition neural network may be a neural network trained using residual learning with a shortcut structure.

The normalization unit may normalize the reconstructed image and the infrared image to a preset rectangular size.

The image restoration CNN may learn a residual image containing unnecessary information and generate the restored image by subtracting the residual image from the input image.

The deep convolutional neural network may learn shape information of a person and generate the reconstructed image by adding the shape information to the restored image.

According to another aspect of the present invention, a gender recognition method performed by a gender recognition apparatus is provided.

A gender recognition method according to an embodiment of the present invention may include: generating a restored image by removing noise from an input image using an image restoration convolutional neural network (image restoration CNN), and generating a reconstructed image by performing image super resolution reconstruction through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR); normalizing the reconstructed image and an infrared image; calculating scores by inputting the reconstructed image and the infrared image into a gender recognition neural network; calculating a final score by combining the score of the reconstructed image and the score of the infrared image; and generating gender information according to the final score.

The step of normalizing the reconstructed image and the infrared image may be a step of normalizing the reconstructed image and the infrared image to a preset rectangular size.

According to yet another aspect of the present invention, a computer program that executes the gender recognition method and is recorded on a computer-readable recording medium is provided.

According to an embodiment of the present invention, the gender of a person appearing in a low-resolution input image can be recognized accurately.

In addition, according to an embodiment of the present invention, a gender recognition function that is robust to blur and noise arising from the capture environment can be provided.

In addition, according to an embodiment of the present invention, using an infrared image can prevent the drop in gender recognition performance caused by detailed regions of the visible light image such as background, clothing, hairstyle, and accessories.

FIG. 1 is a block diagram briefly illustrating a gender recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of the image restoration CNN used by a gender recognition apparatus according to an embodiment of the present invention.
FIG. 3 is a table illustrating the structure of the image restoration CNN used by a gender recognition apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the structure of the deep convolutional neural network used by a gender recognition apparatus according to an embodiment of the present invention.
FIG. 5 is a table illustrating the structure of the deep convolutional neural network used by a gender recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a restored image and a reconstructed image generated by the reconstruction unit of a gender recognition apparatus according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the structure of the gender recognition neural network of a gender recognition apparatus according to an embodiment of the present invention.
FIG. 8 is a table illustrating the structure of the gender recognition neural network of a gender recognition apparatus according to an embodiment of the present invention.
FIG. 9 is a graph illustrating the gender recognition performance of a gender recognition apparatus according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method by which a gender recognition apparatus according to an embodiment of the present invention recognizes gender.
FIG. 11 is a table showing the EER obtained when a gender recognition apparatus according to an embodiment of the present invention recognizes gender on several image sets.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. In addition, singular expressions used in this specification and the claims should generally be construed to mean "one or more" unless stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions thereof are omitted.

FIG. 1 is a block diagram briefly illustrating a gender recognition apparatus according to an embodiment of the present invention; FIG. 2 is a diagram illustrating the structure of the image restoration CNN used by the gender recognition apparatus; FIG. 3 is a table illustrating the structure of the image restoration CNN; FIG. 4 is a diagram illustrating the structure of the deep convolutional neural network used by the gender recognition apparatus; FIG. 5 is a table illustrating the structure of the deep convolutional neural network; FIG. 6 is a diagram illustrating a restored image and a reconstructed image generated by the reconstruction unit of the gender recognition apparatus; FIG. 7 is a diagram illustrating the structure of the gender recognition neural network of the gender recognition apparatus; FIG. 8 is a table illustrating the structure of the gender recognition neural network; and FIG. 9 is a graph illustrating the gender recognition performance of the gender recognition apparatus.

Referring to FIG. 1, a gender recognition apparatus according to an embodiment of the present invention includes a reconstruction unit 110, a normalization unit 120, a score calculation unit 130, a final score calculation unit 140, and a gender recognition unit 150.

The reconstruction unit 110 receives an input image, which is a visible light image, and performs image reconstruction to generate a reconstructed image. For example, the reconstruction unit 110 may generate a restored image by removing noise from the input image using an image restoration convolutional neural network (image restoration CNN). If the reconstruction process is applied directly to the input image, boundary artifacts occur and degrade gender recognition performance. Therefore, the reconstruction unit 110 according to an embodiment of the present invention removes noise before the reconstruction process, which can improve gender recognition performance. The image restoration CNN used by the reconstruction unit 110 includes seven convolution layers, as shown in FIG. 2. As shown in FIG. 3, the first through sixth convolution layers each include 64 filters, and the seventh convolution layer includes 3 filters. All convolution layers have a kernel size of 3x3 and a stride of 1x1. In addition, the first and seventh convolution layers may have 1x1 padding, the second and sixth convolution layers 2x2 padding, the third and fifth convolution layers 3x3 padding, and the fourth convolution layer 4x4 padding.
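The layer configuration described for the image restoration CNN can be checked with a short sketch. The text gives kernel, stride, and padding but does not mention dilation; with a plain 3x3 kernel, a padding larger than 1 would enlarge the feature map at every layer. The sketch below therefore compares that case with the assumption that each layer's dilation equals its padding, which would keep the spatial size constant. The dilation, the 3-channel input, the bias terms, and all names are assumptions made for illustration only and are not taken from the patent.

```python
# (filters, padding) for the seven layers, as listed in the text for FIG. 3.
RESTORATION_LAYERS = [(64, 1), (64, 2), (64, 3), (64, 4), (64, 3), (64, 2), (3, 1)]
KERNEL = 3
STRIDE = 1


def conv_output_size(size, padding, dilation=1, kernel=KERNEL, stride=STRIDE):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_sizes(size, dilated):
    """Return the spatial size after each of the seven layers.

    dilated=True assumes dilation == padding (an assumption, not stated
    in the source); dilated=False uses ordinary convolutions.
    """
    sizes = []
    for _filters, padding in RESTORATION_LAYERS:
        dilation = padding if dilated else 1
        size = conv_output_size(size, padding, dilation)
        sizes.append(size)
    return sizes


def parameter_count(in_channels=3):
    """Weights plus biases, assuming a 3-channel input and biased layers."""
    total = 0
    for filters, _padding in RESTORATION_LAYERS:
        total += in_channels * filters * KERNEL * KERNEL + filters
        in_channels = filters
    return total


if __name__ == "__main__":
    print(trace_sizes(197, dilated=True))   # size stays constant under the assumption
    print(trace_sizes(197, dilated=False))  # size grows when padding exceeds 1
    print(parameter_count())
```

Under the dilation assumption the output keeps the input size, which is what subtracting a learned residual image from the input would require.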

The reconstruction unit 110 generates a reconstructed image by performing image super resolution reconstruction on the restored image through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR). As shown in FIGS. 4 and 5, the deep convolutional neural network includes 20 convolution layers, each with a 3x3 kernel, a 1x1 stride, and 1x1 padding; the first through nineteenth convolution layers may each include 64 filters, and the twentieth convolution layer may include 3 filters. The deep convolutional neural network may learn shape information of a person and generate the reconstructed image by adding the shape information to the restored image.
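Two properties of the VDSR configuration described above can be illustrated in a few lines: the receptive field of 20 stacked 3x3 layers, and the final step in which learned shape information is added to the restored image. This is a minimal sketch; the nested-list image representation, the clipping to the 0 to 255 range, and the function names are assumptions for illustration, not details given in the patent.

```python
VDSR_DEPTH = 20  # number of convolution layers stated in the text
VDSR_FILTERS = [64] * 19 + [3]  # layers 1-19: 64 filters, layer 20: 3 filters


def receptive_field(depth=VDSR_DEPTH, kernel=3):
    """Receptive field of `depth` stacked stride-1 convolutions."""
    return 1 + depth * (kernel - 1)


def add_residual(restored, detail):
    """Add predicted detail to the restored image, pixel by pixel.

    Both arguments are same-sized 2D lists of numbers. Clipping to
    [0, 255] is an assumption for 8-bit images.
    """
    return [
        [min(255, max(0, r + d)) for r, d in zip(restored_row, detail_row)]
        for restored_row, detail_row in zip(restored, detail)
    ]


if __name__ == "__main__":
    print(receptive_field())  # 41
    print(add_residual([[100, 200]], [[5, -10]]))  # [[105, 190]]
```

Because every layer uses 1x1 padding with a 3x3 kernel and stride 1, the spatial size is unchanged through the network, so the element-wise addition is well defined.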

That is, the reconstruction unit 110 may input an input image such as 610 in FIG. 6 into the image restoration CNN to generate a restored image such as 620. The reconstruction unit 110 may then input the restored image into the deep convolutional neural network to generate a reconstructed image such as 630.

The reconstruction unit 110 transmits the reconstructed image to the normalization unit 120.

The normalization unit 120 receives the reconstructed image from the reconstruction unit 110, receives an infrared image generated by an infrared camera, and normalizes the reconstructed image and the infrared image to a specified size. For example, rather than using a square, the normalization unit 120 may normalize the reconstructed image to 197 x 447 pixels (a ratio of 1:2.27) in consideration of the proportions of the human body. A square image loses much of the information that distinguishes men from women, such as body shape and body proportions; by normalizing to a rectangular image instead, the gender recognition apparatus can prevent degradation of recognition performance. The normalization unit 120 transmits the normalized reconstructed image and infrared image to the score calculation unit 130.
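The size normalization can be sketched as a resize to the stated 197 x 447 target. The patent does not say which interpolation is used, so nearest-neighbor sampling is chosen here purely as an assumption because it needs no external library; the function name and the nested-list image representation are likewise illustrative.

```python
TARGET_WIDTH = 197   # pixels, from the text
TARGET_HEIGHT = 447  # pixels, from the text


def resize_nearest(image, width=TARGET_WIDTH, height=TARGET_HEIGHT):
    """Resize a 2D list (rows of pixels) with nearest-neighbor sampling.

    Interpolation method is an assumption; the source only gives the
    target size.
    """
    src_height = len(image)
    src_width = len(image[0])
    return [
        [image[y * src_height // height][x * src_width // width]
         for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    out = resize_nearest(tiny)
    print(len(out[0]), len(out))                       # 197 447
    print(round(TARGET_HEIGHT / TARGET_WIDTH, 2))      # 2.27
```

The same function would be applied to both the reconstructed image and the infrared image so that the two inputs to the gender recognition neural network share one shape.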

The score calculation unit 130 calculates scores by inputting the reconstructed image and the infrared image, respectively, into the gender recognition neural network. The gender recognition neural network is a neural network trained using residual learning with a shortcut structure. In general, a convolutional neural network loses more information the deeper it becomes; the gender recognition neural network used by the score calculation unit 130 adopts the shortcut structure to carry forward information that has not been lost to the convolutional filters. For example, as shown in FIGS. 7 and 8, the gender recognition neural network includes the shortcut structure, and conv2 through conv5 have a bottleneck structure containing convolutional filters in the order 1x1, 3x3, 1x1. The first 1x1 convolutional filter serves to reduce the dimension, and the dimension is expanded again through the subsequent 3x3 and 1x1 convolutional filters. As described above, the bottleneck structure uses the 1x1 convolutional filter to perform channel reduction, which reduces the amount of computation. In addition, the gender recognition neural network may be a neural network trained from scratch on reconstructed images and infrared images. Therefore, the gender recognition neural network can extract detailed information about a person from the visible light-based reconstructed image, and can extract additional information from the infrared image, which is less affected by environmental factors such as lighting, background, clothing, and accessories.
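The computational saving from the 1x1, 3x3, 1x1 bottleneck can be made concrete by counting multiply-accumulate operations per output pixel. The channel widths used below (256 reduced to 64) are illustrative assumptions; the patent's actual widths are in FIG. 8, which is not reproduced in this text. Function names are hypothetical.

```python
def conv_macs(in_channels, out_channels, kernel):
    """Multiply-accumulates per output pixel for one convolution (bias ignored)."""
    return in_channels * out_channels * kernel * kernel


def bottleneck_macs(channels, reduced):
    """1x1 reduce, 3x3 at the reduced width, then 1x1 expand back."""
    return (
        conv_macs(channels, reduced, 1)
        + conv_macs(reduced, reduced, 3)
        + conv_macs(reduced, channels, 1)
    )


def plain_block_macs(channels):
    """Two full-width 3x3 convolutions, for comparison."""
    return 2 * conv_macs(channels, channels, 3)


if __name__ == "__main__":
    # Assumed widths: 256 channels reduced to 64 inside the bottleneck.
    print(bottleneck_macs(256, 64))  # 69632
    print(plain_block_macs(256))     # 1179648
```

With these assumed widths the bottleneck needs roughly one seventeenth of the operations of two full-width 3x3 convolutions, which is the channel-reduction effect the description refers to.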

The score calculation unit 130 transmits the respective scores for the reconstructed image and the infrared image to the final score calculation unit 140.

The final score calculation unit 140 calculates a final score by combining the score of the reconstructed image and the score of the infrared image. For example, the final score calculation unit 140 may calculate the final score as in Equation 1 or Equation 2 below.

Here, WS and WP are the final scores, S_vis is the score of the reconstructed image, and S_ir is the score of the infrared image. W is a preset weight.
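Equations 1 and 2 are not reproduced in this text, so their exact form is unknown here. Reading WS and WP as a weighted sum and a weighted product is itself an assumption, and the sketch below shows one common form of each rule, using a single weight w for the visible score and 1 - w for the infrared score. It should not be taken as the patent's formulas; the function names and the weighting scheme are hypothetical.

```python
def weighted_sum(s_vis, s_ir, w):
    """Assumed weighted-sum fusion: w * S_vis + (1 - w) * S_ir."""
    return w * s_vis + (1.0 - w) * s_ir


def weighted_product(s_vis, s_ir, w):
    """Assumed weighted-product fusion: S_vis**w * S_ir**(1 - w).

    Scores are assumed to be positive (for example, probabilities).
    """
    return (s_vis ** w) * (s_ir ** (1.0 - w))


if __name__ == "__main__":
    s_vis, s_ir, w = 0.8, 0.4, 0.5  # illustrative values only
    print(weighted_sum(s_vis, s_ir, w))
    print(weighted_product(s_vis, s_ir, w))
```

In either form, the preset weight controls how much the visible light-based score contributes relative to the infrared score.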

The final score calculation unit 140 transmits the final score to the gender recognition unit 150.

The gender recognition unit 150 outputs gender information indicating that the person in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold. Performance can be expressed by the Type I error, the rate at which a male is misjudged as female, and the Type II error, the rate at which a female is misjudged as male. Type I error and Type II error are typically in a trade-off relationship: as the Type I error increases the Type II error decreases, and as the Type I error decreases the Type II error increases. The error at the point where the Type I error and the Type II error become equal is called the equal error rate (EER). The gender recognition unit 150 may use the threshold at the point where the EER is obtained as the threshold for gender recognition. That is, the gender recognition unit 150 may use the threshold at which the EER, corresponding to the dotted line in FIG. 9, is obtained.
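The threshold selection described above can be sketched as a search for the threshold at which the Type I and Type II error rates are closest to equal. The sketch follows the definitions in the text (score at or above the threshold means male); the candidate-threshold search, the averaging of the two rates when they are not exactly equal, and all names and sample scores are assumptions for illustration.

```python
def error_rates(male_scores, female_scores, threshold):
    """Return (Type I, Type II) error rates at a given threshold.

    Type I: a male scored below the threshold (misjudged as female).
    Type II: a female scored at or above the threshold (misjudged as male).
    """
    type1 = sum(s < threshold for s in male_scores) / len(male_scores)
    type2 = sum(s >= threshold for s in female_scores) / len(female_scores)
    return type1, type2


def eer_threshold(male_scores, female_scores):
    """Pick the candidate threshold where the two error rates are closest."""
    best = None
    for threshold in sorted(set(male_scores) | set(female_scores)):
        type1, type2 = error_rates(male_scores, female_scores, threshold)
        gap = abs(type1 - type2)
        if best is None or gap < best[0]:
            best = (gap, threshold, (type1 + type2) / 2)
    _gap, threshold, eer = best
    return threshold, eer


def classify(final_score, threshold):
    """Apply the decision rule from the text."""
    return "male" if final_score >= threshold else "female"


if __name__ == "__main__":
    males = [0.9, 0.8, 0.7, 0.4]      # made-up final scores
    females = [0.1, 0.2, 0.3, 0.6]    # made-up final scores
    threshold, eer = eer_threshold(males, females)
    print(threshold, eer)
    print(classify(0.75, threshold))
```

On real data the two rates rarely coincide exactly at a sampled threshold, which is why the sketch reports their mean at the closest point.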

FIG. 10 is a flowchart illustrating a method by which a gender recognition apparatus according to an embodiment of the present invention recognizes gender. Each step described below is performed by the corresponding functional unit of the gender recognition apparatus; however, for a concise and clear description, the gender recognition apparatus is collectively referred to as the subject of each step.

Referring to FIG. 10, in step 1010 the gender recognition apparatus generates a restored image by removing noise from an input image, which is a visible light image, using an image restoration convolutional neural network. The image restoration CNN includes seven convolution layers, as shown in FIG. 2. As shown in FIG. 3, the first through sixth convolution layers of the image restoration CNN may each include 64 filters, and the seventh convolution layer may include 3 filters. All convolution layers have a kernel size of 3x3 and a stride of 1x1. In addition, the first and seventh convolution layers may have 1x1 padding, the second and sixth convolution layers 2x2 padding, the third and fifth convolution layers 3x3 padding, and the fourth convolution layer 4x4 padding. The image restoration CNN may be a network that learns a residual image containing the unnecessary information present in the low-resolution image and subtracts that residual image from the low-resolution image.

In step 1020, the gender recognition apparatus generates a reconstructed image by performing image super resolution reconstruction on the restored image through a deep convolutional neural network (very deep convolutional networks super resolution, VDSR). As shown in FIGS. 4 and 5, the deep convolutional neural network includes 20 convolution layers, each with a 3x3 kernel, a 1x1 stride, and 1x1 padding; the first through nineteenth convolution layers may each include 64 filters, and the twentieth convolution layer may include 3 filters.

In step 1030, the gender recognition apparatus may normalize the reconstructed image to a predetermined rectangular size. For example, rather than using a square, the gender recognition apparatus may normalize the reconstructed image to 197 x 447 pixels (a ratio of 1:2.27) in consideration of the proportions of the human body. A square image loses much of the information that distinguishes men from women, such as body shape and body proportions; by normalizing to a rectangular image instead, the gender recognition apparatus can prevent degradation of recognition performance.

In step 1040, the gender recognition apparatus calculates a score by inputting the reconstructed image into the gender recognition neural network. The gender recognition neural network is a neural network trained using residual learning with a shortcut structure. For example, as shown in FIGS. 7 and 8, the gender recognition neural network includes the shortcut structure, and conv2 through conv5 have a bottleneck structure containing convolutional filters in the order 1x1, 3x3, 1x1. The first 1x1 convolutional filter serves to reduce the dimension, and the dimension is expanded again through the subsequent 3x3 and 1x1 convolutional filters. As described above, the bottleneck structure uses the 1x1 convolutional filter to perform channel reduction, which reduces the amount of computation. In addition, the gender recognition neural network may be a neural network trained from scratch on reconstructed images and infrared images.

In step 1050, the gender recognition apparatus normalizes the infrared image to a predetermined rectangular size. For example, the gender recognition apparatus may normalize the infrared image to 197 x 447 pixels (a ratio of 1:2.27).

In step 1060, the gender recognition apparatus calculates a score by inputting the infrared image into the gender recognition neural network.

In step 1070, the gender recognition apparatus calculates a final score by performing score fusion on the score of the reconstructed image and the score of the infrared image.

In step 1080, the gender recognition apparatus generates gender information according to the final score. For example, the gender recognition apparatus may output gender information indicating that the person in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold.
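Steps 1010 through 1080 can be summarized as a single control flow. In the sketch below every network and operation is passed in as a callable, so the code shows only the order of the steps and makes no claim about how each one is implemented; the function and parameter names are hypothetical, and the stand-in callables in the usage example are toy placeholders, not models.

```python
def recognize_gender(visible, infrared, restore, super_resolve, normalize,
                     score_visible, score_infrared, fuse, threshold):
    """Run the described steps in order and return "male" or "female"."""
    restored = restore(visible)                      # step 1010: remove noise
    reconstructed = super_resolve(restored)          # step 1020: VDSR reconstruction
    s_vis = score_visible(normalize(reconstructed))  # steps 1030 and 1040
    s_ir = score_infrared(normalize(infrared))       # steps 1050 and 1060
    final_score = fuse(s_vis, s_ir)                  # step 1070: score fusion
    return "male" if final_score >= threshold else "female"  # step 1080


if __name__ == "__main__":
    # Toy stand-ins: images are lists of numbers, scores are their means.
    def identity(image):
        return image

    def mean_score(image):
        return sum(image) / len(image)

    def average(a, b):
        return (a + b) / 2

    print(recognize_gender([0.9, 0.7], [0.5, 0.7], identity, identity,
                           identity, mean_score, mean_score, average, 0.5))
```

Steps 1030 to 1040 and steps 1050 to 1060 do not depend on each other, so the visible and infrared branches could also be evaluated in parallel before fusion.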

FIG. 11 is a table showing the EER obtained when a gender recognition apparatus according to an embodiment of the present invention recognizes gender on several image sets.

Referring to FIG. 11, when the gender recognition apparatus recognizes gender on the datasets SYSU-MM01 and DBGender-DB2, the EERs are 5.27% and 10.98%, respectively. This confirms that the gender recognition apparatus according to an embodiment of the present invention can increase the accuracy of gender recognition by removing noise from the low-resolution image, then generating a reconstructed image, and recognizing gender through score fusion of the reconstructed image and the infrared image.

The gender recognition method described above may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, built-in hard disk). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, so that it can be used there.

Although all components constituting the embodiments of the present invention have been described above as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although operations are shown in a specific order in the drawings, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all illustrated operations must be performed, to obtain a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various components in the embodiments described above should not be understood as being required, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is indicated in the claims rather than in the foregoing description, and all differences within the scope equivalent thereto should be construed as being included in the present invention.

Claims

A gender recognition apparatus comprising:
a reconstruction unit that generates a restored image by removing noise from an input image using an image restoration convolutional neural network, and generates a reconstructed image by performing super-resolution reconstruction on the restored image through a deep convolutional neural network;
a normalization unit that normalizes the reconstructed image and an infrared image;
a score calculation unit that calculates a score for each of the reconstructed image and the infrared image by inputting them into a gender recognition neural network;
a final score calculation unit that calculates a final score by score fusion of the score of the reconstructed image and the score of the infrared image; and
a gender recognition unit that outputs gender information indicating that the person shown in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold,
wherein the image restoration convolutional neural network generates the restored image by learning a residual image containing unnecessary information and subtracting the residual image from the input image,
wherein the deep convolutional neural network learns human shape information and generates the reconstructed image by adding the shape information to the restored image,
wherein the gender recognition neural network is trained using residual learning with a shortcut structure and a bottleneck structure,
wherein the final score calculation unit calculates the final score by Equation 1 or Equation 2 below,
[Equation 1]

[Equation 2]

where WS and WP are the final scores, Svis is the score of the reconstructed image, Sir is the score of the infrared image, and W is a preset weight, and
wherein the threshold is set using the error value at which the error rate of erroneously determining a male as female equals the error rate of erroneously determining a female as male.

delete

A gender recognition method comprising:
generating a restored image by removing noise from an input image using an image restoration convolutional neural network, and generating a reconstructed image by performing super-resolution reconstruction on the restored image through a deep convolutional neural network;
normalizing the reconstructed image and an infrared image;
calculating a score for each of the reconstructed image and the infrared image by inputting them into a gender recognition neural network;
calculating a final score by score fusion of the score of the reconstructed image and the score of the infrared image; and
outputting gender information indicating that the person shown in the input image is male if the final score is greater than or equal to a preset threshold, and female if the final score is less than the threshold,
wherein the image restoration convolutional neural network generates the restored image by learning a residual image containing unnecessary information and subtracting the residual image from the input image,
wherein the deep convolutional neural network learns human shape information and generates the reconstructed image by adding the shape information to the restored image,
wherein the gender recognition neural network is trained using residual learning with a shortcut structure and a bottleneck structure,
wherein the calculating of the final score calculates the final score by Equation 1 or Equation 2 below,
[Equation 1]

[Equation 2]

where WS and WP are the final scores, Svis is the score of the reconstructed image, Sir is the score of the infrared image, and W is a preset weight, and
wherein the threshold is set using the error value at which the error rate of erroneously determining a male as female equals the error rate of erroneously determining a female as male.

delete

A computer program that executes the gender recognition method according to claim 6 and is recorded in a computer-readable recording medium.