KR20230016470A - Method and apparatus for detecting mask wearing

Info

Publication number: KR20230016470A
Application number: KR1020210098007A
Authority: KR
Inventors: 장우진
Original assignee: 주식회사 에스원
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-02-02
Also published as: KR102615951B1

Abstract

Provided is a method for detecting mask wearing from an entire image input to a mask wearing detection device. The method comprises: normalizing and pre-processing an image of a face region detected from the entire image; and detecting, from the normalized and pre-processed face-region image and based on a deep learning network, a normal mask-wearing state, in which the mask completely covers the nose and mouth (the respiratory organs), and an abnormal mask-wearing state.

Description

Mask wearing detection method and device {METHOD AND APPARATUS FOR DETECTING MASK WEARING}

The present invention relates to a method and apparatus for detecting mask wearing, and more particularly to a method and apparatus that can detect whether a mask is worn normally by using a lightweight neural network.

Mask wearing has been made mandatory to prevent the spread of the virus. Because wearing a mask was previously not mandatory in daily life, the need for a function that detects the absence of a mask was not prominent. Now, however, there is a growing need for a function that can check whether the nose and mouth are completely covered in order to prevent infectious disease. In particular, demand for devices with mask wearing detection is increasing in public facilities, large supermarkets, and similar places, so a function that detects normal mask wearing in accordance with the government's mask-wearing guidelines is needed.

With the recent development of deep learning-based image recognition, the technology is now used in a variety of settings, such as AI-enabled intelligent CCTV and face recognition access control.

In deep learning, performance generally improves as the network becomes deeper. With recent advances in deep learning-based image processing, many lightweight network architectures that retain high accuracy, such as SqueezeNet, MobileNet, and ShuffleNet, have been proposed. Even these lightweight networks, however, require computation times ranging from hundreds of milliseconds to several seconds when run on low-performance processors, such as low-cost or older processors. In an environment where the function is triggered frequently, such as an entrance door, long computation times cause inconvenience.

In addition, detecting mask wearing with deep learning requires first detecting a person's face and then determining whether the mask is worn normally. If this whole sequence is run in a single new network, functions that are already in use, such as face detection, are performed redundantly, and the amount of computation required by the overall system increases. Therefore, for smooth and rapid mask wearing detection, there is a need for a lightweight deep learning network that guarantees fast operation even on low-cost and older processors and that can be used flexibly on the various platforms already installed.

The problem to be solved by the present invention is to provide a mask wearing detection method and apparatus that can detect whether a mask is worn normally.

A further problem to be solved by the present invention is to provide a mask wearing detection method and apparatus that reduce the amount of computation and guarantee fast operation by making use of deep learning functions that are already in use.

According to one embodiment of the present invention, a method is provided for detecting mask wearing from an entire image input to a mask wearing detection device. The method includes normalizing and pre-processing an image of a face region detected from the entire image, and detecting, from the normalized and pre-processed face-region image and based on a deep learning network, a normal mask-wearing state, in which the mask completely covers the nose and mouth (the respiratory organs), and an abnormal mask-wearing state. The deep learning network includes at least one convolution layer and a fully connected layer that generates a single output value from the data output by the at least one convolution layer. The at least one convolution layer includes at least one inverted block, which extracts features while reducing the resolution, and at least one inverted residual block, which extracts features with the number of channels increased each time an inverted block is passed.

The normalizing and pre-processing may include obtaining the positions of the eyes, nose, and mouth in the face region detected from the entire image, and moving the positions of the eyes, nose, and mouth in the face region to designated positions.

The normalizing and pre-processing may include cropping only the face-region image from the entire image, and converting the face-region image into a grayscale image.

The detecting may include classifying the face-region image as the normal mask-wearing state when the output value of the deep learning network is greater than a set threshold, and classifying the face-region image as the abnormal mask-wearing state when the output value is less than or equal to the threshold.

Classifying the image as the abnormal mask-wearing state may include classifying it, based on the output value of the deep learning network, as one of: a poorly worn state in which the nose or mouth is exposed, an unmasked state in which no mask is worn, and a state in which the face is covered by a hand or an object.

The normalizing and pre-processing may include receiving a face detection result from a previously installed face detector, and detecting the face region from the entire image based on the face detection result.

According to another embodiment of the present invention, a mask wearing detection device for detecting mask wearing is provided. The device includes a normalization processing unit that crops a face-region image from an entire image based on a face detection result and sets the positions of the eyes, nose, and mouth in the face region, and a deep learning detection unit that uses the face-region image in which those positions have been set as input data of a deep learning network to detect a normal mask-wearing state, in which the mask completely covers the nose and mouth (the respiratory organs), and an abnormal mask-wearing state. The deep learning network includes at least one convolution layer and a fully connected layer that generates a single output value from the data output by the at least one convolution layer. The at least one convolution layer may include at least one inverted block, which extracts features while reducing the resolution, and at least one inverted residual block, which extracts features with the number of channels increased each time an inverted block is passed.

The deep learning detection unit may include a classification unit that classifies the face-region image as either the normal mask-wearing state or the abnormal mask-wearing state by comparing the output value of the deep learning network with a set threshold.

The mask wearing detection device may further include an image pre-processing unit that converts the face-region image in which the positions of the eyes, nose, and mouth have been set into a grayscale image of a set size and outputs it to the deep learning network.

The mask wearing detection device may further include a face detector that detects a face from the entire input image, generates the face detection result, and passes the face detection result to the normalization processing unit.

According to embodiments of the present invention, it is possible to determine whether a mask is worn correctly to prevent respiratory infection, and to detect abnormal mask wearing in which a mask is worn but the nose or mouth, which are respiratory organs, is partly exposed. The invention can therefore be used in public facilities where masks must be worn to prevent infectious disease.

In addition, because mask wearing detection is separated from face detection and face alignment, the existing infrastructure of sites that already use face detection can be used as is.

Furthermore, when existing infrastructure is used for face detection, the deep learning network for mask wearing detection only has to determine whether a mask is worn. Mask wearing detection can therefore be implemented even on a low-cost processor, and with a high-performance processor it can run in an even shorter time.

FIG. 1 is a diagram schematically showing a mask wearing detection device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the mask wearing determination unit shown in FIG. 1.
FIG. 3 is a diagram showing an example of the structure of the deep learning network of the deep learning detection unit shown in FIG. 2.
FIG. 4 is a diagram showing an example of an inverted block of the convolution layers shown in FIG. 3.
FIG. 5 is a diagram showing an example of an inverted residual block of the convolution layers shown in FIG. 3.
FIG. 6 is a flowchart showing a mask wearing detection method according to an embodiment of the present invention.
FIG. 7 is a diagram showing a mask wearing detection device according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily carry them out. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification and claims, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

A mask wearing detection method and apparatus according to embodiments of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a diagram schematically showing a mask wearing detection device according to an embodiment of the present invention.

First, a mask may be defined as a cloth (cotton) covering that is worn on the face, either hooked over the ears or fastened like a belt, so as to cover the mouth and nose, which are respiratory organs.

Referring to FIG. 1, the mask wearing detection device 100 may include a face detector 110 and a mask wearing determination unit 120.

The face detector 110 detects a face from an input image, and may do so using deep learning. As the face detection result, the face detector 110 passes the positions of both eyes, the tip of the nose, and both end points of the lips to the mask wearing determination unit 120.

In general, the face detector 110 is used in many areas of video surveillance, such as face recognition access control and intelligent CCTV. In an environment where a face detector 110 is already in use, the mask wearing detection device 100 can use the face detection function of that existing face detector 110.

On receiving the face detection result from the face detector 110, the mask wearing determination unit 120 uses it to determine whether the mask is worn normally. The mask wearing determination unit 120 crops the face region using the positions of the eyes, the tip of the nose, and both end points of the lips received from the face detector 110, and then determines whether the mask is worn normally using only the face-region image. In making this determination, the mask wearing determination unit 120 may use a deep learning network.

If an existing face detector 110 is used for mask wearing detection, the deep learning network for mask wearing detection only has to determine whether a mask is worn. This lowers the difficulty of training and simplifies the learning problem, which makes computation more efficient and improves mask wearing detection performance.

FIG. 2 is a diagram showing the mask wearing determination unit shown in FIG. 1.

Referring to FIG. 2, the mask wearing determination unit 120 includes a normalization processing unit 122, an image pre-processing unit 124, a deep learning detection unit 126, and a determination unit 128.

The face detector 110 detects whether a face is present in the image of the entire area and where the eyes, nose, and mouth of the face are located in the image, and outputs the position of the face, including the in-image positions of the eyes, nose, and mouth.

The normalization processing unit 122 receives the positions of the eyes, the tip of the nose, and both end points of the lips from the face detector 110, and uses them to crop only the face region from the image of the entire area. Depending on the pose of the face, the positions of the eyes, nose, and mouth in the cropped face-region image may be inaccurate. To limit the performance degradation of the deep learning detection unit 126 caused by such positional errors, the normalization processing unit 122 performs a normalization step that moves the eyes, nose, and mouth in the face region to designated positions. The normalization processing unit 122 may align the face by applying a similarity transformation so that the eyes, nose, and mouth land on the designated positions. Aligning the face in this way yields face images of a reasonably consistent shape, which reduces the factors that could cause the deep learning network for mask determination to malfunction.
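As an illustration of the alignment step, the following is a minimal sketch of fitting a similarity transformation that maps five detected landmarks onto designated positions. It is not taken from the specification: the function names, the least-squares fitting method, and the template coordinates for a 128x128 crop are all assumptions made for the example. Each point is treated as a complex number, so the transformation is w = a*z + b, where a encodes rotation and uniform scale and b encodes translation.

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. Returns complex (a, b) such that
    dst ~= a * src + b, with a = rotation + uniform scale, b = translation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Closed-form least-squares solution for a, then b from the centroids.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, point):
    """Apply the fitted transform to a single (x, y) point."""
    w = a * complex(*point) + b
    return (w.real, w.imag)


# Hypothetical designated positions in a 128x128 crop (assumed values):
# left eye, right eye, nose tip, left lip corner, right lip corner.
TEMPLATE = [(44.0, 52.0), (84.0, 52.0), (64.0, 74.0), (48.0, 96.0), (80.0, 96.0)]
```

In use, `fit_similarity(detected_landmarks, TEMPLATE)` would give the transform, and the same transform would then be used to warp the image pixels. The warp itself needs an image library and is left out of this sketch.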

The normalization processing unit 122 also converts the face-region image to a set size. For example, it converts the face-region image into an image of size 128x128.

The image pre-processing unit 124 performs a pre-processing step that converts the face-region image output from the normalization processing unit 122 into a 1-channel grayscale image, so that the result is robust to changes in brightness and color caused by ambient lighting. The face-region image of the set size, converted to grayscale by the image pre-processing unit 124, is input to the deep learning detection unit 126.
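The resizing and single-channel conversion can be sketched as follows. This is an illustrative example only: the specification does not state the interpolation method or the color-to-gray weights, so nearest-neighbor resizing and the common ITU-R BT.601 luma weights are assumptions, as are the function names. Images are represented as nested lists to keep the sketch dependency-free.

```python
def resize_nearest(image, out_h, out_w):
    """Resize an H x W image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_grayscale(rgb_image):
    """Convert an image of (R, G, B) tuples (0-255) to one channel (0-255).

    Uses BT.601 luma weights, which is an assumption for this sketch.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def preprocess(face_rgb, size=128):
    """Resize the cropped face to size x size, then convert it to one channel."""
    return to_grayscale(resize_nearest(face_rgb, size, size))
```

A production system would more likely use an image library with bilinear or area interpolation; the point here is only the order of operations and the shape of the result, a `size` x `size` single-channel image.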

The deep learning detection unit 126 detects, based on a deep learning network, whether the mask is worn normally from the input grayscale face-region image. Faces can be broadly divided into three cases: normal mask wearing, exposed nose and mouth, and circumvention, in which the nose and mouth are covered with an object other than a mask. The mask-wearing state is the case where the mask completely covers the nose and mouth, and the unmasked state is the case where the nose and mouth are exposed. Nose exposure is the case where the mask covers the mouth but leaves the nose exposed. Finally, circumvention is the case where the nose and mouth are covered in the image not by a mask but by a hand, a smartphone, or the like. The deep learning detection unit 126 can detect normal mask wearing and abnormal mask wearing from the input face-region image. Abnormal mask wearing is divided into no mask, poor wearing such as nose exposure, and circumvention, and when mask wearing is abnormal the deep learning detection unit 126 can detect which of these applies. In the case of circumvention, it can also detect whether the face is covered by a hand or by a prop such as a smartphone.

The determination unit 128 decides whether the mask is worn normally based on the detection result of the deep learning detection unit 126, and outputs the result.

FIG. 3 is a diagram showing an example of the structure of the deep learning network of the deep learning detection unit shown in FIG. 2.

Referring to FIG. 3, the deep learning detection unit 126 includes a deep learning network 1262 and a classification unit 1264.

The deep learning network 1262 includes a plurality of convolution layers 10, 20, and 30 and a fully connected layer 40. FIG. 3 shows three convolution layers 10, 20, and 30.

Among the convolution layers 10, 20, and 30, at least one convolution layer (10, 20) may consist of at least one inverted block and at least one inverted residual block. The convolution layer 30, located last among the convolution layers 10, 20, and 30, may consist of at least one inverted block. The inverted residual block applies a skip connection between its layers that have the smaller number of channels.

For example, the first convolution layer 10, which receives the grayscale face-region image of the set size, has a structure in which one inverted block and two inverted residual blocks are connected in sequence. The feature map output by the first convolution layer 10 is input to the second convolution layer 20, which has a structure in which one inverted block and three inverted residual blocks are connected in sequence. The feature map output by the second convolution layer 20 is then input to the third convolution layer 30, which has a structure in which two inverted blocks are connected in sequence. The feature map output by the third convolution layer 30 is input to the fully connected layer 40.

The fully connected layer 40 applies a full connection to the output data of the preceding convolution layer 30 to generate and output a single value (mask score).

Here, the number of filters doubles each time an inverted block of convolution layers 10 and 20 is passed, and the feature map that has passed through all the inverted blocks and inverted residual blocks of convolution layers 10, 20, and 30 is converted into a single output value (mask score) by the fully connected layer 40.

In this way, the deep learning network 1262 has a structure that combines inverted residual blocks with inverted blocks in order to increase the depth of the network.

The inverted block and the inverted residual block take the form of a depthwise separable convolution, which consists of a depthwise convolution layer connected to a pointwise convolution layer. Depthwise convolution splits the convolution operation by channel of the feature map, and pointwise convolution splits it by spatial position of the feature map. Because the computation is split across channels and space, the number of parameters involved becomes smaller, which reduces the amount of computation while preserving the expressive power of the representation.
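The parameter saving can be made concrete with a small calculation. The sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a pointwise (1x1) convolution. The 3x3 kernel and the 32-to-64 channel example are assumptions chosen for illustration, since the specification does not give kernel sizes, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)         # 3*3*32*64 = 18432
separable = depthwise_separable_params(3, 32, 64)  # 288 + 2048 = 2336
print(standard, separable, round(standard / separable, 2))
```

For these assumed sizes the separable form uses roughly one eighth of the weights, and the multiply-accumulate count per output position scales the same way.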

The classification unit 1264 detects whether the mask is worn normally in the face-region image based on the output value of the fully connected layer 40. Based on that output value (mask score), the classification unit 1264 can classify the face-region image into one of the following states: normal mask wearing, poor wearing, no mask, covered by a hand, and covered by a prop. Specifically, the classification unit 1264 compares the output value (mask score) of the fully connected layer 40 with a set threshold and detects the normal mask-wearing state when the mask score is greater than the threshold. When the mask score is less than or equal to the threshold, the classification unit 1264 detects the abnormal mask-wearing state and can further classify the image, based on the mask score, into one of the poor wearing, no mask, hand-covered, and prop-covered states.
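The threshold comparison itself is simple and can be sketched as below. The function name, the state labels, and the default threshold of 0.5 are hypothetical; the specification only says that the threshold is a set value. How a single mask score would be mapped to the individual abnormal sub-states is not described in the text, so the sketch covers only the normal/abnormal decision.

```python
def classify_mask_score(mask_score, threshold=0.5):
    """Binary decision from the network's single output value.

    A score strictly greater than the threshold means normal wearing;
    a score less than or equal to the threshold means abnormal wearing.
    The default threshold of 0.5 is an assumed placeholder.
    """
    return "normal" if mask_score > threshold else "abnormal"
```

Note that a score exactly equal to the threshold falls on the abnormal side, matching the "less than or equal to" wording of the description.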

Based on the state value output by the deep learning detection unit 126, the determination unit 128 may output "normal" when the mask is worn normally and output the result as an abnormal situation for any other state.

FIG. 4 is a diagram showing an example of an inverted block of the convolution layers shown in FIG. 3.

Referring to FIG. 4, the inverted block of the convolution layers 10, 20, and 30 has a structure in which a pointwise convolution block, a depthwise convolution block, and a pointwise convolution block are connected in sequence. Here, a stride of 2 is used for downsampling, so the resolution is reduced.

The inverted block is thus a structure that extracts meaningful features from the previous feature map while reducing the resolution being processed.

FIG. 5 is a diagram showing an example of an inverted residual block of the convolution layers shown in FIG. 3.

Referring to FIG. 5, the inverted residual block of the convolution layers 10 and 20 has a pointwise convolution block, a depthwise convolution block, and a pointwise convolution block connected in sequence, with an additional connection linking the pointwise convolution blocks that have the smaller number of channels.

The inverted residual block is thus a structure that extracts meaningful features at the resolution being processed.

The deep learning network 1262, which uses the inverted block and inverted residual block shown in FIGS. 4 and 5, first processes the input image with 16 channels. It then doubles the number of channels each time the resolution is halved by passing through an inverted block, finally reaching 64 channels, and outputs the feature values.
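The progression of resolution and channel count through the example network can be traced with a short calculation. The sketch below is illustrative and rests on several assumptions not stated in the text: the input is 128x128, the first 16 channels are produced at the input resolution, every inverted block halves the resolution, and the channel count stops growing once it reaches 64. The block counts per convolution layer follow the example given for FIG. 3.

```python
def trace_shapes(input_size=128, stem_channels=16, max_channels=64):
    """Return (stage name, spatial size, channels) after each block.

    Assumptions for this sketch: every inverted block halves the resolution
    and doubles the channels, capped at max_channels; inverted residual
    blocks keep both resolution and channel count unchanged.
    """
    # (layer name, number of inverted blocks, number of inverted residual blocks)
    stages = [
        ("conv layer 10", 1, 2),
        ("conv layer 20", 1, 3),
        ("conv layer 30", 2, 0),
    ]
    size, channels = input_size, stem_channels
    trace = [("input (16-channel stem)", size, channels)]
    for name, n_inverted, n_residual in stages:
        for _ in range(n_inverted):
            size //= 2
            channels = min(channels * 2, max_channels)
            trace.append((name + ": inverted block", size, channels))
        for _ in range(n_residual):
            trace.append((name + ": inverted residual block", size, channels))
    return trace


for stage, size, channels in trace_shapes():
    print(f"{stage:45s} {size}x{size}x{channels}")
```

Under these assumptions the feature map entering the fully connected layer would be 8x8 with 64 channels. The actual sizes depend on details of the figures that are not reproduced in the text.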

다음, 복수의 컨볼루션 계층(10, 20, 30)의 마지막 컨볼루션 계층(30)으로부터 출력되는 특징 값은 완전 연결 계층(40)을 통과한 뒤 임계값(threshold)과의 비교를 통해 마스크 착용 여부가 판단된다. Next, the feature value output from the last convolution layer 30 of the plurality of convolution layers 10, 20, and 30 passes through the fully connected layer 40, and whether a mask is worn is then determined by comparing the result with a threshold.
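
A minimal sketch of this final decision step follows, assuming the fully connected layer is a single weighted sum that yields one output value. The names `fully_connected` and `is_normal_wearing` and the threshold of 0.5 are hypothetical; the direction of the comparison (greater than the threshold means normal wearing) follows the claims.

```python
def fully_connected(features, weights, bias=0.0):
    """Single-output fully connected layer: weighted sum of the feature values."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features)) + bias


def is_normal_wearing(output_value, threshold=0.5):
    """Normal wearing if the output exceeds the threshold, abnormal otherwise."""
    return output_value > threshold


score = fully_connected([1.0, 2.0], [0.5, 0.25], bias=0.1)
print(is_normal_wearing(score))  # prints: True
```

An output value exactly equal to the threshold is treated as abnormal wearing, matching the "less than or equal to" wording of the claims.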

딥러닝에서 활성함수로 사용되는 ReLU(Rectified Linear Unit) 함수는 비선형에 의한 정보손실이 발생한다. 활성함수에 의한 정보손실을 줄이기 위해 데이터를 증폭(expansion)한 뒤, 깊이별 컨볼루션과 포인트별 컨볼루션을 통해 원래 크기로 압축한다. The ReLU (Rectified Linear Unit) function used as an activation function in deep learning causes information loss because of its nonlinearity. To reduce the information loss caused by the activation function, the data is first expanded and then compressed back to its original size through depthwise convolution and pointwise convolution.

한편, 도 4와 도 5에 도시된 역 블록과 역 잔여 블록을 이용하는 딥러닝 네트워크(1262)는 도 4와 도 5에 도시된 역 블록과 역 잔여 블록을 이용하여, 도 3에 도시한 바와 같은 구조를 가짐으로써, 마스크 착용 판별을 위한 성능을 만족하면서 연산량을 최대한 줄여 동작 시간을 줄일 수 있는 깊이까지 역 블록과 역 잔여 블록을 쌓아서 생성된다. 이렇게 하면, 기존의 모바일넷 V2의 구조보다 더욱 경량화된 네트워크를 구성할 수 있다. Meanwhile, the deep learning network 1262 has the structure shown in FIG. 3 and is built by stacking the inverse blocks and inverse residual blocks shown in FIGS. 4 and 5 only to the depth at which the amount of computation, and hence the operating time, is reduced as far as possible while the performance required for determining mask wearing is still satisfied. In this way, a network that is more lightweight than the existing MobileNet V2 structure can be constructed.

도 6은 본 발명의 실시 예에 따른 마스크 착용 감지 방법을 나타낸 흐름도이다. FIG. 6 is a flowchart illustrating a mask wearing detection method according to an embodiment of the present invention.

도 6을 참고하면, 마스크 착용 감지 장치(100)는 입력되는 영상으로부터 얼굴을 검출한다(S610). 마스크 착용 감지 장치(100)는 얼굴 검출을 위해 기 사용 중인 얼굴 검출기(110)를 이용할 수 있다. Referring to FIG. 6, the mask wearing detection device 100 detects a face from an input image (S610). For face detection, the mask wearing detection device 100 may use the face detector 110 that is already in use.

마스크 착용 감지 장치(100)는 얼굴 검출 결과를 이용하여 전체 영상에서 얼굴 영역을 자른 뒤(S620), 얼굴 영역에 눈, 코, 입의 위치를 지정된 위치로 옮기는 얼굴 정규화를 수행한다(S630). The mask wearing detection device 100 cuts out the face region from the entire image using the face detection result (S620), and then performs face normalization by moving the positions of the eyes, nose, and mouth to designated positions in the face region (S630).
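
One common way to move facial landmarks to designated positions is a similarity transform (rotation, uniform scale, and translation). The sketch below derives such a transform from the two eye positions only, using complex arithmetic. This is an assumption made for brevity: the patent also mentions the nose and mouth and does not say which kind of transform is used. The function name and the coordinates in the example are hypothetical.

```python
def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """Return a function mapping (x, y) so that both eyes land on their targets."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    if p1 == p2:
        raise ValueError("eye positions must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and uniform scale
    b = q1 - a * p1             # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


# Hypothetical landmark coordinates in a cropped face image.
align = similarity_from_eyes((30, 40), (70, 40), (20, 24), (44, 24))
print(align((50, 60)))  # where a point near the mouth would land after alignment
```

In practice the same transform would be applied to every pixel of the cropped face image so that the network always sees the eyes, nose, and mouth at roughly the same positions.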

마스크 착용 감지 장치(100)는 얼굴 영역의 영상을 설정된 크기로 변환하고(S640), 얼굴 영역의 영상을 흑백 영상으로 변환한다(S650). The mask wearing detection device 100 resizes the image of the face region to a set size (S640) and converts the image of the face region into a black-and-white image (S650).
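
The cropping, resizing, and black-and-white conversion steps (S620, S640, S650) can be sketched on plain nested lists of RGB tuples. This is only an illustration: the nearest-neighbor resampling and the ITU-R BT.601 luma weights are assumptions, since the patent does not name an interpolation method or a conversion formula, and the landmark normalization of S630 is left out here. All function names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut the face region out of the full image (S620)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, out_h, out_w):
    """Resize to a set size with nearest-neighbor sampling (S640)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def to_grayscale(rgb_image):
    """Convert RGB pixels to single-channel intensities (S650), BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def preprocess(image, box, size):
    """Crop, resize to size x size, then convert to a single channel."""
    top, left, height, width = box
    face = crop(image, top, left, height, width)
    face = resize_nearest(face, size, size)
    return to_grayscale(face)
```

A real device would more likely rely on an image library for these steps; the pure-Python version is meant only to make the order of operations explicit.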

마스크 착용 감지 장치(100)는 얼굴 영역의 흑백 영상으로부터 딥러닝 네트워크를 기반으로 마스크 정상 착용 여부를 검출한다(S660). 마스크 착용 감지 장치(100)는 얼굴 영역의 흑백 영상을 딥러닝 네트워크의 입력 영상으로 사용하고 딥러닝 네트워크로부터 출력되는 값으로부터 얼굴 영역의 영상을 마스크 정상 착용 상태, 착용 불량 상태, 마스크 미착용 상태, 손 가림 상태 및 소품 가림 상태 중 하나의 상태로 분류하고, 분류된 상태 값을 바탕으로 마스크 정상 착용일 때 정상으로 출력하고, 그 이외의 상태는 이상상황으로 결과를 출력할 수 있다. The mask wearing detection device 100 detects, based on the deep learning network, whether the mask is worn normally from the black-and-white image of the face region (S660). The mask wearing detection device 100 uses the black-and-white image of the face region as the input image of the deep learning network and, from the value output by the deep learning network, classifies the image of the face region into one of a normal mask-wearing state, an improper wearing state, a no-mask state, a hand-covered state, and an object-covered state. Based on the classified state value, it may output "normal" when the mask is worn normally and output an abnormal-situation result for every other state.
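
The mapping from the five classified states to the final normal or abnormal result can be written down directly. The enum and function names below are hypothetical, and how the network output is turned into one of the five states is not specified in the text, so the sketch starts from an already classified state.

```python
from enum import Enum


class WearingState(Enum):
    NORMAL = "normal mask wearing"
    IMPROPER = "improper wearing"
    NO_MASK = "mask not worn"
    HAND_COVERED = "covered by hand"
    OBJECT_COVERED = "covered by object"


def report(state: WearingState) -> str:
    """Only normal wearing is reported as normal; every other state is abnormal."""
    return "normal" if state is WearingState.NORMAL else "abnormal"


for state in WearingState:
    print(f"{state.value}: {report(state)}")
```

Keeping the five-way state separate from the two-way result lets a device log why an abnormal situation was raised while still exposing a single pass or fail signal.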

도 7은 본 발명의 다른 실시 예에 따른 마스크 착용 감지 장치를 나타낸 도면이다. FIG. 7 is a diagram illustrating a mask wearing detection device according to another embodiment of the present invention.

도 7을 참고하면, 마스크 착용 감지 장치(700)는 앞에서 설명한 마스크 착용 감지 방법이 구현된 컴퓨팅 장치를 나타낼 수 있다. Referring to FIG. 7, the mask wearing detection device 700 may represent a computing device in which the mask wearing detection method described above is implemented.

마스크 착용 감지 장치(700)는 프로세서(710), 메모리(720), 입력 인터페이스 장치(730), 출력 인터페이스 장치(740) 및 저장 장치(750) 중 적어도 하나를 포함할 수 있다. 각각의 구성 요소들은 공통 버스(bus)(760)에 의해 연결되어 서로 통신을 수행할 수 있다. 또한 각각의 구성 요소들은 공통 버스(760)가 아니라, 프로세서(710)를 중심으로 개별 인터페이스 또는 개별 버스를 통하여 연결될 수도 있다. The mask wearing detection device 700 may include at least one of a processor 710, a memory 720, an input interface device 730, an output interface device 740, and a storage device 750. The components may be connected by a common bus 760 to communicate with one another. Alternatively, the components may be connected through individual interfaces or individual buses centered on the processor 710 instead of the common bus 760.

프로세서(710)는 AP(Application Processor), CPU(Central Processing Unit), GPU(Graphic Processing Unit) 등과 같은 다양한 종류들로 구현될 수 있으며, 메모리(720) 또는 저장 장치(750)에 저장된 명령을 실행하는 임의의 반도체 장치일 수 있다. 프로세서(710)는 메모리(720) 및 저장 장치(750) 중에서 적어도 하나에 저장된 프로그램 명령(program command)을 실행할 수 있다. 프로세서(710)는 도 2에서 설명한 정규화 처리부(122), 영상 전처리부(124), 딥러닝 검출부(126) 및 판정부(128)의 적어도 일부 기능을 구현하기 위한 프로그램 명령을 메모리(720)에 저장하여, 도 1 내지 도 6을 토대로 설명한 동작이 수행되도록 제어할 수 있다. The processor 710 may be implemented in various forms, such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU), and may be any semiconductor device that executes commands stored in the memory 720 or the storage device 750. The processor 710 may execute program commands stored in at least one of the memory 720 and the storage device 750. The processor 710 may store, in the memory 720, program commands for implementing at least some functions of the normalization processing unit 122, the image preprocessing unit 124, the deep learning detection unit 126, and the determination unit 128 described with reference to FIG. 2, and may thereby control the operations described with reference to FIGS. 1 to 6 to be performed.

메모리(720) 및 저장 장치(750)는 다양한 형태의 휘발성 또는 비 휘발성 저장 매체를 포함할 수 있다. 예를 들어, 메모리(720)는 ROM(read-only memory)(721) 및 RAM(random access memory)(722)를 포함할 수 있다. 메모리(720)는 프로세서(710)의 내부 또는 외부에 위치할 수 있고, 이미 알려진 다양한 수단을 통해 프로세서(710)와 연결될 수 있다. The memory 720 and the storage device 750 may include various types of volatile or non-volatile storage media. For example, the memory 720 may include a read-only memory (ROM) 721 and a random access memory (RAM) 722. The memory 720 may be located inside or outside the processor 710 and may be connected to the processor 710 through various known means.

입력 인터페이스 장치(730)는 데이터를 프로세서(710)로 제공하도록 구성된다. The input interface device 730 is configured to provide data to the processor 710.

출력 인터페이스 장치(740)는 프로세서(710)로부터의 데이터를 출력하도록 구성된다. The output interface device 740 is configured to output data from the processor 710.

본 발명의 실시 예에 따른 마스크 착용 감지 방법 중 적어도 일부는 컴퓨팅 장치에서 실행되는 프로그램 또는 소프트웨어로 구현될 수 있고, 프로그램 또는 소프트웨어는 컴퓨터로 판독 가능한 매체에 저장될 수 있다. At least part of the mask wearing detection method according to an embodiment of the present invention may be implemented as a program or software executed on a computing device, and the program or software may be stored in a computer-readable medium.

또한 본 발명의 실시 예에 따른 마스크 착용 감지 방법 중 적어도 일부는 컴퓨팅 장치와 전기적으로 접속될 수 있는 하드웨어로 구현될 수도 있다. In addition, at least part of the mask wearing detection method according to an embodiment of the present invention may be implemented as hardware that can be electrically connected to a computing device.

이상에서 본 발명의 실시 예에 대하여 상세하게 설명하였지만 본 발명의 권리 범위는 이에 한정되는 것은 아니고 다음의 청구범위에서 정의하고 있는 본 발명의 기본 개념을 이용한 당업자의 여러 변형 및 개량 형태 또한 본 발명의 권리 범위에 속하는 것이다. Although embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of rights of the present invention.

Claims

A method of detecting mask wearing, in a mask wearing detection device, from an input entire image, the method comprising:
normalizing and preprocessing an image of a face region detected from the entire image; and
detecting, from the normalized and preprocessed image of the face region and based on a deep learning network, a normal mask-wearing state, in which the mask completely covers the nose and mouth corresponding to the respiratory organs, and an abnormal mask-wearing state,
wherein the deep learning network includes:
at least one convolution layer; and
a fully connected layer that generates one output value from data output from the at least one convolution layer, and
wherein the at least one convolution layer includes at least one inverse block that extracts features while reducing resolution, and at least one inverse residual block that extracts features, with the number of channels increasing each time the inverse block is passed through.

The method of claim 1,
wherein the normalizing and preprocessing includes:
acquiring positions of the eyes, nose, and mouth in the face region detected from the entire image; and
moving the positions of the eyes, nose, and mouth to designated positions in the face region.

The method of claim 2,
wherein the normalizing and preprocessing further includes:
cropping only the image of the face region from the entire image; and
converting the image of the face region into a black-and-white image.

The method of claim 1,
wherein the detecting includes:
classifying the image of the face region as the normal mask-wearing state when the output value of the deep learning network is greater than a set threshold; and
classifying the image of the face region as the abnormal mask-wearing state when the output value of the deep learning network is less than or equal to the threshold.

The method of claim 1,
wherein classifying the image as the abnormal mask-wearing state includes classifying the image, based on the output value of the deep learning network, into one of a wearing state in which the nose or mouth is exposed, a state in which no mask is worn, and a state in which the face is covered by a hand or an object.

The method of claim 1,
wherein the normalizing and preprocessing includes:
receiving a face detection result from a pre-installed face detector; and
detecting the face region from the entire image based on the face detection result.

A mask wearing detection device for detecting mask wearing, the device comprising:
a normalization processing unit that crops an image of a face region from an entire image based on a face detection result and sets positions of the eyes, nose, and mouth in the face region; and
a deep learning detection unit that uses the image of the face region in which the positions of the eyes, nose, and mouth are set as input data of a deep learning network to detect a normal mask-wearing state, in which the mask completely covers the nose and mouth corresponding to the respiratory organs, and an abnormal mask-wearing state,
wherein the deep learning network includes:
at least one convolution layer; and
a fully connected layer that generates one output value from data output from the at least one convolution layer, and
wherein the at least one convolution layer includes at least one inverse block that extracts features while reducing resolution, and at least one inverse residual block that extracts features, with the number of channels increasing each time the inverse block is passed through.

The device of claim 7,
wherein the deep learning detection unit includes a classification unit that classifies the image of the face region into the normal mask-wearing state or the abnormal mask-wearing state by comparing the output value of the deep learning network with a set threshold.

The device of claim 7,
further comprising an image preprocessing unit that converts the image of the face region in which the positions of the eyes, nose, and mouth are set into a black-and-white image of a set size and outputs the converted image to the deep learning network.

The device of claim 7,
further comprising a face detector that detects a face from the input entire image, generates the face detection result, and passes the face detection result to the normalization processing unit.