KR20230139019A

KR20230139019A - System for identifying the wearing of worker's personal protective equipment and worker's face based on deep learning

Info

Publication number: KR20230139019A
Application number: KR1020220037085A
Authority: KR
Inventors: 정회경; 이근우
Original assignee: 배재대학교 산학협력단
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2023-10-05

Abstract

The present invention relates to a deep learning-based system for identifying workers' personal protective equipment (PPE) wearing and facial identity. The system applies computer vision image processing and deep learning algorithms to image data to check, without human supervision, whether workers at industrial sites are wearing PPE and, at the same time, to verify their identity through face recognition. The system may include an imaging unit that acquires image data of workers; a model learning unit that trains the models for recognizing, from the image data, the PPE objects worn by a worker and the worker's face; and a PPE wearing identification unit and a worker identity identification unit that apply the optimal model algorithms trained by the model learning unit to recognize the worker's PPE and face in actual image data and compare them with pre-registered data sets to identify, respectively, whether PPE is worn and who the worker is.

Description

Deep learning-based worker personal protective equipment wearing and facial identification system {SYSTEM FOR IDENTIFYING THE WEARING OF WORKER'S PERSONAL PROTECTIVE EQUIPMENT AND WORKER'S FACE BASED ON DEEP LEARNING}

The present invention relates to a safety management system for industrial sites, and more particularly to a deep learning-based worker personal protective equipment (PPE) wearing and facial identification system that applies computer vision image processing and deep learning algorithms to image data to check, without human supervision, whether workers at industrial sites are wearing PPE and, at the same time, to verify their identity through face recognition.

At today's industrial sites with complex and hazardous processes, such as large-scale construction and manufacturing, workers are constantly exposed to many risks. According to an analysis of accident types at domestic industrial sites, 37.3% of all fatalities at construction sites occurred while the worker was not wearing a safety helmet, which is one item of personal protective equipment.

To prevent accidents at such sites, video monitoring and control systems using CCTV (closed-circuit television) are used to observe workers' activities in real time. However, these CCTV surveillance and safety management systems depend on human operators, whose monitoring capacity is limited, and are therefore poorly suited to preventing accidents before they occur.

Accordingly, there is a demand for smart safety management based on advanced technology that can identify the situation of workers at industrial sites in real time and respond immediately. In place of manual safety management by humans, approaches based on artificial intelligence computer vision and deep learning image recognition, which have recently attracted attention, are being attempted.

The present applicant proposes the present invention to solve the above problems.

Korean Patent No. 10-2147052 (registration date 2020.08.17.)
Korean Patent Publication No. 10-2017-0050465 (publication date 2017.05.11.)
Korean Patent No. 10-2294574 (registration date 2021.08.23.)
Korean Patent No. 10-2263154 (registration date 2021.06.03.)
Korean Patent No. 10-1510798 (registration date 2015.04.03.)

The present invention is proposed to solve the above problems. Its object is to provide a deep learning-based worker PPE wearing and facial identification system that applies computer vision image processing and deep learning algorithms to image data to recognize, from images of workers, the PPE a worker is wearing and the worker's face, so that PPE compliance and worker identity can be verified without human supervision rather than through manual safety management.

To achieve the above object, a deep learning-based worker PPE wearing and facial identification system according to an embodiment of the present invention may include: an imaging unit that acquires image data of workers; a model learning unit that trains the models for recognizing, from the image data, the PPE objects worn by a worker and the worker's face, and then derives an optimal model algorithm; a PPE wearing identification unit that applies the model algorithm derived by the model learning unit to detect a worker's PPE in actual image data and compares the result with a pre-registered PPE data set to identify whether the PPE is worn; and a worker identity identification unit that applies the model algorithm derived by the model learning unit to recognize a worker's face in actual image data and compares the result with a pre-registered worker identity data set to identify the worker.

Specifically, the model learning unit may include: a PPE detection unit that detects PPE objects in the image data using a deep learning YOLO (You Only Look Once) model; a face recognition unit that recognizes facial features in the image data using a deep learning MTCNN (Multi-task Cascaded Convolutional Networks) model and a FaceNet model; and a model verification unit that analyzes the performance of the YOLO, MTCNN, and FaceNet models and adjusts their parameters so that the training loss is minimized.

The PPE detection unit may be implemented to include: a backbone that extracts feature maps from the image data and classifies them into worn and not-worn PPE classes; a convolution anchor box that, based on the extracted feature maps, generates anchor boxes in regions where a PPE object is expected to be located and sets a final bounding box within the anchor boxes based on the class information; and a detection unit that detects the PPE object in the set bounding box.

The face recognition unit may be implemented to include a component that extracts and crops the face region from the image data using the MTCNN model, a component that extracts facial features from the cropped image and embeds them using the FaceNet model, and a component that measures the distance between the embedded vectors to identify the face.

The model verification unit may analyze the performance of the outputs of the YOLO model for PPE detection and the FaceNet model for face recognition, and may adjust the epoch count and batch size, which are parameters of each model.

According to the present invention, whether a worker is wearing PPE and who the worker is can be checked from image data captured at an industrial site. Manual safety management by humans therefore becomes unnecessary, and worker accidents and disasters can be significantly reduced through unmanned monitoring that is continuous, real-time, and preventive.

The present invention can thus be applied not only to industrial safety but also to fields such as access control and security management, which broadens its range of use.

FIG. 1 is a configuration diagram of a deep learning-based worker PPE wearing and facial identification system according to an embodiment of the present invention.
FIG. 2 is a detailed configuration diagram of the model learning unit in the system according to an embodiment of the present invention.
FIG. 3 is a diagram explaining the process of training the models in the system according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the result of labeling a data set.
FIG. 5 is a graph showing the best performance obtained under the optimal parameter conditions for the YOLOv5s model applied in an embodiment of the present invention.
FIG. 6 is a diagram showing detection results for workers' heads and PPE at industrial sites under various conditions.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below, however, and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention pertains is fully informed of the scope of the invention, which is defined only by the claims. Like reference numerals refer to like elements throughout the specification.

In addition, to explain the technical components of the present invention efficiently, the preferred embodiments described below omit, as far as possible, functional components that are already present in each system or that are commonly provided in the technical field to which the invention pertains, and focus on the functional components that must be added for the present invention. A person of ordinary skill in the art will readily understand the functions of the conventionally used components that are omitted below, and will also clearly understand the relationship between the omitted components and the components added for the present invention.

In the following description, the terms "transmission", "communication", "sending", "reception", and other terms of similar meaning, when applied to a signal or information, cover not only direct transfer of the signal or information from one component to another but also transfer by way of other components. In particular, "transmitting" or "sending" a signal or information to a component indicates the final destination of that signal or information and does not imply that the component is the direct destination. The same applies to "receiving" a signal or information.

FIG. 1 is a configuration diagram of a deep learning-based worker PPE wearing and facial identification system according to an embodiment of the present invention; FIG. 2 is a detailed configuration diagram of the model learning unit in the system; FIG. 3 is a diagram explaining the process of training the models in the system; FIG. 4 is a diagram showing an example of the result of labeling a data set; FIG. 5 is a graph showing the best performance obtained under the optimal parameter conditions for the YOLOv5s model applied in the embodiment; and FIG. 6 is a diagram showing detection results for workers' heads and PPE at industrial sites under various conditions.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a deep learning-based worker PPE wearing and facial identification system 100 according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning-based worker PPE wearing and facial identification system 100 (hereinafter, the "system") according to an embodiment of the present invention may broadly include an imaging unit 101, a model learning unit 140, a PPE wearing identification unit 150, and a worker identity identification unit 160.

The imaging unit 101 acquires still images or video (hereinafter, "image data") of people working at an industrial site (hereinafter, "workers"). The image data may be acquired through an image sensor, which may include any one of a charge-coupled device (CCD) camera, an infrared (IR) camera, a gray-scale image converter, and a radiometric temperature image converter.

The model learning unit 140 may train two models, one for each kind of object to be recognized in the image data acquired by the imaging unit 101. The first is a model for detecting the PPE objects that workers wear for safety. The second is a face recognition model for verifying a worker's identity.

Here, personal protective equipment (PPE) may include a safety helmet, a safety hook, safety shoes, and the like. As an example, the safety helmet is selected as the PPE in the embodiment of the present invention. To detect PPE objects reliably, both detection speed and accuracy are required across the varied conditions of industrial sites: small objects, distant objects, dark surroundings, many objects in the scene, and objects worn on a person's head.

In the embodiment of the present invention, the YOLOv5s model is trained and validated under various conditions, such as different batch sizes and epoch counts, and the best-performing model for detecting PPE objects is then derived and used in prediction experiments. For worker identity verification, the FaceNet model, which has a high recognition rate, may be applied.

The PPE wearing identification unit 150 may apply the optimal model algorithm derived by the model learning unit 140 to detect a worker's PPE in actual image data (real input data) and compare the result with a pre-registered PPE data set to identify whether the PPE is worn. The PPE data set may be industrial safety helmet data shared on Kaggle or on any other server.
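The wearing decision described above can be sketched as follows. This is a minimal illustration only: it assumes the detector emits one record per detected head region, labeled either `helmet` (worn) or `head` (not worn), with a confidence score. The `Detection` type, the label names, and the 0.5 threshold are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # assumed class names: "helmet" (worn) or "head" (not worn)
    confidence: float  # detector score in [0, 1]


def summarize_ppe(detections, min_confidence=0.5):
    """Count confident 'worn' and 'not worn' detections in one frame."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    worn = sum(1 for d in kept if d.label == "helmet")
    not_worn = sum(1 for d in kept if d.label == "head")
    # A frame is treated as compliant when no bare head is detected.
    return {"worn": worn, "not_worn": not_worn, "compliant": not_worn == 0}
```

A frame with no confident detections is reported as compliant under this rule; a deployed system might prefer to treat that case separately.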

The worker identity identification unit 160 may apply the optimal model algorithm derived by the model learning unit 140 to recognize a worker's face in actual image data (real input data) and compare the result with a pre-registered worker identity data set to identify the worker.

With this configuration, the system 100 according to an embodiment of the present invention can detect a worker's identity and PPE wearing status at an industrial site in real time and without human supervision, and can thereby provide a smart safety management method that reduces worker accidents and disasters.

To provide this efficiently, the system 100 of the present invention requires model algorithms that can detect PPE and faces in images accurately and precisely, and these models must be trained.

FIG. 2 is a detailed configuration diagram of the model learning unit 140 in the system 100 according to an embodiment of the present invention.

Referring to FIG. 2 together with the configuration of FIG. 1, the model learning unit 140 of FIG. 1 may basically include a PPE detection unit (110 in FIG. 1) that detects PPE objects in image data using a deep learning YOLO (You Only Look Once) model, and a face recognition unit (120 in FIG. 1) that recognizes facial features in image data using a deep learning MTCNN (Multi-task Cascaded Convolutional Networks) model and a FaceNet model. Of these, the PPE detection unit 110 may be implemented to include a backbone object detection unit 111, a convolution anchor box 112, and a PPE (Personal Protective Equipment) prediction unit 113.

The backbone object detection unit 111 may extract feature maps from the image data and classify them into worn and not-worn PPE classes.

Based on the extracted feature maps, the convolution anchor box 112 may generate anchor boxes in regions where a PPE object is expected (predicted) to be located, and may set a final bounding box within these anchor boxes based on the class information.

The convolution anchor box 112 may use, or be implemented with, PANet and BottleneckCSP.
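The step from an anchor box to a final bounding box can be illustrated with a small decoding sketch. The specification does not give the formula; the version below assumes the form commonly used in the open-source YOLOv5 implementation, in which sigmoid-squashed offsets move the box centre relative to a grid cell and rescale the anchor's width and height. The function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, cell_xy, anchor_wh, stride):
    """Turn raw offsets (tx, ty, tw, th) into a box (cx, cy, w, h) in pixels.

    cell_xy   -- (column, row) index of the grid cell that made the prediction
    anchor_wh -- anchor width and height in pixels (assumed units)
    stride    -- size in pixels of one grid cell on the input image
    """
    tx, ty, tw, th = raw
    gx, gy = cell_xy
    aw, ah = anchor_wh
    # Centre: offset in (-0.5, 1.5) cells around the cell origin.
    cx = (2.0 * sigmoid(tx) - 0.5 + gx) * stride
    cy = (2.0 * sigmoid(ty) - 0.5 + gy) * stride
    # Size: anchor scaled by a factor in (0, 4).
    w = (2.0 * sigmoid(tw)) ** 2 * aw
    h = (2.0 * sigmoid(th)) ** 2 * ah
    return cx, cy, w, h
```

With all raw offsets at zero, the decoded box sits at the centre of its grid cell and has exactly the anchor's size, which is why the anchors act as the starting point for the final bounding box.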

The PPE prediction unit 113 may detect the PPE object in the set bounding box.

The PPE detection unit 110 may carry out object detection model training using the YOLOv5s model.

In other words, the PPE detection unit 110 uses the YOLOv5s model structure and may use CSP-Darknet as the backbone 111 that extracts feature maps from an image. The head locates objects based on the extracted feature maps: anchor boxes are set initially and are then used to generate the final bounding boxes. A transfer learning technique may be applied to improve training speed and accuracy.

Experiments were conducted on the YOLOv5s model with different epoch and batch size settings. [Table 1] below compares epoch settings; mAP was best at 200 epochs. [Table 2] compares batch sizes; mAP was best at a batch size of 32. FPS did not change significantly.
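Choosing the setting with the best mAP from such experiments reduces to a simple comparison, sketched below. The function name and the shape of `results` (a mapping from an `(epochs, batch_size)` pair to a measured mAP) are assumptions made for illustration; the measured values themselves come from the experiments and are not reproduced here.

```python
def best_setting(results):
    """Return the (epochs, batch_size) key with the highest mAP.

    results -- dict mapping (epochs, batch_size) tuples to a measured mAP value
    """
    if not results:
        raise ValueError("no results to compare")
    return max(results, key=results.get)
```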

The face recognition unit 120 may carry out face recognition model training using the FaceNet model.

Referring to FIG. 2, the face recognition unit 120 may include a face detection unit 121 (Face detect), an MTCNN face cropping unit 122 (MTCNN Face crop), a feature extraction unit 123 (Feature extraction), and a face recognition SVM 124 (Face Recognition SVM).

The process by which the face recognition unit 120 identifies a person in an input image, using FaceNet face recognition and an SVM (Support Vector Machine) classifier, is as follows. In the face detection unit 121 and the MTCNN face cropping unit 122, the face region is extracted from the input image by MTCNN (Multi-task Cascaded Convolutional Networks), cropped to a fixed size (for example, 160×160), and normalized.

In the feature extraction unit 123, the FaceNet model extracts facial features from the cropped image and represents them as a 128-element vector called a face embedding. In the face recognition SVM 124, an SVM model then determines whether the face belongs to the registered person by comparing distances between embedding vectors.
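The distance comparison between embedding vectors can be sketched as follows. Note that this sketch substitutes a plain nearest-neighbour rule with a distance threshold for the SVM classifier named in the specification, purely to show the distance step. The function names, the L2 normalization of embeddings, and the threshold value of 1.0 are assumptions, and the embeddings are taken as already computed.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, registered, threshold=1.0):
    """Return the registered name nearest to `query`, or None if too far.

    registered -- dict mapping a worker name to that worker's embedding
    threshold  -- maximum distance accepted as a match (assumed value)
    """
    q = l2_normalize(query)
    best_name, best_dist = None, float("inf")
    for name, embedding in registered.items():
        d = euclidean_distance(q, l2_normalize(embedding))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None
```

Returning `None` for a face that is far from every registered embedding is what lets the system report an unregistered person instead of forcing a match.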

Meanwhile, the model verification unit (130 in FIG. 1) of the model learning unit 140 in the system 100 according to an embodiment of the present invention may analyze the performance of each model and adjust its parameters so that the training loss is minimized. For example, it may analyze the performance of the outputs of the YOLO model applied in the PPE detection unit 110 and of the FaceNet model applied in the face recognition unit 120, and may change or compensate the epoch count and batch size, which are parameters of each model. This is explained in detail in the model experiments and result analysis below.

FIG. 3 is a diagram explaining the process of training the models in the system 100 according to an embodiment of the present invention.

The system 100 according to an embodiment of the present invention may take as input data set images that contain workers at industrial sites, and may derive from them a model for detecting and classifying workers' PPE and a model for verifying identity by face recognition.

FIG. 3 illustrates the experimental procedure using the system 100 according to an embodiment of the present invention. It shows the detailed process of a system environment setup step (S1, System environment setup), a transfer learning step for face recognition and PPE (S2, YOLOv5s and FaceNet modification for transfer learning), a face and PPE data set preparation step (S3), a model selection step (S4), a training/validation/test step (S5), an identity verification step (S6, Combine ResultSet), and a performance evaluation and analysis step (S7, Evaluation).

The face and PPE data set preparation step (S3) may include a step of preparing a face data set (DataSet(Face) Prepare) and a step of preparing a PPE data set (DataSet(PPE) Prepare). The step of preparing the face data set may include collecting data from Kaggle (Collection from kaggle), detecting faces using MTCNN (Face Detection(MTCNN)), and preparing face crops (Face Cropping Preparation).

The step of preparing the PPE data set may include collecting PPE data from Kaggle (Collection from kaggle(PPE)), annotating images with a manual tool (Image Annotation(Manual Tool)), and preparing the training, validation, and test sets (Train-Validation-Test Set Preparation).
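The train, validation, and test set preparation can be sketched as a shuffled split. The 70/20/10 proportions, the fixed seed, and the function name are assumptions for illustration; the specification does not state the split ratios that were used.

```python
import random


def split_dataset(items, train_pct=70, val_pct=20, seed=0):
    """Shuffle `items` and split them into train, validation, and test lists.

    Whatever remains after the train and validation shares goes to the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

In practice each item would be an image path paired with its annotation file, so that an image and its labels always land in the same subset.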

In addition, the model selection step (S4) may include a FaceNet model selection step (Model Selection(FaceNet)) performed after the step of preparing the face data set, and a Yolo model selection step (Model Selection(Yolo)) performed after the step of preparing the personal protective equipment data set.

The training/validation/test step (S5) may include, following the FaceNet model selection step (Model Selection(FaceNet)), a model training step (Training Model), a validation step on the test set (Validation on test set), a validation step on a video feed (Validation on Video Feed), and a face identification step (Face ID Identification).

In addition, the training/validation/test step (S5) may include, following the Yolo model selection step (Model Selection(Yolo)), a model training step (Training Model), a validation step on the test set (Validation on test set), a validation step on a video feed (Validation on Video Feed), and an object detection step (Object Detection).

As described above, from a deep learning perspective, the image classification problem can be approached through transfer learning. To this end, the classifier of the original model is first removed, a new classifier suited to the purpose of the present invention is added, and the resulting model is then fine-tuned. That is, the YOLOv5s model, the FaceNet model, and the SVM model can be fine-tuned.

The training data set for personal protective equipment detection and face recognition can combine the industrial-site hard hat data set published on Kaggle with images captured by camera at industrial sites.

In addition, for model training, validation, and performance checking for worker personal protective equipment detection and identity verification by face recognition, the YOLOv5s model can be used to detect personal protective equipment, and the MTCNN model can be used to detect faces in the input image. As preprocessing, only the face region is extracted from the input image; the FaceNet model then performs face recognition, and an SVM classifier confirms the identity of a pre-registered worker.

* Experiment and result analysis

This section describes the environment and data sets used for the experiments, the metrics used to evaluate the performance of the model algorithms for object detection and face recognition, the results of the personal protective equipment detection and classification experiment, and the results of the face recognition and classification experiment.

The experimental environment is as follows.

OS (Ubuntu 18.04.5 LTS), GPU (Tesla V100-SXM2), CPU (16-Core Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz), RAM (177GB), CUDA (10.1), cuDNN (7.6.0), software (python/pytorch), and deep learning model (YOLOv5s) were used.

The data sets are as follows.

The data set for training hard hat detection combined the industrial-site hard hat data published on Kaggle with images captured by camera at industrial sites. The Kaggle data set consists of 15,887 training images, 4,641 validation images, and 2,261 test images. Two classes were labeled, head and helmet: class 0 is head and class 1 is helmet. There are approximately 84,000 head instances and approximately 41,000 helmet instances. A worker not wearing a hard hat is labeled head, and a worker wearing a hard hat is labeled helmet.
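The proportions of the three splits follow directly from the counts stated above. The short sketch below only does that arithmetic; the names `SPLIT_COUNTS` and `split_fractions` are illustrative and do not come from the source.

```python
# Image counts for the Kaggle hard hat data set, as stated in the text.
SPLIT_COUNTS = {"train": 15_887, "validation": 4_641, "test": 2_261}


def split_fractions(counts):
    """Return each split's share of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = split_fractions(SPLIT_COUNTS)
for name, share in fractions.items():
    print(f"{name}: {SPLIT_COUNTS[name]} images ({share:.1%})")
```

The counts sum to 22,789 images, so the split is roughly 70% training, 20% validation, and 10% test.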

To avoid detecting a hard hat that is present on its own, the ground truth region is preferably set as a bounding box around the head region, whether that is a bare head or a head wearing a hat or the like.

As shown in Figure 4, each annotation expresses the class identifier of the detected object, the per-class color of the bounding box, the position coordinates of the object (X, Y), and the width and height of the bounding box region (W, H). In the information file (Helm_0003334.txt), the upper line is the information for Class 1 (helmet) and the lower line is the information for Class 0 (human head).
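A label file of this kind can be read with a few lines of code. The sketch below assumes the usual YOLO text layout, in which each line holds `class x_center y_center width height` with coordinates normalized to the range 0 to 1; the source does not state the normalization, so treat that as an assumption. The sample line is hypothetical and is not taken from Helm_0003334.txt.

```python
from typing import NamedTuple

CLASS_NAMES = {0: "head", 1: "helmet"}  # class IDs as given in the text


class Box(NamedTuple):
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_label_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert 'class x_center y_center width height' to pixel corners.

    Assumes the four coordinates are normalized by image width and height.
    """
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return Box(CLASS_NAMES[int(cls)], xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# Hypothetical label line for a 640x640 image.
box = parse_label_line("1 0.5 0.25 0.2 0.1", img_w=640, img_h=640)
print(box)
```

Converting to pixel corners is convenient for drawing the per-class colored boxes that Figure 4 describes.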

In addition, image data on personal protective equipment at domestic industrial sites, required for the experiments of the present invention, was captured with a smartphone camera to build a data set. The data set for face recognition training combined camera images of workers at industrial sites both wearing and not wearing hard hats. [Table 3] shows the status of the data set for face recognition.

The evaluation metrics are as follows.

The performance of the model algorithms for object detection and face recognition is evaluated quantitatively using the confusion matrix, the PR (Precision-Recall) graph, average precision, the harmonic mean (F1 score), and similar metrics.
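These metrics follow the standard definitions based on confusion-matrix counts. The sketch below is a minimal illustration of those definitions; the function names and the counts are hypothetical and are not the experiment's data.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 8, 2, 4, 6
p, r = precision(tp, fp), recall(tp, fn)
print(f"precision={p:.2f} recall={r:.2f} f1={f1_score(p, r):.2f} "
      f"accuracy={accuracy(tp, tn, fp, fn):.2f}")
```

As a consistency check, a precision of 0.73 and a recall of 0.65 give a harmonic mean of about 0.69 under this definition.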

Next, the results of the personal protective equipment detection experiment are described.

In the personal protective equipment detection experiment, the YOLOv5s model was trained with a batch size of 64 while the number of epochs was varied over 200, 500, 1000, and 2000, and the runs were compared. When the worker's head and personal protective equipment were detected in 640×640 input images, the head detection rate was 0.937 to 0.943, the hard hat detection rate was 0.968 to 0.974, and the mAP was 0.953 to 0.964.

Figure 5 shows the PR graph for a batch size of 64 and 200 epochs. In the validation result graphs for batch size 64, no overfitting appeared at 200 epochs, whereas overfitting was observed at 500, 1000, and 2000 epochs.

Figure 6 shows the prediction results of the YOLOv5s model with batch size 64 and 200 epochs. It is an example of detecting workers' heads and personal protective equipment at industrial sites under various situations and conditions. Detection remains feasible when objects are numerous or small, and with training the desired objects can be detected even when the industrial site is cramped and dark.

[Table 4] below compares model performance in batch experiments in which the number of epochs was fixed at 200 and the batch size was changed to 32, 64, and 128. The input image size was set to 640×640, and the mAP and the head and helmet detection rates were highest at a batch size of 32.

Across the batch sizes and epoch counts tested, batch size 32 with 200 epochs gave the best detection rate for workers' heads and personal protective equipment, so the system 100 of the present invention can select the model trained under those conditions as the optimal model. Validation and testing with the selected model showed that workers' heads and personal protective equipment can be detected and recognized in various industrial-site situations and conditions.

Next, the results of the face recognition and classification experiment are as follows.

As described above, face detection for identity verification applies the MTCNN model, and facial features are obtained with the FaceNet model. In the identity verification process, the image is normalized to 160×160 to match the FaceNet input and stored, facial features are extracted, and identity is confirmed using the embedding vector.

The data set for face recognition training comprised 18 classes in total: 9 classes formed by selecting images of the same person for each class, and 9 classes built from 9 domestic workers, each photographed with a smartphone. [Table 5] below shows the data set for the 9 classes built from smartphone images: 203 images for training and 44 for validation, 247 in total.

Face detection and cropping are preprocessing steps for face recognition and classification with the FaceNet model; MTCNN can be used to detect the face and crop only the face region. Next, the face image detected by MTCNN is used as input to the FaceNet model, which expresses the facial features as an embedding vector. For identity verification, the distance between this embedding vector and the embedding vector of the image to be checked determines whether the two show the same person. To find the optimal distance threshold for identity verification, a total of 100 face pairs were created (50 pairs of the same person and 50 pairs of different people), and 30 of these pairs were sampled at random. Faces were detected, the detected facial features were embedded using the trained FaceNet model, and the embedding vectors were compared at different distance values. In this experiment the accuracy was calculated as 0.07, the precision as 0.73, the recall as 0.65, and the harmonic mean as 0.69.

The face recognition results for the 30 face pairs at 50, 100, and 200 epochs, with distances from 0.5 to 2.9, show that a distance between 1.1 and 1.5 works well for comparing embedding vectors regardless of the number of epochs. Because 21 pairs were detected in common at a distance of 1.4, the result indicates that the model comparing embedding vectors should use a distance threshold of 1.4.
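The threshold comparison can be sketched as follows. This is a minimal illustration, assuming the Euclidean (L2) distance between embeddings and treating a distance at or below the threshold as "same person"; the source does not name the distance metric. The toy 2-dimensional vectors stand in for real FaceNet embeddings, and every name and value other than the threshold 1.4 and the 0.5 to 2.9 range is hypothetical.

```python
import math

DISTANCE_THRESHOLD = 1.4  # threshold selected in the text


def euclidean_distance(a, b):
    """L2 distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(emb_a, emb_b, threshold=DISTANCE_THRESHOLD):
    """Treat two faces as the same person if their embeddings are close."""
    return euclidean_distance(emb_a, emb_b) <= threshold


def count_correct(pairs, threshold):
    """Count pairs whose same/different decision matches the label.

    pairs: iterable of (emb_a, emb_b, same_person_flag).
    """
    return sum(is_same_person(a, b, threshold) == same for a, b, same in pairs)


def best_threshold(pairs, candidates):
    """Return the first candidate threshold with the most correct decisions."""
    return max(candidates, key=lambda t: count_correct(pairs, t))


# Toy labeled pairs (hypothetical, 2-D for readability).
pairs = [
    ([0.0, 0.0], [1.0, 0.0], True),
    ([0.0, 0.0], [0.0, 1.2], True),
    ([0.0, 0.0], [2.0, 0.0], False),
    ([0.0, 0.0], [1.6, 0.0], False),
]
candidates = [round(0.5 + 0.1 * i, 1) for i in range(25)]  # 0.5 to 2.9
chosen = best_threshold(pairs, candidates)
print(chosen, count_correct(pairs, chosen))
```

Sweeping the candidates and counting correct decisions mirrors the comparison described above, where the distance with the most pairs detected in common was chosen.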

For identity verification, an SVM classifier can be trained with the embedding vectors as input. The face image detected by MTCNN is input to FaceNet, which generates a 128-dimensional embedding vector representing the facial features, and the SVM then confirms the identity. To derive the optimal SVM model, experiments were run at 50 and 100 epochs, and the accuracy, recall, and harmonic mean were better at 100 epochs.

For the approach using a distance threshold on the embedding vectors, the model trained for 100 epochs with a distance threshold of 1.4 was selected as the optimal model; for the approach using the SVM, the model trained for 100 epochs was selected as the optimal model. Comparing the two, the SVM gives the better performance.

Therefore, in the system 100 of the present invention, FaceNet was trained for 100 epochs, its output was used as input to the SVM, and the SVM model trained for 100 epochs was selected as the optimal model. In an experiment with 110 training images of the faces of workers wearing hard hats, captured by camera, the accuracy was 0.98, the precision was 1.0, and the recall was 0.92.

As such, the system 100 proposed in the present invention differs from existing studies in its method of detecting personal protective equipment and its method of applying the object detection model. [Table 6] below shows the differences from existing studies.

In summary, for personal protective equipment detection, the system of the present invention selected the optimal YOLOv5s model through experiments over epoch counts and batch sizes, and then derived and analyzed results for the selected model through training and validation. First, to select the optimal number of epochs, the batch size was fixed at 64 and the number of epochs was varied over 200, 500, 1000, and 2000; with an IoU (Intersection over Union) threshold of 0.5, the model trained for 200 epochs was found to be optimal. Second, batch experiments at 200 epochs with batch sizes 32, 64, and 128 showed the best performance, an mAP of 0.959, at batch size 32. Third, detection and recognition of objects in images were validated and tested with the YOLOv5s model, which offers the fastest real-time object detection, and it was confirmed to recognize workers' heads and personal protective equipment well in various industrial-site environments.
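The IoU criterion mentioned above can be illustrated with a short sketch. It assumes boxes given as pixel corners (left, top, right, bottom) and counts a prediction as correct when its IoU with the ground truth box is at least 0.5; the function names and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, right, bottom) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches the ground truth if IoU meets the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical ground truth and predicted helmet boxes.
ground_truth = (100, 100, 200, 200)
prediction = (110, 110, 210, 210)
print(f"IoU = {iou(prediction, ground_truth):.3f}, "
      f"match = {is_true_positive(prediction, ground_truth)}")
```

Here the overlap is 90 × 90 = 8,100 pixels and the union is 11,900 pixels, so the IoU is about 0.68 and the prediction counts as a match at the 0.5 threshold.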

Meanwhile, for worker identity verification, step-by-step experiments were conducted with MTCNN face detection and FaceNet face recognition. In step 1, the face was detected in the input image, normalized to 160×160, and stored. In step 2, the face image was input to FaceNet to extract the facial features and the embedding vector, and the extracted features were used as input to the SVM to confirm identity. In step 3, identity was confirmed using the distance between embedding vectors.

In an embodiment of the present invention, the SVM was trained on face pairs of the 18 people (18 classes) in the captured data at 50 and 100 epochs. The two settings were similar overall in accuracy, precision, and recall, but the harmonic mean was somewhat better at 100 epochs. In addition, FaceNet was trained for 100 epochs, its output was used as input to the SVM, and the SVM model trained for 100 epochs was selected as the optimal model.

In this way, the YOLOv5s model and FaceNet model with the best performance were derived through experiments under various conditions, such as a comparison of the four YOLOv5 models across batch sizes and epoch counts, and a comparison of the accuracy of the face embedding vectors at 50, 100, and 200 epochs. With the derived optimal models, whether a worker was wearing personal protective equipment and the worker's identity by face recognition were confirmed.

The system or device described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The above description merely illustrates the present invention, and those skilled in the art to which the present invention pertains may make various modifications without departing from its technical spirit. Accordingly, the embodiments disclosed in this specification do not limit the present invention. The scope of the present invention should be interpreted according to the claims below, and the various equivalents and modifications that could replace them at the time of filing this application should be construed as falling within the scope of the present invention.

100: Deep learning-based worker personal protective equipment wearing and facial identification system
101: Imaging unit 110: Personal protective equipment detection unit
120: Face recognition unit 130: Model verification unit
140: Model learning unit 150: Personal protective equipment wearing identification unit
160: Worker identification unit

Claims

A deep learning-based worker personal protective equipment wearing and facial identification system comprising:
An imaging unit that acquires image data capturing workers;
A model learning unit that trains each model for recognizing, from the image data, the object of the personal protective equipment worn by the worker and the worker's face, and then infers an optimal model algorithm;
A personal protective equipment wearing identification unit that detects the worker's personal protective equipment from actual image data by applying the model algorithm inferred through the model learning unit and identifies whether the personal protective equipment is worn by comparing it with a data set of pre-registered personal protective equipment; and
A worker identification unit that recognizes the worker's face from actual image data by applying the model algorithm inferred through the model learning unit and identifies the worker's identity by comparing it with a data set of pre-registered worker identities.

The deep learning-based worker personal protective equipment wearing and facial identification system according to claim 1,
wherein the model learning unit comprises:
A personal protective equipment detection unit that detects objects of personal protective equipment from the image data using a deep learning YOLO (You Only Look Once) model;
A face recognition unit that recognizes facial features from the image data using a deep learning MTCNN (Multi-task Cascaded Convolutional Networks) model and a FaceNet model; and
A model verification unit that adjusts parameters to minimize training loss through performance analysis of the YOLO model, the MTCNN model, and the FaceNet model.

The deep learning-based worker personal protective equipment wearing and facial identification system according to claim 2,
wherein the personal protective equipment detection unit comprises:
A backbone that extracts feature maps from the image data and classifies classes according to whether personal protective equipment is worn or not worn;
A convolution anchor box that, based on the extracted feature map, creates an anchor box in the area where the object of the personal protective equipment is expected to be located and sets a final bounding box within the anchor box based on the class information; and
A detection unit that detects the object of the personal protective equipment in the set bounding box.

The deep learning-based worker personal protective equipment wearing and facial identification system according to claim 2,
wherein the face recognition unit comprises:
A configuration for extracting and cropping a face area from the image data using the MTCNN model;
A configuration for extracting and embedding facial features from the cropped image using the FaceNet model; and
A configuration for measuring the distance between the embedded vectors to identify the identity of the face.

The deep learning-based worker personal protective equipment wearing and facial identification system according to claim 3 or 4,
wherein the model verification unit
analyzes the performance of the values output from the YOLO model for personal protective equipment detection and the FaceNet model for face recognition, and adjusts and applies the parameters of each model, such as the number of epochs and the batch size.