KR102615378B1

KR102615378B1 - Behavioral recognition-based risk situation detection system and method

Info

Publication number: KR102615378B1
Application number: KR1020210180444A
Authority: KR
Inventors: 김명호; 임채현
Original assignee: Soongsil University Industry-Academic Cooperation Foundation (숭실대학교 산학협력단)
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2023-12-19
Also published as: KR20230091380A

Abstract

The present invention relates to a behavior recognition-based system and method for detecting dangerous situations. The system includes a photographing unit and a risk detection device. The photographing unit is installed in a residential space and captures video. The risk detection device acquires image data from that video, blurs the acquired image data, and then recognizes the behavior of objects corresponding to people to determine whether a dangerous situation exists. The risk detection device includes:
(1) an image extraction unit that receives the captured video from the photographing unit, acquires image data comprising a plurality of frames, and blurs the image data;
(2) an object detection unit that detects the objects in the blurred image data, identifies the number of people from the detected objects, and generates object image data through preprocessing; and
(3) a behavior recognition unit that recognizes behavior from the object image data according to the identified number of people and determines whether a dangerous situation exists.
The invention also provides a behavior recognition-based dangerous situation detection method, which includes:
(1) an image extraction step in which the risk detection device receives the captured video from the photographing unit and acquires image data comprising a plurality of frames;
(2) a blur processing step of blurring the acquired image data;
(3) an object detection step of detecting objects corresponding to people in the blurred image data to identify the number of people, and generating object image data through preprocessing;
(4) a behavior recognition step of recognizing behavior from the object image data according to the identified number of people; and
(5) a situation determination step of determining whether a dangerous situation exists based on the recognized behavior.

Description

Behavioral recognition-based risk situation detection system and method

The present invention relates to a behavior recognition-based dangerous situation detection system and method. More specifically, it relates to a system and method that use video to detect, and enable a response to, dangerous situations such as lonely death or home invasion that may occur in a single-person household. Blur processing guarantees individual privacy while this is done.

As the proportion of single-person households increases, many problems are arising. Elderly people aged 65 or older who live alone often have no one to care for them, so the rate of lonely deaths is steadily rising. Single-person households are also increasing among young people, and many incidents and accidents caused by home invasions are occurring as a result.

Accordingly, monitoring systems are being developed that observe the condition of elderly people in real time, with the aim of preventing dangerous situations such as lonely death among those who live alone. Research spans a range of approaches. At the simple end are passive monitoring systems, in which a person visually watches an elderly person living alone and notifies the relevant parties when unusual signs appear. At the more advanced end are systems that analyze the person's movements in detail, so that abnormal signs can be detected early and assessed efficiently.

Recently, deep learning has been applied to monitoring systems. Techniques are being developed that analyze video automatically to determine an elderly person's movements and detect dangerous situations.

However, such monitoring systems can recognize the actions of only one person and cannot recognize the actions of two or more people. As a result, they cannot assess dangerous situations caused by home invasions in addition to lonely deaths.

In addition, a monitoring system must film the residential space continuously. As single-person households now span a wider range of ages, individuals may be uncomfortable with having their private lives exposed.

Therefore, a system is needed that can determine dangerous situations by recognizing the actions of one or more people while guaranteeing individual privacy.

Korean Laid-open Patent Publication No. 10-2021-0090771 (2020.01.10), "Remote monitoring IoT modular house system dedicated to elderly people living alone"

To solve the above problems, the present invention aims to provide a behavior recognition-based dangerous situation detection system and method. The system and method use video to detect, and enable a response to, dangerous situations such as lonely death or home invasion that may occur in a single-person household, while guaranteeing individual privacy through blur processing.

To achieve this objective, a behavior recognition-based dangerous situation detection system according to an embodiment of the present invention includes a photographing unit and a risk detection device. The photographing unit is installed in a residential space and captures video. The risk detection device acquires image data from the video captured by the photographing unit, blurs the acquired image data, and then recognizes the behavior of objects corresponding to people to determine whether a dangerous situation exists. The risk detection device includes:
(1) an image extraction unit that receives the captured video from the photographing unit, acquires the image data comprising a plurality of frames, and blurs the image data;
(2) an object detection unit that detects the objects in the blurred image data, identifies the number of people from the detected objects, and generates object image data through preprocessing; and
(3) a behavior recognition unit that recognizes behavior from the object image data according to the identified number of people and determines whether a dangerous situation exists.

In addition, the image extraction unit blurs the image data by applying a Gaussian blur filter to it.

In addition, the object detection unit may include a person identification unit and an image preprocessing unit. The person identification unit extracts a bounding box for the object in every frame of the blurred image data and uses these boxes to identify the number of people. The image preprocessing unit crops each frame of the image data according to the person information identified by the person identification unit. This yields a plurality of object frames, which together form the object image data.

In addition, when a plurality of bounding boxes are extracted and the number of people is determined to be two or more, the person identification unit checks whether an intersection region exists between the bounding boxes.
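The intersection check can be sketched as below. This is a minimal illustration under two assumptions. First, boxes are already in vertex format (x1, y1, x2, y2) in pixel coordinates. Second, an "intersection region" means an overlap of positive area, so boxes that merely touch along an edge do not count. The patent specifies neither an overlap threshold nor how touching boxes are handled.

```python
from itertools import combinations
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right vertices


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two vertex-format boxes; 0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:  # separated, or only touching along an edge
        return 0.0
    return width * height


def frame_has_intersection(boxes: Sequence[Box]) -> bool:
    """True if any pair of person boxes in one frame has a positive-area overlap."""
    return any(intersection_area(a, b) > 0 for a, b in combinations(boxes, 2))
```

Checking every pair costs O(n²) per frame. That is negligible here, because a household scene contains only a handful of people.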

In addition, when the number of people is identified as one, the image preprocessing unit performs three operations. First, it converts the bounding box extracted in each frame into top-left and bottom-right vertex coordinates, giving box vertex coordinate information. Second, it compares this information across the plurality of frames to extract the min-max coordinates, namely the top coordinate with the minimum value and the bottom coordinate with the maximum value with respect to the x coordinate. Third, it sets a crop region based on the min-max coordinates and crops each frame to that region.
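The single-person preprocessing can be sketched as follows. Several parts are assumptions. The input boxes are taken to be in YOLO-style center format (cx, cy, w, h); the patent says only that each box is converted to top-left and bottom-right vertices. "Min-max coordinates" is read here as the smallest top-left and largest bottom-right corner over all frames. All frames are cropped to the same region so the behavior recognition model receives equally sized inputs.

```python
import math
from typing import List, Sequence, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right vertices


def center_to_corners(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a YOLO-style (center x, center y, width, height) box to vertex coordinates."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def union_crop_region(boxes: Sequence[Box]) -> Tuple[int, int, int, int]:
    """Min-max coordinates over all frames: smallest top-left and largest bottom-right."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    # Round outward so the integer crop never cuts into any box.
    return math.floor(x1), math.floor(y1), math.ceil(x2), math.ceil(y2)


def crop_frames(frames: Sequence[np.ndarray], region: Tuple[int, int, int, int]) -> List[np.ndarray]:
    """Crop every (H, W[, C]) frame to the same region, clamped to the frame bounds."""
    x1, y1, x2, y2 = region
    crops = []
    for frame in frames:
        h, w = frame.shape[:2]
        crops.append(frame[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)])
    return crops
```

For example, boxes (40, 35, 60, 45) and (30.5, 40, 55, 69.2) from two frames give the crop region (30, 35, 60, 70).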

In addition, the image preprocessing unit handles the case where the number of people is two or more and an intersection region exists between the bounding boxes. In that case it performs four operations. First, in each frame it converts each bounding box that has an intersection region into top-left and bottom-right vertex coordinates, giving box vertex coordinate information. Second, from the box vertex coordinate information of those boxes in each frame, it extracts box region coordinate information, namely the top coordinate with the minimum value and the bottom coordinate with the maximum value with respect to the x coordinate of the box region. Third, it compares the box region coordinate information across the plurality of frames to extract the min-max coordinates in the same way. Fourth, it sets a crop region based on the min-max coordinates and crops each frame to that region.
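The multi-person case adds one level. Each frame first gets a single box region enclosing its intersecting boxes, and the crop region is then the min-max over those per-frame regions. The sketch below makes two assumptions the patent leaves open. A box is included if it overlaps at least one other box. Frames with no intersection region are skipped. The returned region can then be applied to every frame with ordinary array slicing.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _overlaps(a: Box, b: Box) -> bool:
    """Positive-area overlap test for two vertex-format boxes."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def _enclose(boxes: Sequence[Box]) -> Box:
    """Smallest box containing all given boxes (minimum top-left, maximum bottom-right)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def frame_box_region(boxes: Sequence[Box]) -> Optional[Box]:
    """Box region of one frame: encloses every box that overlaps at least one other box."""
    involved = set()
    for i, j in combinations(range(len(boxes)), 2):
        if _overlaps(boxes[i], boxes[j]):
            involved.update((i, j))
    if not involved:
        return None  # no intersection region in this frame
    return _enclose([boxes[k] for k in sorted(involved)])


def multi_person_crop_region(per_frame_boxes: Sequence[Sequence[Box]]) -> Optional[Box]:
    """Min-max coordinates over the per-frame box regions (frames without overlap skipped)."""
    regions = [r for r in map(frame_box_region, per_frame_boxes) if r is not None]
    return _enclose(regions) if regions else None
```

In the usage below, the distant third box in the first frame is excluded, because it overlaps no one.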

In addition, the behavior recognition unit uses a behavior recognition model to recognize behavior from the object image data according to the identified number of people, and determines whether a dangerous situation exists. The behavior recognition model may include:
(1) a Slow model that receives some of the object frames of the object image data and extracts spatial and color features;
(2) a Fast model that receives more object frames than the Slow model and extracts features of change over time; and
(3) a prediction unit that recognizes behavior by combining the features extracted by the Slow and Fast models based on the number of people, and determines from the recognized behavior whether a dangerous situation exists.
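The split of frames between the Slow and Fast models can be illustrated with the sketch below. It follows the usual SlowFast arrangement: the Fast pathway sees every frame, and the Slow pathway sees a uniformly sampled subset that is smaller by a ratio alpha. Setting alpha = 4 is an assumption; the patent gives no ratio. With the 16-frame clips the patent prefers, this yields 4 Slow frames and 16 Fast frames.

```python
import numpy as np


def pack_pathways(clip: np.ndarray, alpha: int = 4):
    """Split a (T, H, W, C) clip into Slow and Fast pathway inputs.

    Fast pathway: all T frames (fine temporal resolution, captures motion).
    Slow pathway: T // alpha frames sampled uniformly (spatial and color detail).
    """
    t = clip.shape[0]
    if t < alpha:
        raise ValueError("clip must contain at least alpha frames")
    slow_idx = np.linspace(0, t - 1, t // alpha).round().astype(int)
    return clip[slow_idx], clip
```

Feature extraction and the people-count-aware fusion in the prediction unit are not shown. The patent does not describe the network layers.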

In addition, a behavior recognition-based dangerous situation detection method according to an embodiment of the present invention includes:
(1) an image extraction step in which a risk detection device receives video captured by a photographing unit and acquires image data comprising a plurality of frames;
(2) a blur processing step of blurring the acquired image data;
(3) an object detection step of detecting objects corresponding to people in the blurred image data to identify the number of people, and generating object image data through preprocessing;
(4) a behavior recognition step of recognizing behavior from the object image data according to the identified number of people; and
(5) a situation determination step of determining whether a dangerous situation exists based on the recognized behavior.

Here, the object detection step may include an object extraction step and a person identification step. The object extraction step extracts a bounding box for the object in every frame of the blurred image data. The person identification step identifies the number of people from the extracted bounding boxes.

In addition, the object detection step may further include a first preprocessing step, used when the person identification step identifies one person. In this step, the bounding box of each frame is converted to obtain box vertex coordinate information. Each frame is then cropped using that information, which yields a plurality of object frames of the object image data.

In addition, the first preprocessing step may include:
(1) a first coordinate conversion step of converting the bounding box of each frame into top-left and bottom-right vertex coordinates to obtain box vertex coordinate information;
(2) a first crop region setting step of comparing the box vertex coordinate information across the plurality of frames to extract the min-max coordinates, namely the top coordinate with the minimum value and the bottom coordinate with the maximum value with respect to the x coordinate, and setting a crop region based on the min-max coordinates; and
(3) a first cropping step of cropping each frame according to the crop region to obtain a plurality of object frames.

In addition, the object detection step may further include an intersection region determination step and a second preprocessing step. The intersection region determination step is used when the person identification step identifies two or more people; it checks whether an intersection region exists between the bounding boxes of the frame. The second preprocessing step is used when such an intersection region exists. In each frame, the bounding boxes that have an intersection region are converted to obtain box vertex coordinate information. Each frame is then cropped using that information, which yields the object frames of the object image data.

In addition, the second preprocessing step may include:
(1) a second coordinate conversion step of converting each bounding box that has an intersection region in each frame into top-left and bottom-right vertex coordinates to obtain box vertex coordinate information;
(2) a box region extraction step of extracting box region coordinate information, namely the top coordinate with the minimum value and the bottom coordinate with the maximum value with respect to the x coordinate of the box region, from the box vertex coordinate information of the bounding boxes that have an intersection region in each frame;
(3) a second crop region setting step of comparing the box region coordinate information across the plurality of frames to extract the min-max coordinates in the same way, and setting a crop region based on them; and
(4) a second cropping step of cropping each frame according to the crop region to obtain a plurality of object frames.

In addition, the method may further include an alarm and emergency contact step. When the situation determination step determines a dangerous situation, this step activates an alarm device or makes an emergency contact to a registered terminal.
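The method steps can be chained as in the sketch below. Every stage is passed in as a callable, so the control flow is visible without committing to a particular detector or model. Several parts are hypothetical: the set of "dangerous" action labels, taking the head count as the largest number of people seen in any frame, and returning early when no one is detected. The patent specifies none of these.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# Hypothetical action labels treated as dangerous; the patent does not list the classes.
DANGEROUS_ACTIONS = {"falling", "lying_still", "assault"}


@dataclass
class Decision:
    num_people: int
    action: str
    dangerous: bool


def run_pipeline(
    frames: Sequence[Any],
    blur: Callable[[Any], Any],
    detect_people: Callable[[Any], List[tuple]],
    crop: Callable[[List[Any], List[List[tuple]], int], Any],
    recognize: Callable[[Any, int], str],
    notify: Callable[[str, int], None],
) -> Decision:
    blurred = [blur(f) for f in frames]                   # blur processing step
    boxes = [detect_people(f) for f in blurred]           # object detection: person boxes per frame
    num_people = max((len(b) for b in boxes), default=0)  # head count (assumed: max over frames)
    if num_people == 0:
        return Decision(0, "none", False)
    clip = crop(blurred, boxes, num_people)               # preprocessing chosen by head count
    action = recognize(clip, num_people)                  # behavior recognition step
    dangerous = action in DANGEROUS_ACTIONS               # situation determination step
    if dangerous:
        notify(action, num_people)                        # alarm and emergency contact step
    return Decision(num_people, action, dangerous)
```

Injecting the stages as callables means the blur, detector, cropping, and SlowFast-style model can each be swapped or stubbed out for testing.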

The behavior recognition-based dangerous situation detection system and method according to the embodiments described above use video to detect dangerous situations such as lonely death or home invasion that may occur in a single-person household. This makes it possible to prevent such situations or to respond to them quickly.

또한 영상으로부터 획득된 이미지에 블러 처리를 하는 것으로 개인의 프라이버시를 보호하면서 위험상황을 감지할 수 있다.Additionally, by blurring images obtained from video, it is possible to detect dangerous situations while protecting individual privacy.

또한 한 사람에 대한 행동만을 인식하는 것이 아닌 복수의 사람에 대한 행동을 인식하도록 하여 고독사 외에 주거침입 등 타인에 의한 위험상황도 감지할 수 있어, 다양한 위험상황에 대한 대처가 가능하다.In addition, by recognizing the actions of multiple people rather than just one person, it is possible to detect dangerous situations caused by others, such as home invasions, as well as lonely deaths, making it possible to respond to a variety of dangerous situations.

Figure 1 is a block diagram showing a behavior recognition-based dangerous situation detection system according to an embodiment of the present invention.
Figures 2 (a) and (b) are exemplary diagrams showing frames obtained by the image extraction unit of Figure 1 and frames subjected to blur processing.
Figure 3 is a block diagram showing the configuration of the object detection unit of Figure 1.
Figure 4 is an example diagram showing a bounding box extracted from a frame through the person identification unit of Figure 3.
Figures 5 (a) to (d) are examples showing crop areas set based on a plurality of frames through the image pre-processing unit of Figure 3 when there is only one person.
Figure 6 is an example diagram showing an object frame created by cropping according to the crop area of Figure 5.
Figure 7 is an example diagram showing a crop area set based on a plurality of frames through the image pre-processing unit of Figure 3 when there are two people.
Figure 8 is an example diagram showing an object frame created by cropping according to the crop area of Figure 7.
Figure 9 is a conceptual diagram showing the behavior recognition model of the behavior recognition unit of Figure 1.
Figure 10 is a flowchart schematically showing a method for detecting a dangerous situation based on behavior recognition according to an embodiment of the present invention.
Figure 11 is a flow chart sequentially showing step S300 of Figure 10.
Figure 12 is a flow chart sequentially showing step S330 of Figure 11.
Figure 13 is a flow chart sequentially showing step S350 of Figure 11.

The following description of the present invention, given with reference to the drawings, is not limited to specific embodiments; various modifications may be made, and various embodiments are possible. The content described below should also be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

In the following description, terms such as "first" and "second" are used to describe various components. These terms carry no meaning of their own; they serve only to distinguish one component from another.

The same reference numerals are used throughout this specification to denote the same components.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise," "include," or "have" used below specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should not be understood as excluding in advance the presence, or possible addition, of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, terms such as "…unit," "…device," and "…module" used in the specification refer to a unit that processes at least one function or operation. Such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

The behavior recognition-based dangerous situation detection system and method according to an embodiment of the present invention are described in detail below with reference to the accompanying drawings.

The drawings referred to in this description are as follows.
Figure 1 is a block diagram showing a behavior recognition-based dangerous situation detection system according to an embodiment of the present invention.
Figures 2 (a) and (b) are examples showing a frame obtained by the image extraction unit of Figure 1 and the same frame after blur processing.
Figure 3 is a block diagram showing the configuration of the object detection unit of Figure 1.
Figure 4 is an example showing bounding boxes extracted from a frame by the person identification unit of Figure 3.
Figures 5 (a) to (d) are examples showing a crop region set across a plurality of frames by the image preprocessing unit of Figure 3 when there is one person.
Figure 6 is an example showing object frames generated by cropping according to the crop region of Figure 5.
Figure 7 is an example showing a crop region set across a plurality of frames by the image preprocessing unit of Figure 3 when there are two or more people.
Figure 8 is an example showing object frames generated by cropping according to the crop region of Figure 7.
Figure 9 is a conceptual diagram showing the behavior recognition model of the behavior recognition unit of Figure 1.

The behavior recognition-based dangerous situation detection system according to an embodiment of the present invention (hereinafter, the "dangerous situation detection system") receives video and extracts image data from it. The image data is blurred to protect individual privacy. The system then detects dangerous situations by recognizing the behavior of objects (people) in the image data.

When a dangerous situation is detected, the dangerous situation detection system can, as shown in Figure 1, activate the alarm device W or make an emergency contact to a terminal T pre-registered as a contact. Alerting people nearby in this way allows the dangerous situation to be dealt with.

Here, the terminal T may belong to any of an unspecified number of parties, such as guardians or emergency agencies. It may be any of various devices, such as a mobile terminal, a PC, or a tablet.

Referring to Figure 1, the dangerous situation detection system may include a photographing unit 1 and a risk detection device 2.

One or more photographing units 1 may be installed in a residential space to capture video of the people present in that space.

In addition, the photographing unit 1 is linked with the risk detection device 2 and can transmit the captured video to it.

Here, the photographing unit 1 may be a video camera, but it is not limited to this. It may be a real-time camera. It may also be a lidar sensor, a thermal imaging camera, or a similar device, which allows privacy protection without any filter processing.

The risk detection device 2 acquires image data from the video captured by the photographing unit 1 and blurs the acquired image data. It then recognizes the behavior of objects corresponding to people to determine whether a dangerous situation exists.

To this end, the risk detection device 2 may include an image extraction unit 20, an object detection unit 21, a behavior recognition unit 22, a notification unit 23, and a database 24.

The image extraction unit 20 receives the captured video from the photographing unit 1, acquires image data, and blurs the acquired image data.

Here, the image data may include a plurality of frames extracted from the video at regular time intervals. This is not a limitation; the frames may instead be extracted at irregular intervals. The image data preferably consists of 16 frames, but is not limited to this number.

Specifically, the image extraction unit 20 may extract a plurality of frames from the video by frame capture using the Python OpenCV library, but it is not limited to this.
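Frame extraction with OpenCV could look roughly like the sketch below. It is written under stated assumptions. The 16 frames are spaced evenly across the whole video, which is one reading of "regular time intervals". The helper names are hypothetical. `cv2` is imported inside `extract_frames` so that the pure index helper can be used without OpenCV installed. For some containers and codecs, `CAP_PROP_FRAME_COUNT` is only an estimate and seeking by frame index may be inexact.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int = 16) -> List[int]:
    """Evenly spaced frame indices covering the whole video."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def extract_frames(video_path: str, num_samples: int = 16):
    """Read num_samples evenly spaced frames from a video file using OpenCV."""
    import cv2  # local import: only this function needs OpenCV

    cap = cv2.VideoCapture(video_path)
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in sample_frame_indices(total, num_samples):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)  # BGR uint8 array of shape (H, W, 3)
        return frames
    finally:
        cap.release()
```

For a 160-frame clip, the helper returns indices 0, 10, 20, ..., 150.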

In addition, the image extraction unit 20 applies a Gaussian blur filter to each frame of the image data obtained as above. This produces a mosaic-like effect, as shown in Figure 2, and guarantees the privacy of individual users.

A Gaussian blur filter creates a blur effect by applying a mathematical function, the Gaussian, to the frame. Each pixel is replaced by a weighted average of its neighboring pixels, with weights that fall off with distance according to the Gaussian. This suppresses pixel values that deviate sharply from their surroundings and evens out the pixel values of the frame, producing the blur.
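For reference, the standard two-dimensional Gaussian blur can be written as shown below. The patent does not state this form. Here I is the input frame, I′ is the blurred frame, σ is the standard deviation, and r is the window radius. The denominator normalizes the weights to sum to 1, so a uniform region is left unchanged. Because G factorizes into two one-dimensional Gaussians, the blur can also be applied as two 1-D passes, one along rows and one along columns.

```latex
G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),
\qquad
I'(u, v) = \frac{\sum_{x=-r}^{r}\sum_{y=-r}^{r} G_\sigma(x, y)\, I(u + x,\, v + y)}
                {\sum_{x=-r}^{r}\sum_{y=-r}^{r} G_\sigma(x, y)}
```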

In addition, when applying the Gaussian blur filter to a frame, the image extraction unit 20 can control the degree of blurring by adjusting the radius. It can determine the frame resolution and automatically adjust the radius for each frame when applying the filter. This is not a limitation, and the radius may instead be preset.
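The resolution-dependent blur could be sketched in NumPy as below. In practice, OpenCV's `cv2.GaussianBlur(frame, (k, k), sigma)` would normally be used; the explicit version here only shows the mechanism as a separable filter. Two parts are hypothetical, since the patent gives neither: the rule in `radius_for_resolution`, which sets the radius to 2% of the shorter image side, and the choice σ = r / 3.

```python
from typing import Optional

import numpy as np


def gaussian_kernel_1d(radius: int, sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def radius_for_resolution(height: int, width: int, fraction: float = 0.02) -> int:
    """Hypothetical rule: scale the blur radius with the shorter image side."""
    return max(1, int(round(min(height, width) * fraction)))


def gaussian_blur(frame: np.ndarray, radius: Optional[int] = None) -> np.ndarray:
    """Blur an (H, W) or (H, W, C) image with a separable Gaussian filter."""
    h, w = frame.shape[:2]
    if radius is None:
        radius = radius_for_resolution(h, w)
    sigma = radius / 3.0  # assumption: almost all of the Gaussian mass lies within the radius
    k = gaussian_kernel_1d(radius, sigma)
    img = frame.astype(np.float64)
    pad = [(radius, radius), (radius, radius)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")  # replicate borders so the output stays H x W
    # Separable filtering: weight shifted copies along rows, then along columns.
    tmp = sum(k[i] * padded[i:i + h] for i in range(2 * radius + 1))
    out = sum(k[i] * tmp[:, i:i + w] for i in range(2 * radius + 1))
    if np.issubdtype(frame.dtype, np.integer):
        info = np.iinfo(frame.dtype)
        out = np.clip(np.rint(out), info.min, info.max)
    return out.astype(frame.dtype)
```

With this rule, a 640x480 frame gets a radius of 10 pixels. Larger frames are blurred proportionally more, so the degree of obscuring stays roughly constant across resolutions.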

The object detection unit 21 may apply an artificial intelligence model to detect objects corresponding to people in the image data blurred by the image extraction unit 20, and may generate object image data through preprocessing that depends on the number of people. Extracting only the region corresponding to the objects and passing it to the behavior recognition unit 22 allows the behavior of the objects to be recognized more efficiently and accurately.

Referring to FIG. 3, the object detection unit 21 may include a person identification unit 210 and an image preprocessing unit 211.

The person identification unit 210 may determine the number of people by extracting, for each frame of the blurred image data, a bounding box (BB) for each object corresponding to a person.

The person identification unit 210 may extract the bounding boxes (BB) for objects corresponding to people in a frame by applying a YOLO model as the artificial intelligence model.

The YOLO model detects objects in an image and represents their locations as bounding boxes. Because it applies a single neural network to the entire image through a grid scheme, it can also take surrounding context into account. Compared with earlier classifier-based detection techniques such as R-CNN and Fast R-CNN, it is considerably faster and makes fewer background false positives, which makes real-time object detection possible.

More specifically, the YOLO model may consist of a single convolutional neural network (CNN), which in turn may consist of convolutional layers (Conv) and fully connected layers (FC).

One or more convolutional layers (Conv) extract features from each frame of the image data: the frame is divided into an S x S grid, and weights are applied to the frame through a convolution operation to generate a feature map. A single convolutional layer is applied repeatedly while sliding over the pixels or grid cells of the frame, thereby extracting features from the whole frame.

The group of weights used here may be referred to as a weight kernel. The weight kernel may be a three-dimensional matrix of size n x m x d; it traverses the frame at a specified interval (stride) and generates a feature map through the convolution operation.

If the frame is an image with multiple channels (for example, the three channels of HSV), the weight kernel traverses each channel of the frame and performs the convolution calculation channel by channel to generate the feature map.

Here, n and m denote the number of rows and columns of the kernel (the size of the frame region it covers at each position), and d denotes the number of channels of the frame.
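
The n x m x d kernel traversal described above can be sketched in pure Python. This sketch assumes no padding and follows the standard convolutional-layer convention: the per-channel products are summed, so one kernel yields one 2-D feature map. Like most CNN frameworks, it computes cross-correlation (the kernel is not flipped). `conv2d_multichannel` is a hypothetical name, and real YOLO layers would run in an optimized framework instead.

```python
from typing import List

Frame = List[List[List[float]]]  # frame[y][x][c]


def conv2d_multichannel(frame: Frame, kernel: Frame, stride: int = 1) -> List[List[float]]:
    """Valid convolution of an H x W x d frame with one n x m x d kernel."""
    H, W, d = len(frame), len(frame[0]), len(frame[0][0])
    n, m = len(kernel), len(kernel[0])
    if len(kernel[0][0]) != d:
        raise ValueError("kernel depth d must match the number of frame channels")
    out_h = (H - n) // stride + 1
    out_w = (W - m) // stride + 1
    fmap = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            y0, x0 = oy * stride, ox * stride  # top-left corner of the current window
            acc = 0.0
            for i in range(n):
                for j in range(m):
                    for c in range(d):  # channel responses are summed into one value
                        acc += frame[y0 + i][x0 + j][c] * kernel[i][j][c]
            row.append(acc)
        fmap.append(row)
    return fmap
```

For an H x W frame, the output feature map has ((H - n) / stride + 1) rows and ((W - m) / stride + 1) columns; a layer with K kernels produces K such maps.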

The fully connected layer (FC) may predict one or more bounding boxes and class probabilities for objects using the generated feature map.

Here, a bounding box may be expressed as (x, y, w, h, c), where (x, y) are the coordinates of the box center, w and h are the width and height of the box, and c is the confidence score.

The class probability is the conditional probability that an object belongs to a given class, given that an object is present in the grid cell.

In addition, the fully connected layer (FC) may use the bounding-box coordinates and class probabilities to select the bounding boxes that correspond to actual objects, keeping only the bounding boxes (BB) for objects corresponding to people. For this selection, the class-specific confidence score and the IoU (Intersection over Union) may be used.

The class-specific confidence score is obtained by multiplying the confidence score of a bounding box by the class probability, and the IoU is the area of the intersection of two boxes divided by the area of their union.
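
Keeping only person boxes with the class-specific confidence score and IoU can be sketched as score thresholding followed by greedy non-maximum suppression (NMS). This is an assumed realization: the thresholds of 0.5, the greedy NMS strategy, and the detection tuple layout are illustrative choices, not taken from the patent, which only states that the two quantities are used.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_l, y_l, x_r, y_r) corner form


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two corner-form boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_to_corners(x: float, y: float, w: float, h: float) -> Box:
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def select_person_boxes(detections: Sequence[Tuple[Tuple[float, ...], Sequence[float]]],
                        person_class: int, score_thr: float = 0.5,
                        iou_thr: float = 0.5) -> List[Box]:
    """detections: ((x, y, w, h, c), class_probs) pairs; returns kept person boxes."""
    candidates = []
    for (x, y, w, h, c), class_probs in detections:
        score = c * class_probs[person_class]  # class-specific confidence score
        if score >= score_thr:
            candidates.append((score, center_to_corners(x, y, w, h)))
    candidates.sort(key=lambda t: t[0], reverse=True)
    kept: List[Tuple[float, Box]] = []
    for score, box in candidates:
        # Greedy NMS: drop a box that overlaps an already kept, higher-scoring box.
        if all(iou(box, k) <= iou_thr for _, k in kept):
            kept.append((score, box))
    return [box for _, box in kept]
```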

In this way, the person identification unit 210 can extract the bounding boxes (BB) for objects corresponding to people through the YOLO model (see FIG. 4).

The person identification unit 210 can then determine the number of people from the number of extracted bounding boxes (BB) and generate person information.

Meanwhile, when a plurality of bounding boxes (BB) are extracted and two or more people are identified, the person identification unit 210 may check whether the bounding boxes (BB) have an intersection region.

If an intersection region exists, the person identification unit 210 may pass the image data to the image preprocessing unit 211 for preprocessing; if no intersection region exists, it may terminate processing of that image data so that the scene is determined to be a 'normal situation' rather than a dangerous one.
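
The routing decision described above (one person, several overlapping people, or several non-overlapping people) can be sketched per frame as follows. `route_frame` and its return labels are hypothetical names, and the zero-person branch is an added assumption that the patent does not discuss.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_l, y_l, x_r, y_r)


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two boxes share a region of positive area (touching edges do not count)."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def route_frame(person_boxes: List[Box]) -> str:
    """Decide the next processing step from one frame's person boxes."""
    count = len(person_boxes)  # number of people = number of person bounding boxes
    if count == 0:
        return "no_person"  # assumption: not covered by the patent text
    if count == 1:
        return "single_person_preprocessing"
    overlapping = any(boxes_intersect(a, b)
                      for i, a in enumerate(person_boxes)
                      for b in person_boxes[i + 1:])
    # Non-overlapping people are treated as a normal situation and not processed further.
    return "multi_person_preprocessing" if overlapping else "normal_situation"
```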

The image preprocessing unit 211 may generate the object image data by cropping each frame of the image data according to the person information from the person identification unit 210, thereby obtaining a plurality of object frames.

The image preprocessing unit 211 may crop each frame of the image data using the bounding boxes, according to the person information. This is described in detail below with reference to FIGS. 5 to 8, separately for the case of one person and the case of two or more people. For convenience of explanation, the drawings show four frames, but the number of frames is not limited thereto.

For example, when one person is identified as shown in FIG. 5, the image preprocessing unit 211 first converts the coordinates of the bounding box (BB) extracted in each frame into an upper-left vertex P_l(x_l, y_l) and a lower-right vertex P_r(x_r, y_r), thereby obtaining box vertex coordinate information.

Next, the image preprocessing unit 211 compares the box vertex coordinate information (P_l, P_r) across the plurality of frames to extract the min-max coordinates, namely the upper coordinate P_min having the minimum value and the lower coordinate P_max having the maximum value with respect to the x-coordinate, and sets a box-shaped crop region (CR) based on the min-max coordinates.
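
The single-person crop can be sketched as converting each frame's (x, y, w, h) box to corners and taking their union over all frames. This reads P_min as the smallest x_l and y_l and P_max as the largest x_r and y_r across frames, so the crop contains the person in every frame; that reading, and the helper names, are assumptions. Frames are modeled as lists of pixel rows; with NumPy arrays the crop would be `frame[y0:y1, x0:x1]`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_corners(box: Tuple[float, float, float, float]) -> Tuple[float, float, float, float]:
    """(x, y, w, h) center form -> (x_l, y_l, x_r, y_r) corner form."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def crop_region_single(per_frame_boxes: Sequence[Tuple[float, float, float, float]]) -> Tuple[Point, Point]:
    """Return (P_min, P_max) enclosing one person's box in every frame."""
    corners = [to_corners(b) for b in per_frame_boxes]
    p_min = (min(c[0] for c in corners), min(c[1] for c in corners))
    p_max = (max(c[2] for c in corners), max(c[3] for c in corners))
    return p_min, p_max


def crop(frame: List[list], p_min: Point, p_max: Point) -> List[list]:
    """Crop a frame stored as a list of pixel rows, clamping the origin at 0."""
    (x0, y0), (x1, y1) = p_min, p_max
    x0, y0 = max(0, int(x0)), max(0, int(y0))
    return [row[x0:int(x1)] for row in frame[y0:int(y1)]]
```

Using one crop region for all frames keeps the object frames the same size and aligned in time, which a clip-based behavior model expects.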

The image preprocessing unit 211 then crops each frame according to the set crop region (CR), obtaining a plurality of object frames as shown in FIG. 6.

As another example, when two or more people are identified, the image preprocessing unit 211 can obtain object image data through a different preprocessing process. This is described with reference to FIGS. 7 and 8, which correspond to the case of two people.

Referring to FIG. 7, for each frame, the image preprocessing unit 211 converts the coordinates of each bounding box (BB) that has an intersection region into an upper-left vertex P_l(x_l, y_l) and a lower-right vertex P_r(x_r, y_r), thereby obtaining the box vertex coordinate information of the intersecting bounding boxes (BB).

Next, the image preprocessing unit 211 may extract box region coordinate information for each frame based on the box vertex coordinate information of the intersecting bounding boxes (BB).

Here, the box region coordinate information may be the upper coordinate having the minimum value and the lower coordinate having the maximum value, with respect to the x-coordinate, of the smallest box region that contains all of the intersecting bounding boxes (BB).

Next, the image preprocessing unit 211 compares the box region coordinate information across the plurality of frames to extract the min-max coordinates, namely the upper coordinate P_min having the minimum value and the lower coordinate P_max having the maximum value with respect to the x-coordinate, and sets a box-shaped crop region (CR) based on the min-max coordinates.
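
The two-stage procedure for intersecting people can be sketched as: enclose the intersecting boxes in each frame, then enclose those per-frame regions across all frames. Boxes are assumed to already be in corner form (x_l, y_l, x_r, y_r), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_l, y_l, x_r, y_r)


def enclosing_box(boxes: Sequence[Box]) -> Box:
    """Smallest box containing every given box."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def crop_region_multi(frames_boxes: Sequence[List[Box]]):
    """frames_boxes[t] holds the intersecting person boxes of frame t."""
    per_frame = [enclosing_box(boxes) for boxes in frames_boxes]  # box region per frame
    x_l, y_l, x_r, y_r = enclosing_box(per_frame)                 # min-max across frames
    return (x_l, y_l), (x_r, y_r)                                 # (P_min, P_max)
```

Cropping to one region that covers both people keeps their interaction (for example, one person pushing another) inside every object frame.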

The image preprocessing unit 211 then crops each frame according to the set crop region (CR), obtaining a plurality of object frames as shown in FIG. 8.

This process applies when two or more people are present and their bounding boxes have an intersection region.

Meanwhile, when setting the box-shaped crop region (CR) based on the min-max coordinates, the image preprocessing unit 211 may additionally adjust the crop region (CR) by expanding or shrinking it according to the radius value of the Gaussian blur filter.

Expanding or shrinking the crop region (CR) according to the degree of blurring applied by the Gaussian blur filter makes it possible to balance the degree of privacy protection against detection accuracy.

For example, when the radius value is larger than a preset reference radius value, the image preprocessing unit 211 may expand the crop region (CR) by a predetermined amount. As the degree of privacy protection increases, the boundary between an object and its surroundings becomes blurred, which can reduce the accuracy of bounding-box detection; expanding the crop region (CR) helps prevent the object from being cut off and thereby improves object detection accuracy.
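
The expansion rule in the example above can be sketched as follows. The fixed 16-pixel margin, the reference radius, and the function name are hypothetical values chosen for illustration; the patent only says the crop region is expanded by a predetermined amount when the radius exceeds a reference value. Only expansion is shown; shrinking for small radii would follow the same pattern.

```python
from typing import Tuple

Point = Tuple[float, float]


def adjust_crop_for_blur(p_min: Point, p_max: Point, radius: int, ref_radius: int,
                         frame_w: int, frame_h: int, margin: int = 16) -> Tuple[Point, Point]:
    """Widen the crop by `margin` pixels per side when the blur radius exceeds a reference."""
    (x0, y0), (x1, y1) = p_min, p_max
    if radius > ref_radius:
        # Strong blur makes box edges unreliable, so leave extra room around the object.
        x0, y0, x1, y1 = x0 - margin, y0 - margin, x1 + margin, y1 + margin
    # Keep the adjusted region inside the frame.
    return (max(0, x0), max(0, y0)), (min(frame_w, x1), min(frame_h, y1))
```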

The behavior recognition unit 22 may apply an artificial intelligence model to recognize behavior from the object image data according to the identified number of people, and determine whether a dangerous situation exists.

More specifically, the behavior recognition unit 22 can recognize the behavior of an object from the object image data through a behavior recognition model, and detect a dangerous situation based on the recognized behavior.

A SlowFast model may be applied as the behavior recognition model 220. SlowFast is a model architecture inspired by research on how cells in the eye process visual information, and it can be trained end-to-end. The visual system contains P-cells and M-cells: M-cells operate at high temporal frequency and respond to fast visual changes but are relatively insensitive to spatial detail, whereas P-cells provide fine spatial detail but respond poorly to temporal change. Building on this observation, SlowFast processes the frames through two pathways with different sampling rates and finally fuses their features to make a prediction.

Referring to FIG. 9, the behavior recognition model 220 may include a Slow model 221, a Fast model 222, and a prediction unit 223. In FIG. 9, C denotes the number of channels, T the number of frames (time axis), and H and W the height and width of a frame.

The Slow model 221 receives object frames at a low frame rate, that is, only some of the object frames of the object image data, and extracts spatial and color features.

The Fast model 222 receives object frames at a high frame rate, that is, more object frames than the Slow model 221, and extracts features that capture change over time.

The Fast model 222 may receive α times as many object frames as the Slow model 221; considering performance, α is preferably set to 8.

For example, if 2 object frames are input to the Slow model 221, 16 object frames may be input to the Fast model 222.
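
The frame split between the two pathways can be sketched as taking every frame for the Fast pathway and every α-th frame for the Slow pathway, together with the channel ratio β discussed next. The helper names are hypothetical, and the 64-channel figure in the usage line is only an example of a Slow-pathway width.

```python
from typing import List, Sequence, Tuple


def slowfast_inputs(frames: Sequence, alpha: int = 8) -> Tuple[List, List]:
    """Fast pathway sees all frames; Slow pathway sees every alpha-th frame."""
    fast = list(frames)
    slow = fast[::alpha]
    return slow, fast


def fast_channels(slow_channels: int, beta: float = 1 / 8) -> int:
    """Fast pathway width as a fraction beta of the Slow pathway width."""
    return max(1, int(slow_channels * beta))


# 16 object frames: Slow gets frames 0 and 8, Fast gets all 16 (alpha = 8).
slow, fast = slowfast_inputs(list(range(16)))
print(slow, len(fast), fast_channels(64))
```

The Fast pathway therefore has high temporal resolution but few channels, which keeps it lightweight while it tracks motion.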

In addition, the number of channels C of the Fast model 222 may be β times that of the Slow model 221, making the Fast model lightweight; β is preferably 1/8, but is not limited thereto.

The prediction unit 223 can recognize behavior by combining the features extracted by the Slow model 221 and the Fast model 222, based on the person information.

When one person is present, the prediction unit 223 may recognize the behavior from the features either as one of the normal situations, such as sitting, standing, walking, or lying down, or as a dangerous situation such as falling.

When two or more people are present, the prediction unit 223 may recognize the behavior from the features either as one of the normal situations, such as massaging or touching, or as one of the dangerous situations, such as kicking, pushing, hitting with an object, stabbing with a knife, or assault by multiple people.

The prediction unit 223 can thus determine whether a dangerous situation exists based on the recognized behavior and, if it determines a 'dangerous situation', enable a response through the notification unit 23.
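
Mapping a recognized action to a normal/dangerous decision, depending on the number of people, can be sketched with two lookup tables built from the example actions listed above. The string labels are hypothetical identifiers; an actual model would output class indices defined by its training data.

```python
SINGLE_PERSON = {"sitting": False, "standing": False, "walking": False,
                 "lying_down": False, "falling": True}
MULTI_PERSON = {"massage": False, "touching": False, "kicking": True, "pushing": True,
                "hitting_with_object": True, "stabbing": True, "group_assault": True}


def is_dangerous(action: str, num_people: int) -> bool:
    """True if the recognized action counts as a dangerous situation."""
    table = SINGLE_PERSON if num_people == 1 else MULTI_PERSON
    if action not in table:
        raise ValueError(f"unknown action {action!r} for {num_people} person(s)")
    return table[action]
```

Keeping separate tables mirrors the document's split between one-person and multi-person behaviors: the same label set is not shared across the two cases.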

When the prediction unit 223 determines a 'dangerous situation', the notification unit 23 may activate the alarm device W or send an emergency notification to a registered terminal T.

When sending the emergency notification to the terminal T, the notification unit 23 may also deliver a message, which may include the recognized behavior information and the corresponding object image data.

The message may also include a link to the photographing unit 1 so that the video can be viewed in real time on the terminal T; this video may also be blurred.

The database 24 may store the video received from the photographing unit 1, the image data extracted by the image extraction unit 20, the object image data generated by the object detection unit 21, the behavior information recognized by the behavior recognition unit 22, the dangerous-situation determination information, and the like.

Meanwhile, the alarm device W may be any of various devices, such as a speaker, a warning light, or a loudspeaker.

The risk situation detection method performed by this behavior recognition-based risk situation detection system is described in detail below.

FIG. 10 is a flowchart schematically showing a behavior recognition-based risk situation detection method according to an embodiment of the present invention, FIG. 11 is a flowchart showing the sub-steps of step S300 of FIG. 10, FIG. 12 is a flowchart showing the sub-steps of step S330 of FIG. 11, and FIG. 13 is a flowchart showing the sub-steps of step S350 of FIG. 11.

Referring to FIG. 10, the behavior recognition-based risk situation detection method according to an embodiment of the present invention may include an image extraction step (S100), a blur processing step (S200), an object detection step (S300), a behavior recognition step (S400), a situation determination step (S500), and an alarm and emergency contact step (S600).

In the image extraction step (S100), the risk detection device 2 may receive the video captured by the photographing unit 1 and obtain image data including a plurality of frames.

In the blur processing step (S200), the risk detection device 2 blurs the image data obtained in step S100 by applying a Gaussian blur filter to each frame of the image data.

In the object detection step (S300), the risk detection device 2 may detect objects corresponding to people in the image data blurred in step S200, determine the number of people, and generate object image data through preprocessing.

Referring to FIG. 11, step S300 includes an object extraction step (S310) and a person identification step (S320), and may further include a first preprocessing step (S330), an intersection region determination step (S340), and a second preprocessing step (S350), which are performed depending on the result of the person identification step (S320).

In the object extraction step (S310), a bounding box (BB) for each object corresponding to a person may be extracted for each frame of the image data blurred in step S200. The method of extracting these bounding boxes has already been described in detail for the system above, so it is omitted here.

The person identification step (S320) may generate person information by determining the number of people from the number of bounding boxes (BB) extracted in step S310. Based on the person information, it can be determined whether there is one person or two or more people.

When the person information indicates one person, step S300 may further include a first preprocessing step (S330).

The first preprocessing step (S330) converts the bounding box (BB) in each frame to obtain box vertex coordinate information (P_l, P_r), and uses this information to crop each frame, obtaining a plurality of object frames of the object image data.

Referring to FIG. 12, step S330 may include a first coordinate conversion step (S331), a first crop region setting step (S332), and a first cropping step (S333).

The first coordinate conversion step (S331) converts, for each frame, the coordinates of the bounding box (BB) into an upper-left vertex P_l(x_l, y_l) and a lower-right vertex P_r(x_r, y_r) to obtain box vertex coordinate information (P_l, P_r).

The first crop region setting step (S332) compares the box vertex coordinate information (P_l, P_r) across the plurality of frames to extract the min-max coordinates, namely the upper coordinate P_min having the minimum value and the lower coordinate P_max having the maximum value with respect to the x-coordinate, and sets the crop region (CR) based on the min-max coordinates.

The first cropping step (S333) may crop each frame according to the crop region (CR) set in step S332 to obtain a plurality of object frames.

When the person information indicates two or more people, step S300 may further include an intersection region determination step (S340) and a second preprocessing step (S350).

The intersection region determination step (S340) may check whether the bounding boxes (BB) in a frame have an intersection region. If it is determined in step S340 that an intersection region exists, the method may proceed to step S350.

On the other hand, if it is determined in step S340 that no intersection region exists between the bounding boxes (BB), processing of that image data ends and the method may restart from step S100.

When it is determined in step S340 that an intersection region exists, the second preprocessing step (S350) converts, for each frame, the intersecting bounding boxes (BB) to obtain box vertex coordinate information (P_l, P_r), and uses the box vertex coordinate information (P_l, P_r) of the intersecting bounding boxes (BB) to crop each frame, obtaining the object frames of the object image data.

Referring to FIG. 13, step S350 may include a second coordinate conversion step (S351), a box region extraction step (S352), a second crop region setting step (S353), and a second cropping step (S354).

The second coordinate conversion step (S351) converts, for each frame, the coordinates of each intersecting bounding box (BB) into an upper-left vertex P_l(x_l, y_l) and a lower-right vertex P_r(x_r, y_r) to obtain box vertex coordinate information (P_l, P_r).

The box region extraction step (S352) may extract, for each frame, box region coordinate information based on the box vertex coordinate information (P_l, P_r) converted in step S351.

Here, the box region coordinate information is the upper coordinate having the minimum value and the lower coordinate having the maximum value of the box region, with respect to the x-coordinate.

The second crop region setting step (S353) compares the box region coordinate information of each frame across the plurality of frames to extract the min-max coordinates, namely the upper coordinate P_min having the minimum value and the lower coordinate P_max having the maximum value with respect to the x-coordinate, and sets a box-shaped crop region (CR) based on the min-max coordinates.

The second cropping step (S354) may crop each frame according to the crop region (CR) set in step S353 to obtain a plurality of object frames.

That is, in step S300, the object image data is obtained through a different preprocessing path depending on the identified number of people.

In the behavior recognition step (S400), the risk detection device 2 recognizes behavior from the object image data according to the identified number of people; the behavior of the objects may be recognized using a SlowFast model as the artificial intelligence model.

The situation determination step (S500) may determine whether the situation is dangerous or normal based on the behavior recognized in step S400. If it is determined to be dangerous, the method proceeds to step S600; if it is determined to be normal, the method returns to the beginning and restarts from step S100.

In the alarm and emergency contact step (S600), if a dangerous situation is determined in step S500, the alarm device W may be activated or an emergency notification may be sent to the registered terminal T.
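
The control flow of steps S100 to S600 can be sketched as a loop over captured clips with the individual stages passed in as functions. This is a structural sketch only: `run_detection` and the stage callables (`blur`, `detect`, `recognize`, `alert`) are hypothetical placeholders for the components described above, and `detect` is assumed to return `None` when the scene is classified as normal in step S340.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def run_detection(clips: Iterable[Sequence],
                  blur: Callable,
                  detect: Callable[[List], Optional[Tuple[List, int]]],
                  recognize: Callable[[List, int], Tuple[str, bool]],
                  alert: Callable[[str], None]) -> int:
    """Drive S100 to S600 over clips; returns the number of alerts raised."""
    alerts = 0
    for frames in clips:                                         # S100: image extraction
        blurred = [blur(f) for f in frames]                      # S200: blur processing
        result = detect(blurred)                                 # S300: object detection
        if result is None:                                       # normal situation: next clip
            continue
        object_frames, num_people = result
        action, dangerous = recognize(object_frames, num_people)  # S400 and S500
        if dangerous:
            alert(action)                                        # S600: alarm / emergency contact
            alerts += 1
    return alerts
```

Passing the stages in as functions keeps the loop independent of the specific blur, detector, and recognizer implementations, so each can be swapped or tested on its own.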

As described above, the behavior recognition-based risk situation detection system and method according to an embodiment of the present invention use video to detect dangerous situations that may occur in single-person households, such as a lonely death or a home invasion, so that such situations can be prevented or responded to quickly.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above are illustrative in all respects and not restrictive.

1: Photographing unit
2: Risk detection device
20: Image extraction unit
21: Object detection unit
210: Person identification unit
211: Image preprocessing unit
22: Behavior recognition unit
23: Notification unit
24: Database
W: Alarm device
T: Terminal
BB: Bounding box for an object corresponding to a person
CR: Crop region

Claims

A behavior recognition-based risk situation detection system, comprising:
a photographing unit installed in a residential space to capture images; and
a risk detection device that acquires image data from the images captured by the photographing unit, blurs the acquired image data, and then recognizes the behavior of an object corresponding to a person to determine whether a dangerous situation exists,
wherein the risk detection device comprises:
an image extraction unit that receives the captured images from the photographing unit, obtains the image data including a plurality of frames, and blurs the image data;
an object detection unit that detects the object in the blurred image data, identifies the number of people according to the detected objects, and generates object image data through preprocessing; and
a behavior recognition unit that recognizes behavior from the object image data according to the identified number of people and determines whether a dangerous situation exists,
wherein the object detection unit comprises:
a person identification unit that extracts a bounding box for the object in each frame of the blurred image data and identifies the number of people; and
an image preprocessing unit that generates the object image data by cropping each frame of the image data according to the number of people identified by the person identification unit to obtain a plurality of object frames,
wherein the image preprocessing unit,
if the number of people is identified as one,
converts the bounding box extracted for each frame into upper-left vertex coordinates and lower-right vertex coordinates to obtain box vertex coordinate information,
compares the box vertex coordinate information across the plurality of frames to extract the minimum and maximum coordinates, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate, and
crops each frame by setting a crop area based on the minimum and maximum coordinates.
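For the single-person case, one reading of the claim is that the crop area is the smallest rectangle containing the person's box in every frame. This gives the cropped clip a fixed size and position. The sketch below assumes the boxes are already in upper-left/lower-right form and takes the elementwise minimum and maximum. The claim's phrase "based on the x coordinate" could instead mean choosing whole vertices by their x value, so treat this as one interpretation only. `crop_area_single` is a hypothetical name.

```python
from typing import Sequence, Tuple

# (x1, y1, x2, y2): upper-left and lower-right vertices in image coordinates
Box = Tuple[float, float, float, float]


def crop_area_single(boxes_per_frame: Sequence[Box]) -> Box:
    """Crop area that encloses one person's bounding box in every frame."""
    if not boxes_per_frame:
        raise ValueError("need at least one frame")
    # Minimum of the upper-left vertices, maximum of the lower-right vertices.
    x1 = min(b[0] for b in boxes_per_frame)
    y1 = min(b[1] for b in boxes_per_frame)
    x2 = max(b[2] for b in boxes_per_frame)
    y2 = max(b[3] for b in boxes_per_frame)
    return (x1, y1, x2, y2)
```

For boxes `(10, 20, 50, 120)`, `(30, 15, 70, 110)` and `(25, 30, 60, 125)`, the crop area is `(10, 15, 70, 125)`.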

(Deleted)

The behavior recognition-based risk situation detection system according to claim 1,
wherein the person identification unit,
if a plurality of bounding boxes are extracted and the number of people is determined to be two or more,
checks whether an intersection area exists between the bounding boxes.
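The intersection check for two or more people amounts to an axis-aligned rectangle overlap test. The sketch below assumes corner-format boxes and treats boxes that merely touch along an edge as not intersecting. That boundary rule is a choice made here, not one stated in the claim. The helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Overlap rectangle of two boxes, or None when they do not intersect."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x1 < x2 and y1 < y2:          # strictly positive width and height
        return (x1, y1, x2, y2)
    return None


def intersecting_pairs(boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Index pairs of boxes in one frame that share an intersection area."""
    return [(i, j) for i, j in combinations(range(len(boxes)), 2)
            if intersection(boxes[i], boxes[j]) is not None]
```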

(Deleted)

The behavior recognition-based risk situation detection system according to claim 3,
wherein the image preprocessing unit,
if the number of people is determined to be two or more and an intersection area exists between the bounding boxes,
obtains box vertex coordinate information by converting each of the bounding boxes having the intersection area in each frame into upper-left vertex coordinates and lower-right vertex coordinates,
extracts, for each frame, box area coordinate information, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate of the box area, from the box vertex coordinate information of the bounding boxes having the intersection area,
compares the box area coordinate information across the plurality of frames to extract the minimum and maximum coordinates, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate, and
crops each frame by setting a crop area based on the minimum and maximum coordinates.
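The multi-person case works in two stages. First, a per-frame box area encloses the intersecting boxes. Second, a crop area encloses those box areas across all frames. The sketch below makes two assumptions. Every box that overlaps any other box in a frame is merged into a single box area, even if the frame contains two separate overlapping groups. The same elementwise minimum/maximum reading used for the single-person case also applies here. All names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share an area of positive size."""
    return max(a[0], b[0]) < min(a[2], b[2]) and max(a[1], b[1]) < min(a[3], b[3])


def enclose(boxes: Sequence[Box]) -> Box:
    """Smallest rectangle containing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def box_area(frame_boxes: Sequence[Box]) -> Box:
    """Per-frame box area: enclose every box that intersects another box."""
    hits = [a for i, a in enumerate(frame_boxes)
            if any(overlaps(a, b) for j, b in enumerate(frame_boxes) if j != i)]
    if not hits:
        raise ValueError("no intersecting boxes in this frame")
    return enclose(hits)


def crop_area_multi(frames: Sequence[Sequence[Box]]) -> Box:
    """Crop area shared by all frames, built from each frame's box area."""
    return enclose([box_area(f) for f in frames])
```

In the test clip below, the third box of the first frame touches no other box and is left out of that frame's box area.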

The behavior recognition-based risk situation detection system according to claim 1,
wherein the behavior recognition unit
recognizes behavior from the object image data according to the identified number of people through a behavior recognition model to determine whether a dangerous situation exists, and
the behavior recognition model comprises:
a Slow model that receives some of the plurality of object frames of the object image data and extracts spatial and color features;
a Fast model that receives more object frames than the Slow model and extracts features of temporal change; and
a prediction unit that recognizes behavior by combining the features extracted by the Slow model and the Fast model based on the number of people, and determines whether a dangerous situation exists according to the recognized behavior.
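The claim leaves open how the Slow and Fast features are combined "based on the number of people." The sketch below makes two assumptions. First, the pooled feature vectors of the two pathways are concatenated, as in the original SlowFast design's final fusion before its classifier. Second, a separate linear head is chosen for one person versus several people. The heads, the labels such as "fall" or "assault," and the `predict` function are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical classifier head: (weight rows, one per class; class labels).
Head = Tuple[List[List[float]], List[str]]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)                              # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(slow_feat: Sequence[float], fast_feat: Sequence[float],
            num_people: int, heads: Dict[str, Head],
            danger_labels: Sequence[str]) -> Tuple[str, bool]:
    """Fuse pathway features, classify the behaviour, flag dangerous labels."""
    fused = list(slow_feat) + list(fast_feat)        # concatenate both pathways
    weights, labels = heads["single" if num_people == 1 else "multi"]
    logits = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], labels[best] in danger_labels
```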

A behavior recognition-based risk situation detection method, comprising:
an image extraction step in which a risk detection device receives images captured by a photographing unit and obtains image data including a plurality of frames;
a blur processing step of blurring the acquired image data;
an object detection step of detecting objects corresponding to people in the blurred image data to identify the number of people, and generating object image data through preprocessing;
a behavior recognition step of recognizing behavior from the object image data according to the identified number of people; and
a situation determination step of determining whether a dangerous situation exists based on the recognized behavior,
wherein the object detection step includes:
an object extraction step of extracting a bounding box for the object in each frame of the blurred image data; and
a person identification step of identifying the number of people according to the extracted bounding boxes,
and wherein the object detection step further includes,
if the number of people identified in the person identification step is one,
a first preprocessing step of converting the bounding box in each frame to obtain box vertex coordinate information, and cropping each frame using the box vertex coordinate information to obtain a plurality of object frames of the object image data.
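The blur processing step of the method above does not name a particular filter in this section. As an assumed stand-in, a mean (box) filter over a grayscale frame stored as nested lists could look like the sketch below. `box_blur` and the edge-clamping rule are illustrative choices. A real system would more likely call an image library's Gaussian blur.

```python
from typing import List


def box_blur(img: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Mean filter: each output pixel averages its (2r+1) x (2r+1) neighbourhood.

    Neighbours outside the frame are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / (2 * radius + 1) ** 2
    return out
```

With `radius=1`, a single pixel of value 9 in a 3x3 frame of zeros spreads into a uniform frame of 1.0. Each clamped window contains that pixel exactly once.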

(Deleted)

The behavior recognition-based risk situation detection method according to claim 7,
wherein the first preprocessing step comprises:
a first coordinate conversion step of obtaining box vertex coordinate information by converting the bounding box in each frame into upper-left vertex coordinates and lower-right vertex coordinates;
a first crop area setting step of comparing the box vertex coordinate information across the plurality of frames to extract the minimum and maximum coordinates, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate, and setting a crop area based on the minimum and maximum coordinates; and
a first cropping step of obtaining a plurality of object frames by cropping each frame according to the crop area.
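The three sub-steps of the first preprocessing step can be chained as in the sketch below. It assumes the detector reports boxes as (centre x, centre y, width, height), a common detector output format that the patent does not name. It also assumes frames are nested lists of pixel values. `to_corners` and `first_preprocessing` are hypothetical names, and the crop area uses the same elementwise minimum/maximum reading as the single-person case.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]                        # H x W pixel grid (assumed format)
CenterBox = Tuple[float, float, float, float]  # (cx, cy, w, h), assumed detector output


def to_corners(cx: float, cy: float, w: float, h: float) -> Tuple[int, int, int, int]:
    """First coordinate conversion: centre box -> (x1, y1, x2, y2) pixels."""
    return (round(cx - w / 2), round(cy - h / 2),
            round(cx + w / 2), round(cy + h / 2))


def first_preprocessing(frames: Sequence[Frame],
                        boxes: Sequence[CenterBox]) -> List[Frame]:
    """Convert one box per frame, set a shared crop area, crop every frame."""
    corners = [to_corners(*b) for b in boxes]
    # First crop area setting: min of upper-left, max of lower-right vertices,
    # clamped at 0 so negative indices do not wrap around.
    x1 = max(0, min(c[0] for c in corners))
    y1 = max(0, min(c[1] for c in corners))
    x2 = max(c[2] for c in corners)
    y2 = max(c[3] for c in corners)
    # First cropping: the same rectangle is cut from every frame.
    return [[row[x1:x2] for row in frame[y1:y2]] for frame in frames]
```

Cropping every frame with one shared rectangle keeps the object frames the same size, which suits the fixed-size input that a clip-based model such as SlowFast expects.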

The behavior recognition-based risk situation detection method according to claim 7,
wherein the object detection step further comprises,
if the number of people identified in the person identification step is two or more,
an intersection area determination step of checking whether an intersection area exists between the bounding boxes of each frame; and
a second preprocessing step of, if an intersection area exists between the bounding boxes, obtaining box vertex coordinate information by converting, in each frame, the bounding boxes having the intersection area, and cropping each frame using the box vertex coordinate information of those bounding boxes to obtain object frames of the object image data.

The behavior recognition-based risk situation detection method according to claim 11,
wherein the second preprocessing step comprises:
a second coordinate conversion step of obtaining box vertex coordinate information by converting each of the bounding boxes having the intersection area in each frame into upper-left vertex coordinates and lower-right vertex coordinates;
a box area extraction step of extracting, for each frame, box area coordinate information, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate of the box area, from the box vertex coordinate information of the bounding boxes having the intersection area;
a second crop area setting step of comparing the box area coordinate information across the plurality of frames to extract the minimum and maximum coordinates, namely the upper coordinate with the minimum value and the lower coordinate with the maximum value based on the x coordinate, and setting a crop area based on the minimum and maximum coordinates; and
a second cropping step of obtaining a plurality of object frames by cropping each frame according to the crop area.

The behavior recognition-based risk situation detection method according to claim 7,
further comprising an alarm and emergency contact step of activating an alarm device or making an emergency contact to a registered terminal if a dangerous situation is determined in the situation determination step.