KR20230042926A

KR20230042926A - Apparatus and Method for Detecting Violence, Smart Violence Monitoring System having the same

Info

Publication number: KR20230042926A
Application number: KR1020210125773A
Authority: KR
Inventors: 김용호; 박정우; 엄동원
Original assignee: 주식회사 소이넷
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2023-03-30
Also published as: KR102648004B1

Abstract

The present invention relates to a device and method for detecting violence, and a smart violence monitoring system comprising the same. According to one embodiment of the present invention, a device for detecting violence is provided, comprising: an object detection unit that detects an object by analyzing a surveillance image and analyzes at least one of a pose and an action of the detected object; an object analysis unit that identifies a perpetrator and a victim through face detection and face identification, and analyzes a change factor set to reflect the harm to the victim over time; and a violence occurrence determination unit that determines whether the analyzed change factor satisfies a violence occurrence condition and transmits the determination result to the outside.

Description

Apparatus and Method for Detecting Violence, Smart Violence Monitoring System having the same

The present invention relates to a violence detection device and method, and a smart violence monitoring system including the same, and more particularly to a violence detection device and method capable of detecting violence or abnormal behavior toward children in care by analyzing images collected in childcare facilities, and a smart violence monitoring system including the same.

Recently, abuse of children by teachers in childcare facilities such as daycare centers and kindergartens has occurred frequently and has become a social problem.

To address this problem, surveillance cameras are being installed in childcare facilities. However, continuous analysis of the footage is not possible, and abuse is confirmed only after a parent who has noticed the child's condition requests and reviews the surveillance camera footage.

In addition, daycare centers themselves have no way to check teachers' behavior at all times, and a means of preventing incidents in advance is needed.

Therefore, a prevention measure that is mutually acceptable to childcare facilities and parents needs to be established.

Republic of Korea Patent Publication No. 10-2009-0035379

The technical problem to be solved by the present invention is to provide a violence detection device and method capable of detecting violence or abnormal behavior toward children in care by analyzing images collected in childcare facilities, and a smart violence monitoring system including the same.

In addition, the technical problem to be solved by the present invention is to provide a violence detection device and method that can reflect the particular characteristics of childcare facilities and can replace high-performance equipment, and a smart violence monitoring system including the same.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

To achieve the above technical problem, an embodiment of the present invention provides a violence detection device comprising: an object detection unit that detects an object by analyzing a surveillance image and analyzes at least one of a pose and an action of the detected object; an object analysis unit that identifies a perpetrator and a victim through face detection and face identification, and analyzes a change factor set to reflect the harm to the victim over time; and a violence occurrence determination unit that determines whether the analyzed change factor satisfies a violence occurrence condition and transmits the determination result to the outside.

In an embodiment of the present invention, the object detection unit may analyze the pose of the detected object to recognize an action, analyze whether the recognized action is related to violence using a set analysis model, and identify overlapping objects by measuring the centroid distance between a plurality of objects.

In an embodiment of the present invention, the object analysis unit may identify the perpetrator and the victim by performing face detection and face identification on the overlapping objects using at least one of a face recognition model and a facial feature information extraction model.

In an embodiment of the present invention, the object analysis unit may analyze the change factor, which includes at least one of a scene containing the victim's facial state and the victim's pose, so as to reflect changes in facial expression and/or emotion over time.

In an embodiment of the present invention, the object detection unit may extract audio data from the surveillance image through audio analysis, and the object analysis unit may receive the extracted audio data from the object detection unit and recognize sound.

In an embodiment of the present invention, the violence occurrence determination unit may classify the victim's facial expression, the poses of the perpetrator and the victim, and the sound into scene-based violence, person-pose-based violence, and sound-based violence cases, respectively, measure a score in a manner set for each case, sum the measured scores, and compare the sum with a reference score to determine whether violence has occurred.

In an embodiment of the present invention, the violence occurrence determination unit may: in the scene-based violence case, measure a score for determining violent or normal by using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features; in the person-pose-based violence case, measure a score for determining violent or normal by performing CNN-based pose determination that learns intra-frame features and calculating the distance between objects and whether a reference value is exceeded; or, in the sound-based violence case, measure a score for determining normal or abnormal by performing LSTM-based sound classification.

To achieve the above technical problem, another embodiment of the present invention provides a smart violence monitoring system comprising: a photographing device that is installed in a childcare facility and generates a surveillance image by photographing at least one subject; a violence detection device; and a control server that receives a violence occurrence notification and the surveillance image of the suspected signs from the violence detection device, re-examines them, and notifies the victim's guardian of the occurrence of violence.

In an embodiment of the present invention, the control server may provide the violence detection device, as additional training data, with a determination value confirming the occurrence of violence through analysis of the surveillance image.

To achieve the above technical problem, yet another embodiment of the present invention provides a violence detection method comprising: detecting an object from a surveillance image through image analysis; detecting a pose for each object, recognizing an action, and determining whether the recognized action is related to violence; measuring the centroid distance between objects and identifying overlapping objects using the measured centroid distance; identifying a perpetrator and a victim through face detection and face identification of the objects identified as overlapping, and analyzing a change factor reflecting changes in the victim's facial expression and/or emotion over time; recognizing sound through audio analysis and determining whether the recognized sound is related to violence; and determining whether violence has occurred by comparing the change factor and the recognized sound with a violence occurrence criterion.

In an embodiment of the present invention, determining whether violence has occurred may include: in the scene-based violence case, measuring a score for determining violent or normal by using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features; in the person-pose-based violence case, measuring a score for determining violent or normal by performing CNN-based pose determination that learns intra-frame features and calculating the distance between objects and whether a reference value is exceeded; in the sound-based violence case, measuring a score for determining normal or abnormal by performing LSTM-based sound classification; and measuring the score in a manner set for each case, summing the measured scores, and comparing the sum with a reference score to determine whether violence has occurred.

According to an embodiment of the present invention, violence or abnormal behavior toward children in care can be detected by analyzing images collected in childcare facilities.

In addition, according to an embodiment of the present invention, the particular characteristics of childcare facilities can be reflected, and high-performance equipment can be replaced.

The effects of the present invention are not limited to those described above, and should be understood to include all effects that can be inferred from the configuration of the invention described in the detailed description or the claims.

FIG. 1 is a diagram showing the configuration of a smart violence monitoring system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a violence detection device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of a posture and motion analysis model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of overlap between objects according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a face recognition model according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of a facial feature information extraction model according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of a facial expression analysis model according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a violence detection method according to an embodiment of the present invention.
FIG. 9 is a flowchart showing in detail the violence occurrence determination step according to an embodiment of the present invention.

Hereinafter, the present invention will be described with reference to the accompanying drawings. However, the present invention may be implemented in many different forms and is therefore not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar reference numerals are given to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected (joined, in contact, coupled)" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member in between. In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a violence detection device and method according to an embodiment of the present invention, and a smart violence monitoring system including the same. FIG. 2 is a diagram showing the configuration of a violence detection device according to an embodiment of the present invention. FIG. 3 is a diagram showing an example of a posture and motion analysis model according to an embodiment of the present invention. FIG. 4 is a diagram illustrating an example of overlap between objects according to an embodiment of the present invention. FIG. 5 is a diagram showing an example of a face recognition model according to an embodiment of the present invention. FIG. 6 is a diagram showing an example of a facial feature information extraction model according to an embodiment of the present invention. FIG. 7 is a diagram showing an example of a facial expression analysis model according to an embodiment of the present invention.

Referring to FIGS. 1 to 7, the smart violence monitoring system according to an embodiment of the present invention may include a photographing device 100, a violence detection device 200, and a control server 300.

The photographing device 100 may be installed in a childcare facility, photograph at least one of the children in care and the childcare workers (teachers), and generate a surveillance image including at least one of video and sound related to the photographed subject. To this end, the photographing device 100 may include a closed-circuit television (CCTV) camera.

The violence detection device 200 may include: an object detection unit 210 that detects an object from the surveillance image and analyzes at least one of a pose and an action of the detected object; an object analysis unit 220 that identifies a perpetrator and a victim through face detection and face identification, and analyzes a change factor set to reflect the harm to the victim over time; and a violence occurrence determination unit 230 that determines whether the analyzed change factor satisfies a violence occurrence condition and transmits the determination result to the outside.

Specifically, the object detection unit 210 may detect at least one object through image analysis of the surveillance image captured by the photographing device 100. Here, the object detection unit 210 may detect objects in an image frame using a detection algorithm including at least one of OpenPose, codebook-based background modeling, and human pose estimation. For example, referring to FIG. 3, the object detection unit 210 may use human pose estimation, which detects a person's pose in an image or video, to detect the main connecting parts (joints) of the body as key points, and connect them to recognize poses (postures) such as standing, walking, kicking, and punching.

In addition, the object detection unit 210 may analyze the pose (posture) of each object, that is, each child in care and each childcare worker (teacher), to recognize an action, and determine whether the recognized action is related to violence, such as kicking or punching. Here, the object detection unit 210 may analyze the behavior of an object using at least one of a pose estimation model (e.g., PoseNet, body key points) and a temporal frame analysis model (e.g., LSTM). For example, the object detection unit 210 may recognize behavior in which a childcare worker (teacher) takes a pose pointing in a specific direction and the children suddenly gather to one side and wait, or all lower their heads, and thereby analyze the childcare worker's (teacher's) behavior related to at least one of assault, corporal punishment, and abuse.

In addition, the object detection unit 210 may measure the centroid (center) distance between the objects, that is, between each child in care and each childcare worker (teacher). Here, referring to FIG. 4, the object detection unit 210 may identify overlapping objects (people) using the measured centroid distance. For example, the object detection unit 210 may measure the distance between the centers of the upper bodies of a plurality of objects, and determine that the objects overlap when the measured distance is less than or equal to the arm length of an object.
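The overlap test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key-point layout, the function names `centroid` and `overlapping_pairs`, and the choice to compare against the longer of the two arms are hypothetical, since the source only says the distance is compared with "the arm length of an object".

```python
import math
from itertools import combinations


def centroid(points):
    """Return the mean (x, y) of a list of 2D key points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def overlapping_pairs(upper_bodies, arm_lengths):
    """Return index pairs of people whose upper-body centroids are within arm's length.

    upper_bodies: one list of upper-body key points per person (hypothetical layout).
    arm_lengths:  estimated arm length per person, in the same pixel units.
    """
    centers = [centroid(points) for points in upper_bodies]
    pairs = []
    for i, j in combinations(range(len(centers)), 2):
        distance = math.dist(centers[i], centers[j])
        # Assumption: use the longer arm of the pair as the reach threshold.
        if distance <= max(arm_lengths[i], arm_lengths[j]):
            pairs.append((i, j))
    return pairs


# Example data (made up): a teacher, a nearby child, and a distant child.
teacher = [(100, 100), (140, 100), (100, 160), (140, 160)]
near_child = [(150, 150), (170, 150), (150, 180), (170, 180)]
far_child = [(500, 100), (540, 100), (500, 160), (540, 160)]
pairs = overlapping_pairs([teacher, near_child, far_child], [60.0, 30.0, 30.0])
```

Only the pairs returned here would be passed on to face detection and identification, which keeps the more expensive face models off people who are not near each other.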

Meanwhile, the object detection unit 210 may extract audio data through audio analysis from the surveillance image received from the photographing device 100.

The object analysis unit 220 may identify the perpetrator and the victim through face detection and face identification of the objects (people) identified as overlapping. Here, the object analysis unit 220 may detect faces using a face recognition model (RetinaFace) as shown in FIG. 5, extract facial features using a facial feature information extraction model (ArcFace) as shown in FIG. 6, and identify faces using a feature similarity analysis algorithm (e.g., cosine similarity).
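As a rough illustration of the identification step, the following Python sketch matches a face embedding against a gallery of enrolled embeddings by cosine similarity. The embeddings are assumed to come from a feature extractor such as ArcFace, which is not shown; the `identify` helper, the gallery layout, and the 0.5 threshold are hypothetical choices, not values from the source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def identify(embedding, gallery, threshold=0.5):
    """Return the enrolled name most similar to `embedding`, or None.

    gallery:   dict mapping a name to its enrolled embedding (hypothetical layout).
    threshold: minimum similarity to accept a match (assumed value).
    """
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional embeddings; real face embeddings are much longer.
gallery = {"teacher_a": [1.0, 0.0, 0.0], "child_b": [0.0, 1.0, 0.0]}
match = identify([0.9, 0.1, 0.0], gallery)
unknown = identify([0.0, 0.0, 1.0], gallery)
```

Returning `None` below the threshold lets the caller treat an unenrolled face as unknown rather than forcing it onto the nearest enrolled person.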

In addition, using a facial expression analysis model as shown in FIG. 7, the object analysis unit 220 may analyze the change factor, which includes at least one of a scene containing the victim's facial state and the victim's pose, so as to reflect changes in the victim's facial expression and/or emotion over time. In this way, the object analysis unit 220 may analyze the change factor to check whether the change in the victim's facial expression and/or emotion is related to violence. For example, the object analysis unit 220 may identify victim emotions such as surprise, sadness, tearfulness, anger, and fear from changes in the facial expression of a child in care who has been subjected to assault, corporal punishment, or abuse.

Meanwhile, the object analysis unit 220 may receive the audio data extracted by the object detection unit 210 and recognize sound. Here, the object analysis unit 220 may recognize sounds such as screaming, crying, and the sound of corporal punishment by applying, to the audio data, a model trained using an artificial intelligence algorithm. In addition, the object analysis unit 220 may check whether the recognized sound is related to violence, such as a scream, a blow, or the sound of a fall.

The violence occurrence determination unit 230 may compare the change factor analyzed by the object analysis unit 220 and the recognized sound with a violence occurrence criterion to determine whether violence has been committed against a child in care.

Specifically, the violence occurrence determination unit 230 may classify the victim's facial expression, the poses of the perpetrator and the victim, and the sound into scene-based violence, person-pose-based violence, and sound-based violence cases, respectively, measure a score in a manner set for each case, sum the scores measured for the cases, compare the sum with a reference score (the violence occurrence criterion), and then determine whether the current situation is a violent situation or a normal situation.
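The sum-and-compare decision just described might look like the following minimal Python sketch. The `CaseScores` container, the function names, and the use of "greater than or equal to" for the comparison are assumptions made for illustration; how each per-case score is produced is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class CaseScores:
    """Per-case scores; the field names are illustrative, not from the source."""
    scene: float  # scene-based violence case (facial expression)
    pose: float   # person-pose-based violence case
    sound: float  # sound-based violence case


def total_score(scores):
    """Sum the scores measured for the three cases."""
    return scores.scene + scores.pose + scores.sound


def is_violent_situation(scores, reference_score):
    """Assumption: the situation is violent when the sum reaches the reference score."""
    return total_score(scores) >= reference_score


# Made-up scores for one analysis window.
window = CaseScores(scene=12.0, pose=6.5, sound=4.0)
decision = is_violent_situation(window, reference_score=20.0)
```

Keeping the three scores separate until the final sum makes it easy to log which case contributed most to a given alert.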

Here, the violence occurrence determination unit 230 may determine whether violence has occurred using a violence detection model built on a combination of deep-learning components, CNN (Convolutional Neural Network) + LSTM (Long Short-Term Memory), with the frames of the current and previous time points as input data. For example, in the scene-based violence case, the violence occurrence determination unit 230 may measure a score for determining violent or normal by using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features. Alternatively, in the person-pose-based violence case, the violence occurrence determination unit 230 may measure a score for determining violent or normal by performing CNN-based pose determination that learns intra-frame features and calculating the distance between objects and whether a reference value is exceeded. Alternatively, in the sound-based violence case, the violence occurrence determination unit 230 may measure a score for determining normal or abnormal by performing LSTM-based sound classification.

At this time, the violence occurrence determination unit 230 may calculate the score using Equation 1 below.

[Equation 1]

Here, the terms of Equation 1 are, in the order in which they are defined: the total duration of the analyzed surveillance image; a value of 0 or 1 indicating whether violence is present according to the analysis of the victim's change factor (facial expression/emotion); a numerical weight according to the victim's change factor; and the probability of violence, expressed as a value from 0.00 to 1.00, according to the analysis of the victim's change factor.

In addition, the violence occurrence determination unit 230 may divide the level of violence into suspicious (score 0 to 10), moderate (score 10 to 20), dangerous (score 20 to 40), and serious (score 40 to 200), and may determine that the current situation is a violent situation when the summed score is at the dangerous level or higher.
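The level bands above map directly onto a small lookup, sketched here in Python. The source ranges share their endpoints (10, 20, and 40 each appear in two bands), so this sketch assumes half-open intervals in which a boundary score belongs to the higher level; the English level names and the function names are likewise illustrative.

```python
# (exclusive upper bound, level name); the second level is "보통" in the source.
LEVELS = [
    (10.0, "suspicious"),
    (20.0, "moderate"),
    (40.0, "dangerous"),
]


def violence_level(score):
    """Map a summed score to a level, assuming half-open bands [low, high)."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for upper_bound, name in LEVELS:
        if score < upper_bound:
            return name
    # Scores of 40 and above fall in the top band (40 to 200 in the source).
    return "serious"


def is_violent_level(score):
    """A situation counts as violent at the dangerous level or higher."""
    return violence_level(score) in ("dangerous", "serious")


levels = [violence_level(s) for s in (0, 10, 20, 40, 200)]
```

With this boundary convention a summed score of exactly 20 already triggers a violence determination, which matches the "dangerous level or higher" wording.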

When the violence occurrence determination unit 230 determines that a violent situation has occurred, it may transmit the surveillance image of the suspected signs to the control server 300 together with a violence occurrence notification.

At the control server 300, a controller and/or staff member may re-examine the surveillance image of the suspected signs received from the violence detection device 200 to determine whether violence has occurred, and the victim's guardian may be notified of the occurrence of violence. In this case, the control server 300 may notify the guardian's user terminal 400 of the occurrence of violence in the form of at least one of text, video, and sound.

In addition, the control server 300 may provide the violence detection device 200, as additional training data, with a determination value confirming the occurrence of violence through analysis of the surveillance image. Upon receiving the additional training data, the violence detection device 200 may refine the violence detection model through additional training of the AI model, and may generate, update, and/or distribute a new weight file.

Hereinafter, a violence detection method according to an embodiment of the present invention will be described with reference to the accompanying figures.

One figure is a flowchart showing a violence detection method according to an embodiment of the present invention, and another is a flowchart showing in detail the violence occurrence determination step according to an embodiment of the present invention. Here, the violence detection method is described with reference to the smart violence monitoring system according to an embodiment of the present invention described above, and redundant descriptions may be omitted for convenience.

The violence detection method according to an embodiment of the present invention may include: detecting an object from a surveillance video through image analysis (S110); detecting a pose for each object (S120); recognizing an action (S130); determining whether the action is related to violence (S140); measuring the centroid distance between objects (S150); identifying overlapping objects (people) using the measured centroid distance (S160); identifying the perpetrator and the victim through face detection and face identification of the objects (people) identified as overlapping, and analyzing change factors that reflect changes in the victim's facial expression and/or emotion over time (S170); determining whether the facial expression is related to violence (S180); analyzing audio from the surveillance video (S210); recognizing a sound using the audio data extracted through the audio analysis (S220); determining whether the recognized sound is related to violence (S230); determining the occurrence of violence by comparing the change factors and the recognized sound with a violence occurrence criterion (S300); and transmitting a violence occurrence notification.

In step S110, the object detection unit 210 may detect at least one object, through image analysis, from the surveillance video captured by the photographing device 100.

In step S120, the object detection unit 210 may detect a pose by connecting the main connection points of the detected object.

In step S130, the object detection unit 210 may recognize an action (behavior) by analyzing the pose (posture) of the object.

In step S140, the object detection unit 210 may determine whether the recognized action (behavior) is related to violence, such as kicking or punching.

In step S150, the object detection unit 210 may measure the centroid (center) distance between the objects corresponding to the child in care and the caregiver (teacher).

In step S160, the object detection unit 210 may identify overlapping objects (people) using the measured centroid distance.
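Steps S150 and S160 can be illustrated with a short sketch. The source does not specify how objects are represented or what distance counts as overlapping, so the bounding-box format `(x1, y1, x2, y2)`, the function names, and the `max_distance` threshold below are all assumptions made for illustration.

```python
from itertools import combinations
from math import hypot


def centroid(box):
    """Center point of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def overlapping_pairs(boxes, max_distance):
    """Return index pairs of objects whose centroid distance is within
    max_distance (a hypothetical threshold, not given in the source)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        (ax, ay), (bx, by) = centroid(a), centroid(b)
        if hypot(ax - bx, ay - by) <= max_distance:
            pairs.append((i, j))
    return pairs
```

For example, with boxes `(0, 0, 10, 10)`, `(4, 0, 14, 10)`, and `(100, 100, 110, 110)` and a threshold of 5, only the first two objects are reported as overlapping, since their centroids are 4 units apart.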

In step S170, the object analysis unit 220 may identify the perpetrator and the victim through face detection and face identification of the objects (people) identified as overlapping. Here, the object analysis unit 220 may detect a face using a face recognition model (RetinaFace), extract facial features using a facial feature information extraction model (ArcFace), and identify the face using a feature similarity analysis algorithm (e.g., cosine similarity). In addition, the object analysis unit 220 may analyze change factors including at least one of a scene containing the victim's facial state and the victim's pose, so as to reflect changes in the victim's facial expression and/or emotion over time.
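The similarity-based identification mentioned in step S170 can be sketched as below. The sketch assumes that facial feature vectors have already been produced by the feature extraction model; the `gallery` of registered identities, the `identify` function, and the 0.5 threshold are hypothetical and are not specified in the source.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treated as no similarity
    return dot / (norm_u * norm_v)


def identify(embedding, gallery, threshold=0.5):
    """Return the registered identity most similar to the embedding,
    or None if no similarity reaches the (hypothetical) threshold."""
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

In practice the feature vectors would have many more dimensions than in a toy example, and the threshold would need to be tuned on labeled data.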

In step S210, the object detection unit 210 may extract audio data, through audio analysis, from the surveillance video received from the photographing device 100.

In step S220, the object analysis unit 220 may receive the audio data extracted by the object detection unit 210 and recognize a sound.

In step S230, the object analysis unit 220 may determine whether the recognized sound is related to violence, such as a scream, a blow, or the sound of someone falling.

Step S300 may include: measuring a score to classify a scene-based violence case as violent or normal, using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features; measuring a score to classify a person-pose-based violence case as violent or normal, by performing CNN-based pose determination that learns intra-frame features and by calculating the distance between objects and whether it exceeds a reference value; measuring a score to classify a sound-based violence case as normal or abnormal, by performing LSTM-based sound classification; and measuring the scores in the manner set for each case, summing the measured scores, and comparing the sum with a reference score to determine whether violence has occurred.
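The final sum-and-compare part of step S300 can be sketched as follows. The per-case scores are taken as given inputs here, since the source does not state how each classifier's output is converted to a score. The `CaseScores` type is hypothetical, and the default reference score of 20 is an assumption based on the lower bound of the dangerous level given earlier in the description.

```python
from dataclasses import dataclass


@dataclass
class CaseScores:
    """Scores measured separately for each violence case (hypothetical type)."""
    scene: float  # scene-based case (CNN, LSTM, optical-flow classification)
    pose: float   # person-pose-based case (pose determination, object distance)
    sound: float  # sound-based case (LSTM sound classification)

    def total(self) -> float:
        """Sum of the three per-case scores."""
        return self.scene + self.pose + self.sound


def violence_occurred(scores: CaseScores, reference_score: float = 20.0) -> bool:
    """Compare the summed score with the reference score.

    Assumption: the reference score equals the lower bound of the
    "dangerous" level (20); the source only refers to a reference score.
    """
    return scores.total() >= reference_score
```

Summing the cases means that moderate evidence from several modalities can cross the reference score together even when no single case does so alone.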

The above description of the present invention is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims that follow, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: photographing device
200: violence detection device
210: object detection unit
220: object analysis unit
230: violence occurrence determination unit
300: control server
400: user terminal

Claims

A violence detection device comprising:
an object detection unit that detects an object by analyzing a surveillance video and analyzes at least one of a pose and an action of the detected object;
an object analysis unit that identifies a perpetrator and a victim through face detection and face identification, and analyzes change factors set to reflect the harm to the victim over time; and
a violence occurrence determination unit that determines whether the analyzed change factors satisfy a violence occurrence condition and transmits the determination result to the outside.

The violence detection device according to claim 1,
wherein the object detection unit recognizes an action by analyzing the pose of the detected object, analyzes whether the recognized action is related to violence using a set analysis model, and identifies overlapping objects by measuring the centroid distance between a plurality of objects.

The violence detection device according to claim 2,
wherein the object analysis unit identifies the perpetrator and the victim by performing face detection and face identification on the overlapping objects using at least one of a face recognition model and a facial feature information extraction model.

The violence detection device according to claim 3,
wherein the object analysis unit analyzes the change factors, which include at least one of a scene containing the victim's facial state and the victim's pose, so as to reflect changes in facial expression and/or emotion over time.

The violence detection device according to claim 1,
wherein the object detection unit extracts audio data from the surveillance video through audio analysis, and
the object analysis unit receives the audio data extracted by the object detection unit and recognizes a sound.

The violence detection device according to claim 5,
wherein the violence occurrence determination unit classifies the victim's facial expression, the poses of the perpetrator and the victim, and the sound into the respective cases of scene-based violence, person-pose-based violence, and sound-based violence, measures a score in the manner set for each case, sums the measured scores, and compares the sum with a reference score to determine whether violence has occurred.

The violence detection device according to claim 6,
wherein the violence occurrence determination unit:
measures a score to classify a scene-based violence case as violent or normal, using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features; or
measures a score to classify a person-pose-based violence case as violent or normal, by performing CNN-based pose determination that learns intra-frame features and by calculating the distance between objects and whether it exceeds a reference value; or
measures a score to classify a sound-based violence case as normal or abnormal, by performing LSTM-based sound classification.

A smart violence monitoring system comprising:
a photographing device that is installed in a childcare facility and generates a surveillance video by photographing at least one subject;
the violence detection device according to any one of claims 1 to 7; and
a control server that receives, from the violence detection device, a violence occurrence notification and surveillance video of suspicious signs, reviews them, and notifies the guardian of the victim of the occurrence of violence.

The smart violence monitoring system according to claim 8,
wherein the control server provides the violence detection device with the determination value that confirmed the occurrence of violence through analysis of the surveillance video, as additional training data.

A violence detection method comprising:
detecting an object from a surveillance video through image analysis;
detecting a pose for each object, recognizing an action, and determining whether the recognized action is related to violence;
measuring the centroid distance between objects and identifying overlapping objects using the measured centroid distance;
identifying a perpetrator and a victim through face detection and face identification of the objects identified as overlapping, and analyzing change factors that reflect changes in the victim's facial expression and/or emotion over time;
recognizing a sound through audio analysis and determining whether the recognized sound is related to violence; and
determining the occurrence of violence by comparing the change factors and the recognized sound with a violence occurrence criterion.

The violence detection method according to claim 10,
wherein determining the occurrence of violence comprises:
measuring a score to classify a scene-based violence case as violent or normal, using the violence detection model to perform CNN-based classification that learns intra-frame features, LSTM-based classification that learns inter-frame features over time, and optical-flow-based classification that learns invariant/local features;
measuring a score to classify a person-pose-based violence case as violent or normal, by performing CNN-based pose determination that learns intra-frame features and by calculating the distance between objects and whether it exceeds a reference value;
measuring a score to classify a sound-based violence case as normal or abnormal, by performing LSTM-based sound classification; and
measuring the scores in the manner set for each case, summing the measured scores, and comparing the sum with a reference score to determine whether violence has occurred.