KR20230070601A

KR20230070601A - Sound-based intelligent emergency analysis system and method thereof

Info

Publication number: KR20230070601A
Application number: KR1020210156342A
Authority: KR
Inventors: 김선만; 백광선
Original assignee: 한국광기술원
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2023-05-23

Abstract

The present invention relates to a sound-based intelligent emergency analysis system, and a method thereof, capable of specifically classifying a current violent or emergency situation. The system comprises an emergency bell device and an analysis server. The emergency bell device is installed in each high-crime area, collects audio data including sounds or voices generated within that area, detects an emergency event from the collected audio data, and generates an emergency bell operation signal. When the emergency bell operation signal is received, the analysis server receives the audio data from the emergency bell device, extracts voice data from the audio data, and analyzes the voice data to classify a conversational voice situation; it analyzes the remaining acoustic data, excluding the voice data, to classify an acoustic situation; and it integrates the classified conversational voice situation and acoustic situation to determine, according to preset classification criteria, a crime or emergency situation graded into staged security levels, and provides a situation classification result. The analysis server collects conversational voice-related texts corresponding to crime or emergency situations as learning data, trains an artificial intelligence-based voice analysis model in advance using that learning data, and extracts the voice data and classifies the conversational voice situation on the basis of the trained voice analysis model. The analysis server likewise collects sounds corresponding to crime or emergency situations as learning data, trains an artificial intelligence-based acoustic analysis model using that learning data, and classifies the acoustic situation on the basis of the trained acoustic analysis model.

Description

Sound-based intelligent emergency analysis system and method thereof

The present invention relates to a sound-based intelligent emergency analysis system capable of classifying a current violent or emergency situation by comprehensively analyzing voice and acoustic signals.

The content described in this section merely provides background information on an embodiment of the present invention and does not constitute prior art.

In general, crime prevention systems are installed in areas with weak security so that emergencies such as violence or medical emergencies can be reported and responded to. Among crime prevention systems, a security emergency bell is a device installed in a specific area, such as a high-crime area, that, when a dangerous situation occurs on site, transmits a signal in response to user operation to a designated server from which help can be requested, so that an administrator can detect the dangerous situation.

A surveillance camera installed alongside such a security emergency bell is mounted in an upper area on one side of the high-crime area. When a dangerous situation occurs, it lets an administrator check the captured video and provide help, and it records video of the crime so that the footage can later be used to identify the offender. Closed-circuit television (CCTV) is generally used as the surveillance camera, although high-performance cameras are also used.

Recently, crimes such as assault, robbery, sexual harassment, and murder have frequently occurred in spaces that are open to the public yet shielded from outside view (for example, indoor public places), such as restrooms, and the anxiety of people who use indoor public places is steadily increasing. In particular, because women have lower physical strength than men, they feel even greater anxiety and burden about using indoor public places.

Accordingly, various studies are under way on emergency alarm devices for both preventing and coping with emergencies in indoor public places. Security emergency bells are installed in the field because they are simple to install and convenient to operate. To activate one, however, the person in danger must move to where the bell is installed and press it by physical contact. It is therefore difficult for that person to press the bell within the offender's field of view, and the bell's operation can be forcibly stopped, so a rapid response to the emergency is not possible.

Because of these problems, sound-based security technology has been studied that detects an emergency by comparing the decibel level of the acoustic signal collected through a microphone with a threshold. However, because this approach also reacts to sounds unrelated to an emergency, it produces frequent malfunctions and errors, and its reliability is low.
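
The weakness of a pure level threshold can be illustrated with a minimal sketch. It assumes audio samples normalized to the range -1.0 to 1.0 and a hypothetical threshold of -20 dBFS; neither the sample format nor the threshold value comes from the specification, and the two test signals are synthetic stand-ins.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of normalized samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def threshold_detector(samples, threshold_dbfs=-20.0):
    """Threshold-style detection: trigger whenever the level exceeds a fixed value."""
    return rms_dbfs(samples) > threshold_dbfs


# Hypothetical loud but harmless sound (for example a hand dryer), amplitude 0.5.
loud_harmless = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
# Hypothetical quiet plea for help, amplitude 0.01.
quiet_plea = [0.01 * math.sin(2 * math.pi * 300 * n / 16000) for n in range(16000)]
```

Under these assumptions the detector fires on the loud harmless sound and stays silent on the quiet plea, because it looks only at how loud the signal is and not at what the sound is.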

Recently, emergency bell devices installed in indoor public places combine a button-type emergency bell with an emergency bell equipped with a sound recognition module. In practice, however, they cannot distinguish conversational voices from ambient sounds, which makes it difficult to recognize the current situation. Because of malfunctions, security personnel (such as the local police) are dispatched to the site of an emergency bell device about two to three times a day, which wastes manpower.

In fact, 85.6% of dispatches triggered by emergency bell devices are caused by drunk people or noise, and 13.7% are caused by pranks or mistakes, so that 99.3% are unnecessary dispatches to non-criminal situations rather than actual crimes. As a result, security personnel dispatched to a site where an emergency bell device has been activated mainly spend their effort verifying whether a crime has occurred at all rather than dealing with a crime.

Thus, in the prior art, when an emergency bell device is activated, first responders are dispatched to verify whether a crime has actually occurred, and only when an actual crime is confirmed are response personnel dispatched again to deal with it. This delays the dispatch needed to handle the crime and makes a rapid response difficult.

To solve the problems described above, an object of the present invention is to provide, according to an embodiment of the present invention, a sound-based intelligent emergency analysis system and method that can specifically classify a current violent or emergency situation by comprehensively analyzing the voice and acoustic signals generated on site.

However, the technical problem to be solved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a sound-based intelligent emergency analysis system according to an embodiment of the present invention includes: an emergency bell device installed in each high-crime area, which collects audio data including sounds or voices generated within that area, detects an emergency event from the collected audio data, and generates an emergency bell operation signal; and an analysis server which, when the emergency bell operation signal is received, receives the audio data from the emergency bell device, extracts voice data from the audio data and analyzes the voice data to classify a conversational voice situation, analyzes the remaining acoustic data excluding the voice data to classify an acoustic situation, and integrates the classified conversational voice situation and acoustic situation to determine, according to preset classification criteria, a crime or emergency situation graded into staged security levels, and provides a situation classification result. The analysis server collects conversational voice-related texts corresponding to the crime or emergency situation as learning data, trains an artificial intelligence-based voice analysis model in advance using the learning data, and then extracts the voice data and classifies the conversational voice situation on the basis of the trained voice analysis model. It also collects sounds corresponding to the crime or emergency situation as learning data, trains an artificial intelligence-based acoustic analysis model using the learning data, and then classifies the acoustic situation on the basis of the trained acoustic analysis model.

According to one aspect of the present invention, the sound-based intelligent emergency analysis system further includes a control server which, when the situation classification result is received, provides on-site dispatch information or situation response information, according to the security level based on the situation classification result, to a security terminal having jurisdiction over the high-crime area where the emergency bell operation signal was generated.

The analysis server includes: a communication unit that receives on-site audio data from the emergency bell device and performs transmission and reception with external devices; a voice extraction unit that extracts voice data from the audio data; a conversational voice situation analysis unit that converts the extracted voice data into Korean-based conversational voice text and analyzes the conversational voice situation on the basis of that text; an acoustic situation analysis unit that analyzes the acoustic situation by analyzing the remaining acoustic data excluding the voice data from the audio data; and a situation determination unit that integrates the conversational voice situation analyzed by the conversational voice situation analysis unit and the acoustic situation analyzed by the acoustic situation analysis unit to determine, according to preset classification criteria, a crime or emergency situation classified by staged crime codes, and provides a situation classification result.

The voice extraction unit determines, for each language type, whether speech occurs in the audio data and provides language type information to the conversational voice situation analysis unit. The conversational voice situation analysis unit converts the extracted voice data into text data on the basis of the language type information and translates the converted text data into Korean-based conversational voice text.
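
The language-dependent conversion described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the function name `to_korean_text`, the language codes, and the stand-in transcriber and translator callables are all hypothetical, and the specification does not say which speech recognition or translation models are used.

```python
def to_korean_text(voice_segment, language, transcribers, translators):
    """Transcribe a voice segment in its detected language, then translate to Korean."""
    if language not in transcribers:
        raise ValueError(f"no transcriber registered for language {language!r}")
    text = transcribers[language](voice_segment)
    if language == "ko":
        return text  # already Korean, so no translation step is needed
    if language not in translators:
        raise ValueError(f"no translator registered for language {language!r}")
    return translators[language](text)


# Stand-in callables for illustration only; real models would replace these.
demo_transcribers = {"ko": lambda seg: "살려주세요", "en": lambda seg: "help me"}
demo_translators = {"en": lambda text: {"help me": "도와주세요"}[text]}
```

Keeping the per-language models in dictionaries means that supporting another language only requires registering one more transcriber and translator, while the downstream situation classifier always receives Korean text.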

The voice analysis model is formed as a multi-layer neural network structure that includes a separate deep neural network (DNN) for each of the voice extraction unit, the conversational voice situation analysis unit, the acoustic situation analysis unit, and the situation determination unit.

A sound-based intelligent emergency analysis method according to an embodiment of the present invention is performed by an analysis server that analyzes emergencies in conjunction with a sound-based emergency bell device, and includes: a) receiving on-site audio data generated within a preset high-crime area when an emergency bell operation signal is detected from an emergency bell device installed in that area; b) extracting voice data from the audio data and analyzing a conversational voice situation on the basis of the extracted voice data; c) extracting acoustic data from the audio data and analyzing an acoustic situation on the basis of the extracted acoustic data; and d) integrating the conversational voice situation and the acoustic situation to determine, according to preset classification criteria, a crime or emergency situation graded into staged security levels, and providing a situation classification result.
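
Steps b) to d) can be sketched as a short pipeline. This is a minimal sketch, not the patented implementation: the separation and classification functions are passed in as placeholders for the trained models, and the situation labels and security levels in `FUSION_TABLE` are hypothetical values chosen for illustration rather than the preset classification criteria of the specification.

```python
def analyze_emergency(audio, separate, classify_voice, classify_sound, fuse):
    """Run steps b) to d) on audio that was already received in step a)."""
    voice, sound = separate(audio)                 # split speech from other sound
    voice_situation = classify_voice(voice)        # step b)
    sound_situation = classify_sound(sound)        # step c)
    return fuse(voice_situation, sound_situation)  # step d)


# Hypothetical criteria: (voice situation, acoustic situation) -> security level.
FUSION_TABLE = {
    ("threat", "scream"): 3,
    ("threat", "ambient"): 2,
    ("normal", "scream"): 2,
    ("normal", "ambient"): 0,
}


def fuse_by_table(voice_situation, sound_situation, table=FUSION_TABLE, default=1):
    """Look up the level for a pair of labels; unknown pairs get `default`."""
    return table.get((voice_situation, sound_situation), default)
```

A lookup table is only one possible way to apply preset criteria. It has the advantage that the criteria can be reviewed and changed without retraining either model.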

The sound-based intelligent emergency analysis method further includes: e) providing on-site dispatch information or situation response information, on the basis of the situation classification result, to a security terminal having jurisdiction over the high-crime area where the emergency bell operation signal was generated.

In step b), conversational voice-related texts corresponding to the crime or emergency situation are collected as learning data, an artificial intelligence-based voice analysis model is trained in advance using the learning data, and the voice data is then extracted and the conversational voice situation classified on the basis of the trained voice analysis model.

The voice analysis model determines, for each language type, whether speech occurs in the audio data, converts the extracted voice data into text data on the basis of the determined language type information, translates the converted text data into Korean-based conversational voice text, and classifies the conversational voice situation.

The voice analysis model is formed as a multi-layer neural network structure that includes a separate deep neural network (DNN) for each of the following tasks: determining whether speech occurs for each language type, extracting voice data, converting voice data into text data, translating into Korean-based conversational voice text, and classifying the conversational voice situation.

In step c), sounds corresponding to the crime or emergency situation are collected as learning data, an artificial intelligence-based acoustic analysis model is trained using the learning data, and the acoustic situation is then classified on the basis of the trained acoustic analysis model.

According to the above-described means for solving the problem, the present invention detects voice data in the on-site audio data and recognizes the conversational voice situation, which enables early recognition of violence or an emergency. It also analyzes the acoustic data detected in the on-site audio data to classify the acoustic situation, and then fuses the conversational voice situation with the acoustic situation so that the crime level of the violence or emergency can be classified in stages. This prevents public distrust in the effectiveness of emergency bell services and the resulting decline in their use, and makes it possible to provide high-quality crime safety services.

FIG. 1 is a diagram illustrating the configuration of a sound-based intelligent emergency analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of an analysis server according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating crime codes classified by crime situation according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a sound-based intelligent emergency analysis method according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a Wave U-Net structure according to an example of the prior art, and FIG. 6 is a diagram illustrating a voice analysis model with a Nested Wave U-Net structure according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the process by which the analysis server analyzes a conversational voice situation and an acoustic situation according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a process of determining whether speech occurs for each voice type according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a voice data extraction process according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a process of training a voice analysis model for classifying conversational voice situations according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a process of classifying conversational voice situations using a trained voice analysis model according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating a process of classifying acoustic situations using a trained acoustic analysis model according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "terminal" may be a wireless communication device with guaranteed portability and mobility, for example any kind of handheld wireless communication device such as a smartphone, a tablet PC, or a notebook computer. A "terminal" may also be a wired communication device, such as a PC, that can connect to other terminals or servers through a network. A network refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television communication networks.

Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope of rights. Accordingly, inventions of the same scope that perform the same functions as the present invention also fall within the scope of rights of the present invention.

In addition, each configuration, procedure, process, or method included in each embodiment of the present invention may be shared among embodiments to the extent that they do not technically contradict one another.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a sound-based intelligent emergency analysis system according to an embodiment of the present invention.

Referring to FIG. 1, a sound-based intelligent emergency analysis system according to an embodiment of the present invention includes, but is not limited to, at least one emergency bell device 100, an analysis server 200, and a control server 300.

The emergency bell device 100 is installed in each high-crime area, collects audio data including sounds or voices generated within that area, detects an emergency event from the collected audio data, and generates an emergency bell operation signal. The emergency bell device 100 may include both a button-type emergency bell and a sound-recognition emergency bell that includes a sound recognition module.

The emergency bell device 100 may include a microphone (not shown) for collecting sound, a communication module (not shown) for transmitting the emergency bell operation signal and audio data to the analysis server 200, a memory (not shown), a warning device (not shown) that is triggered when the device is damaged or its power is forcibly turned off, a control module (not shown), and the like.

The emergency bell device 100 stores, in a buffer (not shown) and in fixed time units (approximately 10 seconds), audio data including all sounds and voices generated in the high-crime area (a public restroom, bus stop, alley, building blind spot, and so on). When an emergency event is detected, it generates an emergency bell operation signal, retrieves from the buffer the audio data recorded during a certain period before the signal was generated, and transmits that audio data to the analysis server 200 together with the emergency bell operation signal. The emergency bell device 100 may delete the audio data stored in the buffer on a first-in, first-out basis to keep at least a preset amount of storage capacity free.
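
The buffering behavior described above can be sketched with a bounded queue. This is a minimal sketch under stated assumptions: the class and function names, the payload layout, and the signal string are hypothetical, and each chunk stands for one fixed-length unit of audio (about 10 seconds in the description).

```python
from collections import deque


class PreEventBuffer:
    """Keeps only the most recent audio chunks; the oldest chunk is dropped first."""

    def __init__(self, max_chunks):
        # A deque with maxlen discards the oldest item automatically when full.
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk):
        self._chunks.append(chunk)

    def snapshot(self):
        """Return the buffered chunks, oldest first."""
        return list(self._chunks)


def on_audio_chunk(buffer, chunk, event_detected):
    """Store the chunk; if an event is detected, build the payload to transmit."""
    buffer.push(chunk)
    if event_detected:
        return {"signal": "EMERGENCY_BELL_ON", "audio": buffer.snapshot()}
    return None
```

With `max_chunks=3`, for example, the payload sent on detection contains the chunk that triggered the event plus the two chunks recorded just before it, which gives the analysis server the lead-up to the event as well as the event itself.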

When an emergency bell operation signal is received from the emergency bell device 100, the analysis server 200 receives audio data from that emergency bell device 100, extracts voice data from the audio data, and analyzes the voice data to classify the conversational voice situation. It analyzes the remaining acoustic data, excluding the voice data, to classify the acoustic situation, and it integrates the classified conversational voice situation and acoustic situation to determine, according to preset classification criteria, a crime or emergency situation classified by staged codes, and provides a situation classification result. Because the analysis server 200 can also receive and analyze the sound recorded during a certain period before the emergency bell operation signal was generated, it can grasp the current situation more accurately.

When the situation classification result is received from the analysis server 200, the control server 300 provides on-site dispatch information or situation response information, according to the crime code based on the situation classification result, to the security terminal 400 having jurisdiction over the high-crime area where the emergency bell operation signal was generated.

The analysis server 200 and the control server 300 may each be a server computer in the general sense, or may be implemented as any of various other devices capable of acting as a server. Specifically, the analysis server 200 and the control server 300 may each be implemented on a computing device that includes a communication module (not shown), a memory (not shown), a processor (not shown), and a database (not shown), for example a mobile phone, TV, PDA, tablet PC, PC, notebook PC, or other user terminal device.

The security terminal 400 is a terminal capable of wireless communication that is used, in cooperation with a police station or other agencies, to handle the dispatch of security personnel, crime situation notifications, and the like, and may be implemented as a smartphone, tablet PC, PC, notebook PC, or the like.

The emergency bell device 100 has unique identification information assigned by the control server 300, and both the emergency bell operation signal and the situation classification result include the identification information of the corresponding emergency bell device 100. The analysis server 200 and the control server 300 can therefore use that identification information to determine where the emergency bell device 100 is installed, that is, the location of the high-crime area, and can quickly transmit information to the security terminal 400 having jurisdiction over that area.

To this end, the analysis server 200 and the control server 300 store, in the database 210, the identification information of each emergency bell device 100 and information on the security terminal 400 having jurisdiction over each high-crime area.
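The lookup described above can be sketched as a simple keyed table. This is a minimal illustration only: the record layout, the identifiers such as `bell-0001`, and the function name `route_result` are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BellRecord:
    """One hypothetical row of database 210 for an emergency bell device."""
    location: str      # where the device is installed (the high-crime area)
    terminal_id: str   # security terminal with jurisdiction over that area


# Hypothetical contents of database 210, keyed by device identification info.
BELL_DB = {
    "bell-0001": BellRecord("public restroom, block A", "terminal-17"),
    "bell-0002": BellRecord("underground walkway, exit 3", "terminal-04"),
}


def route_result(result: dict) -> dict:
    """Resolve the device ID carried in a classification result to the
    installation location and the security terminal that should be notified."""
    record = BELL_DB.get(result["bell_id"])
    if record is None:
        raise KeyError(f"unknown emergency bell device: {result['bell_id']}")
    return {
        "terminal_id": record.terminal_id,
        "location": record.location,
        "crime_code": result["crime_code"],
    }


routed = route_result({"bell_id": "bell-0001", "crime_code": 0})
```

Because the identification information travels with both the operation signal and the classification result, either server could perform this lookup without any extra location field in the message.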

Meanwhile, the emergency bell device 100 may further include one or more camera devices 150 that capture on-site video of the high-crime area. For example, when the high-crime area is a bus stop, an underground walkway, or a blind spot of a building such as a rooftop or stairwell, a camera device 150 such as a CCTV camera may be installed at an upper part of one side of that area and used to capture the situation on site.

When the situation classification result is received, the control server 300 receives on-site video in real time through the camera device 150 of the corresponding high-crime area, checks the video against the situation classification result to assign the current situation to one of the preset security levels, and may generate on-site dispatch information or situation response information according to the assigned security level. The control server 300 may change the security level at any time according to the on-site video received in real time.

FIG. 2 is a diagram illustrating the configuration of an analysis server according to an embodiment of the present invention, and FIG. 3 is a diagram illustrating crime codes classified by crime situation according to an embodiment of the present invention.

Referring to FIGS. 2 and 3, the analysis server 200 includes, but is not limited to, a communication unit 201, a voice extraction unit 202, a conversational voice situation analysis unit 203, an acoustic situation analysis unit 204, and a situation determination unit 205.

The communication unit 201 receives audio data from the emergency bell device 100 and, working with the communication network, provides the communication interface needed to exchange signals in packet-data form with the emergency bell device 100 as well as with the control server 300 and user terminals. The communication unit 201 may be a device including the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices.

The voice extraction unit 202 extracts human voice data from the audio data, and the conversational voice situation analysis unit 203 converts the extracted voice data into text and then analyzes the conversational voice situation (a threat, an assault, extortion of money, etc.) based on that text.

The acoustic situation analysis unit 204 analyzes the sound data that remains after the voice data is excluded from the audio data and determines the acoustic situation (animal abuse, hidden camera installation, property damage, an intoxicated person, a child crying, an adult crying, a dangerous dog barking, a swarm of bees, a fire alarm, assault, etc.).

The situation determination unit 205 integrates the conversational voice situation analyzed by the conversational voice situation analysis unit 203 and the acoustic situation analyzed by the acoustic situation analysis unit 204, determines the crime or emergency situation, which is graded into stepwise crime codes according to preset classification criteria, and provides a situation classification result.

As shown in FIG. 3, the crime codes are divided into five security levels (code 0 to code 4), and the severity in terms of dispatch time, number of personnel dispatched, and situation response increases from code 4 toward code 0. For example, when an emergency bell device 100 is installed in a public restroom and an emergency bell operation signal is detected together with a woman's scream inside the restroom, the analysis server 200 classifies the crime code as code 0 and transmits to the control server 300 the crime code and a situation analysis result describing the crime situation (a man entering the women's restroom, a victim sobbing under threat, an assault in progress, etc.). The control server 300 then confirms from the situation analysis result that the crime code is code 0 and may issue on-site dispatch information or situation response information, such as having police or other security personnel dispatched in the shortest possible time and coordinating the dispatch with an ambulance, female police officers, and police personnel from adjacent areas, for the safety of the victim and the swift arrest of the perpetrator.

In this way, the situation determination unit 205 may classify a violent or emergency situation as code 4 when a situation such as a beehive, a fire, or an abandoned dog occurs; as code 3 when a situation such as a child crying, an adult crying, or a dangerous dog barking occurs; as code 2 when harassment involving two or more people occurs; as code 1 when a situation such as animal abuse, property damage, or hidden camera installation occurs; and as code 0 when a situation such as extortion of money, a threat, harassment, or violence occurs.
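The code assignment above can be sketched as a lookup table. The situation-to-code pairs follow the paragraph above, but the key names are hypothetical labels chosen for the example, and the rule for combining several detected situations (take the most severe, that is, the lowest code) is an assumption; the specification only says the two analyses are integrated according to preset criteria.

```python
# Situation labels are hypothetical; the codes follow the description above.
CODE_BY_SITUATION = {
    "beehive": 4, "fire": 4, "abandoned_dog": 4,
    "child_crying": 3, "adult_crying": 3, "dangerous_dog_barking": 3,
    "group_harassment": 2,   # harassment involving two or more people
    "animal_abuse": 1, "property_damage": 1, "hidden_camera_installation": 1,
    "extortion": 0, "threat": 0, "harassment": 0, "violence": 0,
}


def classify_crime_code(voice_situations, acoustic_situations):
    """Integrate both analyses and return one crime code.

    Assumption: when several situations are detected, the most severe
    (lowest-numbered) code wins. Returns None if nothing maps to a code.
    """
    detected = list(voice_situations) + list(acoustic_situations)
    codes = [CODE_BY_SITUATION[s] for s in detected if s in CODE_BY_SITUATION]
    return min(codes) if codes else None


code = classify_crime_code(["threat"], ["child_crying"])
```

Under this assumed rule, a threat detected in the conversation outranks a crying child detected in the background sound, so the combined result is code 0.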

FIG. 4 is a flowchart illustrating a sound-based intelligent emergency situation analysis method according to an embodiment of the present invention. FIG. 5 is a diagram illustrating a Wave U-Net structure according to an embodiment of the prior art, and FIG. 6 is a diagram illustrating a voice analysis model with a Nested Wave U-Net structure according to an embodiment of the present invention.

Referring to FIG. 4, in the sound-based intelligent emergency situation analysis method, when the analysis server 200 detects an emergency bell operation signal from the emergency bell device 100 (S11), it receives real-time on-site audio data (S12).

The analysis server 200 extracts voice data from the on-site audio data (S13) and then analyzes the conversational voice situation based on the extracted voice data (S14). The analysis server 200 may collect conversational-voice-related texts corresponding to crime or emergency situations as training data, train an AI-based voice analysis model in advance on the collected training data, and then extract voice data and classify the conversational voice situation using the trained voice analysis model.

Here, the voice analysis model builds its algorithm on the deep learning architecture of U-Net, an end-to-end fully convolutional network model, but deep learning architectures with different hidden-layer connection structures, such as VGGNet, GoogLeNet, ResNet, DenseNet, fully convolutional networks, and AlexNet, may also be used.

In particular, for voice data extraction the voice analysis model uses a Nested Wave U-Net to separate audio data in which sound and voice are mixed into two parts: sound data and voice data.

As shown in FIG. 5, the conventional Wave U-Net performs a contracting path, in which 1D convolution modules and downsampling modules are arranged in succession so that the feature map size shrinks in the first half of the network, and an expanding path, in which successive upsampling modules enlarge the feature map size again in the second half. The 1D convolution modules extract many high-level feature maps in the time domain, and each downsampling step discards time samples according to a fixed pattern over the time steps, which halves the time resolution.
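The downsampling step described above amounts to decimation along the time axis. The sketch below assumes the "fixed pattern" is keeping every second time sample, and represents a feature map as a list of channels; both the representation and the function name `decimate` are illustrative choices, not taken from the specification.

```python
def decimate(feature_map):
    """Halve the time resolution of a feature map.

    feature_map is a list of channels, each a list of time samples.
    Assumption: the discarded samples are the odd-indexed ones.
    """
    return [channel[::2] for channel in feature_map]


feature_map = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],   # channel 0
    [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],   # channel 1
]
halved = decimate(feature_map)
```

The number of channels is unchanged; only the length of each channel along the time axis shrinks, which is why the contracting path trades time resolution for a wider temporal context per sample.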

However, as shown in FIG. 6, the Nested Wave U-Net structure applied to the voice analysis model of the present invention consists of a contracting path, which is the downward path, and an expanding path, which is the upward path. The contracting path captures high-level information about the input data, and the expanding path performs fine-grained localization. Unlike the conventional Wave U-Net structure, at each upsampling stage the Nested Wave U-Net takes the feature map from the corresponding stage of the contracting path and concatenates it after the upsampled feature map. To make the sizes match, the contracting-path feature map is cropped to the size of the expanding-path feature map before concatenation, and in this way the information in the audio data is preserved.
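The crop-then-concatenate step can be sketched as follows. The specification only says the contracting-path feature map is cropped "to an appropriate size"; cropping around the center is an assumption made here, as are the list-of-channels representation and the function names.

```python
def center_crop(feature_map, target_len):
    """Crop every channel to target_len samples around its center.

    Assumption: the crop is centered; the source does not say which
    samples are dropped.
    """
    cropped = []
    for channel in feature_map:
        extra = len(channel) - target_len
        if extra < 0:
            raise ValueError("skip feature map is shorter than the target")
        start = extra // 2
        cropped.append(channel[start:start + target_len])
    return cropped


def crop_and_concat(skip, upsampled):
    """Append the cropped contracting-path channels after the upsampled
    expanding-path channels (concatenation along the channel axis)."""
    target_len = len(upsampled[0])
    return upsampled + center_crop(skip, target_len)


skip = [[0, 1, 2, 3, 4, 5]]     # from the contracting path, 6 samples
upsampled = [[9, 9, 9, 9]]      # from the expanding path, 4 samples
merged = crop_and_concat(skip, upsampled)
```

After the merge the feature map has the time length of the expanding path and the combined channel count of both inputs, so detail captured before downsampling is available again when the waveform is reconstructed.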

In this way, the voice analysis model can use the Nested Wave U-Net structure to remove noise from the on-site audio data and extract voice data with clean sound quality.

The analysis server 200 extracts the sound data, that is, the audio data excluding the voice data, and analyzes the acoustic situation based on the extracted sound data (S15). The analysis server 200 may collect sounds corresponding to crime or emergency situations as training data, train an AI-based acoustic analysis model on the collected training data, and then classify the acoustic situation using the trained acoustic analysis model.

The analysis server 200 integrates the analyzed conversational voice situation and acoustic situation and classifies which crime code, among the codes defined per crime or emergency situation, the situation in the on-site audio data corresponds to (S16).

Meanwhile, each of the steps in FIG. 4 may be divided into additional steps or combined into fewer steps depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

FIG. 7 is a diagram illustrating the process by which the analysis server analyzes the conversational voice situation and the acoustic situation according to an embodiment of the present invention, FIG. 8 is a flowchart illustrating the process of determining whether voice is present for each voice type according to an embodiment of the present invention, and FIG. 9 is a flowchart illustrating the voice data extraction process according to an embodiment of the present invention.

As shown in FIG. 7, the analysis server 200 receives on-site audio data (for example, 10 seconds of on-site sound) (S21), determines whether voice is present for each voice type, and then extracts the voice data (S22, S23).

As shown in FIG. 8, the analysis server 200 builds a sound DB that stores 50 or more types of sound data, such as screams, footsteps, and dangerous dogs barking, and a voice DB that stores voice data by language type, such as English, Korean, Chinese, and Joseonjok Korean (the Korean spoken by ethnic Koreans in China).

Accordingly, the analysis server 200 randomly selects one to three types of sound sample data from the sound DB (S31), randomly selects one type of voice sample data from the voice DB (S32), mixes the sound sample data with the voice sample data, and trains the AI-based voice analysis model to classify the language type, using the language type as the label (S33, S34). The analysis server 200 can then use the trained voice analysis model to identify the language type in on-site audio data.

As shown in FIG. 9, the analysis server 200 randomly selects one to three types of sound sample data (S41), randomly selects one type of voice sample data from the voice DB (S42), mixes the sound sample data with the voice sample data, and trains the AI-based voice analysis model for voice data extraction, using the original voice data selected from the voice DB as the label (S43, S44). The analysis server 200 can then use the trained voice analysis model to extract, from on-site audio data, a voice signal spoken in one or more language types.
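The two training-data recipes above (FIG. 8 and FIG. 9) differ only in the label: the language type for language classification, and the clean voice waveform for voice extraction. A minimal sketch of generating one mixed example is shown below. Mixing by sample-wise addition, truncating to the shortest clip, the toy integer waveforms, and all names are assumptions made for illustration.

```python
import random


def mix(signals):
    """Mix waveforms by sample-wise addition, truncated to the shortest one."""
    length = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) for i in range(length)]


def make_training_example(sound_db, voice_db, rng):
    """Build one mixed example with both kinds of label.

    sound_db: dict mapping a sound type to a waveform (list of samples)
    voice_db: list of (language, waveform) pairs
    """
    n_sounds = rng.randint(1, min(3, len(sound_db)))      # 1 to 3 sound types
    sound_types = rng.sample(sorted(sound_db), n_sounds)
    language, voice = rng.choice(voice_db)                # one voice sample
    mixture = mix([voice] + [sound_db[t] for t in sound_types])
    return {
        "mixture": mixture,                      # model input
        "language": language,                    # label for language classification
        "clean_voice": voice[:len(mixture)],     # label for voice extraction
        "sound_types": sound_types,
    }


# Toy databases with integer samples so the arithmetic is exact.
SOUND_DB = {"scream": [1, 2, 3, 4], "footsteps": [0, 1, 0, 1], "dog_bark": [5, 0, 5, 0]}
VOICE_DB = [("ko", [10, 20, 30, 40]), ("en", [11, 21, 31, 41]), ("zh", [12, 22, 32, 42])]

example = make_training_example(SOUND_DB, VOICE_DB, random.Random(0))
```

Because the clean voice is known before mixing, the same synthetic mixture can supervise both tasks without any manual annotation.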

Referring again to FIG. 7, the analysis server 200 provides the voice type information to an STT (Speech To Text) speech processing engine based on the characteristics of Korean, and the speech processing engine converts the human speech into text (S24). Here, STT is a branch of speech recognition and refers to processing in which a computer interprets human spoken language and converts it into text data. Recent speech recognition engines for STT improve accuracy by using acoustic and language models through deep learning algorithms. By applying DNN (Deep Neural Network) and RNN (Recurrent Neural Network) techniques, which are widely used deep-learning-based algorithms, in addition to the traditional HMM (Hidden Markov Model) speech recognition algorithm, they achieve higher accuracy than in the past.

Accordingly, the analysis server 200 converts voice data containing at least one language type among Korean, English, Chinese, and Joseonjok Korean into text, classifies the text-based conversational voice situation (S25), classifies the acoustic situation in a manner robust to interfering voice signals, and then integrates the conversational voice situation and the acoustic situation to classify the crime code for each violent or emergency situation (S26, S27).

FIG. 10 is a diagram illustrating the process of training the voice analysis model for conversational voice situation classification according to an embodiment of the present invention, and FIG. 11 is a diagram illustrating the process of classifying the conversational voice situation using the trained voice analysis model according to an embodiment of the present invention.

Referring to FIG. 10, the analysis server 200 builds a Korean text DB for various conversation situations, such as ordinary conversation, extortion of money, threats, and harassment (S51), and performs a first round of data augmentation on the training data by changing the word order within a conversation situation containing one or more texts and by substituting words (Korean, Chinese, or English words, synonyms, profanity, etc.) (S52).
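The first-round augmentation (S52) combines two operations: reordering words and substituting words. The sketch below is one simple way to realize them, using an English placeholder sentence for readability. Swapping a single random pair of words, the substitution table, and the function names are all assumptions; the specification does not say how the order is changed or where replacement words come from.

```python
import random


def swap_word_order(words, rng):
    """Return a copy of words with two randomly chosen positions swapped."""
    if len(words) < 2:
        return list(words)
    i, j = rng.sample(range(len(words)), 2)
    out = list(words)
    out[i], out[j] = out[j], out[i]
    return out


def substitute_words(words, table, rng):
    """Replace each word that has an entry in the substitution table
    (synonyms, foreign-language equivalents, slang) with a random alternative."""
    return [rng.choice(table[w]) if w in table else w for w in words]


def augment(sentence, table, rng, n_variants=3):
    """Generate up to n_variants distinct augmented copies of one sentence."""
    words = sentence.split()
    variants = set()
    for _ in range(n_variants):
        candidate = substitute_words(swap_word_order(words, rng), table, rng)
        variants.add(" ".join(candidate))
    return sorted(variants)


# Hypothetical substitution table for an "extortion" example.
TABLE = {"money": ["cash", "wallet"]}
variants = augment("hand over your money", TABLE, random.Random(0))
```

Each variant keeps the situation label of the sentence it was derived from, so the labeled set grows without any new manual labeling.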

The analysis server 200 translates the first-round-augmented Korean text DB into English to build an English text augmentation DB, and translates it into Chinese to build a Chinese text augmentation DB (S53).

The analysis server 200 merges the first-round-augmented Korean text DB, the English text augmentation DB, and the Chinese text augmentation DB into a second-round-augmented DB (S54), and then trains the voice analysis model for conversational voice situation classification using the second-round-augmented DB as training data (S55).

As shown in FIG. 11, when on-site audio data is received, the analysis server 200 determines whether voice is present for each language type in the on-site audio data, extracts the voice data (S61, S62, S63), and converts the extracted voice data into text data through STT according to the language type (S64).

If the converted text data is not Korean, the analysis server 200 translates it into Korean text and inputs the Korean conversational voice text into the trained voice analysis model, which classifies and outputs the conversational voice situation to which the input text corresponds (S65 to S67).
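The inference flow of FIG. 11 from STT onward can be sketched as a small pipeline. The STT engine, translator, and classifier are passed in as callables because the specification does not name concrete engines; the toy stand-ins below exist only so the sketch runs, and the language codes and labels are hypothetical.

```python
from typing import Callable, List


def classify_conversation(
    voice: List[float],
    language: str,
    stt: Callable[[List[float], str], str],
    translate_to_korean: Callable[[str, str], str],
    classifier: Callable[[str], str],
) -> str:
    """Convert extracted voice to text (S64), translate to Korean when the
    detected language is not Korean, then classify the situation (S65 to S67)."""
    text = stt(voice, language)
    if language != "ko":
        text = translate_to_korean(text, language)
    return classifier(text)


# Toy stand-ins; none of these is a real engine.
def fake_stt(voice, language):
    return "give me your money" if language == "en" else "돈 내놔"


def fake_translate(text, language):
    return {"give me your money": "돈 내놔"}.get(text, text)


def fake_classifier(korean_text):
    return "extortion" if "돈" in korean_text else "general_conversation"


label = classify_conversation([0.0, 0.1], "en", fake_stt, fake_translate, fake_classifier)
```

Routing every language through Korean text means only one situation classifier has to be served at inference time, at the cost of depending on translation quality.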

As described above, the voice analysis model may be implemented as a deep-learning-based algorithm and may be formed as a multi-layer neural network structure that includes a separate deep neural network (DNN) for each of language type classification, voice signal extraction, STT, and conversational voice situation classification.

FIG. 12 is a diagram illustrating the process of classifying the acoustic situation using the trained acoustic analysis model according to an embodiment of the present invention.

As shown in FIG. 12, the analysis server 200 builds a sound DB that stores sound data for various situations, such as violence, animal abuse, property damage, a child crying, an adult crying, a dangerous dog, a swarm of bees, a fire alarm, and hidden camera installation, randomly selects sound sample data for one situation from the sound DB, and randomly selects one type of voice sample data from the voice DB (S71, S72).

The analysis server 200 mixes the sound sample data with the voice sample data and trains an acoustic analysis model for acoustic situation classification, using the acoustic situation type as the label. The analysis server 200 can then input the sound data extracted from on-site audio data into the trained acoustic analysis model and classify the acoustic situation corresponding to that on-site audio data.

The embodiments of the present invention described above may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Such a recording medium includes computer-readable media, which may be any available media that can be accessed by a computer and include both volatile and non-volatile media and both removable and non-removable media. Computer-readable media also include computer storage media, which include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: emergency bell device
150: camera device
200: analysis server
201: communication unit
202: voice extraction unit
203: conversational voice situation analysis unit
204: acoustic situation analysis unit
205: situation determination unit
210: database
300: control server
400: security terminal

Claims

A sound-based intelligent emergency situation analysis system comprising:
an emergency bell device installed in each high-crime area to collect audio data including sounds or voices generated in the high-crime area, detect an emergency event from the collected audio data, and generate an emergency bell operation signal; and
an analysis server that, when the emergency bell operation signal is received, receives the audio data from the emergency bell device, extracts voice data from the audio data, analyzes the voice data to classify a conversational voice situation, analyzes the remaining sound data excluding the voice data from the audio data to classify an acoustic situation, integrates the classified conversational voice situation and acoustic situation to determine a crime or emergency situation graded into stepwise security levels according to preset classification criteria, and provides a situation classification result,
wherein the analysis server
collects conversational-voice-related texts corresponding to the crime or emergency situation as training data, trains an AI-based voice analysis model in advance using the training data, and extracts the voice data and classifies the conversational voice situation based on the trained voice analysis model, and
collects sounds corresponding to the crime or emergency situation as training data, trains an AI-based acoustic analysis model using the training data, and classifies the acoustic situation based on the trained acoustic analysis model.

The sound-based intelligent emergency situation analysis system of claim 1,
further comprising a control server that, when the situation classification result is received, provides on-site dispatch information or situation response information according to a security level based on the situation classification result to a security terminal having jurisdiction over the high-crime area where the emergency bell operation signal was generated.

The sound-based intelligent emergency situation analysis system of claim 1,
wherein the analysis server comprises:
a communication unit that receives on-site audio data from the emergency bell device and performs transmission and reception with external devices;
a voice extraction unit that extracts voice data from the audio data;
a conversational voice situation analysis unit that converts the extracted voice data into Korean-based conversational voice text and analyzes a conversational voice situation based on the conversational voice text;
an acoustic situation analysis unit that analyzes the remaining sound data excluding the voice data from the audio data to analyze an acoustic situation; and
a situation determination unit that integrates the conversational voice situation analyzed by the conversational voice situation analysis unit and the acoustic situation analyzed by the acoustic situation analysis unit, determines a crime or emergency situation graded into stepwise crime codes according to preset classification criteria, and provides a situation classification result.

The sound-based intelligent emergency situation analysis system of claim 3,
wherein the voice extraction unit determines whether voice is present for each language type in the audio data and provides language type information to the conversational voice situation analysis unit, and
the conversational voice situation analysis unit converts the extracted voice data into text data based on the language type information and translates the converted text data into Korean-based conversational voice text.

The sound-based intelligent emergency situation analysis system of claim 3,
wherein the voice analysis model is formed as a multi-layer neural network structure including a deep neural network (DNN) for each of the voice extraction unit, the conversational voice situation analysis unit, the acoustic situation analysis unit, and the situation determination unit.

A sound-based intelligent emergency situation analysis method performed by an analysis server that analyzes an emergency situation in conjunction with a sound-based emergency bell device, the method comprising:
a) receiving on-site audio data generated within a preset high-crime area when an emergency bell operation signal is detected from an emergency bell device installed in the high-crime area;
b) extracting voice data from the audio data and analyzing a conversational voice situation based on the extracted voice data;
c) extracting sound data from the audio data and analyzing an acoustic situation based on the extracted sound data; and
d) integrating the conversational voice situation and the acoustic situation to determine a crime or emergency situation graded into stepwise security levels according to preset classification criteria, and providing a situation classification result.

The sound-based intelligent emergency situation analysis method of claim 6, further comprising:
e) providing, based on the situation classification result, on-site dispatch information or situation response information to a security terminal having jurisdiction over the high-crime area where the emergency bell operation signal was generated.

According to claim 6,
The sound-based intelligent emergency situation analysis method, wherein, in step b), conversational voice-related texts corresponding to the crime or emergency situation are collected as learning data, an artificial intelligence-based voice analysis model is trained in advance using the learning data, and the voice data is extracted and the conversational voice situation is classified based on the trained voice analysis model.

According to claim 8,
The sound-based intelligent emergency situation analysis method, wherein the voice analysis model determines whether a voice is generated for each language type in the audio data, converts the extracted voice data into text data based on the determined language type information, and classifies the conversational voice situation by translating the converted text data into Korean-based dialogue voice text.
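
The order of operations in this claim (language determination, speech-to-text, translation into Korean, classification) can be sketched as a small pipeline. The claims do not name any concrete model or library, so each stage is passed in as a callable, and the stand-in functions below are hypothetical placeholders that only demonstrate the data flow.

```python
from typing import Callable, Dict


def analyze_dialogue(
    voice_data: bytes,
    detect_language: Callable[[bytes], str],
    transcribers: Dict[str, Callable[[bytes], str]],
    translate_to_korean: Callable[[str, str], str],
    classify: Callable[[str], str],
) -> str:
    """Run the four stages in the order given by the claim."""
    language = detect_language(voice_data)        # language type information
    text = transcribers[language](voice_data)     # voice data -> text data
    if language == "ko":
        korean_text = text                        # already Korean, no translation
    else:
        korean_text = translate_to_korean(text, language)
    return classify(korean_text)                  # conversational voice situation


# Hypothetical stand-ins for trained models, used only to exercise the flow.
def fake_detect(voice_data: bytes) -> str:
    return "en" if voice_data.startswith(b"EN:") else "ko"


fake_transcribers = {
    "en": lambda data: data[3:].decode("utf-8"),
    "ko": lambda data: data.decode("utf-8"),
}


def fake_translate(text: str, language: str) -> str:
    return {"help me": "도와주세요"}.get(text, text)


def fake_classify(korean_text: str) -> str:
    return "call_for_help" if "도와주세요" in korean_text else "normal"


print(analyze_dialogue(b"EN:help me", fake_detect, fake_transcribers,
                       fake_translate, fake_classify))
```

Passing the stages in as parameters keeps the sketch independent of any particular speech recognition or translation engine.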

According to claim 9,
The sound-based intelligent emergency situation analysis method, wherein the voice analysis model is formed as a multi-layer neural network structure including deep neural networks (DNNs) for determining whether a voice is generated for each language type, extracting voice data, converting the voice data into text data, translating the text data into Korean-based dialogue voice text, and classifying dialogue voice situations.

According to claim 6,
The sound-based intelligent emergency situation analysis method, wherein, in step c), acoustic-related sounds corresponding to the crime or emergency situation are collected as learning data, an artificial intelligence-based acoustic analysis model is trained using the learning data, and the acoustic situation is then classified based on the trained acoustic analysis model.
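
The train-then-classify flow of step c) can be shown with a deliberately simple stand-in. The claimed acoustic analysis model is artificial intelligence-based (elsewhere described as using deep neural networks); the sketch below substitutes a nearest-centroid classifier over hypothetical two-dimensional feature vectors purely to illustrate collecting labeled sounds, training, and classifying. The feature values and labels are assumptions.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """Compute one mean feature vector (centroid) per sound label.

    samples: iterable of (feature_vector, label) pairs used as learning data.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify_sound(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical learning data: (features, label) pairs.
learning_data = [
    ([0.1, 0.2], "ambient"),
    ([0.2, 0.1], "ambient"),
    ([0.9, 0.8], "scream"),
    ([0.8, 0.9], "scream"),
]

model = train_centroids(learning_data)
print(classify_sound(model, [0.85, 0.9]))
```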

An analysis server for analyzing an emergency situation in conjunction with a sound-based emergency bell device,
wherein the analysis server receives audio data from the emergency bell device, extracts voice data from the audio data using an artificial intelligence-based voice analysis model, analyzes the voice data to classify a conversational voice situation, analyzes the audio data excluding the voice data using an artificial intelligence-based acoustic analysis model to classify an acoustic situation, and integrates the classified conversational voice situation and acoustic situation to determine a crime or emergency situation classified into step-by-step security levels according to preset classification criteria, thereby providing a situation classification result,
the artificial intelligence-based voice analysis model is pre-trained using learning data collected from conversational voice-related texts corresponding to the crime or emergency situation, and
the artificial intelligence-based acoustic analysis model is pre-trained using learning data collected from acoustic sounds corresponding to the crime or emergency situation.