KR20090070325A

KR20090070325A - Emergency calling system and method based on multimodal information

Info

Publication number: KR20090070325A
Application number: KR1020070138300A
Authority: KR
Inventors: 소인미; 김봉완; 김기영; 강선경; 김영운; 이상설; 이용주; 정성태
Original assignee: 원광대학교산학협력단
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2009-07-01

Abstract

A multimodal information-based emergency recognition system and method are provided that combine video information, sound information, and motion information from a gravity sensor to monitor the behavior of an elderly person or a person living alone and generate an emergency call signal. A video processor (110) extracts and tracks the user in the input image and recognizes the user's motion. A gravity sensor processor (120) senses the user's motion through a gravity sensor (121) and recognizes a fainting motion. A sound processor recognizes the type of sound and the spoken words. A multimodal integrated recognizer determines whether an emergency has occurred by integrating this information.

Description

Emergency calling system and method based on multimodal information

The present invention relates to a multimodal information-based emergency recognition system and method. More specifically, it relates to a system and method that comprise an image processing unit, a gravity sensor processing unit, a sound processing unit, a voice output unit, a multimodal integrated recognition unit, and an emergency processing unit, and that monitor the behavior of an elderly person or a person living alone in order to place an emergency call to the outside when an emergency occurs.

The invention concerns an emergency recognition system that uses multimodal information from video, audio, and a gravity sensor, and a method that automatically recognizes emergencies that may arise while a person living alone is at home and carries out the handling appropriate to the situation.

As the elderly population grows with advances in medical science and longer life expectancy, more elderly people and people with physical disabilities are living alone. Accordingly, the "silver industry" that caters to the health of the elderly is growing, and technologies are being developed that monitor the state of a person under care in real time so that emergencies can be handled effectively. Recently, systems have been developed that attach to the body of the person under care, measure the pulse rate automatically, and automatically notify a medical center or relatives when an abnormality occurs.

Conventional emergency recognition systems adopt a photoelectric method: light is shone on a peripheral artery in a finger or the neck, blood flow is detected from the reflected or transmitted light, and the pulse, which is proportional to the heart rate, is measured. With this method, however, blood flow decreases when the temperature drops, and the measured pulse rate changes easily with differences in light intensity caused by sunlight, so accurate pulse measurement is difficult.

Devices that measure the pulse rate by sensing the heartbeat with a vibration sensor are likewise sensitive to vigorous physical activity and to slight external vibrations, and because pressure must be applied to the wrist or chest for accurate measurement, they are difficult and uncomfortable to wear for long periods. Moreover, such devices are taken off in bathrooms and toilets, precisely where elderly or infirm people are most likely to faint or slip and fall, so their condition cannot be sensed at the critical moment.

Another approach places emergency contact buttons or panic buttons at key locations in the home so that the person living alone can easily press one in an emergency, but the emergency cannot be detected if the person is unable to operate the button. In yet another approach, motion sensors are installed where the person mainly lives and an emergency is declared when no motion is detected for a certain time. Because the function of a motion sensor is simple, however, it can only tell whether someone is active in the home, and it cannot distinguish a temporary pause in activity due to sleep from one due to fainting.

Furthermore, no prior art comprehensively senses video, sound, and body motion to assess the situation accurately and, when one device fails to detect an emergency, detects it through an alternative device so that an emergency involving a person living alone or an elderly person is not missed.

The present invention was devised to solve the above problems of the prior art. Its object is to provide a multimodal information-based emergency recognition system and method that combine image information, sound information, and motion information from a gravity sensor to determine accurately whether an elderly person or a person living alone has fainted, thereby preventing malfunctions caused by misjudgment and allowing an emergency rescue request to be made promptly.

The above object of the present invention is achieved by a multimodal information-based emergency recognition system comprising: an image processing unit that receives images, detects and tracks the user in the images, recognizes a fainting motion, and reports it to the multimodal integrated recognition unit; a gravity sensor processing unit that senses the user's motion through a gravity sensor attached to the user's body, recognizes a fainting motion, and reports it to the multimodal integrated recognition unit; a sound processing unit that receives sound, distinguishes speech from non-speech, recognizes the spoken words for speech and the type of sound for non-speech, and reports to the multimodal integrated recognition unit when an emergency occurs; a multimodal integrated recognition unit that integrates the information sensed by the image processing unit, the gravity sensor processing unit, and the sound processing unit to determine whether an emergency has occurred, reconfirms the emergency with the user through the voice output unit, and requests an emergency call from the emergency processing unit when an emergency occurs; a voice output unit that receives a signal from the multimodal integrated recognition unit and outputs a voice signal; and an emergency processing unit that receives the emergency call request from the multimodal integrated recognition unit and places an emergency call to a medical institution, a rescue organization, or the like.

Another object of the present invention is achieved by a multimodal information-based emergency recognition method comprising: a first step in which the image processing unit, the gravity sensor processing unit, or the sound processing unit detects a fainting motion of the user and sends an emergency signal to the multimodal integrated recognition unit; a second step in which the multimodal integrated recognition unit, in order to re-evaluate the emergency, processes the image, audio, and gravity information and synchronizes them to the time at which the emergency occurred; a third step of determining, based on the synchronized image, audio, and gravity information, whether a fainting motion occurred; a fourth step of reconfirming the emergency by emitting a voice prompt or a warning sound; and a fifth step of placing an emergency call to the outside if the reconfirmation finally establishes that an emergency exists.
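The second step (time synchronization) can be pictured with a minimal Python sketch. The patent does not say how the streams are aligned; the event tuples, the `synchronize` function, and the 2-second window below are all assumptions made for illustration.

```python
def synchronize(events, window=2.0):
    """Group (timestamp_seconds, modality) events that occur close together.

    An event joins the current group if it falls within `window` seconds
    of that group's first event; otherwise it starts a new group.
    The window length is a hypothetical value, not taken from the patent.
    """
    groups = []
    for ts, modality in sorted(events):
        if groups and ts - groups[-1][0][0] <= window:
            groups[-1].append((ts, modality))
        else:
            groups.append([(ts, modality)])
    return groups


# Example: the video and gravity alarms belong to one incident,
# the later sound event to another.
incidents = synchronize([(10.0, "video"), (30.0, "sound"), (10.5, "gravity")])
```

Each resulting group could then be passed to the third step as one candidate incident.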

Accordingly, the multimodal information-based emergency recognition system and method of the present invention combine image information, audio information, and motion information from a gravity sensor to automatically detect and accurately identify emergencies that can occur in the home at any time, so that a person living alone who is in an emergency is not left unattended and can receive prompt emergency treatment or rescue. In addition, because the three sources of information are judged together, an emergency can still be identified accurately, without being missed, even if one or two of the three sensing devices fail to detect it.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her invention in the best way.

Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea. It should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the overall system according to the present invention. Referring to FIG. 1, the system consists of an image processing unit, a gravity sensor processing unit, a sound processing unit, a voice output unit, a multimodal integrated recognition unit, and an emergency processing unit.

The image processing unit (110) includes an image input unit (111) that receives external images, an image preprocessing unit (112) that removes image noise and compensates for image changes caused by lighting changes, and an image recognition unit (113) that detects and tracks a person in the preprocessed images and recognizes a fainting motion.

The image input unit (111) receives fisheye images in the RGB color model, with a 180° angle of view, from a camera fitted with a fish-eye lens and mounted at the center of the ceiling of a living room or bedroom.

The image preprocessing unit (112) first converts the RGB color model to the LUV color model, separating luminance from color to reduce image changes caused by lighting changes. It then computes the average brightness of the input image and, if this differs from the previous average brightness by more than a threshold, corrects the pixel values so that the average brightness does not change abruptly.
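The brightness-stabilization step can be sketched as follows. This is only an illustration under assumptions: the frame is taken to be a list of rows of luminance (L channel) values, and the multiplicative gain, the `threshold` default, and the function names are hypothetical, since the patent does not specify how the pixel values are corrected.

```python
def mean_brightness(frame):
    """Average of all luminance values in a frame (a list of rows)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def stabilize_brightness(frame, prev_mean, threshold=10.0):
    """Rescale the frame if its mean luminance jumped by more than `threshold`.

    Returns (frame_to_use, its_mean). The gain-based correction and the
    threshold value are assumptions made for this sketch.
    """
    cur_mean = mean_brightness(frame)
    if prev_mean is None or cur_mean == 0 or abs(cur_mean - prev_mean) <= threshold:
        return frame, cur_mean
    gain = prev_mean / cur_mean
    # Clip to the 8-bit range after scaling.
    corrected = [[min(255.0, v * gain) for v in row] for row in frame]
    return corrected, mean_brightness(corrected)
```

Calling it frame by frame, with the returned mean fed back as `prev_mean`, keeps sudden lighting changes (a lamp switched on, for instance) from reaching the background model at full strength.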

The image recognition unit (113) first detects moving objects by finding pixels that differ greatly between the preprocessed input image and the background image. To build the background image, there must be no moving objects in the input image when the system starts operating. Because the background changes continuously with lighting changes and the movement of objects, the background image is updated dynamically using an adaptive background modeling method based on a Gaussian mixture model.
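To make the idea of adaptive background modeling concrete, here is a deliberately simplified sketch. It keeps a single running Gaussian per pixel rather than the full Gaussian mixture the patent names, so it should be read as a reduced stand-in, not the described method. The class name and the parameters `alpha`, `k`, `init_var`, and `min_var` are assumptions.

```python
class RunningGaussianBackground:
    """Per-pixel running mean/variance background model (single Gaussian).

    Simplification: a real Gaussian mixture model keeps several weighted
    Gaussians per pixel; this sketch keeps only one.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=25.0, min_var=4.0):
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[init_var for _ in row] for row in first_frame]
        self.alpha = alpha      # learning rate
        self.k = k              # foreground if |d| > k standard deviations
        self.min_var = min_var  # floor so the variance never collapses to 0

    def apply(self, frame):
        """Return a foreground mask and update the model on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, v in enumerate(row):
                d = v - self.mean[y][x]
                is_fg = d * d > (self.k ** 2) * self.var[y][x]
                mask_row.append(is_fg)
                if not is_fg:
                    self.mean[y][x] += self.alpha * d
                    new_var = self.var[y][x] + self.alpha * (d * d - self.var[y][x])
                    self.var[y][x] = max(self.min_var, new_var)
            mask.append(mask_row)
        return mask
```

Updating only the pixels classified as background is one common design choice; it prevents a person who stands still briefly from being absorbed into the background too quickly.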

When a moving object is detected, its change in size, change in position, and speed of movement are extracted. This information is used to decide whether the moving object is a person and to recognize whether the person is moving, is stationary, or collapses while standing or walking. If a collapse from standing or walking is followed by no movement for a certain time, the situation is judged to be an emergency and the multimodal integrated recognition unit (100) is notified.
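The "collapse followed by stillness" rule can be sketched as a small state machine. The patent does not give the actual decision criteria, so the use of a per-frame speed value as the only feature, and the thresholds `fall_speed`, `still_speed`, and `still_frames`, are hypothetical choices for illustration.

```python
def detect_collapse(speeds, fall_speed=8.0, still_speed=0.5, still_frames=5):
    """Return the frame index at which an emergency is declared, or None.

    speeds: per-frame movement speed of the tracked person (arbitrary units).
    A frame with speed >= fall_speed is treated as a candidate collapse; the
    emergency is declared once `still_frames` consecutive near-motionless
    frames follow it. Any clear movement in between cancels the candidate.
    """
    fell = False
    still = 0
    for i, s in enumerate(speeds):
        if s >= fall_speed:
            fell = True
            still = 0
        elif fell:
            if s <= still_speed:
                still += 1
                if still >= still_frames:
                    return i
            else:
                fell = False  # the person moved again
                still = 0
    return None
```

Requiring the stillness period after the sudden movement is what separates a collapse from, say, quickly sitting down and then getting up again.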

The gravity sensor processing unit (120) consists of a gravity sensor unit (121), which is attached to the waist of the person living alone and senses body movement, and a motion recognition unit (122), which recognizes body motions from the gravity sensor's movement information.

In the gravity sensor processing unit (120), the gravity sensor unit (121), attached to the waist of the user (the person living alone), measures gravity and provides tilt information on the three axes x, y, and z. The motion recognition unit (122) uses the measured tilt information to detect the user's movement and recognize an emergency.

The gravity sensor unit (121) obtains tilt information for the x, y, and z axes from the gravity sensor. To compensate for the sensor's tilted axes, the three-axis tilt information is first converted to the same value, and the degree of tilt on each of the x, y, and z axes is then measured.

The motion recognition unit (122) recognizes motions such as walking, lying down, and fainting from changes in the tilt of each axis. Walking is recognized when the tilt information oscillates in a regular pattern as the user moves. Lying down is recognized when the tilt information shifts and then remains constant. For fainting, when an impact waveform is detected on the three axes of the tilt information and the tilt then remains shifted, the situation is judged to be an emergency and reported to the multimodal integrated recognition unit (100).
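The distinction between lying down (tilt shifts, no impact) and fainting (impact, then tilt stays shifted) can be sketched like this. The sketch assumes accelerometer-style samples in units of g and uses the overall magnitude as the impact test, whereas the patent speaks of an impact waveform on the three axes; `impact_g`, `tilt_deg`, and `hold` are hypothetical thresholds.

```python
import math


def tilt_angle_deg(ax, ay, az):
    """Angle between the sensor's z axis and the measured gravity vector."""
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        return 0.0
    return math.degrees(math.acos(max(-1.0, min(1.0, az / norm))))


def detect_faint(samples, impact_g=2.0, tilt_deg=60.0, hold=3):
    """True if an impact spike is followed by `hold` consecutive tilted samples.

    samples: list of (ax, ay, az) readings in g (an assumed input format).
    """
    impact_seen = False
    tilted = 0
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude >= impact_g:
            impact_seen = True
            tilted = 0
            continue
        if impact_seen and tilt_angle_deg(ax, ay, az) >= tilt_deg:
            tilted += 1
            if tilted >= hold:
                return True
        else:
            tilted = 0
    return False
```

With this rule, a sequence that goes from upright to horizontal without any spike is treated as lying down, while the same change preceded by a spike is reported as a possible faint.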

The sound processing unit (130) includes a sound input unit (131) that receives the voice of the person living alone and ambient sounds; a sound preprocessing unit (132) that removes noise, detects endpoints, and extracts features for discrimination and recognition; a speech discrimination unit (133) that uses the extracted features to distinguish speech from non-speech; and a speech/sound recognition unit (134) that uses the extracted features to recognize the spoken words when the speech discrimination unit (133) classifies the input as speech, and to recognize the type of sound otherwise.

The sound input unit (131) receives sound from a microphone mounted on the ceiling, a wireless microphone worn by the person living alone, and the like, and passes it to the sound preprocessing unit (132). The sound preprocessing unit (132) first removes channel noise, background noise, and the like from the input, and then performs endpoint detection to isolate only the portions that contain sound rather than silence. Frame-by-frame features are extracted from the detected portions for the subsequent speech discrimination, speech recognition, and sound recognition.

The speech discrimination unit (133) uses the extracted features to determine whether the input sound is the voice of the person living alone or a non-speech sound such as music or breaking window glass.

When the speech discrimination unit (133) classifies the input as speech, the speech/sound recognition unit (134) recognizes whether the utterance is a request for emergency handling. If the utterance is an emergency request such as "Please contact the hospital right away," the unit notifies the multimodal integrated recognition unit (100) that an emergency has occurred. If the utterance is ordinary conversation rather than an emergency request, it notifies the multimodal integrated recognition unit (100) that the person's activity is normal.

When the speech discrimination unit (133) classifies the input as non-speech, sounds associated with an emergency, such as a window or a cup breaking, are identified and the multimodal integrated recognition unit (100) is notified that an emergency sound has occurred. For everyday noise, the multimodal integrated recognition unit (100) is notified that the person's activity is normal.

The multimodal integrated recognition unit (100) produces a final result by combining the image recognition result, the current sound recognition state, and the gravity sensor recognition result. If it judges the situation to be an emergency, it outputs a voice prompt through the voice output unit (140) asking the person living alone to confirm whether something is wrong. If the person responds, the sound processing unit (130) recognizes the speech and passes the result to the multimodal integrated recognition unit (100). If there is no response, or if the person indicates an emergency, a message reporting the emergency is sent to the emergency processing unit (150) so that the emergency call system is activated.
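The decision flow described here (combine the three results, reconfirm by voice, then call) can be sketched in a few lines. The OR-style fusion is an assumption chosen to match the stated benefit that an emergency is still caught when one or two sensors miss it; the function names and the reply strings `"ok"` and `"help"` are hypothetical.

```python
def suspect_emergency(video_alarm, gravity_alarm, sound_alarm):
    """Assumed fusion rule: any single modality is enough to raise suspicion."""
    return video_alarm or gravity_alarm or sound_alarm


def decide(video_alarm, gravity_alarm, sound_alarm, ask_user):
    """Return "normal" or "emergency_call".

    ask_user: callable that plays the voice prompt and returns the recognized
    reply: "ok", "help", or None when the person does not answer.
    """
    if not suspect_emergency(video_alarm, gravity_alarm, sound_alarm):
        return "normal"
    reply = ask_user()
    if reply == "ok":
        return "normal"
    # No answer, or the person confirmed the emergency.
    return "emergency_call"


# Example: the camera saw a collapse and nobody answered the prompt.
result = decide(True, False, False, lambda: None)
```

Treating silence the same as an explicit request for help follows the text, which sends the emergency message both when there is no response and when the person reports an emergency.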

The voice output unit (140) receives a voice signal from the multimodal integrated recognition unit (100) and outputs a voice message. The voice message asks the person living alone to confirm whether something is wrong when an abnormality in his or her behavior has been detected. When an emergency is finally confirmed, a rescue request message or warning sound is emitted so that people outside can be alerted.

When an emergency occurs, the emergency processing unit (150) receives the emergency signal from the multimodal integrated recognition unit (100) and contacts a medical institution such as a hospital, an emergency rescue organization, or a guardian. It can also automatically transmit to the rescue organization information entered in advance by the patient or supplied by a medical institution, such as chronic conditions, blood pressure, and pulse, to facilitate the rescue.

FIG. 2 is a flowchart of the image recognition process according to the present invention. Referring to FIG. 2, an image is received through the image input unit and the background is detected (S210). Background detection adaptively creates and maintains a background model for each pixel using a Gaussian mixture model. When the system starts, at least 300 frames of background images are input and compared to build the background model.

After the background model is built, pixels judged to be shadows in the input image are found and removed (S220). In a shadow region the brightness drops while the original color is retained, so the change in brightness and the change in color are used to decide whether a pixel belongs to a shadow.
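A per-pixel shadow test along these lines might look as follows. It assumes pixels are (L, u, v) tuples, consistent with the LUV conversion mentioned earlier; the brightness-ratio bounds and the chroma tolerance are hypothetical values, as the patent gives no thresholds.

```python
import math


def is_shadow(pixel, background, min_ratio=0.4, max_ratio=0.9, max_chroma_diff=10.0):
    """Decide whether a foreground pixel is probably a cast shadow.

    pixel, background: (L, u, v) tuples for the current frame and the
    background model. A shadow is darker than the background (luminance
    ratio within [min_ratio, max_ratio]) but keeps nearly the same
    chromaticity (small distance in the u-v plane).
    """
    lum, u, v = pixel
    bg_lum, bg_u, bg_v = background
    if bg_lum <= 0:
        return False
    ratio = lum / bg_lum
    chroma_diff = math.hypot(u - bg_u, v - bg_v)
    return min_ratio <= ratio <= max_ratio and chroma_diff <= max_chroma_diff
```

Pixels that pass this test would be dropped from the foreground mask before the region-cleaning steps that follow.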

Isolated points are then removed by finding very small regions among the connected regions of foreground pixels (S230). Noise can cause pixels to be classified as foreground even in areas that are not part of a moving object; this step removes such noise-induced regions.
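Removing small connected regions can be illustrated with a breadth-first flood fill. The 4-connectivity and the `min_area` threshold are assumptions made for this sketch; the patent only says that "very small" regions are removed.

```python
from collections import deque


def remove_small_regions(mask, min_area=3):
    """Keep only 4-connected foreground regions with at least `min_area` pixels.

    mask: list of rows of truthy/falsy values. Returns a new boolean mask.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [[False] * width for _ in range(height)]
    seen = [[False] * width for _ in range(height)]
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Collect one connected region starting from (sy, sx).
            region = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = True
    return out
```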

Once the background model exists, when an active object appears in the input image its pixels differ from the background and are classified as foreground pixels. In this step only the foreground pixels are kept and the remaining pixels are discarded.

When an active object is detected in the image, a shape approximation operation is applied to merge regions that have become separated (S240). If part of a moving object has a color similar to the background, that part may be classified as background and the object may split into pieces; applying this operation merges the separated parts again. To capture the movement of the active object, contours are then detected by tracing the boundary points of the connected foreground pixel regions (S250).

Once a contour is detected, the ellipse that best encloses it is found and the contour is mapped to that ellipse, which simplifies the shape of the region (S260). After the active object has been mapped to an ellipse (S260), the person is detected and tracked by tracking the ellipse (S270). Ellipse tracking is performed by extracting changes in the ellipse's size and position and its speed of movement.
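One simple way to summarize a set of contour points as an ellipse is through their second-order moments, sketched below. The patent does not say which fitting method is used, so this moment-based approach, the `fit_ellipse` name, and the choice of two standard deviations for the semi-axes are assumptions for illustration.

```python
import math


def fit_ellipse(points):
    """Summarize 2-D points as (center, semi_major, semi_minor, angle_radians).

    The axes are the principal directions of the point covariance, and each
    semi-axis length is two standard deviations along that direction.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + spread
    lam_minor = max(0.0, half_trace - spread)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), 2 * math.sqrt(lam_major), 2 * math.sqrt(lam_minor), angle
```

Tracking then reduces to comparing the center, axis lengths, and orientation from frame to frame, which yields the size, position, and speed changes the text refers to.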

From this information, the ellipse corresponding to a person is detected and tracked to determine whether a person is present in the image, whether the person is moving or stationary, and whether the movement resembles a fainting motion. In addition, zones such as the toilet, the main bedroom, and the front door are defined in the input image to determine changes in the person's location.

The movement of the person living alone is evaluated (S280). If nothing is wrong, detection and tracking of the person continue (S270). If an emergency occurs, it is reported to the multimodal integrated recognition unit, which is instructed to place an emergency call (S290).

FIG. 3 is a flowchart of the recognition process in the gravity sensor processing unit according to the present invention. Referring to FIG. 3, the voltage (tilt information) of each of the three axes x, y, and z is measured (S310). A microprocessor measures the skew of each axis in the measured x, y, and z tilt and corrects it (S320), so that tilt can be measured even if the sensor starts at an arbitrary tilt. The x, y, and z tilts are then detected relative to the corrected tilt of each axis (S330). By processing the successively detected tilts, movement is recognized and the tilt change and impact information for each axis are extracted (S340).
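The correction in S320 can be pictured as a baseline calibration: readings taken while the wearer is at rest define a per-axis offset that is subtracted from later readings. This is only one plausible reading of the step; the function names and the averaging approach are assumptions.

```python
def calibrate(rest_samples):
    """Per-axis offset: the average of (x, y, z) readings taken at rest."""
    n = len(rest_samples)
    return tuple(sum(sample[i] for sample in rest_samples) / n for i in range(3))


def corrected(sample, offset):
    """Tilt reading relative to the calibrated starting orientation."""
    return tuple(sample[i] - offset[i] for i in range(3))


# Example: two rest readings define the baseline; a later reading is
# then expressed as a deviation from that baseline.
offset = calibrate([(1.0, 2.0, 3.0), (3.0, 2.0, 1.0)])
delta = corrected((2.5, 2.0, 1.0), offset)
```

After this step, the same thresholds can be applied regardless of how the sensor happened to be oriented when it was put on.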

From this information it is determined whether the user is currently active, stationary, moving, or lying down. When the user falls or bumps into something, impact information is detected on each axis by the gravity sensor and is used to judge whether the motion resembles fainting (S350). If the situation is judged to be an emergency, the multimodal integrated recognition unit is notified that an emergency has occurred (S360).

FIG. 4 is a flowchart of the speech/sound recognition process according to the present invention. Referring to FIG. 4, channel noise, background noise, and the like are removed from the sound received through the sound input unit (S410), and endpoint detection is performed to find the start and end points of the portions that contain sound rather than silence (S420).
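Endpoint detection (S420) is often done with a short-time energy threshold, and a minimal sketch of that idea follows. The patent does not name its endpoint detector, so the energy criterion, `frame_len`, and `energy_threshold` are assumptions.

```python
def detect_endpoints(samples, frame_len=4, energy_threshold=0.01):
    """Return (start, end) sample indices of the non-silent span, or None.

    The signal is cut into non-overlapping frames; a frame is "active" when
    its mean squared amplitude exceeds `energy_threshold`. `end` is exclusive.
    """
    active_starts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(v * v for v in frame) / frame_len
        if energy > energy_threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    return active_starts[0], active_starts[-1] + frame_len
```

Only the samples between the returned indices would be passed on to feature extraction.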

Frame-by-frame features are extracted from the detected sound for speech discrimination, speech recognition, and sound recognition (S430). Mel-frequency cepstral coefficients (MFCC) are used as the features for speech recognition and sound classification.

MFCCs are computed by taking the Fourier transform of the audio signal in the analysis window to obtain its spectrum, applying a bank of triangular filters spaced on the mel scale to the spectrum and summing the magnitudes in each band, taking the logarithm of the filter bank outputs, and then applying the discrete cosine transform. The resulting feature vector is widely used in speech recognition and is known to represent the characteristics of human hearing on a spectral basis.
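The four stages named above (Fourier transform, mel-spaced triangular filter bank, logarithm, discrete cosine transform) can be traced in a compact pure-Python sketch. It is written for readability, not speed, and the sample rate, filter count, coefficient count, and the common 2595/700 mel formula are assumptions rather than values from the patent.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0..N/2 (direct O(N^2) evaluation)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(n_fft * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=8000, n_filters=10, n_coeffs=5):
    spectrum = magnitude_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_energies = [math.log(sum(w * s for w, s in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    # DCT-II of the log filter bank outputs.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]
```

In practice an FFT routine and a windowing function would replace the direct DFT, but the order of operations is the same.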

For speech/non-speech discrimination, feature vectors such as modulation energy (ME), cepstral flux (CF), and mel-frequency cepstrum modulation energy (MCME) are computed. Whereas MFCCs reflect the acoustic characteristics of the audio signal at a given instant, the other feature vectors are used to capture how the acoustic characteristics change over a longer time span.

Modulation energy is a feature vector that indicates the degree to which the spectrum changes over time; it is obtained by applying a Fourier transform to the sequence of filter bank output vectors, which reflect the spectrum. Cepstral flux indicates the amount by which the cepstrum changes over time; it is obtained by computing the cepstral distance (the squared difference of the cepstral components) over several frames, from adjacent frames to frames somewhat farther apart, and averaging these distances.

Mel-frequency cepstrum modulation energy performs the Fourier transform in the cepstral domain, whereas modulation energy performs it on the spectrum. The spectrum of an audio signal is sensitive to fine spectral changes caused by pitch variation, such as pitch harmonic components, so its discrimination performance is lower than that of cepstrum-based methods. Therefore, measuring the change over time by applying the Fourier transform to the cepstrum, whose components are less mutually correlated than those of the spectrum, yields more reliable discrimination performance.
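The temporal features described above can be sketched as follows. The description gives no exact formulas, so several details are assumptions: the function names, the choice to sum squared DFT magnitudes over the non-zero modulation frequencies, and the choice to measure cepstral distances from frame `t` to the next `max_lag` frames.

```python
import cmath
import math
from typing import List, Sequence


def modulation_energy(frames: Sequence[Sequence[float]]) -> List[float]:
    """Energy of the temporal variation of each component.

    `frames[t][d]` is component d at frame t. A DFT is taken along time
    for each component, and the squared magnitudes of the non-zero
    modulation frequencies are summed (assumed definition).
    """
    n_frames = len(frames)
    energies = []
    for d in range(len(frames[0])):
        trajectory = [frame[d] for frame in frames]
        total = 0.0
        for k in range(1, n_frames // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n_frames)
                        for t, x in enumerate(trajectory))
            total += abs(coeff) ** 2
        energies.append(total)
    return energies


def cepstral_flux(cepstra: Sequence[Sequence[float]], t: int,
                  max_lag: int = 3) -> float:
    """Mean squared cepstral distance from frame t to frames t+1 .. t+max_lag."""
    distances = []
    for lag in range(1, max_lag + 1):
        if t + lag >= len(cepstra):
            break
        distances.append(sum((a - b) ** 2
                             for a, b in zip(cepstra[t], cepstra[t + lag])))
    return sum(distances) / len(distances) if distances else 0.0
```

Passing filter bank outputs to `modulation_energy` corresponds to the spectrum-based feature, while passing a sequence of cepstral vectors corresponds to the cepstral-domain variant described above.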

Using the extracted speech/non-speech discrimination features, it can be determined whether the input sound is speech or a non-speech sound (S440).

If the result of the discrimination (S440) is speech, speech recognition is performed (S450). If the speech of the person living alone matches a stored emergency message such as "Call the hospital quickly," the situation is judged to be an emergency (S470) and the multimodal integrated recognition unit is notified that an emergency has occurred (S480). If the speech is not an emergency request but belongs to a normal situation, sound continues to be received through the sound input unit (S410) and processed. If the result of the discrimination (S440) is non-speech, sound recognition is performed (S460). If the sound is associated with an emergency, such as a window breaking or a cup breaking (S470), the multimodal integrated recognition unit is notified that an emergency has occurred (S480). If it is an ordinary sound, the above process is repeated (S410).
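The branching of steps S440 to S480 can be summarized in a short sketch. It assumes that the speech recognizer and the sound classifier have already produced a text label; the set names, the label strings, and the exact-match rule are hypothetical and stand in for whatever stored messages and sound classes an actual system would use.

```python
# Hypothetical stored emergency messages and emergency-related sound classes.
EMERGENCY_PHRASES = {"call the hospital quickly"}
EMERGENCY_SOUNDS = {"window_breaking", "cup_breaking"}


def is_audio_emergency(is_speech: bool, label: str) -> bool:
    """Return True when the multimodal unit should be notified (S480).

    `is_speech` is the outcome of the speech/non-speech decision (S440);
    `label` is the recognized phrase (S450) or the sound class (S460).
    """
    if is_speech:
        # S450 -> S470: compare the recognized phrase with stored messages.
        return label.strip().lower() in EMERGENCY_PHRASES
    # S460 -> S470: compare the sound class with emergency-related sounds.
    return label in EMERGENCY_SOUNDS
```

When the function returns False, the flow returns to step S410 and the next segment of input sound is processed.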

FIG. 5 is a flowchart illustrating the processing performed by the multimodal integrated recognition unit according to the present invention. Referring to FIG. 5, the multimodal integrated recognition unit receives notification that an emergency has occurred from one or more of the image processing unit, the gravity sensor processing unit, and the sound processing unit (S510) and stores it in a database. To re-evaluate the emergency, the multimodal integrated recognition unit performs preprocessing that processes the image, audio, and gravity information and synchronizes the time at which the emergency occurred (S520).

Based on the image, audio, and gravity information synchronized to the same time, it is determined whether the motion of the person living alone is an actual fainting motion or a normal motion (S530). To raise the fainting recognition rate, state information such as the person's position, movement, and posture is extracted from the image information; motion information indicating whether the person is moving or motionless is extracted from the gravity information; and state information such as speech and noise is extracted from the audio. If the motion is determined to be a fainting motion, the emergency is reconfirmed through the voice output unit using a voice message or warning sound that can draw the person's attention (S540).

If there is no response from the person living alone for a certain period after the warning, or if a signal confirming the emergency is received from the sound processing unit, the emergency situation processing unit is called immediately to report the emergency (S550).
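The flow of steps S510 to S550 can be sketched as follows. The description does not state how the synchronized information is combined, so the time window, the requirement that at least `min_sources` processing units agree, the timeout value, and all names in the code are assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Event:
    source: str       # "image", "gravity", or "sound"
    timestamp: float  # time of the reported emergency, in seconds


def synchronized_sources(events: List[Event], reference_time: float,
                         window: float = 2.0) -> Set[str]:
    """S520: keep the units whose reports fall within the same time window."""
    return {e.source for e in events
            if abs(e.timestamp - reference_time) <= window}


def should_call(events: List[Event], reference_time: float,
                response_delay: Optional[float], sound_confirmed: bool,
                min_sources: int = 2, timeout: float = 10.0) -> bool:
    """Decide whether to call the emergency situation processing unit (S550).

    `response_delay` is the time the user took to answer the warning of
    S540, or None if there was no answer. `sound_confirmed` is True when
    the sound processing unit confirms the emergency.
    """
    # S530: assumed rule, fainting is accepted when enough units agree.
    if len(synchronized_sources(events, reference_time)) < min_sources:
        return False
    # S540/S550: call when the user stays silent or the sound unit confirms.
    no_response = response_delay is None or response_delay > timeout
    return no_response or sound_confirmed
```

For example, reports from the image and gravity units half a second apart, followed by no answer to the warning, would lead to an emergency call under these assumed settings.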

Although the present invention has been shown and described above with reference to preferred embodiments, it is not limited to those embodiments, and various changes and modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

The present invention can be used in, for example, an emergency call system that combines image information, audio information, and motion information to accurately determine emergencies, including fainting, and to place an emergency call quickly.

FIG. 1 is a block diagram of the overall system according to the present invention;

FIG. 2 is a flowchart illustrating the image recognition process according to the present invention;

FIG. 3 is a flowchart illustrating the recognition process of the gravity sensor processing unit according to the present invention;

FIG. 4 is a flowchart illustrating the voice/sound recognition process according to the present invention; and

FIG. 5 is a flowchart illustrating the processing performed by the multimodal integrated recognition unit according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

100: multimodal integrated recognition unit    110: image processing unit

120: gravity sensor processing unit    130: sound processing unit

140: voice output unit    150: emergency situation processing unit

Claims

A multimodal information-based emergency situation recognition system comprising:

an image processing unit that receives an image, detects and tracks a user in the image, recognizes a fainting motion, and reports it to a multimodal integrated recognition unit;

a gravity sensor processing unit that senses the user's motion through a gravity sensor attached to the user's body, recognizes a fainting motion, and reports it to the multimodal integrated recognition unit;

a sound processing unit that receives sound, distinguishes speech from non-speech, recognizes the spoken words for speech and the type of sound for non-speech, and, when an emergency is recognized, reports it to the multimodal integrated recognition unit; and

the multimodal integrated recognition unit, which integrates the information detected by one or more of the image processing unit, the gravity sensor processing unit, and the sound processing unit to determine whether an emergency exists, reconfirms the emergency with the user through a voice output unit, and, when an emergency has occurred, transmits an emergency call request signal to an emergency situation processing unit.

The system of claim 1, further comprising:

a voice output unit that receives a signal from the multimodal integrated recognition unit and outputs a voice signal; and

an emergency situation processing unit that receives the emergency call request signal from the multimodal integrated recognition unit and transmits an emergency call signal to an external organization.

The system of claim 1, wherein the image processing unit

detects fainting and recognizes an emergency by mapping a moving person in a fisheye image, acquired using a fisheye lens, to an ellipse and measuring changes in the size and position of the ellipse and its speed of movement.

The system of claim 1, wherein the gravity sensor processing unit

measures three-axis (x, y, z) information with the gravity sensor attached to the user and detects the user's posture and fainting motion from the tilt of each axis as the user moves.

The system of claim 1, wherein the sound processing unit

recognizes an emergency from sound by distinguishing speech from non-speech and recognizing the spoken words for speech and the type of sound for non-speech.

A multimodal information-based emergency situation recognition method comprising:

a first step in which at least one of an image processing unit, a gravity sensor processing unit, and a sound processing unit detects a fainting motion of a user and transmits an emergency signal to a multimodal integrated recognition unit;

a second step of processing the image, audio, and gravity information and synchronizing the time at which the emergency occurred in order to re-evaluate the emergency;

a third step of determining, based on the synchronized image, audio, and gravity information, whether a fainting motion has occurred;

a fourth step of reconfirming the emergency by generating a voice message or warning sound; and

a fifth step of placing an emergency call to an outside party when the emergency is finally confirmed through the reconfirmation.