KR20110125431A

KR20110125431A - Method and apparatus for generating life log in portable terminal

Info

Publication number: KR20110125431A
Application number: KR1020100044953A
Authority: KR
Inventors: 정명기; 고한석; 김기현; 윤종성; 최우현
Original assignee: 삼성전자주식회사; 고려대학교 산학협력단
Priority date: 2010-05-13
Filing date: 2010-05-13
Publication date: 2011-11-21
Also published as: KR101660306B1

Abstract

PURPOSE: A method and apparatus for generating a life log in a portable terminal are provided to generate the life log from the user's situation. CONSTITUTION: A sound environment recognition unit (212) recognizes, from among preset sound environments, the sound environment corresponding to a sound signal input from a microphone. An image environment recognition unit (214) recognizes, from among preset image environments, the image environment corresponding to an image signal input from a camera. A situation determination unit (220) determines, from among preset situation models, the user situation corresponding to the sound environment recognition result and the image environment recognition result, and records the determined user situation as a life log.

Description

METHOD AND APPARATUS FOR GENERATING LIFE LOG IN PORTABLE TERMINAL

The present invention relates to a method and apparatus for generating a life log in a portable terminal, and more particularly to a method and apparatus for determining a user's situation using a microphone and a camera and generating a life log from it.

Recently, as portable terminals have advanced and come into wider use, life log services have emerged that record information obtainable in a user's daily life so that it can be searched when needed. That is, a life log service acquires and records information from the user's daily life through various sensors such as a camera, a Global Positioning System (GPS) receiver, an illuminance sensor, a geomagnetic sensor, and a thermometer, so that the recorded information can be used later.

Conventional life log services mount a camera aligned with the user's line of sight and other wearable devices on the user's body, collect various data about the user's surroundings such as time, location, brightness, and gaze-direction video, and transmit the data to a web server for storage, so that the user can review his or her life log through a web service.

As described above, a conventional life log service requires a large number of sensors to be mounted on the user's body in order to obtain various information about the user. However, wearing many sensors is unnatural in daily life and may provoke resistance from the user, which makes such services difficult to commercialize. In addition, conventional life log techniques simply transmit the collected information (for example, time, location, brightness, and gaze-direction video) to a web server and store it in raw form, so they cannot represent a comprehensive situation such as the user's activity or the environment the user is in.

The present invention has been made to solve the problems described above, and an object of the present invention is to provide a method and apparatus for generating a life log in a portable terminal.

Another object of the present invention is to provide a method and apparatus for determining a user's situation using the sound and images input from a microphone and a camera in a portable terminal, and generating a life log from it.

Still another object of the present invention is to provide a scheduling method and apparatus for processing and recognizing the sound and images input from a microphone and a camera in a portable terminal.

According to a first aspect of the present invention for achieving the above objects, a method for generating a life log in a portable terminal includes: recognizing, from among preset sound environments, the sound environment corresponding to a sound signal input from a microphone; recognizing, from among preset image environments, the image environment corresponding to an image signal input from a camera; determining, from among preset situation models, the user situation corresponding to the sound environment recognition result and the image environment recognition result; and recording the determined user situation as a life log.

According to a second aspect of the present invention for achieving the above objects, an apparatus for generating a life log in a portable terminal includes: a microphone that receives a sound signal; a camera that receives an image signal; a sound environment recognition unit that recognizes, from among preset sound environments, the sound environment corresponding to the sound signal input from the microphone; an image environment recognition unit that recognizes, from among preset image environments, the image environment corresponding to the image signal input from the camera; and a situation determination unit that determines, from among preset situation models, the user situation corresponding to the sound environment recognition result and the image environment recognition result, and records the determined user situation as a life log.

The present invention determines the user's situation using the sound and images input from the microphone and camera of a portable terminal and generates a life log from it. As a result, the user can generate a life log in daily life without purchasing or carrying a separate device, and the user's life pattern can be understood from the life log alone. In addition, processing and recognizing the images and sound according to the scheduling scheme proposed in the present invention reduces the time spent on that processing and recognition.

FIG. 1 is a diagram illustrating an example of determining a user situation from the user's surroundings according to an embodiment of the present invention;
FIG. 2 is a block diagram of a portable terminal according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an image signal analysis procedure in a portable terminal according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an example of limiting the analysis region according to the image signal analysis procedure in a portable terminal according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a scheduling technique for image and sound processing in a portable terminal according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a situation statistical model of a portable terminal according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an example of determining a user situation from image and sound data and the situation statistical model in a portable terminal according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a life log storage scheme in a portable terminal according to an embodiment of the present invention; and
FIG. 9 is a diagram illustrating a procedure for generating and storing a life log in a portable terminal according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

The following describes a technique for determining a user's situation using the sound and images input from the microphone and camera of a portable terminal, and for generating a life log from it.

The portable terminal according to the present invention recognizes the surrounding environment from the sound input from the microphone and the images input from the camera, and determines the user's situation from the recognized results. For example, as shown in FIG. 1, the portable terminal analyzes the sound input from the microphone to recognize a car sound (100), a music sound (102), and a babble sound (104), analyzes the images input from the camera to recognize that they show a person walking (106), and then combines the car sound (100), music sound (102), babble sound (104), and walking images (106) to determine that the user is moving along a street (110).

FIG. 2 shows a block diagram of a portable terminal according to an embodiment of the present invention. As shown in FIG. 2, the portable terminal includes an input unit 200, a recognition result accumulation unit 210, a situation determination unit 220, a situation statistical model 230, and a storage unit 240. The input unit 200 includes a microphone 202 and a camera 204, and the recognition result accumulation unit 210 includes a sound environment recognition unit 212 and an image environment recognition unit 214.

First, the input unit 200 receives the sound and image signals needed to determine the user's situation and provides them to the recognition result accumulation unit 210. That is, the input unit 200 receives various sound signals around the user through the microphone 202 and image signals around the user through the camera sensor of the camera 204. The input unit 200 is activated at every preset period, receives the sound and image signals for a predetermined time and provides them to the recognition result accumulation unit 210, and, when the predetermined time expires, returns to the inactive state until the next period.

The recognition result accumulation unit 210 analyzes the sound and image signals input from the input unit 200 during an accumulation time, one preset short interval at a time (for example, an interval of 3 seconds or less), to recognize the sound environment and the image environment. It accumulates the results recognized during the accumulation time and provides them to the situation determination unit 220. Specifically, the recognition result accumulation unit 210 recognizes the sound environment by analyzing the sound signal through the sound environment recognition unit 212, and recognizes the image environment by analyzing the image signal through the image environment recognition unit 214.

The sound environment recognition unit 212 refers to pre-stored Gaussian Mixture Models (GMMs) of sound environments to find and recognize the sound environment most similar to the input sound signal. Here, the Gaussian mixture model for each sound environment represents the energy features of that environment. The sound environments may be divided into, for example, babble (murmuring crowd noise), car sounds, music, noise in an enclosed space such as the inside of a bag, office noise, human speech, subway noise, the noise of a quiet public place, running water, and loud sound (or loud noise). That is, the sound environment recognition unit 212 extracts Mel Frequency Cepstral Coefficients (MFCCs), an energy feature of the sound signal input from the microphone 202, and then searches the sound environments represented by the Gaussian mixture models for the one that has the maximum likelihood for the extracted feature.
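The maximum-likelihood search described above can be sketched as follows. This is only an illustration under stated assumptions: the models are diagonal-covariance GMMs, the features are toy 2-dimensional vectors standing in for MFCC frames, and the function names and model values are hypothetical rather than taken from the patent.

```python
import math

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) components (assumed layout).
    """
    total = 0.0
    for x in frames:
        logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, mu, var in zip(x, means, variances):
                log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
            logs.append(log_p)
        # log-sum-exp over components for numerical stability
        peak = max(logs)
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total

def recognize_sound_environment(frames, models):
    """Return the label of the model with the maximum likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical toy models; real models would be trained on MFCC data.
MODELS = {
    "car": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "music": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
```

Calling `recognize_sound_environment(frames, MODELS)` with frames extracted from one short interval yields a single sound environment label for that interval.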

When the sound signal contains a sound louder than a threshold, it is difficult to distinguish other voices, noises, or sounds. The sound environment recognition unit 212 may therefore first determine whether the input sound signal contains a loud sound. If it does, the unit classifies the sound signal as a loud-sound environment; if it does not, the unit extracts the energy feature from the sound signal and searches for the sound environment.
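A minimal sketch of this gating step is shown below. The patent does not say how loudness is measured, so the root-mean-square level, the threshold value, and the `classify_features` callback are assumptions made for illustration.

```python
import math

def rms_level(samples):
    """Root-mean-square level of a block of audio samples (assumed loudness measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def classify_with_loud_gate(samples, loud_threshold, classify_features):
    """Label the block as "loud" when it exceeds the threshold.

    Otherwise defer to classify_features, a hypothetical callback that
    performs the feature-based sound environment search.
    """
    if rms_level(samples) >= loud_threshold:
        return "loud"
    return classify_features(samples)
```

The gate is cheap compared with feature extraction, so loud blocks skip the more expensive model search entirely.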

The image environment recognition unit 214 recognizes the image environment by analyzing the input image signal according to preset criteria. That is, it examines each frame of the input image signal for low light, movement, the presence of a face, and whether the scene is indoors or outdoors. The image environment recognition unit 214 measures the light intensity (or brightness) of the image frame to determine whether it is low-light, uses motion vectors to determine whether the terminal is moving, detects skin color to determine whether a face is present, and uses color and texture features to determine whether the scene is indoors or outdoors.
For example, the image environment recognition unit 214 may measure the gray scale, which represents the brightness of an input image frame, and compare the average gray scale with a threshold to determine whether the frame is low-light. Alternatively, it may divide the input image frame into a plurality of sub-blocks, measure the average gray scale of each sub-block, and determine whether the frame is low-light from the number of sub-blocks whose gray scale is at or below the threshold. The image environment recognition unit 214 may also determine whether the image frame is moving by extracting motion vectors from a plurality of image frames using the existing 'Lucas-Kanade algorithm' or 'Pyramid algorithm'. To determine whether the frame contains a face, it may detect a skin-colored region in the frame and then check whether the detected region satisfies preset maximum and minimum size conditions, a width-to-height ratio condition, and a condition on the ratio of the detected region to the remaining region. Finally, it may divide the image frame into a plurality of sub-blocks, extract the color and texture features of each sub-block, and compare them with preset thresholds to determine whether the frame is an indoor or an outdoor image.
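The two low-light tests described above can be sketched as follows. The frame is assumed to be a list of rows of 8-bit gray values, and the threshold, block size, and dark-block ratio are hypothetical values chosen for illustration.

```python
def mean_gray(frame):
    """Average gray value of a frame given as a list of pixel rows."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def is_low_light_global(frame, threshold=40):
    """Whole-frame test: average gray scale at or below the threshold."""
    return mean_gray(frame) <= threshold

def is_low_light_blocks(frame, block=2, threshold=40, min_dark_ratio=0.5):
    """Sub-block test: enough blocks have an average gray scale at or below the threshold."""
    height, width = len(frame), len(frame[0])
    dark = total = 0
    for top in range(0, height, block):
        for left in range(0, width, block):
            sub = [row[left:left + block] for row in frame[top:top + block]]
            total += 1
            if mean_gray(sub) <= threshold:
                dark += 1
    return dark / total >= min_dark_ratio
```

The sub-block variant reacts to large dark areas even when a few bright regions raise the overall average, which is one plausible reason to count blocks rather than use a single mean.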

In particular, to process the input image signal in real time, the image environment recognition unit 214 may analyze each image frame in the order shown in FIG. 3. First, it determines whether the image frame is low-light (301). If the frame is low-light, it skips the movement, face, and indoor/outdoor checks, because a low-light frame carries too little information for image recognition. If the frame is not low-light, it determines whether a face is present (303) and, if a face is present, determines whether the scene is indoors or outdoors (305). When a face is present in the frame, the movement check may be omitted, because a frame in which a face is detected may not carry enough information to detect movement. If no face is present in the frame, the image environment recognition unit 214 determines whether the terminal is moving (307) and then determines whether the scene is indoors or outdoors (305). Analyzing frames in this way skips unnecessary steps and raises the image processing speed, so that the input images can be processed in real time.
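The decision order of FIG. 3 can be sketched as a short function. The four detector callbacks and the result dictionary are hypothetical; only the ordering and the skipped checks follow the description above.

```python
def analyze_frame(frame, is_low_light, has_face, is_moving, is_outdoor):
    """Run the checks in the FIG. 3 order; skipped checks stay None."""
    result = {"low_light": is_low_light(frame), "face": None,
              "moving": None, "outdoor": None}
    if result["low_light"]:
        return result  # too little information for the remaining checks
    result["face"] = has_face(frame)
    if not result["face"]:
        # movement is only checked when no face was detected
        result["moving"] = is_moving(frame)
    result["outdoor"] = is_outdoor(frame)
    return result
```

Leaving skipped checks as `None` rather than `False` keeps "not evaluated" distinct from "evaluated and negative" when the results are accumulated later.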

In addition, to process the input image signal in real time, the image environment recognition unit 214 may analyze each image frame over a restricted analysis region, as shown in FIG. 4. That is, it determines whether the frame is low-light and checks for a face over the entire frame. If a face is present, it restricts the analysis region to the area that remains after excluding the face region (410), and determines indoor/outdoor status and movement within that region. Restricting the analysis region in this way reduces the amount of computation needed for image processing, so that the input images can be processed in real time.
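A minimal sketch of restricting the analysis region is given below. It assumes the face region is an axis-aligned bounding box, which the patent does not specify, and the function name and box layout are hypothetical.

```python
def analysis_pixels(frame, face_box=None):
    """Return the gray values used for indoor/outdoor and movement analysis.

    face_box is (top, left, bottom, right) with exclusive bottom/right,
    or None when no face was detected (assumed representation).
    """
    kept = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if face_box is not None:
                top, left, bottom, right = face_box
                if top <= y < bottom and left <= x < right:
                    continue  # skip pixels inside the face region
            kept.append(value)
    return kept
```

With a 3x3 frame and a 2x2 face box in the top-left corner, only five of the nine pixels remain for the later checks.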

Here, as shown in FIG. 5, the sound environment recognition unit 212 and the image environment recognition unit 214 may process the sound signal and the image signal at different points in time. To output the sound and image recognition results simultaneously once a predetermined time has elapsed after input of the sound and image signals begins, the two units operate with regard to the time and data needed to determine the sound environment and the time and data needed to determine the image environment. FIG. 5 shows when the sound environment recognition unit 212 and the image environment recognition unit 214 operate so that the sound environment determination result and the image environment determination result are output simultaneously 3 seconds after the sound and image signals are input. As shown in FIG. 5, the sound environment recognition unit 212 recognizes the sound environment by processing the sound signal at sound time points that repeat periodically during the 3 seconds, and the image environment recognition unit 214 processes the image signal at image time points that repeat periodically during the same 3 seconds. The sound time points and the image time points do not overlap. To determine the image environment within the 3 seconds, the image environment recognition unit 214 may first determine whether the frame is low-light, then check for movement, and then determine whether a face is present and whether the scene is indoors or outdoors. Because the remaining checks (movement, face presence, indoor/outdoor) need not be performed when the image frame is low-light, it is important to perform the low-light check first.
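One way to picture the non-overlapping time points is a schedule that alternates sound and image slots inside the 3-second window. The slot lengths below are hypothetical, since the patent gives only the 3-second window and the requirement that the two kinds of slots do not overlap.

```python
def build_schedule(window_ms=3000, sound_ms=100, image_ms=200):
    """Alternate sound and image processing slots inside one window.

    Returns (kind, start_ms, end_ms) tuples. Slot lengths are assumed values.
    """
    schedule = []
    start = 0
    while start + sound_ms + image_ms <= window_ms:
        schedule.append(("sound", start, start + sound_ms))
        schedule.append(("image", start + sound_ms, start + sound_ms + image_ms))
        start += sound_ms + image_ms
    return schedule
```

Because each slot starts exactly where the previous one ends, a single processing thread can serve both recognizers and still have both results ready at the end of the window.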

The situation statistical model 230 holds statistical models of the sound and image signals for a plurality of predefined situations. For example, based on the spaces in which people live and move in daily life, the situations may be divided into office, restaurant/café, watching a sporting event, shopping mall, car/bus, lecture room, street, large discount store, subway, outdoors, and home. The statistical model for each situation is obtained by collecting sound and image signals for a long time (several hours) in that situation, dividing the collected signals into short intervals (for example, 3 seconds), performing sound environment recognition and image environment recognition on each interval, and accumulating the recognition results. As shown in FIG. 6, the statistical model for each situation may be represented as a two-dimensional (sound environment by image environment) histogram built from the accumulated recognition results, and each model reflects the characteristics of its situation. For example, the model for a large discount store shows high values for indoor/babble and moving/babble, and the model for a subway shows high values for indoor/subway noise and indoor/loud noise.
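Building such a two-dimensional histogram from accumulated recognition results can be sketched as follows. Normalizing the counts to relative frequencies is an assumption, as are the label strings in the usage note.

```python
from collections import Counter

def build_histogram(observations):
    """Turn (sound_env, image_env) pairs into a normalized 2-D histogram.

    Each pair is the recognition result for one short interval.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}
```

For instance, four intervals recognized as `("babble", "indoor")` twice, `("babble", "moving")` once, and `("loud", "indoor")` once give bin values of 0.5, 0.25, and 0.25.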

When the accumulated result is provided from the recognition result accumulation unit 210, the situation determination unit 220 compares the probabilistic distance between the accumulated result and each of the per-situation models stored in the situation statistical model 230, and determines the situation whose model is closest as the user situation. For example, as shown in FIG. 7, the situation determination unit 220 may calculate the probabilistic distance between a histogram of the sound and image environment recognition results accumulated over 5 minutes and the histograms of per-situation models such as large discount store, subway, and street stored in the situation statistical model 230, and determine the situation whose model has the shortest distance as the user's situation.
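The nearest-model decision can be sketched as below. The patent does not name the probabilistic distance, so the Bhattacharyya distance is used here purely as an assumed example; histograms are dictionaries mapping (sound environment, image environment) pairs to relative frequencies.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalized histograms (assumed metric)."""
    keys = set(p) | set(q)
    overlap = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    return math.inf if overlap == 0 else -math.log(overlap)

def determine_situation(observed, models):
    """Return the situation whose model histogram is closest to the observation."""
    return min(models, key=lambda name: bhattacharyya_distance(observed, models[name]))
```

Any other divergence between histograms could be substituted for `bhattacharyya_distance` without changing `determine_situation`.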

Once the user situation has been determined, the situation determination unit 220 maps a time index to the determined user situation information and stores it in the storage unit 240, as shown in FIG. 8. The time index is the point in time at which the user situation was determined, and may include date information and time information. The determined user situation is stored together with the time index so that the user can later search past situations easily, that is, search them by date and by time.
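Storing situations with a time index and searching them by date and hour can be sketched as follows. The entry layout, string formats, and function names are hypothetical and are not the storage format of FIG. 8.

```python
from datetime import datetime

def record_situation(life_log, situation, when):
    """Append one entry that maps a date and time index to a situation."""
    life_log.append({
        "date": when.strftime("%Y-%m-%d"),
        "time": when.strftime("%H:%M"),
        "situation": situation,
    })

def search(life_log, date, hour=None):
    """Return situations recorded on a date, optionally limited to one hour."""
    return [
        entry["situation"] for entry in life_log
        if entry["date"] == date
        and (hour is None or entry["time"].startswith(f"{hour:02d}:"))
    ]
```

Keeping the date and time as separate fields makes both kinds of search a simple equality or prefix comparison.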

The storage unit 240 stores the programs and data needed to generate a life log in the portable terminal and, in particular, stores the user situation information mapped to a time index under the control of the situation determination unit 220. The user situation information mapped to a time index may be referred to as the user's life log.

FIG. 9 shows a procedure for generating and storing a life log in a portable terminal according to an embodiment of the present invention.

Referring to FIG. 9, in step 901 the terminal receives various sound and image signals around the user through the microphone 202 and the camera 204.

Then, in step 903, the terminal separates the input sound signal from the input image signal for processing, proceeding to step 905 for a sound signal and to step 911 for an image signal.

In step 905, the terminal checks whether the sound signal input during the preset short interval contains a sound louder than a threshold. If it does, the terminal proceeds to step 909 and recognizes that the sound environment corresponding to the sound signal is a loud-sound environment.

If the sound signal does not contain a sound louder than the threshold, the terminal proceeds to step 907 to extract the energy feature of the sound signal, and then proceeds to step 909 to recognize the sound environment according to the extracted energy feature. That is, the terminal searches the pre-stored Gaussian mixture models of sound environments for the sound environment that has the maximum likelihood for the extracted energy feature, and recognizes it. Here, the Gaussian mixture model for each sound environment represents the energy features of that environment, and the sound environments may be divided into, for example, babble, car sounds, music, noise in an enclosed space such as the inside of a bag, office noise, human speech, subway noise, the noise of a quiet public place, running water, and loud sound.

In this example the terminal first checks for a loud sound and extracts features of the sound signal to recognize the sound environment only when none is present; however, the loud sound check may be omitted and the features of the sound signal extracted directly.

Meanwhile, in step 911, the terminal measures the intensity of the image frame corresponding to the image signal input during the preset short interval to check whether the frame is low-light. If it is, the terminal proceeds to step 915 and recognizes the environment of the image signal as a low-light environment. If the image frame is not low-light, the terminal detects, in step 913, the motion vector, skin color, and color and texture of the frame to determine whether the frame indicates movement, whether it contains a face, and whether it is indoors or outdoors, and then proceeds to step 915 to recognize the image environment of the frame according to the determination result. To process the input image signal in real time, the terminal may analyze the image frame in the order shown in FIG. 3 or restrict the analysis region as shown in FIG. 4. In particular, as shown in FIG. 5, the terminal may process the image signal at times when the sound signal is not being processed.
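The decision order of steps 911 to 915 can be illustrated with a small sketch. It assumes a grayscale frame given as a list of pixel rows and a hypothetical low-light threshold of 40; the motion, face, and indoor/outdoor detectors are not specified at this level of detail in the text, so their results are passed in as plain booleans.

```python
def mean_intensity(frame):
    """Average pixel intensity of a grayscale frame given as a list of rows."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def recognize_image_environment(frame, is_moving, has_face, is_indoor,
                                low_light_threshold=40):
    """Low-light check first (step 911); otherwise combine detector results (step 913).

    is_moving, has_face, is_indoor stand in for the motion-vector, skin-color,
    and color/texture analyses; the threshold value is an assumption.
    """
    if mean_intensity(frame) < low_light_threshold:
        return ("low light",)
    return (
        "moving" if is_moving else "stationary",
        "face" if has_face else "no face",
        "indoor" if is_indoor else "outdoor",
    )

dark_frame = [[10, 20], [30, 20]]
bright_frame = [[200, 180], [190, 210]]
print(recognize_image_environment(dark_frame, False, False, True))
print(recognize_image_environment(bright_frame, True, False, True))
```

Checking for low light first means the costlier analyses are skipped for frames that carry little usable information, which fits the real-time constraint mentioned above.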

Once the sound environment of the sound signal and the image environment of the image signal have been recognized, the terminal accumulates, in step 917, the sound environment result and the image environment result for the preset short interval. The accumulated recognition results may be represented as a two-dimensional histogram, as shown in FIG. 6.
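One simple way to picture the accumulation of step 917 is a counter keyed by (sound environment, image environment) pairs, which is one possible encoding of a two-dimensional histogram. The function name and the sample data below are hypothetical.

```python
from collections import Counter

def accumulate_results(interval_results):
    """Count how often each (sound environment, image environment) pair occurred."""
    histogram = Counter()
    for sound_env, image_env in interval_results:
        histogram[(sound_env, image_env)] += 1
    return histogram

# Recognition results for three consecutive short intervals (made-up data).
results = [
    ("office noise", "indoor"),
    ("office noise", "indoor"),
    ("human voice", "face"),
]
histogram = accumulate_results(results)
print(histogram[("office noise", "indoor")])
```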

Thereafter, the terminal proceeds to step 919 and checks whether a preset accumulation time has expired. If the accumulation time has not expired, the terminal returns to step 901 and repeats the subsequent steps.

If, on the other hand, the accumulation time has expired, the terminal determines the user's situation from the accumulated result in step 921. That is, the terminal compares the probabilistic distance between each of a plurality of pre-stored situation statistical models and the accumulated result, and determines the situation whose statistical model has the shortest probabilistic distance to be the user situation. Here, the situation statistical models are statistical models of the sound and image signals for a plurality of predefined situations and may be represented as two-dimensional histograms. For example, the situation statistical models may represent the characteristics of the sound and image signals for each of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor situation, and a home.
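The selection in step 921 can be sketched as a nearest-model search over normalized histograms. The text does not name the probabilistic distance it uses, so the Bhattacharyya distance below is an assumption chosen for illustration; the function names and model contents are likewise hypothetical.

```python
import math

def normalize(hist):
    """Turn raw counts into a probability distribution."""
    total = sum(hist.values())
    return {key: count / total for key, count in hist.items()}

def bhattacharyya_distance(p, q):
    """Distance between two discrete distributions (0 when they are identical)."""
    coefficient = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in set(p) | set(q))
    return -math.log(max(coefficient, 1e-12))

def determine_situation(accumulated, situation_models):
    """Return the name of the situation model closest to the accumulated result."""
    p = normalize(accumulated)
    return min(
        situation_models,
        key=lambda name: bhattacharyya_distance(p, normalize(situation_models[name])),
    )

# Made-up situation models keyed by (sound environment, image environment).
SITUATION_MODELS = {
    "office": {("office noise", "indoor"): 8, ("human voice", "indoor"): 2},
    "subway": {("subway noise", "moving"): 9, ("human voice", "moving"): 1},
}
accumulated = {("office noise", "indoor"): 5, ("human voice", "indoor"): 5}
print(determine_situation(accumulated, SITUATION_MODELS))
```

Any other distance between distributions, such as a symmetric Kullback-Leibler divergence, could be substituted without changing the surrounding logic.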

Thereafter, the terminal proceeds to step 923, maps a time index to the determined user situation information as shown in FIG. 8, and stores the result as a life log. Here, the time index is the point in time at which the user situation was determined and may include date information and time information. The determined user situation is stored together with the time index so that the user can later easily look up past situations by searching by date and time as needed.
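A minimal sketch of step 923, assuming the life log is an in-memory list of dictionaries; the field names, the date and time formats, and both function names are hypothetical and are not taken from FIG. 8.

```python
from datetime import datetime

def record_life_log(log, situation, when):
    """Append one entry that maps a time index (date and time) to a situation."""
    entry = {
        "date": when.strftime("%Y-%m-%d"),
        "time": when.strftime("%H:%M"),
        "situation": situation,
    }
    log.append(entry)
    return entry

def search_by_date(log, date):
    """Return all entries recorded on the given date (format YYYY-MM-DD)."""
    return [entry for entry in log if entry["date"] == date]

life_log = []
record_life_log(life_log, "office", datetime(2010, 5, 13, 9, 30))
record_life_log(life_log, "subway", datetime(2010, 5, 13, 18, 5))
record_life_log(life_log, "home", datetime(2010, 5, 14, 20, 0))
print(search_by_date(life_log, "2010-05-13"))
```

Storing the date and the time as separate fields keeps the date-level and time-level searches mentioned above to simple equality or range checks.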

Thereafter, the terminal ends the algorithm according to the present invention.

Although specific embodiments have been described in the detailed description of the present invention, various modifications may be made without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

200: input unit 202: microphone
204: camera 210: recognition result accumulation unit
212: sound environment recognition unit 214: image environment recognition unit
220: situation determination unit 230: situation statistical model
240: storage unit

Claims

A method for generating a life log in a portable terminal, the method comprising:
recognizing, among preset sound environments, a sound environment corresponding to a sound signal input from a microphone;
recognizing, among preset image environments, an image environment corresponding to an image signal input from a camera;
determining, among preset situation models, a user situation corresponding to the sound environment recognition result and the image environment recognition result; and
recording the determined user situation in a life log.

The method of claim 1,
wherein the preset sound environments represent energy characteristics of each of a plurality of sound environments, and
wherein the sound environments include at least one of a murmuring noise, a car sound, a music sound, a noise in an enclosed space such as a bag, an office noise, a human voice, a subway noise, a noise of a quiet public place, a sound of running water, and a loud sound.

The method of claim 2,
wherein recognizing the sound environment comprises:
extracting an energy feature of a sound signal input from the microphone during a preset interval; and
recognizing, as the sound environment corresponding to the sound signal, the sound environment having the maximum likelihood value for the extracted energy feature among the preset sound environments.

The method of claim 1,
wherein the image environments are distinguished according to at least one condition among low light, movement, presence of a face, and indoor or outdoor location.

The method of claim 4,
wherein recognizing the image environment comprises:
checking whether an image signal input from the camera during a preset interval satisfies the at least one condition; and
recognizing the image environment corresponding to the image signal according to whether the at least one condition is satisfied.

The method of claim 1,
wherein the sound signal and the image signal are processed at different points in time within a preset interval.

The method of claim 1,
wherein the situation models represent statistics of the recognition results of the sound signal and the image signal for each of a plurality of preset situations, and
wherein the situation models include a model for at least one of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor situation, and a home.

The method of claim 7,
wherein determining, among the preset situation models, the user situation corresponding to the sound environment recognition result and the image environment recognition result comprises:
accumulating the sound environment recognition result and the image environment recognition result over a preset time interval; and
determining, as the user situation, the situation model among the situation models having the shortest probabilistic distance from the accumulated result.

The method of claim 1,
wherein recording the determined user situation in the life log comprises:
obtaining time information for the point in time at which the user situation is determined; and
mapping the time information to the determined user situation and recording the mapped result.

An apparatus for generating a life log in a portable terminal, the apparatus comprising:
a microphone that receives a sound signal;
a camera that receives an image signal;
a sound environment recognition unit that recognizes, among preset sound environments, a sound environment corresponding to the sound signal input from the microphone;
an image environment recognition unit that recognizes, among preset image environments, an image environment corresponding to the image signal input from the camera; and
a situation determination unit that determines, among preset situation models, a user situation corresponding to the sound environment recognition result and the image environment recognition result, and records the determined user situation as a life log.

The apparatus of claim 10,
wherein the preset sound environments represent energy characteristics of each of a plurality of sound environments, and
wherein the sound environments include at least one of a murmuring noise, a car sound, a music sound, a noise in an enclosed space such as a bag, an office noise, a human voice, a subway noise, a noise of a quiet public place, a sound of running water, and a loud sound.

The apparatus of claim 10,
wherein the sound environment recognition unit extracts an energy feature of a sound signal input during a preset interval and recognizes, as the sound environment corresponding to the sound signal, the sound environment having the maximum likelihood value for the extracted energy feature among the preset sound environments.

The apparatus of claim 10,
wherein the image environments are distinguished according to at least one condition among low light, movement, presence of a face, and indoor or outdoor location.

The apparatus of claim 13,
wherein the image environment recognition unit recognizes the image environment corresponding to the image signal by checking whether an image signal input during a preset interval satisfies the at least one condition.

The apparatus of claim 10,
wherein the sound signal and the image signal are processed at different points in time within a preset interval.

The apparatus of claim 10,
wherein the situation models represent statistics of the recognition results of the sound signal and the image signal for each of a plurality of preset situations, and
wherein the situation models include a model for at least one of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor situation, and a home.

The apparatus of claim 16,
wherein the situation determination unit determines, as the user situation, the situation model among the situation models having the shortest probabilistic distance from the sound environment recognition results and the image environment recognition results accumulated during a preset time interval.

The apparatus of claim 10,
wherein the situation determination unit obtains time information for the point in time at which the user situation is determined, maps the time information to the determined user situation, and records the mapped result.