KR20200025532A

KR20200025532A - An system for emotion recognition based voice data and method for applications thereof

Info

Publication number: KR20200025532A
Application number: KR1020180103019A
Authority: KR
Inventors: 주민성
Original assignee: 주민성
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2020-03-10

Abstract

Disclosed are an emotion recognition system based on voice data and an application method thereof. The emotion recognition system based on voice data comprises: a user terminal providing a biometric voice signal; an emotion analysis server that generates a customized emotion value based on user voice information received from the user terminal and transmits the customized emotion value to the user terminal, an emotion value application server, or an emotion control hardware device; the emotion control hardware device, connected to the emotion analysis server through a wired/wireless communication network; and the emotion value application server. According to the present invention, a voice signal obtained from the emotion analysis server may be utilized in the user terminal, the emotion value application server, and the emotion control hardware device connected to the emotion analysis server through the wired/wireless communication network.

Description

A system for emotion recognition based on voice data and a method for applications thereof

The present invention relates to a voice-data-based emotion recognition system and an application method thereof. More particularly, it relates to a system and method that collect an individual's voice information to build big data, perform deep learning by supervised learning on that big data to derive the individual's emotion information, and can send that information to a terminal, another server, or a separate emotion control hardware device.

As mobile communication technology has advanced and third-generation mobile communication networks (for example, W-CDMA or HSDPA networks) capable of transmitting large volumes of data at high speed have been built, services have moved beyond voice-only calls, and video calls in which the parties see each other's faces have become possible. Accordingly, mobile carriers are developing and offering an increasingly diverse range of special-function call services in order to win subscribers to third-generation networks ahead of their competitors.

These special-function services are becoming more diverse. They range from services that make calls more fun and engaging by letting the caller and the called party exchange multimedia content such as emoticons, images, Flash content, and videos, to emotional call services that use voice analysis to display the environmental state and even the emotions of the parties to the call. Speech recognition relies on the fact that intonation and pitch differ from person to person: it analyzes and quantifies the vibrations of a speaker's phonemes, syllables, and words by analyzing the features of speech captured through a microphone or similar device and then finding the closest match.

However, existing emotion analysis systems based on voice data do not take an individual's linguistic feature information into account, which limits the accuracy of the emotion analysis.

In other words, conventional emotion analysis using voice data is applied uniformly, without considering the voice characteristics of each individual.

(Patent Document 0001) Korean Patent Laid-open Publication No. 10-2010-0001928

A technical object of the present invention is to provide a voice-data-based emotion recognition system that turns an individual's biometric voice signals, obtained from a terminal such as a smartphone, into big data, uses that data for deep learning by supervised learning to infer the individual's emotion during a specific period as one of a plurality of emotions, and makes the result available to the terminal, an emotion value application server, and an emotion control hardware device.

Another technical object of the present invention is to provide an application method for a voice-data-based emotion recognition system that turns an individual's biometric voice signals, obtained from a terminal such as a smartphone, into big data, uses that data for deep learning by supervised learning to infer the individual's emotion during a specific period as one of a plurality of emotions, and makes the result available to the terminal, an emotion value application server, and an emotion control hardware device.

A voice-data-based emotion recognition system according to an embodiment of the present invention comprises: a user terminal that extracts user voice information, including pitch frequency, volume in decibels, and speech rate, from voice data input by a user, transmits it to an emotion analysis server, and receives from the emotion analysis server and displays a user-customized emotion value based on the user voice information; the emotion analysis server, which calculates an objective emotion value representing an objective emotion based on the user voice information received from the user terminal, calculates an emotion weight representing the user's individual emotion by comparing the user voice information received from the user terminal with accumulated, stored user voice information, generates a user-customized emotion value by applying the emotion weight to the objective emotion value, and transmits it to the user terminal; an emotion control hardware device that receives a control command according to the customized emotion value calculated by the emotion analysis server and can perform an operation capable of influencing the user's emotional regulation; an emotion value application server that receives the customized emotion value calculated by the emotion analysis server and can use it as reference material; and a wired/wireless communication network that interconnects the terminal, the emotion analysis server, the emotion control hardware device, and the emotion value application server to provide wired or wireless communication.

According to an embodiment of the present invention, the emotion control hardware device is capable of Internet of Things (IoT) communication and includes a configuration that can execute commands received from the emotion analysis server. It may be any one selected from a climate-control system consisting of an air conditioner, a heater, and a humidity control device, to whose operating environment the user is exposed, or it may be an artificial intelligence speaker with built-in artificial intelligence that can converse with the user.

Also according to an embodiment of the present invention, the emotion analysis server may comprise: an objective emotion value DB in which an objective emotion value is assigned and stored for each item of voice information, including voice frequency, voice decibel level, and speech rate; a user voice accumulated information DB in which user voice information received from the user terminal is accumulated and stored; an objective emotion value analysis unit that extracts, from the objective emotion value DB, the objective emotion value matching the user voice information received from the user terminal; an emotion weight calculation unit that uses the user voice accumulated information DB to extract user voice average information, including the user's average voice frequency, average voice decibel level, and average speech rate, and calculates an emotion weight by comparing the user voice information received from the user terminal with the user voice average information; and a user-customized emotion value providing unit that generates a user-customized emotion value by adding the emotion weight to the objective emotion value and transmits it to the user terminal.
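The weighting step described above can be sketched as follows. This is only an illustrative sketch: the text states that the emotion weight comes from comparing the current voice information with the user's averages and that the weight is added to the objective emotion value, but it does not give the comparison rule. The mean relative deviation, the `scale` factor, and all names (`VoiceInfo`, `emotion_weight`, and so on) are assumptions introduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VoiceInfo:
    frequency_hz: float   # voice frequency
    level_db: float       # voice decibel level
    speech_rate: float    # speech rate (unit assumed, e.g. syllables per second)


def average_voice_info(history):
    """User voice average information from the accumulated records."""
    return VoiceInfo(
        mean(v.frequency_hz for v in history),
        mean(v.level_db for v in history),
        mean(v.speech_rate for v in history),
    )


def emotion_weight(current, average, scale=10.0):
    """Hypothetical rule: mean relative deviation from the user's own baseline."""
    deviations = [
        (current.frequency_hz - average.frequency_hz) / average.frequency_hz,
        (current.level_db - average.level_db) / average.level_db,
        (current.speech_rate - average.speech_rate) / average.speech_rate,
    ]
    return scale * mean(deviations)


def customized_emotion_value(objective_value, weight):
    """The customized value is the objective value plus the emotion weight."""
    return objective_value + weight
```

With such a rule, a user who habitually speaks loudly would receive a smaller weight for a loud utterance than a user who is usually quiet, which is the per-user adjustment the embodiment aims at.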

Also according to an embodiment of the present invention, the user terminal suitably comprises: a terminal input unit that receives the voice data; a terminal display unit on which the user-customized emotion value is displayed; a user voice information providing unit that extracts user voice information, including voice frequency, voice decibel level, and speech rate, from the voice data input by the user and transmits it to the emotion analysis server; and a terminal control unit that receives the user-customized emotion value based on the user voice information from the emotion analysis server and outputs it on the display unit.

In addition, the user terminal preferably further includes a background music DB in which background music capable of relieving tension is assigned and stored for each stress index. When the stress index of the user-customized emotion value exceeds a preset threshold stress, the terminal control unit extracts the background music assigned to the user's stress index from the background music DB and plays it, and it receives the user's stress index in real time and switches playback to the background music that matches each received stress index.
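A minimal sketch of this playback rule, assuming the background music DB can be modeled as a sorted list of (lower bound of stress index, track) pairs. The DB layout, the track identifiers, and the function names are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right

# Hypothetical background music DB: (lower bound of stress index, track id),
# sorted by lower bound.
MUSIC_DB = [(0, "track_low"), (40, "track_mid"), (70, "track_high")]


def select_track(stress_index, threshold, music_db=MUSIC_DB):
    """Return the track assigned to the stress index, or None at or below threshold."""
    if stress_index <= threshold:
        return None
    lower_bounds = [lower for lower, _ in music_db]
    i = max(bisect_right(lower_bounds, stress_index) - 1, 0)
    return music_db[i][1]


def playback_changes(stress_stream, threshold):
    """Follow a stream of stress indices and record each change of track."""
    current = None
    changes = []
    for stress_index in stress_stream:
        track = select_track(stress_index, threshold)
        if track != current:
            changes.append(track)
            current = track
    return changes
```

A `None` entry in the result stands for stopping playback once the stress index falls back to the threshold or below. The same lookup would apply to a stress relief image DB keyed by stress index.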

The user terminal also suitably includes a stress relief image DB in which stress relief images capable of aiding recovery from stress are assigned and stored. When the stress index of the user-customized emotion value exceeds a preset threshold stress, the terminal control unit extracts the stress relief image assigned to the user's stress index from the stress relief image DB and displays it on the display unit, and it receives the user's stress index in real time and switches the display to the stress relief image that matches each received stress index.

An application method of a voice-data-based emotion recognition system according to another embodiment of the present invention may comprise: providing voice information from a user at a terminal; analyzing, at the terminal, the provided voice information into a voice biosignal consisting of volume in decibels, pitch frequency, and speech rate; transmitting the analyzed voice biosignal to an emotion analysis server; collecting, at the emotion analysis server, big data on the voice biosignals accumulated for each individual who provided voice information; running an emotion analysis algorithm on the emotion analysis server to perform deep learning by supervised learning using the collected big data; classifying the user's emotion during a given period as one of a plurality of basic emotions according to the volume in decibels, pitch frequency, and speech rate of the voice biosignal; and transmitting the emotion classified as one of the plurality of basic emotions to an emotion value application server connected to the emotion analysis server through a wired/wireless communication network.

The deep learning by supervised learning may include determining the boundary values of a first, a second, and a third emotion from the scatter plot of the per-period average values of the volume in decibels, determining the boundary values of a fourth, a fifth, and a sixth emotion from the scatter plot of the per-period average values of the pitch frequency, and determining the boundary values of a seventh, an eighth, and a ninth emotion from the scatter plot of the per-period average values of the speech rate.
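To make the idea of per-feature boundary values concrete, here is a deliberately simple stand-in, not the deep learning procedure of the patent: given labeled per-period averages for one feature, it places each boundary at the midpoint between the mean values of adjacent emotion classes. The midpoint rule, the label strings, and the function names are assumptions for illustration only.

```python
from statistics import mean


def learn_boundaries(samples):
    """samples: list of (per-period average, emotion label) for ONE feature.

    Returns the labels ordered by class mean and the boundaries between them.
    """
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    centers = sorted((mean(values), label) for label, values in by_label.items())
    labels = [label for _, label in centers]
    boundaries = [
        (centers[i][0] + centers[i + 1][0]) / 2 for i in range(len(centers) - 1)
    ]
    return labels, boundaries


def classify(value, labels, boundaries):
    """Assign a per-period average to the emotion whose interval contains it."""
    for label, bound in zip(labels, boundaries):
        if value < bound:
            return label
    return labels[-1]
```

The same two functions would be run three times, once each on the volume, pitch frequency, and speech rate averages, yielding the three groups of emotions named in the text.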

According to an embodiment of the present invention, the user's emotion during a given period is suitably classified as one of the plurality of basic emotions as follows: one emotion (P1) is obtained based on the volume in decibels, another emotion (P2) is obtained based on the pitch frequency, and a further emotion (P3) is obtained based on the speech rate; then, of the three emotions P1, P2, and P3, the one that deviates most from its boundary value is selected.
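The selection among P1, P2, and P3 can be sketched as below. The text only says that the emotion deviating most from its boundary value is chosen; how deviations in decibels, hertz, and speech rate are made comparable is not stated, so dividing by a per-feature `scale` is an assumption, as are the example emotion names and numbers.

```python
def pick_emotion(candidates):
    """candidates: list of (emotion, measured value, boundary value, scale).

    Returns the emotion whose measured value lies furthest from its boundary,
    after dividing by a per-feature scale (assumed normalization).
    """
    def margin(candidate):
        _, value, boundary, scale = candidate
        return abs(value - boundary) / scale

    return max(candidates, key=margin)[0]


# Hypothetical example values for one period.
p1 = ("excited", 75.0, 68.0, 10.0)   # from volume in decibels
p2 = ("calm", 180.0, 200.0, 50.0)    # from pitch frequency in Hz
p3 = ("tense", 5.5, 5.0, 1.0)        # from speech rate
selected = pick_emotion([p1, p2, p3])
```

Without some normalization, the feature with the largest numeric range (here the pitch frequency in Hz) would dominate the comparison regardless of how unusual the utterance actually was.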

Preferably, before the step of transmitting the emotion classified as one of the plurality of basic emotions to the emotion value application server, the method may further comprise transmitting a control command corresponding to the emotion selected as one of the plurality of basic emotions to an emotion control hardware device connected to the emotion analysis server through the wired/wireless communication network.

Here, the emotion control hardware device is suitably one selected from a group of devices capable of influencing the user's emotion, the group consisting of an artificial intelligence speaker and an air-conditioning device for controlling temperature and humidity.

Accordingly, the technical idea of the present invention described above provides the following effects. First, an individual user's voice information is collected at a terminal such as a smartphone and broken down into another form of voice information, namely biometric voice signals such as volume in decibels, pitch frequency, and speech rate, to build big data. Analyzing this big data with deep learning by supervised learning makes it possible to infer the individual user's emotion during a specific period as the specific customized emotion value that is closest among a plurality of emotions.

Second, the inferred customized emotion value can be put to active use by transmitting it to the user terminal or to an emotion control hardware device to whose operating environment the user is exposed, for example an artificial intelligence speaker or an air-conditioning device with temperature and humidity control that can influence emotion.

Third, the customized emotion value inferred by the emotion analysis server can be transmitted to an emotion value application server connected through the wired/wireless communication network, for example a server on which an individual user's medical records are collected, and used as reference material for those medical records.

FIG. 1 is a block diagram illustrating the configuration of a voice-data-based emotion recognition system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a procedure for recognizing emotion based on voice data according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the configuration of an emotion analysis server according to a preferred embodiment of the present invention.
FIG. 4 is a table showing stress correction values according to the user's occupation according to an embodiment of the present invention.
FIG. 5 is a table showing fatigue correction values according to the weather at the time of emotion analysis according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating the configuration of a user terminal according to an embodiment of the present invention.
FIG. 7 is a display screen of a user terminal showing a stress relief image according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating voice-data-based emotion recognition processes according to a first embodiment of the present invention.
FIG. 9 is a flowchart illustrating a voice-data-based emotion recognition process and a method of applying it according to a second embodiment of the present invention.
FIG. 10 is a conceptual diagram illustrating a voice-data-based emotion recognition process and a method of applying it according to the second embodiment of the present invention.
FIG. 11 is a graph illustrating the big data, the deep learning by supervised learning, and the method of calculating the customized emotion value of FIG. 9.
FIG. 12 is a block diagram showing various types of the customized emotion value of FIG. 9.

In order that the configuration and effects of the present invention may be fully understood, preferred embodiments of the present invention are described with reference to the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below; it may be embodied in various forms and modified in various ways. The embodiments introduced below are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention pertains.

Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "comprises" or "has" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and may be interpreted as allowing one or more other features, numbers, steps, operations, components, parts, or combinations thereof to be added.

In the present invention, voice information encompasses both the voice signal provided to the terminal by the user and the biometric voice signal obtained by the terminal extracting features from it. Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries are to be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings.

FIG. 1 is a block diagram of the configuration of a voice-data-based emotion recognition system according to an embodiment of the present invention, and FIG. 2 is a flowchart of recognizing emotion based on voice data according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the voice-data-based emotion recognition system according to the present invention may include a wired/wireless communication network (100), a user terminal (200), an emotion analysis server (300), an emotion value application server (400), and an emotion control hardware device (500).

The wired/wireless communication network (100) provides wired or wireless communication between the user terminal (200) and the emotion analysis server (300), the emotion value application server (400), and the emotion control hardware device (500). When the wired/wireless communication network (100) is implemented as a wireless network, data communication may take place over a wireless mobile communication network consisting of a base transceiver station (BTS), a mobile switching center (MSC), and a home location register (HLR). When the wired/wireless communication network (100) is implemented as a wired network, it may be implemented as a computer network in which data communication takes place according to Internet protocols such as TCP/IP (Transmission Control Protocol/Internet Protocol).

The user terminal (200) is the terminal used by the user whose emotion is analyzed. The drawings use a smartphone as an example, but any means capable of collecting voice information may be used instead, for example a desktop PC, a tablet PC, a slate PC, a notebook computer, a digital broadcasting terminal, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), a navigation device, a digital camera, or an MP3 player (MPEG Layer 3 player). The terminals to which the present invention is applicable are of course not limited to these types and may include any terminal capable of communicating with external devices.

The user terminal (200) extracts from the voice data input by the user a voice biosignal, that is, user voice information including voice frequency, voice decibel level, and speech rate, and transmits it to the emotion analysis server (300). It also receives from the emotion analysis server (300) and displays the user-customized emotion value based on that user voice information. The extraction of biometric voice signals such as voice frequency, voice decibel level, and speech rate from the voice data provided by the user can be carried out by running a known application programming interface (API) that is already installed and operating on the user terminal.
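The patent leaves the extraction itself to a known API on the terminal and does not name it. Purely as an illustration of what the three quantities could look like, the sketch below computes a level in decibels relative to full scale from the RMS amplitude, estimates a frequency from the zero-crossing count (adequate only for clean, single-tone signals), and derives a speech rate from a syllable count that is assumed to come from some external recognizer. All function names and the dictionary keys are hypothetical.

```python
import math


def level_dbfs(samples):
    """Level in dB relative to full scale, from the RMS of samples in [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def zero_crossing_frequency(samples, sample_rate):
    """Rough frequency estimate: a periodic tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration)


def speech_rate(syllable_count, duration_s):
    """Syllables per second; the syllable count is assumed to be supplied."""
    return syllable_count / duration_s


def extract_voice_info(samples, sample_rate, syllable_count):
    """Bundle the three features that the terminal sends to the server."""
    duration = len(samples) / sample_rate
    return {
        "frequency_hz": zero_crossing_frequency(samples, sample_rate),
        "level_db": level_dbfs(samples),
        "speech_rate": speech_rate(syllable_count, duration),
    }
```

A production terminal would more likely use an autocorrelation-based pitch tracker and a calibrated sound level, since zero-crossing counts are unreliable for real speech.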

Here, the voice data input by the user may be voice in any form and in any situation, provided it can be input to the terminal, such as voice during a voice or video call, recorded voice, or voice input in real time. The extracted user voice information may include the frequency of the voice (in hertz), the level of the voice (in decibels), and the speech rate. The user terminal (200) is described in detail below with reference to FIGS. 6 and 7.

In terms of hardware, the emotion analysis server (300) has the same configuration as a conventional web server. In terms of software, it includes program modules that perform various functions and may be implemented in a variety of languages such as C, C++, Java, Visual Basic, and Visual C. It may also be implemented on general server hardware using any of the web server programs provided for operating systems such as DOS, Windows, Linux, UNIX, and Macintosh.

Based on the user voice information received from the user terminal (200), the emotion analysis server (300) calculates an objective emotion value representing the objective emotion, judged against preset reference voice information. It also compares the user voice information received from the user terminal (200) with the accumulated, stored user voice information to calculate an emotion weight representing the user's individual emotion. It then generates a user-customized emotion value by applying the emotion weight to the objective emotion value and transmits it to the user terminal (200), the emotion value application server (400), and the emotion control hardware device (500). An example of how the customized emotion value is generated is described again later with reference to FIGS. 11 and 12.

The emotion value application server (400) may be connected to the emotion analysis server (300) through the wired/wireless communication network (100) and may receive the customized emotion value from the emotion analysis server (300). The emotion value application server may be a server that manages the information of an institution where the personal medical records of the user of the terminal (200) are collected and stored.

The personal medical records may be medical records indicating brain waves, pulse waves, blood pressure, and various other physical conditions. The emotion value application server (400) can therefore link and compare the customized emotion value of the individual user of the terminal (200), provided by the emotion analysis server (300), with the personal medical records it holds, analyze the correlation between them, and thereby support more effective medical practice. In this way the customized emotion value provided by the emotion analysis server (300) enables more efficient use than would be possible without it. The emotion value application server (400) is of course not limited to a server that stores personal medical records; the invention may be applied to other kinds of servers in any field that can make use of an individual's customized emotion value for each period.

The emotion control hardware device 500 is connected to the emotion analysis server 300 through the wired/wireless communication network 100 and serves as a means of making more active use of the customized emotion value obtained by the emotion analysis server 300. To this end, the emotion control hardware device 500 preferably has a component capable of Internet of Things (IoT) communication and an internal configuration that can receive a control command, for example an operation command, based on the customized emotion value inferred by the emotion analysis server, and perform an operation that can influence the user's emotional regulation in the environment to which the terminal user is exposed. For example, the emotion control hardware device 500 may be any one selected from an air-conditioning system consisting of an air conditioner, a heater, and a humidity controller, to whose operating environment the user is exposed, or may be an artificial intelligence speaker with built-in artificial intelligence that can converse with the user.

Accordingly, when the customized emotion value of the user of the terminal 200 indicates an excited state, the temperature of an air-conditioning device such as the user's air conditioner may be lowered by a predetermined range, or, in the case of an artificial intelligence speaker, a conversation mode that helps the user calm down may be selected. Likewise, when the customized emotion value indicates a tense state, the temperature of an air-conditioning device such as the user's heater may be raised by a predetermined range, or, in the case of an artificial intelligence speaker, a conversation mode that helps the user feel at ease may be selected. The emotion value application server 400 and the emotion control hardware device 500 thus serve as means of making more active use of the customized emotion value calculated by the emotion analysis server. The emotion control hardware device 500 according to the present invention is of course not limited to temperature and humidity control devices and artificial intelligence speakers, and may be widely applied to other hardware devices capable of controlling the emotions of the user of the terminal 200.
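The mapping from a customized emotion value to a device operation described above can be sketched as follows. This is only an illustrative sketch in Python: the names `Command` and `command_for`, the string labels for emotions and devices, the conversation mode names, and the 2-degree step are all assumptions, since the description only says the temperature changes "by a predetermined range".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical step size; the description only says "a predetermined range".
TEMP_STEP_C = 2.0


@dataclass(frozen=True)
class Command:
    device: str                    # "air_conditioner", "heater", or "ai_speaker"
    temp_delta_c: Optional[float]  # change in set temperature, if any
    mode: Optional[str]            # conversation mode for an AI speaker, if any


def command_for(emotion: str, device: str) -> Optional[Command]:
    """Return an operation command for the device, or None if no action applies."""
    if device == "ai_speaker":
        # Hypothetical mode names for the two states named in the description.
        modes = {"excited": "calming", "tense": "comforting"}
        mode = modes.get(emotion)
        return Command(device, None, mode) if mode else None
    if emotion == "excited" and device == "air_conditioner":
        return Command(device, -TEMP_STEP_C, None)  # lower the temperature
    if emotion == "tense" and device == "heater":
        return Command(device, +TEMP_STEP_C, None)  # raise the temperature
    return None
```

Returning `None` for unlisted combinations keeps the sketch from inventing behavior the description does not specify.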

Hereinafter, the emotion analysis server 300 is described with reference to FIGS. 3 to 5.

FIG. 3 is a block diagram of the emotion analysis server according to an embodiment of the present invention, FIG. 4 shows stress correction values according to the user's occupation according to an embodiment of the present invention, and FIG. 5 shows fatigue correction values according to the weather at the time of emotion analysis according to an embodiment of the present invention.

As shown in FIG. 3, the emotion analysis server 300 may include a server communication unit 350, an objective emotion value DB 320, a user voice accumulation information DB 340, an objective emotion value analysis unit 310, an emotion weight calculation unit 330, and a user-customized emotion value providing unit 360. It may further include a stress correction value calculation unit 370 and a fatigue correction value calculation unit 380.

The server communication unit 350 supports the hardware and software protocols for communicating with the user terminal 200, the emotion value application server 400, and the emotion adjustment hardware device 500. The server communication unit 350 may perform data communication according to an Internet protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).

The objective emotion value DB 320 is a database (DB) in which an objective emotion value is assigned and stored for each item of voice information, including voice frequency, voice decibel level, and speech rate. Here, an objective emotion value is not an emotion specific to an individual but an emotion value judged objectively from voice information; it may correspond to a stress index (excitement, tension, anxiety), a fatigue index (tiredness, languor, lethargy), an emotional component (comfort, joy, satisfaction), and the like.

For reference, a database (DB) may be provided inside the device or in a separate device, as a module capable of inputting and outputting information, such as a hard disk drive, a solid state drive (SSD), flash memory, a Compact Flash (CF) card, a Secure Digital (SD) card, a Smart Media (SM) card, a Multi-Media Card (MMC), or a Memory Stick.

The user voice accumulation information DB 340 is a database in which the user voice information received from the user terminal 200 is accumulated and stored. For each user ID, the user's voice frequency, voice decibel level, and speech rate are accumulated and stored by date.

The objective emotion value analysis unit 310 analyzes the user's objective emotion value using any of various voice analysis methods, such as the known decision fusion method. In the decision fusion method, the objective emotion value matching the user voice information received from the user terminal 200 is extracted from the objective emotion value DB 320. For example, when the voice frequency is higher than a reference frequency, the emotion value indicates high stress, and when the voice decibel level is lower than a reference decibel level, the emotion value indicates high fatigue.
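The two example rules above (frequency above the reference suggests high stress, decibel level below the reference suggests high fatigue) can be sketched as a simple comparison. This is a minimal sketch, not the decision fusion method itself: `VoiceInfo`, `REFERENCE`, `objective_flags`, the reference numbers, and the unit of speech rate are all hypothetical, because the description gives no concrete reference values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VoiceInfo:
    frequency_hz: float  # voice frequency
    level_db: float      # voice decibel level
    rate: float          # speech rate (unit is an assumption, e.g. syllables/s)


# Hypothetical reference voice information; the source gives no numbers.
REFERENCE = VoiceInfo(frequency_hz=180.0, level_db=60.0, rate=4.0)


def objective_flags(voice: VoiceInfo, ref: VoiceInfo = REFERENCE) -> Dict[str, bool]:
    """Apply the two example rules from the description against the reference."""
    return {
        "high_stress": voice.frequency_hz > ref.frequency_hz,
        "high_fatigue": voice.level_db < ref.level_db,
    }
```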

Furthermore, the objective emotion value analysis unit 310 analyzes patterns in the voice information for an emotion model, for example using one or more statistical methods, and classifies the emotion model by emotion and/or by measurement subject. To analyze patterns in the user voice information, a discriminant model, a logistic model, a neural network, a decision tree, and the like may be applied, and the method with the highest accuracy (for example, at the level of 70% to 78%) and/or reliability may then be selected.

However, objective emotion value analysis alone does not yield an accurate emotion analysis that accounts for the characteristics of each individual user. For example, uniform objective analysis is inaccurate for a user who habitually speaks quickly or has a high-frequency voice. The present invention therefore calculates an emotion weight based on each user's accumulated past voice data and applies this weight to the objective emotion value to provide a user-customized emotion value. The emotion weight may include a stress weight applied to the stress index and a fatigue weight applied to the fatigue index.

The emotion weight calculation unit 330 uses the user voice accumulation information DB 340 to extract user voice average information, including the user's average voice frequency, average voice decibel level, and average speech rate, and calculates the emotion weight by comparing the user voice information received from the user terminal 200 with the user voice average information. For example, when the user's voice frequency is higher than the user's average voice frequency, the emotion weight of the stress index is set to '+1', and when it is lower, the emotion weight of the stress index is set to '-1'. Similarly, when the user's voice decibel level is lower than the user's average decibel level, the emotion weight of the fatigue index is set to '-1'.

The user-customized emotion value providing unit 360 generates a user-customized emotion value by adding the emotion weight to the objective emotion value and transmits it to the user terminal 200. For example, when the objective emotion value is analyzed as a stress index of level '+3' and the stress emotion weight is '+1', the final stress index, that is, the user-customized emotion value, is '+4'.
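The weight rules and the summation described above can be written out as a short Python sketch. The function names are hypothetical, and the behavior when the current value equals the average, or when the decibel level is above the average, is an assumption (a weight of 0), since the description does not cover those cases.

```python
from statistics import mean
from typing import List


def stress_weight(current_hz: float, history_hz: List[float]) -> int:
    """+1 if the current voice frequency is above the user's average, -1 if below."""
    avg = mean(history_hz)
    if current_hz > avg:
        return 1
    if current_hz < avg:
        return -1
    return 0  # assumption: the source does not say what happens on a tie


def fatigue_weight(current_db: float, history_db: List[float]) -> int:
    """-1 if the current level is below the user's average, as in the source's example."""
    # Assumption: any other case yields 0, because the source leaves it unspecified.
    return -1 if current_db < mean(history_db) else 0


def customized_value(objective: int, weight: int) -> int:
    """The customized emotion value is the objective value plus the emotion weight."""
    return objective + weight
```

With the figures from the text, `customized_value(3, 1)` gives the stress index of 4.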

The emotion weight above is derived from the user's accumulated voice information, but the user's occupation may additionally be used for a more accurate analysis. This is because people in occupations that involve dealing with others, such as field sales or retail sales, may feel greater stress from minor stressors than office workers do.

To this end, the voice data-based emotion recognition system may further include a stress correction value calculation unit 370 that calculates a stress correction value according to the occupation of the user being analyzed. The emotion weight calculation unit 330 applies the user's stress correction value to the stress index calculated by comparing the user voice information received from the user terminal 200 with the user voice average information, and outputs the corrected stress index as the user's emotion weight. For example, as shown in FIG. 4, a different stress correction value is assigned to each of field sales, retail sales, and office work, so that a stress correction value is additionally applied according to the occupation of the user being analyzed.

Likewise, the weather on the day of analysis may be used for a more accurate analysis of the fatigue index. This is because fatigue may be higher in gloomy weather, such as rain, than in clear weather, for reasons such as muscle pain.

To this end, the emotion recognition system includes a fatigue correction value calculation unit 380 that calculates a fatigue correction value according to the weather at the time of emotion analysis. The emotion weight calculation unit 330 may apply the fatigue correction value to the fatigue index calculated by comparing the user voice information received from the user terminal 200 with the user voice average information, and output the corrected fatigue index as the user's emotion weight. For example, as shown in FIG. 5, a different fatigue correction value is assigned to cloudy weather and to clear weather, so that a fatigue correction value is additionally applied according to the weather at the time of analysis.
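The occupation and weather corrections can be pictured as lookup tables applied to the weights. The following Python sketch rests on several assumptions: the table values are invented placeholders (the values of FIGS. 4 and 5 are not reproduced in this text), the correction is assumed to be additive, and unknown keys are assumed to receive no correction.

```python
from typing import Dict

# Hypothetical correction tables; the actual values of FIGS. 4 and 5 are not given here.
STRESS_CORRECTION: Dict[str, int] = {"field_sales": 2, "retail_sales": 1, "office": 0}
FATIGUE_CORRECTION: Dict[str, int] = {"cloudy": 1, "sunny": 0}


def corrected_stress_weight(weight: int, occupation: str) -> int:
    """Add the occupation-based stress correction (assumed additive) to the weight."""
    return weight + STRESS_CORRECTION.get(occupation, 0)


def corrected_fatigue_weight(weight: int, weather: str) -> int:
    """Add the weather-based fatigue correction (assumed additive) to the weight."""
    return weight + FATIGUE_CORRECTION.get(weather, 0)
```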

FIG. 6 is a block diagram of the user terminal according to an embodiment of the present invention, and FIG. 7 shows a display screen of the user terminal on which a stress relief image is displayed according to an embodiment of the present invention.

As shown in FIG. 6, the user terminal 200 may include a terminal communication unit 210, a terminal input unit 220, a terminal display unit 230, a user voice information providing unit 240, and a terminal control unit 270. It may further include a background music DB 250 and a stress relief image DB 260.

The terminal communication unit 210 is a module that communicates over a mobile communication network. For mobile communication such as 3G or 4G, it includes an RF transmitter (not shown) that up-converts and amplifies the frequency of the signal to be transmitted wirelessly, and an RF receiver (not shown) that low-noise amplifies the received wireless signal and down-converts its frequency.

The terminal input unit 220 is a module that receives voice data, for example a microphone. When the user terminal 200 is a smartphone and the user wants a voice analysis, the user's own voice, the other party's voice, and the like may be input through the smartphone's voice recognition sensor.

The terminal display unit 230 is a display module on which the user-customized emotion value is displayed. The terminal input unit 220 and the terminal display unit 230 may be implemented together as a single touch screen panel. The touch screen panel provides a touch screen that performs input and display at the same time; it is a display window provided on the front of the terminal to show the working screen, and it displays a graphical user interface (GUI) for interaction with the user.

The user voice information providing unit 240 extracts, from the voice data input by the user, user voice information comprising a biometric voice signal, for example voice frequency, voice decibel level, and speech rate, and transmits it to the emotion analysis server 300. Voice feature points such as pitch statistics, and the voice frequency, voice decibel level, speech rate, crossing rate, and increasing rate obtained by a magnitude prediction method, may be extracted from the input voice data as the user voice information.
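To make the feature extraction step more concrete, the sketch below computes two of the named quantities, a decibel level and a crossing rate, from raw samples using only the Python standard library, and derives a crude frequency estimate from the crossing rate. It is an assumption-laden illustration rather than the extraction method of the invention: samples are assumed to be floats in [-1, 1], the level is expressed relative to full scale, the function names are hypothetical, and speech rate is omitted because it requires syllable segmentation.

```python
import math
from typing import List


def level_db(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (samples assumed in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence


def zero_crossing_rate(samples: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)


def rough_frequency_hz(samples: List[float], sample_rate: int) -> float:
    """Crude frequency estimate: a pure tone crosses zero twice per cycle."""
    return zero_crossing_rate(samples) * sample_rate / 2.0
```

The zero-crossing estimate is only reasonable for clean, nearly periodic signals; real speech would normally call for a dedicated pitch tracker.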

The terminal control unit 270 is implemented as an MCU (Main Control Unit) that controls each functional module, and is the module on which the emotion recognition app of the present invention is installed. The emotion recognition app receives the user-customized emotion value based on the user voice information from the emotion analysis server 300 and displays it on the display unit. For reference, a user terminal 200 implemented as a smartphone or the like allows the user to install, add, or delete hundreds of different applications at will, to create desired applications directly, and to set up a suitable interface through various applications. The emotion recognition app can therefore be downloaded from an app marketplace such as Google Market or the Apple Store and installed on the smartphone.

The background music DB 250 is a database in which background music that can relieve tension is assigned and stored for each stress index. When the stress index of the user-customized emotion value exceeds a preset threshold stress, the terminal control unit 270 extracts the background music assigned to the user's stress index from the background music DB 250 and plays it; it may also receive the user's stress index in real time and switch to the background music matching the received stress index. Background music that can relieve the user's stress is thus updated in real time and played on the terminal according to the stress index, enabling efficient stress relief therapy. In other words, when the stress index rises, the corresponding background music is updated and played in real time.

The stress relief image DB 260 is a database in which stress relief images that can relieve stress are assigned and stored. When the stress index of the user-customized emotion value exceeds a preset threshold stress, the terminal control unit 270 extracts the stress relief image assigned to the user's stress index from the stress relief image DB 260 and displays it on the display unit as shown in FIG. 7; it may also receive the user's stress index in real time and switch to the stress relief image matching the received stress index. Stress relief images that can relieve the user's stress are thus updated in real time and displayed on the terminal according to the stress index, enabling efficient stress relief therapy. For reference, a stress relief image is not limited to a still image and may include any digital content such as colors, 3D shapes, augmented reality, and virtual reality.
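The threshold check and per-index lookup performed by the terminal control unit 270 can be sketched as below. The threshold value, the file names, the catalog contents, and the fallback to the highest catalog entry are all hypothetical choices made for illustration; the description specifies none of them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical threshold and catalogs; the source specifies neither.
STRESS_THRESHOLD = 3
BACKGROUND_MUSIC: Dict[int, str] = {4: "slow_piano.mp3", 5: "ocean_waves.mp3"}
RELIEF_IMAGES: Dict[int, str] = {4: "forest.jpg", 5: "night_sky.jpg"}


def select_media(stress_index: int) -> Optional[Tuple[str, str]]:
    """Return (music, image) for the index, or None at or below the threshold."""
    if stress_index <= STRESS_THRESHOLD:
        return None
    # Assumption: indices above the catalog range fall back to the highest entry.
    key = min(stress_index, max(BACKGROUND_MUSIC))
    return BACKGROUND_MUSIC[key], RELIEF_IMAGES[key]
```

Calling `select_media` each time a new stress index arrives would give the real-time switching behavior described above.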

FIG. 8 is a flowchart showing the voice data-based emotion recognition process according to a first embodiment of the present invention, an operating method in which the customized emotion value is used mainly on the user terminal.

As shown in FIG. 8, the voice data-based emotion recognition method of the present invention may include a user voice information providing step (S810), an objective emotion value calculation step (S820), an emotion weight calculation step (S830), a user-customized emotion value providing step (S860), and a user-customized emotion value display step (S870). In the user voice information providing step (S810), the user terminal 200 extracts user voice information, including voice frequency, voice decibel level, and speech rate, from the voice data input by the user and transmits it to the emotion analysis server 300.

In the objective emotion value calculation step (S820), the emotion analysis server 300 calculates, from the user voice information received from the user terminal 200, an objective emotion value representing the emotion judged objectively against preset reference voice information.

In the emotion weight calculation step (S830), the user voice information received from the user terminal 200 is compared with the accumulated user voice information to calculate an emotion weight reflecting the user's individual emotion. In the user-customized emotion value providing step (S860), the emotion analysis server 300 generates a user-customized emotion value by applying the emotion weight to the objective emotion value and transmits it to the user terminal 200.

In the user-customized emotion value display step (S870), the user terminal 200 receives the user-customized emotion value based on the user voice information from the emotion analysis server 300 and displays it.

Furthermore, after the emotion weight calculation step (S830), the method may include a stress correction value calculation step (S840) that calculates a stress correction value according to the occupation of the user being analyzed. In that case, the emotion weight calculation step (S830) may apply the user's stress correction value to the stress index calculated by comparing the user voice information received from the user terminal 200 with the user voice average information, and output the corrected stress index as the user's emotion weight. Because a stress correction value reflecting the user's occupation is applied to the stress emotion value, a more accurate stress analysis is possible.

Likewise, after the emotion weight calculation step (S830), the method may include a fatigue correction value calculation step (S850) that calculates a fatigue correction value according to the weather at the time of emotion analysis. In that case, the emotion weight calculation step (S830) applies the fatigue correction value to the fatigue index calculated by comparing the user voice information received from the user terminal 200 with the user voice average information, and outputs the corrected fatigue index as the user's emotion weight. Because a fatigue correction value reflecting the weather on the day of analysis is applied to the fatigue emotion value, a more accurate fatigue analysis is possible.

FIG. 9 is a flowchart illustrating a voice data-based emotion recognition process and a method of applying it according to a second embodiment of the present invention, and FIG. 10 is a conceptual diagram illustrating the same process and method.

In the first embodiment of FIG. 8, the customized emotion value calculated by the emotion analysis server is transmitted to the user terminal and displayed there, the terminal's screen is adjusted with a stress relief image, and its background music is adjusted. In the second embodiment of the present invention, by contrast, the data for the customized emotion value is accumulated as big data in the user voice accumulation information DB, deep learning by supervised learning is performed on this big data, and one specific emotion among a plurality of emotions, for example nine, is derived as the customized emotion value. The derived customized emotion value is then transmitted to the emotion value application server (400 in FIG. 1) and the emotion adjustment hardware device (500 in FIG. 1) so that it can be actively used.

In detail, the user's terminal collects user voice information, for example call voice data (S900). The terminal then analyzes the user voice information into a biometric voice signal (S910). This analysis is made possible by a publicly available application programming interface (API) installed and running on the terminal. The analyzed biometric voice signal is then transmitted to the emotion analysis server through the wired/wireless communication network (S920), and the emotion analysis server collects it as big data in the user voice accumulation DB (S930).

The emotion analysis server then performs deep learning by supervised learning on the collected big data (S940). An emotion analysis algorithm may run during the supervised learning. In the field of artificial intelligence (AI) associated with the Fourth Industrial Revolution, deep learning refers to technology that enables a computer to think and learn as a person does. To do so, deep learning clusters or classifies objects or data. Its core is prediction through classification, and there are broadly two ways of classifying and dividing big data: supervised learning and unsupervised learning. Supervised learning teaches the computer information first and then has it classify and predict. The emotion analysis algorithm used for deep learning in the present invention is a form of supervised learning: for each of a plurality of emotion types, for example the nine emotions shown in FIG. 12, the corresponding range of the biometric voice signal is set in advance, and the algorithm then classifies inputs accordingly. Specifically, supervised learning is performed in which the boundary values of the first, second, and third emotions are determined from a scatter plot of the per-period mean values of the volume (decibels), the boundary values of the fourth, fifth, and sixth emotions are determined from a scatter plot of the per-period mean values of the pitch (frequency), and the boundary values of the seventh, eighth, and ninth emotions are determined from a scatter plot of the per-period mean values of the speech rate.

Next, the emotion analysis server calculates the customized emotion value closest to the user from among a plurality of emotion types, for example nine, according to the volume (decibels), pitch (hertz), and speech speed that make up the biometric voice signal (S950). Specifically, one emotion (P1) is obtained from the volume in decibels, another emotion (P2) from the pitch frequency, and a third emotion (P3) from the speech speed; then, of the three emotions P1, P2, and P3, the one that deviates furthest beyond its boundary value is selected. This is described in detail below with reference to FIG. 11.

Thereafter, the emotion analysis server transmits the calculated customized emotion value to the emotion value application server (S960) so that the customized emotion value for each set period can be used as a reference. The emotion value application server may be a server that stores the personal medical records of the user who provided voice information through the terminal. The customized emotion value for each period can therefore be consulted as part of the personal medical record.

Finally, the emotion analysis server transmits an operation command corresponding to the calculated customized emotion value to an emotion control hardware device, for example an air conditioning device capable of controlling temperature and humidity or an artificial intelligence speaker, and thereby operates the emotion control hardware device (S970). Here, an operation command means a command that runs, in an appropriate manner, the emotion control hardware device connected to the emotion analysis server through a wired or wireless communication network. For this purpose, the emotion control hardware device should have an internal configuration suitable for receiving the operation command and acting on it.
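One way to picture step S970 is a lookup from the customized emotion value to an operation command that is then serialized for transmission. The sketch below is only illustrative: the source does not define a command format, so the table `COMMANDS`, the device names, the action fields, and the use of JSON are all assumptions. The emotion labels are borrowed from the FIG. 11 description.

```python
import json

# Hypothetical mapping from emotion value to operation command.
# The source only states that a command "corresponding to" the value is sent.
COMMANDS = {
    "helplessness": {"device": "ai_speaker", "action": "play_music", "mood": "upbeat"},
    "excitement": {"device": "air_conditioner", "action": "set_temperature", "celsius": 23},
}

def build_command(emotion):
    """Return a JSON operation command for the emotion, or None if none is mapped."""
    command = COMMANDS.get(emotion)
    if command is None:
        return None  # e.g. a comfortable state needs no intervention in this sketch
    return json.dumps({"emotion": emotion, **command}, sort_keys=True)
```

Returning `None` for unmapped emotions keeps the server from sending a command when no adjustment is called for; a real device would also need to validate and acknowledge what it receives.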

Meanwhile, in FIG. 10, the emotion analysis API system may be the emotion analysis server, and the emotion analysis result may correspond to one of the user terminal, the emotion value application server, and the emotion control hardware device, each of which receives the customized emotion value.

FIG. 11 is a graph illustrating the big data, the deep learning by supervised learning, and the method of calculating the customized emotion value of FIG. 9, and FIG. 12 is a block diagram showing the various types of customized emotion value of FIG. 9.

Referring to FIGS. 11 and 12, the X axis of the graph represents the magnitude of the voice biosignal, and the Y axis represents the frequency of occurrence of the biometric voice signal collected from the big data. Like most data found in nature, the frequency distribution of the biometric voice signal follows a normal (Gaussian) distribution (curve D in the drawing). For example, to perform deep learning by supervised learning that uses the volume in decibels to classify the first emotion (helplessness in the drawing), the second emotion (comfort in the drawing), and the third emotion (excitement in the drawing), the boundary values between these three emotions must be set and taught to the emotion analysis server. To this end, boundary values (N1 and N3 on the X axis of the drawing) are designated according to the volume in decibels.
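The source does not say how N1 and N3 are chosen, only that they are designated on the Gaussian curve. As one hedged possibility, the sketch below fits a normal distribution to the collected per-period means and places the two boundaries at symmetric tail quantiles. The function name `gaussian_boundaries` and the tail fraction of 0.16 (roughly one standard deviation either side of the mean) are assumptions, not values from the disclosure.

```python
from statistics import NormalDist

def gaussian_boundaries(values, tail=0.16):
    """Fit a normal distribution and return (N1, N3) at symmetric tail quantiles.

    `tail` is the assumed share of the data lying below N1 (and above N3).
    Requires at least two distinct values so that the spread is non-zero.
    """
    dist = NormalDist.from_samples(values)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Hypothetical per-period mean volumes in decibels.
collected_db = [55, 58, 60, 62, 65]
n1, n3 = gaussian_boundaries(collected_db)
```

A smaller `tail` pushes N1 and N3 outward, so fewer periods are labeled as the first or third emotion; in practice this choice would be tuned by domain experts or validated against labeled data.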

If the average volume in decibels over a given period, taken from the biometric voice signal sent from the user terminal to the emotion analysis server, falls at P1 in the drawing, the customized emotion value can be inferred to be helplessness. Likewise, a normal (Gaussian) distribution is obtained from the big data collected for the pitch frequency, boundary values are set, and one emotion, P2, is obtained from among the fourth to sixth emotions. A normal distribution is then obtained from the big data collected for the speech speed, boundary values are set, and one emotion, P3, is obtained from among the seventh to ninth emotions. Finally, of the three emotions P1, P2, and P3, the one that deviates furthest beyond its boundary value (P1, helplessness, in the drawing) can be inferred as the customized emotion value corresponding to the biometric voice signal received from the user terminal. As shown in FIG. 12, nine such customized emotion values may be defined and applied. Of course, the plurality of emotion types and the boundary values set on the normal distribution of the biometric voice signal may be optimized in various ways by psychology or emotion experts or by statistical techniques.
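The selection among P1, P2, and P3 can be sketched as below. Because decibels, hertz, and speech speed are in different units, the sketch measures how far a value lies beyond its boundary in units of that feature's standard deviation; this normalization is an assumption, as the source does not state how deviations are compared. The `Feature` class, all numeric values, and the placeholder labels for the fourth to ninth emotions are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    low: float     # lower boundary value
    high: float    # upper boundary value
    sigma: float   # standard deviation of this feature in the collected data
    labels: tuple  # emotions for (below low, between, above high)

def candidate(feature, value):
    """Return (emotion, excess beyond the boundary in standard deviations)."""
    if value < feature.low:
        return feature.labels[0], (feature.low - value) / feature.sigma
    if value > feature.high:
        return feature.labels[2], (value - feature.high) / feature.sigma
    return feature.labels[1], 0.0

def customized_emotion(features, values):
    """Pick, among the per-feature candidates, the one furthest beyond its boundary.

    On a tie (for example, all values within bounds) the first feature wins.
    """
    candidates = [candidate(f, values[f.name]) for f in features]
    return max(candidates, key=lambda c: c[1])[0]

FEATURES = [
    Feature("volume", 50, 70, 10, ("helplessness", "comfort", "excitement")),
    Feature("pitch", 120, 220, 40, ("fourth emotion", "fifth emotion", "sixth emotion")),
    Feature("speed", 3, 6, 1, ("seventh emotion", "eighth emotion", "ninth emotion")),
]
result = customized_emotion(FEATURES, {"volume": 35, "pitch": 230, "speed": 4.5})
```

In this example the volume lies 1.5 standard deviations below its lower boundary while the pitch exceeds its upper boundary by only 0.25, so the volume-based candidate is selected, matching the case in the drawing where P1 is chosen.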

The present invention is not limited to the embodiments described above, and it is apparent that many modifications can be made by a person of ordinary skill in the art within the technical spirit of the present invention.

Claims

A user terminal that extracts user voice information, including pitch frequency, volume in decibels, and speech speed, from voice data input by a user, transmits the user voice information to an emotion analysis server, and receives from the emotion analysis server and displays a user-customized emotion value based on the user voice information;
An emotion analysis server that calculates an objective emotion value, indicating an objective emotion, based on the user voice information received from the user terminal, and calculates a user-specific emotion value by comparing the user voice information received from the user terminal with accumulated user voice information;
An emotion control hardware device that receives a control command according to the customized emotion value calculated by the emotion analysis server and performs an operation capable of affecting the user's emotion adjustment;
An emotion value application server that receives the customized emotion value calculated by the emotion analysis server and uses it as reference material; and
A wired/wireless communication network that connects the user terminal, the emotion analysis server, the emotion control hardware device, and the emotion value application server to provide wired or wireless communication.

The system of claim 1,
wherein the emotion control hardware device
is capable of Internet of Things (IoT) communication and includes a configuration capable of executing a command received from the emotion analysis server.

The system of claim 1,
wherein the emotion control hardware device
is one selected from an air conditioning system, including an air conditioner, a heater, and a humidity control device, that controls the environment to which the user is exposed.

The system of claim 1,
wherein the emotion control hardware device
is an artificial intelligence speaker that has artificial intelligence mounted inside and can communicate with the user.

The system of claim 1,
wherein the emotion analysis server comprises:
An objective emotion value DB in which an objective emotion value is assigned and stored for each item of voice information, including voice frequency, voice decibel level, and speech speed;
A user voice accumulation information DB in which the user voice information received from the user terminal is accumulated and stored;
An objective emotion value analysis unit that extracts, from the objective emotion value DB, the objective emotion value matching the user voice information received from the user terminal;
An emotion weight calculator that extracts user voice average information, including the average voice frequency, the average voice decibel level, and the average speech speed, from the user voice accumulation information DB, and calculates an emotion weight by comparing the user voice information received from the user terminal with the user voice average information; and
A user-customized emotion value providing unit that generates a user-customized emotion value by adding the emotion weight to the objective emotion value and transmits the generated value to the user terminal.

The system of claim 1,
wherein the user terminal comprises:
A terminal input unit that receives the voice data;
A terminal display unit that displays the user-customized emotion value;
A user voice information providing unit that extracts user voice information, including voice frequency, voice decibel level, and speech speed, from the voice data input by the user and transmits the extracted user voice information to the emotion analysis server; and
A terminal controller that receives the user-customized emotion value based on the user voice information from the emotion analysis server and displays it on the terminal display unit.

The system of claim 6,
wherein the user terminal
further includes a background music DB storing background music assigned, by stress index, to relieve tension, and
wherein the terminal controller,
when the stress index of the user-customized emotion value exceeds a preset threshold stress, extracts from the background music DB and plays the background music assigned to the user's stress index, and, receiving the user's stress index in real time, switches playback to the background music matching the received stress index.

The system of claim 1,
wherein the user terminal
further includes a stress relaxation image DB storing stress relaxation images assigned to relieve stress, and
wherein the terminal controller, when the stress index of the user-customized emotion value exceeds a preset threshold stress, extracts from the stress relaxation image DB the stress relaxation image assigned to the user's stress index and displays it through the display unit, and, receiving the user's stress index in real time, switches the display to the stress relaxation image corresponding to the received stress index.

Providing, by a user, voice information through a terminal;
Analyzing, in the terminal, the provided voice information as a voice biosignal consisting of volume in decibels, pitch frequency, and speech speed;
Transmitting the analyzed voice biosignal to an emotion analysis server;
Collecting, in the emotion analysis server, big data on the voice biosignals accumulated by the individual who provided the voice information;
Performing, in the emotion analysis server, deep learning by supervised learning on the collected big data by running an emotion analysis algorithm;
Classifying the user's emotion over a predetermined period into one of a plurality of basic emotions according to the volume in decibels, the pitch frequency, and the speech speed of the voice biosignal; and
Transmitting the emotion classified as one of the plurality of basic emotions to an emotion value application server connected to the emotion analysis server through a wired/wireless communication network.

The method of claim 9,
wherein the deep learning by the supervised learning comprises:
Determining boundary values of a first emotion, a second emotion, and a third emotion on a scatter plot of the per-period mean values of the volume in decibels;
Determining boundary values of a fourth emotion, a fifth emotion, and a sixth emotion on a scatter plot of the per-period mean values of the pitch frequency; and
Determining boundary values of a seventh emotion, an eighth emotion, and a ninth emotion on a scatter plot of the per-period mean values of the speech speed.

The method of claim 10,
wherein classifying the user's emotion over a predetermined period into one of a plurality of basic emotions comprises:
Obtaining one emotion (P1) calculated based on the volume in decibels;
Obtaining another emotion (P2) calculated based on the pitch frequency;
Obtaining a further emotion (P3) calculated based on the speech speed; and
Selecting, from the three emotions P1, P2, and P3 thus obtained, the one that deviates furthest beyond its boundary value.

The method of claim 9,
further comprising, before transmitting the emotion classified as one of the plurality of basic emotions to the emotion value application server,
Transmitting a control command corresponding to the emotion classified as one of the plurality of basic emotions to an emotion control hardware device connected to the emotion analysis server through a wired/wireless communication network.

The method of claim 12,
wherein the emotion control hardware device
is one selected from a group of devices capable of affecting the user's emotion, the group consisting of an artificial intelligence speaker and an air conditioning device that controls temperature and humidity.