KR20180034927A - Communication terminal for analyzing call speech

Info

Publication number: KR20180034927A
Application number: KR1020160124772A
Authority: KR
Inventors: 권용준; 한택진
Original assignee: 주식회사 이엠텍
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2018-04-05

Abstract

The present invention relates to a communication terminal that analyzes the call voices of callers during a telephone call in order to provide content according to the callers' emotions. The communication terminal comprises: an input unit which receives input from a user; a display unit which displays a user interface and content for the terminal's native functions (for example, telephone communication, content playback, and Internet search); a speaker which emits sound; a microphone which acquires sound; a communication unit which performs wired and/or wireless communication with an external communication terminal, communication system, communication server, etc. (collectively referred to as a communication device); a storage unit which stores telephone number information, content (for example, music data and video/still image data), call voices, etc.; a signal processing unit which analyzes a call voice to extract a feature vector (input vector); an emotion recognition unit which receives the feature vector and recognizes the emotion of the caller; and a control unit which controls the above components to perform the terminal's native functions, acquires the emotion information (emotion output) that the signal processing unit and the emotion recognition unit analyze and recognize from the caller's call voice during a call, selects the type of content based on the recognized emotion, and provides the content through the display unit and/or the speaker.

Description

The present invention relates to a communication terminal, and more particularly to a communication terminal for analyzing a call voice, which analyzes the call voices of callers during a telephone call and provides content according to the callers' emotions.

Users of mobile communication terminals such as today's smartphones are offered services for sending emoticon data to another party's telephone number. The emoticons may be text emoticons, graphic emoticons, or animated emoticons, and they carry emotion-expression data built from distinctive characters, numbers, and symbols that express the user's own feelings and personality.

An emoticon is made by combining distinctive characters, symbols, numbers, and the like, and is used to convey one's feelings or intentions when sending mail or chatting on the Internet, or when sending a text message (SMS) from a mobile communication terminal.

The emoticons transmitted during Internet chat or e-mail exchange, or when sending and receiving SMS text messages on a mobile communication terminal, include text emoticons, graphic emoticons, and sound emoticons that contain the user's own emotion-expression data.

Accordingly, by setting in advance an emoticon or memo that indicates his or her own mood and sending it to the telephone number of a specific person (family member, partner, friend, etc.) or of a recipient designated by the user, the user has been offered a more advanced emotional service.

However, with this method of providing an emotional service, the specific persons can learn the user's mood in advance only if the user actually sends it to them.

An object of the present invention is to provide a communication terminal for analyzing a call voice that analyzes the call voice during a voice call between callers, recognizes the callers' emotions, and can provide content according to the recognized emotions.

The communication terminal of the present invention comprises: an input unit that acquires input from a user; a display unit that displays a user interface and content for the terminal's native functions (for example, telephone communication, content playback, and Internet search); a speaker that emits sound; a microphone that acquires sound; a communication unit that performs wired and/or wireless communication with an external communication terminal, communication system, communication server, etc. (collectively referred to as a communication device); a storage unit that stores telephone number information, content (for example, music data and video/still image data), call voices, etc.; a signal processing unit that analyzes a call voice to extract a feature vector (input vector); an emotion recognition unit that receives the feature vector and recognizes the emotion of the caller; and a control unit that, while controlling the above components to perform the terminal's native functions, acquires the emotion information (emotion output) that the signal processing unit and the emotion recognition unit analyze and recognize from the caller's call voice during a call, selects the type of content based on the recognized emotion, and provides it through the display unit and/or the speaker.

The present invention analyzes the call voice during a voice call between callers, recognizes the callers' emotions, and provides content according to the recognized emotions, and thereby has the effect of helping to heal the caller's emotions.

FIG. 1 is a block diagram of a communication terminal for analyzing a call voice.
FIG. 2 is a flowchart of a call voice analysis method in the communication terminal of FIG. 1.

Hereinafter, the present invention is described in detail with reference to an embodiment and the drawings.

FIG. 1 is a block diagram of a communication terminal for analyzing a call voice. The communication terminal of the present invention comprises: an input unit 1 that acquires input from a user (for example, input of telephone numbers and characters, and input selecting or deselecting call voice analysis); a display unit 2 that displays a user interface and content for the terminal's native functions (for example, telephone communication, content playback, and Internet search); a speaker 3 that emits sound; a microphone 4 that acquires sound from outside the communication terminal; a communication unit 5 that performs wired and/or wireless communication with an external communication terminal, communication system, communication server, etc. (collectively referred to as a communication device); a storage unit 6 that stores telephone number information, content (for example, music data and video/still image data), call voices, etc.; a signal processing unit 11 that analyzes a call voice to extract a feature vector (input vector); an emotion recognition unit 13 that receives the feature vector and recognizes the emotion of the caller; and a control unit 20 that, while controlling the above components to perform the terminal's native functions, acquires the emotion information (emotion output) that is analyzed and recognized from the caller's call voice during a call through the operation of the signal processing unit 11 and the emotion recognition unit 13, selects the type of content based on the recognized emotion, and provides it through the display unit 2 and/or the speaker 3. The power supply unit (not shown), input unit 1, display unit 2, speaker 3, microphone 4, and communication unit 5 are technologies that a person of ordinary skill in the art to which the invention belongs would readily recognize, so their description is omitted.

First, the signal processing unit 11 receives from the control unit 20 the call voice stored in the storage unit 6 and extracts feature vectors from it, such as the energy distribution, magnitude, variation width, MFCC (mel-frequency cepstral coefficients), and center frequency. Here, the energy distribution means the standard deviation of the per-frequency energy of the call voice, the magnitude means the per-frequency magnitude, and the variation width means the attack time or decay time of the call voice in the time domain. MFCC exploits the nonlinear frequency characteristics of the human ear: the signal processing unit 11 performs an FFT, divides the frequency band into several filter banks, and computes the energy in each bank to extract the MFCC.
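The feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses a naive DFT in place of an FFT so that it needs only the standard library, reads "center frequency" as the power-weighted mean frequency, finishes the MFCC with the conventional log and DCT-II steps that the description does not spell out, and omits the magnitude and attack/decay features. All function names, the filter count, and the coefficient count are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one frame; a naive DFT stands in for the FFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def energy_spread(power):
    """Standard deviation of the per-frequency energy (the 'energy distribution')."""
    mean = sum(power) / len(power)
    return math.sqrt(sum((p - mean) ** 2 for p in power) / len(power))

def center_frequency(power, sample_rate):
    """Power-weighted mean frequency in Hz (an assumed reading of 'center frequency')."""
    total = sum(power)
    if total == 0:
        return 0.0
    bin_hz = (sample_rate / 2.0) / (len(power) - 1)
    return sum(b * bin_hz * p for b, p in enumerate(power)) / total

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filterbank_energies(power, sample_rate, num_filters):
    """Energy under triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    # Filter edges expressed as (fractional) spectrum bin positions.
    edges = [mel_to_hz(m) / nyquist * (len(power) - 1) for m in mels]
    energies = []
    for f in range(1, num_filters + 1):
        lo, mid, hi = edges[f - 1], edges[f], edges[f + 1]
        total = 0.0
        for b, p in enumerate(power):
            if lo < b <= mid:
                total += p * (b - lo) / (mid - lo)
            elif mid < b < hi:
                total += p * (hi - b) / (hi - mid)
        energies.append(total)
    return energies

def mfcc(power, sample_rate, num_filters=12, num_coeffs=5):
    """Log filter-bank energies followed by a DCT-II (conventional MFCC recipe)."""
    energies = filterbank_energies(power, sample_rate, num_filters)
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return [
        sum(v * math.cos(math.pi * c * (m + 0.5) / num_filters) for m, v in enumerate(log_e))
        for c in range(num_coeffs)
    ]

def extract_features(frame, sample_rate):
    """Feature vector: [energy spread, center frequency, MFCC...]."""
    power = power_spectrum(frame)
    return [energy_spread(power), center_frequency(power, sample_rate)] + mfcc(power, sample_rate)

# Usage: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]
features = extract_features(tone, SAMPLE_RATE)
```

For the pure tone in the usage lines, the center frequency entry lands on the tone's own frequency, which is a quick sanity check on the spectrum code. A real terminal would process many overlapping frames of speech rather than a single frame.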

Next, the emotion recognition unit 13 consists of an input layer that receives the feature vectors from the signal processing unit 11, a hidden layer that receives each of the feature vectors from the input layer and classifies the emotion, and an output layer that applies the emotion information classified by the hidden layer (emotion output, i.e., the type of emotion) to the control unit 20. The hidden layer is modeled or trained with a pre-stored emotion recognition algorithm, emotion recognition objective function, or the like, and classifies the emotion from the feature vectors; the emotions include, for example, anger, joy, and depression.
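The input, hidden, and output layers can be illustrated with a small feed-forward classifier. This is only a sketch: the description does not state the activation functions, layer sizes, or training procedure, so the tanh hidden layer, the softmax output, and the placeholder weights below are all assumptions, and a real unit would load trained parameters.

```python
import math

# Example emotion labels taken from the description.
EMOTIONS = ["anger", "joy", "depression"]

def dense(inputs, weights, biases):
    """One fully connected layer: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_emotion(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (tanh, assumed) -> output layer (softmax, assumed)."""
    hidden = [math.tanh(v) for v in dense(features, hidden_w, hidden_b)]
    probs = softmax(dense(hidden, out_w, out_b))
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs

# Usage with placeholder (untrained) weights for a 2-dimensional feature vector.
hidden_w = [[1.0, 0.0], [0.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
out_b = [0.0, 0.0, 0.0]
label, probs = classify_emotion([1.0, -1.0], hidden_w, hidden_b, out_w, out_b)
```

The returned label plays the role of the emotion output that the output layer applies to the control unit 20.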

The control unit 20 controls the above components to perform the terminal's native functions. In particular, during a call with another user by telephone or through another program (application), it continuously stores in the storage unit 6 the call voice of the user (first caller) acquired through the microphone 4 and/or the call voice of the user of the communication device (second caller), which is transmitted from the communication device through the communication unit 5 and emitted through the speaker 3. It applies the stored call voice to the signal processing unit 11 so that the feature vectors are extracted, controls the signal processing unit 11 so that the extracted feature vectors are applied to the emotion recognition unit 13, acquires the emotion information from the emotion recognition unit 13, reads the content corresponding to the emotion information from the storage unit 6, and provides it to the user through the display unit 2 and/or the speaker 3. The content (or content type) corresponding to the emotion information is music or a background screen. For example, for a depressed emotion, dance or R&B music is selected and played through the speaker 3 by an internal playback program, and for an angry emotion, a background screen showing a nature scene is displayed on the display unit 2.
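The mapping from emotion to content can be pictured as a simple lookup table. The sketch below encodes only the two examples given in the description (dance or R&B music for depression, a nature background screen for anger); the dictionary layout, key names, and function names are hypothetical, and no content is assumed for emotions the description does not cover.

```python
# Emotion -> content type, limited to the examples in the description.
CONTENT_BY_EMOTION = {
    "depression": {"type": "music", "genres": ["dance", "R&B"]},
    "anger": {"type": "background", "theme": "nature"},
}

# Which output each content type uses (reference numerals from FIG. 1).
OUTPUT_BY_TYPE = {
    "music": "speaker 3",
    "background": "display unit 2",
}

def select_content(emotion):
    """Return the content entry for an emotion, or None if no mapping is defined."""
    return CONTENT_BY_EMOTION.get(emotion)

def output_for(content):
    """Return the output unit used to present a content entry."""
    return OUTPUT_BY_TYPE[content["type"]]

# Usage
chosen = select_content("depression")
target = output_for(chosen)
```

Keeping the mapping in a table rather than in branching code would make it easy to let the user adjust which content is offered for which emotion, although the description does not mention such a setting.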

In addition, the control unit 20 may analyze not only the emotion of the first caller but also the emotion of the second caller, and transmit the emotion information to the second caller's communication device (communication terminal). The second caller's communication device may display the received emotion of the second caller on its display unit, or may select content stored in the communication device and play it through its display unit or speaker.
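The description does not specify how the emotion information is encoded for transmission. Purely as an illustration, it could be serialized as a small JSON payload; the message fields and function names below are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EmotionMessage:
    """Hypothetical payload; the field names are assumptions, not from the source."""
    caller: str   # whose voice was analyzed, e.g. "second"
    emotion: str  # e.g. "anger", "joy", "depression"

def encode_emotion_message(message):
    """Serialize the message to UTF-8 JSON bytes for the communication unit to send."""
    return json.dumps(asdict(message)).encode("utf-8")

def decode_emotion_message(payload):
    """Rebuild the message on the receiving communication device."""
    return EmotionMessage(**json.loads(payload.decode("utf-8")))

# Usage: round trip between the two terminals.
sent = EmotionMessage(caller="second", emotion="depression")
received = decode_emotion_message(encode_emotion_message(sent))
```

On receipt, the second caller's device could either show the emotion directly or feed it into its own content selection, matching the two options in the paragraph above.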

FIG. 2 is a flowchart of a call voice analysis method in the communication terminal of FIG. 1.

After obtaining, through the input unit 1, the user's input selecting call voice analysis, the control unit 20 performs the call voice analysis method of FIG. 2. The call voice analysis method is performed, for example, independently of the call function.

In step S1, the control unit 20 determines whether a call with a communication device is started through the communication unit 5 in response to a call input from the input unit 1. If a call is started, the process proceeds to step S3; otherwise, step S1 is repeated until a call starts.

In step S3, once the call has started, the control unit 20 stores in the storage unit 6 the call voice of the first caller acquired through the microphone 4 and/or the call voice of the second caller transmitted from the communication device through the communication unit 5. The control unit 20 may store the first caller's call voice and the second caller's call voice separately, or may store the entire call merged into a single file.

In step S5, while storing the call voice, the control unit 20 determines whether the call has ended. If the call has ended, the process proceeds to step S7; otherwise, the control unit continues to store the call voice.

In step S7, the control unit 20 causes the stored call voice to be analyzed. The control unit 20 reads the stored call voice from the storage unit 6 and applies it to the signal processing unit 11. The signal processing unit 11 extracts feature vectors from the applied call voice and inputs the extracted feature vectors to the emotion recognition unit 13. The emotion recognition unit 13 recognizes the emotion from the input feature vectors and applies the recognized emotion information to the control unit 20.

In step S9, the control unit 20 determines whether the analysis of the call voice is complete based on whether it has received the emotion information from the emotion recognition unit 13. If the control unit 20 has received the emotion information from the emotion recognition unit 13, it determines that the analysis of the call voice is complete and proceeds to step S11; otherwise, it waits for step S7 to finish.

In step S11, after acquiring the emotion information, the control unit 20 deletes the call voice stored in the storage unit 6 to free storage space in the storage unit 6.

In step S13, the control unit 20 analyzes the acquired emotion information, selects the content corresponding to the emotion information, and provides it through the speaker 3 and/or the display unit 2. By being provided with such content, the user can receive emotional healing.
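Steps S1 to S13 can be summarized in a short control-flow sketch. It is illustrative only: the function is assumed to be invoked once a call has started (S1), the end of the audio stream stands in for the end of the call (S5), and the three callables passed in are hypothetical stand-ins for the signal processing unit 11, the emotion recognition unit 13, and the content selection logic.

```python
def run_call_voice_analysis(call_chunks, extract_features, classify, select_content):
    """Sketch of the FIG. 2 flow; returns (selected content, remaining stored samples)."""
    stored_voice = []  # stands in for the call voice held in storage unit 6

    # S3 / S5: keep storing the call voice until the call ends
    # (modeled here as the chunk iterator running out).
    for chunk in call_chunks:
        stored_voice.extend(chunk)

    # S7: read back the stored voice, extract features, recognize the emotion.
    features = extract_features(stored_voice)
    emotion = classify(features)

    # S9: the analysis counts as complete only once emotion information is received.
    if emotion is None:
        return None, stored_voice

    # S11: delete the stored call voice to free storage space.
    stored_voice.clear()

    # S13: select the content that corresponds to the emotion information.
    return select_content(emotion), stored_voice

# Usage with stub components (all hypothetical).
content, remaining = run_call_voice_analysis(
    call_chunks=[[0.1, 0.2], [0.3]],
    extract_features=lambda samples: [len(samples)],
    classify=lambda feats: "depression" if feats[0] == 3 else None,
    select_content=lambda emotion: "dance music" if emotion == "depression" else None,
)
```

Note that the stored voice is deleted only after the emotion information has been obtained, mirroring the order of steps S9 and S11.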

As described above, the present invention is not limited to the specific preferred embodiment described. Anyone of ordinary skill in the art to which the invention belongs can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

1: input unit 2: display unit
3: speaker 4: microphone
5: communication unit 6: storage unit
11: signal processing unit 13: emotion recognition unit
20: control unit

Claims

A communication terminal for analyzing a call voice, comprising:
a communication unit for performing a voice call between a first caller, who is the user, and another caller;
a microphone for acquiring the call voice of the first caller;
a storage unit for storing the call voice of the first caller during the voice call and content corresponding to emotion information;
a signal processing unit for extracting feature vectors from the call voice read from the storage unit;
an emotion recognition unit that receives the extracted feature vectors, classifies the emotion of the first caller, and outputs emotion information;
an output unit for presenting the content visually or audibly; and
a control unit that controls the signal processing unit and the emotion recognition unit, reads the content corresponding to the output emotion information from the storage unit, and presents it through the output unit.

The communication terminal according to claim 1,
wherein the control unit stores in the storage unit the call voice of the first caller acquired from the microphone during the voice call performed by the communication unit.

The communication terminal according to claim 1,
wherein the feature vectors include at least one of an energy distribution, a per-frequency magnitude, a variation width, MFCC, and a center frequency.

The communication terminal according to claim 1,
wherein the control unit acquires the call voice of the other caller from the communication unit and stores it in the storage unit, controls the signal processing unit and the emotion recognition unit to acquire emotion information of the other caller, and transmits it through the communication unit to the communication device of the other caller.