KR101066228B1

KR101066228B1 - Emotion classification system and method thereof

Info

Publication number: KR101066228B1
Application number: KR1020090030414A
Authority: KR
Inventors: 박규식; 윤원중
Original assignee: 단국대학교 산학협력단
Priority date: 2009-04-08
Filing date: 2009-04-08
Publication date: 2011-09-21
Also published as: KR20100111928A

Abstract

An emotion classification system and an emotion classification method are disclosed. The system and method first classify the emotion of the sender of a voice signal into one of a plurality of emotion groups, based on emotion vectors extracted from the voice signal and on a range of at least one first feature vector value extracted from training voice signals. They then classify the emotion of the sender once more within that emotion group, based on the emotion vectors of the voice signal and on a range of at least one second feature vector value extracted from the training voice signals.

Emotion classification, feature vector, pitch, energy, mel-frequency cepstral coefficients (MFCC)

Description

Emotion classification system and method {EMOTION CLASSIFICATION SYSTEM AND METHOD THEREOF}

The present invention relates to emotion classification technology and, more particularly, to an emotion classification system and method capable of classifying the emotional state of a sender based on feature vectors extracted from a voice signal.

Interest in emotion classification and emotion classification systems is growing in fields that must respond flexibly to customers' emotions, such as call centers, matchmaking agencies, and the mobile content industry. Emotional information can be obtained from human behavior through facial expressions, voice, gestures, heart rate, blood pressure, body temperature, brain waves, and the like. Among these, voice-based emotion classification is being actively researched because voice signals are relatively convenient to capture and process.

An emotion classification system is generally trained on voice signals recorded for emotion training, whereas queries from an unspecified number of customers are in most cases recorded in entirely different environments. Classifying customers' emotions from their query speech with such a system therefore inevitably suffers from degraded system performance and inaccurate emotion classification caused by the difference in recording environments.

Accordingly, the technical problem to be solved by the present invention is to provide an emotion analysis system and method that, although trained on training voice signals, achieve high system efficiency and emotion classification accuracy even when classifying the emotions of an unspecified number of voice signal senders.

An emotion classification system for achieving the above technical problem may include a feature vector extraction block, a feature vector storage block, and an emotion classification block. The feature vector extraction block may extract feature vectors of a received voice signal.

Based on the feature vectors that the feature vector extraction block extracts from training voice signals, the feature vector storage block may store (i) a range of at least one first feature vector value corresponding to each of a plurality of emotion groups, where the plurality of emotion groups includes at least one emotion group containing at least two of a plurality of emotions, and (ii) a range of at least one second feature vector value corresponding to each of the at least two emotions included in the at least one emotion group.

The emotion classification block may generate an emotion classification result for the voice signal sender based on the feature vectors of the voice signal and on the range of the at least one first feature vector value and the range of the second feature vector value stored in the feature vector storage block.

The emotion classification block may include a first emotion classification block and a second emotion classification block. The first emotion classification block may classify the emotion of the voice signal sender into one of the plurality of emotion groups based on the emotion vectors of the voice signal and the range of the at least one first feature vector value.

The second emotion classification block may classify the at least two emotions included in the at least one emotion group based on the emotion vectors of the voice signal and the range of the at least one second feature vector value.

For example, the feature vector storage block may store a range of the average pitch value corresponding to each of the plurality of emotion groups, which includes at least one emotion group containing at least two of the plurality of emotions, together with a range of energy and a range of MFCC corresponding to each of the at least two emotions included in the at least one emotion group.

In that case, the first emotion classification unit may classify the emotion of the voice signal sender into one of the plurality of emotion groups based on the average pitch value of the voice signal and the range of the average pitch value stored in the feature vector storage block. The second emotion classification unit may classify the at least two emotions included in that emotion group based on the average energy value and the average MFCC value of the voice signal and on the range of energy and the range of MFCC stored in the feature vector storage block.

An emotion classification system according to an embodiment of the present invention may include a feature vector extraction block, a feature vector storage block, and an emotion classification block. The feature vector extraction block may extract the average pitch value, the energy, and the MFCC of a received voice signal.

The feature vector storage block may classify the average pitch values of the training voice signals extracted by the feature vector extraction block into ranges of the average pitch value corresponding to male neutral, male angry and female neutral, and female angry, respectively, and store them. It may also classify the energy and MFCC of the training voice signals extracted by the feature vector extraction block into ranges of energy and ranges of MFCC corresponding to male angry and female neutral, respectively, and store them.

The emotion classification block may generate an emotion classification result for the voice signal sender based on the average pitch value, energy, and MFCC of the voice signal and on the range of the average pitch value, the range of energy, and the range of MFCC stored in the feature vector storage block.

The emotion classification block may include a first emotion classification unit and a second emotion classification unit. The first emotion classification unit may classify the emotion of the voice signal sender into one of three emotion groups (male neutral, male angry and female neutral, or female angry) based on the average pitch value of the voice signal and the range of the average pitch value stored in the feature vector storage block.

The second emotion classification unit may classify the "male angry and female neutral" emotion group into a single emotion, either male angry or female neutral, based on the energy and MFCC of the voice signal and on the range of energy and the range of MFCC stored in the feature vector storage block.

An emotion classification method for solving the above technical problem may include: extracting feature vectors of a received voice signal; storing, based on feature vectors extracted from training voice signals, a range of at least one first feature vector value corresponding to each of a plurality of emotion groups, where the plurality of emotion groups includes at least one emotion group containing at least two of a plurality of emotions, and a range of at least one second feature vector value corresponding to each of the at least two emotions included in the at least one emotion group; and generating an emotion classification result for the voice signal sender based on the feature vectors of the voice signal and on the range of the at least one first feature vector value and the range of the second feature vector value stored in the feature vector storage block.

Generating the emotion classification result may include: classifying the emotion of the voice signal sender into one of the plurality of emotion groups based on the emotion vectors of the voice signal and the range of the at least one first feature vector value; and classifying the at least two emotions included in the at least one emotion group based on the emotion vectors of the voice signal and the range of the at least one second feature vector value.

For example, storing the range of the first feature vector and the range of the second feature vector may include storing a range of the average pitch value corresponding to each of the plurality of emotion groups, which includes at least one emotion group containing at least two of the plurality of emotions, together with a range of energy and a range of MFCC corresponding to each of the at least two emotions included in the at least one emotion group.

In that case, generating the emotion classification result for the voice signal sender may include: classifying the emotion of the voice signal sender into one of the plurality of emotion groups based on the average pitch value of the voice signal and the range of the average pitch value; and classifying the at least two emotions included in that emotion group based on the energy and MFCC of the voice signal and on the range of energy and the range of MFCC stored in the feature vector storage block.

The emotion classification method according to an embodiment of the present invention may be implemented by executing a computer program, stored on a computer-readable recording medium, for carrying out the emotion classification method.

As described above, the emotion classification system and method according to an embodiment of the present invention classify the emotion of a voice signal sender into one of a plurality of emotion groups, each containing at least one emotion, according to the range of a feature vector of the voice signal, and then classify that first-stage result further based on the ranges of other feature vectors. This mitigates the degradation in system performance and the inaccuracy in emotion classification that can arise from differences between the recording environment of the training voice signals and that of the sender's voice signal.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the content described in those drawings.

In this specification, when one component "transmits" data or a signal to another component, this means that the component may transmit the data or signal directly to the other component, or may transmit it to the other component via at least one further component.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1 is a block diagram of an emotion classification system 100 according to an embodiment of the present invention. Referring to FIG. 1, the emotion classification system 100 includes a feature vector extraction block 110, a feature vector storage block 120, an emotion classification block 130, and a controller 140. The overall operation of the emotion classification system 100 may be controlled by the controller 140.

The feature vector extraction block 110 may extract feature vectors of a received voice signal. The feature vectors of the voice signal may include the pitch, the energy, the speech rate, and the MFCC (mel-frequency cepstral coefficients), which comprise a number of mathematical coefficients for modeling the voice signal, but the present invention is not limited thereto.

FIG. 2 is a block diagram of the feature vector extraction block 110 shown in FIG. 1. Referring to FIG. 2, the feature vector extraction block 110 may include a voice signal division unit 111, a sampling window application unit 112, a non-speech section removal unit 113, and a feature vector generation block 114.

The voice signal division unit 111 may divide the received voice signal into frames of a predetermined size and output them. Dividing the voice signal into smaller units and extracting feature vectors from the divided signal improves the accuracy of feature vector extraction.

The sampling window application unit 112 may multiply the framed voice signal by windows that have a predetermined overlap and output the result. For example, the sampling window application unit 112 may multiply the framed voice signal by a Hamming window that overlaps each neighboring frame by 50%, which reduces the influence of unwanted high-frequency components on feature vector extraction. If a rectangular window were applied instead, the high-frequency components introduced at the edges of the rectangle could impair the accuracy of feature vector extraction.
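
The framing and windowing described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `hamming` and `frame_signal`, the frame length, and the use of the conventional 0.54/0.46 Hamming coefficients are assumptions that do not appear in the specification.

```python
import math


def hamming(n):
    """Return an n-point Hamming window (conventional 0.54 / 0.46 coefficients)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, frame_len, overlap=0.5):
    """Split samples into frames of frame_len samples with fractional overlap,
    then multiply each frame element-wise by a Hamming window.

    frame_len and overlap are illustrative parameters (assumptions).
    """
    hop = max(1, int(frame_len * (1 - overlap)))  # 50% overlap -> hop of half a frame
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

With a 16-sample input and `frame_len=8`, the hop is 4 samples, so consecutive frames share half of their samples, and the window tapers each frame toward its edges.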

The non-speech section removal unit 113 may extract the speech sections from the voice signal. Removing non-speech sections from the voice signal before feature vector extraction improves system performance. The non-speech section removal unit 113 may be placed before or after voice signal division, or before or after sampling window application. The feature vector generation block 114 may extract the feature vectors of the voice signal.
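
The specification does not state how non-speech sections are detected. One common approach, shown here purely as an assumed example, is to drop frames whose short-time energy falls below a fraction of the loudest frame's energy. The function name `remove_non_speech` and the `threshold_ratio` parameter are hypothetical.

```python
def remove_non_speech(frames, threshold_ratio=0.1):
    """Keep only frames whose short-time energy is at least
    threshold_ratio times the energy of the loudest frame.

    Energy thresholding is an assumed detection rule, not taken from the patent.
    """
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies:
        return []
    peak = max(energies)
    if peak == 0:
        return []  # the whole signal is silent
    cutoff = threshold_ratio * peak
    return [frame for frame, e in zip(frames, energies) if e >= cutoff]
```

A fixed ratio is the simplest possible rule; a deployed system would likely need a threshold that adapts to background noise.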

FIG. 3 is a block diagram of the feature vector generation block 114 shown in FIG. 2. Referring to FIG. 3, the feature vector generation block 114 may include a pitch extraction unit 115, an energy extraction unit 116, an MFCC (mel-frequency cepstral coefficients) extraction unit 117, and a feature vector generation unit 119. The pitch extraction unit 115 may extract and output the pitch of the voice signal. Here, pitch means the frequency of the voice signal and is one of the most basic and important feature vectors for classifying the emotion of the voice signal sender.
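
The specification does not name a pitch extraction algorithm. As one assumed possibility, the fundamental frequency of a frame can be estimated from the lag that maximizes its autocorrelation. The function name `estimate_pitch` and the 60 to 400 Hz search band are illustrative choices, not values from the patent.

```python
def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the pitch of one frame in Hz by autocorrelation.

    Searches lags corresponding to fmin..fmax (an assumed speech range) and
    returns sample_rate / best_lag, or 0.0 if no positive peak is found.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0
```

Averaging the per-frame estimates over an utterance gives an average pitch value of the kind used by the first classification stage.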

The energy extraction unit 116 may extract and output the energy of the voice signal. The MFCC extraction unit 117 may extract a number of mathematical coefficients for modeling the voice signal. The MFCC represent, as sinusoidal components, the shape of the per-frequency power spectrum of the voice signal expressed on the mel scale. The feature vector generation unit 119 may generate the pitch, energy, and MFCC of the voice signal together with their mean and standard deviation values. The pitch, energy, and MFCC of the voice signal and their mean and standard deviation values may be used as feature vectors for classifying the emotion of the voice signal sender.
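
The energy and summary statistics described above can be illustrated with the short sketch below. The helper names `frame_energy` and `summarize` are hypothetical, mean-square energy is one assumed definition of frame energy, and the population standard deviation is used because the specification does not say which variant is intended.

```python
import statistics


def frame_energy(frame):
    """Mean-square energy of one frame (an assumed definition of 'energy')."""
    return sum(s * s for s in frame) / len(frame)


def summarize(values):
    """Return (mean, population standard deviation) of a per-frame feature track."""
    return statistics.fmean(values), statistics.pstdev(values)
```

For example, applying `summarize` to the list of per-frame pitch estimates for an utterance yields the average pitch value and its spread for that utterance.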

The feature vector generation block 114 may further include a delta value extraction unit for extracting the temporal change in the pitch, energy, and MFCC of the voice signal. The temporal change in the pitch, energy, and MFCC of the voice signal may also be used as a feature vector for classifying the emotion of the voice signal sender.

The feature vector generation unit 119 may further generate the mean and standard deviation of the delta values of the pitch, energy, and MFCC of the voice signal, and these may likewise be used as feature vectors for classifying the emotion of the voice signal sender.
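
A delta value can be sketched as the difference between consecutive frames of a feature track. This first-difference form is an assumption: the specification does not define the delta computation, and regression-based deltas over several frames are also widely used. The function names `delta` and `delta_stats` are hypothetical.

```python
import statistics


def delta(values):
    """First-order difference between consecutive frames (assumed delta definition)."""
    return [b - a for a, b in zip(values, values[1:])]


def delta_stats(values):
    """Return (mean, population standard deviation) of the delta track.

    Requires at least two frames so that at least one delta exists.
    """
    d = delta(values)
    return statistics.fmean(d), statistics.pstdev(d)
```

Applied to a pitch track such as `[100.0, 110.0, 105.0]`, `delta` returns `[10.0, -5.0]`, and `delta_stats` summarizes how quickly the pitch is changing across the utterance.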

The feature vector storage block 120 may classify the feature vectors extracted from the training voice signals by emotion and store them. For example, the feature vector storage block 120 may classify a plurality of emotions into a plurality of emotion groups, at least one of which contains at least two emotions, and store a range of at least one feature vector value corresponding to each of the emotion groups.

In addition, for an emotion group that contains at least two emotions, the feature vector storage block 120 may store a range of at least one feature vector value corresponding to each of the emotions in that group.

Here, the plurality of emotions may include neutral, anger, joy, sadness, fear, surprise, affection, and the like, and the emotions may be further subdivided by criteria such as gender and age, but the scope of the present invention is not limited thereto.

For example, the feature vector storage block 120 may store a range of the average pitch value corresponding to each of the plurality of emotion groups, which includes at least one emotion group containing at least two of the plurality of emotions, together with a range of the average energy value and a range of the average MFCC value corresponding to each of the at least two emotions included in the at least one emotion group.

As a more specific example, the feature vector storage block 120 may classify the average pitch values of the training voice signals extracted by the feature vector extraction block 110 into ranges of the average pitch value corresponding to male neutral, male angry and female neutral, and female angry, respectively, and store them. It may also classify the energy and MFCC of the training voice signals extracted by the feature vector extraction block 110 into ranges of energy and ranges of MFCC corresponding to male angry and female neutral, respectively, and store them.
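
One way to picture the stored ranges is as a mapping from each label to a (minimum, maximum) pair derived from labeled training values. Taking the minimum and maximum is an assumption made for illustration; the specification does not say how a range is derived from the training data. The function name `build_ranges` and the label strings are hypothetical.

```python
def build_ranges(training):
    """Build {label: (min_value, max_value)} from (label, value) training pairs.

    Using the observed min and max as the range is an assumed rule.
    """
    ranges = {}
    for label, value in training:
        lo, hi = ranges.get(label, (value, value))
        ranges[label] = (min(lo, value), max(hi, value))
    return ranges
```

The same helper could be called once with average pitch values labeled by emotion group, and again with energy or MFCC values labeled by the individual emotions inside the shared group.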

The emotion classification block 130 may generate an emotion classification result for the voice signal sender based on the feature vectors of the voice signal and on the range of the at least one first feature vector value and the range of the second feature vector value stored in the feature vector storage block 120.

FIG. 4 is a block diagram of the emotion classification block 130 shown in FIG. 1. Referring to FIG. 4, the emotion classification block 130 may include a first emotion classification unit 131 and a second emotion classification unit 132. Although only two emotion classification units 131 and 132 are shown in FIG. 4, the scope of the present invention is not limited thereto.

The first emotion classification unit 131 may classify the emotion of the voice signal sender into one of the plurality of emotion groups based on the emotion vectors of the voice signal and the range of the at least one first feature vector value stored in the feature vector storage block 120. The second emotion classification unit 132 may, based on the emotion vectors of the voice signal and the range of the at least one second feature vector value stored in the feature vector storage block 120, classify the at least two emotions within an emotion group that contains multiple emotions, from among the emotion groups classified by the first emotion classification unit 131.

For example, the first emotion classification unit 131 may classify the emotion of the voice signal sender into one of the plurality of emotion groups based on the average pitch value of the voice signal and the range of the average pitch value stored in the feature vector storage block 120. The second emotion classification unit 132 may classify the at least two emotions included in that emotion group based on the energy and MFCC of the voice signal and on the range of energy and the range of MFCC stored in the feature vector storage block 120.

As a more specific example, the first emotion classification unit 131 may classify the emotion of the voice signal sender into one of three emotion groups (male neutral, male angry and female neutral, or female angry) based on the average pitch value of the voice signal and the range of the average pitch value stored in the feature vector storage block 120. The second emotion classification unit 132 may classify the "male angry and female neutral" emotion group into a single emotion, either male angry or female neutral, based on the energy and MFCC of the voice signal and on the range of energy and the range of MFCC stored in the feature vector storage block 120.

That is, the first emotion classification unit 131 may classify the emotion of the voice signal sender into three groups: male neutral, male angry and female neutral, and female angry. The second emotion classification unit 132 may then classify the emotion group containing male angry and female neutral, as classified by the first emotion classification unit 131, into a single emotion, either male angry or female neutral.
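
The two-stage flow described above can be sketched as a pair of range lookups. Everything numeric here is invented for illustration: the pitch, energy, and MFCC ranges are not taken from the patent or its figures. The names `classify`, `PITCH_RANGES`, and `SECOND_RANGES` are hypothetical, and the MFCC is reduced to a single scalar average as a simplifying assumption.

```python
# Stage 1: hypothetical average-pitch ranges (Hz) per emotion group.
PITCH_RANGES = {
    "male_neutral": (80.0, 140.0),
    "male_angry_or_female_neutral": (140.0, 230.0),
    "female_angry": (230.0, 400.0),
}

# Stage 2: hypothetical energy / MFCC ranges for emotions inside the shared group.
SECOND_RANGES = {
    "male_angry_or_female_neutral": {
        "male_angry": {"energy_mean": (0.5, 1.0), "mfcc_mean": (-5.0, 0.0)},
        "female_neutral": {"energy_mean": (0.0, 0.5), "mfcc_mean": (0.0, 5.0)},
    },
}


def in_range(value, bounds):
    """Half-open range test so adjacent ranges do not both match a boundary value."""
    lo, hi = bounds
    return lo <= value < hi


def classify(features, pitch_ranges=PITCH_RANGES, second_ranges=SECOND_RANGES):
    """Return an emotion label, or None if no stored range matches."""
    # First stage: pick the emotion group from the average pitch value.
    group = next(
        (g for g, r in pitch_ranges.items() if in_range(features["pitch_mean"], r)),
        None,
    )
    if group is None or group not in second_ranges:
        return group  # no match, or a group that holds a single emotion
    # Second stage: split the multi-emotion group using energy and MFCC.
    for emotion, ranges in second_ranges[group].items():
        if all(in_range(features[k], r) for k, r in ranges.items()):
            return emotion
    return None
```

Under these made-up ranges, an utterance with an average pitch of 180 Hz lands in the shared group at the first stage and is resolved to male angry or female neutral only at the second stage.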

FIG. 5 is a graph showing the average pitch value of voice signals by gender and emotion. FIG. 5 shows the average pitch values of voice signals in which 10 men and 10 women each uttered 30 sentences for the neutral emotion and 30 sentences for the angry emotion. As described above, the average pitch values shown in FIG. 5 may serve as the classification criterion for the first emotion classification unit 131.

Referring to FIG. 5, among the emotions, the average pitch value of the male neutral emotion and the average pitch value of the female neutral emotion have the widest overlapping range. In this case, the feature vector storage block 120 may classify the range of the average pitch value of the voice signal into three categories (male neutral, male angry and female neutral, and female angry) and store them. The first emotion classification unit 131 may then classify the emotion of the voice signal sender as the emotion corresponding to the stored range of the average pitch value, in the feature vector storage block 120, that contains the average pitch value of the voice signal.

As described with reference to FIG. 5, the emotion classification criterion of the first emotion classification unit 131 may be the average pitch of the voice signal, but the scope of the present invention is not limited thereto. For example, the first emotion classification unit 131 may classify the emotion of the voice signal sender into one of a plurality of emotion groups, each of which may contain multiple emotions, based on the ranges over which the energy, speech rate, or the like of the voice signal overlap.

FIG. 6 shows the emotion classification result of the emotion classification block shown in FIG. 1. Referring to FIG. 6, the first emotion classification unit 131 classifies the emotion of the voice signal sender into three groups: male normal; male angry and female normal; and female angry. The second emotion classification unit 132 then classifies the male angry and female normal emotion group into the individual emotions of male angry and female normal.

As described above, the emotion classification system 100 according to an embodiment of the present invention may perform a two-stage emotion classification process through the first emotion classification unit 131 and the second emotion classification unit 132. However, the scope of the present invention is not limited thereto, and the emotion classification system 100 may perform an emotion classification process of three or more stages. The number of classification stages performed by the emotion classification system 100 may increase as the number of emotion groups containing multiple emotions increases.

Meanwhile, each of the components of the emotion classification system 100 according to an embodiment of the present invention may be implemented in software, hardware, or a combination of software and hardware for carrying out the technical idea of the present invention. That is, each component may be implemented as a logical combination of predetermined code and the hardware resources on which that code runs.

FIG. 7 is a schematic flowchart of an emotion classification method according to an embodiment of the present invention. The process is described below with reference to FIGS. 1 to 4 and FIG. 7.

The feature vector extraction block 110 extracts feature vectors for each of a plurality of emotions using training voice files, and the feature vector storage block 120 stores the feature vectors for each emotion (S70). The feature vector storage block 120 may store a range of at least one feature vector value corresponding to each of a plurality of emotion groups, at least one of which contains at least two emotions. It may also store a range of at least one feature vector value corresponding to each of the emotions contained in an emotion group that contains at least two emotions.
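One way to picture the two kinds of stored ranges in step S70 is the sketch below. It assumes that a "range" is simply the minimum and maximum of a feature value over the training samples, which the description does not state; the helper `build_ranges`, the mapping `GROUP_OF`, and all sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from each emotion to its first-stage emotion group.
GROUP_OF = {
    "male_normal": "male_normal",
    "male_angry": "male_angry_or_female_normal",
    "female_normal": "male_angry_or_female_normal",
    "female_angry": "female_angry",
}


def build_ranges(samples):
    """Map each label to the (min, max) of its values; samples are (label, value) pairs."""
    grouped = defaultdict(list)
    for label, value in samples:
        grouped[label].append(value)
    return {label: (min(values), max(values)) for label, values in grouped.items()}


# Made-up training data: (emotion, mean pitch in Hz) and (emotion, energy).
pitch_samples = [
    ("male_normal", 110.0), ("male_normal", 130.0),
    ("male_angry", 190.0), ("female_normal", 210.0),
    ("female_angry", 300.0), ("female_angry", 320.0),
]
energy_samples = [
    ("male_angry", 0.7), ("male_angry", 0.9),
    ("female_normal", 0.2), ("female_normal", 0.4),
]

# First feature ranges: one per emotion group (emotions are merged into groups).
first_ranges = build_ranges((GROUP_OF[e], v) for e, v in pitch_samples)
# Second feature ranges: one per emotion inside the mixed group.
second_ranges = build_ranges(energy_samples)
```

Here `first_ranges` plays the role of the ranges stored per emotion group, and `second_ranges` the role of the ranges stored per emotion within the group that contains two emotions.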

When training of the emotion classification system 100 with the training voice files is complete, the feature vector extraction block 110 extracts the feature vectors of an input voice signal (S80). The emotion classification block 130 then classifies the emotion of the voice signal sender based on the per-emotion feature vectors stored in the feature vector storage block 120 and the feature vectors of the voice signal, and generates an emotion classification result (S90).

FIG. 8 is a detailed flowchart of an emotion classification method according to an embodiment of the present invention. More specifically, FIG. 8 is a flowchart of an emotion classification method that performs a first emotion classification based on the average pitch of the voice signal and a second emotion classification based on the average energy and the average MFCC of the voice signal. The process is described below with reference to FIGS. 1 to 4 and FIG. 8.

The feature vector extraction block 110 extracts and stores the average pitch, energy, and MFCC for each emotion using the training voice files (S70). When training of the emotion classification system 100 with the training voice files is complete, the feature vector extraction block 110 extracts the average pitch, energy, and MFCC of the input voice signal (S80).

When extraction of the feature vectors of the input voice signal is complete, the first emotion classification unit 131 performs the first emotion classification based on the per-emotion average pitch stored in the feature vector storage block 120 and the average pitch of the voice signal (S90a). For example, based on the average pitch values corresponding to the emotion groups stored in the feature vector storage block 120 and the average pitch of the voice signal, the first emotion classification unit 131 may classify the emotion of the voice signal sender into three groups: male normal; male angry and female normal; and female angry.

When the first emotion classification is complete, the second emotion classification unit 132 may perform the second emotion classification based on the per-emotion energy and MFCC stored in the feature vector storage block 120 and the energy and MFCC of the voice signal (S90b). For example, among the emotion groups classified by the first emotion classification unit 131, the group of male angry and female normal contains two emotions. For a voice signal sender classified into that group, the second emotion classification unit 132 may classify the emotion as either male angry or female normal, based on the per-emotion range of energy and range of MFCC stored in the feature vector storage block 120 and the energy and MFCC of the voice signal.
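Steps S90a and S90b together can be sketched as a two-stage decision. The sketch rests on several assumptions that are not in the description: all numeric ranges are invented, the MFCC is reduced to a single scalar for brevity (a real MFCC is a vector of coefficients), and the second stage simply counts how many of the input features fall inside each emotion's stored ranges. The names `FIRST_STAGE`, `SECOND_STAGE`, and `classify` are hypothetical.

```python
MIXED = "male_angry_or_female_normal"

# Stage 1: hypothetical mean-pitch ranges (Hz) per emotion group.
FIRST_STAGE = {
    "male_normal": (80.0, 150.0),
    MIXED: (150.0, 250.0),
    "female_angry": (250.0, 400.0),
}

# Stage 2: hypothetical energy and scalar-MFCC ranges per emotion in the mixed group.
SECOND_STAGE = {
    "male_angry": {"energy": (0.6, 1.0), "mfcc": (-5.0, 0.0)},
    "female_normal": {"energy": (0.1, 0.6), "mfcc": (0.0, 5.0)},
}


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def classify(mean_pitch, energy, mfcc):
    """Two-stage classification: pitch first, then energy and MFCC if needed."""
    # S90a: pick the first group whose pitch range contains the mean pitch.
    group = next((g for g, r in FIRST_STAGE.items() if in_range(mean_pitch, r)), None)
    if group != MIXED:
        return group  # single-emotion group (or None): no second stage needed
    # S90b: score each emotion by how many features fall in its stored ranges.
    scores = {
        emotion: in_range(energy, r["energy"]) + in_range(mfcc, r["mfcc"])
        for emotion, r in SECOND_STAGE.items()
    }
    # On a tie, max() keeps the first emotion listed in SECOND_STAGE.
    return max(scores, key=scores.get)
```

For instance, `classify(200.0, 0.8, -2.0)` falls in the mixed pitch group and is then resolved to `"male_angry"` by the second stage, while `classify(120.0, 0.5, 1.0)` is resolved to `"male_normal"` by pitch alone.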

The emotion classification method according to an embodiment of the present invention may also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The program code for performing the emotion classification method according to an embodiment of the present invention may also be transmitted in the form of a carrier wave (for example, transmission over the Internet).

The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed fashion. Functional programs, code, and code segments for implementing the emotion classification method according to an embodiment of the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

The following describes the results of evaluating the performance of the emotion classification system 100 according to an embodiment of the present invention using multiple databases.

Table 1 lists the three databases used to evaluate the performance of the emotion classification system 100.

In Table 1, speaker independence means that the uttered sentences do not overlap among the recorded speakers, and sentence independence means that the uttered sentences differ for different emotions. In real life, different people express their emotions through different sentences, so a speaker-independent and sentence-independent database is the appropriate kind of database for evaluating the performance of a real emotion classification system.

The first database DB1 consists of speaker-independent, sentence-dependent recordings for each emotion made by people trained in emotional expression (for example, actors). The second database DB2 consists of speaker-independent, sentence-independent recordings of lines for each emotion taken from drama dialogue. The third database DB3 consists of speaker-independent, sentence-independent recordings for each emotion made by people trained in emotional expression. The voice files stored in the third database DB3 are therefore the per-emotion voice files closest to emotional expression in real life.

Table 2 shows the result of training a conventional normal/angry binary emotion classification system using voice files of the third database DB3 and evaluating performance, on the basis of the normal/angry binary emotion classification method, using the remaining voice files of the third database DB3.

Referring to Table 2, the binary emotion classification system classified 493 of 500 normal voice files as normal and 480 of 500 angry voice files as angry. The accuracy of the binary emotion classification system is therefore 97.3%. This indicates that the third database DB3 is a reliably constructed database.

Table 3a shows the result of training the conventional normal/angry binary emotion classification system using voice files of the third database DB3 and evaluating the performance of the binary emotion classification system using voice files of the first database DB1.

Referring to Table 3a, the binary emotion classification system classified 500 of 500 normal voice files as normal but only 10 of 500 angry voice files as angry. The accuracy of the binary emotion classification system is therefore 51.0%. Tables 2 and 3a show that even when the binary emotion classification system is trained on the reliably constructed third database DB3, its performance degrades on a database recorded in a different environment.

Table 3b shows the result of training the emotion classification system 100 according to an embodiment of the present invention using voice files of the third database DB3 and evaluating the performance of the emotion classification system 100 using voice files of the first database DB1.

Referring to Table 3b, the emotion classification system 100 classified 195 of 250 male normal voice files as male normal, 205 of 250 male angry voice files as male angry, 220 of 250 female normal voice files as female normal, and 243 of 250 female angry voice files as female angry. The accuracy of the emotion classification system 100 is therefore 86.3%.

Referring to Tables 3a and 3b, the emotion classification system 100 according to an embodiment of the present invention achieves better emotion classification performance than the conventional binary emotion classification system on the same database recorded in a different environment.

Table 4a shows the result of training the conventional binary emotion classification system using voice files of the third database DB3 and evaluating the performance of the binary emotion classification system using voice files of the second database DB2.

Referring to Table 4a, the binary emotion classification system classified 490 of 500 normal voice files as normal but only 115 of 500 angry voice files as angry. The accuracy of the binary emotion classification system is therefore 60.5%. Tables 2 and 4a show that even when the binary emotion classification system is trained on the reliably constructed third database DB3, its performance degrades on a database recorded in a different environment.

Table 4b shows the result of training the emotion classification system 100 using voice files of the third database DB3 and evaluating the performance of the emotion classification system 100 using voice files of the second database DB2.

Referring to Table 4b, the emotion classification system 100 classified 198 of 250 male normal voice files as male normal, 165 of 250 male angry voice files as male angry, 188 of 250 female normal voice files as female normal, and 200 of 250 female angry voice files as female angry. The accuracy of the emotion classification system 100 is therefore 75.1%.

Referring to Tables 4a and 4b, the emotion classification system 100 according to an embodiment of the present invention achieves better emotion classification performance than the conventional binary emotion classification system on the same database recorded in a different environment.
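The accuracy figures quoted for Tables 2 through 4b follow directly from the per-class counts given in the text. The short check below recomputes them as total correct classifications divided by total files; the dictionary keys and the function name `accuracy` are illustrative labels only.

```python
# (correctly classified, total) per emotion class, taken from the counts in the text.
RESULTS = {
    "Table 2 (binary, train DB3, test DB3)": [(493, 500), (480, 500)],
    "Table 3a (binary, train DB3, test DB1)": [(500, 500), (10, 500)],
    "Table 3b (system 100, train DB3, test DB1)": [(195, 250), (205, 250), (220, 250), (243, 250)],
    "Table 4a (binary, train DB3, test DB2)": [(490, 500), (115, 500)],
    "Table 4b (system 100, train DB3, test DB2)": [(198, 250), (165, 250), (188, 250), (200, 250)],
}


def accuracy(rows):
    """Overall accuracy in percent: total correct over total files."""
    correct = sum(c for c, _ in rows)
    total = sum(t for _, t in rows)
    return 100.0 * correct / total


if __name__ == "__main__":
    for name, rows in RESULTS.items():
        print(f"{name}: {accuracy(rows):.1f}%")
```

The binary system's low cross-database accuracy comes almost entirely from the angry class (10 of 500 on DB1 and 115 of 500 on DB2), which an overall figure alone would hide.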

In the emotion classification systems currently used in customer centers, the training voices and the voices of unspecified customers are in most cases recorded in different environments, so the performance of a traditional normal/angry binary emotion classification system is bound to be low. The emotion classification system 100 according to an embodiment of the present invention, however, can prevent the performance degradation caused by differences in recording environment even when it is applied to a customer complaint management system in a corporate setting such as a customer center.

Furthermore, the emotion classification system 100 according to an embodiment of the present invention can be put to good use in fields that must quickly recognize and respond to the emotions of a large number of unspecified customers, such as fire stations, police stations, and marriage information companies.

The present invention has been described with reference to an embodiment shown in the drawings, but this is merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

FIG. 1 is a block diagram of an emotion classification system according to an embodiment of the present invention.

FIG. 2 is a block diagram of the feature vector extraction block shown in FIG. 1.

FIG. 3 is a block diagram of the feature vector generating unit shown in FIG. 2.

FIG. 4 is a block diagram of the emotion classification block shown in FIG. 1.

FIG. 5 is a graph showing the average pitch of voice signals by gender and emotion.

FIG. 6 shows the emotion classification result of the emotion classification block shown in FIG. 1.

FIG. 7 is a schematic flowchart of an emotion classification method according to an embodiment of the present invention.

FIG. 8 is a detailed flowchart of an emotion classification method according to an embodiment of the present invention.

Claims

An emotion classification system comprising:

a feature vector extraction block for extracting feature vectors of a received voice signal;

a feature vector storage block for storing, based on feature vectors of training voice signals extracted by the feature vector extraction block, a range of at least one first feature vector value corresponding to each of a plurality of emotion groups, the plurality of emotion groups including at least one emotion group that includes at least two emotions among a plurality of emotions, and a range of at least one second feature vector value corresponding to each of the at least two emotions included in the at least one emotion group; and

an emotion classification block for generating an emotion classification result for the sender of the voice signal based on the feature vectors of the voice signal and on the range of the at least one first feature vector value and the range of the at least one second feature vector value stored in the feature vector storage block,

wherein the emotion classification block comprises:

a first emotion classification unit for classifying the emotion of the sender of the voice signal into one of the plurality of emotion groups based on the emotion vectors of the voice signal and the range of the at least one first feature vector value; and

a second emotion classification unit for classifying the emotion into one of the at least two emotions included in the at least one emotion group based on the emotion vectors of the voice signal and the range of the at least one second feature vector value.

delete

The emotion classification system of claim 1, wherein the feature vector extraction block comprises:

a voice signal dividing unit for dividing the voice signal into predetermined frame units and outputting the divided voice signal;

a sampling window application unit for multiplying the voice signal divided into frame units by windows having a predetermined overlapping range and outputting the result;

a non-voice section removing block for extracting a voice section from the voice signal; and

a feature vector generation block for extracting feature vectors of the voice signal.

The emotion classification system of claim 3, wherein the feature vector generation block comprises:

a pitch extraction unit for extracting and outputting a pitch of the voice signal;

an energy extraction unit for extracting and outputting energy of the voice signal;

a Mel frequency cepstral coefficients (MFCC) extraction unit for extracting MFCC, which are a plurality of mathematical coefficients for modeling the voice signal; and

a feature vector generating unit for generating an average value and/or a standard deviation value for at least one of the pitch, energy, and MFCC of the voice signal.

The emotion classification system of claim 4, wherein the feature vector generation block further comprises

a delta value extracting unit for extracting delta values, which are the amounts of temporal change of the pitch, energy, and MFCC of the voice signal,

and wherein the feature vector generating unit

generates an average value and a standard deviation value of the delta values for the pitch, energy, and MFCC of the voice signal.

The emotion classification system of claim 4, wherein the feature vector storage block

stores a range of average pitch values corresponding to each of the plurality of emotion groups, the plurality of emotion groups including the at least one emotion group that includes the at least two emotions among the plurality of emotions, and stores a range of energy and a range of MFCC corresponding to each of the at least two emotions included in the at least one emotion group.

The emotion classification system of claim 6, wherein the first emotion classification unit

classifies the emotion of the sender of the voice signal into one of the plurality of emotion groups based on the average pitch value of the voice signal and the range of average pitch values stored in the feature vector storage block,

and wherein the second emotion classification unit

classifies the emotion into one of the at least two emotions included in the at least one emotion group based on the energy and MFCC of the voice signal and the range of energy and the range of MFCC stored in the feature vector storage block.

An emotion classification system comprising:

a feature vector extraction block for extracting an average pitch value, energy, and MFCC of a received voice signal;

a feature vector storage block for storing the average pitch values of training voice signals extracted by the feature vector extraction block, classified into ranges of average pitch values corresponding respectively to male normal; male angry and female normal; and female angry, and for storing the energy and MFCC of the training voice signals extracted by the feature vector extraction block, classified into a range of energy and a range of MFCC corresponding respectively to male angry and female normal; and

an emotion classification block for generating an emotion classification result for the sender of the voice signal based on the average pitch value, energy, and MFCC of the voice signal and on the range of average pitch values, the range of energy, and the range of MFCC stored in the feature vector storage block,

wherein the emotion classification block comprises:

a first emotion classification unit for classifying the emotion of the sender of the voice signal into one of the emotion groups of male normal; male angry and female normal; and female angry, based on the average pitch value of the voice signal and the range of average pitch values stored in the feature vector storage block; and

a second emotion classification unit for classifying the emotion group of male angry and female normal into one of male angry and female normal, based on the energy and MFCC of the voice signal and the range of energy and the range of MFCC stored in the feature vector storage block.

delete

The emotion classification system of claim 8, wherein the feature vector extraction block is

The emotion classification system of claim 10, wherein the feature vector generation block is

The emotion classification system of claim 11, wherein the feature vector generation block is

The feature vector generating unit

An emotion classification method comprising:

extracting feature vectors of a received voice signal;

storing, based on feature vectors extracted from training voice signals, a range of at least one first feature vector value corresponding to each of a plurality of emotion groups, the plurality of emotion groups including at least one emotion group that includes at least two emotions among a plurality of emotions, and a range of at least one second feature vector value corresponding to each of the at least two emotions included in the at least one emotion group; and

generating an emotion classification result for the sender of the voice signal based on the feature vectors of the voice signal and on the stored range of the at least one first feature vector value and the stored range of the at least one second feature vector value,

wherein generating the emotion classification result comprises:

classifying the emotion of the sender of the voice signal into one of the plurality of emotion groups based on the emotion vectors of the voice signal and the range of the at least one first feature vector value; and

classifying the emotion into one of the at least two emotions included in the at least one emotion group based on the emotion vectors of the voice signal and the range of the at least one second feature vector value.

delete

The method of claim 13, wherein extracting the feature vectors of the voice signal comprises:

dividing the voice signal into predetermined frame units and outputting the divided voice signal;

multiplying the voice signal divided into frame units by windows having a predetermined overlapping range and outputting the result;

extracting a voice section from the voice signal; and

generating feature vectors of the voice signal.

The method of claim 15, wherein generating the feature vectors of the voice signal comprises:

extracting and outputting a pitch of the voice signal;

extracting and outputting energy of the voice signal;

extracting MFCC, which are a plurality of mathematical coefficients for modeling the voice signal; and

generating an average value and/or a standard deviation value for at least one of the pitch, energy, and MFCC of the voice signal.

The method of claim 16, wherein storing the range of the first feature vector value and the range of the second feature vector value comprises:

storing a range of average pitch values corresponding to each of the plurality of emotion groups, the plurality of emotion groups including the at least one emotion group that includes the at least two emotions among the plurality of emotions, and storing a range of energy and a range of MFCC corresponding to each of the at least two emotions included in the at least one emotion group.

The method of claim 17, wherein generating the emotion classification result for the sender of the voice signal comprises:

classifying the emotion of the sender of the voice signal into one of the plurality of emotion groups based on the average pitch value of the voice signal and the stored range of average pitch values; and

classifying the emotion into one of the at least two emotions included in the at least one emotion group based on the energy and MFCC of the voice signal and the stored range of energy and range of MFCC.

A computer-readable recording medium storing a computer program for executing the emotion classification method according to any one of claims 13 and 15 to 18.