KR20220063818A - System and method for analyzing emotion of speech

Info

Publication number: KR20220063818A
Application number: KR1020200148922A
Authority: KR
Inventors: 이혜영
Original assignee: 주식회사 스피랩
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2022-05-18
Also published as: KR102429365B1

Abstract

The present invention relates to a voice emotion analysis system and method. According to an embodiment of the present invention, the voice emotion analysis system comprises: a voice input unit receiving a user's voice; a text conversion unit converting the user's voice into text; a voice emotion analysis unit analyzing the emotion of the user's voice; a text emotion analysis unit analyzing the emotion of the text; and an emotion extraction unit extracting the user's emotion through a voice emotion result calculated by the voice emotion analysis unit and the text emotion result calculated by the text emotion analysis unit and adjusting a result when the voice emotion result and the text emotion result do not match. Accordingly, the voice emotion analysis system can accurately analyze the emotion of the user's voice.

Description

Speech emotion analysis system and method {System and method for analyzing emotion of speech}

The present invention relates to a voice emotion analysis system and method and, more particularly, to a voice emotion analysis system and method capable of accurately analyzing voice emotion by considering both the emotion of the voice and the emotion of the text recognized from the voice.

Longevity is a human wish, but disease, poverty, and loneliness are also aspects of a long life.

Solving these problems first requires identifying them. Disease and poverty have been relatively easy to identify by quantitative means, but emotion-related problems such as loneliness have not been easy to identify quantitatively.

Recently, however, many methods have been developed for identifying human emotions and sentiments through artificial intelligence.

Humans express emotions through facial expressions, actions, and speech, so these are what is analyzed when artificial intelligence is used to identify human emotions.

However, a person may outwardly express conflicting emotions through facial expression, action, and speech, so analyzing only one means of emotional expression can reduce the accuracy of emotion recognition.

KR 10-2018-0057970 A

Accordingly, an object of the present invention is to solve this problem of the prior art by providing a voice emotion analysis system and method capable of accurately analyzing emotion by considering both the emotion of the voice and the emotion of the text recognized from the voice.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to the present invention, the above object is achieved by a voice emotion analysis system comprising: a voice input unit that receives a user's voice; a text conversion unit that converts the user's voice into text; a voice emotion analysis unit that analyzes the emotion of the user's voice; a text emotion analysis unit that analyzes the emotion of the text; and an emotion derivation unit that derives the user's emotion from the voice emotion result calculated by the voice emotion analysis unit and the text emotion result calculated by the text emotion analysis unit, and adjusts the result when the voice emotion result and the text emotion result do not match.

The voice emotion analysis system according to the present invention may further include a voice emotion model unit that stores per-emotion characteristics of voice and provides the analysis criteria for the voice emotion analysis unit, and a text emotion model unit that stores per-emotion characteristics of text and provides the analysis criteria for the text emotion analysis unit.

The voice emotion model unit may include a voice data unit storing emotion-labeled voice data, a voice feature extraction unit that extracts voice feature vectors from the voice data, a voice emotion model learning unit that learns the voice feature vectors, and a voice emotion model storage unit storing the voice emotion model derived by the voice emotion model learning unit.

The voice emotion model learning unit may learn the voice feature vectors using at least one of an SVM model and a combined CNN-RNN model.

The voice emotion model learning unit may perform ensemble learning on the voice feature vectors using both an SVM model and a combined CNN-RNN model.

The text emotion model unit may include a text data unit storing emotion-labeled text data, a text emotion model learning unit that learns the text data, and a text emotion model storage unit storing the text emotion model derived by the text emotion model learning unit.

The emotion derivation unit may include an adjustment unit that analyzes the user's voice using a decision tree algorithm, and may derive the user's emotion through the adjustment unit when the voice emotion result and the text emotion result do not match.

According to another embodiment of the present invention, there is provided a voice emotion analysis method comprising: a user voice input step of receiving a user's voice; a text conversion step of converting the user's voice into text; an emotion analysis step of analyzing the emotion of the user's voice and the emotion of the text; and an emotion derivation step of deriving the user's emotion from the voice emotion result and the text emotion result obtained in the emotion analysis step, and adjusting the result when the voice emotion result and the text emotion result do not match.

According to the voice emotion analysis system of the present invention, the user's emotional state can be identified accurately because emotion is analyzed by considering both the acoustic and the semantic characteristics of the user's voice.

In addition, when the voice emotion result and the text emotion result do not match, the result is adjusted, which further improves the accuracy of the emotion analysis result.

The voice emotion model unit of the voice emotion analysis system according to the present invention can perform ensemble learning with several artificial intelligence models and thereby provide accurate criteria for voice emotion analysis.

FIG. 1 is a schematic configuration diagram of a voice emotion analysis system according to the present invention;
FIG. 2 is a schematic configuration diagram of the voice emotion model unit of the voice emotion analysis system according to the present invention;
FIGS. 3 and 4 are explanatory diagrams of the voice emotion model learning unit of the voice emotion analysis system according to the present invention;
FIG. 5 is a schematic configuration diagram of the text emotion model unit of the voice emotion analysis system according to the present invention;
FIG. 6 is an explanatory diagram of the adjustment unit of the voice emotion analysis system according to the present invention; and
FIG. 7 is a flowchart of a voice emotion analysis method according to the present invention.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the drawings.

FIG. 1 shows a schematic configuration diagram of the voice emotion analysis system (1) according to the present invention.

The voice emotion analysis system (1) according to the present invention includes a voice input unit (10), a text conversion unit (20), a voice emotion analysis unit (30), a text emotion analysis unit (40), and an emotion derivation unit (50).

The voice input unit (10) receives the user's voice.

The text conversion unit (20) converts the user's voice input through the voice input unit (10) into text.

The voice emotion analysis unit (30) analyzes the user's emotional state from acoustic characteristics of the user's voice input through the voice input unit (10), such as pitch, energy, entropy of energy, zero crossing rate, spectrogram, log-mel-spectrogram, and frequency.

The text emotion analysis unit (40) analyzes the user's emotional state from the semantic characteristics of the text produced by the text conversion unit (20).

The emotion derivation unit (50) derives the user's emotion from the voice emotion result calculated by the voice emotion analysis unit (30) and the text emotion result calculated by the text emotion analysis unit (40). More specifically, when the voice emotion result and the text emotion result match, the matched result is output unchanged as the final result; when they do not match, the final result is output after adjustment.
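
The data flow between units (10) to (50) can be sketched as follows. This is only an illustrative outline, not the implementation described in the specification: the function name `analyze_voice_emotion` and the four callables passed to it (`to_text`, `voice_emotion`, `text_emotion`, `adjust`) are hypothetical placeholders for the text conversion unit, the two analysis units, and the adjustment logic, and emotions are assumed to be plain string labels.

```python
from typing import Any, Callable


def analyze_voice_emotion(
    audio: Any,
    to_text: Callable[[Any], str],            # stands in for text conversion unit (20)
    voice_emotion: Callable[[Any], str],      # stands in for voice emotion analysis unit (30)
    text_emotion: Callable[[str], str],       # stands in for text emotion analysis unit (40)
    adjust: Callable[[Any, str, str], str],   # hypothetical adjustment policy
) -> str:
    """Derive one emotion label from the acoustic and the textual analysis."""
    text = to_text(audio)
    voice_result = voice_emotion(audio)
    text_result = text_emotion(text)

    # Emotion derivation unit (50): agreeing results pass through unchanged.
    if voice_result == text_result:
        return voice_result

    # Disagreeing results are handed to the adjustment logic.
    return adjust(audio, voice_result, text_result)
```

Passing the components in as callables keeps the sketch independent of any particular speech recognizer or classifier.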

With the voice emotion analysis system (1) of the present invention, the emotional state can be identified accurately because the semantic characteristics of the user's voice are considered together with its acoustic characteristics.

Furthermore, when the voice emotion result based on acoustic characteristics and the text emotion result based on semantic characteristics do not agree, the result can be adjusted, which further improves the accuracy of the emotion analysis result.

The voice emotion analysis system (1) according to the present invention may further include a voice emotion model unit (60) and a text emotion model unit (70).

The voice emotion model unit (60) stores per-emotion characteristics of voice and provides the analysis criteria for the voice emotion analysis unit (30). That is, the voice emotion analysis unit (30) can derive the voice emotion result by finding which emotion in the voice emotion model unit (60) the acoustic characteristics of the voice correspond to.

More specifically, the voice emotion model unit (60) includes a voice data unit (61), a voice feature extraction unit (63), a voice emotion model learning unit (64), and a voice emotion model storage unit (65). FIG. 2 shows a schematic configuration diagram of the voice emotion model unit (60).

The voice data unit (61) stores a large amount of emotion-labeled voice data. The voice data may be obtained from dramas, movies, or voice files recorded by users, and includes data indicating which emotion each piece of voice data is associated with.

The voice feature extraction unit (63) extracts voice feature vectors from the voice data. The voice feature vectors extracted by the voice feature extraction unit (63) may include pitch, energy, entropy of energy, zero crossing rate, spectrogram, log-mel-spectrogram, frequency, and the like. The voice feature extraction unit (63) may extract voice feature vectors using, for example, MFCC (Mel Frequency Cepstral Coefficients).
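
Three of the time-domain features named above (energy, zero crossing rate, and entropy of energy) can be computed per frame as in the following sketch. The specification does not give formulas, so the definitions here are common textbook ones and should be read as assumptions; the function names and the `n_blocks` parameter are hypothetical.

```python
import math


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def entropy_of_energy(frame, n_blocks=4):
    """Shannon entropy (bits) of how energy is spread over sub-blocks."""
    block_len = len(frame) // n_blocks
    energies = [
        sum(x * x for x in frame[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0  # silent (or too short) frame: no energy to distribute
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A frame whose energy is spread evenly over the sub-blocks has high entropy of energy, while an abrupt burst concentrates the energy in one sub-block and lowers it.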

The voice data of the voice data unit (61) may be sent to the voice feature extraction unit (63) after passing through a preprocessing unit (62).

The preprocessing unit (62) includes a pre-emphasis processing unit, a framing processing unit, and a Hamming window processing unit. The pre-emphasis processing unit uses a pre-emphasis filter to emphasize the high-frequency components of the voice signal, and the framing processing unit divides the voice signal at regular intervals. The Hamming window processing unit multiplies each frame produced by the framing processing unit by a Hamming window function to minimize discontinuities at the frame boundaries.
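
The three preprocessing stages can be sketched in plain Python as below. The filter coefficient `alpha=0.97`, the frame length of 400 samples, and the hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate) are conventional values chosen for illustration; the specification does not state them, and overlapping frames are likewise an assumption.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split the signal into fixed-length frames taken every hop_len samples."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop_len)
    ]


def hamming_window(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a signal; returns a list of frames."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming_window(frame_len)
    return [
        [s * w for s, w in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]
```

Because the window tapers toward both ends of each frame, the sample values near frame boundaries are attenuated, which is what reduces the boundary discontinuity described above.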

The voice emotion model learning unit (64) learns the voice feature vectors extracted by the voice feature extraction unit (63) using an artificial intelligence algorithm. As a result, as shown in FIG. 3, the voice feature vectors are classified by emotion, which reveals what characteristics the voice feature vectors have for each emotion.

Note that the classification by emotion shown in FIG. 3 is only an example; emotions may be classified more simply or in more detail.

The voice emotion model storage unit (65) stores the voice emotion model derived by the voice emotion model learning unit (64). The voice emotion analysis unit (30) can analyze the emotion of the user's voice by referring to the voice emotion model stored in the voice emotion model storage unit (65) and determining which emotion's voice feature vectors the user's voice resembles.

The voice emotion model learning unit (64) may learn the voice feature vectors using, for example, at least one of an SVM (Support Vector Machine) model and a combined model of a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network).

When the SVM model is used, even deep emotions can be learned from the voice data. When the combined CNN-RNN model is used, almost no preprocessing of the voice data is required. The combined CNN-RNN model can be trained as if rolling a window of a given size across the mel spectrogram.
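
One way to read "rolling a window across the mel spectrogram" is that fixed-width slices of consecutive time frames are fed to the network one after another. The sketch below shows only that slicing step, under the assumption that the spectrogram is stored as a list of time frames, each holding one value per mel band; the function name and the `window_size` and `hop` parameters are hypothetical, and no network is included.

```python
def rolling_windows(spectrogram, window_size, hop=1):
    """Yield-like helper: return consecutive slices of `window_size` time frames.

    spectrogram: list of time frames, each a list of mel-band values.
    hop: number of time frames the window advances between slices.
    """
    return [
        spectrogram[start:start + window_size]
        for start in range(0, len(spectrogram) - window_size + 1, hop)
    ]
```

Each returned slice has the same shape, so the slices can be batched as fixed-size inputs regardless of how long the utterance is.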

As shown in FIG. 4, it is also possible to perform ensemble learning on the voice feature vectors using both the SVM model and the combined CNN-RNN model. In this case, the voice feature vectors can be classified by emotion more accurately, and the advantages of both models can be exploited.

The text emotion model unit (70) stores per-emotion characteristics of text and provides the analysis criteria for the text emotion analysis unit (40). That is, the text emotion analysis unit (40) can derive the text emotion result by finding which emotion in the text emotion model unit (70) the characteristics of the text correspond to.

More specifically, the text emotion model unit (70) includes a text data unit (71), a text emotion model learning unit (72), and a text emotion model storage unit (73). FIG. 5 shows a schematic configuration diagram of the text emotion model unit (70).

The text data unit (71) stores emotion-labeled text data. This text data may include data such as the NRC emotion lexicon and the KNU Korean sentiment lexicon. For example, the vocabulary in the text data unit (71) may be divided according to five emotions.

The text emotion model learning unit (72) learns the per-emotion text data of the text data unit (71) using an artificial intelligence algorithm. As a result, the features of the text are classified by emotion, which reveals what characteristics the text has for each emotion.

The text emotion model storage unit (73) stores the text emotion model derived by the text emotion model learning unit (72). The text emotion analysis unit (40) can analyze the emotion of the text by referring to the text emotion model stored in the text emotion model storage unit (73) and determining which emotion's text features the text derived from the user's voice resembles.

The text emotion model learning unit (72) may perform learning using, for example, KoBERT developed by SKT Brain. KoBERT was developed to overcome the Korean-language performance limitations of the BERT model originally developed by Google; embedding may be based on BERT and classification on a CNN.

The emotion derivation unit (50) may further include a matching confirmation unit (52) and an adjustment unit (51). FIG. 6 shows an explanatory diagram of the emotion derivation unit (50).

The matching confirmation unit (52) checks whether the voice emotion result and the text emotion result match. If they match, the final result is output as is; if they do not match, the adjustment unit (51) is activated.

The adjustment unit (51) derives the user's emotion when the voice emotion result and the text emotion result do not match.

For example, the adjustment unit (51) may analyze the raw data of the user's voice using a decision tree algorithm, analyze the degree of influence that the voice and the text each have on the analysis result for the raw data, and assign a relative weight to each emotion analysis model according to its degree of influence. The emotion derived with the highest probability once the weights are applied may then be output as the final result of the emotion analysis. The splitting criteria of the decision tree algorithm may include, for example, the magnitude of the voice frequency and the length of syllables.
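
The weighting step can be illustrated as follows. The sketch assumes that each analysis model outputs a probability per emotion and that the degrees of influence have already been obtained (for example from the decision tree analysis) as two non-negative numbers; how those numbers are computed is not shown here. The function name `fuse_by_influence` and its parameters are hypothetical.

```python
def fuse_by_influence(voice_probs, text_probs, voice_influence, text_influence):
    """Weight each model's emotion probabilities by its relative influence.

    voice_probs, text_probs: dicts mapping emotion label -> probability.
    voice_influence, text_influence: non-negative influence scores (assumed
    to be supplied by a separate analysis step).
    Returns (best_label, fused_scores).
    """
    total = voice_influence + text_influence
    if total <= 0:
        raise ValueError("influence scores must sum to a positive value")

    # Relative weights: the two weights always sum to 1.
    w_voice = voice_influence / total
    w_text = text_influence / total

    # Sorting makes tie-breaking deterministic (alphabetical order).
    labels = sorted(set(voice_probs) | set(text_probs))
    fused = {
        label: w_voice * voice_probs.get(label, 0.0)
        + w_text * text_probs.get(label, 0.0)
        for label in labels
    }
    best = max(labels, key=lambda label: fused[label])
    return best, fused
```

With voice probabilities of 0.8 for sadness and 0.2 for joy, text probabilities of 0.1 and 0.9, and influences of 3 and 1, the voice weight is 0.75 and the fused score favors sadness (0.625 against 0.375); swapping the influences favors joy instead.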

Hereinafter, the voice emotion analysis method according to the present invention is described. In describing the method, detailed descriptions of matters already covered in the description of the voice emotion analysis system (1) may be omitted.

FIG. 7 shows a flowchart of the voice emotion analysis method according to the present invention.

The voice emotion analysis method according to the present invention includes a user voice input step (S10), a text conversion step (S20), an emotion analysis step (S30), and an emotion derivation step (S40).

In the user voice input step (S10), the user's voice is received.

In the text conversion step (S20), the user's voice received in the user voice input step (S10) is converted into text.

In the emotion analysis step (S30), the emotion of the user's voice received in the user voice input step (S10) is analyzed, and the emotion of the text produced in the text conversion step (S20) is analyzed.

When the emotion of the user's voice is analyzed, the emotional state is analyzed from the acoustic features of the user's voice, and the voice emotion result is derived by referring to a voice emotion model obtained in advance through training.

When the emotion of the text is analyzed, the emotional state is analyzed from the semantic features of the text, and the text emotion result is derived by referring to a text emotion model obtained in advance through training.

In the emotion derivation step (S40), the user's emotion is derived from the voice emotion result and the text emotion result obtained in the emotion analysis step (S30).

More specifically, the emotion derivation step (S40) includes a matching confirmation step (S41) and an adjustment step (S42). If the matching confirmation step (S41) finds that the voice emotion result and the text emotion result match, the matched emotion is output unchanged as the final result; if it finds that they do not match, the adjustment step (S42) is performed. In the adjustment step (S42), the emotion result is adjusted and the final result is then derived.

The adjustment in the adjustment step (S42) may be performed by analyzing the raw data of the user's voice received in the user voice input step (S10) using a decision tree algorithm and outputting the emotion derived with the highest probability as the final result.

The voice emotion analysis system (1) and method according to the present invention may be implemented by one or more pieces of hardware, by one or more pieces of software, or by a combination of hardware and software.

The scope of the present invention is not limited to the embodiments described above and may be implemented in various forms within the scope of the appended claims. Any modification that a person of ordinary skill in the art to which the invention pertains could make without departing from the gist of the invention as claimed is considered to fall within the scope of the claims.

1: voice emotion analysis system
10: voice input unit
20: text conversion unit
30: voice emotion analysis unit
40: text emotion analysis unit
50: emotion derivation unit
51: adjustment unit
60: voice emotion model unit
61: voice data unit
62: preprocessing unit
63: voice feature extraction unit
64: voice emotion model learning unit
65: voice emotion model storage unit
70: text emotion model unit
71: text data unit
72: text emotion model learning unit
73: text emotion model storage unit

Claims

1. A voice emotion analysis system comprising:
a voice input unit that receives a user's voice;
a text conversion unit that converts the user's voice into text;
a voice emotion analysis unit that analyzes the emotion of the user's voice;
a text emotion analysis unit that analyzes the emotion of the text; and
an emotion derivation unit that derives the user's emotion from the voice emotion result calculated by the voice emotion analysis unit and the text emotion result calculated by the text emotion analysis unit, and adjusts the result when the voice emotion result and the text emotion result do not match.

2. The voice emotion analysis system of claim 1, further comprising:
a voice emotion model unit that stores per-emotion characteristics of voice and provides the analysis criteria for the voice emotion analysis unit; and
a text emotion model unit that stores per-emotion characteristics of text and provides the analysis criteria for the text emotion analysis unit.

3. The voice emotion analysis system of claim 2, wherein the voice emotion model unit comprises:
a voice data unit storing emotion-labeled voice data;
a voice feature extraction unit that extracts voice feature vectors from the voice data;
a voice emotion model learning unit that learns the voice feature vectors; and
a voice emotion model storage unit storing the voice emotion model derived by the voice emotion model learning unit.

4. The voice emotion analysis system of claim 3, wherein the voice emotion model learning unit learns the voice feature vectors using at least one of an SVM model and a combined CNN-RNN model.

5. The voice emotion analysis system of claim 3, wherein the voice emotion model learning unit performs ensemble learning on the voice feature vectors using both an SVM model and a combined CNN-RNN model.

6. The voice emotion analysis system of claim 2, wherein the text emotion model unit comprises:
a text data unit storing emotion-labeled text data;
a text emotion model learning unit that learns the text data; and
a text emotion model storage unit storing the text emotion model derived by the text emotion model learning unit.

7. The voice emotion analysis system of claim 1, wherein the emotion derivation unit includes an adjustment unit that analyzes the user's voice using a decision tree algorithm, and
the user's emotion is derived through the adjustment unit when the voice emotion result and the text emotion result do not match.

8. A voice emotion analysis method comprising:
a user voice input step of receiving a user's voice;
a text conversion step of converting the user's voice into text;
an emotion analysis step of analyzing the emotion of the user's voice and the emotion of the text; and
an emotion derivation step of deriving the user's emotion from the voice emotion result and the text emotion result obtained in the emotion analysis step, and adjusting the result when the voice emotion result and the text emotion result do not match.