KR20220130353A

KR20220130353A - Speech balloon expression method and system for voice messages reflecting emotion classification based on voice

Info

Publication number: KR20220130353A
Application number: KR1020210035129A
Authority: KR
Inventors: 석현정; 유춘 얀; 첸 친유에
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2022-09-27
Also published as: KR102583986B1

Abstract

Disclosed are a method and system for expressing the speech bubble of a voice message so that it reflects an emotion classified from the voice. According to one embodiment, the speech bubble expression method performed by a speech bubble expression system may comprise: receiving voice data; classifying emotion information using acoustic attribute information included in the received voice data; and expressing additional information corresponding to the received voice data according to the classified emotion information. The present invention can thereby increase user preference.

Description

SPEECH BALLOON EXPRESSION METHOD AND SYSTEM FOR VOICE MESSAGES REFLECTING EMOTION CLASSIFICATION BASED ON VOICE

The following description relates to a technique for classifying emotion based on voice data. More particularly, it relates to a method and system for expressing, through a speech bubble, the emotional characteristics embedded in a voice message.

The use of instant messenger services for exchanging messages with other users is growing rapidly. Users send and receive voice or text messages. To express their emotions, they separately use emoticons or stickers.

Koreans do not use voice messages very often. In cultures where entering text is cumbersome, such as China, voice messages are used frequently.

Converting a user's voice message into a text message in order to share the user's opinions and emotions is cumbersome. A technique is therefore needed that intuitively expresses the emotional characteristics contained in a voice message by changing the message's speech bubble.

A method and system may be provided that express the emotional characteristics contained in a voice message through two changes. The first is the color of the message's speech bubble, which reflects an emotion classification based on the voice data. The second is the thickness of the signal display inside the bubble.

A speech bubble expression method for a voice message, performed by a speech bubble expression system, may include: receiving voice data; classifying emotion information using acoustic attribute information included in the received voice data; and expressing additional information corresponding to the received voice data according to the classified emotion information.

Classifying the emotion information may include classifying emotion information for the received voice data using a learning model for emotion recognition. The emotion information may include one or more of neutral, angry, excited, desperate, calm, and sad.

The learning model may be trained as follows. It sets a per-user baseline for judging its results by having each user utter the same sentence. It then classifies the emotion information recognized from a received voice message when that message shows a change, relative to the baseline, equal to or greater than a preset criterion.

The learning model may be trained to derive the degree of the classified emotion by comparing the emotion information classified from the received voice message with the set baseline.

Expressing may include obtaining the loudness of the received voice data through acoustic analysis. The thickness of the signal display in the speech bubble is then adjusted based on the obtained loudness, so that the bubble reflects the loudness of the voice data.

Expressing may include rendering the signal display in the speech bubble with a thicker stroke when the obtained loudness is equal to or greater than a preset value.

Expressing may include mapping the color information preset for the classified emotion to the background color of the speech bubble for the received voice data, and visualizing the bubble with the mapped background color.

Expressing may include adjusting the background color of the speech bubble for the received voice data. The adjustment stays within a color value range defined around the color information preset for the classified emotion.

Receiving may include receiving a voice-based instant message sent or received through a messenger that provides an instant messaging service, or through an SNS that includes such a messenger function.

A computer program stored in a computer-readable storage medium may cause the speech bubble expression system to execute the speech bubble expression method for a voice message. The method includes: receiving voice data; classifying emotion information using acoustic attribute information included in the received voice data; and expressing additional information corresponding to the received voice data according to the classified emotion information.

The speech bubble expression system may include three units:
- a voice receiver that receives voice data;
- an emotion classifier that classifies emotion information using acoustic attribute information included in the received voice data; and
- an emotion expression unit that expresses additional information corresponding to the received voice data according to the classified emotion information.
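The three-unit structure can be pictured as a simple pipeline. The sketch below is illustrative only. `SpeechBubbleSystem`, `BubbleStyle`, the 8-bit PCM decoder, the toy classifier, and the color values are all hypothetical stand-ins, not the patent's implementation. The toy classifier in particular is not an emotion model.

```python
from dataclasses import dataclass
from typing import Callable, List

Samples = List[float]


@dataclass
class BubbleStyle:
    background_hex: str      # speech bubble background color
    signal_stroke_px: float  # thickness of the signal display inside the bubble


@dataclass
class SpeechBubbleSystem:
    """Hypothetical wiring of the three units; each unit is an injected callable."""
    receive: Callable[[bytes], Samples]             # voice receiver
    classify: Callable[[Samples], str]              # emotion classifier
    express: Callable[[str, Samples], BubbleStyle]  # emotion expression unit

    def handle(self, payload: bytes) -> BubbleStyle:
        samples = self.receive(payload)        # 1. receive voice data
        emotion = self.classify(samples)       # 2. classify emotion from acoustics
        return self.express(emotion, samples)  # 3. express additional information


# Toy stand-ins for illustration only (not a real emotion model).
def decode_u8(payload: bytes) -> Samples:
    return [(b - 128) / 128 for b in payload]  # unsigned 8-bit PCM -> [-1, 1)


def toy_classifier(samples: Samples) -> str:
    return "angry" if max(abs(x) for x in samples) > 0.5 else "neutral"


def toy_expression(emotion: str, samples: Samples) -> BubbleStyle:
    colors = {"angry": "#f2200d"}  # hypothetical color table
    stroke = 4.0 if max(abs(x) for x in samples) > 0.5 else 2.0
    return BubbleStyle(colors.get(emotion, "#e5e5ea"), stroke)


system = SpeechBubbleSystem(decode_u8, toy_classifier, toy_expression)
print(system.handle(bytes([128, 255, 0, 128])))
```

Injecting each unit as a callable keeps the receiver, classifier, and expression logic independently replaceable. For example, the toy classifier could be swapped for a trained model without touching the rest.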

The emotion classifier may classify emotion information for the received voice data using a learning model for emotion recognition. The emotion information may include one or more of neutral, angry, excited, desperate, calm, and sad.

The emotion expression unit may obtain the loudness of the received voice data through acoustic analysis. It then adjusts the thickness of the signal display in the speech bubble based on the obtained loudness, thereby reflecting the loudness of the voice data.

The emotion expression unit may render the signal display in the speech bubble with a thicker stroke when the obtained loudness is equal to or greater than a preset value.

The emotion expression unit may map the color information preset for the classified emotion to the background color of the speech bubble for the received voice data, and visualize the bubble with the mapped background color.

The emotion expression unit may adjust the background color of the speech bubble for the received voice data. The adjustment stays within a color value range defined around the color information preset for the classified emotion.

The voice receiver may receive a voice-based instant message sent or received through a messenger that provides an instant messaging service, or through an SNS that includes such a messenger function.

Preset color information corresponding to the emotion classified from the user's voice message is applied to the background of the speech bubble. The loudness of the voice is reflected by the thickness of the signal display inside the bubble rather than by text. As a result, the emotional information contained in the user's voice data can be grasped more intuitively. Because the message content is not shown as text, the user's personal information is not infringed, which may increase user preference.

Some techniques convert voice data into text or interpret its context. This approach instead changes the speech bubble's background color and the signal display inside it based on the acoustic attribute information of the voice data. It therefore requires simpler computation and can be readily applied to messengers or SNS.

FIG. 1 is a view for explaining a speech bubble expression operation in the speech bubble expression system according to an embodiment.
FIG. 2 is a block diagram for explaining the configuration of a speech bubble expression system according to an embodiment.
FIG. 3 is a flowchart illustrating a method for expressing a speech bubble of a voice message in a speech bubble expression system according to an embodiment.
FIG. 4 is an example for explaining an operation of classifying emotion information on received voice data using a learning model, according to an embodiment.
FIG. 5 is an example for explaining visualization of a speech bubble according to emotion information, according to an embodiment.
FIG. 6 is an example for explaining visualization of a speech bubble according to emotion information determined using a voice message, according to an embodiment.
FIG. 7 is an example for explaining an interface for analyzing an emotion using a voice message, according to an embodiment.
FIG. 8 is an example for explaining an operation of classifying emotion information according to an embodiment.
FIG. 9 is an example for explaining how to determine a user's emotion and display an emotion level using a learning model, according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The embodiments describe how the speaker's emotional state is recognized from the acoustic attribute information of a voice message exchanged during communication. The background color of the message's speech bubble may be mapped according to the type of the recognized emotion. In addition, when the voice is loud, the signal display inside the speech bubble may be rendered in bold. These operations are described in detail below.

FIG. 1 is a view for explaining a speech bubble expression operation in the speech bubble expression system according to an embodiment.

The speech bubble expression system may receive a voice message 101 input by a user. As an example, the system is described as operating within a messenger that provides an instant messaging service, or within an SNS that includes such a messenger function. In such a messenger or SNS, a user may input a voice message, and voice messages may be exchanged between users. The user may input a voice message in the messenger, or may post a comment as voice data on the SNS. The voice message input by the user may then be recorded.

FIG. 7 is an example of an interface for analyzing emotion using voice messages. In FIG. 7, the left figure shows an example conversation scenario targeting the emotion of excitement. The user (participant) can view the conversation and record a message. The right figure shows a pairwise evaluation questionnaire. There, the user (participant) can play back their own voice and compare it with the current service in terms of emotion delivery and intention to use.

For example, the speech bubble expression system may provide an interface comprising a guide view, a recording view, and an evaluation view.

The guide view provides a basic description of the task. After reading it, the user enters their name.

The recording view provides all the functions needed to send a voice message, and the user can freely record and play back the recording. When the user selects 'Send', the recording is input to the learning model and emotion recognition is completed.

The evaluation view presents a pair of message screens: a default chat screen and a chat screen in which the speech bubble has a background color. The user can play back their own voice message before rating the results.

The speech bubble expression system may provide a test platform on which users experience the service as message senders and evaluate it against the existing service. For example, the system may classify four types of emotion from the user's voice data: excitement, anger, sadness, and calm.

In FIG. 8, pleasure and arousal are the first two independent axes of emotion. Crossing them at right angles yields a two-dimensional emotion space according to the circumplex model. The four quadrants of this space can then be identified and labeled excitement, anger, sadness, and calm.
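The quadrant step can be sketched as a lookup on the signs of the two axes. The assignment assumes the conventional reading of the circumplex model, which the patent does not spell out:
- pleasant and high arousal is excitement;
- unpleasant and high arousal is anger;
- unpleasant and low arousal is sadness;
- pleasant and low arousal is calm.

The `[-1, 1]` axis scale and the names `Quadrant` and `circumplex_quadrant` are hypothetical.

```python
from enum import Enum


class Quadrant(Enum):
    EXCITEMENT = "excitement"  # pleasant, high arousal
    ANGER = "anger"            # unpleasant, high arousal
    SADNESS = "sadness"        # unpleasant, low arousal
    CALM = "calm"              # pleasant, low arousal


def circumplex_quadrant(pleasure: float, arousal: float) -> Quadrant:
    """Map a (pleasure, arousal) point, each assumed in [-1, 1], to a quadrant.

    A value of exactly 0 is assigned to the positive side of its axis.
    """
    if arousal >= 0:
        return Quadrant.EXCITEMENT if pleasure >= 0 else Quadrant.ANGER
    return Quadrant.CALM if pleasure >= 0 else Quadrant.SADNESS


print(circumplex_quadrant(0.7, 0.8).value)
print(circumplex_quadrant(-0.5, -0.6).value)
```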

Through the test platform, the system may generate one conversation scenario for each emotion, so that users clearly understand the mood of the conversation and the sender's emotion. The system may also record and play back voice data and feedback through the test platform.

The speech bubble expression system may be used in a user study of the emotional effect of speech bubble color on voice messages. A user (participant) plays the role of message sender. The participant first reads the provided conversation to understand the sender's emotion. The participant then records a voice message, reviews the recording, and may retry until satisfied. After recording, as shown in the right figure of FIG. 7, two screens may be displayed: a default chat message screen and a chat message screen with a colored speech bubble.

The user may rate the voice messages in terms of emotion delivery and intention to use. For each criterion, the system may present a bipolar Likert scale:
- -2 means the default version is clearly preferred;
- 0 means the default version and the colored-bubble version are rated equally;
- +2 means the colored-bubble version is preferred.

The system may store the recorded voice data together with the questionnaire responses.
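Bipolar Likert ratings like these are usually summarized per criterion. Two summaries are shown: the mean score, and the share of participants leaning toward the colored bubble. The sketch assumes a five-point scale (-2 to +2 in integer steps). The `responses` values are made-up illustrations, not results from the study.

```python
from statistics import mean

# Made-up example ratings (not study data): -2 = default clearly preferred,
# 0 = no difference, +2 = colored speech bubble preferred.
responses = [
    {"emotion_delivery": 2, "intention_to_use": 1},
    {"emotion_delivery": 1, "intention_to_use": 0},
    {"emotion_delivery": -1, "intention_to_use": 2},
]


def summarize(responses, criterion):
    """Mean rating and fraction of participants preferring the colored bubble."""
    scores = [r[criterion] for r in responses]
    if any(s not in (-2, -1, 0, 1, 2) for s in scores):
        raise ValueError("scores must be integers in [-2, 2]")
    return {
        "mean": mean(scores),
        "prefer_colored": sum(s > 0 for s in scores) / len(scores),
    }


for criterion in ("emotion_delivery", "intention_to_use"):
    print(criterion, summarize(responses, criterion))
```

A positive mean indicates an overall lean toward the colored version. A significance test would still be needed before drawing conclusions.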

The speech bubble expression system may perform emotion recognition 110 on the voice message. It classifies emotion information for the received voice data using a learning model for emotion recognition. The emotions classified may include neutral, angry, excited, desperate, calm, and sad.

FIG. 4 is an example illustrating how emotion information is classified for received voice data using a learning model. The speech bubble expression system may classify emotion information for a voice message using a learning model 400 for emotion recognition. The learning model 400 may be built by training on an emotion recognition dataset. It may be any of various network-based models, such as an LSTM, CNN, DNN, or RNN. In this embodiment, the learning model 400 is assumed to be LSTM-based. FIG. 9 illustrates the LSTM-based learning model 400.

A voice message may be input to the learning model 400 built in this way, and the system may classify emotion information for the message through the model. The emotion information may include one or more of neutral, angry, excited, desperate, calm, and sad. It may also include other emotions, such as joy. For example, multiple emotions (e.g., anger and excitement) may be recognized from a single voice message. In addition, voice messages with the same content may be classified differently depending on time information. For instance, a voice message input at dawn may be more emotional than one input during the daytime.

When a voice message is input to the learning model 400, feature information may be extracted from it, and emotion information may be determined from the extracted features. The voice message may be converted into a feature vector using the openSMILE toolkit. The features may be based on the three elements of sound: intensity, pitch, and timbre. Speaking style, intonation, propagation speed, phase, and similar properties may also be considered. Color information may then be determined from the emotion inferred from these features. Both the inferred emotion and its value may be obtained as the model's output.
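openSMILE computes far richer feature sets than fit here. As a rough, standard-library-only sketch of the three sound elements above, `frame_features` computes three values for one frame:
- RMS intensity, in dBFS;
- zero-crossing rate, used only as a crude stand-in for timbre;
- an autocorrelation pitch estimate.

This is not the openSMILE API. The function name and the `fmin`/`fmax` pitch search range are assumptions for illustration.

```python
import math


def frame_features(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Return (rms, rms_dbfs, zcr, pitch_hz) for one mono frame of floats in [-1, 1]."""
    n = len(samples)
    if n < 2:
        raise ValueError("frame needs at least two samples")
    # Intensity: root-mean-square amplitude and its level relative to full scale.
    rms = math.sqrt(sum(x * x for x in samples) / n)
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    # Zero-crossing rate: sign changes per sample (a crude spectral-shape cue).
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(samples, samples[1:])) / (n - 1)
    # Pitch: lag with the largest autocorrelation inside the plausible voice range.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    pitch_hz = sample_rate / best_lag if best_lag else 0.0
    return rms, rms_dbfs, zcr, pitch_hz


sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / sr) for i in range(1024)]
print(frame_features(tone, sr))
```

In practice these values would be computed over many short overlapping frames and aggregated into a fixed-length vector before being fed to the classifier.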

More specifically, the training data set may be a subset of RAVDESS containing 900 English speech audio files, with 200 audio files for each emotion other than 'neutral'. Multi-class classification may be used for discrete weight estimation. The output of the learning model 400 may be a probability distribution over multiple emotion classes.

Different settings may be needed for each user to reinforce the emotion of a voice message, so settings may be configured per user based on that user's voice characteristics. For example, a person who normally speaks in a somewhat sleepy tone may be judged 'desperate' unless they really strain their voice.

To handle this, a baseline is first established by having each user utter the same sentence. A specific emotion is identified when the input differs markedly from the baseline, and the emotion chosen is the one showing the most pronounced change. For example, the user may record a sentence several times in a neutral mood. The average LSTM output over those recordings then serves as the reference. When a voice message is later input, its agreement with this baseline may be used in the decision.

The system may indicate the degree (intensity level) of the emotion by comparing the emotion classified from the voice message with the baseline. The emotion category with the largest rate of change from the baseline may be taken as the final prediction. The rate of change may also be compared with two predetermined thresholds to assign an intensity label. For example, if the rate of change exceeds the upper threshold, the intensity may be considered 'high'. The thresholds may be revised through iterative correction on the training data set. Here, the upper threshold may be set to 10 and the lower threshold to 0.1.

The default speech bubble may represent the 'neutral' emotion, and color is added to the bubbles of other emotions. Based on the color scheme, a multi-color (e.g., two-color) gradient may be used for each emotion level. For example, the following gradients may be chosen:
- orange-yellow for excitement;
- red-brown for anger;
- gray-blue for sadness;
- mint-blue for calm.

Color intensity may vary with the intensity level. More saturated or contrasting colors may be used at the high level, and paler colors at the low level.
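The baseline comparison can be sketched under three assumptions. First, the baseline is the mean class-probability vector over several neutral recordings. Second, the "rate of change" is the relative change `(p - b) / b`. Third, a rate at or below the lower threshold (0.1) is treated as neutral. Only the threshold values 10 and 0.1 come from the text. The class order in `EMOTIONS` and all function names are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical class order for the four non-neutral emotions.
EMOTIONS = ("excited", "angry", "sad", "calm")


def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def user_baseline(neutral_prob_vectors):
    """Average the probability vectors of several neutral recordings by one user."""
    return [fmean(column) for column in zip(*neutral_prob_vectors)]


def classify_against_baseline(probs, baseline, upper=10.0, lower=0.1, eps=1e-9):
    """Return (emotion, intensity) from the relative change against the baseline.

    Assumed rule: rate = (p - b) / b; the largest rate wins; rate > upper -> 'high',
    rate > lower -> 'low', otherwise the message is treated as neutral.
    """
    rates = [(p - b) / max(b, eps) for p, b in zip(probs, baseline)]
    best = max(range(len(rates)), key=rates.__getitem__)
    if rates[best] > upper:
        return EMOTIONS[best], "high"
    if rates[best] > lower:
        return EMOTIONS[best], "low"
    return "neutral", None


# A speaker whose neutral recordings already lean toward 'sad'.
baseline = user_baseline([[0.02, 0.18, 0.6, 0.2], [0.02, 0.22, 0.4, 0.2]])
print(classify_against_baseline([0.6, 0.1, 0.2, 0.1], baseline))
print(classify_against_baseline([0.01, 0.15, 0.64, 0.2], baseline))
```

Because this speaker's baseline 'sad' probability is already 0.5, a raw 'sad' score of 0.64 yields only a low-intensity 'sad' rather than a confident one. That is the per-user correction the passage motivates.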

The speech bubble expression system may express additional information corresponding to the received voice message according to the classified emotion. Additional information means extra information that expresses the emotion in connection with the voice message. For example, special effects, color changes, or thickness changes may be applied to the chat window or to the message's speech bubble. Through the speech bubble, the system may reinforce the user's emotion as determined from the instant (voice) message.

The speech bubble expression system may map the color information for each classified emotion to the background color of the speech bubble for the received voice message. Color information may be preset for each emotion: neutral, angry, excited, desperate, calm, and sad. Within the color range set for an emotion, the bubble's background color may be further adjusted according to the degree of that emotion. Color information may include the hue, lightness, and saturation that make up a color.

Besides the bubble background, the system may also change the background color of the chat service that contains the speech bubble. For example, each chat service may have its own representative color. To reinforce the user's emotion, the system may change the chat room's color to the color representing the chat service. It may then make the bubble's background color stand out as much as possible against the changed chat room color.
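Adjusting the color "within the emotion's range according to degree" can be sketched in HSL space. Each emotion keeps a fixed hue. Saturation and lightness then change with the intensity label: saturated for high, pale for low. All hue, saturation, lightness, and neutral-color values below are hypothetical, chosen loosely after the gradients named earlier. For brevity the sketch uses a single hue per emotion instead of a two-color gradient.

```python
import colorsys

# Hypothetical base hues (degrees), loosely following the named gradients:
# orange-yellow (excited), red-brown (angry), gray-blue (sad), mint-blue (calm).
BASE_HUE = {"excited": 35, "angry": 5, "sad": 215, "calm": 160}
# Hypothetical (saturation, lightness) per intensity: high = saturated, low = pale.
LEVEL_SL = {"high": (0.90, 0.50), "low": (0.45, 0.80)}
NEUTRAL_HEX = "#e5e5ea"  # hypothetical default bubble color


def bubble_color(emotion, intensity):
    """Pick a background color in the emotion's hue, scaled by intensity."""
    if emotion == "neutral" or intensity is None:
        return NEUTRAL_HEX
    s, l = LEVEL_SL[intensity]
    # colorsys uses H, L, S argument order, with all values in [0, 1].
    r, g, b = colorsys.hls_to_rgb(BASE_HUE[emotion] / 360, l, s)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for level in ("high", "low"):
    print("angry", level, bubble_color("angry", level))
```

A two-color gradient could be produced the same way by computing a second color from a neighboring hue and interpolating between the two.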

For example, the speech bubble expression system may set the bubble's background color according to the emotion classified for a voice message:
- Neutral: the system applies the default value set for the neutral speech bubble background.
- Anger: the system applies red as the background color. It first determines whether the bubble's background color is already red. If not, it changes the color to bright red. If it is, it applies color information with 90° < hue < 270°.
- Excitement: the system applies color information satisfying 90° < hue < 270°, saturation (S) > 90, and lightness (L) > 40.
- Despair: the system applies color information with saturation < 30 and lightness < 30. When the text lightness is > 70, negative polarity may be expressed.
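The numeric ranges above can be checked mechanically by converting a color to hue in degrees and to saturation and lightness in percent. This sketch encodes the excitement and despair ranges, and the text-lightness test, literally as stated. It assumes S and L are percentages on an HSL scale. The function names are hypothetical, and emotions without a stated range are accepted unchanged.

```python
import colorsys


def to_hsl(hex_color):
    """'#rrggbb' -> (hue in degrees, saturation %, lightness %)."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note: colorsys returns H, L, S
    return h * 360, s * 100, l * 100


def satisfies_rule(emotion, hex_color):
    """Check a bubble color against the ranges stated for each emotion."""
    h, s, l = to_hsl(hex_color)
    if emotion == "excited":
        return 90 < h < 270 and s > 90 and l > 40
    if emotion == "desperate":
        return s < 30 and l < 30
    return True  # no numeric range stated for other emotions in this passage


def negative_polarity(text_hex):
    """Text lightness above 70 is treated as expressing negative polarity."""
    return to_hsl(text_hex)[2] > 70


print(satisfies_rule("desperate", "#2b2b30"), satisfies_rule("excited", "#ff8800"))
```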

The speech bubble expression system may perform acoustic analysis (120) on the received voice message. Acoustic analysis measures and analyzes the quality of vocalization in terms of frequency, intensity, and time. This makes it possible to understand how speech sounds are produced and perceived. In acoustic analysis, a computer or similar device renders the speech signal from the voice data as a waveform, spectrum, or spectrogram and then analyzes these representations. The results can be used for voice diagnosis and therapy. For example, frequency analysis can be used to examine the various physical properties of sounds.
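The sketch below computes two of the measurements named above from raw samples: intensity over time and a simple frequency analysis. It assumes mono samples normalized to [-1, 1] and levels expressed in dB relative to full scale. It uses a naive discrete Fourier transform so that it needs only the standard library. It is an illustration, not the analysis pipeline of the embodiment, and a real system would use an FFT library and spectrogram framing.

```python
import cmath
import math
from typing import List, Sequence


def rms_db(samples: Sequence[float]) -> float:
    """Root-mean-square level in dB relative to full scale (amplitude 1.0 = 0 dBFS)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def intensity_contour(samples: Sequence[float], frame_len: int) -> List[float]:
    """Level of consecutive non-overlapping frames, i.e. intensity over time."""
    return [rms_db(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Frequency in Hz of the strongest DFT bin (naive O(n^2) DFT, DC bin skipped)."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n
```

For a pure 100 Hz tone sampled at 800 Hz, `dominant_frequency` reports 100 Hz, and each frame of `intensity_contour` sits near -3 dBFS. That is the RMS level of a full-scale sine.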

For example, the speech bubble expression system may perform acoustic analysis (120) using acoustic attribute information of the received voice message, such as loudness, pitch, and timbre. From the result of the analysis, the system may determine whether the loudness of the voice message is at or above a preset reference, for example whether it exceeds 80 dB. If the loudness does not exceed 80 dB, the system may apply the default thickness set for the signal display in the speech bubble. If the loudness exceeds 80 dB, the system may draw the signal display in the speech bubble with a thicker line.
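Below is a minimal sketch of the 80 dB rule. The text does not say how the 80 dB figure is measured. Digital samples only give a level relative to full scale (dBFS), so the sketch assumes a device-specific `calibration_offset_db` that converts dBFS to an estimated sound pressure level. That offset and the two stroke widths are hypothetical values.

```python
import math
from typing import Sequence

LOUDNESS_THRESHOLD_DB = 80.0  # reference value stated in the text
DEFAULT_STROKE_PX = 2.0       # hypothetical default thickness
THICK_STROKE_PX = 4.0         # hypothetical "thick" thickness


def estimated_spl_db(samples: Sequence[float], calibration_offset_db: float) -> float:
    """Convert the RMS level in dBFS to an estimated sound pressure level.

    calibration_offset_db is a hypothetical, device-specific constant: the dB SPL
    that corresponds to 0 dBFS. Without it, an absolute 80 dB threshold cannot be
    applied to digital samples.
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms) + calibration_offset_db


def signal_stroke_width(level_db: float) -> float:
    """Default stroke unless the level exceeds the 80 dB threshold."""
    return THICK_STROKE_PX if level_db > LOUDNESS_THRESHOLD_DB else DEFAULT_STROKE_PX
```

Separating level estimation from the threshold rule lets the calibration change per device without touching the display logic.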

The speech bubble expression system may produce the visualization by combining two results. The first is the background color of the speech bubble, determined through emotion recognition (110). The second is the thickness of the signal display in the speech bubble, determined through acoustic analysis (120). The system may adjust the background color of the speech bubble for the received voice message within the color value range of the color information preset for the classified emotion. It may reflect the loudness of the received voice data by adjusting the thickness of the signal display in the speech bubble according to that loudness.
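To show how the two results combine, the sketch below renders a bubble as an SVG string. The emotion color becomes the fill, and the loudness-dependent thickness becomes the stroke width of a signal trace. SVG output, the `BubbleStyle` type, and the envelope input (values assumed to lie in [-1, 1]) are assumptions made for illustration. The text does not specify a rendering technology.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BubbleStyle:
    background: str      # e.g. "#ff0000", from the emotion-recognition branch (110)
    stroke_width: float  # from the acoustic-analysis branch (120)


def bubble_svg(style: BubbleStyle, envelope: Sequence[float],
               width: int = 200, height: int = 60) -> str:
    """Rounded bubble filled with the emotion color, with a signal trace whose
    line thickness encodes loudness."""
    step = width / max(len(envelope) - 1, 1)
    mid = height / 2
    # Map each envelope value in [-1, 1] to a y coordinate with a 5 px margin.
    points = " ".join(f"{i * step:.1f},{mid - v * (mid - 5):.1f}"
                      for i, v in enumerate(envelope))
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="{width}" height="{height}" rx="15" fill="{style.background}"/>'
        f'<polyline points="{points}" fill="none" stroke="black" '
        f'stroke-width="{style.stroke_width}"/>'
        "</svg>"
    )
```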

According to an embodiment, speech bubble expression can be offered as a user-selectable option on a messenger or SNS platform. Besides being engaging, it lets users monitor their own tone of voice. It can therefore serve as a service differentiated by purpose of use, attracting users' interest and improving convenience.

According to the embodiment, personal information is not intruded upon, which is expected to make the approach attractive to users. Conventional techniques, by contrast, infer a user's emotion from the content of speech or the contextual flow of text messages. In addition, it does not convert speech to text or interpret context. Instead it applies simple computations to the acoustic attribute information of the speech, so it can easily be deployed even on simple platforms.

According to the embodiment, low-level features are extracted from the acoustic attribute information, and the output of an emotion classification algorithm installed at the front end is then reflected in the display. A strong impact can therefore be expected for a small investment.

FIG. 5 is an example illustrating how speech bubbles are visualized according to emotion information. It shows messenger A, messenger B, and messenger C, each of which may run on an electronic device. The electronic device may be a fixed or mobile terminal implemented as a computer device. Examples include:
- smartphones, mobile phones, and tablet PCs;
- navigation systems, computers, and laptop computers;
- digital broadcasting terminals, PDAs (Personal Digital Assistants), and PMPs (Portable Multimedia Players);
- game consoles and wearable devices;
- IoT (Internet of Things), VR (virtual reality), and AR (augmented reality) devices;
- digital signage.

The electronic device may communicate with other electronic devices and/or a server over a network using wireless or wired communication. On the electronic device, a messenger or an SNS application with a messenger function may be executed, or the messenger or SNS may be run through a messenger or SNS platform. Messages can then be sent and received within the running messenger or SNS.

In each messenger, speech bubbles may be visualized according to the emotion information classified from the voice messages that are sent and received. The emotion information covers neutral, excitement, anger, despair, calm, and sadness. The color for each of these emotions can be shown as the background color of the speech bubble. The thickness of the signal display in the speech bubble can be adjusted according to the loudness of the voice message. The color for each emotion and the thickness for each loudness level may be set differently for each messenger, so the background color and signal display can look different from one messenger to another. For example, the user may choose a desired signal display (UI) based on the editing information provided by each messenger and set it to appear in the speech bubble. The available signal displays may be offered as a list of designs from which the user selects one. Alternatively, the user may draw a signal display. The user may also set the color for each emotion differently for each messenger or for each user. For example, user A may assign red to anger while user B assigns red to excitement. Alternatively, user A may assign red to anger in messenger A and red to excitement in messenger B. In this way, the editing information can be changed by the user.
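The user-editable settings described above can be modeled as layered overrides. This sketch uses the paragraph's own example: user A maps red to anger in messenger A and to excitement in messenger B, while user B maps red to excitement everywhere. `EditInfo`, the default palette, and the precedence order (user plus messenger, then user, then messenger, then default) are assumptions. A choice of signal display could be keyed the same way.

```python
from typing import Dict, Optional, Tuple

# Hypothetical built-in defaults used when no override exists.
DEFAULT_COLORS: Dict[str, str] = {
    "neutral": "#f0f0f0",
    "anger": "#ff0000",
    "excitement": "#00e6e6",
}

Scope = Tuple[Optional[str], Optional[str]]  # (user, messenger); None means "any"


class EditInfo:
    """User-editable emotion-to-color settings, scoped per user and/or per messenger."""

    def __init__(self) -> None:
        self._overrides: Dict[Scope, Dict[str, str]] = {}

    def set_color(self, emotion: str, color: str, *,
                  user: Optional[str] = None, messenger: Optional[str] = None) -> None:
        self._overrides.setdefault((user, messenger), {})[emotion] = color

    def color_for(self, emotion: str, *, user: str, messenger: str) -> str:
        # Assumed precedence: user+messenger, then user, then messenger, then default.
        for scope in ((user, messenger), (user, None), (None, messenger)):
            color = self._overrides.get(scope, {}).get(emotion)
            if color is not None:
                return color
        return DEFAULT_COLORS.get(emotion, DEFAULT_COLORS["neutral"])


edit = EditInfo()
edit.set_color("anger", "#ff0000", user="A", messenger="A")
edit.set_color("excitement", "#ff0000", user="A", messenger="B")
edit.set_color("excitement", "#ff0000", user="B")
```

Resolving from the most specific scope to the least specific means a per-messenger choice never leaks into another messenger, while a user-wide choice still applies wherever nothing narrower is set.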

FIG. 6 is an example illustrating how speech bubbles are visualized according to emotion information determined from voice messages.

For example, voice or text messages may be exchanged one-to-one (1:1) or one-to-many (1:N) in a chat room within a messenger. Such a chat room may have at least one member. Chat rooms in a messenger may include general chat rooms, secret chat rooms, and open chat rooms. Within such chat rooms, users may mix voice messages and text messages depending on their situation.

Instead of presenting the received voice data as text, the speech bubble expression system may visualize it as a speech bubble. The system may map the color information preset for the emotion classified from the received voice data to the background color of the speech bubble for that voice message, and then display the mapped background color. The system may also vary the thickness of the signal display in the speech bubble according to the loudness of the received voice message.

Other users (other members of the chat room) can grasp the user's emotion from the background color of the speech bubble and the signal display inside it, without listening to the voice message. The other users may also respond with a voice message or a text message (including emoticons). When another user enters a text message, it may be converted into a voice message. The emotion information is classified by analyzing the text data or the voice message, and the emotion matching the color information set for that classification may be reflected in the generated voice message.

FIG. 2 is a block diagram illustrating the configuration of a speech bubble expression system according to an embodiment. FIG. 3 is a flowchart illustrating a method, performed by that system according to an embodiment, of expressing a voice message as a speech bubble.

The processor of the speech bubble expression system 100 may include a voice receiver 210, an emotion classification unit 220, and an emotion expression unit 230. These components may be representations of different functions that the processor performs according to control instructions provided by program code stored in the speech bubble expression system. The processor and its components may control the speech bubble expression system to perform steps 310 to 330 of the method of FIG. 3. The processor and its components may be implemented to execute instructions according to the code of an operating system stored in the memory and the code of at least one program.

The processor may load into memory the program code stored in the program file for the speech bubble expression method. For example, when the program is executed in the speech bubble expression system, the processor may load the program code from the program file into memory under the control of the operating system and control the system accordingly. The voice receiver 210, the emotion classification unit 220, and the emotion expression unit 230 may each be a different functional representation of the processor. Each executes the instructions in its part of the loaded program code to carry out steps 310 to 330.
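The division of the processor into three functional units can be sketched as one class with one method per step. This is an illustrative structure, not the patent's code. The classifier, color mapping, and stroke rule are injected as plain callables because the text leaves their internals to the embodiment. `VoiceMessage`, `ExpressedBubble`, and `SpeechBubbleSystem` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VoiceMessage:
    samples: Sequence[float]
    sample_rate: int


@dataclass
class ExpressedBubble:
    emotion: str
    background: str
    stroke_width: float


class SpeechBubbleSystem:
    """Processor of system 100, split into the three functional units of FIG. 2."""

    def __init__(self,
                 classify_fn: Callable[[VoiceMessage], str],
                 color_fn: Callable[[str], str],
                 stroke_fn: Callable[[VoiceMessage], float]) -> None:
        self._classify_fn = classify_fn
        self._color_fn = color_fn
        self._stroke_fn = stroke_fn

    def receive(self, message: VoiceMessage) -> VoiceMessage:
        """Step 310 (voice receiver 210): accept only non-empty audio."""
        if len(message.samples) == 0 or message.sample_rate <= 0:
            raise ValueError("empty or malformed voice message")
        return message

    def classify(self, message: VoiceMessage) -> str:
        """Step 320 (emotion classification unit 220)."""
        return self._classify_fn(message)

    def express(self, message: VoiceMessage, emotion: str) -> ExpressedBubble:
        """Step 330 (emotion expression unit 230): color from emotion, stroke from loudness."""
        return ExpressedBubble(emotion, self._color_fn(emotion), self._stroke_fn(message))

    def run(self, message: VoiceMessage) -> ExpressedBubble:
        received = self.receive(message)
        return self.express(received, self.classify(received))
```

Injecting the three behaviors keeps each unit independently replaceable, which mirrors the text's point that the units are functional representations of one processor, not fixed modules.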

In step 310, the voice receiver 210 may receive voice data. For example, the voice receiver 210 may receive voice data that a speaker is currently uttering, or voice data that the speaker recorded in the past. In a messenger, for instance, users may exchange messages as voice data. The voice data may be generated as the user speaks in real time and delivered to the other user. Alternatively, the user may select voice data previously recorded on the electronic device, which is then delivered to the other user. The voice receiver 210 may receive voice-based instant messages sent and received in a messenger that provides an instant messaging service, or in an SNS that includes such a messenger function. Voice data sent and received in the messenger or SNS may be deleted after a preset period.
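The last sentence above describes a retention policy. The sketch below shows one way to enforce it with an explicit clock argument, which keeps it deterministic to test. `StoredVoiceMessage`, `VoiceInbox`, and the example seven-day retention period are hypothetical. The text says only that deletion happens "after a preset period".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredVoiceMessage:
    sender: str
    audio: bytes
    received_at: datetime
    prerecorded: bool = False  # True if the sender picked an existing recording


class VoiceInbox:
    """Holds received voice messages and drops them after a retention period."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._messages: List[StoredVoiceMessage] = []

    def receive(self, message: StoredVoiceMessage) -> None:
        self._messages.append(message)

    def purge_expired(self, now: datetime) -> int:
        """Delete messages at least `retention` old; return how many were removed."""
        kept = [m for m in self._messages if now - m.received_at < self.retention]
        removed = len(self._messages) - len(kept)
        self._messages = kept
        return removed

    def __len__(self) -> int:
        return len(self._messages)
```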

In step 320, the emotion classification unit 220 may classify emotion information using the acoustic attribute information included in the received voice data. It may do so with a learning model for emotion recognition. For example, the system may first classify the voice message as expressing a positive or a negative emotion. Based on that result, it may then determine finer-grained emotion information such as neutral, anger, excitement, despair, calm, sadness, or joy.
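The two-stage scheme above (polarity first, then a fine-grained emotion within that polarity) can be illustrated with a nearest-centroid classifier. This is a substitute technique chosen for brevity, not the learning model of the embodiment. The two features (normalized pitch level and loudness) and all centroid values are toy assumptions rather than trained parameters. Listing "neutral" under both polarities is also an assumption, since the text does not say where neutral falls.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]

# Toy centroids over two hypothetical normalized cues (pitch level, loudness).
VALENCE_CENTROIDS: Dict[str, Vector] = {
    "positive": (0.7, 0.5),
    "negative": (0.3, 0.5),
}
EMOTION_CENTROIDS: Dict[str, Vector] = {
    "neutral": (0.5, 0.4),
    "excitement": (0.8, 0.9),
    "joy": (0.7, 0.6),
    "calm": (0.6, 0.2),
    "anger": (0.3, 0.9),
    "despair": (0.2, 0.3),
    "sadness": (0.3, 0.2),
}
# Assumption: "neutral" is a candidate under both polarities.
GROUPS: Dict[str, Tuple[str, ...]] = {
    "positive": ("neutral", "excitement", "joy", "calm"),
    "negative": ("neutral", "anger", "despair", "sadness"),
}


def nearest(centroids: Dict[str, Vector], x: Sequence[float]) -> str:
    """Label of the centroid closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def classify_two_stage(features: Sequence[float]) -> Tuple[str, str]:
    # Stage 1: positive vs. negative.
    valence = nearest(VALENCE_CENTROIDS, features)
    # Stage 2: fine-grained emotion, restricted to that polarity's candidates.
    candidates = {e: EMOTION_CENTROIDS[e] for e in GROUPS[valence]}
    return valence, nearest(candidates, features)
```

Restricting stage 2 to the candidates of the chosen polarity is what makes the scheme hierarchical. An error in stage 1 cannot be corrected later, which is the usual trade-off of this design.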

In step 330, the emotion expression unit 230 may express additional information corresponding to the received voice data according to the classified emotion information. The emotion expression unit 230 may obtain the loudness of the received voice data through acoustic analysis and reflect it by adjusting the thickness of the signal display in the speech bubble. When the obtained loudness is at or above a preset value, the emotion expression unit 230 may draw the signal display in the speech bubble thicker. The emotion expression unit 230 may map the color information preset for the classified emotion to the background color of the speech bubble for the received voice message and display the mapped background color. It may also adjust the background color of the speech bubble within the color value range of the color information preset for the classified emotion.

The devices described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers. Examples include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications on that OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described. However, a person of ordinary skill in the art will recognize that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these. It may configure a processing device to operate as desired, or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide it with instructions or data. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or they may be known and available to those skilled in computer software. Examples of computer-readable recording media include:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from this description. For example, appropriate results may still be achieved if the described techniques are performed in a different order from the described method. The same holds if components of the described system, structure, apparatus, circuit, and so on are coupled or combined in a different form, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A speech bubble expression method for a voice message, performed by a speech bubble expression system, the method comprising:
receiving voice data;
classifying emotion information using acoustic attribute information included in the received voice data; and
expressing additional information corresponding to the received voice data according to the classified emotion information.

2. The speech bubble expression method of claim 1, wherein
the classifying of the emotion information comprises classifying emotion information for the received voice data using a learning model for emotion recognition, and
the emotion information includes one or more of neutral, anger, excitement, despair, calm, and sadness.

3. The speech bubble expression method of claim 2, wherein
the learning model is trained such that each user utters the same sentence to set a baseline for judging the learning result, and, when the received voice message deviates from the set baseline by more than a preset criterion, the emotion information recognized from the received voice message is classified.

4. The speech bubble expression method of claim 3, wherein
the learning model is trained to derive the degree of the classified emotion information by comparing the emotion information classified from the received voice message with the set baseline.

5. The speech bubble expression method of claim 1, wherein
the expressing comprises obtaining the loudness of the received voice data through acoustic analysis of the received voice data, and reflecting that loudness by adjusting thickness information of a signal display in the speech bubble based on the obtained loudness.

6. The speech bubble expression method of claim 5, wherein
the expressing comprises visualizing the signal display in the speech bubble with increased thickness when the obtained loudness is greater than or equal to a preset value.

7. The speech bubble expression method of claim 1, wherein
the expressing comprises mapping color information, preset according to the classified emotion information, to a background color of a speech bubble for the received voice data, and visualizing the mapped background color of the speech bubble.

8. The speech bubble expression method of claim 7, wherein
the expressing comprises adjusting the background color of the speech bubble for the received voice data based on a color value range of the color information preset according to the classified emotion information.

9. The speech bubble expression method of claim 1, wherein
the receiving comprises receiving a voice-data-based instant message sent or received in a messenger providing an instant messaging service, or in an SNS including a messenger function providing the instant messaging service.

10. A computer program stored in a computer-readable storage medium for executing a speech bubble expression method for a voice message performed by a speech bubble expression system, the method comprising:
receiving voice data;
classifying emotion information using acoustic attribute information included in the received voice data; and
expressing additional information corresponding to the received voice data according to the classified emotion information.

11. A speech bubble expression system comprising:
a voice receiver configured to receive voice data;
an emotion classification unit configured to classify emotion information using acoustic attribute information included in the received voice data; and
an emotion expression unit configured to express additional information corresponding to the received voice data according to the classified emotion information.

12. The speech bubble expression system of claim 11, wherein
the emotion classification unit classifies the emotion information for the received voice data using a learning model for emotion recognition, and
the emotion information includes one or more of neutral, anger, excitement, despair, calm, and sadness.

13. The speech bubble expression system of claim 11, wherein
the emotion expression unit obtains the loudness of the received voice data through acoustic analysis of the received voice data, reflects that loudness by adjusting thickness information of a signal display in the speech bubble based on the obtained loudness, and, when the obtained loudness is greater than or equal to a preset value, visualizes the signal display in the speech bubble with increased thickness.

14. The speech bubble expression system of claim 11, wherein
the emotion expression unit maps color information, preset according to the classified emotion information, to the background color of the speech bubble for the received voice data, visualizes the mapped background color of the speech bubble, and adjusts that background color based on a color value range of the color information preset according to the classified emotion information.

15. The speech bubble expression system of claim 11, wherein
the voice receiver receives a voice-data-based instant message sent or received in a messenger providing an instant messaging service, or in an SNS including a messenger function providing the instant messaging service.
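The per-user baseline in the claims directed to the learning model can be illustrated with a rule-based stand-in. Each user reads the same calibration sentence, deviations above a criterion trigger classification, and the size of the deviation gives the degree. The claims describe a trained model. Here the 0.2 criterion, the linear degree scale capped at a relative deviation of 1.0, the feature names, and the injected `classify` callable are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Features = Dict[str, float]


@dataclass
class UserBaseline:
    """Per-user reference built from readings of the same calibration sentence."""
    features: Features

    @classmethod
    def from_calibration(cls, recordings: Sequence[Features]) -> "UserBaseline":
        keys = recordings[0].keys()
        return cls({k: mean(r[k] for r in recordings) for k in keys})

    def deviation(self, features: Features) -> float:
        """Largest relative change of any feature from the baseline."""
        return max((abs(features[k] - v) / abs(v)
                    for k, v in self.features.items() if v != 0), default=0.0)


def classify_against_baseline(features: Features, baseline: UserBaseline,
                              classify: Callable[[Features], str],
                              threshold: float = 0.2,
                              saturation: float = 1.0) -> Tuple[str, float]:
    """Return (emotion, degree in [0, 1]).

    Deviations at or below `threshold` count as neutral. Above it, the injected
    classifier names the emotion and the degree grows linearly until the
    deviation reaches `saturation`. Both constants are hypothetical.
    """
    dev = baseline.deviation(features)
    if dev <= threshold:
        return "neutral", 0.0
    degree = min(1.0, (dev - threshold) / (saturation - threshold))
    return classify(features), degree


base = UserBaseline.from_calibration([
    {"pitch_hz": 200.0, "energy_db": 60.0},
    {"pitch_hz": 220.0, "energy_db": 64.0},
])
```

Measuring change relative to each speaker's own reading of the same sentence keeps a naturally loud or high-pitched voice from being misread as emotional.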