KR102604277B1

KR102604277B1 - Complex sentiment analysis method using speaker separation STT of multi-party call and system for executing the same

Info

Publication number: KR102604277B1
Application number: KR1020230050699A
Authority: KR
Inventors: 신현삼
Original assignee: 퓨렌스 주식회사; 신현삼
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-11-23

Abstract

A complex emotion analysis server using STT according to an embodiment of the present invention includes: an STT conversion unit that, when a plurality of user terminals are connected in a call, converts call voice data between the users into call text data in real time; a corrected data generation unit that generates corrected data by correcting the call voice data and the call text data based on a predetermined database; a first reference database generation unit that stores, per speaker in a first reference database, the number of times each item of reference data was referenced when the corrected text data was generated; and an emotion determination unit that determines an emotion for each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database.

Description

Complex sentiment analysis method using speaker separation STT of multi-party call and system for executing the same

The present invention relates to a complex emotion analysis method using speaker-separation STT of a multi-party call and a system for executing the same. More specifically, it relates to a method and system that analyze multi-party call voice data, separate the speakers, and then provide the emotions between the speakers.

With recent advances in computing technology, computers have become steadily smaller, and a variety of wearable devices that can be carried at all times have appeared. As the form of computers has changed, the kinds of human interaction required have diversified and a range of intelligent services is in demand. As research on artificial intelligence for such services advances, it is also becoming important for devices to recognize a person's emotional information and interact with the person appropriately.

Because humans express their emotions to others in various ways, such as facial expressions, voice, and gestures, research is actively under way in many fields to recognize and determine human emotional information through media such as video, voice, and biosignals.

General speech-to-text (STT) technology takes recorded voice or voice data played in real time as input, analyzes the pronunciation, intonation, and length of the input voice data, and generates words and sentences.

With voice data, however, pronunciation and intonation differ from speaker to speaker, and noise from the speaker's surroundings is mixed into the voice, so recognizing and analyzing the exact pronunciation is difficult.

Korean Patent Application Publication No. 10-2022-0121455 relates to a speaker identification system using STT. It stores the conversation between speakers as text using STT, and it includes a configuration that provides the conversation text to the users' mobile terminals through a voice signal input unit, which receives voice signals from a plurality of speakers, and an STT conversion unit. However, it does not provide a function for analyzing the emotions in the speakers' voices and classifying the data accordingly, and it lacks a configuration that addresses the recognition rate of voice data.

Raising the recognition rate of such voice data requires a database storing a large number of samples for various pronunciations and intonations, as well as a correction system that can provide feedback on misrecognized voice data.

Korean Patent Application Publication No. 10-2022-0121455 (published September 1, 2022)

An object of the present invention is to provide a complex emotion analysis method using speaker-separation STT of a multi-party call, and a system for executing the same, that can analyze multi-party call voice data, separate the speakers, and then provide the emotions between the speakers.

Another object of the present invention is to provide such a method and system in which, when a predetermined keyword is present in the call text data converted from multi-party call voice data, the text data is corrected using the reference data corresponding to the text keyword to generate corrected text data.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned can be understood from the following description and will be understood more clearly from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve these objects, a complex emotion analysis server using STT includes: an STT conversion unit that, when a plurality of user terminals are connected in a call, converts call voice data between the users into call text data in real time; a corrected data generation unit that generates corrected data by correcting the call voice data and the call text data based on a predetermined database; a first reference database generation unit that stores, per speaker in a first reference database, the number of times each item of reference data was referenced when the corrected text data was generated; and an emotion determination unit that determines an emotion for each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database.

In one embodiment, when a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database, the emotion determination unit may extract the reference count for the corresponding speaker and determine the emotion corresponding to the emotion determination keyword according to the reference count of that speaker's reference data.

To achieve these objects, a complex emotion analysis method using STT includes: converting, when a plurality of user terminals are connected in a call, call voice data between the users into call text data in real time; generating corrected data by correcting the call voice data and the call text data based on a predetermined database; storing, per speaker in a first reference database, the number of times each item of reference data was referenced when the corrected text data was generated; and determining an emotion for each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database.

In one embodiment, the step of determining an emotion for each speaker may include: extracting the reference count for the corresponding speaker when a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database; and determining the emotion corresponding to the emotion determination keyword according to the reference count of that speaker's reference data.

According to the present invention as described above, multi-party call voice data can be analyzed, the speakers separated, and the emotions between the speakers then provided.

FIG. 1 is a network configuration diagram illustrating a complex emotion analysis system using STT according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an embodiment of a complex emotion analysis server according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an embodiment of a complex emotion analysis method using STT according to the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technologies are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components.

FIG. 1 is a network configuration diagram illustrating a complex emotion analysis system using STT according to an embodiment of the present invention.

Referring to FIG. 1, the complex emotion analysis system using STT includes a first user terminal 100, a second user terminal 200, and a complex emotion analysis server 300.

The first user terminal 100 is the terminal held by a first user who, once a call is connected with the second user terminal 200, talks with the user holding the second user terminal 200.

The first user terminal 100 may be any of various terminals, such as a smartphone, a portable terminal, a mobile terminal, a personal digital assistant (PDA), a portable multimedia player (PMP) terminal, a personal computer, a notebook computer, or a tablet PC.

The second user terminal 200 is the terminal held by a second user who, once a call is connected with the first user terminal 100, talks with the user holding the first user terminal 100.

The second user terminal 200 may be any of various terminals, such as a smartphone, a portable terminal, a mobile terminal, a personal digital assistant (PDA), a portable multimedia player (PMP) terminal, a personal computer, a notebook computer, or a tablet PC.

The complex emotion analysis server 300 is a server that, when a call is connected between the first user terminal 100 and the second user terminal 200, records the call voice data between the two terminals, analyzes it to separate the first user from the second user, and then analyzes their emotions.

To this end, the complex emotion analysis server 300 includes a keyword database in which reference data is stored for each keyword. A keyword may be voice or text, and the reference data may be implemented as text.

For example, in the complex emotion analysis server 300, the keywords may be implemented as text such as “시울” (siul), “소울” (soul), “시이울” (siiul), and “소오울” (sooul), and the reference data may be implemented as text such as “서울” (Seoul).

As another example, when the complex emotion analysis server 300 includes a keyword database in which both the keywords and the reference data are implemented as text, the text keywords may be implemented as text such as “시울”, “소울”, “시이울”, and “소오울”, and the reference data as text such as “서울”.

As described above, the complex emotion analysis server 300 uses a word dictionary database to render each word with different pitch, length, intonation, and pronunciation, thereby generating voice keywords, and then generates text keywords based on the voice keywords. The server then builds the keyword database by matching each voice keyword and each text keyword with the corresponding word as its reference data and storing them.

That is, when a word is text, the complex emotion analysis server 300 matches the text keywords with the reference data and stores them to build the keyword database; it also matches the voice keywords, generated by converting the word into standard speech, with the reference data and stores them.

The keyword database is built this way because even the same word can be uttered differently by each user, with different pitch, length, intonation, and pronunciation.
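The keyword database described above can be pictured as a lookup from each keyword variant to its reference data. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_keyword_db` and the dictionary layout are hypothetical, and voice keywords are represented here only by their text transcriptions.

```python
from typing import Dict, Iterable


def build_keyword_db(entries: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every keyword variant to its reference data (the canonical word).

    `entries` maps a reference word to the variants generated for it.
    This layout is an assumption made for illustration.
    """
    keyword_db: Dict[str, str] = {}
    for reference, variants in entries.items():
        for variant in variants:
            keyword_db[variant] = reference
    return keyword_db


# Variants of "서울" (Seoul) taken from the example in the text.
keyword_db = build_keyword_db({"서울": ["시울", "소울", "시이울", "소오울"]})
```

Storing the variants as keys keeps the later correction step a plain dictionary lookup.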

The process by which the complex emotion analysis server 300 generates corrected text data from call voice data is described below.

In one embodiment, the complex emotion analysis server 300 converts the call voice data into call text data and then checks, based on the keyword database, whether a text keyword exists in the call text data.

If a text keyword exists in the call text data, the complex emotion analysis server 300 extracts the reference data corresponding to that text keyword from the keyword database and corrects the text data using the reference data to generate corrected text data.

For example, when the call text data is “시울에서는 무엇을 하나요?” (“What do you do in Siul?”), the complex emotion analysis server 300 corrects the call text data based on the reference data “서울” (Seoul) corresponding to the text keyword “시울” in the keyword database, generating the corrected text data “서울에서는 무엇을 하나요?” (“What do you do in Seoul?”).
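The text-keyword correction step can be sketched as a substring replacement driven by the keyword database. This is an illustrative sketch only: `correct_text` is a hypothetical name, plain substring matching is an assumption (the source does not say how keywords are located in the text), and the returned counter anticipates the per-reference counts that are stored later.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical keyword database: variant -> reference data.
keyword_db: Dict[str, str] = {
    "시울": "서울", "소울": "서울", "시이울": "서울", "소오울": "서울",
}


def correct_text(call_text: str, keyword_db: Dict[str, str]) -> Tuple[str, Counter]:
    """Replace each text keyword with its reference data.

    Returns the corrected text and how often each reference was used.
    """
    used: Counter = Counter()
    corrected = call_text
    # Longest variants first, so a longer variant is never partially
    # consumed by a shorter variant that it happens to contain.
    for keyword in sorted(keyword_db, key=len, reverse=True):
        hits = corrected.count(keyword)
        if hits:
            reference = keyword_db[keyword]
            corrected = corrected.replace(keyword, reference)
            used[reference] += hits
    return corrected, used


corrected, used = correct_text("시울에서는 무엇을 하나요?", keyword_db)
print(corrected)  # 서울에서는 무엇을 하나요?
```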

In another embodiment, the complex emotion analysis server 300 checks, based on the keyword database, whether a voice keyword exists in the call voice data.

If a voice keyword exists in the call voice data, the complex emotion analysis server 300 extracts the reference data corresponding to that voice keyword from the keyword database and corrects the text data based on the reference data to generate corrected text data. At this time, the server stores the frequency features of the portion of the call voice data that corresponds to the voice keyword in a second reference database. Later, once the emotion for that call voice data has been determined, it stores the frequency features together with the corresponding emotion.

For example, when the call voice data received from the customer server 200 is “소오울은 말이지?” (“Sooul, you see?”), the complex emotion analysis server 300 corrects the call text data based on the reference data “서울” (Seoul) corresponding to the voice keyword “소오울” in the keyword database, generating the corrected text data “서울은 말이지?” (“Seoul, you see?”), and stores the frequency features corresponding to “소오울” in the second reference database.

In the above embodiments, when no text keyword or voice keyword pre-stored in the keyword database exists in the call voice data, the complex emotion analysis server 300 refers to the keyword database to check whether a similar keyword exists in the call voice data or the call text data.

For example, when the call voice data is “시우우우울은 어떻게 가나요?” (“How do I get to Siuuuul?”) and no matching voice keyword is stored in the keyword database, the complex emotion analysis server 300 determines that the voice keyword “시울” (siul) in the keyword database is the similar keyword.

If a similar keyword exists in the call voice data or the call text data, the complex emotion analysis server 300 corrects the call text data using the similar keyword and generates corrected text data.

The complex emotion analysis server 300 then updates the keyword database using the data in the call voice data or the call text data that matched the similar keyword.

For example, when the call text data generated from the call voice data is “시우우우울은 어떻게 가나요?” and the similar keyword is “시울”, the complex emotion analysis server 300 updates the keyword database using the call text data “시우우우울”, which matched the similar keyword “시울”.

Accordingly, if the original database stored the voice keywords “시울, 소울, 시이울, 소오울” with the reference data “서울”, the keyword database is updated to the voice keywords “시우우우울, 시울, 소울, 시이울, 소오울” with the reference data “서울”.
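The similar-keyword lookup and database update can be sketched with the standard library's `difflib`. The source does not specify how similarity is measured, so the character-level ratio from `difflib.get_close_matches` and the `cutoff=0.5` threshold are assumptions, as are the function names. Tokenization (for example, stripping the particle “은” from “시우우우울은”) is outside the scope of this sketch, so the candidate token is passed in directly.

```python
import difflib
from typing import Dict, Optional


def find_similar_keyword(token: str, keyword_db: Dict[str, str],
                         cutoff: float = 0.5) -> Optional[str]:
    """Return the stored keyword most similar to `token`, or None."""
    matches = difflib.get_close_matches(token, list(keyword_db), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def register_variant(token: str, keyword_db: Dict[str, str],
                     cutoff: float = 0.5) -> Optional[str]:
    """Resolve `token` to reference data, adding it as a new variant if needed."""
    if token in keyword_db:
        return keyword_db[token]
    similar = find_similar_keyword(token, keyword_db, cutoff)
    if similar is None:
        return None
    # Update step: the unseen variant now maps to the same reference data.
    keyword_db[token] = keyword_db[similar]
    return keyword_db[token]


keyword_db = {"시울": "서울", "소울": "서울", "시이울": "서울", "소오울": "서울"}
reference = register_variant("시우우우울", keyword_db)
```

After the call, `keyword_db` contains “시우우우울” as a fifth variant of “서울”, mirroring the update described in the text.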

As described above, the complex emotion analysis server 300 converts the call voice data into call text data and corrects it based on the keyword database; when generating the corrected text data, it stores the number of times each item of reference data was referenced, per speaker, in the first reference database.

The reason is that, when the reference data referenced in generating the corrected text data is a predetermined emotion determination keyword, the emotion corresponding to that keyword is determined according to the number of times the reference data was referenced.

The complex emotion analysis server 300 compares the per-speaker reference data stored in the first reference database with the predetermined emotion keywords. When a predetermined emotion keyword exists among the per-speaker reference data, the server extracts the reference count of that speaker's reference data and determines the emotion corresponding to the emotion determination keyword according to that count.

In one embodiment, when a predetermined emotion keyword exists among the per-speaker reference data stored in the first reference database, the complex emotion analysis server 300 may extract the reference count for the corresponding speaker and determine the emotion corresponding to the emotion determination keyword according to the reference count of that speaker's reference data.

In another embodiment, when the same reference data exists across the per-speaker reference data stored in the first reference database, the complex emotion analysis server 300 may determine the emotion corresponding to the emotion determination keyword as the emotion between the speakers, according to the reference count of that shared reference data.
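The count-based determination in these embodiments can be sketched as follows. The source says only that the emotion is determined "according to the reference count", so the rule used here (pick the emotion with the highest total count, subject to a `min_count` threshold) is an assumption. The emotion keyword table, the speaker labels, and the use of the per-keyword minimum count for data shared between speakers are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical emotion determination keywords: reference data -> emotion.
EMOTION_KEYWORDS: Dict[str, str] = {"감사": "positive", "짜증": "negative"}


def decide_emotion(ref_counts: Counter, emotion_keywords: Dict[str, str],
                   min_count: int = 1) -> Optional[str]:
    """Pick the emotion whose keywords were referenced most often."""
    totals: Counter = Counter()
    for reference, count in ref_counts.items():
        emotion = emotion_keywords.get(reference)
        if emotion is not None:
            totals[emotion] += count
    if not totals:
        return None  # no emotion keyword among the reference data
    emotion, count = totals.most_common(1)[0]
    return emotion if count >= min_count else None


# First reference database: speaker -> reference data -> reference count.
first_reference_db: Dict[str, Counter] = {
    "speaker_1": Counter({"감사": 3, "서울": 2}),
    "speaker_2": Counter({"짜증": 2, "감사": 1}),
}

per_speaker = {s: decide_emotion(c, EMOTION_KEYWORDS)
               for s, c in first_reference_db.items()}

# Reference data shared by both speakers (Counter "&" keeps the minimum count).
shared = first_reference_db["speaker_1"] & first_reference_db["speaker_2"]
between_speakers = decide_emotion(shared, EMOTION_KEYWORDS)
```

With this sample data, `per_speaker` assigns "positive" to `speaker_1` and "negative" to `speaker_2`, and the only shared emotion keyword, “감사”, yields "positive" between the speakers.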

Unlike the above embodiments, the complex emotion analysis server 300 may perform frequency analysis on the call voice data between the first user terminal 100 and the second user terminal 200 to extract frequency features, compare those features with the frequency features stored in the second reference database, and, according to the comparison result, determine the emotion corresponding to the predetermined emotion determination keyword as the speaker's emotion.
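The comparison against the second reference database can be sketched as a nearest-match search over stored feature vectors. The source does not define the frequency features or the comparison rule, so the fixed-length numeric vectors, cosine similarity, and the `threshold=0.9` cutoff below are all assumptions, and the names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def emotion_from_frequency(feature: Sequence[float],
                           second_reference_db: List[Tuple[Sequence[float], str]],
                           threshold: float = 0.9) -> Optional[str]:
    """Return the emotion stored with the most similar frequency feature."""
    best_emotion: Optional[str] = None
    best_score = threshold
    for stored_feature, emotion in second_reference_db:
        score = cosine_similarity(feature, stored_feature)
        if score >= best_score:
            best_emotion, best_score = emotion, score
    return best_emotion


# Second reference database: (frequency feature, emotion) pairs.
second_reference_db = [
    ([1.0, 0.0, 0.0], "negative"),
    ([0.0, 1.0, 0.0], "positive"),
]
emotion = emotion_from_frequency([0.9, 0.1, 0.0], second_reference_db)
```

A feature that is not close enough to any stored entry returns `None`, which leaves the decision to the keyword-count path.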

The emotion determination keyword above is the central element of emotional experience: it reflects the degree to which positive emotion (pleasure) and negative emotion (displeasure) are experienced. Core affect can be defined as the process by which information about the external world (whether something is beneficial or harmful to life, rewarding or punishing) is converted into an internal emotional signal or state so that one responds appropriately (approach or avoidance) to an object or situation. In terms of motivation, pleasure is defined as an emotional experience based on approach motivation toward what is beneficial to life, and displeasure as an emotional experience based on avoidance motivation toward what is harmful.

Pleasure-displeasure is the core of emotional experience because it is an emotion that everyone experiences universally and a primal emotion that requires no special learning. This experience appears consistently, from subjective verbal expression to objective indicators such as the face, voice, and body. Prior studies that analyzed everyday vocabulary along the two dimensions of polarity and arousal show that emotion-expressing words tend to polarize into positive and negative rather than being distributed evenly along a continuum from negative to positive.

Accordingly, based on an analysis of the dimensions of Korean emotion vocabulary, the complex emotion analysis server 300 uses a keyword database derived, as measurements of representative emotion-expressing vocabulary, for words including not only adjectives but also verbs and nouns.

To build such a keyword database, sentiment words must be extracted and a polarity value derived for each extracted word. Sentiment words can be extracted in three processing steps. First, stop words such as English words, numbers, single-character words, and special characters are removed. Next, among the remaining words, those whose TF or TF-IDF value falls below a given threshold can be removed. Finally, the vocabulary to be used in the sentiment dictionary is fixed through a step that assigns identity to equivalent words. The sentiment polarity value of a word extracted this way can be defined as the ratio of the number of times the word appeared in positive versus negative contexts.
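The extraction steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the source: the stop-word step keeps only tokens made of two or more Hangul syllables, only a raw term-frequency threshold (`min_tf`) is applied (a TF-IDF threshold would be applied the same way), the identity-assignment step is omitted, and polarity is defined as the share of a word's occurrences found in positively labeled text. The function name and the labeled sample data are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Tokens of two or more Hangul syllables; drops English, digits,
# single-character words, and special characters.
HANGUL_WORD = re.compile(r"[가-힣]{2,}")


def build_polarity_lexicon(docs: List[Tuple[List[str], str]],
                           min_tf: int = 2) -> Dict[str, float]:
    """Return word -> polarity in [0, 1] (1.0 = only seen in positive text)."""
    positive: Counter = Counter()
    negative: Counter = Counter()
    for tokens, label in docs:
        kept = [t for t in tokens if HANGUL_WORD.fullmatch(t)]
        (positive if label == "positive" else negative).update(kept)

    lexicon: Dict[str, float] = {}
    for word in set(positive) | set(negative):
        total = positive[word] + negative[word]
        if total >= min_tf:  # term-frequency threshold
            lexicon[word] = positive[word] / total
    return lexicon


docs = [
    (["감사", "합니다", "A1", "!"], "positive"),
    (["감사", "좋다"], "positive"),
    (["짜증", "난다", "3"], "negative"),
    (["짜증", "감사", "짜증"], "negative"),
]
lexicon = build_polarity_lexicon(docs)
```

With this sample data only “감사” and “짜증” pass the frequency threshold; “감사” receives a polarity of 2/3 and “짜증” a polarity of 0.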

Once emotions have been analyzed and defined from voice and text in this way, a user adaptation process is carried out as described above. However objectively emotions are analyzed using big data, individual deviations can occur, so an adaptation and learning process follows. The individual deviations are learned through feedback and an artificial neural network, and the learning results are then written back to the dictionary so that the feature vectors of vocabulary and pitch are learned and updated.

FIG. 2 is a block diagram illustrating an embodiment of a complex emotion analysis server according to an embodiment of the present invention.

Referring to FIG. 2, the complex emotion analysis server 300 includes an STT conversion unit 310, a keyword database 320, a corrected data generation unit 330, a first reference database 340, a second reference database 350, an emotion determination unit 360, and a control unit 370.

STT 변환부(310)는 복수의 사용자 단말이 호 연결되면 사용자 사이의 통화 음성 데이터를 실시간으로 통화 텍스트 데이터로 변환한다.The STT converter 310 converts call voice data between users into call text data in real time when a plurality of user terminals are connected.

키워드 데이터베이스(320)에는 키워드 별 참조 데이터가 저장되어 있다. 이때, 키워드는 음성 또는 텍스트일 수 있으며, 참조 데이터는 텍스트로 구현될 수 있다. The keyword database 320 stores reference data for each keyword. At this time, the keyword may be voice or text, and the reference data may be implemented as text.

For example, the keywords stored in the keyword database 320 may be implemented as text such as “시울” (Siul), “소울” (Soul), “시이울” (Si-iul), and “소오울” (So-oul), and the reference data may be implemented as text such as “서울” (Seoul).

As another example, when both the keywords and the reference data in the keyword database 320 are implemented as text, the text keywords are implemented as text such as “시울”, “소울”, “시이울”, and “소오울”, and the reference data is implemented as text such as “서울”.

As described above, the present invention uses a word dictionary database to render each word with different pitches, lengths, intonations, and pronunciations, thereby generating voice keywords, and then generates text keywords from those voice keywords. The present invention then builds the keyword database by storing each voice keyword and each text keyword matched with the corresponding word as its reference data.

That is, when the word is text, the text keyword is matched with the reference data and stored in the keyword database 320, and the voice keyword generated by converting the word into standard speech is likewise matched with the reference data and stored in the keyword database 320.

The correction data generator 330 generates correction data by correcting the call voice data and the call text data based on the keyword database 320.

In one embodiment, the correction data generator 330 converts the call voice data into call text data and then checks, based on the keyword database 320, whether a text keyword is present in the call text data.

If a text keyword is present in the call text data according to the keyword database 320, the correction data generator 330 may extract the reference data corresponding to that text keyword from the database and then correct the text data using the reference data to generate corrected text data.

For example, when the call text data is “시울에서는 무엇을 하나요?” (“What do you do in Siul?”), the correction data generator 330 may correct the call text data based on the reference data “서울” (Seoul), which corresponds to the text keyword “시울” (Siul) in the database, and generate the corrected text data “서울에서는 무엇을 하나요?” (“What do you do in Seoul?”).
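The text-keyword correction described above can be sketched as a dictionary lookup and substitution. The dictionary layout (keyword mapped to reference data), the function name `correct_text`, and the longest-keyword-first ordering are assumptions made for illustration; the patent does not specify how the database is organized.

```python
# Hypothetical layout of the keyword database 320: text keyword -> reference data.
KEYWORD_DB = {"시울": "서울", "소울": "서울", "시이울": "서울", "소오울": "서울"}

def correct_text(call_text, keyword_db):
    """Replace every stored text keyword in call_text with its reference data.

    Returns the corrected text and the list of reference data that were used,
    one entry per replaced occurrence.
    """
    corrected = call_text
    used_references = []
    # Try longer keywords first so a short keyword cannot split a longer one.
    for keyword in sorted(keyword_db, key=len, reverse=True):
        hits = corrected.count(keyword)
        if hits:
            reference = keyword_db[keyword]
            corrected = corrected.replace(keyword, reference)
            used_references.extend([reference] * hits)
    return corrected, used_references

corrected, used = correct_text("시울에서는 무엇을 하나요?", KEYWORD_DB)
print(corrected)  # 서울에서는 무엇을 하나요?
```

The returned list of used reference data is what a per-speaker reference counter would consume.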

In another embodiment, the correction data generator 330 checks, based on the keyword database 320, whether a voice keyword is present in the call voice data.

If a voice keyword is present in the call voice data, the correction data generator 330 may extract the reference data corresponding to that voice keyword from the database and then correct the text data based on the reference data to generate corrected text data.

At this time, the frequency features of the portion of the call voice data that corresponds to the voice keyword are stored in the second reference database 350. When the emotion corresponding to that call voice data is later determined, the frequency features and the emotion are stored in association with each other in the second reference database 350.

For example, when the call voice data is “소오울은 말이지?” (“So-oul, you know?”), the correction data generator 330 corrects the call text data based on the reference data “서울” (Seoul), which corresponds to the voice keyword “소오울” (So-oul) in the database, and generates the corrected text data “서울은 말이지?” (“Seoul, you know?”). The frequency features corresponding to “소오울” are stored in the second reference database 350.

In the above embodiments, when no text keyword or voice keyword stored in advance in the database is present in the call voice data, the correction data generator 330 checks, with reference to the database, whether a similar keyword is present in the call voice data or the call text data.

For example, when the call voice data is “시우우우울은 어떻게 가나요?” (“How do I get to Si-u-u-u-ul?”) and it matches no voice keyword stored in advance in the database, the correction data generator 330 determines the voice keyword “시울” (Siul) in the database to be the similar keyword.

If a similar keyword is present in the call voice data or the call text data according to the keyword database 320, the correction data generator 330 corrects the call text data using the similar keyword and generates corrected text data.

The correction data generator 330 then updates the keyword database 320 with the data in the call voice data or call text data that matched the similar keyword.

For example, when the call text data generated from the call voice data is “시우우우울은 어떻게 가나요?” and the similar keyword is “시울”, the correction data generator 330 updates the keyword database 320 with the call text data “시우우우울”, which matched the similar keyword “시울”.

Accordingly, if the original database stores the voice keywords “시울”, “소울”, “시이울”, and “소오울” with the reference data “서울”, the keyword database 320 is updated to hold the voice keywords “시우우우울”, “시울”, “소울”, “시이울”, and “소오울” with the reference data “서울”.
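The similar-keyword fallback and the database update can be sketched as below. The patent does not say how similarity is measured; this sketch substitutes the string similarity ratio from Python's standard `difflib` module, and the `cutoff` value of 0.5 is an arbitrary assumption. The input word is assumed to have been segmented from the utterance already (for example with the particle “은” removed).

```python
from difflib import get_close_matches

def find_similar_keyword(word, keyword_db, cutoff=0.5):
    """Return the stored keyword most similar to word, or None if none is close enough."""
    matches = get_close_matches(word, list(keyword_db), n=1, cutoff=cutoff)
    return matches[0] if matches else None

def correct_and_learn(word, keyword_db, cutoff=0.5):
    """Correct one word and, if it matched only a similar keyword, learn it.

    keyword_db maps keyword -> reference data and is updated in place.
    """
    if word in keyword_db:                 # exact keyword: ordinary correction
        return keyword_db[word]
    similar = find_similar_keyword(word, keyword_db, cutoff)
    if similar is None:                    # nothing similar: leave the word as is
        return word
    reference = keyword_db[similar]
    keyword_db[word] = reference           # database update with the new variant
    return reference

db = {"시울": "서울", "소울": "서울", "시이울": "서울", "소오울": "서울"}
print(find_similar_keyword("시우우우울", db))  # 시울
print(correct_and_learn("시우우우울", db))     # 서울
```

After the call, `db` also contains the entry `"시우우우울": "서울"`, so the same variant is matched directly the next time it occurs.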

The number of times each piece of reference data was referenced when generating the corrected text data is stored in the first reference database 340 for each speaker.

That is, when the call voice data is converted into call text data and then corrected based on the database to generate the corrected text data, the number of references to each piece of reference data is stored in the first reference database 340 for each speaker.
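One possible shape for the first reference database is a per-speaker counter keyed by reference data. The class and method names below are hypothetical and the in-memory storage is only an assumption; the patent does not describe the storage format.

```python
from collections import Counter, defaultdict

class FirstReferenceDB:
    """Per-speaker reference counts (illustrative in-memory stand-in for database 340)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, speaker, reference):
        """Note that reference was used once while correcting speaker's text."""
        self._counts[speaker][reference] += 1

    def count(self, speaker, reference):
        """Number of times reference was used for speaker (0 if never)."""
        return self._counts.get(speaker, Counter())[reference]

    def references(self, speaker):
        """All reference data used for speaker, with their counts."""
        return dict(self._counts.get(speaker, {}))

db = FirstReferenceDB()
db.record("speaker_1", "서울")
db.record("speaker_1", "서울")
db.record("speaker_2", "서울")
print(db.count("speaker_1", "서울"))  # 2
```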

The emotion determination unit 360 compares the reference data for each speaker stored in the first reference database 340 with predetermined emotion keywords. If a predetermined emotion keyword is present among a speaker's reference data, the emotion determination unit 360 extracts the reference count of that reference data and, according to the reference count, determines the emotion to be the one corresponding to the emotion determination keyword.

In one embodiment, if a predetermined emotion keyword is present among the reference data for each speaker stored in the first reference database 340, the emotion determination unit 360 may extract the reference count for that speaker and, according to the reference count of that speaker's reference data, determine the emotion to be the one corresponding to the emotion determination keyword.

In another embodiment, if the same reference data is present among the reference data of different speakers stored in the first reference database, the emotion determination unit 360 may determine the emotion between the speakers to be the one corresponding to the emotion determination keyword, according to the reference count of that shared reference data.
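The two count-based embodiments can be sketched as follows. The patent does not state how a reference count is turned into a decision, so the rule used here (sum the counts per emotion, ignore counts below `min_count`, pick the highest total) is an assumption, as are the function names and the example emotion labels.

```python
from collections import Counter

def speaker_emotion(ref_counts, emotion_keywords, min_count=1):
    """Pick the emotion whose keywords were referenced most often for one speaker.

    ref_counts: {reference data: count} for one speaker.
    emotion_keywords: {emotion keyword: emotion label}.
    """
    scores = Counter()
    for reference, count in ref_counts.items():
        if reference in emotion_keywords and count >= min_count:
            scores[emotion_keywords[reference]] += count
    return scores.most_common(1)[0][0] if scores else None

def shared_emotion(counts_a, counts_b, emotion_keywords, min_count=1):
    """Emotion between two speakers, based only on reference data both of them used."""
    shared = {ref: counts_a[ref] + counts_b[ref]
              for ref in counts_a.keys() & counts_b.keys()}
    return speaker_emotion(shared, emotion_keywords, min_count)

EMOTION_KEYWORDS = {"감사": "gratitude", "짜증": "irritation"}  # hypothetical entries
speaker_a = {"감사": 3, "서울": 2, "짜증": 1}
speaker_b = {"짜증": 4, "서울": 1}
print(speaker_emotion(speaker_a, EMOTION_KEYWORDS))            # gratitude
print(shared_emotion(speaker_a, speaker_b, EMOTION_KEYWORDS))  # irritation
```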

Unlike the above embodiments, the emotion determination unit 360 may instead perform frequency analysis on the call voice data between the first user terminal 100 and the second user terminal 200 to extract frequency features, compare the frequency features of the call voice data with the frequency features stored in the second reference database 350, and, according to the comparison result, determine the speaker's emotion to be the one corresponding to a predetermined emotion determination keyword.
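A minimal sketch of the frequency-based embodiment is given below. The patent names neither the frequency features nor the comparison rule; here the single feature is assumed to be the dominant frequency obtained from a naive discrete Fourier transform, and the comparison is assumed to be a nearest-neighbour match against stored (feature, emotion) pairs. A practical system would use an FFT library and richer features.

```python
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        angle = 2 * math.pi * k / n
        re = sum(x * math.cos(angle * t) for t, x in enumerate(samples))
        im = sum(x * math.sin(angle * t) for t, x in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n

def emotion_from_frequency(feature_hz, second_ref_db):
    """second_ref_db: list of (stored feature in Hz, emotion) pairs (assumed layout)."""
    if not second_ref_db:
        return None
    nearest = min(second_ref_db, key=lambda entry: abs(entry[0] - feature_hz))
    return nearest[1]

# Synthetic 250 Hz tone standing in for a voiced segment of the call.
rate, n = 8000, 320
tone = [math.sin(2 * math.pi * 250 * t / rate) for t in range(n)]
stored = [(120.0, "calm"), (260.0, "irritation")]   # hypothetical stored features
print(emotion_from_frequency(dominant_frequency(tone, rate), stored))  # irritation
```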

FIG. 3 is a flowchart illustrating an embodiment of a complex emotion analysis method using STT according to the present invention.

Referring to FIG. 3, when a call is connected between a plurality of user terminals, the complex emotion analysis server 300 converts call voice data between users into call text data in real time (step S310).

The complex emotion analysis server 300 generates correction data by correcting the call voice data and the call text data based on a predetermined database (step S320).

The complex emotion analysis server 300 stores, for each speaker, the number of times each piece of reference data was referenced when generating the corrected text data in the first reference database (step S330).

The complex emotion analysis server 300 determines the emotion of each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword is present among the reference data for each speaker stored in the first reference database 340 (step S340).
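Steps S320 to S340 can be tied together in a short sketch. Step S310 is assumed to have run already, so the input is a list of (speaker, text) pairs; the function name, the dictionary layouts, and the highest-count decision rule are illustrative assumptions rather than the patented implementation.

```python
from collections import Counter, defaultdict

def analyze_call(utterances, keyword_db, emotion_keywords):
    """utterances: iterable of (speaker, text) pairs produced by step S310."""
    first_ref_db = defaultdict(Counter)      # speaker -> {reference data: count}
    corrected = []
    for speaker, text in utterances:
        counts = first_ref_db[speaker]
        for keyword, reference in keyword_db.items():    # S320: correct the text
            hits = text.count(keyword)
            if hits:
                text = text.replace(keyword, reference)
                counts[reference] += hits                 # S330: count references
        corrected.append((speaker, text))
    emotions = {}
    for speaker, counts in first_ref_db.items():          # S340: decide emotions
        scores = Counter()
        for reference, n in counts.items():
            if reference in emotion_keywords:
                scores[emotion_keywords[reference]] += n
        emotions[speaker] = scores.most_common(1)[0][0] if scores else None
    return corrected, emotions

# Hypothetical database contents used only for this example.
keyword_db = {"시울": "서울", "짜쯩": "짜증", "감솨": "감사"}
emotion_keywords = {"짜증": "irritation", "감사": "gratitude"}
calls = [("A", "시울에서 짜쯩 나요"), ("B", "감솨 합니다"), ("A", "정말 짜쯩")]
corrected, emotions = analyze_call(calls, keyword_db, emotion_keywords)
print(emotions)  # {'A': 'irritation', 'B': 'gratitude'}
```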

Although the present invention has been described with reference to limited embodiments and drawings, it is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

100: first user terminal
200: second user terminal
300: complex emotion analysis server
310: STT conversion unit
320: keyword database
330: correction data generator
340: first reference database
350: second reference database
360: emotion determination unit
370: control unit

Claims

A complex emotion analysis server using speaker separation STT of a multi-party call, comprising:
an STT conversion unit that converts call voice data between users into call text data in real time when a plurality of user terminals are connected to a call;
a correction data generator that generates correction data by correcting the call voice data and the call text data based on a predetermined database,
wherein, if a text keyword exists in the call text data based on the keyword database, the correction data generator extracts reference data corresponding to the text keyword from the database and corrects the text data using the reference data to generate corrected text data, and, if a voice keyword exists in the call voice data, extracts reference data corresponding to the voice keyword from the database and corrects the text data based on the reference data to generate corrected text data, and
wherein, when no text keyword or voice keyword pre-stored in the database exists in the call voice data, the correction data generator further checks, with reference to the database, whether a similar keyword exists in the call voice data or the call text data;
a first reference database generator that stores, in a first reference database for each speaker, the number of times each piece of reference data was referenced when generating the correction data; and
an emotion determination unit that determines an emotion for each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword exists among the reference data for each speaker stored in the first reference database,
wherein the emotion determination unit extracts frequency features by frequency analysis of the call voice data between a first user terminal and a second user terminal, compares the frequency features of the call voice data with frequency features stored in a second reference database, and, according to the comparison result, determines the speaker's emotion to be the emotion corresponding to a predetermined emotion determination keyword.

The complex emotion analysis server using speaker separation STT of a multi-party call according to claim 1,
wherein the emotion determination unit
compares the reference data for each speaker stored in the first reference database with predetermined emotion keywords, extracts the reference count of a speaker's reference data if a predetermined emotion keyword exists among the reference data for each speaker stored in the first reference database, and determines the emotion corresponding to the emotion determination keyword according to the reference count of that speaker's reference data.

A complex emotion analysis method using speaker separation STT of a multi-party call,
performed by the complex emotion analysis server using STT according to claim 1 or 2, the method comprising:
converting call voice data between users into call text data in real time when a plurality of user terminals are connected to a call;
generating corrected text data by correcting the call voice data and the call text data based on a predetermined database;
storing, in a first reference database for each speaker, the number of times each piece of reference data was referenced when generating the correction data; and
determining an emotion for each speaker based on an emotion determination keyword, depending on whether a predetermined emotion keyword exists among the reference data for each speaker stored in the first reference database,
wherein the step of determining an emotion for each speaker based on the emotion determination keyword depending on whether the predetermined emotion keyword exists
further comprises extracting the reference count for each speaker when a predetermined emotion keyword exists among the reference data for each speaker stored in the first reference database.

The complex emotion analysis method using speaker separation STT of a multi-party call according to claim 3,
wherein the step of determining an emotion for each speaker based on the emotion determination keyword depending on whether the predetermined emotion keyword exists
further comprises determining the emotion corresponding to the emotion determination keyword according to the reference count of the reference data for each speaker.

The complex emotion analysis method using speaker separation STT of a multi-party call according to claim 3,
wherein the step of determining an emotion for each speaker based on the emotion determination keyword depending on whether the predetermined emotion keyword exists
comprises, if the same reference data exists among the reference data for each speaker stored in the first reference database, determining the emotion between the speakers to be the emotion corresponding to the emotion determination keyword according to the reference count of that same reference data.