KR101326313B1

KR101326313B1 - Method of classifying emotion from multi sentence using context information

Info

Publication number: KR101326313B1
Application number: KR1020120024733A
Authority: KR
Inventors: 강행봉; 조상현
Original assignee: 가톨릭대학교 산학협력단 (Catholic University of Korea Industry-Academic Cooperation Foundation)
Priority date: 2012-03-09
Filing date: 2012-03-09
Publication date: 2013-11-11
Also published as: KR20130103249A

Abstract

The present invention relates to a method of classifying emotions from multiple sentences using context information. The method is characterized by comprising: (1) extracting context information for each of the multiple sentences constituting a text; (2) extracting a plurality of main sentences from the multiple sentences using the extracted context information; (3) extracting an emotion feature for each of the extracted main sentences; and (4) classifying the emotion of each main sentence with an emotion classifier using the extracted emotion features.

In the proposed method, main sentences are extracted from the multiple sentences of a text using context information, the emotion of each extracted main sentence is classified, and the classified emotions are combined. Emotions can thus be classified accurately from multi-sentence texts collected online, and the results can be used in marketing strategy.

Description

Method of classifying emotions from multiple sentences using context information {METHOD OF CLASSIFYING EMOTION FROM MULTI SENTENCE USING CONTEXT INFORMATION}

The present invention relates to an emotion classification method and, more particularly, to a method of classifying emotions from multiple sentences using context information.

With the widespread adoption of smartphones, social network services (SNS) such as Twitter and Facebook have become commonplace, and a great many opinions on diverse topics are posted in real time. An SNS strengthens existing personal connections and creates new ones, building a broad human network, and through such services many people generate enormous amounts of text in the form of comments to one another.

Recently, viral marketing, that is, word-of-mouth marketing, has been widely used for major topics such as product reviews, movie reviews, and food ratings, and accurately determining consumers' opinions from such SNS information is recognized as very important for marketing strategy.

Accordingly, research on finding meaningful information in the enormous volume of text written by SNS users has attracted interest. In particular, the emotion contained in a sentence is information with a very wide range of applications, and although research on classifying or recognizing emotion from sentences is under way (see Korean Publication No. 10-2002-0042248), it remains very limited.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide a method of classifying emotions from multiple sentences using context information, in which main sentences are extracted from the multiple sentences of a text using context information, the emotion of each extracted main sentence is classified, and the classified emotions are combined, so that emotions can be classified accurately from multi-sentence texts collected online and used in marketing strategy.

According to an aspect of the present invention for achieving the above object, a method of classifying emotions from multiple sentences using context information is characterized by comprising:

(1) extracting context information for each of the multiple sentences constituting a text;

(2) extracting a plurality of main sentences from the multiple sentences using the extracted context information;

(3) extracting an emotion feature for each of the extracted main sentences; and

(4) classifying the emotion of each main sentence with an emotion classifier using the extracted emotion features.

Preferably, the method may further comprise:

(5) combining the classified emotions of the main sentences.

Preferably, step (1) may comprise:

(1-1) calculating the keyword information contained in a sentence using the following equation;

(where S is the input sentence, k_i is the i-th keyword in S, ω_i ∈ ℝ is the weight of keyword k_i, and K is the number of keywords in S);

(1-2) calculating a weight for the position of a sentence within the text using the following equation;

(where index(S_i) is the index of main sentence S_i, and T is the number of sentences in the text);

(1-3) calculating the degree of emotional change between sentences using the following equation; and

(where n(S_ps) is the number of preceding sentences having the same emotion);

(1-4) calculating the context information of a sentence from the calculated keyword information, position weight, and degree of emotional change between sentences using the following equation.

Preferably, step (3) may comprise:

(3-1) classifying the words in a sentence by part of speech using a morphological analyzer; and

(3-2) extracting emotion features for the words using an emotion dictionary.

More preferably, in step (3-2),

the emotion dictionary may be built by adding a domain-based emotion dictionary to a formal emotion dictionary based on a lexical dictionary.

More preferably, the emotion dictionary

may include emotion features for nouns, verbs, adjectives, adverbs, and emoticons.

Still more preferably, the emoticon

may be a reference emoticon, that is, the simplest form among frequently used emoticons that are written in irregular forms.

Still more preferably,

irregular emoticons in a sentence may be converted into the reference emoticons using a Bayesian framework, and emotion features may then be extracted.

According to the proposed method of classifying emotions from multiple sentences using context information, main sentences are extracted from the multiple sentences of a text using context information, the emotion of each extracted main sentence is classified, and the classified emotions are combined. Emotions can thus be classified accurately from multi-sentence texts collected online, and the results can be used in marketing strategy.

FIG. 1 is a flowchart of a method of classifying emotions from multiple sentences using context information according to an embodiment of the present invention.
FIG. 2 is a detailed flowchart of step S100 of the method according to an embodiment of the present invention.
FIG. 3 is a detailed flowchart of step S300 of the method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the detailed flow of processing irregular emoticons using a Bayesian framework in the method according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the results of a sentence emotion classification performance experiment for the method according to an embodiment of the present invention.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice the invention. In describing the preferred embodiments, a detailed description of related known functions or configurations is omitted where it might unnecessarily obscure the gist of the invention. Throughout the drawings, the same reference numerals are used for parts having similar functions and operations.

Throughout the specification, when a part is said to be 'connected' to another part, this includes not only being 'directly connected' but also being 'indirectly connected' with another element in between. In addition, 'comprising' a component means that other components may be further included, not excluded, unless specifically stated otherwise.

FIG. 1 is a flowchart of a method of classifying emotions from multiple sentences using context information according to an embodiment of the present invention. As shown in FIG. 1, the method may comprise extracting context information for each of the multiple sentences constituting a text (S100), extracting a plurality of main sentences from the multiple sentences using the extracted context information (S200), extracting an emotion feature for each extracted main sentence (S300), and classifying the emotion of each main sentence with an emotion classifier using the extracted emotion features (S400), and may further comprise combining the classified emotions of the main sentences (S500).

In step S100, context information is extracted for each of the multiple sentences constituting the text. The detailed flow of step S100 is described with reference to FIG. 2.

FIG. 2 is a detailed flowchart of step S100 of the method according to an embodiment of the present invention. As shown in FIG. 2, step S100 may comprise calculating the keyword information contained in a sentence (S110), calculating a weight for the position of the sentence within the text (S120), calculating the degree of emotional change between sentences (S130), and calculating the context information of the sentence from the calculated keyword information, position weight, and degree of emotional change between sentences (S140).

In step S110, the keyword information contained in a sentence is calculated using Equation 1 below.

Here, S is the input sentence, k_i is the i-th keyword in S, ω_i ∈ ℝ is the weight of keyword k_i, and K is the number of keywords in S. Keyword weights are obtained by measuring the frequency of each term over sentences collected in advance for each domain and assigning that frequency as a weight in the range 0 to 1.
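Equation 1 did not survive in this copy, so its exact form cannot be recovered. The sketch below only illustrates the weighting the paragraph describes, under two assumptions of ours: a term's weight is its frequency in the domain corpus divided by the highest frequency (one way to land in the 0 to 1 range), and a sentence's keyword score is the mean weight of the K keywords it contains. Both helper names are hypothetical.

```python
from collections import Counter


def keyword_weights(domain_sentences):
    """Hypothetical helper: weight each term by its relative frequency in a domain corpus.

    Assumption (not from the patent): weight = count / highest count, which keeps
    every weight in the range (0, 1].
    """
    counts = Counter(token for sentence in domain_sentences for token in sentence)
    if not counts:
        return {}
    top = max(counts.values())
    return {term: count / top for term, count in counts.items()}


def keyword_score(sentence, weights):
    """Hypothetical stand-in for Equation 1: mean weight of the K keywords found in S."""
    found = [weights[token] for token in sentence if token in weights]
    if not found:  # K == 0: no known keyword in this sentence
        return 0.0
    return sum(found) / len(found)
```

With the toy corpus [["battery", "good"], ["battery", "bad"], ["screen", "good"]], "battery" receives weight 1.0 and "bad" 0.5, so the sentence ["battery", "bad", "wow"] scores 0.75.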

In step S120, a weight for the position of the sentence within the text is calculated. Writers generally express their opinion in the first or last sentence of a text, so a sentence's position is very important information for estimating the overall emotion of the text. The position weight is calculated using Equation 2 below.

Here, index(S_i) is the index of main sentence S_i, and T is the number of sentences in the text.
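Equation 2 is also missing, so the function below is only a hypothetical stand-in that captures the stated intuition (the first and last sentences matter most). It assumes a zero-based index and a U-shaped weight that is 1.0 at both ends of the text and 0.0 at the exact middle; the patent's actual formula may differ.

```python
def position_weight(index, T):
    """Hypothetical stand-in for Equation 2.

    Assumptions: index is zero-based (0 .. T-1), and the weight is U-shaped,
    1.0 for the first and last sentences and 0.0 at the exact middle.
    """
    if T < 1 or not 0 <= index < T:
        raise ValueError("index must lie in 0 .. T-1")
    if T == 1:
        return 1.0  # a single sentence is both the first and the last
    return abs(2 * index / (T - 1) - 1)
```

For a five-sentence text (T = 5) this yields the weights 1.0, 0.5, 0.0, 0.5, 1.0.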

In step S130, the degree of emotional change between sentences is calculated. When an emotion is sustained over several sentences and then changes abruptly, the emotion of the text as a whole often changes with it, so the emotional change between sentences is also very important information for estimating the emotion of the whole text. The degree of emotional change is calculated using Equation 3 below.

Here, n(S_ps) is the number of preceding sentences having the same emotion.
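Because Equation 3 is not available, the sketch below makes two assumptions explicit. First, n(S_ps) is read as the length of the run of identical emotions immediately before sentence i. Second, the change score is n / i when sentence i breaks that run and 0 otherwise, so a long sustained emotion followed by a switch scores highest. The per-sentence emotions are assumed to come from some preliminary labeling (for example, a dictionary lookup); the function names are hypothetical.

```python
def preceding_run_length(emotions, i):
    """n(S_ps) under our reading: length of the run of equal labels ending at i - 1."""
    if i == 0:
        return 0
    label = emotions[i - 1]
    n = 0
    j = i - 1
    while j >= 0 and emotions[j] == label:
        n += 1
        j -= 1
    return n


def emotion_change(emotions, i):
    """Hypothetical stand-in for Equation 3.

    Returns 0.0 if sentence i keeps the preceding emotion; otherwise the share
    of the i preceding sentences that belong to the run being broken.
    """
    n = preceding_run_length(emotions, i)
    if n == 0 or emotions[i] == emotions[i - 1]:
        return 0.0
    return n / i
```

For the labels ["pos", "pos", "pos", "neg"], the fourth sentence (i = 3) breaks a run of three and scores 1.0, while the second sentence scores 0.0.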

In step S140, the context information of the sentence is calculated by Equation 4 below from the keyword information, position weight, and degree of emotional change between sentences calculated in steps S110 to S130.
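Equation 4 is likewise missing. A weighted sum is one plausible way to merge the three cues from steps S110 to S130, and the sketch below assumes exactly that. The mixing weights alpha, beta, and gamma and their equal default values are illustrative only, not values from the patent.

```python
def context_score(keyword, position, change, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """Hypothetical stand-in for Equation 4: a weighted sum of the three cues.

    keyword  - keyword information from step S110
    position - position weight from step S120
    change   - degree of emotional change from step S130
    alpha, beta, gamma are illustrative mixing weights, not values from the patent.
    """
    return alpha * keyword + beta * position + gamma * change
```

A linear form keeps each cue's contribution easy to inspect and tune per domain; a learned combination would be an equally reasonable alternative.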

In step S200, main sentences are extracted from the multiple sentences using the context information extracted in step S100. That is, step S200 extracts the main sentences that are important for estimating the emotion of the whole text, and a plurality of main sentences may be extracted.
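The text does not say how many main sentences are kept or whether a threshold is used. The minimal sketch below assumes a top-k rule on the context scores and returns the chosen indices in their original text order. The parameter k and the function name are hypothetical.

```python
def select_main_sentences(scores, k):
    """Hypothetical step S200: indices of the k highest context scores, in text order.

    Ties keep their original order because Python's sort is stable,
    including when reverse=True is used.
    """
    if k <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)
```

For context scores [0.2, 0.9, 0.1, 0.7] and k = 2, the second and fourth sentences (indices 1 and 3) are selected.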

In step S300, an emotion feature is extracted for each main sentence extracted in step S200. The detailed flow of step S300 is described with reference to FIG. 3.

FIG. 3 is a detailed flowchart of step S300 of the method according to an embodiment of the present invention. As shown in FIG. 3, step S300 may comprise classifying the words in a sentence by part of speech using a morphological analyzer (S310) and extracting emotion features for the words using an emotion dictionary (S320).

In step S310, the words in a sentence are classified by part of speech using a morphological analyzer. Morphological analysis lets predicates that appear in many inflected forms be recognized as the same word through their shared stem, and through this process the words can be classified by part of speech.

In step S320, emotion features for the words classified in step S310 are extracted using an emotion dictionary. More specifically, the emotion dictionary may contain, for each part of speech, words associated with emotions together with the emotional intensity of each word, and the type and intensity of a word's emotion may be extracted as its emotion features. The emotion dictionary may include emotion features for nouns, verbs, adjectives, adverbs, and emoticons.
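As an illustration of steps S310 and S320, the sketch below assumes the morphological analyzer has already produced (stem, part-of-speech) pairs. No real analyzer is called, and the tags "ADJ", "NOUN", and "EMO" are hypothetical. Each dictionary entry maps a (stem, POS) pair to an (emotion, intensity) pair, and a sentence's features are the summed intensities per emotion, using the three classes from the experiment below.

```python
EMOTIONS = ("positive", "negative", "neutral")


def emotion_features(tagged_tokens, lexicon):
    """Hypothetical S320: sum dictionary intensities per emotion class.

    tagged_tokens: list of (stem, pos) pairs from a morphological analyzer.
    lexicon: dict mapping (stem, pos) -> (emotion, intensity).
    Keying on (stem, pos) lets the same surface form carry different
    emotions under different parts of speech.
    """
    totals = dict.fromkeys(EMOTIONS, 0.0)
    for stem, pos in tagged_tokens:
        entry = lexicon.get((stem, pos))
        if entry is not None:
            emotion, intensity = entry
            totals[emotion] += intensity
    return [totals[e] for e in EMOTIONS]
```

With the toy entries ("good", "ADJ") -> ("positive", 0.5) and ("^_^", "EMO") -> ("positive", 0.25), the tagged sentence [("good", "ADJ"), ("movie", "NOUN"), ("^_^", "EMO")] yields the feature vector [0.75, 0.0, 0.0].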

However, the same word can express different emotions in different domains. For example, the word "가볍다" ("light") has a negative meaning in the "person" domain but a positive meaning in the "telecommunications" domain. Because a word's emotion can depend on the domain, the emotion dictionary is preferably built by adding a domain-based emotion dictionary to an existing formal emotion dictionary based on a lexical dictionary, which enables more accurate emotion classification of multiple sentences.
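One simple way to realize "formal dictionary plus domain dictionary" is a layered lookup in which domain entries override formal ones. The sketch uses the standard library's collections.ChainMap for this. The overlay rule is our assumption about how the two dictionaries are merged, and the entries are toy values.

```python
from collections import ChainMap


def build_lexicon(formal, domain):
    """Hypothetical merge: look words up in the domain dictionary first,
    then fall back to the formal (lexical-dictionary-based) one.
    Neither input dictionary is modified."""
    return ChainMap(domain, formal)
```

With a formal entry "light" -> ("neutral", 0.0), a person-domain entry "light" -> ("negative", 0.6), and a telecom-domain entry "light" -> ("positive", 0.6), the same word resolves differently per domain, while words absent from a domain dictionary (for example "happy") still come from the formal one.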

Furthermore, in step S320, although the emoticons in a sentence are a very important cue for classifying its emotion, emoticons with the same meaning are written very irregularly because of user preference, typos, and various other factors, which makes it difficult to use them directly as emotion features. For example, "^_^" and "^_________^" have the same meaning, but different people use different numbers of "_" characters. Using such irregular emoticons as they are is one of the factors that make accurate emotion classification difficult.

To address this, irregular emoticons in a sentence are preferably converted into reference emoticons contained in the emotion dictionary, and emotion features are then extracted from the reference emoticons. Here, a "reference emoticon" is an emoticon in the emotion dictionary that is the simplest form among frequently used emoticons written in irregular forms. Converting irregular emoticons into these reference emoticons allows accurate emotion classification from the irregular emoticons in a sentence.

More specifically, irregular emoticons in a sentence can be converted into reference emoticons using a Bayesian framework. FIG. 4 illustrates the detailed flow of processing irregular emoticons with a Bayesian framework in the method according to an embodiment of the present invention. As shown in FIG. 4, the emoticon is decomposed, a probability distribution model is built through a histogram-based normalization process, and the likelihood between the irregular emoticon and each reference emoticon is calculated to select the best-matching reference emoticon.
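The paragraph names the stages (decompose, histogram, normalize, likelihood, best reference) but not the model. The sketch below assumes the simplest reading: each reference emoticon becomes a character histogram normalized into a probability distribution with additive smoothing (alpha). The irregular emoticon's log-likelihood is then scored under each distribution, and the reference with the highest posterior wins, using a uniform prior unless priors are given. This bag-of-characters model ignores character order, so "^_^" and "_^^" look identical to it. All names are hypothetical.

```python
import math
from collections import Counter


def char_model(emoticon, vocab, alpha):
    """Normalized, additively smoothed character histogram of one emoticon."""
    counts = Counter(emoticon)
    total = len(emoticon) + alpha * len(vocab)
    return {c: (counts[c] + alpha) / total for c in vocab}


def normalize_emoticon(emoticon, references, priors=None, alpha=0.1):
    """Hypothetical Bayesian mapping of an irregular emoticon to a reference one.

    Score = log prior + sum of log P(char | reference) over the characters of
    the input emoticon; priors, if given, must be positive.
    """
    vocab = set(emoticon).union(*references)
    best, best_score = None, -math.inf
    for ref in references:
        model = char_model(ref, vocab, alpha)
        prior = 1.0 if priors is None else priors[ref]
        log_post = math.log(prior) + sum(math.log(model[c]) for c in emoticon)
        if log_post > best_score:
            best, best_score = ref, log_post
    return best
```

With the toy reference set ["^_^", "T_T", "-_-", ":)"], the irregular forms "^_________^", "T____T", and "-__-" map to "^_^", "T_T", and "-_-" respectively.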

In step S400, the emotion of each main sentence is classified with an emotion classifier using the emotion features extracted in step S300. That is, the extracted emotion features are assembled into a feature vector, and the emotion classifier assigns an emotion to the sentence; a support vector machine (SVM) may be used as the sentence emotion classifier.
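To keep the sketch dependency-free, the linear SVM below is trained with Pegasos-style subgradient descent on the hinge loss, and one binary SVM per emotion is combined one-vs-rest. This is our swap-in: in practice a library implementation such as scikit-learn's SVC or LinearSVC would normally be used, and the patent does not state the kernel, the multi-class scheme, or any hyperparameters. The values lam=0.01 and epochs=200 are illustrative.

```python
def train_binary_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM trained with Pegasos-style subgradient steps on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (for simplicity the bias is regularized too).
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            xb = list(xi) + [1.0]
            margin = yi * sum(wj * xj for wj, xj in zip(w, xb))
            w = [(1 - eta * lam) * wj for wj in w]     # shrink (L2 regularization)
            if margin < 1:                             # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xb)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per emotion label (e.g. positive / negative / neutral)."""
    return {
        c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the label whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))
```

The feature vectors here are assumed to be per-emotion intensity sums such as [positive, negative, neutral]; any fixed-length vector from step S300 would work the same way.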

In step S500, the emotions of the main sentences classified in step S400 are combined. This classifies the emotion of the multiple sentences and finally yields an estimate of the overall emotion of the text.
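The combination rule is not specified. The sketch below assumes a weighted majority vote in which each main sentence's label is weighted, for example, by its context score, with ties going to the label that appears first. The function name and the weighting choice are hypothetical.

```python
from collections import defaultdict


def combine_emotions(labels, weights=None):
    """Hypothetical step S500: weighted vote over the main sentences' emotions.

    labels: emotion label of each main sentence.
    weights: optional per-sentence weights (e.g. context scores); defaults to 1.0.
    Ties resolve to the label seen first, because dicts keep insertion order.
    """
    if not labels:
        raise ValueError("at least one main sentence is required")
    if weights is None:
        weights = [1.0] * len(labels)
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

With equal weights, ["positive", "negative", "positive"] gives "positive". With weights [0.9, 0.3, 0.4], ["positive", "negative", "negative"] still gives "positive", because 0.9 exceeds 0.3 + 0.4.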

[Experimental Example]

Sentence emotion classification performance experiment

Posts written by users on social network services (SNS) such as Twitter, Facebook, and Me2day were collected at random for each of the general, product review, travel, food, and movie domains, and emotion classification was performed on the collected texts using each of four methods.

The four methods were: using only an existing formal dictionary (Case 1); using only the emotion dictionary built by adding a domain-based emotion dictionary (Case 2); using context information with the existing formal dictionary (Case 3); and using context information with the emotion dictionary built by adding a domain-based emotion dictionary (Case 4). The sentence emotion classification performance of each method was evaluated with the F₁-measure, computed from precision (p) and recall (r) according to Equations 5 to 7 below, and the results are shown in Table 1 and FIG. 5.
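Equations 5 to 7 are not reproduced in this copy. The sketch below assumes they are the standard per-class definitions p = TP / (TP + FP), r = TP / (TP + FN), and F₁ = 2pr / (p + r). The reported numbers are consistent with the last of these: for the General / Case 1 / Positive row, p = 0.5798 and r = 0.5644 give about 0.572, matching the reported 0.5719 up to rounding. The function names are hypothetical.

```python
def f1_from_pr(p, r):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one emotion class, treating it as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)
```

For y_true = ["pos", "pos", "neg", "neu"] and y_pred = ["pos", "neg", "neg", "pos"], the "neg" class has p = 0.5, r = 1.0, and F₁ ≈ 0.667.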

Table 1. Sentence emotion classification performance by domain and method (p: precision, r: recall, F1: F₁-measure)

Domain          Case    Emotion   p       r       F1
General         Case 1  Positive  0.5798  0.5644  0.5719
                        Negative  0.6377  0.4891  0.5536
                        Neutral   0.6841  0.5991  0.6387
                Case 2  Positive  0.6213  0.5891  0.6047
                        Negative  0.6124  0.6401  0.6259
                        Neutral   0.7135  0.6787  0.6956

Product Review  Case 1  Positive  0.6012  0.8181  0.6930
                        Negative  0.6663  0.2513  0.3649
                        Neutral   0.5387  0.6578  0.5923
                Case 2  Positive  0.6648  0.7273  0.6946
                        Negative  0.6259  0.6211  0.6234
                        Neutral   0.9121  0.6806  0.7795
                Case 3  Positive  0.8122  0.7301  0.7689
                        Negative  0.6381  0.6114  0.6244
                        Neutral   0.7533  0.8101  0.7806
                Case 4  Positive  0.8129  0.7013  0.7529
                        Negative  0.6587  0.7759  0.7125
                        Neutral   0.8264  0.8585  0.8421

Travel          Case 1  Positive  0.7512  0.7598  0.7554
                        Negative  0.6602  0.3289  0.4390
                        Neutral   0.4451  0.6654  0.5333
                Case 2  Positive  0.7146  0.8336  0.7695
                        Negative  0.6657  0.3328  0.4437
                        Neutral   0.4281  0.5045  0.4631
                Case 3  Positive  0.7498  0.7592  0.7544
                        Negative  0.5722  0.6687  0.6166
                        Neutral   0.8007  0.6618  0.7246
                Case 4  Positive  0.6599  0.8304  0.7353
                        Negative  0.5431  0.5007  0.5210
                        Neutral   0.8704  0.5011  0.6360

Food            Case 1  Positive  0.7141  0.8401  0.7719
                        Negative  0.506   0.1916  0.2779
                        Neutral   0.3754  0.4894  0.4248
                Case 2  Positive  0.7271  0.8891  0.7999
                        Negative  0.2035  0.2789  0.2353
                        Neutral   0.902   0.2531  0.3952
                Case 3  Positive  0.8334  0.8136  0.8233
                        Negative  0.5014  0.9042  0.6450
                        Neutral   0.8576  0.7234  0.7848
                Case 4  Positive  0.7891  0.8341  0.8109
                        Negative  0.5301  0.9012  0.6675
                        Neutral   0.8249  0.6402  0.7209

Movie           Case 1  Positive  0.6304  0.6681  0.6487
                        Negative  0.7813  0.4285  0.5534
                        Neutral   0.2111  0.5131  0.2991
                Case 2  Positive  0.6936  0.4462  0.5430
                        Negative  0.7288  0.5898  0.6519
                        Neutral   0.3312  0.7366  0.4569
                Case 3  Positive  0.5813  0.7777  0.6653
                        Negative  0.6054  0.4284  0.5017
                        Neutral   0.8652  0.4809  0.6181
                Case 4  Positive  0.6148  0.8876  0.7264
                        Negative  0.9384  0.6278  0.7523
                        Neutral   0.5812  0.6857  0.6291

As shown in Table 1 and FIG. 5, using context information together with an emotion dictionary gave better emotion classification performance than using an emotion dictionary alone. In addition, the emotion dictionary built by adding a domain-based emotion dictionary gave better performance than the general (formal) emotion dictionary. The method according to the present invention therefore performs well in classifying emotion from the multiple sentences that make up a text.

Various modifications and applications of the present invention described above are possible by those of ordinary skill in the art to which the invention pertains, and the scope of the technical idea of the present invention should be defined by the following claims.

S100: extracting context information for each of the multiple sentences constituting the text
S200: extracting a plurality of main sentences from the multiple sentences using the extracted context information
S300: extracting an emotion feature for each extracted main sentence
S400: classifying the emotion of each main sentence with an emotion classifier using the extracted emotion features
S500: combining the classified emotions of the main sentences

Claims

A method of classifying emotions from multiple sentences using context information, the method comprising:
(1) extracting context information for each of the multiple sentences constituting a text;
(2) extracting a plurality of main sentences from the multiple sentences using the extracted context information;
(3) extracting an emotion feature for each of the extracted main sentences; and
(4) classifying the emotion of each main sentence with an emotion classifier using the extracted emotion features,
wherein step (1) comprises:
(1-1) calculating the keyword information contained in a sentence using the following equation;

(where S is the input sentence, k_i is the i-th keyword in S, ω_i ∈ ℝ is the weight of keyword k_i, and K is the number of keywords in S);
(1-2) calculating a weight for the position of a sentence within the text using the following equation;

(where index(S_i) is the index of main sentence S_i, and T is the number of sentences in the text);
(1-3) calculating the degree of emotional change between sentences using the following equation; and

(where n(S_ps) is the number of preceding sentences having the same emotion);
(1-4) calculating the context information of a sentence from the calculated keyword information, position weight, and degree of emotional change between sentences using the following equation.

The method of claim 1, further comprising:
(5) combining the classified emotions of the main sentences.

(Deleted)

The method of claim 1, wherein step (3) comprises:
(3-1) classifying the words in a sentence by part of speech using a morphological analyzer; and
(3-2) extracting emotion features for the words using an emotion dictionary.

The method of claim 4, wherein in step (3-2),
the emotion dictionary is built by adding a domain-based emotion dictionary to a formal emotion dictionary based on a lexical dictionary.

The method of claim 4, wherein the emotion dictionary
includes emotion features for nouns, verbs, adjectives, adverbs, and emoticons.

The method of claim 6, wherein the emoticon
is a reference emoticon, that is, the simplest form among frequently used emoticons that are written in irregular forms.

The method of claim 7, wherein
irregular emoticons in a sentence are converted into the reference emoticons using a Bayesian framework, and the emotion features are extracted from them.