KR102398683B1

KR102398683B1 - System and Method for Constructing Emotion Lexicon by Paraphrasing and Recognizing Emotion Frames

Info

Publication number: KR102398683B1
Application number: KR1020170106064A
Authority: KR
Inventors: 박종철; 박한철; 정진우; 이희제
Original assignee: 한국과학기술원
Priority date: 2017-08-22
Filing date: 2017-08-22
Publication date: 2022-05-17
Also published as: KR20190021015A

Abstract

The present invention provides a method for constructing an emotion dictionary and recognizing emotion structures, performed from a recording medium on which the method is recorded in a computer-executable language. The method includes constructing, through paraphrasing, an emotion dictionary that includes emotion categories at the phrase level; and recognizing, on the basis of the constructed emotion dictionary, an emotion structure comprising the emotion category of an emotional expression appearing in input text, the subject of the emotion corresponding to that expression, and the target of the emotion corresponding to that expression.

Description

System and method for constructing an emotion dictionary through paraphrasing and for recognizing emotion structures in text using the same {System and Method for Constructing Emotion Lexicon by Paraphrasing and Recognizing Emotion Frames}

The present invention relates to a system and method for constructing an emotion dictionary through paraphrasing and for recognizing emotion structures in text using that dictionary. More specifically, it relates to a system and method that semi-automatically construct a dictionary of emotion vocabulary for each emotion category of Ekman's basic emotion model (joy, sadness, fear, anger, surprise, disgust) and that use the constructed dictionary together with syntactic parsing techniques to find the various emotions appearing in a text, the subject who feels each emotion, and the target that evokes it.

Recognizing emotions in natural-language text is regarded as an important technology for analyzing product reviews, social phenomena, markets, and the like. To develop such technology, research is being conducted on recognizing emotions in various forms of text (dialogues, SNS posts, blogs, etc.). Attempts at emotion analysis of text have mainly focused on recognizing the polarity (positive/negative) of the text using sentiment analysis techniques (Registered Patent No. 10-1555-0390000 (2015.09.16.), Registered Patent No. 10-1346-1150000 (2013.12.23.), Registered Patent No. 10-1540-6830000 (2015.07.24.)). However, polarity alone, that is, whether a sentiment is positive or negative, is of limited use for in-depth emotion analysis. There is therefore a growing need for technology that recognizes a range of emotion categories to enable finer-grained emotion analysis.

The present invention has been devised to solve the above problem. Its object is to provide a system and method that construct a phrase-level emotion dictionary composed of emotion vocabulary for various emotion categories and use it to recognize the various emotion structures appearing in text.

However, the problem to be solved by the present invention is not limited to the above object and may be variously extended without departing from the technical spirit and scope of the present invention.

To solve the above problem, a method for constructing an emotion dictionary and recognizing emotion structures according to an embodiment of the present invention, performed from a recording medium on which the method is recorded in a computer-executable language, includes constructing, through paraphrasing, an emotion dictionary that includes emotion categories at the phrase level; and recognizing, on the basis of the constructed emotion dictionary, an emotion structure comprising the emotion category of an emotional expression appearing in input text, the subject of the emotion corresponding to that expression, and the target of the emotion corresponding to that expression.

In some embodiments of the present invention, constructing the emotion dictionary may include expanding emotion words into phrase-level expressions through paraphrasing.

In some embodiments of the present invention, constructing the emotion dictionary may be performed so as to exclude expressions that are homographs of an emotional expression but are not themselves emotional expressions.

In some embodiments of the present invention, constructing the emotion dictionary may be performed so as to exclude paraphrase pairs whose two expressions have different sentiment polarities.

In some embodiments of the present invention, recognizing the emotion structure may be performed so that expressions in the text that are homographs of an emotional expression but are not themselves emotional expressions are not recognized.

In some embodiments of the present invention, recognizing the emotion structure may be performed so as to recognize the subject and the target of the emotion associated with the recognized emotional expression.

Other specific details of the present invention are included in the detailed description and the drawings.

According to an embodiment of the present invention, paraphrasing takes the meaning of the predetermined, categorized emotion vocabulary into account, so that emotion vocabulary in various forms having the same meaning and emotion category can be extracted.

In addition, according to an embodiment of the present invention, emotion vocabulary can be extracted with high accuracy by taking homographs into account.

In addition, according to an embodiment of the present invention, constructing a phrase-level emotion dictionary that also includes single words makes it possible to extract emotional expressions composed of multiple expressions.

In addition, according to an embodiment of the present invention, incorrectly extracted emotion vocabulary can be filtered out.

In addition, according to an embodiment of the present invention, fine-grained emotion analysis can be performed by recognizing the various emotion structures appearing in text.

In addition, according to an embodiment of the present invention, using a phrase-level emotion dictionary makes it possible to recognize emotions in emotional expressions composed of multiple words.

In addition, according to an embodiment of the present invention, the homograph problem that can arise when comparing vocabulary in the emotion dictionary with vocabulary in the text can be resolved.

However, the effects of the present invention are not limited to those above and may be variously extended without departing from the technical spirit and scope of the present invention.

FIG. 1 is a flowchart of a method for constructing an emotion dictionary through paraphrasing and recognizing the emotion structure of text according to an embodiment of the present invention.
FIG. 2 illustrates the structure of a system for constructing an emotion dictionary through paraphrasing according to an embodiment of the present invention.
FIG. 3 is a flowchart of a system for recognizing emotion structures in text according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the present invention pertains is fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

The detailed description of the present invention that follows refers to the accompanying drawings, which show by way of illustration specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the technical spirit and scope of the present invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the technical spirit and scope of the present invention. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by a person of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive way unless they are clearly and specifically defined.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a flowchart of a method for constructing an emotion dictionary through paraphrasing and recognizing the emotion structure of text according to an embodiment of the present invention.

In the method for constructing an emotion dictionary through paraphrasing and recognizing the emotion structure of text according to an embodiment of the present invention, each step may be recorded in a programming language executable by a computer and executed on the computer.

Referring to FIG. 1, the method according to an embodiment of the present invention includes constructing an emotion dictionary through paraphrasing (S100), receiving text from a user (S200), and recognizing emotion structures in the received text using the constructed emotion dictionary (S300).

A recording medium on which the above steps are recorded in a computer-executable language, or the computer program itself, may be an embodiment of the present invention. The step of constructing an emotion dictionary through paraphrasing is described first.

Constructing an emotion dictionary through paraphrasing (S100)

FIG. 2 is a flowchart for describing in detail the step of FIG. 1 of constructing an emotion dictionary through paraphrasing, according to an embodiment of the present invention.

The step of constructing an emotion dictionary through paraphrasing (S100) according to an embodiment of the present invention builds an emotion dictionary by generating emotion vocabulary in various forms through paraphrasing, starting from predetermined word-level emotion vocabulary that expresses emotion for predetermined emotion categories (see Table 1 below). It consists of extracting paraphrase pairs from a parallel corpus (S110), extracting emotional expressions from the extracted paraphrase pairs (S120), filtering out erroneous emotion vocabulary from the extracted emotional expressions (S130), and building the extracted emotional expressions into dictionary form (S140).

First, the step of extracting paraphrase pairs from a parallel corpus (S110) extracts phrase-level paraphrase pairs from a parallel corpus in two languages. In the present invention, a paraphrase pair is a pair of expressions consisting of a phrase-level "original expression" and a "paraphrased expression", which is an expression in a different word form that preserves the meaning of the original.

To extract paraphrase pairs, the present invention performs phrase alignment between each original sentence in the corpus and its parallel translated sentence. Phrase alignment is the process of "semantically aligning" phrase-level original expressions and translated expressions that have the same meaning within a pair of parallel sentences.

(Example 1)

For example, from the parallel sentence pair < "나는 행복하다." , "I am happy." >, phrase-level alignment may produce < "나는" , "I" > , < "행복하다" , "am happy" > , < "." , "." >.

A computer program such as GIZA++ may be used for phrase alignment. Such a program uses the frequency with which expressions co-occur in the two languages across many translated sentence pairs to compute translation probabilities between expressions in the two languages and to align them. Inflection can cause different surface forms of the same word to be treated as different words, so that their occurrences are not pooled in the frequency counts and alignment accuracy suffers. To avoid this, the present invention extracts only the stem of each word before performing word alignment.

(Example 2)

For example, for "행복하다" ("to be happy"), only the stem "행복하" is extracted and aligned with "am happy".

After alignment, the translation probability between expressions in the two languages can be derived by Equation (1) below.

[Equation (1)]

In an embodiment of the present invention, k denotes a Korean expression and e an English expression; count(e) is the number of times the English expression appears in the corpus, and count(k,e) is the number of times the Korean expression and the English expression appear together in translation pairs of the parallel corpus.
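
The definitions of count(e) and count(k,e) suggest that Equation (1) is a relative-frequency estimate of the form P(k | e) = count(k,e) / count(e). The following is a minimal Python sketch under that assumption. The function name `translation_probabilities`, the input format (a list of aligned Korean-stem/English-phrase pairs), and the sample data are hypothetical, and count(e) is approximated here by the number of alignment links that involve e rather than by raw corpus occurrences.

```python
from collections import Counter


def translation_probabilities(aligned_pairs):
    """Estimate P(k | e) = count(k, e) / count(e) from aligned phrase pairs.

    aligned_pairs: iterable of (korean_stem, english_phrase) tuples, one per
    alignment link (hypothetical input format, e.g. produced by an aligner).
    """
    pairs = list(aligned_pairs)                  # allow generators as input
    pair_counts = Counter(pairs)                 # count(k, e)
    english_counts = Counter(e for _, e in pairs)  # count(e), assumed per link
    return {
        (k, e): n / english_counts[e]
        for (k, e), n in pair_counts.items()
    }


# Illustrative data only: stems aligned to English phrases.
sample_links = [
    ("행복하", "am happy"),
    ("행복하", "am happy"),
    ("기쁘", "am happy"),
    ("슬프", "am sad"),
]
sample_probs = translation_probabilities(sample_links)
```

Because the counts are pooled over stems, inflected forms of the same word contribute to a single estimate, which is the motivation the text gives for stemming before alignment.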

After phrase-level alignment, of the pairs of expressions that share the same translation, only those belonging to the same grammatical categories (tense, voice, case, etc.) are regarded as paraphrase pairs and stored. The grammatical categories can be determined by examining the affixes attached to the stem. The probability of the paraphrase relation between the original expression and the paraphrased expression in a paraphrase pair can be derived by Equation (2) below.

[Equation (2)]
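
One common formulation in pivot-based paraphrasing, assumed here for illustration only and not confirmed by this document, sums over the shared translations: P(k2 | k1) = Σ_e P(k2 | e) · P(e | k1). The sketch below computes that quantity from alignment links. The function name, the input format, and the sample data are hypothetical, and the grammatical-category check described above is left out for brevity.

```python
from collections import Counter, defaultdict


def paraphrase_probabilities(aligned_pairs):
    """Score (original, paraphrase) pairs that share an English translation.

    Assumed formulation: P(k2 | k1) = sum over e of P(k2 | e) * P(e | k1),
    with both factors estimated by relative frequency over alignment links.
    """
    pairs = list(aligned_pairs)
    pair_counts = Counter(pairs)                 # count(k, e)
    k_counts = Counter(k for k, _ in pairs)      # count(k)
    e_counts = Counter(e for _, e in pairs)      # count(e)

    # Index Korean expressions by the English pivot they are aligned to.
    by_english = defaultdict(set)
    for k, e in pair_counts:
        by_english[e].add(k)

    scores = defaultdict(float)
    for (k1, e), n in pair_counts.items():
        p_e_given_k1 = n / k_counts[k1]
        for k2 in by_english[e]:
            if k2 == k1:
                continue                         # skip identity pairs
            p_k2_given_e = pair_counts[(k2, e)] / e_counts[e]
            scores[(k1, k2)] += p_k2_given_e * p_e_given_k1
    return dict(scores)


# Illustrative links only.
pivot_links = [
    ("화난 모습이었다", "looked angry"),
    ("화난 모습이었다", "looked angry"),
    ("노기를 띄었다", "looked angry"),
    ("화난 모습이었다", "seemed upset"),
]
pivot_scores = paraphrase_probabilities(pivot_links)
```

The score is directional, so the pair (k1, k2) and the pair (k2, k1) generally receive different values.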

Next, the step of extracting emotional expressions from the paraphrase pairs (S120) extracts phrase-level emotional expressions from the generated paraphrase pairs on the basis of predetermined, categorized emotion vocabulary. To this end, various words that express emotion are first collected, and at least three evaluators judge the emotion category to which each belongs. In this embodiment, the emotion categories are defined using the six categories of the Ekman model (joy, sadness, fear, anger, surprise, disgust), but the emotion categories are not limited to the Ekman model. Table 1 below lists, as an embodiment of the present invention, emotional expressions assigned to each emotion category.

[Table 1]
Emotion category | Emotional expressions
Joy (기쁨) | 사랑스럽다 (lovely), 신나다 (excited), 흐뭇하다 (pleased)
Sadness (슬픔) | 구슬프다 (sorrowful), 괴롭다 (distressed), 애통하다 (grieving)
Fear (두려움) | 두렵다 (afraid), 노심초사하다 (fretting anxiously), 초초하다 (anxious)
Anger (분노) | 격분하다 (enraged), 분개하다 (indignant), 치떨리다 (shaking with rage)
Surprise (놀람) | 감탄하다 (marveling), 혼비백산하다 (frightened out of one's wits), 당황하다 (flustered)
Disgust (혐오) | 경멸하다 (contemptuous), 불쾌하다 (displeased), 떨떠름하다 (reluctant, sour)

The emotion category to which each word belongs is the category selected by the largest number of evaluators. The probability that the word belongs to its assigned category is defined as the proportion of all evaluators who selected that category.
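
The majority-vote rule above can be sketched in a few lines of Python. The function name `assign_category` and the vote labels are hypothetical, and the document does not say how ties are broken; in this sketch the category that appears first among the tied votes wins.

```python
from collections import Counter


def assign_category(votes):
    """Return (category, probability) for one emotion word.

    votes: list with one emotion-category label per evaluator.
    The category is the most frequently chosen label, and the probability
    is the share of all evaluators who chose it.
    """
    if not votes:
        raise ValueError("at least one evaluator vote is required")
    category, n_votes = Counter(votes).most_common(1)[0]
    return category, n_votes / len(votes)


# Three evaluators, two of whom chose "anger" (illustrative data).
example_category, example_prob = assign_category(["anger", "anger", "disgust"])
```

With three evaluators the probability can only take the values 1/3, 2/3, or 1, so a larger panel gives a finer-grained estimate.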

Although the present invention uses manual evaluation to categorize the predetermined words, an existing categorized emotion dictionary may be used instead if one is available, and the emotion categories are not limited to those presented in this embodiment.

Then, whenever the original expression of a paraphrase pair contains the stem of a collected emotion word, the pair is extracted and assigned the corresponding emotion category, because expressions that contain the emotion word are likely to have the same emotion category. Extraction is based on stems so that a pair is not missed when its original expression has the same stem as a collected word but a different surface form owing to affixation.

(Example 3)

Collected emotion word: "화나다" ("to be angry") [Emotion category: anger]

Stem of the emotion word: "화나"

Extracted paraphrase pair: < "화난 모습이었다" ("looked angry") , "노기를 띄었다" ("showed anger") > [Emotion category: anger]
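
Example 3 can be sketched as follows. Note that a plain substring test would miss this case, because the surface form "화난" does not contain the character sequence "화나"; the sketch therefore assumes that each original expression comes with a list of stems produced by a morphological analyzer. The function name, the dictionary keys, and the stem lists are hypothetical.

```python
def extract_emotion_pairs(paraphrase_pairs, emotion_stems):
    """Keep paraphrase pairs whose original expression contains an emotion stem.

    paraphrase_pairs: list of dicts with keys "original", "paraphrase", and
        "original_stems" (stems assumed to come from a morphological analyzer).
    emotion_stems: dict mapping the stem of a collected emotion word to its
        emotion category.
    """
    extracted = []
    for pair in paraphrase_pairs:
        for stem in pair["original_stems"]:
            if stem in emotion_stems:
                # Assign the category of the matching emotion word.
                extracted.append({**pair, "category": emotion_stems[stem]})
                break
    return extracted


stems_to_category = {"화나": "anger"}
candidate_pairs = [
    {"original": "화난 모습이었다", "paraphrase": "노기를 띄었다",
     "original_stems": ["화나", "모습", "이"]},
    {"original": "날씨가 좋았다", "paraphrase": "날이 맑았다",
     "original_stems": ["날씨", "좋"]},
]
emotion_pairs = extract_emotion_pairs(candidate_pairs, stems_to_category)
```

Only the first candidate is kept, and it inherits the category "anger" from the collected word.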

In the present invention, the probability that the original expression of a paraphrase pair belongs to its assigned emotion category is set equal to the probability of the corresponding emotion word. The probability of the assigned category for the paraphrased expression is defined as the joint probability of the original expression's emotion-category probability and the paraphrased expression's paraphrase probability. By generating emotional expressions with the same meaning through paraphrasing, the present invention can assign emotion categories more accurately.
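
As a small worked sketch of this rule, the joint probability is computed below as a simple product, which assumes that the two probabilities can be multiplied directly; the document does not spell out the computation. The function name and the numbers are hypothetical.

```python
def paraphrase_category_probability(category_prob, paraphrase_prob):
    """Category probability of a paraphrased expression.

    category_prob: probability that the original expression (equal to that of
        its emotion word) belongs to the assigned category.
    paraphrase_prob: paraphrase probability of the paraphrased expression.
    The joint probability is taken as the product (an assumption).
    """
    for p in (category_prob, paraphrase_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return category_prob * paraphrase_prob


# Emotion word chosen by 2 of 3 evaluators; paraphrase probability 0.6.
joint_prob = paraphrase_category_probability(2 / 3, 0.6)
```

Under this reading, a paraphrased expression can never be more strongly tied to a category than the original expression it was derived from.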

Next, the step of filtering out erroneous emotion vocabulary (S130) removes expression pairs that contain errors from the candidate emotional-expression paraphrase pairs extracted above. The error analysis carried out for the present invention identified the following five types of error.

First, an error arises when, in searching for original expressions that contain the stem of an emotion word as described above, a homograph causes an original expression that is not an emotional expression to be found. For example, for the word "수치" ("shame") in the emotion category "sadness", the stem may match the "수치" ("numerical value") in an original expression such as "수치를 재다" ("to measure a value"), which concerns numbers, so that the expression is wrongly recognized as an emotional expression.

To solve this problem, the present invention excludes cases where the part of speech of the emotion word's stem differs and cases where the translation shared by the paraphrase pair is not an expression carrying sentiment polarity. If the translation is not an emotional expression, the paraphrase pair is likely not an emotional expression either. When word-sense ambiguity is restricted to emotion words in this way, homograph problems occur only with very low probability, which helps resolve the problem described above. In an embodiment of the present invention, part-of-speech recognition may be performed with a morphological analyzer, and whether an expression is an emotional expression may be determined using an existing dictionary built for sentiment polarity.

Second, meaning may be added to or lost from the original or paraphrased expression so that, although it contains an emotion word, it is not a pure emotional expression. For example, the original expression "노벨 평화상" ("Nobel Peace Prize"), which contains the word "평화" ("peace") in the category "joy", is not an emotional expression. Likewise, an emotional expression to which an entity has been added is hard to regard as a pure emotional expression. For example, for the word "재미있다" ("to be fun") in the category "joy", the original expression "친구가 재미있어한다" ("my friend finds it fun"), which contains the stem of the word, is hard to regard as a pure emotional expression in itself. The present invention therefore excludes paraphrase pairs containing an expression that is a named entity or that includes a proper noun or pronoun. Named entities and the parts of speech of nouns can be recognized using a named entity recognizer and a morphological analyzer.

Third, paraphrasing may fail, so that the paraphrased expression has a meaning entirely different from that of the original expression. To solve this problem, an embodiment of the present invention excludes pairs in which the paraphrase probability of the paraphrased expression is at or below a certain threshold.

Fourth, the two emotional expressions of a paraphrase pair may have different sentiment polarities. Because a paraphrased expression has the same meaning as the original expression, it should carry the same sentiment polarity. A paraphrase pair whose two expressions differ in polarity is therefore excluded. Polarity analysis of the expressions can be performed using any of the various existing sentiment polarity dictionaries.

The error filtering step described above resolves homograph-related problems that previous research did not consider and, being based on error analysis, can contribute to improving the quality of the emotion dictionary.
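
The four filters described above can be combined into a single predicate, sketched below. The `Candidate` fields stand in for the outputs of a morphological analyzer, a named entity recognizer, and a sentiment polarity dictionary, none of which are implemented here; all names and the threshold value 0.1 are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    original: str
    paraphrase: str
    paraphrase_prob: float
    stem_pos_matches: bool        # stem has the same part of speech as the emotion word
    pivot_has_polarity: bool      # shared translation carries sentiment polarity
    has_entity_or_pronoun: bool   # named entity, proper noun, or pronoun present
    original_polarity: str        # e.g. "pos" or "neg"
    paraphrase_polarity: str


def keep_candidate(c, threshold=0.1):
    """Return True if the candidate pair passes all four filters."""
    if not c.stem_pos_matches or not c.pivot_has_polarity:
        return False              # filter 1: likely homograph match
    if c.has_entity_or_pronoun:
        return False              # filter 2: not a pure emotional expression
    if c.paraphrase_prob <= threshold:
        return False              # filter 3: unreliable paraphrase
    if c.original_polarity != c.paraphrase_polarity:
        return False              # filter 4: polarity mismatch
    return True


good = Candidate("화난 모습이었다", "노기를 띄었다", 0.4,
                 True, True, False, "neg", "neg")
homograph = replace(good, original="수치를 재다", stem_pos_matches=True,
                    pivot_has_polarity=False)
kept = [c for c in (good, homograph) if keep_candidate(c)]
```

Keeping each filter as a separate condition makes it straightforward to log which rule rejected a pair when tuning the threshold.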

Next, the emotion dictionary construction step (S140) builds the emotion words and the extracted emotional expressions into an emotion dictionary with the following format.

“Emotion category : Emotional expression : Emotional expression annotated with morpheme parts of speech : Stem of the emotional expression annotated with morpheme parts of speech : Emotion category probability”

(Example 4)

a. 기쁨 : 재미있다 : 재미있/VA 다/EF : 재미있/VA : 1.0 (category "joy"; expression "is fun")

b. 기쁨 : 기분이 좋았다 : 기분/NNG 이/JKS 좋/VA 았/EP 다/EF : 기분/NNG 좋/VA : 1.0 (category "joy"; expression "felt good")
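
The five-field entry format can be read and written with a small helper, sketched below. The " : " delimiter (a colon surrounded by spaces) is inferred from Example 4, and the class and function names are hypothetical.

```python
from typing import NamedTuple

DELIMITER = " : "  # assumed from the example entries


class LexiconEntry(NamedTuple):
    category: str            # emotion category
    expression: str          # surface emotional expression
    tagged_expression: str   # expression with part-of-speech tags per morpheme
    tagged_stem: str         # stem(s) of the expression with part-of-speech tags
    probability: float       # emotion category probability


def parse_entry(line):
    """Parse one dictionary line into a LexiconEntry."""
    fields = [field.strip() for field in line.split(DELIMITER)]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    category, expression, tagged, stem, prob = fields
    return LexiconEntry(category, expression, tagged, stem, float(prob))


def format_entry(entry):
    """Serialize a LexiconEntry back into the line format."""
    return DELIMITER.join(
        [entry.category, entry.expression, entry.tagged_expression,
         entry.tagged_stem, str(entry.probability)]
    )


entry_line = "기쁨 : 기분이 좋았다 : 기분/NNG 이/JKS 좋/VA 았/EP 다/EF : 기분/NNG 좋/VA : 1.0"
entry = parse_entry(entry_line)
```

Storing the tagged stem alongside the surface form allows later lookups to compare parts of speech, which is how the homograph checks in this document are described.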

Receiving text input from the user (S200)

Text according to an embodiment of the present invention ranges from a single sentence to a paragraph or a document made up of multiple paragraphs.

입력받은 텍스트로부터 감정 구조를 인식하는 단계(S300)Recognizing the emotional structure from the received text (S300)

FIG. 3 is a flowchart of the system for recognizing emotion structures in text according to an embodiment of the present invention. The emotion structure recognition step (S300) uses the phrase-level emotion dictionary built in the preceding step to recognize emotional expressions in the text and assign them emotion categories, and to recognize the subject and the object of each emotion. It comprises a text parsing step (S310), an emotion category recognition step (S320), and an emotion subject and object recognition step (S330). This makes it possible to recognize the fine-grained emotion structures present in individual parts of the text, rather than only the emotion expressed by a sentence or a document as a whole.

The text parsing step (S310) splits the text received from the user into sentences and performs morphological analysis, phrase structure grammar analysis, and dependency grammar analysis on each sentence. A morphological analyzer, a phrase structure parser, a dependency parser, and the like may be used for this purpose.

The emotion category recognition step (S320) finds emotional expressions in each sentence using the emotion dictionary built in the preceding step and assigns them emotion categories. When emotional expressions in a sentence are looked up in the emotion dictionary, a homograph problem can arise, as in the following example.

(Example 5)

Emotion dictionary entry: Sadness : 수치 (shame) : 수치/NNG : 수치/NNG : 0.6666

Given sentence: “조사된 수치에 따르면 이는 심각한 문제이다.” (“According to the surveyed figures, this is a serious problem.” Here 수치 means “figure,” not “shame.”)

To resolve the homograph problem, the present invention recognizes a word as an emotional expression only if it matches the stem in the fourth field of the emotion dictionary entry and has the corresponding part of speech. At the phrase level, qualifying words often make the emotional sense unambiguous (for example, 수치감을 느꼈다, “felt a sense of shame”), but for single words the homograph problem can be more severe. Therefore, for single words only, in addition to the part-of-speech match, the word is taken to be an emotional expression and assigned the emotion category recorded in the dictionary only when an expression with sentiment polarity exists among the words in the sentence that depend on it directly or indirectly.
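The two checks described above (stem and part-of-speech match for every entry, plus a polar dependent for single-word entries) can be sketched as follows. This is an illustration under simplifying assumptions, not the patented implementation: the sentence is given as a hand-written list of (morpheme, tag) pairs rather than real analyzer output, `dependents` is a hypothetical map from a morpheme index to the indices that depend on it, and `polar_words` stands in for a sentiment polarity dictionary.

```python
from typing import Dict, List, Optional, Set, Tuple

Morph = Tuple[str, str]  # (morpheme, part-of-speech tag)


def match_stem(sentence: List[Morph], stem: List[Morph]) -> Optional[List[int]]:
    """Find the stem morphemes in order, with identical POS tags.

    Returns the matched positions, or None if any stem morpheme is missing.
    """
    positions: List[int] = []
    start = 0
    for target in stem:
        for i in range(start, len(sentence)):
            if sentence[i] == target:
                positions.append(i)
                start = i + 1
                break
        else:
            return None
    return positions


def is_emotion_expression(
    sentence: List[Morph],
    stem: List[Morph],
    dependents: Dict[int, List[int]],
    polar_words: Set[str],
) -> bool:
    positions = match_stem(sentence, stem)
    if positions is None:
        return False
    if len(stem) > 1:
        # Phrase-level entry: the stem and POS match is treated as sufficient.
        return True
    # Single-word entry: additionally require a polar word among the
    # direct or indirect dependents of the matched word.
    stack = list(dependents.get(positions[0], []))
    seen: Set[int] = set()
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        if sentence[i][0] in polar_words:
            return True
        stack.extend(dependents.get(i, []))
    return False
```

For a simplified version of the sentence in Example 5, where only 조사 (survey) depends on 수치 and 조사 carries no polarity, the sketch returns False, so 수치 is not labeled as sadness.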

The emotion subject and object recognition step (S330) searches for the subject and the object of the emotion associated with each emotional expression found in the preceding step. In an embodiment of the present invention, a noun phrase that contains a grammatical subject and has a direct or indirect dependency relation with the found emotional expression is recognized as the subject of the emotion, and a noun phrase containing a grammatical object, or an adverbial phrase containing entities and events, may be recognized as the object of the emotion.

(Example 6)

그는 (he)/[NP_SBJ; noun phrase, subject] 나에게 (at me)/[NP_AJT; noun phrase, predicate modifier] 단단히 (seriously)/[AP_AJT; adverb phrase, predicate modifier] 화났다 (got angry)/[anger]

Dependency 1: 그는 (contains the subject) -> 화났다

Dependency 2: 나에게 (contains the object) -> 화났다
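The dependencies in Example 6 can be turned into an emotion frame roughly as sketched below. The sketch is illustrative only and rests on several assumptions: the `Phrase` structure, the `VP` label for the predicate, and the `TARGET_LABELS` set are hypothetical, and adverb phrases such as 단단히 are simply skipped because deciding whether a phrase contains an entity or event is outside the scope of the sketch.

```python
from typing import Dict, List, NamedTuple, Optional


class Phrase(NamedTuple):
    text: str
    label: str            # phrase label such as NP_SBJ, NP_AJT, AP_AJT
    head: Optional[int]   # index of the phrase this one depends on (None = root)


# Assumption: noun phrases with these labels may serve as the emotion object.
TARGET_LABELS = {"NP_OBJ", "NP_AJT"}


def depends_on(phrases: List[Phrase], i: int, target: int) -> bool:
    """True if phrase i depends on phrase target directly or indirectly."""
    seen = set()
    cur = phrases[i].head
    while cur is not None and cur not in seen:
        if cur == target:
            return True
        seen.add(cur)
        cur = phrases[cur].head
    return False


def recognize_frame(
    phrases: List[Phrase], expr_index: int, category: str
) -> Dict[str, Optional[str]]:
    """Collect the emotion category, subject, and object for one expression."""
    subject: Optional[str] = None
    target: Optional[str] = None
    for i, phrase in enumerate(phrases):
        if i == expr_index or not depends_on(phrases, i, expr_index):
            continue
        if phrase.label == "NP_SBJ" and subject is None:
            subject = phrase.text
        elif phrase.label in TARGET_LABELS and target is None:
            target = phrase.text
    return {"category": category, "subject": subject, "target": target}


example6 = [
    Phrase("그는", "NP_SBJ", 3),
    Phrase("나에게", "NP_AJT", 3),
    Phrase("단단히", "AP_AJT", 3),
    Phrase("화났다", "VP", None),
]
frame = recognize_frame(example6, expr_index=3, category="anger")
```

Here `frame` pairs the category anger with 그는 as the subject and 나에게 as the object, matching Dependency 1 and Dependency 2.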

Although the invention has been described above with reference to embodiments, these are merely examples and do not limit the present invention. Those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the embodiments. For example, each component specifically shown in the embodiments may be modified when implemented. Differences relating to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

S100: Step of building an emotion dictionary through paraphrasing
S200: Step of receiving text input from the user
S300: Step of recognizing the emotion structure from the received text

Claims

A method of constructing an emotion dictionary and recognizing an emotion structure, performed on a recording medium recorded in a computer-executable language, the method comprising:
constructing, through paraphrasing, an emotion dictionary including phrase-level emotion categories; and
recognizing, based on the constructed emotion dictionary, an emotion structure including an emotion category for an emotional expression appearing in received text, an emotion subject corresponding to the emotional expression, and an emotion object corresponding to the emotional expression,
wherein constructing the emotion dictionary comprises:
extracting phrase-level paraphrase pairs from a parallel corpus composed of two languages;
extracting emotional expressions from the extracted paraphrase pairs; and
building the extracted emotional expressions into the form of a dictionary.

The method of claim 1,
wherein constructing the emotion dictionary includes expanding emotion words into phrase-level expressions through paraphrasing.

The method of claim 2,
wherein constructing the emotion dictionary is performed so as to exclude the selection of expressions that are homographs of an emotional expression but are not themselves emotional expressions.

The method of claim 2,
wherein constructing the emotion dictionary is performed so as to exclude paraphrase pairs whose two expressions have different sentiment polarities.

The method of claim 1,
wherein recognizing the emotion structure is performed so as to exclude from recognition those expressions in the text that are homographs of an emotional expression but are not themselves emotional expressions.

The method of claim 5,
wherein recognizing the emotion structure is performed so as to recognize the subject of the emotion and the object of the emotion associated with the recognized emotional expression.

The method of claim 1,
wherein extracting the paraphrase pairs comprises
performing phrase-level alignment between original sentences and their translated sentences arranged in parallel in the corpus.

The method of claim 7,
wherein extracting the paraphrase pairs comprises
calculating a translation probability between expressions of the two languages using the frequency with which the expressions appear together in the two languages across a plurality of translated sentence pairs, and performing the phrase-level alignment accordingly.

The method of claim 8,
wherein the translation probability between the expressions of the two languages satisfies the following equation.

(k: first-language expression; e: second-language expression; count(e): number of occurrences of the second-language expression in the corpus; count(k,e): frequency with which the first- and second-language expressions occur together in translation pairs of the parallel corpus)
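
A relative-frequency estimate is one form consistent with the symbols defined above. It is given here only as an assumption for illustration and is not taken from the claimed equation; the notation p(k | e) is likewise introduced here.

```latex
p(k \mid e) = \frac{\mathrm{count}(k, e)}{\mathrm{count}(e)}
```

Under this reading, an expression pair that co-occurs in 3 translation pairs, where the second-language expression appears 4 times in the corpus, would receive a translation probability of 0.75.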