KR101625787B1

KR101625787B1 - Method and server for estimating the sentiment value of word

Info

Publication number: KR101625787B1
Application number: KR1020150015800A
Authority: KR
Inventors: 이수원; 박해진
Original assignee: 숭실대학교산학협력단
Priority date: 2015-02-02
Filing date: 2015-02-02
Publication date: 2016-05-30
Also published as: WO2016125950A1

Abstract

The present invention provides a method and a server for estimating the emotion value of a word. According to one embodiment of the present invention, the method by which the server estimates the emotion value of a word includes the following steps: (a) extracting the definition and synonyms of a specific word from a Korean dictionary; (b) extracting, from a pre-built emotion dictionary, each emotion word matching the definition or the synonyms, together with its emotion value; (c) calculating the co-occurrence frequency between the specific word and each of the emotion words in an unstructured text corpus and generating the corresponding co-occurrence frequency vectors; and (d) estimating the emotion value of the specific word based on the respective weights of the definition and the synonyms, the co-occurrence frequency vectors, and the emotion value of each emotion word.

Description

METHOD AND SERVER FOR ESTIMATING THE EMOTION VALUE OF A WORD

The present invention relates to a technique for estimating the emotion value of a word.

Techniques for analyzing emotion in text used to classify text into three classes (positive, neutral, negative), but they are now evolving toward analysis over many emotions. The most important element here is determining how strong an emotion each word carries.

Accordingly, there have recently been attempts to classify words into a variety of emotions, but few studies quantify the degree of emotion of a word. As a result, 'strong positive' and 'weak positive' cannot be distinguished when judging the emotion of a word, which makes fine-grained emotion analysis impossible.

In addition, most studies of emotion values rely on surveys of researchers and experts, which are costly and time-consuming.

The present invention is intended to solve the above problems of the prior art, and aims to provide a way to automatically estimate the emotion value of a word using the word's definition and synonyms in a Korean dictionary.

To achieve the above object, a method by which a server estimates the emotion value of a word according to an embodiment of the present invention comprises: (a) extracting the definition and synonyms of a specific word from a Korean dictionary; (b) extracting, from a pre-built emotion dictionary, each emotion word matching the definition or the synonyms, together with its emotion value; (c) calculating the co-occurrence frequency between the specific word and each emotion word in an unstructured text corpus and generating respective co-occurrence frequency vectors; and (d) estimating the emotion value of the specific word based on the respective weights of the definition and the synonyms, the respective co-occurrence frequency vectors, and the emotion value of each emotion word.

To achieve the above object, a server for estimating the emotion value of a word according to an embodiment of the present invention comprises: a definition and synonym extraction unit that extracts the definition and synonyms of a specific word from a Korean dictionary; an emotion word extraction unit that extracts, from a pre-built emotion dictionary, each emotion word matching the definition or the synonyms, together with its emotion value; a co-occurrence frequency calculation unit that calculates the co-occurrence frequency between the specific word and each emotion word in an unstructured text corpus and generates respective co-occurrence frequency vectors; and an emotion value estimation unit that estimates the emotion value of the specific word based on the respective weights of the definition and the synonyms, the respective co-occurrence frequency vectors, and the emotion value of each emotion word.

According to an embodiment of the present invention, an emotion dictionary can be extended with emotion values estimated in consideration of the frequency of emotion words currently in common use online.

Further, when an emotion dictionary is built using the present invention, emotion values can be estimated regardless of a word's part of speech.

Also, the cost and time spent building an emotion dictionary can be reduced, so that emotion analysis more precise than in prior research can be performed.

It should be understood that the effects of the present invention are not limited to the above effects and include all effects that can be inferred from the detailed description of the present invention or from the configuration of the invention described in the claims.

FIG. 1 is a block diagram showing the configuration of a server for estimating the emotion value of a word according to an embodiment of the present invention.
FIG. 2 shows an experimental result of estimating the emotion value of a specific word according to an embodiment of the present invention.
FIG. 3 shows an experimental result of estimating the emotion value of a specific word according to another embodiment of the present invention.
FIG. 4 is a flowchart illustrating a process of estimating the emotion value of a specific word according to an embodiment of the present invention.

Hereinafter, the present invention will be described with reference to the accompanying drawings. However, the present invention may be embodied in many different forms and is therefore not limited to the embodiments described herein.

In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are denoted by similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member in between.

Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a server for estimating the emotion value of a word according to an embodiment of the present invention.

The server 100 for estimating the emotion value of a word according to an embodiment of the present invention may include a definition and synonym extraction unit 110, an emotion word extraction unit 120, a co-occurrence frequency calculation unit 130, and an emotion value estimation unit 140.

Describing each component, the definition and synonym extraction unit 110 may extract, from a Korean dictionary, the definition and synonyms of a word whose emotion value is to be estimated (hereinafter, the 'specific word') in order to automatically build a Korean emotion dictionary.

For example, for the specific word '행복' (happiness), the Naver Korean dictionary gives the definition '복된 좋은 운수, 생활에서 충분한 만족과 기쁨을 느끼며 흐뭇함. 또는 그러한 상태' (blessed good fortune; contentment from feeling ample satisfaction and joy in life, or such a state) and lists the synonyms '만족, 복, 흡족' (satisfaction, blessing, contentment).

In this case, the definition and synonym extraction unit 110 may extract that definition and those synonyms, respectively, as the definition and synonyms of '행복'.

Meanwhile, the emotion word extraction unit 120 may extract, from the pre-built emotion dictionary, the emotion words that match the definition and synonyms of the specific word extracted by the definition and synonym extraction unit 110.

For convenience of description, an emotion word in the pre-built emotion dictionary that matches the definition or a synonym of the specific word is hereinafter referred to as a 'matched emotion word'.

For reference, the 'pre-built emotion dictionary' may contain emotion words and an emotion value for each emotion word.
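The matching step can be illustrated with a minimal sketch. It assumes plain substring matching against the definition text and exact matching against the synonym list; the patent does not state how matching is performed, and a real system would likely use morphological analysis. The dictionary entries and their values below are hypothetical placeholders, and the function names are invented for this illustration.

```python
# Toy pre-built emotion dictionary: emotion word -> emotion value.
# The values are hypothetical placeholders, not taken from the patent.
EMOTION_DICT = {"만족": 5.9, "기쁨": 6.3, "흡족": 5.7, "우울": 2.1}


def match_in_definition(definition, emotion_dict):
    """Return the emotion words that occur in the definition text.

    Assumption: simple substring matching stands in for real
    morphological analysis of the Korean definition.
    """
    return {w: v for w, v in emotion_dict.items() if w in definition}


def match_in_synonyms(synonyms, emotion_dict):
    """Return the synonyms that are themselves emotion dictionary entries."""
    return {s: emotion_dict[s] for s in synonyms if s in emotion_dict}


# Definition and synonyms of '행복' as quoted in the description.
definition = "복된 좋은 운수, 생활에서 충분한 만족과 기쁨을 느끼며 흐뭇함. 또는 그러한 상태"
synonyms = ["만족", "복", "흡족"]

mean_matches = match_in_definition(definition, EMOTION_DICT)
syn_matches = match_in_synonyms(synonyms, EMOTION_DICT)
print(mean_matches)  # emotion words matched in the definition
print(syn_matches)   # emotion words matched in the synonyms
```

With this toy dictionary, '만족' and '기쁨' are matched in the definition, and '만족' and '흡족' are matched in the synonyms.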

Meanwhile, to take into account how frequently emotion words are used in everyday life, the co-occurrence frequency calculation unit 130 may generate, from an unstructured text corpus, a co-occurrence frequency vector between the specific word and its matched emotion words.

The structure of the co-occurrence frequency vector between the specific word and the emotion words matched in its definition is as follows.

Here, w denotes the specific word, m denotes the number of emotion words matched in the definition of the specific word, and f(w, mean_i) denotes the co-occurrence frequency between the specific word w and the emotion word mean_i matched in the definition.

The structure of the co-occurrence frequency vector between the specific word and the emotion words matched in its synonyms is as follows.

Here, w denotes the specific word, n denotes the number of emotion words matched in the synonyms of the specific word, and f(w, syn_j) denotes the co-occurrence frequency between the specific word w and the emotion word syn_j matched in the synonyms.
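Based only on the symbol definitions given here, the two vectors can be written as follows. This is a reconstruction, and the names V_mean and V_syn are placeholders chosen for this illustration rather than the patent's own notation.

```latex
V_{mean}(w) = \bigl( f(w, mean_1),\; f(w, mean_2),\; \dots,\; f(w, mean_m) \bigr)

V_{syn}(w) = \bigl( f(w, syn_1),\; f(w, syn_2),\; \dots,\; f(w, syn_n) \bigr)
```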

For reference, the 'unstructured text corpus' may or may not be restricted to the category (field) in which the word is mainly used.
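A minimal sketch of how such a co-occurrence frequency vector might be computed is shown below. It assumes that two words co-occur when both appear in the same document (for example, one post) and that matching is by plain substring; the patent does not define the co-occurrence window, so both are assumptions. The function name and the sample documents are hypothetical.

```python
from collections import Counter


def cooccurrence_vector(target, emotion_words, documents):
    """Count, per emotion word, the documents containing both it and the target.

    Assumption: co-occurrence means "appears in the same document";
    substring matching stands in for proper Korean tokenization.
    """
    counts = Counter()
    for doc in documents:
        if target not in doc:
            continue  # only documents that mention the target word count
        for word in emotion_words:
            if word in doc:
                counts[word] += 1
    # Keep the order of emotion_words so position i corresponds to word i.
    return [counts[word] for word in emotion_words]


# Hypothetical miniature corpus.
docs = [
    "오늘 너무 속상하다 정말 우울하다",
    "속상하다 화나다",
    "날씨가 좋아서 기쁘다",
    "일이 꼬여서 속상하다 그리고 우울하다",
]
vec = cooccurrence_vector("속상하다", ["불편하다", "우울하다"], docs)
print(vec)
```

In this toy corpus '불편하다' never appears with the target, while '우울하다' appears with it in two documents.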

Meanwhile, the emotion value estimation unit 140 may estimate the emotion value of the specific word using the weights for the definition and the synonyms, the co-occurrence frequency vectors, and the emotion values of the matched emotion words.

Here, on the assumption that an emotion word that co-occurs more frequently with the specific word has more influence on estimating the emotion value of that word, the emotion value estimation unit 140 may apply the co-occurrence frequency as a weight to the emotion value of each matched emotion word.

In addition, to account for the respective influence of the definition and the synonyms on the emotion value, the emotion value estimation unit 140 may estimate the emotion value of the specific word by assigning a weight α to the emotion words matched in the definition and a weight (1-α) to the emotion words matched in the synonyms.

To estimate the emotion value of a specific word w, the emotion value estimation unit 140 may apply the weight α for the emotion words matched in the definition and the synonyms, the co-occurrence frequency weight f(w, mean_i) of each emotion word mean_i matched in the definition of w, and the co-occurrence frequency weight f(w, syn_j) of each emotion word syn_j matched in the synonyms of w. The resulting formula for the emotion value of w is given in [Equation 1] below.

[Equation 1]

Here, mean_i is an emotion word matched in the definition of the specific word, and syn_j is an emotion word matched in the synonyms of the specific word.

PDI stands for Pleasure-Displeasure Index. PDI(mean_i) denotes the emotion value, in the existing emotion dictionary, of the emotion word mean_i matched in the definition of the specific word w, and PDI(syn_j) denotes the emotion value, in the existing emotion dictionary, of the emotion word syn_j matched in the synonyms of w.
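One form consistent with these definitions is a frequency-weighted average of the matched emotion values on each side, blended by α. The expression below is a reconstruction under that assumption; in particular, the normalization by the sum of co-occurrence frequencies is assumed and is not confirmed by the text.

```latex
PDI(w) = \alpha \cdot
  \frac{\sum_{i=1}^{m} f(w, mean_i)\, PDI(mean_i)}{\sum_{i=1}^{m} f(w, mean_i)}
  + (1 - \alpha) \cdot
  \frac{\sum_{j=1}^{n} f(w, syn_j)\, PDI(syn_j)}{\sum_{j=1}^{n} f(w, syn_j)}
```

Under this form, each side stays within the range of the dictionary's emotion scale, and α = 1 or α = 0 reduces the estimate to the definition side or the synonym side alone.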

FIG. 2 shows an experimental result of estimating the emotion value of a specific word according to an embodiment of the present invention.

FIG. 2 shows the emotion value of the specific word '속상하다' (to be upset) as estimated by the server 100 of FIG. 1. The weight α can be applied to the matched emotion word values using the co-occurrence frequency vector between the specific word and the emotion words matched in its dictionary definition ('불편하다' (uncomfortable) and '우울하다' (depressed)) and the co-occurrence frequency vector between the specific word and the emotion words matched in its synonyms ('화나다' (angry) and '괴롭다' (distressed)).

Computing PDI(속상하다) with the parameter α of [Equation 1] set to 0.4 yields an emotion value of 2.398 for '속상하다'.
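The calculation can be sketched as follows, assuming that the estimate is an α-weighted blend of two frequency-weighted averages (one over definition matches, one over synonym matches). That form is an assumption, the function names are invented here, and the input numbers are hypothetical; they are not the values from FIG. 2, so the result is not expected to equal 2.398.

```python
def weighted_mean(freqs, values):
    """Frequency-weighted average of emotion values; None if no co-occurrence."""
    total = sum(freqs)
    if total == 0:
        return None
    return sum(f * v for f, v in zip(freqs, values)) / total


def estimate_pdi(alpha, mean_freqs, mean_pdis, syn_freqs, syn_pdis):
    """Blend the definition-side and synonym-side averages with weight alpha.

    Assumption of this sketch: if either side has no usable matches,
    no estimate is produced (None is returned).
    """
    mean_part = weighted_mean(mean_freqs, mean_pdis)
    syn_part = weighted_mean(syn_freqs, syn_pdis)
    if mean_part is None or syn_part is None:
        return None
    return alpha * mean_part + (1 - alpha) * syn_part


# Hypothetical inputs: two definition matches and two synonym matches.
estimate = estimate_pdi(
    alpha=0.4,
    mean_freqs=[1, 3], mean_pdis=[3.0, 2.0],  # weighted mean = 2.25
    syn_freqs=[1, 1], syn_pdis=[4.0, 2.0],    # weighted mean = 3.0
)
print(estimate)  # approximately 0.4 * 2.25 + 0.6 * 3.0 = 2.7
```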

FIG. 3 shows an experimental result of estimating the emotion value of a specific word according to another embodiment of the present invention.

For reference, the experiment used the leave-one-out method on the 434 words of the pre-built emotion dictionary, varying the parameter α of [Equation 1] (the weight of the emotion words matched in the definition) from 0 to 1 in steps of 0.1, and evaluated the accuracy of the emotion value estimates according to an embodiment of the present invention using root mean square error (RMSE) and mean absolute error (MAE).
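The evaluation procedure can be sketched as below. RMSE and MAE follow their standard definitions. The sweep assumes that, for each held-out dictionary word, the definition-side and synonym-side averages have already been computed from the remaining words; the function names and the toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Grid of alpha values: 0.0, 0.1, ..., 1.0.
ALPHAS = [k / 10 for k in range(11)]


def sweep_alpha(actual, mean_parts, syn_parts):
    """Score the blended estimate for every alpha on the grid.

    mean_parts[k] and syn_parts[k] are assumed to be the definition-side
    and synonym-side averages for held-out word k (leave-one-out).
    Returns {alpha: (rmse, mae)}.
    """
    results = {}
    for alpha in ALPHAS:
        preds = [alpha * m + (1 - alpha) * s
                 for m, s in zip(mean_parts, syn_parts)]
        results[alpha] = (rmse(actual, preds), mae(actual, preds))
    return results


# Toy data: the definition side happens to match the true values exactly.
scores = sweep_alpha(actual=[2.0, 4.0], mean_parts=[2.0, 4.0], syn_parts=[3.0, 5.0])
best_alpha = min(scores, key=lambda a: scores[a][0])  # lowest RMSE
print(best_alpha, scores[best_alpha])
```

Lower values are better for both metrics, so the best α is the one that minimizes the chosen error.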

Here, the 'pre-built emotion dictionary' was taken from the vocabulary frequency list of modern Korean compiled at Yonsei University in 1998, in which the positive/negative (pleasure/displeasure) degree of the extracted emotion words is quantified on a 7-point scale based on the dimensional theory of emotion psychology.

Two baselines were used for comparison: the average of the emotion values of the existing emotion words in the pre-built emotion dictionary (Avg), and the average of the emotion values PDI(mean_i) and PDI(syn_j) for the specific word.

The unstructured data used for the experimental results of FIG. 3, that is, the unstructured text corpus, is Twitter data sampled at 50,000 items per day from October 2013 to January 2014, and the online Naver Korean dictionary was used as the Korean dictionary.

The experimental data contained 434 emotion words, all taken from the pre-built emotion dictionary, and emotion values were estimated for the 309 of them for which an estimate could be computed.

A total of 10 experiments were performed by increasing the parameter α of [Equation 1] in steps of 0.1. Among the baselines, Avg had an RMSE of 1.2785 and an MAE of 1.0968, and the average of the emotion values PDI(mean_i) and PDI(syn_j) of the specific word had an RMSE of 0.9301 and an MAE of 0.5965.

When the parameter α of [Equation 1] was 0, that is, when only the synonyms of the word were considered, the RMSE was 0.9253 and the MAE was 0.5841. When α was 1, that is, when only the definition of the word was considered, the RMSE was 0.9748 and the MAE was 0.6312.

Estimating emotion values while increasing α in steps of 0.1 gave the best RMSE, 0.9169, at α = 0.3, and the best MAE, 0.5777, at α = 0.2. Both are lower errors than those of the baselines.

That is, taking both the definition and the synonyms of a specific word into account, as in an embodiment of the present invention, performs better than the existing approaches.

For the specific words '한(恨)' (deep-seated sorrow or resentment) and '놀라다' (to be surprised), the experimental results in FIG. 3 show the emotion words of the pre-built emotion dictionary that match the Korean dictionary definition and synonyms, along with the emotion value of each emotion word and its co-occurrence frequency in the unstructured text corpus.

FIG. 3 also shows the actual and estimated emotion values obtained, while increasing the parameter α of [Equation 1] in steps of 0.1, at α = 0.3 for '한' and at α = 0.5 for '놀라다'.

FIG. 4 is a flowchart illustrating a process of estimating the emotion value of a specific word according to an embodiment of the present invention.

The flowchart of FIG. 4 may be performed by the server 100 shown in FIG. 1, and is described below with the server 100 as the acting entity.

First, the server 100 extracts the definition and synonyms of a specific word from a Korean dictionary (S401).

After S401, the server 100 extracts, from the pre-built emotion dictionary, the emotion words that match the definition and synonyms extracted in S401 (S402).

After S402, the server 100 calculates, in the unstructured text corpus, the co-occurrence frequency between the specific word and the emotion words extracted in S402, and generates co-occurrence frequency vectors (S403).

After S403, the server 100 estimates the emotion value of the specific word using the weights for the definition and the synonyms, the co-occurrence frequency vectors, and the emotion values of the matched emotion words (S404).

Here, on the assumption that an emotion word that co-occurs more frequently with the specific word has more influence on estimating the emotion value of that word, the server 100 may apply the co-occurrence frequency as a weight to the value of each matched emotion word, and may estimate the emotion value of the specific word using [Equation 1] above.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention.

The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims that follow, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Emotion value estimation server
110: Definition and synonym extraction unit
120: Emotion word extraction unit
130: Co-occurrence frequency calculation unit
140: Emotion value estimation unit

Claims

A method by which a server estimates the emotion value of a word, the method comprising:
(a) extracting the definition and synonyms of a specific word from a Korean dictionary;
(b) extracting, from a pre-built emotion dictionary, each emotion word matching the definition or the synonyms, together with its emotion value;
(c) calculating the co-occurrence frequency between the specific word and each emotion word in an unstructured text corpus and generating respective co-occurrence frequency vectors; and
(d) estimating the emotion value of the specific word based on the respective weights of the definition and the synonyms, the respective co-occurrence frequency vectors, and the emotion value of each emotion word,
wherein step (c) generates, as the co-occurrence frequency vectors, a co-occurrence frequency vector between the specific word and the emotion words matched in the definition (hereinafter, the 'first co-occurrence frequency vector') and a co-occurrence frequency vector between the specific word and the emotion words matched in the synonyms (hereinafter, the 'second co-occurrence frequency vector'), and
wherein step (d) calculates the respective weights of the definition and the synonyms by applying the co-occurrence frequency to the emotion value of each emotion word.

(deleted)

The method according to claim 1,
wherein step (c) calculates the first co-occurrence frequency vector based on the specific word, the number of emotion words matched in the definition, and the co-occurrence frequency between the specific word and each emotion word matched in the definition, and
calculates the second co-occurrence frequency vector based on the specific word, the number of emotion words matched in the synonyms of the specific word, and the co-occurrence frequency between the specific word and each emotion word matched in the synonyms.

(deleted)

The method according to claim 1,
wherein the unstructured text corpus includes data of one or more categories in which the specific word is used.

A computer program stored in a recording medium, comprising a series of instructions for performing the method according to any one of claims 1, 3 and 5.

A server for estimating the emotion value of a word, the server comprising:
a definition and synonym extraction unit that extracts the definition and synonyms of a specific word from a Korean dictionary;
an emotion word extraction unit that extracts, from a pre-built emotion dictionary, each emotion word matching the definition or the synonyms, together with its emotion value;
a co-occurrence frequency calculation unit that calculates the co-occurrence frequency between the specific word and each emotion word in an unstructured text corpus and generates respective co-occurrence frequency vectors; and
an emotion value estimation unit that estimates the emotion value of the specific word based on the respective weights of the definition and the synonyms, the respective co-occurrence frequency vectors, and the emotion value of each emotion word,
wherein the co-occurrence frequency calculation unit generates, as the co-occurrence frequency vectors, a co-occurrence frequency vector between the specific word and the emotion words matched in the definition (hereinafter, the 'first co-occurrence frequency vector') and a co-occurrence frequency vector between the specific word and the emotion words matched in the synonyms (hereinafter, the 'second co-occurrence frequency vector'), and
wherein the emotion value estimation unit calculates the respective weights of the definition and the synonyms by applying the co-occurrence frequency to the emotion value of each emotion word.

(deleted)

The server according to claim 7,
wherein the co-occurrence frequency calculation unit calculates the first co-occurrence frequency vector based on the specific word, the number of emotion words matched in the definition, and the co-occurrence frequency between the specific word and each emotion word matched in the definition, and
calculates the second co-occurrence frequency vector based on the specific word, the number of emotion words matched in the synonyms of the specific word, and the co-occurrence frequency between the specific word and each emotion word matched in the synonyms.

(deleted)