KR101842361B1

KR101842361B1 - An apparatus for analyzing sentiment of review data and method thereof

Info

Publication number: KR101842361B1
Application number: KR1020160112690A
Authority: KR
Inventors: 이현수; 김다해; 이세희; 봉원재; 김누리; 이지형
Original assignee: 성균관대학교산학협력단 (Research & Business Foundation Sungkyunkwan University)
Priority date: 2016-09-01
Filing date: 2016-09-01
Publication date: 2018-03-26
Also published as: KR20180025690A

Abstract

A method for classifying the sentiment of review data is provided. The method may include generating a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information, and determining, using the sentiment classification model, the sentiment level of review data that does not include sentiment level information. Generating the sentiment classification model may include adjusting the sentiment level information based on the similarity between the pieces of text information and reflecting the adjusted information in the sentiment classification model. Accordingly, the sentiment of reviews can be classified effectively while taking review reliability into account, by using the recommendation and non-recommendation information that reviews have received and the similarity between reviews with different ratings.

Description

FIELD OF THE INVENTION

The present invention relates to a method and apparatus for classifying the sentiment of review data, and more particularly to an apparatus and method for classifying the sentiment of review data based on a sentiment classification model.

Recently, research on performing sentiment classification with various machine learning techniques, using the user sentiment embedded in product reviews, has been actively conducted (Non-Patent Document 1).

When classifying reviews by sentiment, the way the review text is represented strongly affects performance (Non-Patent Document 2), and several methods for effective text representation have been studied. In addition, the words that appear frequently in reviews differ by user and by product, and the distribution of the ratings each user assigns also differs. To address these problems, Tang (Non-Patent Document 1) learned vectors representing words, users, and products.

However, such a method requires sufficient data on users and products, and it cannot handle cases in which different users assign different ratings even though the review contents match, nor the problem of the reliability of the review content itself.

(Non-Patent Document 1) Tang, Duyu, Bing Qin, and Ting Liu. "Learning semantic representations of users and products for document level sentiment classification." Proc. ACL. 2015.
(Non-Patent Document 2) Domingos, Pedro. "A few useful things to know about machine learning." Communications of the ACM 55.10 (2012): 78-87.
(Non-Patent Document 3) Le, Quoc, and Tomas Mikolov. "Distributed Representations of Sentences and Documents." Proc. ICML. 2014.
Korean Patent Registration No. 10-1561464, "Method and Apparatus for Sentiment Analysis of Collected Data," Sungkyunkwan University Research & Business Foundation.

An object of the present invention, devised to solve the above problems, is to provide a method of effectively classifying the sentiment of reviews while taking review reliability into account, by using the recommendation and non-recommendation information that reviews have received and the similarity between reviews with different ratings.

Another object of the present invention, devised to solve the above problems, is to provide an apparatus for effectively classifying the sentiment of reviews while taking review reliability into account, by using the recommendation and non-recommendation information that reviews have received and the similarity between reviews with different ratings.

However, the problems to be solved by the present invention are not limited to those above, and may be variously extended without departing from the spirit and scope of the present invention.

To achieve the above objects, a method for classifying the sentiment of review data according to an embodiment of the present invention includes: generating a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information; and determining, using the sentiment classification model, the sentiment level of review data that does not include sentiment level information. Generating the sentiment classification model may include adjusting the sentiment level information based on the similarity between the pieces of text information and reflecting the adjusted information in the sentiment classification model.

According to one aspect, generating the sentiment classification model may include: selecting, from among the plurality of review data, review data to be reflected in the sentiment classification model; converting the text information of the selected review data into a text vector; and training the sentiment classification model by setting the text vector as the input value of the model and setting the target output vector of the model based on the sentiment level information of the selected review data.

Here, selecting the review data may give review data with higher reliability a higher probability of being selected.

In addition, the reliability may be determined based on the ratio between the numbers of recommendations and non-recommendations received by the review data.

Training the sentiment classification model may set the target output vector based on the similarity between the pieces of text information. According to one aspect, training the model may set the target output vector by weighting the sentiment level information corresponding to text information that is highly similar to the text information of the selected review data.

Here, the similarity between pieces of text information may be determined based on the cosine similarity between the text vectors that respectively correspond to them.

The target output vector may be determined based on the following equation.

Here, T denotes the target output vector, r^x denotes text information having sentiment level x, L denotes the range of sentiment levels, and l is a value between 1 and L. The equation also uses the similarity between the text information r^x and the other text information having sentiment level l, which is the ratio of the sum of the cosine similarities between r^x and the text information having sentiment level l to the sum of the cosine similarities between r^x and all text information:

$$\mathrm{sim}(r^x, l) = \frac{\sum_{r \in R_l} \cos(r^x, r)}{\sum_{r \in R} \cos(r^x, r)}$$

where R denotes the set of all text information and R_l denotes its subset having sentiment level l.

An apparatus for classifying the sentiment of review data according to another embodiment of the present invention includes: a sentiment classification model generation unit that generates a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information; and a determination unit that determines, using the sentiment classification model, the sentiment level of review data that does not include sentiment level information. The sentiment classification model generation unit may adjust the sentiment level information based on the similarity between the pieces of text information and reflect the adjusted information in the sentiment classification model.

According to one aspect, the sentiment classification model generation unit may include: a selection unit that selects, from among the plurality of review data, review data to be reflected in the sentiment classification model; a conversion unit that converts the text information of the selected review data into a text vector; and a learning unit that trains the sentiment classification model by setting the text vector as the input value of the model and setting the target output vector of the model based on the sentiment level information of the selected review data.

Here, the selection unit may be configured so that review data with higher reliability has a higher probability of being selected.

In addition, the reliability may be determined based on the ratio between the numbers of recommendations and non-recommendations received by the review data.

According to one aspect, the learning unit may set the target output vector based on the similarity between the pieces of text information.

The learning unit may set the target output vector by weighting the sentiment level information corresponding to text information that is highly similar to the text information of the selected review data.

The target output vector may be determined based on the following equation.

A computer-readable storage medium according to another embodiment of the present invention includes instructions for causing a processor included in a computer to generate a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information, and instructions for causing the processor to determine, using the sentiment classification model, the sentiment level of review data that does not include sentiment level information. The instructions for generating the sentiment classification model may be configured to adjust the sentiment level information based on the similarity between the pieces of text information and to reflect the adjusted information in the sentiment classification model.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited thereby.

With the method and apparatus for classifying the sentiment of review data according to an embodiment of the present invention described above, the sentiment of reviews can be classified effectively while taking review reliability into account, by using the recommendation and non-recommendation information that reviews have received and the similarity between reviews with different ratings.

Accordingly, the invention addresses two problems: semantically similar review texts may receive different ratings depending on their authors, and review texts may all differ in reliability. By generating the sentiment classification model so that it reflects both the reliability of each review text and the similarity of sentiment implied by the semantic similarity between review texts, the accuracy of sentiment classification can be improved.

FIG. 1 is a conceptual diagram of a method for classifying the sentiment of review data in consideration of review reliability and the similarity between review texts according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of a procedure for adjusting sentiment level information in consideration of the similarity between review texts according to an embodiment of the present invention.
FIG. 3 is a configuration diagram of a sentiment classification model according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method for classifying the sentiment of review data according to an embodiment of the present invention.
FIG. 5 is a detailed flowchart of the sentiment classification model generation step of FIG. 4.
FIG. 6 is a block diagram showing the configuration of an apparatus for classifying the sentiment of review data according to an embodiment of the present invention.
FIG. 7 is a detailed block diagram of the sentiment classification model generation unit of FIG. 6.

Since the present invention may be variously modified and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to the specific embodiments, and it should be understood that the present invention includes all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may also be present in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in this application.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

The present invention relates to a method and apparatus for classifying the sentiment of review data. A plurality of review data may be accumulated for a specific product, and each review data may include text information containing the substance of the review and sentiment level information corresponding to that text information. In one embodiment, the sentiment level may be expressed as a rating, for example a rating between 1 and 5. By training an artificial neural network through machine learning on a plurality of review data, each including text information and the corresponding sentiment level information, a sentiment classification model for review data can be generated. Thereafter, when review data without sentiment level information is collected, its sentiment level can be predicted based on the sentiment classification model.

A typical training process for a sentiment classification model using a neural network is as follows.
1) First, for all collected review data, the text information of each review is converted into a vector representation using a language model.
2) Next, review data for training the neural network is selected at random from among the plurality of review data.
3) For the selected review data, the review text expressed as a vector is set as the input value of the neural network, and the rating of the selected review data is set as the target output value of the neural network. Here, the target output value may take the form of a one-hot vector representation (for example, [1, 0, 0, 0, 0]).
4) Once the input value and target output value of the neural network are set, the neural network is trained so that its actual output value approaches the target output value.
5) Steps 2) to 4) are repeated until the error of the neural network converges to or below a certain value, whereby a sentiment classification model is generated by training the neural network.
6) Thereafter, for a new review text, the text information is converted into a vector representation and set as the input value of the neural network, and the resulting output value becomes the predicted rating (sentiment level) of that review data.
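Steps 2) to 6) of the conventional process can be sketched as follows. This is a minimal sketch under explicit assumptions. Step 1) is assumed to be done already, so each review is a (text_vector, rating) pair. A single-layer softmax model trained by stochastic gradient descent on cross-entropy stands in for the neural network. The convergence test of step 5) is approximated by a running average of the loss. All names and hyperparameters (`SoftmaxClassifier`, `train_conventional`, `lr`, `tol`, `max_steps`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def one_hot(rating: int, num_levels: int) -> List[float]:
    """Step 3): a rating in 1..L becomes a one-hot target, e.g. 1 -> [1, 0, 0, 0, 0]."""
    target = [0.0] * num_levels
    target[rating - 1] = 1.0
    return target


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class SoftmaxClassifier:
    """Single-layer softmax model used as a stand-in for the neural network."""

    def __init__(self, dim: int, num_levels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_levels)]
        self.b = [0.0] * num_levels

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        logits = [sum(wk * xk for wk, xk in zip(row, x)) + bias
                  for row, bias in zip(self.w, self.b)]
        return softmax(logits)

    def train_step(self, x: Sequence[float], target: Sequence[float], lr: float) -> float:
        """Step 4): one SGD step on cross-entropy; returns the loss before the update."""
        p = self.predict_proba(x)
        for k, (pk, tk) in enumerate(zip(p, target)):
            grad = pk - tk  # d(loss)/d(logit_k) for softmax + cross-entropy
            for d, xd in enumerate(x):
                self.w[k][d] -= lr * grad * xd
            self.b[k] -= lr * grad
        return -sum(tk * math.log(max(pk, 1e-12)) for pk, tk in zip(p, target))


def train_conventional(data: Sequence[Tuple[Sequence[float], int]], num_levels: int,
                       lr: float = 0.5, tol: float = 0.05, max_steps: int = 10_000,
                       seed: int = 0) -> SoftmaxClassifier:
    rng = random.Random(seed)
    model = SoftmaxClassifier(len(data[0][0]), num_levels, seed)
    avg_loss = None
    for _ in range(max_steps):
        x, rating = rng.choice(data)  # step 2): uniform selection, probability 1/N
        loss = model.train_step(x, one_hot(rating, num_levels), lr)  # steps 3) and 4)
        avg_loss = loss if avg_loss is None else 0.99 * avg_loss + 0.01 * loss
        if avg_loss < tol:  # step 5): stop once the running error is small enough
            break
    return model


def predict_rating(model: SoftmaxClassifier, x: Sequence[float]) -> int:
    """Step 6): the predicted rating is the level with the highest output."""
    p = model.predict_proba(x)
    return p.index(max(p)) + 1


data = [([1.0, 0.0], 1), ([0.0, 1.0], 5)]
model = train_conventional(data, num_levels=5)
print(predict_rating(model, [1.0, 0.0]), predict_rating(model, [0.0, 1.0]))
```

In a realistic setting, the text vectors would come from a document embedding model and the classifier would have hidden layers. The structure of the loop (select, set one-hot target, update, check convergence) is the part this sketch is meant to show.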

However, such a method requires sufficient data on users and products, and it cannot handle either the reliability of the review text itself or cases in which different users assign different ratings to similar review content. If a sentiment classification model is generated without considering these problems, its performance may be degraded by review data of low reliability or by review data whose text and rating are inconsistent. To account for these limitations, the method and apparatus for classifying the sentiment of review data according to an embodiment of the present invention evaluate the reliability of reviews so that highly reliable reviews are preferentially reflected in the sentiment classification model, and train the classification model through a process of recalculating ratings based on the similarity between the text data of reviews, thereby classifying the sentiment of review texts.

First, the method and apparatus for classifying the sentiment of review data according to an embodiment of the present invention can classify the sentiment of review text more effectively by training a neural-network-based classification model in consideration of the reliability of the text. For a specific product, the author of a review can express their sentiment toward the product they consumed through text and a sentiment level (for example, a rating), and users can share sentiment by recommending or not recommending review texts written by others. According to an embodiment of the present invention, this recommendation and non-recommendation information is used to evaluate the reliability of review data, and the author's sentiment embedded in the review text is classified accordingly.

In general, the more reliable a review text is, the more likely it is to convey its sentiment well, and semantically similar texts tend to carry similar sentiment. An embodiment of the present invention aims to improve the accuracy of text sentiment classification by reflecting these characteristics of review texts while training the classification model.

Meanwhile, even when the text information in different review data is similar in meaning, the reviews may have different ratings depending on their authors, and the reliability of each text differs. In an embodiment of the present invention, the sentiment classification model is generated so as to reflect the reliability of each review text and the similarity of sentiment implied by the semantic similarity between review texts, thereby improving the accuracy of sentiment classification in response to these problems.

FIG. 1 is a conceptual diagram of a method for classifying the sentiment of review data in consideration of review reliability and the similarity between review texts according to an embodiment of the present invention. Hereinafter, this method is described in more detail with reference to FIG. 1. As shown in FIG. 1, a plurality of review data may be stored in the review data DB 10. Such review data may be generated by purchasers or users of a specific product and may be collected, for example, via a communication network. Each of the plurality of review data may include text information representing the content of the review and sentiment level information (for example, rating information) corresponding to the text information.

Any of the plurality of review data could be the first to be reflected in training the neural network that generates the sentiment classification model. However, since the reliability of each review data differs, as described above, the accuracy of the sentiment classification model can be improved if review data with higher reliability is preferentially reflected in training. According to an embodiment of the present invention, a more accurate sentiment classification model can be generated by selecting training data based on review reliability when classifying review sentiment with a neural-network-based classification model.

Specifically, in order to preferentially select highly reliable review data for training the neural network, the reliability of a review r may be defined according to Equation 1 below.

Here, N is the number of collected reviews, help_i is the number of recommendation votes received by the i-th review, and total_i is the sum of its recommendation and non-recommendation votes. Once the reliability of a review has been defined in this way, in step 2) of the training procedure of the sentiment classification model described above (that is, when review data for training the neural network is selected from the plurality of review data), the probability that review r is selected is computed from its reliability rather than uniformly as 1/N. Reviews with high reliability are thereby preferentially selected, so that review selection and training are performed selectively. In other words, review data with higher reliability can have a higher probability of being selected as review data to be reflected in the sentiment classification model, and this reliability can be determined based on the ratio between the recommendation and non-recommendation counts for the review data.

Referring again to FIG. 1, the review data 20 for training the sentiment classification model can be determined by selecting, based on reliability, from among the review data included in the review data DB 10, as described above. Such review data 20 may have text information associated with the content of the review and sentiment level (e.g., rating) information corresponding to the text information. The text information is converted into a text vector 21 and, as a Text Representation, can serve as the input value 27 of the neural network 30. The sentiment level (e.g., rating) information, as a Rating Representation, can serve as the target output value 25 of the neural network 30. In a conventional sentiment classification process, the sentiment level information is used as the target output value in the form 23 of a one-hot vector representation (e.g., [1, 0, 0, 0, 0]). In contrast, according to an embodiment of the present invention, the sentiment level information can be readjusted by reflecting the sentiment level information of text information similar to the text information of the review data, and then set as the target output value 25. Once the input value and the target output value of the neural network are set, the sentiment classification model can be generated by repeatedly training the neural network so that its actual output value becomes similar to the target output value.
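For comparison with the adjusted target, the conventional one-hot target 23 and one common measure of how far the network's actual output lies from a target vector might be sketched as follows. Cross-entropy is an assumption here; the text says only that the output is made similar to the target.

```python
import math
from typing import List, Sequence


def one_hot(rating: int, num_levels: int = 5) -> List[float]:
    """Conventional target vector (form 23): 1.0 at the rating's slot, 0.0 elsewhere."""
    if not 1 <= rating <= num_levels:
        raise ValueError(f"rating must be in 1..{num_levels}, got {rating}")
    vec = [0.0] * num_levels
    vec[rating - 1] = 1.0  # ratings are 1-based, list slots are 0-based
    return vec


def cross_entropy(output: Sequence[float], target: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Smaller when the network's output distribution is closer to the target."""
    return -sum(t * math.log(max(o, eps)) for o, t in zip(output, target))
```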

FIG. 2 is a conceptual diagram of a procedure for adjusting sentiment level information in consideration of the similarity between review texts according to an embodiment of the present invention. With reference to FIG. 2, the step of setting the sentiment level information of the one-hot vector representation 23 of FIG. 1 as the target output value 25 through similarity-based adjustment is described below in more detail. Review data selected from the plurality of review data may include text information and corresponding sentiment level information. As shown in FIG. 2, for example, the sentiment level information of review data having particular text information may indicate a rating of 1 point. Expressed as a Rating vector 210, this sentiment level information has the value [1, 0, 0, 0, 0], corresponding to a rating of 1. According to an embodiment of the present invention, a readjusted Rating vector 230 may be generated by weighting the ratings of text information similar to the text information of the selected review data, and this vector may be set as the target output value 25 of the neural network.

The process 220 of taking into account the similarity between the text information of the review data is as follows. For example, if sentiment levels are expressed as ratings in the range 1 to 5 and the selected review has a rating of 1, the similarity between the text information of the selected review data and the text information having a rating of 2, 3, 4, and 5, respectively, is considered. A readjusted Rating vector 230 can then be generated by assigning weights to the ratings of the text information that is highly similar to the text information of the selected review data. Here, the similarity between pieces of text information can be determined as the cosine similarity between the text vectors into which the text information has been converted.
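Cosine similarity between two text vectors is their dot product divided by the product of their lengths. A minimal sketch follows; treating an all-zero vector as having similarity 0 with everything is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|) for two equal-length text vectors."""
    if len(u) != len(v):
        raise ValueError("text vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty (all-zero) text vector matches nothing
    return dot / (norm_u * norm_v)
```

Because only the angle matters, a short review and a long review using the same words in the same proportions score as identical (similarity 1).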

Specifically, for an arbitrary review r^x having rating x, the target output vector T of the neural network can be defined by Equation 2 below.

[Equation 2]

Here, T is the target output vector, r^x is text information having sentiment level x, L is the range of sentiment levels, and l is a value from 1 to L. The weight for sentiment level l is the ratio of the sum of the cosine similarities between the text information r^x and the text information having sentiment level l to the sum of the cosine similarities between r^x and all of the text information.
The cosine similarity is calculated between the text information of two reviews expressed as vectors. By redefining the target output vector in this way, the classification model can be trained so that a high weight is given to the ratings of reviews that are highly similar to review r^x. Therefore, according to an embodiment of the present invention, the rating of a review can be recalculated based on the similarity between review texts and used to train the classification model, producing a more accurate sentiment classification model.
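The ratio described above can be written as w_l(r^x) = sum over r in R_l of cos(r^x, r), divided by the sum over r in R of cos(r^x, r), where R is the set of all reviews and R_l those with sentiment level l (the symbols w_l, R and R_l are introduced here for illustration only). The sketch below assumes that the l-th component of T is simply w_l(r^x). The text does not spell out how Equation 2 combines this with the review's own rating x, so the fallback to a one-hot target when no review is similar at all is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def similarity_target(query: Sequence[float], rating: int,
                      corpus: Sequence[Tuple[Sequence[float], int]],
                      num_levels: int = 5) -> List[float]:
    """Assumed reading of Equation 2 for a review r^x with rating x.

    Component l (1-based) = sum of cos(r^x, r) over corpus reviews with level l
                            / sum of cos(r^x, r) over the whole corpus.
    With non-negative text vectors (e.g. word counts) every term is >= 0,
    so the result is a probability vector usable as a soft target.
    """
    per_level = [0.0] * num_levels
    for vec, level in corpus:
        per_level[level - 1] += _cosine(query, vec)
    total = sum(per_level)
    if total <= 0.0:
        # Hypothetical fallback: nothing is similar, keep the one-hot target for x.
        return [1.0 if l == rating else 0.0 for l in range(1, num_levels + 1)]
    return [s / total for s in per_level]
```

Whether r^x itself is counted in the corpus is not stated. If it is, it contributes a similarity of 1 to its own level x, which keeps the target weighted toward the original rating.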

FIG. 3 is a configuration diagram of a sentiment classification model according to an embodiment of the present invention. As shown in FIG. 3, the neural network 320 for the sentiment classification model according to an embodiment of the present invention may include an Input Layer 321, a Hidden Layer 323, and an Output Layer 325, like a conventional neural-network-based sentiment classification model. According to an embodiment of the present invention, for review data preferentially selected from the plurality of review data based on reliability, the text vector converted from the text information 310 can be set as the input value of the neural network 320. The corresponding sentiment level 330 information is readjusted based on the similarity between text information to produce a target output vector, which is set as the target output value of the neural network 320. The neural network is then trained until the error falls below a certain level. In this way, a more accurate sentiment classification model can be generated that takes into account both the reliability of reviews and the similarity of their text information.
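A minimal pure-Python sketch of the FIG. 3 structure follows: an input layer, one tanh hidden layer, and a softmax output layer with one unit per sentiment level, trained by per-example gradient descent on cross-entropy until the mean error falls below a threshold. Cross-entropy accepts soft, similarity-adjusted targets as well as one-hot ones, but with a soft target it cannot fall below the target's entropy, so the sketch uses the KL divergence (cross-entropy minus that constant) as the "error". The activations, learning rate, threshold and error measure are all assumptions; the text specifies none of them. `predict_rating` (a hypothetical helper) illustrates the later determination step by taking the highest-scoring level.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class TinyNet:
    """Input layer -> one tanh hidden layer -> softmax output (one unit per level)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = softmax([sum(w * hj for w, hj in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, y

    def train_step(self, x: Sequence[float], target: Sequence[float], lr: float) -> float:
        """One gradient step on cross-entropy; returns KL(target || output) before the step."""
        h, y = self.forward(x)
        d_out = [yk - tk for yk, tk in zip(y, target)]  # d(loss)/d(logit); target sums to 1
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]  # computed before w2 is updated
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return sum(t * math.log(t / max(p, 1e-12)) for p, t in zip(y, target) if t > 0)

    def predict_rating(self, x: Sequence[float]) -> int:
        _, y = self.forward(x)
        return max(range(len(y)), key=y.__getitem__) + 1  # ratings are 1-based


def train(net: TinyNet, inputs, targets, lr: float = 0.5,
          tol: float = 0.05, max_epochs: int = 5000) -> float:
    """Repeat epochs until the mean error drops below `tol` (or `max_epochs` runs out)."""
    error = float("inf")
    for _ in range(max_epochs):
        error = sum(net.train_step(x, t, lr) for x, t in zip(inputs, targets)) / len(inputs)
        if error < tol:
            break
    return error
```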

FIG. 4 is a flowchart of a method for classifying the sentiment of review data according to an embodiment of the present invention, and FIG. 5 is a detailed flowchart of the sentiment classification model generation step of FIG. 4. The method for classifying the sentiment of review data according to an embodiment of the present invention is described below in more detail with reference to FIGS. 4 and 5.

As shown in FIG. 4, the method for classifying the sentiment of review data according to an embodiment of the present invention first generates a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information (S410). Here, the plurality of review data may be received, for example, from the review data DB described above, and the sentiment level information may be expressed, for example, as a rating from 1 to 5.

Describing the sentiment classification model generation step (S410) in more detail with reference to FIG. 5, the step first selects, from among the plurality of review data, the review data to be reflected in the sentiment classification model (S411). In selecting the review data, as described above, review data with high reliability can be preferentially reflected in the sentiment classification model; this can be implemented by giving review data with higher reliability a higher probability of being selected. The reliability of review data can be determined based on the ratio between the recommendation and non-recommendation counts for the review data, and may be calculated based on Equation 1 above.

Once the review data to be reflected in the sentiment classification model has been selected, the text information of the selected review data can be converted into a text vector (S413). This conversion may be performed only for the review data selected to be reflected, or for the text information of all review data stored in the review data DB.
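The text does not specify how text information becomes a text vector. As one hypothetical choice, a bag-of-words vectorizer with lower-cased whitespace tokens could be sketched as follows; the function names and the tokenization are illustrative only.

```python
from typing import Dict, List, Sequence


def build_vocabulary(texts: Sequence[str]) -> Dict[str, int]:
    """Map each lower-cased whitespace token to a column index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_text_vector(text: str, vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words counts; tokens missing from the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] += 1.0
    return vec
```

Building the vocabulary once from every text that will be compared keeps all vectors the same length, which cosine similarity between them requires.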

Once the text vector has been generated, the sentiment classification model can be trained by setting the text vector as the input value of the model and setting the model's target output vector based on the sentiment level information of the selected review data (S415). Here, the sentiment level information of the selected review data can be adjusted into the target output vector by reflecting its similarity to other text information and the ratings of that text information. That is, the target output vector can be set based on the similarity between pieces of text information, by assigning weights to the sentiment level information corresponding to text information that is highly similar to the text information of the selected review data. Specifically, the target output vector may be set based on Equation 2 above.

Referring again to FIG. 4, the sentiment level of review data that does not include sentiment level information can be determined using the sentiment classification model generated through machine learning (S420). Because the model reflects the reliability of reviews and the variation in sentiment level associated with text similarity, sentiment levels can be classified more accurately.

FIG. 6 is a block diagram showing the configuration of an apparatus for classifying the sentiment of review data according to an embodiment of the present invention. As shown in FIG. 6, the apparatus 600 for classifying the sentiment of review data according to an embodiment of the present invention may include a review data DB 610, a sentiment classification model generation unit 620, a determination unit 630, and a sentiment classification model 640.

The sentiment classification model generation unit 620 may, for example, receive a plurality of review data from the review data DB 610, each including text information and sentiment level information corresponding to the text information, and generate a sentiment classification model from them through machine learning. Here, the sentiment classification model generation unit 620 may adjust the sentiment level information based on the similarity between the text information and reflect the adjusted information in the sentiment classification model 640. Unlike the configuration shown in FIG. 6, the review data DB 610 may exist as a device separate from the apparatus 600 for classifying the sentiment of review data; for example, it may be located at a remote site and exchange information with the apparatus 600 over a communication network. Likewise, the sentiment classification model 640 may exist as a separate neural network apart from the apparatus 600 according to an embodiment of the present invention.

Referring to FIG. 7, the sentiment classification model generation unit 620 may include a selection unit 621, a conversion unit 623, and a learning unit 625. The selection unit 621 can select, from among the plurality of review data, the review data to be reflected in the sentiment classification model, and the conversion unit 623 can convert the text information of the selected review data into a text vector. The learning unit 625 can train the sentiment classification model by setting the text vector as the input value of the model and setting the model's target output vector based on the sentiment level information of the selected review data.
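How the three sub-units of FIG. 7 hand data to one another can be sketched with small Python classes. Everything here is hypothetical: the `Review` record, the helpful-ratio weights in the selection unit, and the choice to pass the vectorizer and the target function into the conversion and learning units as callables (for example a bag-of-words vectorizer and a similarity-weighted target). The learning unit in this sketch only builds (input, target) pairs; the training itself is left to whatever network is used.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Review:
    text: str
    rating: int   # sentiment level, e.g. 1..5
    helpful: int  # recommendation votes
    total: int    # recommendation + non-recommendation votes


@dataclass
class SelectionUnit:  # role of selection unit 621
    rng: random.Random

    def select(self, reviews: Sequence[Review], k: int) -> List[Review]:
        # Hypothetical weighting: helpful ratio, 0.5 for reviews without votes.
        weights = [r.helpful / r.total if r.total > 0 else 0.5 for r in reviews]
        if sum(weights) == 0:
            weights = None  # nothing was ever recommended: sample uniformly
        return self.rng.choices(list(reviews), weights=weights, k=k)


@dataclass
class ConversionUnit:  # role of conversion unit 623
    vectorize: Callable[[str], Vector]

    def convert(self, review: Review) -> Vector:
        return self.vectorize(review.text)


@dataclass
class LearningUnit:  # role of learning unit 625 (pair construction only)
    make_target: Callable[[Vector, int], Vector]

    def training_pairs(self, vectors: Sequence[Vector],
                       reviews: Sequence[Review]) -> List[Tuple[Vector, Vector]]:
        return [(v, self.make_target(v, r.rating)) for v, r in zip(vectors, reviews)]
```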

Here, the selection unit 621 may be configured so that review data with higher reliability has a higher probability of being selected, and the reliability may be determined based on the ratio between the recommendation and non-recommendation counts for the review data.

Meanwhile, the learning unit 625 can set the target output vector based on the similarity between pieces of text information, for example by assigning weights to the sentiment level information corresponding to text information that is highly similar to the text information of the selected review data. The similarity between pieces of text information can be determined based on the cosine similarity between the text vectors corresponding to them.

Referring again to FIG. 6, the determination unit 630 can use the sentiment classification model 640 to determine the sentiment level of review data that does not include sentiment level information. Because the model reflects the reliability of reviews and the variation in sentiment level associated with text similarity, sentiment levels can be classified more accurately.

Experimental Example

Table 1 below compares the sentiment classification results of a conventional neural-network-based sentiment classification model with those of the sentiment classification model obtained by the method for classifying the sentiment of review data according to an embodiment of the present invention.

In this experimental example, Amazon user review data was used. Of 117,460 reviews of Instant Video items covering a variety of topics, 80% were used as training data and 20% as test data, and 5-fold cross-validation was performed. Accuracy, mean absolute error, and root mean square error were used as evaluation metrics. Compared with a classification model using a general artificial neural network, performance improved by 2.4% in mean absolute error and by 2.6% in mean square error.
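The three evaluation metrics have standard definitions; a minimal sketch over integer ratings follows. It is illustrative only and does not reproduce the experiment.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of reviews whose predicted rating exactly matches the true rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average distance between predicted and true ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Like MAE, but large misses (e.g. predicting 5 for a 1-star review) weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

For true ratings [5, 4, 1, 3] and predictions [5, 3, 1, 5], accuracy is 0.5, MAE is 0.75 and RMSE is sqrt(1.25), about 1.118.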

The method and apparatus for classifying the sentiment of review data according to the present invention described above can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording media storing data that can be read by a computer system, for example ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disks, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a computer network, so that the computer-readable code is stored and executed in a distributed manner.

Although the present invention has been described above with reference to the drawings and embodiments, the scope of protection of the present invention is not limited by those drawings or embodiments, and those skilled in the art will understand that the present invention can be variously modified and changed without departing from the spirit and scope of the invention set forth in the following claims.

10: Review data DB
20: Review data
21: Text vector
23: One-hot vector
25: Target output value
27: Input value
30: Neural network

Claims

A method for classifying the sentiment of review data, performed by a computer, the method comprising:
generating a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information; and
determining, using the sentiment classification model, a sentiment level of review data that does not include sentiment level information,
wherein generating the sentiment classification model comprises adjusting the sentiment level information based on the similarity between the text information and reflecting the adjusted sentiment level information in the sentiment classification model,
wherein generating the sentiment classification model comprises:
selecting, from among the plurality of review data, review data to be reflected in the sentiment classification model;
converting the text information of the selected review data into a text vector; and
training the sentiment classification model by setting the text vector as an input value of the sentiment classification model and setting a target output vector of the sentiment classification model based on the sentiment level information of the selected review data,
and wherein, in selecting the review data, review data having higher reliability has a higher probability of being selected.

delete

The method according to claim 1,
wherein the reliability is determined based on a ratio between a recommendation count and a non-recommendation count for the review data.

The method according to claim 1,
wherein training the sentiment classification model comprises:
setting the target output vector based on the similarity between the text information.

The method according to claim 5,
wherein training the sentiment classification model comprises:
setting the target output vector by weighting sentiment level information corresponding to text information having a high degree of similarity to the text information of the selected review data.

The method according to claim 6,
wherein the similarity between the text information is determined based on a cosine similarity between text vectors respectively corresponding to the text information.

The method according to claim 7,
wherein the target output vector is determined based on the following equation:
[Equation]
where T is the target output vector, r^x is text information having sentiment level x, L is the range of sentiment levels, l is a value from 1 to L, and the weight for sentiment level l is the ratio of the sum of the cosine similarities between the text information r^x and the text information having sentiment level l to the sum of the cosine similarities between the text information r^x and all of the text information.

An apparatus for classifying the sentiment of review data, the apparatus comprising:
a sentiment classification model generation unit configured to generate a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information; and
a determination unit configured to determine, using the sentiment classification model, a sentiment level of review data that does not include sentiment level information,
wherein the sentiment classification model generation unit adjusts the sentiment level information based on the similarity between the text information and reflects the adjusted sentiment level information in the sentiment classification model,
wherein the sentiment classification model generation unit comprises:
a selection unit configured to select, from among the plurality of review data, review data to be reflected in the sentiment classification model;
a conversion unit configured to convert the text information of the selected review data into a text vector; and
a learning unit configured to train the sentiment classification model by setting the text vector as an input value of the sentiment classification model and setting a target output vector of the sentiment classification model based on the sentiment level information of the selected review data,
and wherein the selection unit is configured such that review data having higher reliability has a higher probability of being selected.

delete

The apparatus according to claim 9,
wherein the reliability is determined based on a ratio between a recommendation count and a non-recommendation count for the review data.

The apparatus according to claim 9,
wherein the learning unit sets the target output vector based on the similarity between the text information.

The apparatus according to claim 13,
wherein the learning unit sets the target output vector by weighting sentiment level information corresponding to text information having a high degree of similarity to the text information of the selected review data.

The apparatus according to claim 14,
wherein the similarity between the text information is determined based on a cosine similarity between text vectors respectively corresponding to the text information.

The apparatus according to claim 15,
wherein the target output vector is determined based on the following equation:
[Equation]
where the weight for sentiment level l is the ratio of the sum of the cosine similarities between the text information r^x and the text information having sentiment level l to the sum of the cosine similarities between the text information r^x and all of the text information.

A computer-readable storage medium storing instructions that, when executed by a processor included in a computer, cause the computer to perform:
generating a sentiment classification model through machine learning based on a plurality of review data, each including text information and sentiment level information corresponding to the text information; and
determining, using the sentiment classification model, a sentiment level of review data that does not include sentiment level information,
wherein the instructions for generating the sentiment classification model are configured to adjust the sentiment level information based on the similarity between the text information and to reflect the adjusted sentiment level information in the sentiment classification model,
wherein generating the sentiment classification model comprises:
selecting, from among the plurality of review data, review data to be reflected in the sentiment classification model;
converting the text information of the selected review data into a text vector; and
training the sentiment classification model by setting the text vector as an input value of the sentiment classification model and setting a target output vector of the sentiment classification model based on the sentiment level information of the selected review data,
and wherein, in selecting the review data, review data having higher reliability has a higher probability of being selected.