KR20210106884A

KR20210106884A - Apparatus and method for emotion classification based on artificial intelligence for online data

Info

Publication number: KR20210106884A
Application number: KR1020210005081A
Authority: KR
Inventors: 한경식; 유지수
Original assignee: 아주대학교산학협력단
Priority date: 2021-01-14
Filing date: 2021-01-14
Publication date: 2021-08-31
Also published as: KR102360309B1

Abstract

Disclosed are a device and a method for emotion classification of online data based on artificial intelligence. According to an embodiment of the present application, a method for emotion classification of online data based on artificial intelligence includes the steps of: preparing learning data including a plurality of online data to which first emotion information is given; building an artificial intelligence-based emotion classification model for recognizing emotions reflected in target online data when the target online data is received based on the learning data; and receiving the target online data and recognizing the emotion corresponding to the target online data based on the emotion classification model. Therefore, it is possible to optimize the emotion classification model based on the analysis results of factors which make it difficult to identify main emotions reflected in the online data.

Description

AI-based emotion classification device and method for online data

The present application relates to an artificial intelligence-based emotion classification apparatus and method for online data.

Emotions play an important role in conveying a speaker's intentions and expressing thoughts, and they influence the formation and maintenance of interpersonal relationships as well as decision-making. Emotion analysis can therefore serve as a means of understanding various aspects of human beings (e.g., cognition, thinking, behavior, and expression) in various situations.

A representative area of emotion analysis research is the development of emotion models that perform automatic emotion classification. Research on such emotion models has received considerable attention in several academic fields, such as artificial intelligence and human-computer interaction, and models suited to each domain are accordingly being developed.

However, to build such an emotion model more reliably, various aspects must be carefully considered. First, the reliability of the emotion labels must be a primary consideration. In this regard, conventional studies have generally built emotion models using emotion labels arbitrarily defined by annotators, who are third parties other than the author of a post, or emotion labels inferred through heuristic methods such as hashtags.

However, an emotion designated by an annotator is defined by a third party and is not an emotion directly specified by the author, and heuristic labeling offers no assurance that a hashtag accurately represents the author's emotion. Both approaches are therefore limited in the reliability of their emotion labels.

The second aspect is the lack of reliable data caused by manual labeling, which is related to the first aspect described above. Because manual labeling of emotion labels by annotators inevitably requires considerable human effort in terms of time and cost, conventional studies adopting manual labeling have generally built emotion models using only a limited amount of human-annotated data, on the order of 1,000 to 10,000 items.

The background technology of the present application is disclosed in Korean Registered Patent Publication No. 10-1713558.

The present application is intended to solve the above-described problems of the prior art. An object thereof is to provide an artificial intelligence-based emotion classification apparatus and method for online data that build an emotion classification model from learning data that is collected to include a large amount of online data to which reliable emotion labels are given and that is therefore highly suitable for emotion analysis.

However, the technical problems to be achieved by the embodiments of the present application are not limited to the technical problems described above, and other technical problems may exist.

As a technical means for achieving the above technical object, an artificial intelligence-based emotion classification method for online data according to an embodiment of the present application may include: preparing learning data including a plurality of online data to which first emotion information is given; building, based on the learning data, an artificial intelligence-based emotion classification model that identifies the emotion reflected in target online data when the target online data is received; receiving the target online data; and identifying the emotion corresponding to the target online data based on the emotion classification model.

In addition, the preparing may include defining second emotion information corresponding to at least one emotion category based on the first emotion information, and classifying the plurality of online data based on the second emotion information.

In addition, the second emotion information may be defined to include a smaller number of emotion categories than the first emotion information.

In addition, the second emotion information may be defined based on the Parrott emotion model.

In addition, the artificial intelligence-based emotion classification method for online data according to an embodiment of the present application may include evaluating the consistency and representativeness of the plurality of online data classified according to the second emotion information.

In addition, the receiving may include receiving target online data to which the first emotion information has not been given.

In addition, the building may include tokenizing each of the plurality of online data, extracting a portion corresponding to a preset part of speech from each of the tokenized plurality of online data, and generating a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model based on the extracted portion.

In addition, the identifying may include determining, based on the emotion classification model, the emotion category corresponding to the main emotion reflected in the target online data from among the emotion categories included in the second emotion information.

In addition, the second emotion information may include at least one emotion category among anger, fear, sadness, surprise, joy, and love.

In addition, the second emotion information may include a first emotion category classified as a positive emotion and a second emotion category classified as a negative emotion.

In addition, the emotion classification model may be generated separately for the first emotion category and the second emotion category.

In addition, the building may include subdividing the second emotion information included in the first emotion category based on psychological definitions and emotional situations.

Meanwhile, an artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application may include: a collection unit that prepares learning data including a plurality of online data to which first emotion information is given; a learning unit that builds, based on the learning data, an artificial intelligence-based emotion classification model that identifies the emotion reflected in target online data when the target online data is received; and an analysis unit that receives the target online data and identifies the emotion corresponding to the target online data based on the emotion classification model.

In addition, the collection unit may include an emotion setting unit that defines second emotion information corresponding to at least one emotion category based on the first emotion information, and a data classification unit that classifies the plurality of online data based on the second emotion information.

In addition, the learning unit may include a data processing unit that tokenizes each of the plurality of online data and extracts a portion corresponding to a preset part of speech from each of the tokenized plurality of online data, and a model building unit that generates a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model based on the extracted portion.

In addition, the analysis unit may determine, based on the emotion classification model, the emotion category corresponding to the main emotion reflected in the target online data from among the emotion categories included in the second emotion information.
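The division of work among the collection unit, the learning unit, and the analysis unit can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class names `Collector`, `Learner`, and `Analyzer`, the tag names "Happy" and "Sad", and above all the classifier itself, which is a toy per-category word-count scorer standing in for the BERT-based model described in the text.

```python
from collections import Counter, defaultdict


class Collector:
    """Collection unit: turns author-tagged posts into (text, category) pairs."""

    def __init__(self, tag_to_category):
        # Hypothetical mapping from first emotion information (author tags)
        # to second emotion information (coarser emotion categories).
        self.tag_to_category = tag_to_category

    def prepare(self, posts):
        # Posts whose tag has no mapped category are dropped.
        return [
            (text, self.tag_to_category[tag])
            for text, tag in posts
            if tag in self.tag_to_category
        ]


class Learner:
    """Learning unit: builds a toy word-count model per category (not BERT)."""

    def build(self, training_pairs):
        counts = defaultdict(Counter)
        for text, category in training_pairs:
            counts[category].update(text.lower().split())
        return dict(counts)


class Analyzer:
    """Analysis unit: returns the category whose word counts best match the text."""

    def __init__(self, model):
        self.model = model

    def classify(self, text):
        tokens = text.lower().split()
        scores = {
            category: sum(word_counts[token] for token in tokens)
            for category, word_counts in self.model.items()
        }
        return max(scores, key=scores.get)


# Illustrative data only; the tag names are hypothetical.
posts = [("i am so happy today", "Happy"), ("i feel sad and alone", "Sad")]
collector = Collector({"Happy": "joy", "Sad": "sadness"})
model = Learner().build(collector.prepare(posts))
analyzer = Analyzer(model)
```

The target text passed to `analyzer.classify` carries no emotion tag, which mirrors the case in which target online data is received without first emotion information.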

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to the above-described means for solving the problems, it is possible to provide an artificial intelligence-based emotion classification apparatus and method for online data that build an emotion classification model from learning data that is collected to include a large amount of online data to which reliable emotion labels are given and that is therefore highly suitable for emotion analysis.

According to the above-described means for solving the problems, the utility of the emotion classification model can be further improved by optimizing the model based on an analysis of the factors that make it difficult to identify the main emotion reflected in online data.

However, the effects obtainable herein are not limited to the effects described above, and other effects may exist.

FIG. 1 is a schematic configuration diagram of an online data analysis system including an artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application.
FIG. 2 is a chart exemplarily showing second emotion information and a plurality of online data collected to correspond to each emotion category of the second emotion information.
FIG. 3 is a diagram showing conventional emotion data sets that are exemplarily compared in order to evaluate the learning data collected by the artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application.
FIGS. 4A and 4B are charts showing a statistical analysis of the data deviation between the learning data collected by the artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application and the conventional emotion data sets.
FIG. 5 is a chart comparing the consistency of the learning data collected by the artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application with that of the conventional emotion data sets.
FIG. 6 is a conceptual diagram for explaining the function and structure of the artificial intelligence-based emotion classification model built by the artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application.
FIG. 7 is a chart exemplarily showing performance evaluation results of the emotion classification model, as an experimental example associated with the artificial intelligence-based emotion classification technique for online data according to an embodiment of the present application.
FIG. 8 is a chart exemplarily showing, as an experimental example associated with the artificial intelligence-based emotion classification technique for online data according to an embodiment of the present application, characteristics of online data for which the emotion identified by the emotion classification model does not match the emotion subjectively evaluated by survey respondents.
FIG. 9 is a schematic configuration diagram of the artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application.
FIG. 10 is an operation flowchart of the artificial intelligence-based emotion classification method for online data according to an embodiment of the present application.
FIG. 11 is a detailed operation flowchart of the process of processing the learning data required to build the emotion classification model.

Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily implement them. However, the present application may be embodied in several different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present application clearly, and similar reference numerals are attached to similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element interposed therebetween.

Throughout this specification, when a member is said to be positioned "on", "over", "on top of", "under", "below", or "beneath" another member, this includes not only the case where the member is in contact with the other member but also the case where another member exists between the two members.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding other components, unless specifically stated otherwise.

FIG. 1 is a schematic configuration diagram of an online data analysis system including an artificial intelligence-based emotion classification apparatus for online data according to an embodiment of the present application.

Referring to FIG. 1, an online data analysis system 10 according to an embodiment of the present application may include an artificial intelligence-based emotion classification apparatus 100 for online data according to an embodiment of the present application (hereinafter, the 'emotion classification apparatus 100'), a storage server 200, and a user terminal 300.

The emotion classification apparatus 100, the storage server 200, and the user terminal 300 may communicate with one another through a network 20. The network 20 refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers. Examples of the network 20 include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Wi-Fi network, a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

The user terminal 300 may be any type of wireless communication device, for example, a smartphone, a smart pad, a tablet PC, or a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal.

In the description of the embodiments of the present application, the storage server 200 may be a server or device for storing the learning data required to train the artificial intelligence-based emotion classification model described in detail below.

The emotion classification apparatus 100 may prepare learning data including a plurality of online data to which first emotion information is given. According to an embodiment of the present application, the emotion classification apparatus 100 may receive the learning data including the plurality of online data from the storage server 200. As another example, the emotion classification apparatus 100 may receive, from at least one user terminal 300, online data created by that user terminal 300 as learning data.

In this regard, the first emotion information may correspond to an emotion classification tag assigned by the author of each item of online data. According to an embodiment of the present application, the online data collected as learning data for building the artificial intelligence-based emotion classification model described below may be data for which the author selects at least one of a plurality of preset emotion classification tags at the time of writing, and for which the tag selected by the author is posted (uploaded) together with the content of the online data.

Meanwhile, in the description of the embodiments of the present application, 'online data' may broadly include blogs, social network service (SNS)-based posts, reviews, comments, online postings, voice data, video data, and the like. The online data herein may be data that at least partly includes text, but is not limited thereto. Depending on the implementation, it may also broadly include data such as voice data or video data that does not itself contain text but whose speech content can be converted into text data by a predetermined conversion means (e.g., an STT algorithm).

In this regard, most conventional techniques for emotion analysis or emotion classification use, as learning data, emotion data to which annotators unrelated to the author of the online data arbitrarily assigned emotion labels, or emotion data inferred heuristically through hashtags or the like. As a result, not only is the reliability of the assigned or inferred emotion labels questionable, but the manual labeling work also requires considerable time and effort, making it difficult to secure a large amount of learning data.

In contrast, the emotion classification apparatus 100 disclosed herein uses, as learning data, online data that includes first emotion information corresponding to the emotion classification tag directly selected (assigned) by the author. The emotion label of each item of learning data is therefore highly reliable, and because no separate manual process is required to assign emotion information or emotion labels, it is relatively easy to collect a very large amount of online data as learning data. Accordingly, a large amount of learning data can be used and the emotion information reflected in the online data is itself highly accurate (reliable), so the accuracy of the emotion classification model built from this data can be significantly improved.

As an example to aid understanding, the emotion classification apparatus 100 may collect, as learning data, posts written on "TalkLife", an online social community service created for peer support and social connection, but is not limited thereto. In particular, the "TalkLife" service requires the author of a post to select his or her emotion from among dozens of predefined emotion classification tags.

Hereinafter, for convenience of description, the embodiments of the present application are described on the assumption that the process of building the emotion classification model disclosed herein is performed based on learning data collected from the "TalkLife" service (in an experimental example related to an embodiment of the present application, approximately 900,000 posts were collected as learning data).

The emotion classification apparatus 100 may perform preprocessing that selects, from among the plurality of collected online data, online data including text in a preset language. For example, to select online data written in English, the emotion classification apparatus 100 may use Python's isalpha function to exclude online data written in Arabic, French, Portuguese, and the like, but is not limited thereto.
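A minimal sketch of such a language filter follows. The text names only the isalpha function; note that Python's `str.isalpha()` also returns True for Arabic letters and accented Latin letters, so this sketch additionally uses `str.isascii()`, which is an assumption rather than something the text states. The function names and the 0.8 threshold are likewise hypothetical. A post written in an accent-free Latin-script language would still pass this check, so it is a coarse filter, not language identification.

```python
import string


def english_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens made up only of ASCII letters."""
    cores = []
    for token in text.split():
        # Drop surrounding punctuation and inner apostrophes ("don't" -> "dont").
        core = token.strip(string.punctuation).replace("'", "").replace("\u2019", "")
        if core:
            cores.append(core)
    if not cores:
        return 0.0
    return sum(core.isascii() and core.isalpha() for core in cores) / len(cores)


def select_english(posts, threshold=0.8):
    """Keep posts whose ASCII-letter token ratio reaches the (assumed) threshold."""
    return [post for post in posts if english_ratio(post) >= threshold]


samples = [
    "I don't feel great today!",
    "Je suis très fatigué",
    "أنا سعيد جدا",
]
english_only = select_english(samples)
```

With these samples, only the first post is kept: the French post has two accented tokens out of four, and none of the Arabic tokens are ASCII.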

감정 분류 장치(100)는 제1감정 정보에 기초하여 적어도 하나의 감정 범주에 대응하는 제2감정 정보를 정의할 수 있다. 이와 관련하여, 제2감정 정보는 제1감정 정보 보다 적은 수의 감정 범주를 포함하도록 정의될 수 있으며, 본원의 일 실시예에 따르면, 감정 분류 장치(100)는 Parrott 감정 모델에 기초하여 제1감정 정보로부터 제2감정 정보를 정의할 수 있다.The emotion classification apparatus 100 may define second emotion information corresponding to at least one emotion category based on the first emotion information. In this regard, the second emotion information may be defined to include a smaller number of emotion categories than the first emotion information. The second emotion information may be defined from the emotion information.

Regarding the Parrott emotion model mentioned above, several psychological models are used in emotion research, and they fall broadly into two perspectives: discrete emotion and dimensional emotion. Under the discrete view, emotions can be distinguished by their physiological, behavioral, and expressive characteristics, and the Ekman model is a representative example. The Ekman model defines six emotions in relation to facial expressions (happiness, sadness, surprise, fear, disgust, and anger). The Plutchik model, another discrete emotion model, defines eight emotions as four opposing pairs (joy and sadness, trust and disgust, fear and anger, surprise and anticipation), but eight emotions alone may not be sufficient to categorize the full variety of human emotions. In contrast, the Parrott model used herein to define the second emotion information has a three-tier classification system intended to cover a wider range of emotions. The first tier classifies the primary emotions into six categories (anger, fear, sadness, surprise, joy, and love), the second tier considers 25 emotion categories, and the third tier expands these into 115 more finely divided categories.

In this regard, the emotion classification apparatus 100 may define the second emotion information from the first emotion information so that it includes at least one of anger, fear, sadness, surprise, joy, and love, the emotions considered in the first tier of the Parrott model. In addition, according to an embodiment of the present application, because the first emotion information contains more emotion categories than the second emotion information, the emotion classification apparatus 100 may determine the second emotion information for each item of first emotion information by mapping each emotion category of the first emotion information to one of anger, fear, sadness, surprise, joy, and love, based on its similarity to the finer-grained emotion categories considered in the second or third tier of the Parrott model.

FIG. 2 is a table illustrating the second emotion information and a plurality of online data collected to correspond to each emotion category of the second emotion information.

Referring to FIG. 2, according to an embodiment of the present application, the emotion classification apparatus 100 may assign (group) the first emotion information, which includes a plurality of emotion classification tags given by authors, to second emotion information corresponding to the emotion categories of anger, fear, sadness, surprise, joy, and love, based on the Parrott emotion model with its hierarchical emotion categories. As a more specific example, when the learning data collected through the "TalkLife" service contains 59 predefined emotion classification tags, the emotion classification apparatus 100 may exclude four tags with little relevance to emotional expression, such as "Meh", "Numb", "Nothing", and "Inadequate", and perform preprocessing that assigns (groups) the remaining 55 tags of the first emotion information to the second emotion categories: 7 tags to anger, 6 to fear, 14 to sadness, 3 to surprise, 21 to joy, and 4 to love.
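
The grouping step described above can be sketched as a simple lookup. This is a minimal illustration, not the mapping used in the application: apart from the four excluded tags named in the text, the tag names and their category assignments in `TAG_TO_CATEGORY` are hypothetical, as are the function names.

```python
from collections import Counter

# Tags the description names as weakly related to emotional expression.
EXCLUDED_TAGS = {"Meh", "Numb", "Nothing", "Inadequate"}

# Hypothetical tag-to-category assignments, for illustration only.
TAG_TO_CATEGORY = {
    "Furious": "anger",
    "Annoyed": "anger",
    "Scared": "fear",
    "Lonely": "sadness",
    "Heartbroken": "sadness",
    "Shocked": "surprise",
    "Excited": "joy",
    "Grateful": "joy",
    "Loved": "love",
}


def to_second_emotion(tag):
    """Return the primary category for an author tag, or None if unusable."""
    if tag in EXCLUDED_TAGS:
        return None
    return TAG_TO_CATEGORY.get(tag)


def group_posts(posts):
    """Relabel (text, tag) pairs with primary categories, dropping unusable ones."""
    grouped = []
    for text, tag in posts:
        category = to_second_emotion(tag)
        if category is not None:
            grouped.append((text, category))
    return grouped


posts = [
    ("I can't stop smiling today", "Excited"),
    ("whatever", "Meh"),
    ("He left without a word", "Heartbroken"),
]
labeled = group_posts(posts)
print(labeled)
print(Counter(category for _, category in labeled))
```

Keeping the mapping as data rather than logic makes it easy to revise individual assignments when a category later proves inconsistent.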

Hereinafter, with reference to FIGS. 3 to 5, the results of evaluating the learning data collected by the emotion classification apparatus 100 disclosed herein against conventional emotion data sets are described.

FIG. 3 shows the conventional emotion data sets used for comparison in evaluating the suitability of the learning data collected by the AI-based emotion classification apparatus for online data according to an embodiment of the present application. Here, suitability means the consistency and representativeness that make the data appropriate as learning data for building an AI-based emotion classification model.

Referring to FIG. 3, Crowdflower data, Emotion-Stimulus data, and ISEAR data may be considered as conventional emotion data sets. The Crowdflower data may include 14 emotion classes, and the Emotion-Stimulus data and the ISEAR data may each include 7 emotion classes. Because the Emotion-Stimulus data and the ISEAR data contain no data corresponding to the 'love' category, they were compared with the learning data of the present application only for the four categories other than love: anger, sadness, fear, and joy.

In the experiment evaluating the suitability of the learning data of the present application, text preprocessing was applied to each emotion data set, a model was built for each data set using the Word2Vec embedding technique, and a predetermined analysis means (for example, the most_similar function of Word2Vec) was used to extract, per data set, a plurality of words (for example, 20) judged to be semantically closest to each emotion category. A plurality of respondents were then surveyed on the consistency and representativeness of the emotion data based on the extracted words, and statistical tests were run on the survey results: after testing for normality and homogeneity of variance, ANOVA and Tukey's post hoc test were performed.
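
For readers unfamiliar with the ANOVA step, the one-way F statistic can be written out directly. The sketch below is a plain-Python illustration of the standard formula. The function name is an assumption, and it does not reproduce the normality check, the variance check, or the Tukey post hoc test mentioned above.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ratings for three data sets, for illustration only.
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with the returned degrees of freedom indicates that at least one data set's mean rating differs from the others, which is what motivates the pairwise post hoc comparison.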

In this regard, FIGS. 4A and 4B are tables showing a statistical analysis of the differences between the learning data collected by the AI-based emotion classification apparatus for online data according to an embodiment of the present application and the conventional emotion data sets.

Referring to FIGS. 4A and 4B, for every emotion category (anger, sadness, fear, joy, and love), a significant group difference exists between the word set derived from the learning data collected by the emotion classification apparatus 100 disclosed herein and the word sets derived from the three conventional emotion data sets.

FIG. 5 is a table comparing the consistency of the learning data collected by the AI-based emotion classification apparatus for online data according to an embodiment of the present application with that of the conventional emotion data sets.

Referring to FIG. 5, respondents were asked whether the words in each word set (keyword set) consistently describe the corresponding emotion category, assigning each category a score from 1 to 5, where values closer to 1 indicate inconsistency and values closer to 5 indicate consistency. According to the Tukey post hoc analysis of these responses, the consistency score for the learning data collected through the "TalkLife" service was higher (that is, closer to 5) than the score for each of the conventional emotion data sets, so the learning data of the present application was rated best in terms of consistency.

In addition, in the experiment described above, when respondents were asked to select the data set that best expresses and represents each emotion, the learning data collected through the "TalkLife" service had the highest selection rate, at about 95.5%, so it was also rated best in terms of representativeness.

Accordingly, the experimental results confirm that the emotional expression considered by the emotion classification apparatus 100 according to an embodiment of the present application (in other words, the classification of emotion categories reflected in the second emotion information) is more suitable, in terms of consistency and representativeness, than the existing emotion data sets used in prior research on building emotion classification models. It can therefore reasonably be expected that emotion modeling performed on learning data collected through the process described above will support more objective evaluation and offer a concrete perspective on improvement.

In addition, according to an embodiment of the present application, the emotion classification apparatus 100 may evaluate the consistency and representativeness of the plurality of online data classified according to the second emotion information.

For example, the emotion classification apparatus 100 may receive survey data, collected from a plurality of respondents, for each emotion category of the second emotion information of the online data, and may quantify (compute) the consistency and representativeness of the learning data based on the received survey data.

The emotion classification apparatus 100 may also decide whether the learning data collection (preparation) process is complete based on the evaluation result for at least one of the quantified consistency and representativeness of each emotion category. For example, for an emotion category whose consistency or representativeness is evaluated to be at or below a preset threshold, the emotion classification apparatus 100 may remove at least part of the online data already collected or collect additional online data corresponding to that emotion category.
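
The completion check described above might look like the following sketch. It assumes that the quantified consistency of a category is the mean of its 1 to 5 survey ratings. The function name, the ratings, and the threshold of 3.0 are hypothetical values chosen for illustration.

```python
from statistics import mean


def categories_needing_rework(ratings, threshold):
    """Return the categories whose mean rating is at or below the threshold."""
    return sorted(
        category for category, scores in ratings.items() if mean(scores) <= threshold
    )


# Made-up survey ratings per emotion category, for illustration only.
ratings = {
    "anger": [4, 5, 4],
    "love": [2, 3, 3],
    "joy": [5, 4, 5],
}
flagged = categories_needing_rework(ratings, threshold=3.0)
collection_complete = not flagged
print(flagged, collection_complete)
```

Categories returned in `flagged` are the ones for which data would be pruned or additionally collected before the preparation step is treated as finished.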

FIG. 6 is a conceptual diagram illustrating the function and structure of the AI-based emotion classification model built by the AI-based emotion classification apparatus for online data according to an embodiment of the present application.

Referring to FIG. 6, the emotion classification apparatus 100 may build, based on the collected learning data, an AI-based emotion classification model that identifies the emotion reflected in target online data when the target online data is received.

Also referring to FIG. 6, the emotion classification apparatus 100 may build an emotion classification model based on BERT (Bidirectional Encoder Representations from Transformers).

More specifically, according to an embodiment of the present application, the emotion classification apparatus 100 may tokenize each of the plurality of online data included in the learning data in order to build the BERT-based emotion classification model. In this regard, the emotion classification apparatus 100 may tokenize each item of online data using a predetermined natural language processing means (for example, the Natural Language Toolkit, NLTK).

The emotion classification apparatus 100 may also extract, from each of the tokenized online data, the portions corresponding to preset parts of speech. For example, the preset parts of speech may be set to include nouns, adjectives, and adverbs, which are likely to carry content associated with emotional expression.
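
A minimal sketch of the part-of-speech filter follows. It assumes the tokens have already been tagged with Penn Treebank tags (the tag set that NLTK's `pos_tag` produces by default), so the example runs without downloading tagger data. The function name and the sample sentence are illustrative.

```python
# Penn Treebank prefixes: NN* = nouns, JJ* = adjectives, RB* = adverbs.
KEPT_TAG_PREFIXES = ("NN", "JJ", "RB")


def keep_emotion_bearing_tokens(tagged_tokens):
    """Keep tokens whose tag marks a noun, adjective, or adverb."""
    return [
        token for token, tag in tagged_tokens if tag.startswith(KEPT_TAG_PREFIXES)
    ]


# Hand-tagged example; in practice the pairs would come from a tagger.
tagged = [
    ("I", "PRP"),
    ("feel", "VBP"),
    ("really", "RB"),
    ("happy", "JJ"),
    ("about", "IN"),
    ("my", "PRP$"),
    ("exam", "NN"),
]
print(keep_emotion_bearing_tokens(tagged))
```

Matching on tag prefixes keeps inflected variants such as plural nouns (NNS) and comparative adjectives (JJR) without listing every tag.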

The emotion classification apparatus 100 may then generate the BERT-based emotion classification model from the extracted portions. BERT (Bidirectional Encoder Representations from Transformers) is an AI model based on the Transformer architecture, which has an encoder-decoder structure. It stacks multiple Transformer layers to obtain deep representations of the input, and it is characterized by a masked language model (MLM) objective in which a masking process is applied to the token sequence.

Unlike the conventional Transformer procedure, which masks words sequentially by replacing the word to be predicted with a mask token, BERT randomly masks a percentage of the words, and this gives the model its bidirectional character. Through this procedure BERT can draw on context on both sides of a masked word, which has the advantage of resembling the way a human selects a missing word within its context.
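
The random masking idea can be sketched in a few lines. The text does not state a masking rate, so the default of 0.15 (the rate commonly cited for BERT pre-training) is an assumption. The sketch is also simplified: it always substitutes the mask token and omits the random-replacement and keep-original variants used in BERT pre-training.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Randomly replace a fraction of tokens with MASK.

    Returns the masked sequence and a dict of position -> original token,
    which is what the model would be trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return masked, targets


tokens = "i was so happy to see my old friends again after all these years".split()
masked, targets = mask_tokens(tokens, seed=7)
print(masked)
print(targets)
```

Because the masked positions are scattered rather than sequential, each prediction can use the unmasked words on both its left and its right.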

In this regard, the present application may build a BERT model including a plurality (for example, 12) of Transformer block layers. More specifically, each encoder block of the BERT model may be designed with a 12-head self-attention layer and a 768-dimensional hidden layer, giving a model with approximately 110 million parameters in total. For fine-tuning of the BERT-based model, most hyperparameters other than the batch size (for example, 64) may be kept the same as in the pre-trained model.
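
As a rough check on the model size implied by this configuration, the parameters of a 12-layer, 768-dimensional BERT encoder can be counted by hand. The text gives only the layer count, head count, and hidden size; the vocabulary size (30,522), maximum sequence length (512), feed-forward width (3,072), and the pooler layer are assumptions taken from the publicly released BERT-base uncased configuration.

```python
def bert_encoder_param_count(
    layers=12, hidden=768, ffn=3072, vocab=30522, max_pos=512, type_vocab=2
):
    """Count weights and biases of a BERT-style encoder with a pooler."""
    layer_norm = 2 * hidden  # scale and shift vectors

    # Token, position, and segment embeddings followed by a LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections plus a LayerNorm.
    # The number of heads splits `hidden` but does not change the count.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> ffn -> hidden) plus a LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_encoder_param_count()
print(f"{total:,} parameters")
```

Under these assumptions the count comes to roughly 110 million, most of it in the twelve encoder blocks, with the embedding table as the next largest share.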

As described above, once training (building) of the AI-based emotion classification model is complete, the emotion classification apparatus 100 may receive target online data. The target online data may include, but is not limited to, online data to which no first emotion information has been assigned. As another example, the emotion classification apparatus 100 may receive, as target online data, online data that already carries first emotion information, including an emotion classification tag assigned by its author. In that case the apparatus may evaluate how closely the emotion identified from the content of the target online data by the AI-based emotion classification model matches or resembles the first emotion information assigned to that online data.

According to an embodiment of the present application, the emotion classification apparatus 100 may receive, from a user terminal 300, online data written by the user of the user terminal 300.

The emotion classification apparatus 100 may then identify the emotion corresponding to the target online data based on the built (generated) emotion classification model. Specifically, the emotion classification apparatus 100 may determine, from among the emotion categories included in the second emotion information, the emotion category corresponding to the main emotion reflected in the target online data.

More specifically, the emotion classification apparatus 100 may divide the text data included in the received target online data into predetermined units (for example, sentences or paragraphs), apply each unit of text data to the AI-based emotion classification model, identify the emotion category predominantly reflected in the target online data through statistical or probabilistic processing of the emotion categories obtained for the individual units, and match it to at least one type of emotion in the second emotion information.
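
One way to realize the per-unit analysis described above is to sum the per-sentence class probabilities and take the largest total. The text leaves the statistical processing open, so this aggregation rule is an assumption. The `toy_classifier` is a keyword stand-in for the trained model, and the regex sentence splitter is a deliberately naive placeholder.

```python
import re
from collections import defaultdict

CATEGORIES = ("anger", "fear", "sadness", "surprise", "joy", "love")


def split_sentences(text):
    """Naive split on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def dominant_emotion(text, classify):
    """Sum per-sentence probabilities and return the top category."""
    totals = defaultdict(float)
    for sentence in split_sentences(text):
        for category, prob in classify(sentence).items():
            totals[category] += prob
    if not totals:
        return None
    return max(totals, key=totals.get)


def toy_classifier(sentence):
    """Hypothetical stand-in for the trained model: keyword rules only."""
    lowered = sentence.lower()
    if "miss" in lowered or "cry" in lowered:
        return {"sadness": 0.8, "love": 0.2}
    if "thank" in lowered:
        return {"joy": 0.7, "love": 0.3}
    return {c: 1 / len(CATEGORIES) for c in CATEGORIES}


post = "I miss her so much. I could not stop crying. Thank you for listening."
print(dominant_emotion(post, toy_classifier))
```

Summing probabilities rather than counting hard votes lets a confidently classified sentence outweigh several ambiguous ones, which matters for posts that mix emotions.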

In other words, referring to FIG. 6, the emotion classification apparatus 100 according to an embodiment of the present application may use the BERT-based emotion classification model, built from the learning data (Labeled Text Input, 1), to output at least one emotion category (Emotion Label, 2) of the second emotion information reflected in the target online data supplied to the apparatus.

Hereinafter, with reference to FIGS. 7 and 8, the evaluation of the emotion classification performance of the emotion classification apparatus 100 disclosed herein and a technique for optimizing the emotion classification model based on that performance are described.

FIG. 7 is a table showing, as an experimental example of the AI-based emotion classification technique for online data according to an embodiment of the present application, the performance evaluation results of the emotion classification model.

Referring to FIG. 7, in an experiment performed to evaluate the BERT-based emotion classification model of the emotion classification apparatus 100, the collected online data were split into 80% training data and 20% validation data, and 5-fold cross validation was performed. The BERT-based emotion classification model disclosed herein achieved a precision of about 81% over all emotion categories, a meaningful improvement over the performance levels in the 70% range reported in prior emotion classification research.
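
The 80/20 split used in 5-fold cross validation can be sketched as below. This is a generic illustration rather than the experimental code. The function name and seed are assumptions, and it does not stratify by emotion category, which a real evaluation on imbalanced categories would likely want.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for fold_number, (train_idx, val_idx) in enumerate(k_fold_indices(100), start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

With k = 5 each fold holds out one fifth of the data, so every sample serves as validation data exactly once and the reported score is the average over the five runs.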

FIG. 8 is a table showing, as an experimental example of the AI-based emotion classification technique for online data according to an embodiment of the present application, the characteristics of online data for which the emotion identified by the emotion classification model did not match the emotion subjectively assessed by survey respondents.

Referring to FIG. 8, the target online data for which the emotion information (second emotion information) derived by the emotion classification model of the emotion classification apparatus 100 disagreed with the emotion information assessed by respondents who read the data directly (the survey results in FIG. 8) were analyzed as having one or more of the following characteristics: the data contained multiple sentences reflecting different emotions, the boundaries between the emotion categories themselves were ambiguous, or the intensity of a particular emotion was ambiguous.

More specifically, according to an embodiment of the present application, the second emotion information may be defined to include a first emotion category classified as positive emotion and a second emotion category classified as negative emotion. The first emotion category may include joy, love, and the like, and the second emotion category may include anger, fear, and the like.

In this regard, the emotion classification apparatus 100 may generate separate emotion classification models for the first emotion category and the second emotion category. Because the boundaries between different positive emotions (for example, joy and love) are often more ambiguous than those between negative emotions, the model for positive emotions may be built with detailed settings (for example, BERT model weights and parameters) separate from those of the model used to identify emotion categories classified as negative.

More specifically, according to an embodiment of the present application, when generating (building) the emotion classification model for the first emotion category, which corresponds to positive emotion, the emotion classification apparatus 100 may optimize (update) the emotion categories by subdividing the second emotion information belonging to the first emotion category according to the psychological definition and emotional situation of each emotion. For example, instead of second emotion information such as 'love', which is a positive emotion but is hard to match to a single emotional situation, the apparatus may define the second emotion information used to train the model for the first emotion category so that it includes categories such as 'joy' and 'interest', which can be distinguished more clearly in light of their psychological definitions and emotional situations.

FIG. 9 is a schematic configuration diagram of the AI-based emotion classification apparatus for online data according to an embodiment of the present application.

Referring to FIG. 9, the emotion classification apparatus 100 may include a collection unit 110, a learning unit 120, and an analysis unit 130. The collection unit 110 may include an emotion setting unit 111 and a data classification unit 112, and the learning unit 120 may include a data processing unit 121 and a model building unit 122.

The collection unit 110 may prepare learning data including a plurality of online data to which first emotion information has been assigned.

Specifically, the emotion setting unit 111 may define second emotion information corresponding to at least one emotion category based on the first emotion information assigned to the plurality of online data included in the learning data. The data classification unit 112 may classify the collected online data according to the second emotion information defined by the emotion setting unit 111.

The learning unit 120 may build, based on the prepared learning data, an AI-based emotion classification model that identifies the emotion reflected in target online data when the target online data is received.

Specifically, the data processing unit 121 may tokenize each of the plurality of online data and extract, from each of the tokenized online data, the portions corresponding to preset parts of speech.

The model building unit 122 may generate a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model from the extracted portions.

The analysis unit 130 may receive target online data and identify the emotion corresponding to the received target online data based on the trained emotion classification model. Specifically, the analysis unit 130 may determine, from among the emotion categories included in the second emotion information, the emotion category corresponding to the main emotion reflected in the target online data.
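
The division of work among the three units can be sketched as three small classes wired in sequence. This is only a structural illustration. The class and method names are hypothetical, the tag mapping is made up, and `majority_trainer` is a placeholder that stands in for BERT fine-tuning.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

LabeledPost = Tuple[str, str]  # (text, label)
Model = Callable[[str], str]   # text -> emotion category


@dataclass
class Collector:
    """Collection unit (110): maps author tags to second emotion categories."""
    tag_to_category: Dict[str, str]

    def prepare(self, raw_posts: List[LabeledPost]) -> List[LabeledPost]:
        return [
            (text, self.tag_to_category[tag])
            for text, tag in raw_posts
            if tag in self.tag_to_category
        ]


@dataclass
class Learner:
    """Learning unit (120): builds a model from the prepared data."""
    train: Callable[[List[LabeledPost]], Model]

    def build(self, data: List[LabeledPost]) -> Model:
        return self.train(data)


@dataclass
class Analyzer:
    """Analysis unit (130): applies the model to target online data."""
    model: Model

    def classify(self, text: str) -> str:
        return self.model(text)


def majority_trainer(data: List[LabeledPost]) -> Model:
    """Placeholder trainer: always predicts the most frequent training label."""
    most_common = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda text: most_common


collector = Collector({"Excited": "joy", "Heartbroken": "sadness"})
data = collector.prepare(
    [("a", "Excited"), ("b", "Excited"), ("c", "Heartbroken"), ("d", "Meh")]
)
analyzer = Analyzer(Learner(majority_trainer).build(data))
print(analyzer.classify("a new post"))
```

Passing the trainer in as a callable keeps the collection and analysis units independent of the model type, so the placeholder can be swapped for a real fine-tuning routine without touching the other units.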

Hereinafter, the operation flow of the present application is briefly reviewed based on the details described above.

FIG. 10 is an operation flowchart of the AI-based emotion classification method for online data according to an embodiment of the present application.

The AI-based emotion classification method for online data shown in FIG. 10 may be performed by the emotion classification apparatus 100 described above. Accordingly, even where omitted below, the description given for the emotion classification apparatus 100 applies equally to the AI-based emotion classification method for online data.

도 10을 참조하면, 단계 S11에서 수집부(110)는, 제1감정 정보가 부여된 복수의 온라인 데이터를 포함하는 학습 데이터를 준비할 수 있다.Referring to FIG. 10 , in step S11 , the collection unit 110 may prepare learning data including a plurality of online data to which first emotion information is given.

다음으로, 단계 S12에서 학습부(120)는 준비된 학습 데이터에 기초하여, 대상 온라인 데이터가 수신되면 대상 온라인 데이터에 반영된 감정을 파악하는 인공지능 기반의 감정 분류 모델을 구축할 수 있다.Next, in step S12 , the learning unit 120 may build an artificial intelligence-based emotion classification model for recognizing emotions reflected in the target online data when the target online data is received based on the prepared training data.

구체적으로, 단계 S12에서 데이터 가공부(121)는 복수의 온라인 데이터 각각을 토큰화할 수 있다. 또한, 단계 S12에서 데이터 가공부(121)는 토큰화된 복수의 온라인 데이터 각각으로부터 미리 설정된 품사에 해당하는 부분을 추출할 수 있다.Specifically, in step S12 , the data processing unit 121 may tokenize each of the plurality of online data. Also, in step S12 , the data processing unit 121 may extract a portion corresponding to a preset part-of-speech from each of the plurality of tokenized online data.

Also in step S12, the model building unit 122 may generate a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model based on the extracted part-of-speech portion.

Meanwhile, the second emotion information may include a first emotion category classified as positive emotion and a second emotion category classified as negative emotion. In this regard, in step S12 the model building unit 122 may subdivide the second emotion information included in the first emotion category based on psychological definitions and emotional situations, so that emotion classification models are generated separately for the first emotion category and the second emotion category.
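
As a rough sketch of how training data might be separated so that one model is built per polarity group, the example below partitions labeled samples into a positive set and a negative set. Which emotion categories count as positive or negative is not stated in this passage, so the `POSITIVE` and `NEGATIVE` sets are assumptions, and the helper `split_by_polarity` is a hypothetical name.

```python
# Assumed polarity grouping; the passage does not say which categories
# belong to the positive or negative group.
POSITIVE = {"joy", "love"}
NEGATIVE = {"anger", "fear", "sadness"}


def split_by_polarity(labeled_samples):
    """Partition (text, category) pairs into positive and negative training sets.

    Samples whose category is in neither assumed group are left out.
    """
    positive, negative = [], []
    for text, category in labeled_samples:
        if category in POSITIVE:
            positive.append((text, category))
        elif category in NEGATIVE:
            negative.append((text, category))
    return positive, negative


if __name__ == "__main__":
    samples = [
        ("what a lovely day", "joy"),
        ("i miss you so much", "love"),
        ("this is outrageous", "anger"),
        ("i did not expect that", "surprise"),
    ]
    pos, neg = split_by_polarity(samples)
    print(len(pos), len(neg))
```

Each resulting set could then be used to train its own classifier, which is one way to read the separate generation of models for the first and second emotion categories.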

Next, in step S13, the analysis unit 130 may receive the target online data. According to an embodiment of the present application, in step S13 the analysis unit 130 may receive target online data to which the first emotion information has not been given.

Next, in step S14, the analysis unit 130 may identify the emotion corresponding to the received target online data based on the trained emotion classification model.

Specifically, in step S14, the analysis unit 130 may determine, based on the generated emotion classification model, the emotion category corresponding to the main emotion reflected in the target online data from among the emotion categories included in the second emotion information.
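
One plausible reading of determining the main emotion in step S14 is to take the highest-scoring category from the classifier output. The sketch below assumes the model emits one raw score (logit) per category and applies a softmax followed by an argmax; the six category names follow the claims of this application, while the function names and the example scores are hypothetical.

```python
import math

# Category names as listed in the claims; the ordering here is an assumption.
EMOTION_CATEGORIES = ["anger", "fear", "sadness", "surprise", "joy", "love"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def main_emotion(logits, categories=EMOTION_CATEGORIES):
    """Return the most probable category and its probability."""
    if len(logits) != len(categories):
        raise ValueError("expected one score per emotion category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


if __name__ == "__main__":
    # Hypothetical classifier output for one piece of target online data.
    category, probability = main_emotion([0.1, 0.2, 0.3, 0.0, 2.5, 1.0])
    print(category, round(probability, 3))
```

Returning the probability alongside the label is a design choice of this sketch; it would let a caller flag inputs where no single emotion clearly dominates.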

In the above description, steps S11 to S14 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 11 is a detailed operation flowchart of the process of preparing the learning data required to build the emotion classification model.

The artificial intelligence-based emotion classification method for online data shown in FIG. 11 may be performed by the emotion classification apparatus 100 described above. Accordingly, even where omitted below, the description given for the emotion classification apparatus 100 applies equally to the description of FIG. 11.

Referring to FIG. 11, in step S121 the emotion setting unit 111 may define second emotion information corresponding to at least one emotion category, based on the first emotion information given to the plurality of online data included in the learning data.

Next, in step S122, the data classification unit 112 may classify the plurality of collected online data based on the second emotion information defined in step S121.
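
Steps S121 and S122 can be pictured as a label-mapping pass followed by a grouping pass. In the sketch below, the table `FIRST_TO_SECOND` that maps fine-grained first emotion labels onto a smaller set of second emotion categories is entirely hypothetical, as are the sample texts; only the six category names are taken from the claims.

```python
from collections import defaultdict

# Hypothetical mapping from first emotion labels to second emotion categories.
# The actual correspondence is not given in this passage.
FIRST_TO_SECOND = {
    "furious": "anger", "annoyed": "anger",
    "scared": "fear",
    "lonely": "sadness", "depressed": "sadness",
    "amazed": "surprise",
    "happy": "joy", "excited": "joy",
    "adore": "love",
}


def classify_by_second_emotion(samples, mapping=FIRST_TO_SECOND):
    """Group (text, first_label) pairs by second emotion category.

    Samples whose first label has no mapping are skipped.
    """
    groups = defaultdict(list)
    for text, first_label in samples:
        category = mapping.get(first_label)
        if category is not None:
            groups[category].append(text)
    return dict(groups)


if __name__ == "__main__":
    data = [
        ("best concert ever", "excited"),
        ("nobody called today", "lonely"),
        ("stuck in traffic again", "annoyed"),
        ("sunny morning walk", "happy"),
    ]
    for category, texts in classify_by_second_emotion(data).items():
        print(category, texts)
```

Collapsing many first labels into fewer categories in this way is consistent with the claim that the second emotion information contains fewer emotion categories than the first.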

In the above description, steps S121 and S122 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The artificial intelligence-based emotion classification method for online data according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, the artificial intelligence-based emotion classification method for online data described above may also be implemented in the form of a computer program or application that is stored in a recording medium and executed by a computer.

The foregoing description of the present application is for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can be readily modified into other specific forms without changing the technical spirit or essential features of the present application. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed form, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is indicated by the following claims rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: online data analysis system
100: artificial intelligence-based emotion classification apparatus for online data
110: collection unit
111: emotion setting unit
112: data classification unit
120: learning unit
121: data processing unit
122: model building unit
130: analysis unit
20: network
200: storage server
300: user terminal

Claims

An artificial intelligence-based emotion classification method for online data, the method comprising:
preparing learning data including a plurality of online data to which first emotion information is given;
building, based on the learning data, an artificial intelligence-based emotion classification model that recognizes the emotion reflected in target online data when the target online data is received;
receiving the target online data; and
recognizing the emotion corresponding to the target online data based on the emotion classification model.

The method of claim 1,
wherein the preparing step comprises:
defining second emotion information corresponding to at least one emotion category based on the first emotion information; and
classifying the plurality of online data based on the second emotion information.

The method of claim 2,
wherein the second emotion information is defined to include fewer emotion categories than the first emotion information.

The method of claim 3,
wherein the second emotion information is defined based on the Parrott emotion model.

The method of claim 2,
further comprising evaluating the consistency and representativeness of the plurality of online data classified according to the second emotion information.

The method of claim 1,
wherein the receiving step comprises receiving target online data to which the first emotion information has not been given.

The method of claim 2,
wherein the building step comprises:
tokenizing each of the plurality of online data;
extracting a portion corresponding to a preset part of speech from each of the plurality of tokenized online data; and
generating a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model based on the extracted portion.

The method of claim 4,
wherein the recognizing step comprises determining, based on the emotion classification model, the emotion category corresponding to the main emotion reflected in the target online data from among the emotion categories included in the second emotion information.

The method of claim 8,
wherein the second emotion information includes at least one emotion category among anger, fear, sadness, surprise, joy, and love.

The method of claim 9,
wherein the second emotion information includes a first emotion category classified as positive emotion and a second emotion category classified as negative emotion, and
wherein the emotion classification model is generated separately for the first emotion category and the second emotion category.

The method of claim 10,
wherein the building step comprises subdividing the second emotion information included in the first emotion category based on psychological definitions and emotional situations.

An artificial intelligence-based emotion classification device for online data, the device comprising:
a collection unit that prepares learning data including a plurality of online data to which first emotion information is given;
a learning unit that builds, based on the learning data, an artificial intelligence-based emotion classification model that recognizes the emotion reflected in target online data when the target online data is received; and
an analysis unit that receives the target online data and identifies the emotion corresponding to the target online data based on the emotion classification model.

The emotion classification device of claim 12,
wherein the collection unit comprises:
an emotion setting unit that defines second emotion information corresponding to at least one emotion category based on the first emotion information; and
a data classification unit that classifies the plurality of online data based on the second emotion information.

The emotion classification device of claim 13,
wherein the learning unit comprises:
a data processing unit that tokenizes each of the plurality of online data and extracts a portion corresponding to a preset part of speech from each of the plurality of tokenized online data; and
a model building unit that generates a BERT (Bidirectional Encoder Representations from Transformers)-based emotion classification model based on the extracted portion.

The emotion classification device of claim 13,
wherein the second emotion information is defined based on the Parrott emotion model, and
wherein the analysis unit determines, based on the emotion classification model, the emotion category corresponding to the main emotion reflected in the target online data from among the emotion categories included in the second emotion information.