KR102359466B1

KR102359466B1 - Deep metadata based emotion analysis method and system

Info

Publication number: KR102359466B1
Application number: KR1020200022337A
Authority: KR
Inventors: 양진홍; 정재은; 고예은; 김주현
Original assignee: 인제대학교 산학협력단
Priority date: 2020-02-24
Filing date: 2020-02-24
Publication date: 2022-02-08
Also published as: KR20210107393A

Abstract

A deep metadata-based sentiment analysis method and system are disclosed. A computer-implemented sentiment analysis system according to an embodiment may include: a crawler module that collects data related to a post; a data preprocessing module that preprocesses the collected data; a sentiment analysis module that obtains, from the preprocessed data, sentiment scores for sentiment words filtered on the basis of a pre-built sentiment dictionary; and a service module that uses the obtained sentiment scores to provide a service related to sentiment analysis information of the post.

Description

Deep metadata-based sentiment analysis method and system {DEEP METADATA BASED EMOTION ANALYSIS METHOD AND SYSTEM}

The description below relates to technology for analyzing sentiment.

Sentiment analysis is a research field that analyzes people's opinions, emotions, evaluations, and attitudes expressed in text. Its basic task is to determine whether a given text is positive, negative, or neutral, which is called the polarity of the text. As social media has diversified the ways individuals express their opinions, sentiment analysis has become an important element of businesses built on user emotion, and many attempts have been made to analyze subjective attitudes and sentiment. Korean sentiment analysis, however, often excludes elements other than Korean text. Accurate sentiment analysis requires interpreting the text as a whole. In particular, an emoji is shorthand for a facial expression and a non-verbal element that supplements the content by reflecting the writer's psychological state in the text.

A deep metadata-based sentiment analysis method and system can be provided.

A method and system can be provided for obtaining a sentiment score by filtering data crawled in relation to a post on the basis of a sentiment dictionary.

A service related to the sentiment analysis information of a post can be provided using the obtained sentiment information.

A computer-implemented sentiment analysis system may include: a crawler module that collects data related to a post; a data preprocessing module that preprocesses the collected data; a sentiment analysis module that obtains, from the preprocessed data, sentiment scores for sentiment words filtered on the basis of a pre-built sentiment dictionary; and a service module that uses the obtained sentiment scores to provide a service related to sentiment analysis information of the post.

The sentiment analysis module may filter sentiment words from the preprocessed data on the basis of a pre-built Korean sentiment dictionary and emoji sentiment dictionary, and may calculate the sentiment score of the post by matching each filtered sentiment word with the sentiment score assigned to it.

The sentiment analysis module may build, in the form of metadata, a sentiment dictionary that includes a Korean sentiment dictionary and an emoji sentiment dictionary for sentiment analysis. The emoji sentiment dictionary may store sentiment dictionary data including a Unicode name, CLDR, and a score, and the Korean sentiment dictionary may store sentiment dictionary data including a sentiment word, a root, and a score.

The sentiment analysis module may weight the score of each sentiment word in the Korean sentiment dictionary on the basis of the sentiment score of the post.

The sentiment analysis module may assign sentiment scores using any one of the following: a first method that compares the roots produced by morphological analysis of the post data during preprocessing with the roots of the sentiment words in the pre-built sentiment dictionary and assigns a sentiment score to the morphological analysis result; a second method that, when a word of the post matches a sentiment word in the pre-built sentiment dictionary, assigns a sentiment score to the matching word of the post; or a third method that compares the words of the post with the sentiment words in the pre-built sentiment dictionary and, when a word of the post is not found in the dictionary, compares the root from the morphological analysis with the roots of the sentiment words and assigns a sentiment score to the morphological analysis result.

When the CLDR of a morphological analysis result obtained from the post through preprocessing matches a CLDR in the emoji sentiment dictionary, the sentiment analysis module may assign a score to that result.

The sentiment analysis module may count the sentiment-bearing words in the post separately for each score band, or for each weighted score band, and may thereby calculate and express the positive or negative value of the post as a percentage.

The sentiment analysis module may express the positive or negative value of the post by calculating the ratio of the number of sentiment words in a preset score band to the total number of sentiment words in the post.

When a user logs in to the service with administrator authority, the service module may display the sentiment changes found by analyzing the post data as a graph, and may display sentiment weather according to the positive or negative value of the posts.

The service module may display statistical information including the sentiment values and sentiment distribution of posts, display posts classified according to preset categories, and highlight the regions of a post that correspond to sentiment words.

The service module may provide a service screen for managing the sentiment analysis information of posts user by user. An administrator may respond to a user's post displayed on the service screen using a communication method such as a comment or a direct message, which makes it possible to contact a user whose sentiment analysis result indicates anxiety.

The crawler module may access a web page and crawl the addresses of posts, access each post through its crawled address, crawl the body and comments of the accessed post and store them in separate lists, store each list in a data frame, and then save the lists held in the data frame as a JSON file whose file name is the post address.
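The storage step of the crawler can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the page has already been fetched, uses a plain dictionary in place of the data frame, and the names `address_to_filename`, `save_post`, the field names, and the file-name sanitizing rule are all hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path


def address_to_filename(url: str) -> str:
    """Turn a post address into a file-system-safe JSON file name (hypothetical rule)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".json"


def save_post(url: str, body: list[str], comments: list[str], out_dir: Path) -> Path:
    """Store the crawled body and comment lists in one JSON file named after the post address."""
    # A plain dict stands in for the data frame mentioned in the text.
    record = {"url": url, "body": body, "comments": comments}
    path = out_dir / address_to_filename(url)
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


out_dir = Path(tempfile.mkdtemp())
saved = save_post("https://example.com/board/123", ["본문 텍스트"], ["댓글 1", "댓글 2"], out_dir)
loaded = json.loads(saved.read_text(encoding="utf-8"))
```

Writing with `ensure_ascii=False` keeps Korean text readable in the saved file, which is convenient when the files are inspected by hand.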

The data preprocessing module may remove noise data from the title and body of each post in the collected data, perform text normalization that merges differently written variants of a word in the noise-free title and body into a single word, apply spacing correction to the normalized text, perform morphological analysis, and store the resulting morphemes in JSON form with the body and title of the post kept separate.
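The preprocessing order described above (noise removal, normalization, spacing, morphological analysis, JSON output) can be sketched as follows. The noise patterns and the `NORMALIZE` table are hypothetical, whitespace collapsing stands in for spacing correction, and a whitespace split stands in for a real morphological analyzer; none of these details come from the patent.

```python
import json
import re

# Hypothetical normalization table: variant spellings mapped to one canonical form.
NORMALIZE = {"ㅋㅋㅋ": "ㅋㅋ", "ㅠㅠㅠ": "ㅠㅠ"}


def remove_noise(text: str) -> str:
    """Drop URLs and HTML tags, then collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text: str) -> str:
    """Replace each known variant with its canonical form."""
    for variant, canonical in NORMALIZE.items():
        text = text.replace(variant, canonical)
    return text


def tokenize(text: str) -> list[str]:
    """Whitespace split standing in for a real morphological analyzer."""
    return text.split()


def preprocess(title: str, body: str) -> str:
    """Run the pipeline on title and body separately and return one JSON string."""
    result = {
        "title": tokenize(normalize(remove_noise(title))),
        "body": tokenize(normalize(remove_noise(body))),
    }
    return json.dumps(result, ensure_ascii=False)


output = preprocess("시험 <b>끝</b>", "너무 힘들다 ㅠㅠㅠ https://example.com/x")
```

Keeping `title` and `body` as separate keys mirrors the requirement that the two parts of a post remain distinguishable after analysis.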

A sentiment analysis method performed by a computer-implemented sentiment analysis system may include: collecting data related to a post; preprocessing the collected data; obtaining, from the preprocessed data, sentiment scores for sentiment words filtered on the basis of a pre-built sentiment dictionary; and using the obtained sentiment scores to provide a service related to sentiment analysis information of the post.

Building an emoji sentiment dictionary and a Korean sentiment dictionary can improve the accuracy of sentiment analysis of posts.

Beyond sentiment analysis itself, the management and publication of services related to sentiment state information can be handled in an integrated manner.

FIG. 1 is a diagram illustrating the structure of a sentiment analysis system according to an embodiment.
FIG. 2 is an example illustrating how data is stored in the sentiment dictionary in the sentiment analysis system according to an embodiment.
FIG. 3 is a flowchart illustrating the operation of crawling data in the sentiment analysis system according to an embodiment.
FIGS. 4 to 6 are examples illustrating how a list is saved as a JSON file in the sentiment analysis system according to an embodiment.
FIG. 7 is a flowchart illustrating the operation of preprocessing data in the sentiment analysis system according to an embodiment.
FIG. 8 is an example illustrating the operation of matching sentiment words in the sentiment analysis system according to an embodiment.
FIGS. 9 to 14 are diagrams illustrating the operation of providing a service related to the sentiment state information analyzed by the sentiment analysis system.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the structure of a sentiment analysis system according to an embodiment.

The sentiment analysis system 100 analyzes sentiment and may include a crawler module 110, a data preprocessing module 120, a sentiment analysis module 130, and a service module 140.

The crawler module 110 may crawl post data for sentiment analysis. A post is text posted or uploaded on the Internet for communication or information sharing, for example on a social networking service (SNS) or a community site. The crawler module 110 may crawl not only text data but also video or audio data, using a function such as speech-to-text (STT). The crawler module 110 may also record the metadata surrounding the crawled data at the time of crawling. For example, it may record site information, category information, and social indicators such as comments and shares. It may also record related advertisement exposure information (for example, URL-based, with the associated information stored as well when needed).

The data preprocessing module 120 may preprocess text data into a form suitable for sentiment analysis, according to how the data will be used. Specifically, it may perform category classification, cleaning and normalization, and morphological analysis.

The sentiment analysis module 130 may take the preprocessed data as input, filter sentiment words, and calculate a total sentiment score by matching the filtered words with sentiment scores set in advance. The sentiment analysis module 130 may weight the scores of individual sentiment words in the Korean sentiment dictionary on the basis of the sentiment score of the whole post. The weight information can be managed by specific criteria such as category, period, or site, and separate weights can be set through administrator intervention. When the system is built with machine learning or deep learning (ML/DL), the weight information may correspond to the weight parameters of the model. The analysis by the sentiment analysis module 130 may be repeated N or more times (N being a natural number) relative to a point in time. When a word that is not registered in the dictionary is found, the module may treat it as a neologism and report it to the administrator so that it can be added to the system quickly.

For example, the sentiment analysis module 130 may define a sentiment vocabulary in advance in order to filter sentiment words from the preprocessed data. The sentiment analysis module 130 may build the Korean sentiment dictionary and the emoji sentiment dictionary in the form of deep metadata, which combines deep learning with metadata. The detailed metadata may be structured so that it can be combined with weight information. For example, the detailed metadata may include basic information such as the site, category, and crawl time of the post; it may record the frequency of co-occurrence with specific neighboring words and position vector values; and, when ML/DL is used, it may include the weight information and neural network weight parameters needed to configure the network. For data extracted by STT, the sentiment analysis module 130 may also use values such as the pitch of the voice as weight information.

The sentiment dictionary, once built, serves as the base data for sentiment analysis. The Korean sentiment dictionary may consist of sentiment words that express universal basic human emotions, and sentiment may be classified by applying a deep learning technique to the definitions in the Standard Korean Language Dictionary. The emoji sentiment dictionary is based on an emoji sentiment ranking and may be formed from the emoji used most frequently in a data set for a specific service (for example, tweets). The Korean sentiment dictionary and the emoji sentiment dictionary may be built on different scales, in which case the emoji sentiment dictionary may be normalized before use. FIG. 2 shows an example of the Korean sentiment dictionary saved as a JSON file and an example of the emoji sentiment dictionary saved as a JSON file. To build the emoji sentiment dictionary, an emoji module may be used to convert each emoji to its CLDR form rather than storing the emoji itself. Here, CLDR refers to the locale information that the Unicode Consortium provides in XML form. For example, the emoji '😭' may be converted to the CLDR name Loudly Crying Face.

The sentiment dictionary data may be stored as JSON files. For example, the Korean sentiment dictionary may be organized as word, root, score, and the emoji sentiment dictionary as Unicode name, CLDR, score. The morphological analyzer may be, for example, MeCab, which supports Korean. Put in terms of polarity, the Korean sentiment dictionary lists sentiment vocabulary, root, and polarity, and the emoji sentiment ranking lists Unicode, CLDR Short Name, and polarity. In the embodiments, the analyzed sentiment vocabulary and emoji are called sentiment elements, and polarity is called the sentiment index. For example, posts written on social media may be collected and sentiment elements extracted through morphological analysis. The extracted sentiment elements are filtered by comparison against the Korean sentiment dictionary and the emoji sentiment dictionary, and for each filtered element that matches a dictionary, the corresponding value and score are returned to yield the positive or negative score of the post.
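The two dictionary layouts and the emoji-to-name lookup can be sketched as follows. The entries and scores are invented for illustration, the field names are hypothetical, and the standard-library Unicode character name is used only as a stand-in for the CLDR short name; the two coincide for this emoji but differ for many others, so a real system would need actual CLDR data.

```python
import json
import unicodedata

# Hypothetical entries; the scores are illustrative, not taken from the patent.
korean_dictionary = [
    {"word": "행복하다", "root": "행복", "score": 2},
    {"word": "우울하다", "root": "우울", "score": -2},
]
emoji_dictionary = [
    {"unicode": "U+1F62D", "cldr": "loudly crying face", "score": -1},
]


def emoji_to_name(char: str) -> str:
    """Return the lower-cased Unicode character name as a stand-in for the CLDR short name."""
    return unicodedata.name(char).lower()


# Index the emoji dictionary by name so a converted emoji can be looked up directly.
cldr_scores = {entry["cldr"]: entry["score"] for entry in emoji_dictionary}
name = emoji_to_name("\U0001F62D")
score = cldr_scores.get(name)

# Both dictionaries serialize to JSON with Korean text left readable.
serialized = json.dumps(
    {"korean": korean_dictionary, "emoji": emoji_dictionary}, ensure_ascii=False
)
```

Storing the name rather than the emoji character keeps the dictionary comparable with text produced by the analyzer, which is the reason the description gives for the conversion.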

During sentiment analysis, when a sentiment word is found among the words that were not filtered, the sentiment analysis module 130 may report it to an administrator and, once it is approved as a neologism, add the word and its score to the sentiment dictionary. A word judged to be a neologism may be used for weighting even before the user (administrator) intervenes.

The operation of matching sentiment words is described with reference to FIG. 8. For Korean text, the sentiment analysis module may compare only the root of each morphological analysis result from the post with the roots of the sentiment words and assign a score to the matching root (Method 1). Alternatively, it may assign a score to a word of the post when that word exactly matches a word in the sentiment dictionary (Method 2). As a further alternative, it may apply Method 2 first and fall back to Method 1 when the word is not in the dictionary. For emoji, the sentiment analysis module 130 may assign a score to a morphological analysis result when its CLDR matches a CLDR in the emoji sentiment dictionary.
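The three matching strategies for Korean text can be sketched as follows. The dictionary entries, the token pairs, and the function names are hypothetical; the sketch assumes that a morphological analyzer has already produced a (surface word, root) pair for each token.

```python
# Hypothetical dictionary, indexed two ways: by full word and by root.
ENTRIES = [
    {"word": "행복하다", "root": "행복", "score": 2},
    {"word": "슬프다", "root": "슬프", "score": -1},
]
BY_WORD = {e["word"]: e["score"] for e in ENTRIES}
BY_ROOT = {e["root"]: e["score"] for e in ENTRIES}


def score_by_root(root):
    """Method 1: compare only roots."""
    return BY_ROOT.get(root)


def score_by_word(word):
    """Method 2: require an exact word match."""
    return BY_WORD.get(word)


def score_with_fallback(word, root):
    """Combined approach: try the exact match first, then fall back to the root."""
    score = score_by_word(word)
    return score if score is not None else score_by_root(root)


# Each token is (surface word, root), as a morphological analyzer might emit.
tokens = [("행복하다", "행복"), ("슬펐다", "슬프"), ("학교", "학교")]
scores = [score_with_fallback(word, root) for word, root in tokens]
```

The second token shows why the fallback is useful: the inflected form is not in the dictionary, but its root is, so the word still receives a score. Tokens that match neither index return `None` and would be candidates for the neologism report.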

The sentiment analysis module 130 may calculate sentiment scores in various ways and store them in a form that can be displayed on a web page. For example, when monitoring the emotions of university students, it may derive a positive/negative score for each post, a sentiment score per week, a sentiment score per category, and so on. There are several ways to calculate and express the positive/negative value as a percentage. In one example, the module counts the sentiment-bearing words in a post separately for each score band, or each weighted score band, for example very anxious (-2 to -1), anxious (-1 to 0), neutral (0), good (0 to +1), and very good (+1 to +2). In another example, the module derives the share of each emotion within a post: the number of sentiment words in each score band divided by the total number of sentiment words, where the total excludes words with a score of 0 (words that carry no emotion). In another example, the module takes the sum of very good and good as the positive value and the sum of very anxious and anxious as the negative value. In yet another example, the module reclassifies the sentiment of the post into a preset number of levels (for example, five) based on the difference between the positive and negative values: very anxious (-50% to -25%), anxious (-25% to -5%), neutral (-5% to +5%), good (+5% to +25%), and very good (+25% to +50%).
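The counting, percentage, and five-level steps can be combined in one sketch. Several details are assumptions rather than statements from the description: which band a boundary score such as -1 or +1 belongs to, the use of positive minus negative percentage as the difference, and treating the outermost levels as open-ended (the description lists ranges only between -50% and +50%).

```python
from collections import Counter


def band(score: float) -> str:
    """Assign a word score to one of the five bands (boundary handling is an assumption)."""
    if score < -1:
        return "very_anxious"
    if score < 0:
        return "anxious"
    if score == 0:
        return "neutral"
    if score <= 1:
        return "good"
    return "very_good"


def post_sentiment(scores: list[float]) -> dict:
    """Return positive and negative percentages and a five-level label for one post."""
    counts = Counter(band(s) for s in scores)
    # Words scored 0 carry no emotion and are excluded from the total.
    total = sum(n for name, n in counts.items() if name != "neutral")
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "level": "neutral"}
    positive = 100 * (counts["good"] + counts["very_good"]) / total
    negative = 100 * (counts["anxious"] + counts["very_anxious"]) / total
    diff = positive - negative
    if diff < -25:
        level = "very_anxious"
    elif diff < -5:
        level = "anxious"
    elif diff <= 5:
        level = "neutral"
    elif diff <= 25:
        level = "good"
    else:
        level = "very_good"
    return {"positive": positive, "negative": negative, "level": level}


result = post_sentiment([2, 1, 0, -1, 0.5])
```

In this example three of the four scored words are positive, so the post is 75% positive and 25% negative, and the difference places it in the highest level.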

The service module 140 may provide service screens (a user interface) for presenting the analyzed sentiment state information. The service module 140 may provide a login screen, a dashboard screen, a statistics screen, a category screen, and a settings screen. Through the login screen, it may grant access to the related data when a user logs in with administrator authority. Through the dashboard screen, it may show a monthly statistical graph of sentiment changes in the analyzed data. Through the statistics screen, it may show statistical graphs of sentiment change by period (for example, by month, week, or day) together with sentiment values and sentiment distribution. Through the category screen, it may show the categorized data and detailed sentiment scores. Through the settings screen, it may let the user configure user settings.

FIGS. 9 to 14 illustrate the operation of providing a service related to the analyzed sentiment state information. The service may be performed by the service module, which may provide the service screens of FIGS. 9 to 13. FIG. 9 shows the login screen; logging in with an administrator account gives access to the sentiment analysis data. The service module becomes accessible once the ID and password of the administrator account are entered. An administrator account may be set up in advance, or an account may become an administrator account when its access rights are later changed to administrator.

FIG. 10 shows the dashboard screen. After sentiment analysis, the dashboard may express the positive or negative value of all posts as sentiment weather (1). The sentiment weather level is determined by the same process as the five-level classification of a post's sentiment, but calculated over all posts. The dashboard may likewise express the positive or negative value of the posts within a category as sentiment weather (2), using the same five-level process calculated over the posts in that category. The dashboard may show, as a bar graph, how the positive or negative value of all posts changes by date over a preset period (for example, a week or a month at a time) (3). The dashboard may display a calendar using the FullCalendar library and list the academic calendar for two semesters (from March to February of the following year) (4).

FIGS. 11 and 12 show the statistics screen. The statistics screen may show, as a line graph, how the positive or negative value of all posts changes by date over a preset period (for example, a week at a time) (5). It may show, as a radar chart and as numbers, how many of the sentiment posts from a specific period (for example, one week) fall into each sentiment state (6). It may show the positive or negative value of all sentiment posts from a specific period (for example, one week) as a donut chart (7). It may also show the number of posts belonging to each category among the sentiment posts from a specific period (for example, one week).

When one day is selected from the data for a preset period (for example, one week of data) on the statistics screen, the sentiment score for the selected day may be shown (9). The radar chart, the donut chart, and the figures may then all show the sentiment scores for the selected day. Posts may be categorized on the statistics screen. In one example, posts are categorized statically: candidate categories are defined in advance and posts are classified into them. In another example, posts are categorized dynamically: content mentioned more often than a preset threshold in the analyzed data is set as a category. This makes it possible to see the day's hot topics at a glance and to learn students' interests along with their sentiment. When a single post falls under several categories, it may be displayed according to policy. For example, categories may be shown as labels, which allows a post to carry overlapping categories, or in dictionary form, which shows each category separately.
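Static and dynamic categorization can be contrasted in a short sketch. The category names, keywords, threshold, and function names are hypothetical, and counting raw token mentions is only one possible reading of "mentioned more often than a preset threshold".

```python
from collections import Counter

# Hypothetical static candidates: category name -> keywords.
STATIC_CATEGORIES = {"exam": {"시험", "성적"}, "career": {"취업", "면접"}}


def static_categories(tokens: list[str]) -> list[str]:
    """Label a post with every predefined category whose keywords it mentions."""
    found = set(tokens)
    return sorted(name for name, keywords in STATIC_CATEGORIES.items() if keywords & found)


def dynamic_categories(posts: list[list[str]], min_mentions: int) -> list[str]:
    """Promote any term to a category once it is mentioned at least min_mentions times."""
    counts = Counter(token for post in posts for token in post)
    return sorted(term for term, n in counts.items() if n >= min_mentions)


posts = [["시험", "취업", "걱정"], ["시험", "기숙사"], ["기숙사", "시험"]]
labels = static_categories(posts[0])
topics = dynamic_categories(posts, min_mentions=2)
```

Returning a list of labels per post corresponds to the label-style display, in which one post may carry several categories at once.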

FIG. 13 shows a category screen. Posts classified by category may be displayed on the category screen. For example, the positive or negative values may be calculated and displayed as percentages, and the percentages may be expressed as sentiment weather (10). Referring to FIG. 14, a sentiment-word highlighting function may be provided on the category screen so that key words can be identified at a glance (11). Here, the words of the post that match a 'word' whose 'exist' value is 1, as determined by the sentiment analysis module, may be highlighted. Highlighting may use differences in color temperature or hue according to the degree of positivity or negativity so that the post can be assessed quickly. The highlighting function may be turned on or off on the category screen. Sentiment words may also be listed as keywords on the category screen. The emotion level of the post calculated by the sentiment analysis module may be displayed as a gauge (12). The category screen may allow an administrator to manage each user individually with respect to the sentiment analysis information of posts. The category screen may allow a counselor to directly contact an author whose sentiment analysis result appears unstable, through a communication method such as a comment or a private message (13). For example, the administrator (counselor) may enter a response message directly on a user's post using the sentiment analysis information obtained for that post on the category screen.
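The highlighting rule (mark the words of a post that match a 'word' entry whose 'exist' value is 1) can be sketched as follows. The 'word' and 'exist' fields come from the description; the 'score' field, the HTML `<mark>` output, and the class names are assumptions chosen only to make the example concrete.

```python
import re


def highlight(text, entries, enabled=True):
    """Wrap matched sentiment words in <mark> tags, classed by polarity."""
    # Only entries flagged exist == 1 are eligible for highlighting.
    scores = {e["word"]: e.get("score", 0)
              for e in entries if e.get("exist") == 1 and e.get("word")}
    if not enabled or not scores:
        return text  # highlighting switched off, or nothing to mark

    # Longest words first so a longer entry wins over a shorter prefix.
    pattern = re.compile(
        "|".join(re.escape(w) for w in sorted(scores, key=len, reverse=True)))

    def wrap(match):
        score = scores[match.group(0)]
        polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        return f'<mark class="{polarity}">{match.group(0)}</mark>'

    return pattern.sub(wrap, text)
```

The `enabled` flag stands in for the on/off toggle on the category screen; a stylesheet could map the assumed class names to warmer or cooler colors.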

Additionally, when the sentiment analysis results show a regular periodic pattern, the service module may predict the sentiment of the post author and run a program that responds in advance. When three or more posts in a negative emotional state occur in one day, the service module may send an e-mail to the administrator to give notice of the risk. When a negative emotional state persists for three days or more, or when 70% or more of all posts are negative, the service module may send an SMS to the counselor to actively give notice of the risk.
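The notification thresholds above can be sketched as a small rule check. The thresholds (three posts per day, three days, 70%) are from the description; everything else is an assumption for illustration, in particular the input shape and the rule that a day counts as "negative" when negative posts outnumber the others on that day. The function only returns action names and does not send anything.

```python
from collections import defaultdict
from datetime import date, timedelta


def check_alerts(posts, daily_limit=3, streak_days=3, negative_ratio=0.7):
    """posts: iterable of (date, is_negative). Returns a set of action names."""
    per_day = defaultdict(lambda: [0, 0])  # date -> [negative count, total count]
    for day, is_negative in posts:
        per_day[day][0] += int(is_negative)
        per_day[day][1] += 1

    actions = set()
    # Rule 1: three or more negative posts on a single day -> e-mail the administrator.
    if any(neg >= daily_limit for neg, _ in per_day.values()):
        actions.add("email_admin")

    # Rule 2a: longest run of consecutive calendar days that are "negative" (assumed definition).
    streak = longest = 0
    last_negative = None
    for day in sorted(per_day):
        neg, total = per_day[day]
        if neg * 2 > total:
            consecutive = last_negative is not None and day - last_negative == timedelta(days=1)
            streak = streak + 1 if consecutive else 1
            last_negative = day
            longest = max(longest, streak)
        else:
            streak, last_negative = 0, None

    # Rule 2b: share of negative posts among all posts.
    total_neg = sum(neg for neg, _ in per_day.values())
    total = sum(count for _, count in per_day.values())
    if longest >= streak_days or (total and total_neg / total >= negative_ratio):
        actions.add("sms_counselor")
    return actions
```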

FIG. 3 is a flowchart illustrating an operation of crawling data in a sentiment analysis system according to an embodiment. The crawling operation may be performed by the crawler module of the sentiment analysis system.

After accessing a web page (310), the crawler module may crawl post addresses (320). The crawler module may crawl the post addresses present on each page of each bulletin board. For example, the crawler module may store the post addresses in a list (address[]). The crawler module may then access each post based on the crawled post addresses (330). In other words, the crawler module may access each post through the list in which the addresses are stored.

The crawler module may crawl the content of each post (340). Specifically, the crawler module may crawl the body and the comments of a post and store them in separate lists. For example, the crawler module may crawl the post's title, body content, creation time, number of recommendations, and number of comments, as well as each comment's author, content, and time. The crawler module may crawl periodically at a fixed time interval. As one example, the crawler module may crawl the bodies and comments of posts at an interval (for example, every 4 hours) judged appropriate for the number of posts written per day, based on an analysis of posts written during a preset period. As another example, the crawler module may crawl once a certain number of posts or more have been written. For example, based on an analysis of data for a preset period, the crawler module may crawl the bodies and comments of posts whenever the number of new posts reaches the average number generated in the interval (for example, 4 hours) judged appropriate for the number of posts written per day.
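The two crawl triggers above (a fixed interval, or a minimum number of new posts) can be sketched as a single decision function. The 4-hour default is the example interval from the description; the function name, its parameters, and the way the two triggers are combined are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_crawl(last_crawl, now, new_posts,
                 interval=timedelta(hours=4), post_threshold=None):
    """Return True when either trigger fires.

    Time trigger: at least `interval` has passed since the last crawl.
    Count trigger: at least `post_threshold` new posts exist (if a threshold is set).
    """
    if now - last_crawl >= interval:
        return True
    return post_threshold is not None and new_posts >= post_threshold
```

In practice the count threshold would be set to the average number of posts observed per interval during the preset analysis period.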

After crawling the body and comments of a post, the crawler module may store the resulting lists as a data frame (D) (350). The crawler module may store in data frames both the list obtained by crawling the post body and the list obtained by crawling the comments. Specifically, the crawler module may store the list containing the post's title, body content, creation time, number of recommendations, and number of comments and the list containing each comment's author, content, and time in two separate data frames (D1 and D2, respectively). This keeps the body content separate from the comments.

The crawler module may store the lists as a JSON file (360). FIGS. 4 to 6 show examples of how the lists are stored as JSON files. The crawler module may store a JSON file whose file name is the post address. For example, the crawler module may generate one JSON file per post. In the single-file case, D1 (body) and D2 (comments) may both be stored in one JSON file. Alternatively, when D1 is stored as one JSON file, D2 may be generated as separate JSON files, one for each address (for example, each post address).
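The one-file-per-post variant can be sketched as follows. This is an illustration under stated assumptions: the record keys ("address", "D1", "D2") and the function name are hypothetical, and the post address is percent-encoded because a raw URL contains characters such as "/" that are not valid in file names. The description does not say how the address is turned into a file name.

```python
import json
from pathlib import Path
from urllib.parse import quote


def save_post(address, body, comments, out_dir):
    """Write one JSON file per post, named after the post address."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Percent-encode the address so it is safe to use as a file name (assumed scheme).
    path = out_dir / (quote(address, safe="") + ".json")
    record = {"address": address, "D1": body, "D2": comments}
    # ensure_ascii=False keeps Korean text and emoji readable in the file.
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```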

FIG. 7 is a flowchart illustrating an operation of performing data preprocessing in a sentiment analysis system according to an embodiment. The data preprocessing operation may be performed by the data preprocessing module of the sentiment analysis system.

The data preprocessing module may import the raw data (710). The data preprocessing module may preprocess only the post title and body content, excluding additional information such as the post ID and post creation time. The data preprocessing module may then clean the data (720) by removing noise data (for example, tokens in the collected data that do not express sentiment). For example, symbols such as '$', '#', '/', '+', '=', '&', and '@' may be removed, whereas the symbols '(', ')', ':', ';', '^', '*', '<', and '>', which can form emoticons, are not removed.
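The cleaning step can be sketched with a character translation table. The removed symbols follow the list in the description; the function name is hypothetical, and leaving every other character untouched (including the period, which the source does not clearly list) is an assumption.

```python
# Symbols treated as noise, following the list in the description.
NOISE_CHARS = "$#/+=&@"
_NOISE_TABLE = str.maketrans("", "", NOISE_CHARS)


def remove_noise(text):
    """Delete noise symbols; emoticon symbols such as ( ) : ; ^ * < > are kept."""
    return text.translate(_NOISE_TABLE)
```

Because only the listed characters are deleted, text emoticons such as `:)` or `^^` survive for the later emoticon and emoji handling.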

After removing the noise data, the data preprocessing module may normalize the text (730). Normalization is the process of unifying words that are written differently into the same word. When characters such as 'ㅋ', 'ㅎ', or 'ㅠ' are repeated a certain number of times or more, the data preprocessing module may unify them as 'ㅋㅋㅋ', 'ㅎㅎㅎ', 'ㅠㅠㅠ', and so on, so that they are normalized to forms that receive the same score.
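The repetition rule can be sketched with a back-referencing regular expression. The description only says "a certain number or more", so the threshold of three repetitions used here is an assumption, as is the function name.

```python
import re

# A run of three or more identical characters among ㅋ, ㅎ, ㅠ (assumed threshold).
_REPEATED = re.compile(r"([ㅋㅎㅠ])\1{2,}")


def normalize_repeats(text):
    """Collapse long runs of ㅋ, ㅎ, or ㅠ to exactly three characters."""
    return _REPEATED.sub(lambda m: m.group(1) * 3, text)
```

Runs shorter than the threshold are left unchanged, so 'ㅠㅠ' and 'ㅠㅠㅠㅠㅠ' remain distinguishable only if the dictionary scores them differently.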

The data preprocessing module may run a word-spacing module (740). If the spacing in the data is incorrect, errors may occur during morphological analysis, and the word-spacing module is used to prevent them. For example, Chatspace (a Korean word-spacing model), KoSpacing, or RAWS may be used. KoSpacing is an automatic Korean word-spacing package with a CNN-RNN structure, applied before morphological analysis to obtain accurate results, and RAWS builds its basic model using a character-level embedding technique. Because the Mecab morphological analyzer recognizes an emoticon repeated several times as a single token, spaces are placed before and after each emoticon to separate them.
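The last step above (separating repeated emoticons with spaces before morphological analysis) can be sketched as follows. The code-point ranges are a rough assumption covering common emoji only; sequences built with zero-width joiners or variation selectors would be split apart and need more careful handling in a real system. The function name is hypothetical, and this sketch does not perform Korean word spacing itself.

```python
import re

# Rough emoji ranges (assumption): pictographs and miscellaneous symbols/dingbats.
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def space_emoji(text):
    """Put a space on both sides of every emoji so each becomes its own token."""
    spaced = _EMOJI.sub(lambda m: f" {m.group(0)} ", text)
    # Collapse the extra whitespace introduced between adjacent emoji.
    return " ".join(spaced.split())
```

Note that the final join also collapses any other whitespace runs, including line breaks, into single spaces.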

The data preprocessing module may perform morphological analysis on the data (750). The data preprocessing module may run the Mecab morphological analyzer on the data to which word spacing has been applied. When the part of speech in the analysis result is 'SY (symbol)', the token may be converted to its CLDR name using an emoji module and stored.

The data preprocessing module may store the analysis result (760). The data preprocessing module may store the morphologically analyzed words in JSON format, keeping the title and the body separate. Although running Mecab yields an analysis with elements such as the word, part-of-speech tag, semantic class, presence of a final consonant, and reading, only the word and the part-of-speech tag may be stored in the file; for an emoji, the emoji and its CLDR name may be stored.
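The storage step can be sketched as follows. The analyzer itself is not called here: the input is assumed to be a list of tuples whose first two items are the word and its part-of-speech tag, with any further analyzer fields ignored. `CLDR_NAMES` is a tiny hypothetical lookup table standing in for the emoji module, and the JSON key names are assumptions.

```python
import json

# Hypothetical stand-in for the emoji module: emoji -> CLDR short name.
CLDR_NAMES = {"\U0001F600": "grinning face", "\U0001F622": "crying face"}


def _keep_fields(morphs):
    """Keep only word and tag; for known emoji keep emoji and CLDR name."""
    kept = []
    for word, tag, *_ in morphs:  # extra analyzer fields are dropped
        if word in CLDR_NAMES:
            kept.append({"emoji": word, "cldr": CLDR_NAMES[word]})
        else:
            kept.append({"word": word, "tag": tag})
    return kept


def to_json(title_morphs, body_morphs):
    """Serialize title and body separately, as the description requires."""
    record = {"title": _keep_fields(title_morphs), "body": _keep_fields(body_morphs)}
    return json.dumps(record, ensure_ascii=False)
```

Storing the CLDR name alongside the emoji lets the later scoring step match it against the CLDR field of the emoji sentiment dictionary.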

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A computer-implemented sentiment analysis system, comprising:
a crawler module that collects data related to posts;
a data preprocessing module that preprocesses the collected data;
a sentiment analysis module that obtains, from the preprocessed data, sentiment scores for sentiment words filtered based on a pre-built sentiment dictionary; and
a service module that provides a service related to sentiment analysis information of the post using the obtained sentiment scores,
wherein the pre-built sentiment dictionary
is a sentiment dictionary, including a Korean sentiment dictionary for sentiment analysis and an emoji sentiment dictionary, that is built in the form of deep metadata combining deep learning and metadata,
wherein the sentiment analysis module
filters sentiment words from the preprocessed data based on the pre-built Korean sentiment dictionary and emoji sentiment dictionary, and calculates the sentiment score of the post by matching the sentiment scores set for the filtered sentiment words to those words, the sentiment score being assigned using any one of: a first method of comparing the root of a morphological analysis result of the preprocessed post data with the root of a sentiment word present in the pre-built Korean sentiment dictionary and emoji sentiment dictionary and assigning a sentiment score; a second method of assigning a sentiment score to a word of the post when a sentiment word present in the pre-built Korean sentiment dictionary and emoji sentiment dictionary matches that word; or a third method of, when comparing a word of the post with the sentiment words in the pre-built Korean sentiment dictionary and emoji sentiment dictionary shows that the word is not present in those dictionaries, comparing the root of the morphological analysis result of the preprocessed post data with the root of a sentiment word and assigning a sentiment score to the morphological analysis result; and, when the CLDR of a morphological analysis result of the preprocessed post matches a CLDR in the emoji sentiment dictionary, assigns a score to that morphological analysis result, and
wherein the service module
provides a service screen for per-user management of emotional state information, including the sentiment value and sentiment distribution of the post, and of the sentiment analysis information of the post.

(Deleted)

The sentiment analysis system of claim 1, wherein
the emoji sentiment dictionary stores sentiment dictionary data including a Unicode name, a CLDR, and a score, and
the Korean sentiment dictionary stores sentiment dictionary data including a sentiment word, a root, and a score.

The sentiment analysis system of claim 1, wherein
the sentiment analysis module
assigns a weight to the score of each sentiment word included in the Korean sentiment dictionary based on the sentiment score of the post.

(Deleted)

The sentiment analysis system of claim 1, wherein
the sentiment analysis module
counts the words expressing sentiment in the post separately for each score or each weighted score, and calculates and displays the positive or negative value of the post as a percentage.

The sentiment analysis system of claim 1, wherein
the sentiment analysis module
calculates the ratio of the number of sentiment words in a preset range to the total count, which includes the sum of the sentiment words present in the post.

The sentiment analysis system of claim 1, wherein
the service module,
when a user logs in to the service with authorized privileges, shows the change in sentiment obtained by analyzing the post data as a graph, and shows sentiment weather according to the positive or negative value of the post.

The sentiment analysis system of claim 1, wherein
the service module
displays statistical information including the sentiment value and sentiment distribution of the post, displays posts classified based on preset categories, and highlights the regions of a post that correspond to sentiment words.

The sentiment analysis system of claim 1, wherein
the service module
enables contact with a user whose sentiment analysis result appears unstable, by having an administrator respond to the user's post displayed on the provided service screen using any one communication method including a comment and a private message.

The sentiment analysis system of claim 1, wherein
the crawler module
crawls post addresses upon accessing a web page, accesses each post through its crawled address, crawls the body and comments of the accessed post and stores them in respective lists, stores each of the lists in a data frame, and then stores the lists held in the data frames as a JSON file whose file name is the post address.

The sentiment analysis system of claim 1, wherein
the data preprocessing module
removes noise data from the title and body content of the post included in the collected data, performs text normalization that unifies differently written words in the noise-free title and body content into the same word, applies word spacing to the normalized text, performs morphological analysis, and stores the morphologically analyzed words in JSON format with the title and the body of the post kept separate.

A sentiment analysis method performed by a computer-implemented sentiment analysis system, the method comprising:
collecting data related to posts;
preprocessing the collected data;
obtaining, from the preprocessed data, sentiment scores for sentiment words filtered based on a pre-built sentiment dictionary; and
providing a service related to sentiment analysis information of the post using the obtained sentiment scores,
wherein the pre-built sentiment dictionary
is a sentiment dictionary, including a Korean sentiment dictionary for sentiment analysis and an emoji sentiment dictionary, that is built in the form of deep metadata combining deep learning and metadata,
wherein the obtaining comprises
filtering sentiment words from the preprocessed data based on the pre-built Korean sentiment dictionary and emoji sentiment dictionary, and calculating the sentiment score of the post by matching the sentiment scores set for the filtered sentiment words to those words, the sentiment score being assigned using any one of: a first method of comparing the root of a morphological analysis result of the preprocessed post data with the root of a sentiment word present in the pre-built Korean sentiment dictionary and emoji sentiment dictionary and assigning a sentiment score; a second method of assigning a sentiment score to a word of the post when a sentiment word present in the pre-built Korean sentiment dictionary and emoji sentiment dictionary matches that word; or a third method of, when comparing a word of the post with the sentiment words in the pre-built Korean sentiment dictionary and emoji sentiment dictionary shows that the word is not present in those dictionaries, comparing the root of the morphological analysis result of the preprocessed post data with the root of a sentiment word and assigning a sentiment score to the morphological analysis result; and, when the CLDR of a morphological analysis result of the preprocessed post matches a CLDR in the emoji sentiment dictionary, assigning a score to that morphological analysis result, and
wherein the providing of the service comprises
providing a service screen for per-user management of emotional state information, including the sentiment value and sentiment distribution of the post, and of the sentiment analysis information of the post.