KR102567896B1

KR102567896B1 - Apparatus and method for religious sentiment analysis using deep learning

Info

Publication number: KR102567896B1
Application number: KR1020210113387A
Authority: KR
Inventors: 노기섭; 구민구
Original assignee: 청주대학교 산학협력단
Priority date: 2021-08-26
Filing date: 2021-08-26
Publication date: 2023-08-17
Also published as: KR20230031027A

Abstract

An apparatus and method for religious sentiment analysis using deep learning are disclosed. The method includes: collecting comments on news articles related to both COVID-19 and religion to generate comment data; performing sentiment labeling on the generated comment data; performing data pre-processing on the sentiment-labeled comment data; and performing sentiment analysis on the pre-processed comment data using a deep learning model to calculate positive and negative sentiment values for each religion.

Description

Apparatus and method for religious sentiment analysis using deep learning

The present invention relates to an apparatus and method for religious sentiment analysis using deep learning.

COVID-19, which spread worldwide in January 2020, greatly affected people's daily lives. In addition, a cluster infection originating from a specific religious group began on February 18, 2020, followed by another cluster infection in June 2020 and a large-scale rally related to Liberation Day in August 2020. These social events accelerated the domestic spread of COVID-19 in Korea, and the number of confirmed cases subsequently rose sharply.

Accordingly, as the perception grew that religion could be a vector for COVID-19 infection, the media produced a wide range of online information about religion.

The applicant therefore proposes the present invention in order to examine, from a data analysis perspective, the effect of COVID-19 on religion among its many effects on society.

Korean Registered Patent Publication No. 10-0935828 (2009.12.30)

The present invention provides an apparatus and method for religious sentiment analysis using deep learning that collect comments on news articles related to both COVID-19 and religion, analyze the collected comments with a deep learning model to calculate the sentiment level for each religion, and analyze monthly comment counts to calculate a sentiment value for each date.

According to one aspect of the present invention, a religious sentiment analysis method using deep learning, performed by a religious sentiment analysis apparatus, is disclosed.

The religious sentiment analysis method using deep learning according to an embodiment of the present invention includes: collecting comments on news articles related to both COVID-19 and religion to generate comment data; performing sentiment labeling on the generated comment data; performing data pre-processing on the sentiment-labeled comment data; and performing sentiment analysis on the pre-processed comment data using a deep learning model to calculate positive and negative sentiment values for each religion.

In the sentiment labeling step, comments in the comment data that show a positive tendency are labeled with a positive identifier, and comments that show a negative tendency are labeled with a negative identifier.

The deep learning model is a Keras model (Keras being a Python package) configured with ten hidden layers between the input layer and the output layer.

The ten hidden layers comprise: a first hidden layer to which TextVectorization, a function that vectorizes strings, is applied; a second hidden layer to which the Embedding function is applied, converting the positive integers of the vectorized strings into dense vectors of fixed size; a third hidden layer to which the Flatten function is applied, flattening the dense vectors into an array; and fourth through tenth hidden layers, each having 32 units with ReLU (Rectified Linear Unit) as the activation function and with L2 regularization added to prevent overfitting during training.

In the step of calculating the sentiment values, sentiment analysis is performed on the comment data using KoBERT, a Korean BERT model.

The religious sentiment analysis method using deep learning further includes calculating a sentiment value for each date using the pre-processed comment data.

The sentiment value for each date is derived from a per-date weighted average, and the per-date weighted average (W) is calculated using the following equation.
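
The equation itself is not reproduced in this text. A form consistent with the symbol definitions given for it (k, i, count(i), avg(i), M) is the comment-count-weighted mean of the per-article averages. This is a reconstruction from those definitions, not the verbatim Equation 1 of the publication:

```latex
W = \frac{1}{M} \sum_{i=1}^{k} \mathrm{count}(i)\,\mathrm{avg}(i),
\qquad
M = \sum_{i=1}^{k} \mathrm{count}(i)
```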

Here, k is the number of articles on the date, i is the article index, count(i) is the total number of comments on article i, avg(i) is the mean sentiment value of article i, and M is the sum of the comment counts of all articles on that date.

The per-date sentiment average (X), obtained by correcting the per-date weighted average, is calculated using the following equation.
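
The equation itself is not reproduced in this text. One form that uses exactly the symbols defined for it (N, L, e, W) is a shrinkage average that pulls dates with few comments toward the overall mean. The expression below is an assumption offered for illustration, and the publication's actual Equation 2 may differ:

```latex
X = \frac{N}{N + L}\, W + \frac{L}{N + L}\, e
```

Under this assumed form, X approaches W when N is much larger than L, and approaches e when N is much smaller than L.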

Here, N is the total number of comments on the date, L is the minimum number of comments per date required for the sentiment value to be considered reliable, e is the mean sentiment value over the entire period, and W is the per-date weighted average calculated by Equation 1.

According to another aspect of the present invention, a religious sentiment analysis apparatus using deep learning is disclosed.

The religious sentiment analysis apparatus using deep learning according to an embodiment of the present invention includes a memory that stores instructions and a processor that executes the instructions, wherein the instructions perform a religious sentiment analysis method using deep learning that includes: collecting comments on news articles related to both COVID-19 and religion to generate comment data; performing sentiment labeling on the generated comment data; performing data pre-processing on the sentiment-labeled comment data; and performing sentiment analysis on the pre-processed comment data using a deep learning model to calculate positive and negative sentiment values for each religion.

The apparatus and method for religious sentiment analysis using deep learning according to an embodiment of the present invention can collect comments on news articles related to both COVID-19 and religion, analyze the collected comments with a deep learning model to calculate the sentiment level for each religion, and analyze monthly comment counts to calculate a sentiment value for each date.

FIG. 1 is a flowchart of a religious sentiment analysis method using deep learning performed by a religious sentiment analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the structure of a deep learning model according to an embodiment of the present invention.
FIGS. 3 to 5 are diagrams showing experimental results for the religious sentiment analysis method using deep learning according to an embodiment of the present invention.
FIG. 6 is a diagram schematically illustrating the configuration of a religious sentiment analysis apparatus using deep learning according to an embodiment of the present invention.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be omitted, or additional components or steps may be included. In addition, terms such as "... unit" and "module" refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two.

Hereinafter, various embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart of a religious sentiment analysis method using deep learning performed by a religious sentiment analysis apparatus according to an embodiment of the present invention, and FIG. 2 is a diagram showing an example of the structure of a deep learning model according to an embodiment of the present invention. The method is described below mainly with reference to FIG. 1, with additional reference to FIG. 2.

In step S110, the religious sentiment analysis apparatus collects comments on news articles related to both COVID-19 and religion to generate comment data.

To gauge public perception, comments on domestic news articles during a preset collection period may be collected. For example, to choose the data collection venues, the search engine visit rankings from August 2019 to October 2020 published by Internet Trend (www.internettrend.co.kr) may be used. According to Internet Trend, the shares of portal sites in Korea over this period were Naver first (58.5%), Google second (32.8%), and Daum third (6.63%). Google, in second place, was judged unsuitable as a web page for collecting news article comments and was excluded from the data collection venues. Accordingly, Naver (first) and Daum (third) may be set as the data collection venues.

Five keywords may be set for searching news articles related to both COVID-19 and religion: Christianity (기독교), Buddhism (불교), Catholicism (천주교), Shincheonji (신천지), and religion (종교). Christianity, Buddhism, and Catholicism rank first through third by share of believers in Korea, at 20%, 15.5%, and 8% respectively (Ministry of Culture, Sports and Tourism, 2018). Shincheonji accounts for less than 1% of believers in Korea, but it may be selected as a keyword because it was the source of a COVID-19 cluster infection and of the first major domestic wave of COVID-19. Finally, the keyword religion may be added to capture the change in perception across all of the keywords.

The data collection period may be set differently for each keyword. For Shincheonji, the collection reference date may be set to February 17, 2020, one day before the cluster infection broke out; for the other keywords, it may be set to January 20, 2020, when COVID-19 entered Korea. Comments on news articles published during the five months before and the five months after the collection reference date may then be collected.
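
The per-keyword collection window described above can be sketched in a few lines of Python. The helper names (`shift_months`, `collection_window`) are hypothetical, and treating "five months" as calendar months with the same day of the month is an assumption; the text does not state the exact boundary rule.

```python
import calendar
from datetime import date

# Reference dates taken from the text; every other keyword uses the default.
REFERENCE_DATES = {"신천지": date(2020, 2, 17)}
DEFAULT_REFERENCE = date(2020, 1, 20)


def shift_months(d, months):
    """Move a date by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def collection_window(keyword, months=5):
    """Return (start, reference, end) for one search keyword."""
    ref = REFERENCE_DATES.get(keyword, DEFAULT_REFERENCE)
    return shift_months(ref, -months), ref, shift_months(ref, months)


for kw in ("신천지", "기독교"):
    start, ref, end = collection_window(kw)
    print(kw, start.isoformat(), ref.isoformat(), end.isoformat())
```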

For example, comments on news articles may be collected with a crawler implemented in Python using the Selenium and BeautifulSoup4 libraries. To find news articles for a given date, the date search filter provided by each portal site may be used. The comment data generated by collecting news article comments may include the article URL, the date, and all comments and replies, and may be stored and managed in a database (MariaDB). An example of the generated comment data is shown in Table 1 below, where the periods before and after the collection reference date are denoted Before and After.

In step S120, the religious sentiment analysis apparatus performs sentiment labeling on the generated comment data.

The comment data generated by collecting news article comments may be sentiment-labeled for sentiment analysis using deep learning.

Comments in the comment data that show a positive tendency may be labeled with a positive identifier (for example, 1), and comments that show a negative tendency may be labeled with a negative identifier (for example, 0). That is, the religious sentiment analysis apparatus may receive a positive or negative identifier for each comment from a user and apply that sentiment label to each comment in the comment data.

For example, as described above, the comment data may be collected from two data collection venues (Naver and Daum), for five keywords (Christianity, Buddhism, Catholicism, Shincheonji, and religion), over the period before and the period after the collection reference date. The total number of training data sets for deep learning is therefore 2 × 5 × 2 = 20. Sentiment labeling may be carried out on 5 to 10% of the comments extracted from each training data set. When users label comments one by one, bias can arise because each person judges positive and negative by different criteria. To compensate, eight people performed the labeling together so that no single person's criteria would dominate the deep learning result. In this way, a total of 32,474 comments were sentiment-labeled to produce the labeled training data.
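
As a small illustration of the 2 × 5 × 2 = 20 count, the data sets can be enumerated as the Cartesian product of venue, keyword, and period. The names below (`DATASETS`, `sample_size_range`) are hypothetical, and the helper only computes how many comments a 5 to 10% extraction would cover; how the comments were actually sampled is not stated in the text.

```python
from itertools import product

PORTALS = ("Naver", "Daum")
KEYWORDS = ("기독교", "불교", "천주교", "신천지", "종교")
PERIODS = ("Before", "After")

# One training data set per (portal, keyword, period) combination.
DATASETS = list(product(PORTALS, KEYWORDS, PERIODS))


def sample_size_range(n_comments, low_pct=5, high_pct=10):
    """Number of comments covered by a low_pct..high_pct percent sample."""
    return n_comments * low_pct // 100, n_comments * high_pct // 100


print(len(DATASETS))            # number of data sets
print(DATASETS[0])              # first combination
print(sample_size_range(1000))  # sample bounds for a set of 1,000 comments
```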

In step S130, the religious sentiment analysis apparatus performs data pre-processing on the sentiment-labeled comment data.

Data pre-processing is the process of putting data into a form suitable for analysis and processing. Its purpose is to improve the prediction accuracy of the artificial intelligence by transforming the data into a form a computer can process and by removing unnecessary information.

Specifically, the religious sentiment analysis apparatus may analyze morphemes using a morphological analysis library and remove unnecessary morphemes.

For example, because the meaning of Korean text can change in many ways depending on the particles (josa) attached to words, tokenization is required. To this end, morphological analysis may be performed using the Okt (formerly Twitter) library of KoNLPy. KoNLPy also offers other morphological analyzers such as Komoran, Kkma, Hannanum, and Mecab. Because this embodiment uses morphological analysis only to remove a few unnecessary morphemes, the Okt library, which analyzes morphemes simply and quickly, may be used. Data pre-processing may then be completed by removing numbers (Number), particles (Josa), punctuation (Punctuation), and stop words from the analyzed comments. An example of data pre-processing is shown in Table 2 below, where Raw data is the comment data before pre-processing and Preprocessed Data is the result of pre-processing the comment data with the Okt library.
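
The filtering step can be sketched as below. To keep the sketch free of a Java dependency it operates on (token, tag) pairs of the kind returned by `Okt().pos(text)` in KoNLPy; the tagged input is hand-written for illustration rather than real analyzer output, and the stop word set and the function name `filter_morphemes` are hypothetical.

```python
# Part-of-speech tags named in the text as candidates for removal.
DROP_TAGS = {"Number", "Josa", "Punctuation"}

# Hypothetical stop word list; the text does not enumerate the actual one.
STOP_WORDS = {"것", "수"}


def filter_morphemes(tagged, stop_words=STOP_WORDS):
    """Keep tokens whose tag is not dropped and that are not stop words."""
    return [
        token
        for token, tag in tagged
        if tag not in DROP_TAGS and token not in stop_words
    ]


# With KoNLPy installed, the pairs would come from:
#   from konlpy.tag import Okt
#   tagged = Okt().pos(comment)
# The pairs below are an illustrative stand-in, not actual Okt output.
tagged = [
    ("교회", "Noun"), ("에서", "Josa"), ("100", "Number"), ("명", "Noun"),
    ("이", "Josa"), ("감염", "Noun"), ("!", "Punctuation"),
]
tokens = filter_morphemes(tagged, stop_words={"명"})
print(" ".join(tokens))
```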

In step S140, the religious sentiment analysis apparatus performs sentiment analysis on the comment data using a deep learning model to calculate positive and negative sentiment values for each religion.

For example, as shown in FIG. 2, the deep learning model according to an embodiment of the present invention may be a Keras model (Keras being a Python package) configured with ten hidden layers between the input layer and the output layer.

In the first hidden layer, TextVectorization, a built-in Keras function that vectorizes strings, may be applied. Among its arguments, max_tokens, the size of the overall vocabulary, may be set to 100,000; output_mode, which determines the output mode, may be set to integer (int); and output_sequence_length, the size of the output vector, may be set to 64. The training data may then be applied to the TextVectorization function.

In the second hidden layer, the Embedding function may be applied, converting the positive integers (indices) of the vectorized strings into dense vectors of fixed size. Among its arguments, input_dim, the vocabulary size (that is, the maximum integer index + 1), may be set to the TextVectorization max_tokens value plus 1. output_dim, which determines the dimension of the output dense vectors, may be set to 200.

In the third hidden layer, the Flatten function may be applied, flattening the input dense vectors into an array.

The fourth through tenth hidden layers may each have 32 units, with ReLU (Rectified Linear Unit) set as the activation function. To prevent overfitting during training, L2 regularization with a value of 0.001 may be added.

The output layer may have one node with sigmoid set as its activation function. Because each comment must produce exactly one output value, there is a single output node, and because that value must be a decimal between 0 and 1, the sigmoid function may be used. The closer the output is to 1, the more positive the tendency; the closer it is to 0, the more negative.
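
The layer stack described above can be sketched with the Keras API in TensorFlow. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names are hypothetical, and TextVectorization is applied as a separate step in front of the model (rather than as an in-model layer) so that the model itself takes integer sequences. The Adam optimizer and binary cross-entropy loss follow the settings named in this description.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_vectorizer(train_texts, max_tokens=100_000, seq_len=64):
    """First layer in the text: map each string to seq_len integer indices."""
    vectorizer = layers.TextVectorization(
        max_tokens=max_tokens,
        output_mode="int",
        output_sequence_length=seq_len,
    )
    vectorizer.adapt(tf.constant(train_texts))  # learn the vocabulary
    return vectorizer


def build_model(max_tokens=100_000, seq_len=64, embed_dim=200):
    """Embedding -> Flatten -> 7 x Dense(32, ReLU, L2) -> Dense(1, sigmoid)."""
    dense_stack = [
        layers.Dense(
            32,
            activation="relu",
            kernel_regularizer=regularizers.l2(0.001),
        )
        for _ in range(7)  # fourth through tenth hidden layers
    ]
    model = tf.keras.Sequential(
        [
            tf.keras.Input(shape=(seq_len,), dtype="int64"),
            layers.Embedding(input_dim=max_tokens + 1, output_dim=embed_dim),
            layers.Flatten(),
            *dense_stack,
            layers.Dense(1, activation="sigmoid"),  # score in [0, 1]
        ]
    )
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then pass `vectorizer(texts)` and the 0/1 labels to `model.fit`; batch size and epoch count for this model are not given in the text.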

The loss function is the metric that allows a neural network to learn. In this embodiment, the goal is to choose one of two binary values and the data labels are independent, so binary cross entropy may be set as the loss function. Guided by this metric, training must find the parameters that lower the loss value as far as possible. This process is called optimization, and the optimizer is one of the arguments required to compile a Keras model. In this embodiment, Adam, a gradient-descent-based method, may be set as the optimizer. Adam is an algorithm that adjusts the strength of each update by gradually reducing the step size and computing a velocity term.
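
To make the loss concrete, binary cross entropy for a single comment with label y (0 or 1) and predicted score p is -(y·ln p + (1 - y)·ln(1 - p)). The plain-Python sketch below is illustrative only; the function names and the clipping constant `eps` are assumptions, and Keras computes the same quantity internally when `binary_crossentropy` is selected.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample; y_pred is clipped away from 0 and 1."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mean_bce(labels, preds):
    """Average loss over a batch of (label, prediction) pairs."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, preds)]
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
print(mean_bce([1, 0, 1], [0.9, 0.2, 0.6]))
```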

For example, the data directly sentiment-labeled by users total 32,474 comments, about 4.9% of all data, which may be insufficient as training data. To compensate, 200,000 Naver movie reviews may additionally be included in the training data. After training, the model reached an accuracy of about 95% and a loss of about 0.16 on the training data, and an accuracy of about 83% and a loss of about 0.49 on the test data.

The deep learning model produced in this way was applied to the entire comment data. A total of 748,020 comments were pre-processed in the same way as the training data, and the sentiment value of each comment was then predicted by the trained model. The predicted sentiment value is output through sigmoid, the activation function of the final output node, as a real number from 0.0 to 1.0.

According to another embodiment, a KoBERT model may be used as the deep learning model.

The BERT model was designed to solve a variety of natural language processing problems through simple fine-tuning after pre-training on large data sets. It is built on the encoder part of the transformer and performs strongly across many natural language processing tasks. In the present invention, sentiment analysis may be performed on the comment data using KoBERT, a Korean BERT model developed by SKTBrain.

For example, the PyTorch API provided by KoBERT may be used. The training data and test data may be split 7:3. The maximum data length max_len may be set to 64, the batch size to 64, epochs to 8, and learning_rate to 5e-5. After training, the model reached an accuracy of about 92% and a loss of about 0.17 on the training data, and an accuracy of about 94% on the test data.
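
The settings above can be collected in one place, together with a 7:3 split, as in the sketch below. It deliberately does not call the KoBERT API; the class and function names are hypothetical, and shuffling with a fixed seed before splitting is an assumption, since the text gives only the ratio.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameter values as stated in the text."""
    max_len: int = 64
    batch_size: int = 64
    epochs: int = 8
    learning_rate: float = 5e-5


def split_train_test(samples, train_parts=7, test_parts=3, seed=0):
    """Shuffle a copy of the samples and split it train_parts:test_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


config = FineTuneConfig()
train, test = split_train_test(range(10))
print(config)
print(len(train), len(test))
```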

The KoBERT deep learning model produced in this way may be applied to the entire comment data in the same way as the Keras deep learning model described above.

In step S150, the religious sentiment analysis apparatus calculates a sentiment value for each date using the sentiment-labeled comment data.

The initial per-date sentiment value may be derived from a weighted average. That is, the per-date weighted average (W) may be calculated using the following equation.

Equation 1 can be biased by the number of comments on each date. For example, if a date has a single comment and it is positive, the positive weight is 100%. If another date has 100 comments of which 90 are positive, the positive weight is 90%. The date with one positive comment would thus score higher than the date with 90 positive comments out of 100.

Therefore, after the weighted average is calculated for each date, it may be corrected using the following equation, which takes into account that date's total comment count relative to the entire period. That is, the per-date sentiment average (X), which reflects the total number of comments on the date relative to the entire period, may be calculated using the following equation.
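
The two-stage calculation can be sketched in Python as follows. `weighted_average` follows the definitions of k, count(i), avg(i), and M given for Equation 1. `corrected_average` uses a shrinkage form, X = N/(N+L)·W + L/(N+L)·e, which is an assumption chosen because it uses exactly the symbols defined for Equation 2; the publication's actual equation may differ. The values L = 10 and e = 0.3 are made up for the example.

```python
def weighted_average(articles):
    """W for one date; articles is a list of (count_i, avg_i) pairs."""
    total = sum(count for count, _ in articles)  # M
    if total == 0:
        raise ValueError("no comments on this date")
    return sum(count * avg for count, avg in articles) / total


def corrected_average(w, n, min_comments, overall_mean):
    """Assumed form of X: shrink W toward the overall mean when n is small."""
    weight = n / (n + min_comments)
    return weight * w + (1.0 - weight) * overall_mean


# The bias example from the text: one positive comment vs. 90 of 100 positive.
L_MIN, E_MEAN = 10, 0.3                      # hypothetical L and e
w_a = weighted_average([(1, 1.0)])           # date A: a single comment
w_b = weighted_average([(100, 0.9)])         # date B: 100 comments
x_a = corrected_average(w_a, 1, L_MIN, E_MEAN)
x_b = corrected_average(w_b, 100, L_MIN, E_MEAN)
print(w_a > w_b)   # the raw weighted average favors date A
print(x_a < x_b)   # after correction, date B ranks higher
```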

FIGS. 3 to 5 are diagrams showing experimental results for the religious sentiment analysis method using deep learning according to an embodiment of the present invention.

For the religious sentiment analysis method using deep learning according to an embodiment of the present invention, sentiment values were predicted for all comments using a deep learning model that was designed and implemented directly on the Keras framework.

In the graphs of FIG. 3, green dots are sentiment values for the period before COVID-19 and red dots are sentiment values for the period after. The Y axis is a real value from 0.0 to 1.0; values closer to 1.0 indicate a more positive tendency and values closer to 0.0 a more negative one. Because fewer than 1% of all data have a sentiment value of 0.7 or higher, the Y-axis range was set from 0 to 0.7. The X axis is the date: each tick is a date on which news articles exist, and only the year and month are shown as numbers. FIG. 3 shows the change in sentiment for each religion over the collection period of each keyword.

Meanwhile, sentiment values for all comments were also predicted using a deep learning model trained through the PyTorch API of the KoBERT model. For the KoBERT model, because data with a sentiment value of 0.5 or higher account for less than 2% of all data, the Y-axis range was set from 0 to 0.5. For Christianity, the sentiment values were the lowest and most tightly clustered even before COVID-19, which indicates that negative sentiment was already strong. In the period after COVID-19, strong negative sentiment appears to have persisted because of the church cluster infection in June and the Liberation Day rally in August. For Catholicism, the range of sentiment values in the period after COVID-19 is narrower than in the previous period, and a slight decrease is observed. Catholicism had no cluster infection issue of its own, so it is presumed to have been affected by the other religions. For Buddhism, the difference in sentiment values in the period after COVID-19 is negligible.

For Shincheonji, the sentiment values are widely distributed in the period before COVID-19 but remain consistently narrow and low in the period after. This confirms that the Shincheonji cluster infection had a negative effect. After June, there are intervals that show relatively high sentiment values. These were found to result from the arrest of Shincheonji chairman Lee Man-hee in June and from the donation of blood plasma for COVID-19 research by Shincheonji members in July, respectively. However, these cases only produced temporarily positive sentiment, and it is difficult to conclude that they changed the overall sentiment toward Shincheonji for the better. In particular, the arrest of the Shincheonji chairman drew a positive response, but given the nature of an arrest, that positive sentiment cannot be regarded as positive sentiment toward the religion. For the keyword "religion", there is no large difference in sentiment between the before and after periods, but the sentiment values in the after period are narrower and more tightly clustered.

To assess the reliability of the changes in sentiment values, the average for each keyword was calculated by period. FIGS. 4 and 5 show the per-keyword averages by period for the Keras model and the KoBERT model, respectively. The Y-axis of each graph is the period average, and the X-axis is the keyword. For each keyword, the left bar is the sentiment average in the period before COVID-19, and the right bar is the sentiment average in the period after.
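The per-keyword period averages described above could be computed along the following lines. This is a minimal sketch: the record layout, the function name, the cutoff date that separates the "before" and "after" periods, and the sample values are all assumptions made for illustration and are not taken from the source.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_means(records, cutoff):
    """Average sentiment per keyword for the periods before and after `cutoff`.

    `records` is an iterable of (keyword, day, sentiment) tuples, where `day`
    is a datetime.date and `sentiment` is a value between 0.0 and 1.0.
    Returns {keyword: {"before": mean or None, "after": mean or None}}.
    """
    groups = defaultdict(lambda: {"before": [], "after": []})
    for keyword, day, sentiment in records:
        period = "before" if day < cutoff else "after"
        groups[keyword][period].append(sentiment)
    return {
        keyword: {p: (mean(v) if v else None) for p, v in periods.items()}
        for keyword, periods in groups.items()
    }


# Made-up per-date sentiment values; the cutoff date is a placeholder.
records = [
    ("Buddhism", date(2019, 6, 1), 0.30),
    ("Buddhism", date(2019, 7, 1), 0.40),
    ("Buddhism", date(2020, 6, 1), 0.20),
    ("Catholicism", date(2020, 8, 1), 0.25),
]
averages = period_means(records, cutoff=date(2020, 1, 20))
```

Each keyword then yields one "before" and one "after" value, which correspond to the left and right bars of a grouped bar chart.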

Referring to FIGS. 4 and 5, in the sentiment analysis using the Keras deep learning model, the average sentiment value decreased for every keyword except Christianity, for which it rose slightly in the after period. In the sentiment analysis using the KoBERT model, the average sentiment value decreased for every keyword except Buddhism.

In the religious sentiment analysis method using deep learning according to an embodiment of the present invention, the significance level was set to 5%, corresponding to a 95% confidence level. The null hypothesis is "there is no difference between the before period and the after period", and the alternative hypothesis is "there is a difference between the before period and the after period". For each keyword, the per-date sentiment values were grouped into a before group and an after group, and an F-test was performed on the groups to check whether their variances differ. Because neither hypothesis is directional, a two-tailed test was used. The p-value for the two groups was calculated using the TTEST function of Microsoft Excel. The p-value for each keyword is shown in Table 3 below.

Referring to Table 3, the significance tests on the results of the sentiment analysis using the Keras model and the sentiment analysis using the KoBERT model confirm that statistically significant changes in sentiment occurred for the keywords Shincheonji, Catholicism, and religion.

FIG. 6 is a diagram schematically illustrating the configuration of a religious sentiment analysis apparatus using deep learning according to an embodiment of the present invention.

Referring to FIG. 6, the religious sentiment analysis apparatus using deep learning according to an embodiment of the present invention includes a processor 10, a memory 20, a communication unit 30, and an interface unit 40.

The processor 10 may be a CPU or a semiconductor device that executes processing instructions stored in the memory 20.

The memory 20 may include various types of volatile or non-volatile storage media. For example, the memory 20 may include ROM, RAM, and the like.

For example, the memory 20 may store instructions for performing the religious sentiment analysis method using deep learning according to an embodiment of the present invention.

The communication unit 30 is a means for transmitting and receiving data to and from other devices through a communication network.

The interface unit 40 may include a network interface for connecting to a network and a user interface.

Meanwhile, the components of the above-described embodiment can be readily understood from a process point of view. That is, each component can be understood as a process. Likewise, the processes of the above-described embodiment can be readily understood from the point of view of the components of the apparatus.

In addition, the technical content described above may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

The embodiments of the present invention described above are disclosed for illustrative purposes. Those of ordinary skill in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be regarded as falling within the scope of the following claims.

10: processor
20: memory
30: communication unit
40: interface unit

Claims

A religious sentiment analysis method using deep learning, performed by a religious sentiment analysis apparatus, the method comprising:
collecting comments on news articles related to COVID-19 and religion to generate comment data;
performing sentiment labeling on the generated comment data;
performing data pre-processing on the comment data on which the sentiment labeling has been performed;
calculating sentiment values for positive and negative sentiment for each religion by performing sentiment analysis on the pre-processed comment data using a deep learning model; and
calculating a sentiment value for each date using the pre-processed comment data,
wherein the deep learning model is a Keras model, Keras being a Python package, or a KoBERT model, which is a Korean BERT model,
wherein the Keras model is configured to have 10 hidden layers between an input layer and an output layer,
wherein the 10 hidden layers include:
a first hidden layer to which TextVectorization, a function that vectorizes a character string, is applied;
a second hidden layer to which an embedding function that converts the vectorized character string into a dense vector of fixed size is applied;
a third hidden layer to which a Flatten function that flattens the dense vector into an array is applied; and
fourth to tenth hidden layers, each having 32 units, using ReLU (Rectified Linear Unit) as the activation function, and having L2 regularization added to prevent overfitting during training,
wherein the sentiment value for each date is derived from a per-date weighted average,
wherein the per-date weighted average (W) is calculated using the following equation,

where k is the number of articles on the date, i is the article index, count(i) is the total number of comments on the article with index i, avg(i) is the average sentiment value of that article, and M is the sum of the numbers of comments on all articles on that date,
and wherein a per-date sentiment average (X), obtained by correcting the per-date weighted average, is calculated using the following equation.

where N is the total number of comments on the date, L is the minimum number of comments per date required for the sentiment value to be considered reliable, e is the average sentiment value over the entire period, and W is the per-date weighted average calculated by Equation 1.

The method of claim 1,
wherein performing the sentiment labeling comprises
labeling, in the comment data, comments showing a positive tendency with a positive identifier and comments showing a negative tendency with a negative identifier.

(Deleted)

A religious sentiment analysis apparatus using deep learning, comprising:
a memory that stores instructions; and
a processor that executes the instructions,
wherein the instructions perform a religious sentiment analysis method using deep learning, the method comprising:
collecting comments on news articles related to COVID-19 and religion to generate comment data;
performing sentiment labeling on the generated comment data;
performing data pre-processing on the comment data on which the sentiment labeling has been performed;
calculating sentiment values for positive and negative sentiment for each religion by performing sentiment analysis on the pre-processed comment data using a deep learning model; and
calculating a sentiment value for each date using the pre-processed comment data,
wherein the deep learning model is a Keras model, Keras being a Python package, or a KoBERT model, which is a Korean BERT model,
wherein the Keras model is configured to have 10 hidden layers between an input layer and an output layer,
wherein the 10 hidden layers include:
a first hidden layer to which TextVectorization, a function that vectorizes a character string, is applied;
a second hidden layer to which an embedding function that converts the vectorized character string into a dense vector of fixed size is applied;
a third hidden layer to which a Flatten function that flattens the dense vector into an array is applied; and
fourth to tenth hidden layers, each having 32 units, using ReLU (Rectified Linear Unit) as the activation function, and having L2 regularization added to prevent overfitting during training,
wherein the sentiment value for each date is derived from a per-date weighted average,
wherein the per-date weighted average (W) is calculated using the following equation,

where k is the number of articles on the date, i is the article index, count(i) is the total number of comments on the article with index i, avg(i) is the average sentiment value of that article, and M is the sum of the numbers of comments on all articles on that date,
and wherein a per-date sentiment average (X), obtained by correcting the per-date weighted average, is calculated using the following equation.

where N is the total number of comments on the date, L is the minimum number of comments per date required for the sentiment value to be considered reliable, e is the average sentiment value over the entire period, and W is the per-date weighted average calculated by Equation 1.