KR20140101906A

KR20140101906A - Apparatus and Method for social data analysis

Info

Publication number: KR20140101906A
Application number: KR1020130014943A
Authority: KR
Inventors: 이주양; 장필식
Original assignee: 이주양; 장필식
Priority date: 2013-02-12
Filing date: 2013-02-12
Publication date: 2014-08-21
Also published as: KR101448228B1

Abstract

An apparatus for analyzing social data according to the present invention includes: a data collecting unit which collects data on content present in a server group through a communication network and stores the collected data; a message analysis unit which analyzes the data in morpheme units matched against a morpheme DB and an emotional vocabulary DB, tags each morpheme unit with its part of speech, and finds and stores emotional vocabulary; an emotion evaluation unit which receives, from the message analysis unit, unregistered vocabulary that could not be tagged with a part of speech, performs an emotion evaluation on it, and stores, in an emotion DB, the standard-language emotional vocabulary corresponding to the unregistered vocabulary; an emotion analysis unit which performs a sensibility-engineering analysis of an emotion measurement target based on the emotional vocabulary provided by the message analysis unit and the emotion evaluation unit, and extracts the main emotion components of the target; and an analysis result display unit which displays the emotion measurement target and the main emotion components in an image space in real time. The present invention can naturally identify mixed emotions based on ordinary language, slang, jargon, abbreviations, emoticons, and the like, and can measure the hidden emotions of consumers by using documents, internet comments, and SNS message text written spontaneously by the consumers.

Description

[0001] Apparatus and Method for Social Data Analysis

The present invention relates to an apparatus and a method for analyzing social data received via a communication network. More specifically, it relates to an emotion measurement and analysis apparatus and method that automatically and quantitatively measure, evaluate, and analyze the detailed image of, and detailed emotions toward, new product concepts, brands, brand concepts, naming, designs, people such as entertainers or politicians, and any other object, using documents, internet comments, SNS (Social Network Service) message text, and the like gathered through the internet and social media, and that present the results in real time.

With the spread of smartphones and the internet, digital data is growing so explosively that it is difficult to handle by conventional means, and whoever first extracts value from big data is expected to determine the success or failure of companies and nations. However, current mechanical classification methods for big data, including social media, produce largely worthless results; for big data to be useful, the analysis must take feelings and sensibility into account. Existing emotion measurement and analysis techniques fall broadly into two groups: 'sentiment analysis' techniques, which classify emotion into two or three broad categories such as positive/negative, and 'sensibility engineering' techniques, which measure and analyze emotion through questionnaire-based evaluation.

Sentiment analysis is also called 'opinion mining' or 'reputation analysis'. It determines whether the content of a text such as a message is subjective or objective and, if it is subjective, analyzes its polarity to determine whether the content is positive, negative, or neutral. In other words, it analyzes what an author has written to find out whether it is positive or negative in context, and is used to gauge user reactions to, or public opinion on, a particular product or service. Sentiment analysis is known to make it possible to grasp online public opinion, such as that on social media, relatively quickly, and to identify and predict people's opinions easily at lower cost and in less time than conventional offline opinion polls.

However, prior art such as Korean Patent Publication No. 2012-0108095 has the following problems. First, dividing emotions simply into two categories (positive, negative) or three (positive, negative, neutral) severely limits how the results can be used. Emotions such as 'retro' versus 'new-generation' or 'feminine' versus 'masculine' are important for understanding the image of a particular product or person, yet they are simply classified as 'neutral', and the detailed emotions and feelings cannot be identified. Worse, forcing emotions whose polarity is ambiguous into two (positive/negative) or three (positive/negative/neutral) categories undermines the accuracy of sentiment analysis. Second, existing sentiment analysis techniques filter out profanity, slang, abbreviations, and emoticons and analyze only standard vocabulary. However, much of the text that users post on SNS, website comments, blogs, and the like contains colloquial language, slang, profanity, and emoticons, and because such text is filtered out, data carrying a wide range of emotions is excluded from the analysis.

Therefore, there is a need for a social data analysis apparatus and method that can automatically and quantitatively measure, evaluate, and analyze the detailed image of, and detailed emotions toward, people and objects from collected documents, internet comments, SNS (Social Network Service) message text, and the like, including data outside the standard vocabulary such as profanity, slang, and abbreviations.

To overcome the shortcomings of the prior art described above, an object of the present invention is to provide a social data analysis apparatus and method that can naturally identify complex emotions based on everyday language as well as profanity, slang, abbreviations, emoticons, and the like.

Another object of the present invention is to provide a social data analysis apparatus and method that measure and evaluate the hidden emotions of consumers from documents, internet comments, and SNS message text data written spontaneously by many consumers.

A further object of the present invention is to provide a social data analysis apparatus and method that display real-time, continuous analysis results by positioning several main emotional image factors in an image space for a specific point in time or a specific period.

To achieve the above objects, a social data analysis apparatus according to the present invention includes: a data collecting unit that collects and stores data of content present in a server group through a communication network; a message analysis unit that analyzes the data in morpheme units matched against a morpheme DB and an emotional vocabulary DB, tags each morpheme unit with its part of speech, and finds and stores emotional vocabulary; an emotion evaluation unit that performs an emotion evaluation on unregistered vocabulary that the message analysis unit could not tag with a part of speech, and stores in an emotion DB the standard-language emotional vocabulary corresponding to the unregistered vocabulary obtained as a result of the evaluation; an emotion analysis unit that performs a sensibility-engineering analysis of an emotion measurement target based on the emotional vocabulary provided by the message analysis unit and the emotion evaluation unit, and then extracts the main emotion components, that is, the representative emotion components of the target; and an analysis result display unit that displays the emotion measurement target and the main emotion components in an image space in real time.

The data collecting unit may receive a keyword for the data or a period over which to collect data, collect and store data matching that input, and classify and store the data by source or by uploader.
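The keyword/period selection and the grouping by source or uploader described above can be sketched as follows. This is only an illustrative sketch: the record fields (`text`, `source`, `uploader`, `posted`), the sample posts, and the function names are assumptions, not part of the patent.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; the field names are assumptions for illustration.
POSTS = [
    {"text": "new phone looks retro", "source": "blog", "uploader": "a", "posted": date(2013, 1, 5)},
    {"text": "phone battery is bad", "source": "twitter", "uploader": "b", "posted": date(2013, 1, 20)},
    {"text": "nice weather today", "source": "twitter", "uploader": "a", "posted": date(2013, 1, 21)},
    {"text": "old phone review", "source": "blog", "uploader": "c", "posted": date(2012, 12, 1)},
]

def collect(posts, keyword, start, end):
    """Keep posts that contain the keyword and were posted within [start, end]."""
    return [p for p in posts if keyword in p["text"] and start <= p["posted"] <= end]

def group_by(posts, field):
    """Group post texts by the given field, e.g. 'source' or 'uploader'."""
    groups = defaultdict(list)
    for post in posts:
        groups[post[field]].append(post["text"])
    return dict(groups)

selected = collect(POSTS, "phone", date(2013, 1, 1), date(2013, 1, 31))
by_source = group_by(selected, "source")
by_uploader = group_by(selected, "uploader")
```

A real collector would read from remote services rather than an in-memory list; the list only stands in for already retrieved content.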

The data collecting unit may support parallel processing in which multiple computers, multiple processors, or multiple threads are used simultaneously to collect and store data; it may delete duplicates among the data collected in parallel and merge the data by time and by type; and, if parallel collection is interrupted, it may automatically reconnect and continue collecting and storing data from the point of interruption.

The message analysis unit may include: an input filter module that filters out, or abbreviates before analysis, repetitions of short patterns without spaces, repeated meaningless special symbols, and the like; and an unregistered-word processing module that passes unregistered vocabulary that could not be tagged with a part of speech, namely at least one of profanity, slang, abbreviations, and emoticons, to the emotion evaluation unit.
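One way an input filter of this kind might behave is sketched below. The patent does not specify the filtering rules, so the symbol set, the limit of two repetitions, and the function names are all assumptions chosen for illustration.

```python
import re

# Assumption: these symbols carry no meaning when repeated three or more times.
SYMBOL_RUN = re.compile(r"[~*#=_\-]{3,}")

def drop_symbol_runs(text):
    """Remove long runs of decorative symbols such as '=====' or '*****'."""
    return SYMBOL_RUN.sub("", text)

def collapse_repeats(text, max_run=2):
    """Abbreviate a 1-3 character pattern repeated more than max_run times."""
    pattern = re.compile(r"(.{1,3}?)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)

def input_filter(text):
    """Apply both steps and trim surrounding whitespace."""
    return collapse_repeats(drop_symbol_runs(text)).strip()

filtered_a = input_filter("so goooood ㅋㅋㅋㅋㅋ ====")
filtered_b = input_filter("hahahaha!!!")
```

Short emoticons such as `^^` pass through unchanged, which matters here because emoticons are meant to reach the unregistered-word processing module rather than be discarded.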

The message analysis unit may support parallel processing that uses multiple computers, multiple processors, or multiple threads simultaneously.

The message analysis unit may identify vocabulary related to sensibility and emotion based on the morpheme DB and the emotional vocabulary DB, and may separately store the subject and the object of that vocabulary.

The emotion evaluation unit may perform the emotion evaluation using standard-language emotional word pairs, and evaluators may rate the standard-language emotional word pairs through an online connection.

The emotion evaluation unit may let evaluators, through an online connection, select the standard-language replacement for unregistered vocabulary that could not be tagged with a part of speech as a weighted sum of two or more standard words.
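As a rough illustration of representing an unregistered word as a weighted sum of standard words, the sketch below combines the emotion scores of two standard words. The words, the axes (`pleasant`, `calm`), the scores, and the weights are invented for the example; the patent does not give concrete values or a formula.

```python
# Hypothetical emotion scores of standard words on two made-up axes.
STANDARD = {
    "funny": {"pleasant": 0.8, "calm": -0.2},
    "absurd": {"pleasant": -0.4, "calm": -0.6},
}

def weighted_emotion(weights, lexicon):
    """Combine standard-word scores using evaluator-chosen weights (normalized)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    axes = sorted({axis for word in weights for axis in lexicon[word]})
    return {
        axis: sum(w * lexicon[word].get(axis, 0.0) for word, w in weights.items()) / total
        for axis in axes
    }

# Assumption: evaluators judged a slang term to be 75% "funny" and 25% "absurd".
slang_scores = weighted_emotion({"funny": 0.75, "absurd": 0.25}, STANDARD)
```

Normalizing by the weight total is a design choice made here so that weights need not sum to one.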

The emotion evaluation unit may receive the intensity of feeling and emotion for the untagged unregistered vocabulary as direct input, or may assess it through an online questionnaire.

The emotion analysis unit may extract the main emotion components of the emotion measurement target by performing sparse principal component analysis (SPCA) on a combination of the emotional vocabulary for the target and the frequency and emotional intensity of that vocabulary.
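The patent names sparse principal component analysis but does not say which algorithm is used. The sketch below uses truncated power iteration, one simple heuristic for finding a leading component with at most `k` nonzero loadings; it is a stand-in, not the patent's method. The words and the numbers (each meant as frequency multiplied by emotional intensity per time window) are made up.

```python
import math

# Hypothetical input: one row per time window, one column per emotion word.
# Each value stands for frequency x emotional intensity (invented numbers).
WORDS = ["retro", "classic", "modern", "noisy"]
ROWS = [
    [4.0, 3.0, 1.0, 0.5],
    [8.0, 6.5, 1.2, 0.4],
    [2.0, 1.5, 0.9, 0.6],
    [6.0, 5.0, 1.1, 0.5],
]

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def sparse_component(cov, k, iters=100):
    """Leading component with at most k nonzero loadings (truncated power iteration)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # Power step: multiply the current vector by the covariance matrix.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        # Truncation step: keep only the k largest entries by magnitude.
        keep = set(sorted(range(d), key=lambda i: abs(w[i]), reverse=True)[:k])
        w = [w[i] if i in keep else 0.0 for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

component = sparse_component(covariance(ROWS), k=2)
loadings = {word: round(value, 3) for word, value in zip(WORDS, component) if value != 0.0}
```

With this data the component keeps only "retro" and "classic", the two words that vary most and vary together, which is the kind of readable axis a sparse decomposition is meant to produce.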

The main emotion components may be extracted for a specified point in time or a specified period.

The analysis result display unit may form two- or three-dimensional axes from several main emotion components extracted by the emotion analysis unit and display, in real time, the position of the emotion measurement target in the image space at a specific point in time, or display the analysis results as an animation showing how they change over a specific period.

The analysis result display unit may display several emotion measurement targets simultaneously in an image space formed from common emotion and sensibility image factors.

A social data analysis method according to the present invention for achieving the above objects includes: collecting and storing data of content present in a server group through a communication network; analyzing the data in morpheme units matched against a morpheme DB and an emotional vocabulary DB, tagging each morpheme unit with its part of speech, and finding and storing emotional vocabulary; performing an emotion evaluation on unregistered vocabulary that was not tagged with a part of speech in the finding-and-storing step, and storing in an emotion DB the standard-language emotional vocabulary corresponding to the unregistered vocabulary obtained as a result of the evaluation; performing a sensibility-engineering analysis of an emotion measurement target based on the emotional vocabulary provided by the finding-and-storing step and the step of storing in the emotion DB, and then extracting the main emotion components, that is, the representative emotion components of the target; and displaying the emotion measurement target and the main emotion components in an image space in real time.

The collecting-and-storing step may receive a keyword for the data or a period over which to collect data, collect and store data matching that input, and classify and store the data by source or by uploader. It may support parallel processing in which multiple computers, multiple processors, or multiple threads are used simultaneously to collect and store data, delete duplicates among the data collected in parallel, and merge the data by time and by type. If parallel collection is interrupted, it may automatically reconnect and continue collecting and storing data from the point of interruption.

The finding-and-storing step may identify vocabulary related to sensibility and emotion based on the morpheme DB and the emotional vocabulary DB, separately store the subject and the object of that vocabulary, and support parallel processing that uses multiple computers, multiple processors, or multiple threads simultaneously.

The finding-and-storing step may filter out, or abbreviate before analysis, repetitions of short patterns without spaces, repeated meaningless special symbols, and the like, and may proceed to the step of storing in the emotion DB only when there is unregistered vocabulary that could not be tagged with a part of speech and that corresponds to at least one of profanity, slang, abbreviations, and emoticons.

The step of storing in the emotion DB may perform the emotion evaluation using standard-language emotional word pairs and have evaluators rate those pairs through an online connection. It may have evaluators select, through an online connection, the standard-language replacement for untagged unregistered vocabulary as a weighted sum of two or more standard words. It may also receive the intensity of feeling and emotion for the untagged unregistered vocabulary as direct input, or assess it through an online questionnaire.

The extracting step may extract the main emotion components of the emotion measurement target by performing sparse principal component analysis (SPCA) on a combination of the emotional vocabulary for the target and the frequency and emotional intensity of that vocabulary, and the extraction may be performed for a specified point in time or a specified period.

The displaying step may form two- or three-dimensional axes from several main emotion components extracted in the extracting step and display, in real time, the position of the emotion measurement target in the image space at a specific point in time, or display the analysis results as an animation showing how they change over a specific period. It may also display several emotion measurement targets simultaneously in an image space formed from common emotion and sensibility image factors.

The present invention has the effect of providing a social data analysis apparatus and method that can naturally identify complex emotions based on everyday language as well as profanity, slang, abbreviations, emoticons, and the like.

In addition, because the present invention uses documents, internet comments, and SNS message text data written spontaneously by many consumers, it has the effect of providing a social data analysis apparatus and method that measure and evaluate the hidden emotions of consumers.

In addition, the present invention has the effect of displaying real-time, continuous analysis results by positioning several main emotional image factors in an image space for a specific point in time or a specific period.

FIG. 1 is a diagram illustrating the configuration of a social data analysis apparatus according to the present invention.
FIG. 2 is a diagram illustrating how the data collecting unit of the social data analysis apparatus according to the present invention collects content data from a server group.
FIG. 3 is a diagram illustrating the configuration and operation of the message analysis unit of the social data analysis apparatus according to the present invention.
FIG. 4 is an operational flowchart of the social data analysis method according to the present invention.
FIG. 5 is a diagram showing an example of the standard-language emotional word pairs used by the social data analysis apparatus according to the present invention.
FIG. 6 is a diagram showing an example of the principal component analysis used in the emotion evaluation unit of the social data analysis apparatus according to the present invention.
FIG. 7 is a diagram showing an example of the output of the analysis result display unit.
FIG. 8 is a diagram showing another example of the output of the analysis result display unit.
FIG. 9 is a diagram showing yet another example of the output of the analysis result display unit.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments are provided so that the invention is explained more fully to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Sentiment analysis is known to make it possible to grasp online public opinion, such as that on social media, relatively quickly, and to identify and predict people's opinions easily at lower cost and in less time than conventional offline opinion polls. The main techniques for analyzing positive/negative polarity are listed below, and several of them are often combined. The SVM (Support Vector Machine) approach classifies positive/negative opinions in text using training sets labeled in advance as positive or negative. The N-grams or Part-of-Speech approach looks for positive/negative polarity using N-gram word structures: for example, bi-gram decomposition splits the sentence "I do not like to drink tea" into "I-do", "do-not", "not-like", and so on. N is usually 1, 2, or 3. The lexicon-based approach uses a predefined positive/negative bag of words (1-grams, or uni-grams) and determines polarity from how often positive and negative words appear in the text; dictionaries such as LIWC (Linguistic Inquiry and Word Count) or POMS (Profile of Mood States) can be used. The linguistic approach determines polarity by analyzing the grammatical structure of the text and is mainly used together with the lexicon-based approach; its distinguishing feature is that it takes context into account when determining polarity.
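The bi-gram decomposition and the lexicon-based count described above can be illustrated with a short sketch. The word lists are invented stand-ins, not entries from LIWC or POMS, and the function names are assumptions.

```python
def ngrams(sentence, n):
    """Split a sentence into word n-grams joined by '-'."""
    words = sentence.split()
    return ["-".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical polarity lexicon (uni-grams); not taken from LIWC or POMS.
POSITIVE = {"like", "love", "great"}
NEGATIVE = {"hate", "bad", "boring"}

def lexicon_polarity(sentence):
    """Classify by counting positive and negative words, ignoring context."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

bigrams = ngrams("I do not like to drink tea", 2)
naive_label = lexicon_polarity("I do not like to drink tea")
```

Here `naive_label` comes out as "positive" because the uni-gram count sees "like" but not the preceding "not". The bi-gram "not-like" does capture the negation, which is one reason the lexicon-based approach is usually paired with N-grams or a linguistic approach.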

Separately from sentiment analysis, which extracts sensibility and emotion from text data in this way, the field of sensibility engineering (human sensibility ergonomics / sensibility ergonomics / image technology), which seeks to measure emotion toward various products and designs directly and quantitatively and put it to use, is also growing. Sensibility engineering scientifically measures and analyzes complex feelings, such as comfort and ease or displeasure and inconvenience, that arise in response to external physical stimuli and are shaped by personal experience, and applies the results to engineering so that products and environments can be developed to be convenient, comfortable, and pleasant. It is broadly divided into areas such as biometric measurement, sensors for the five human senses and sensibility processing, sensibility design, micro-fabrication, and usability evaluation. Sensibility engineering methods generally measure emotion quantitatively using the SD (Semantic Differential) method, in which evaluators are presented with many pairs of emotional adjectives and rate a target against them, and present the results as an emotion map through analysis methods such as factor analysis and MDS (Multi-Dimensional Scaling). These methods have been widely used for automobile instrument panels, interiors, housing, cosmetic containers, electric vehicle exteriors, yacht exterior design, and so on, and can also be applied to the diagnostic evaluation of brands, brand concepts and designs, and naming, and to evaluating the image of objects and people.

The configuration and functions of the social data analysis apparatus according to the present invention are described below.

FIG. 1 is a diagram illustrating the configuration of a social data analysis apparatus according to the present invention.

Referring to FIG. 1, the social data analysis apparatus 100 collects content data from the server group 10 and comprises a data collecting unit 110, a message analysis unit 120, a morpheme DB 121, an emotional vocabulary DB 122, an emotion evaluation unit 130, an emotion DB 131, an emotion analysis unit 140, and an analysis result display unit 150.

FIG. 2 is a diagram illustrating how the data collecting unit of the social data analysis apparatus according to the present invention collects content data from a server group.

Referring to FIGS. 1 and 2, the data collecting unit 110 collects and stores data on content present in the server group 10 through a communication network 20 such as the Internet. In the context of the present invention, 'social data' means data uploaded by users of various online services such as blogs, online cafes, portals, shopping malls, and Twitter. The data collecting unit 110 may collect social data in various ways, for example by issuing URL requests, by using a search engine, or by directly running a browser. The data collecting unit 110 may also receive a keyword or a collection period as input and automatically collect social media messages, homepage text, comments, and the like that contain the keyword and were posted at present or within that period, and store them in the DB of the data collecting unit. From each collected bundle of social media messages, text comments, opinions, and media data, it may extract by type the message, the URL, and information about the uploader (gender, region), classify them, and store them in the DB.
To collect a large amount of data per unit time, the data collecting unit 110 may use a parallel processing technique that simultaneously uses multiple computers, multiprocessors, and multiple threads, delete duplicate data from the data gathered by each computer, processor, or thread, and integrate the data by time or by type. If a communication connection such as the Internet is lost while parallel processing is in progress, the data collecting unit 110 automatically reconnects and continues collecting from the data that follows the point of interruption.
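The parallel collection, duplicate removal, and time-ordered integration described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `fetch` callable, the post fields `url`, `text`, and `time`, and the retry count are all assumptions introduced here, and a retry simply re-requests the whole source rather than resuming from the exact point of interruption.

```python
from concurrent.futures import ThreadPoolExecutor


def collect(sources, fetch, max_retries=3):
    """Fetch every source in parallel, retrying a source after a connection error."""
    def fetch_with_retry(source):
        for _ in range(max_retries):
            try:
                return fetch(source)
            except ConnectionError:
                continue  # hypothetical "reconnect and try again" step
        return []  # give up on this source after max_retries failures

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map keeps the results in the same order as `sources`
        return list(pool.map(fetch_with_retry, sources))


def merge_batches(batches):
    """Drop duplicate posts across batches and order the rest by timestamp."""
    seen = set()
    merged = []
    for batch in batches:
        for post in batch:
            key = (post["url"], post["text"])  # assumed identity of a post
            if key not in seen:
                seen.add(key)
                merged.append(post)
    merged.sort(key=lambda post: post["time"])
    return merged
```

Grouping by type instead of by time would only change the sort key; the duplicate check is independent of which worker produced a post.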

FIG. 3 is a diagram illustrating the configuration and operation of the message analysis unit of the social data analysis apparatus according to the present invention.

Referring to FIGS. 1 and 3, the message analysis unit 120 analyzes the text and messages collected and stored by the data collecting unit 110 in morpheme units, identifies parts of speech, and tags each morpheme with its part of speech. A part of speech is a class into which words are divided according to grammatical function, form, and meaning: noun, pronoun, numeral, particle, verb, adjective, determiner, adverb, and interjection. The message analysis unit 120 includes an input filter module 123, which filters out, or abbreviates before analysis, repetitions of short patterns without spaces and repeated use of meaningless special symbols, and an unregistered word processing module 125, which passes to the emotion evaluation unit any unregistered vocabulary, that is, profanity, slang, abbreviations, or emoticons that could not be tagged with a part of speech. It may further include a sentence separation module that splits sentences, a morpheme analysis module that analyzes morphemes, and a part-of-speech tagger module 124 that tags the parts of speech. As in the data collecting unit 110, a parallel processing technique that simultaneously uses multiple computers, multiprocessors, or multiple threads is supported to improve analysis efficiency.
Referring to FIG. 3, the message analysis unit 120 uses the morpheme DB 121 and the emotional vocabulary DB 122 to identify vocabulary related to sensibility and emotion, and separately stores the subject and object of that vocabulary within each message or sentence. Unregistered vocabulary such as profanity, slang, abbreviations, and emoticons, which is not stored in the morpheme DB 121 or the emotional vocabulary DB 122 and therefore cannot be tagged, is sent to the emotion evaluation unit 130 by the unregistered word processing module 125. The morpheme DB and the emotional vocabulary DB are updatable, so entries can later be added and deleted.
Thus, the message analysis unit 120 searches out emotional vocabulary from all data collected and stored by the data collecting unit 110, excluding the unregistered vocabulary.
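The input filter and the routing of unregistered vocabulary described above can be illustrated with a small sketch. It is a toy under stated assumptions: the function names, the limit of three characters for a "short pattern", and the dictionary-based lookup are introduced here for illustration, and real morpheme analysis of Korean text is far more involved than a per-token lookup.

```python
import re


def collapse_repeats(text, keep=2):
    """Shorten a short space-free pattern (1-3 chars) repeated more than `keep` times in a row."""
    pattern = r"(\S{1,3}?)\1{%d,}" % keep
    return re.sub(pattern, lambda m: m.group(1) * keep, text)


def split_tokens(tokens, morpheme_db, emotion_vocab_db):
    """Tag known tokens, pick out emotional vocabulary, and set aside unregistered words."""
    tagged, emotional, unregistered = [], [], []
    for tok in tokens:
        known = False
        if tok in morpheme_db:
            tagged.append((tok, morpheme_db[tok]))  # (morpheme, part of speech)
            known = True
        if tok in emotion_vocab_db:
            emotional.append(tok)
            known = True
        if not known:
            unregistered.append(tok)  # would go to the emotion evaluation unit
    return tagged, emotional, unregistered
```

With `keep=2`, a run such as `ㅋㅋㅋㅋㅋ` is abbreviated to `ㅋㅋ` rather than removed, which preserves the fact that an expressive repetition occurred.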

FIG. 5 is a diagram showing an example of the standard-language emotional vocabulary pairs used in the social data analysis apparatus according to the present invention.

The emotion evaluation unit 130 stores emotional vocabulary by selecting standard-language words that can substitute for the unregistered vocabulary, such as profanity, slang, abbreviations, and emoticons, that the unregistered word processing module 125 of the message analysis unit 120 passed on as not tagged with a part of speech. The emotional vocabulary is stored in the emotion DB 131 of the emotion evaluation unit 130; the emotion DB 131 is updatable, so entries can later be added and deleted.
The emotion evaluation unit 130 may classify and store unregistered vocabulary by type (slang, profanity, abbreviation, emoticon) and has evaluators rate the unregistered vocabulary using standard-language emotional vocabulary pairs. The evaluation may be carried out online: each slang term, profanity, abbreviation, or emoticon is presented on the screen of an output device such as a monitor over an online connection, and the evaluator's ratings on the standard-language emotional vocabulary pairs or standard-language combinations are received through an input device such as a mouse or keyboard. Referring to FIG. 5, to map (or translate) a slang expression such as '뽀대 있는' (roughly, 'stylish') into standard language, adjectives expressing emotion are presented in pairs of opposite meaning (usually 20 to 40 pairs), and subjects rate the expression on them. This is the procedure of the semantic differential (SD) method, a survey technique used to measure images and preferences. Using antonymous adjective pairs such as big-small and good-bad, it examines which pole a given brand or product feels closer to, and by how much. Averaging over about 30 respondents gives relatively stable results. When it is difficult to pick a single substitute standard word for a slang term or abbreviation, or when its meaning is ambiguous, the emotion evaluation unit 130 may, through an online survey, replace it with a weighted sum of two or more standard words. For each profanity, it either receives the intensity of emotion directly as input or evaluates it through an online survey, and stores the result in the emotion DB.
The emotion DB may record the meaning of each slang term, profanity, abbreviation, or emoticon, together with the single standard word, or the combination of standard words and their weights, that can replace it. Vocabulary and emoticons related to sensibility and emotion are recorded with their standard emotional vocabulary combination and its weights, and profanity may be recorded with its emotional intensity.
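The averaging step of the SD method, and the idea of replacing an unregistered word with a weighted sum of standard words, can be sketched as follows. Everything concrete here is an assumption for illustration: the rating scale (-3 to +3, negative leaning toward the left pole), the sample ratings, and in particular the rule of normalizing the strongest mean ratings into weights, which is a stand-in for the survey-based procedure the text describes.

```python
from statistics import mean


def sd_profile(ratings):
    """Average the evaluators' scores for each (left pole, right pole) adjective pair."""
    return {pair: mean(scores) for pair, scores in ratings.items()}


def weighted_substitute(profile, top_k=2):
    """Illustrative rule: turn the top_k strongest mean ratings into weights summing to 1."""
    strongest = sorted(profile.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    total = sum(abs(score) for _, score in strongest)
    if total == 0:
        return {}  # no pair leans either way, so nothing to substitute
    return {(right if score > 0 else left): abs(score) / total
            for (left, right), score in strongest}


# Hypothetical ratings from four evaluators for one slang expression.
ratings = {
    ("soft and informal", "angular and formal"): [2, 3, 2, 1],
    ("feminine", "masculine"): [1, 0, 1, 0],
}
weights = weighted_substitute(sd_profile(ratings))
```

A record in the emotion DB could then hold the word, its type, and a mapping such as `weights`; for profanity, an intensity value would be stored alongside.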

FIG. 6 is a diagram showing an example of the principal component analysis used in the emotion evaluation unit of the social data analysis apparatus according to the present invention.

Principal component analysis (PCA) is a multivariate statistical method that, given observations of mutually correlated variables, finds a small number of new variables (principal components) that explain them while retaining as much of their information as possible. The meaning or emotion of the slang expression '뽀대 있는' in FIG. 5 can be read from its ratings on the emotional vocabulary pairs (adjective pairs) listed beneath it, but there are too many such pairs. As shown in FIG. 6, principal component analysis therefore condenses the description of slang, profanity, and the like into two or three pairs. (These are not simply two or three of the emotional vocabulary pairs shown above; they are composite adjective pairs. A principal component pair may therefore be given a new name, or the name of an existing emotional vocabulary pair may be reused.)

For the slang expression '뽀대 있는' in FIG. 5:

First principal component: (angular and formal) - (soft and informal)

Second principal component: (masculine) - (feminine)

That is, the adjective pair that explains the largest share of the slang expression '뽀대 있는' becomes the first principal component; if the first and second principal components do not explain enough, the third and fourth are used as well. Through this process, the emotion of '뽀대 있는' can be expressed as a combination of standard-language emotion terms, such as 0.8*(angular and formal) + 0.2*(masculine), and this data is stored in the emotion DB 131.
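To make the reduction step concrete, the sketch below computes the first principal component of a small rating table by power iteration on the sample covariance matrix. It is a generic textbook procedure, not code from the patent; the function name, the layout of `rows` (one row per rated item, one column per adjective pair), and the starting vector are assumptions. Power iteration can stall if the starting vector happens to be orthogonal to the leading component, so a production system would use a full eigendecomposition instead.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (loadings, explained variance ratio) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the columns (adjective pairs).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-uniform start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    explained = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, (explained / total if total else 0.0)
```

The loadings play the role of the composite adjective pair: columns with large absolute loadings are the adjective pairs that dominate the component.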

The emotion analysis unit 140 performs an emotional engineering analysis of the emotion measurement target, based on the emotional vocabulary provided by the message analysis unit 120 and the emotion evaluation unit 130, and then extracts the main emotion image.

Emotional engineering analysis refers to the analysis methods used in the field of emotional engineering to evaluate products and brands. The evaluation and analysis of the slang expression '뽀대 있는' described above is an emotion evaluation carried out with an emotional engineering technique. Such techniques generally measure emotion quantitatively with the semantic differential (SD) method, in which many pairs of emotional adjectives are presented to evaluators who rate the target on them, and then present the results as an emotion map through analyses such as factor analysis and multidimensional scaling (MDS). They have been widely used to date for automobile instrument panels, interiors, housing, cosmetics containers, electric vehicle exteriors, yacht exterior design, and the like, and can also be applied to the diagnostic evaluation of brands, brand concepts, designs, and naming, and to the evaluation of the image of objects and people.

The emotion analysis unit 140 combines the emotional vocabulary for the emotion measurement target with the frequency and emotional intensity of that vocabulary and, using this as input, performs sparse principal component analysis (SPCA) to extract the main emotion components. The emotion evaluation unit 130 uses ordinary principal component analysis (PCA) because, as FIG. 5 shows, every item is rated on all of the emotional vocabulary pairs with nothing missing. Data collected from the Internet or SNS, by contrast, does not provide ratings on a sufficient set of emotional vocabulary pairs, so the emotion analysis unit 140 uses sparse principal component analysis to extract the main emotion components.
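The text does not say which sparse PCA algorithm is used. As one possible illustration, the sketch below uses truncated power iteration: an ordinary power iteration in which, after every step, all but the `k` largest-magnitude loadings are set to zero. The function name, the choice of algorithm, and the assumption that a covariance matrix has already been built from frequency- and intensity-weighted vocabulary scores are all introduced here.

```python
import math


def sparse_first_component(cov, k, iters=100):
    """Leading component with at most k nonzero loadings (truncated power iteration)."""
    d = len(cov)
    # Start from the single variable with the largest variance.
    start = max(range(d), key=lambda i: cov[i][i])
    v = [0.0] * d
    v[start] = 1.0
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        # Keep only the k entries with the largest absolute value.
        keep = set(sorted(range(d), key=lambda i: abs(w[i]), reverse=True)[:k])
        w = [w[i] if i in keep else 0.0 for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v
```

Forcing most loadings to zero means each main emotion component is described by only a few emotional vocabulary items, which suits data where many items have few or no observations.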

The analysis result display unit 150 displays the main emotion components extracted by the emotion analysis unit 140 and the emotion measurement target on an image space in real time. The analysis result display unit 150 may construct two- or three-dimensional factor axes from the several main emotion components extracted by the emotion analysis unit 140 and display, in real time, the position of the emotion measurement target in the image space at a specific point in time, or it may receive a specific period as input and display the analysis results over that period as an animation. Several emotion measurement targets may also be displayed simultaneously on an image space composed of common sensibility and emotion components.

FIG. 7 is a diagram showing an example of the output of the analysis result display unit. FIG. 8 is a diagram showing another example of the output of the analysis result display unit. FIG. 9 is a diagram showing yet another example of the output of the analysis result display unit.

Referring to FIGS. 7 and 8, suppose for example that emotion is measured and analyzed for three automobile models (A, B, C) and the following two main emotion components are extracted:

1) (traditional - modern)

2) (looks unstable and uncomfortable - looks stable and comfortable)

Plotting the three models with these two main emotion components as the X and Y axes gives the output shown in the figures, which can also show how emotion changes over time. (For example, the emotion evoked by concept-car photos before the models are launched may differ from the emotion after launch, once people have seen or ridden in the actual models. For politicians, the evaluation results may change with various events and political issues.) The change in emotion over time can also be rendered as an animation. FIGS. 7 and 8 are 2D outputs in which each ellipse represents the 95% significance-level range for the model. FIG. 8 shows the output of FIG. 7 in a simplified display format. FIG. 9 shows a 3D output for the case where three main emotion components are extracted: (traditional - modern), (looks unstable and uncomfortable - looks stable and comfortable), and (masculine - feminine). In FIGS. 7, 8, and 9, each evaluation is made for a specific point in time, so changes can be tracked over time and output as an animation.
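The text does not define how the 95% ellipses are computed. One common reading, assumed here, is a 95% region for bivariate-normal scores: the ellipse is centered on the mean of a model's points, its axes follow the eigenvectors of the 2x2 sample covariance matrix, and each semi-axis is the square root of the eigenvalue times the 95% quantile of the chi-square distribution with two degrees of freedom, which equals -2 ln(0.05), about 5.99. The function name and return layout are illustrative.

```python
import math


def ellipse_95(points):
    """Return (center, semi-major, semi-minor, rotation in radians) for 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1 = half_trace + root
    lam2 = max(half_trace - root, 0.0)  # guard against rounding below zero
    scale = -2 * math.log(0.05)  # chi-square quantile, 2 degrees of freedom, 95%
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the major axis
    return (mx, my), math.sqrt(scale * lam1), math.sqrt(scale * lam2), angle
```

Recomputing the ellipse for each time window and drawing the frames in sequence would give the animated view of how a model's position shifts over time.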

The operation of the social data analysis method according to the present invention is described below.

FIG. 4 is a flowchart of the social data analysis method according to the present invention.

Referring to FIG. 4, the social data analysis method collects and stores data on content present in a server group through a communication network (S10), then tags the collected and stored data with parts of speech, morpheme by morpheme, based on the morpheme DB and the emotional vocabulary DB, and searches out emotional vocabulary (S20). Data not stored in the morpheme DB and the emotional vocabulary DB cannot be tagged with a part of speech. Because unregistered vocabulary such as profanity, slang, abbreviations, and emoticons is not tagged, the method determines whether any untagged, unregistered vocabulary exists (S30). If unregistered vocabulary is found, it is subjected to emotion evaluation, and the standard-language emotional vocabulary that can replace it is stored in the emotion DB (S40). If no unregistered vocabulary is found, or after step S40 has been carried out, the main emotion components are extracted based on the emotional vocabulary provided in steps S20 and S40 (S50). The extracted main emotion components and the emotion measurement target are then displayed on a two- or three-dimensional image space (S60), and the method ends.
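The control flow of steps S10 to S60 can be summarized in a short sketch. It only shows the branching around unregistered vocabulary; the function name, the whitespace tokenization, and the injected `evaluate`, `extract`, and `display` callables are hypothetical placeholders for the units described earlier.

```python
def analyze(posts, morpheme_db, emotion_vocab_db, emotion_db, evaluate, extract, display):
    """Run one pass of the S10-S60 flow over already collected posts."""
    # S10: `posts` stands for the collected and stored data.
    tokens = [tok for post in posts for tok in post.split()]
    # S20: search out emotional vocabulary among the known tokens.
    emotional = [t for t in tokens if t in emotion_vocab_db]
    # S30: anything in neither DB counts as unregistered vocabulary.
    unregistered = [t for t in tokens
                    if t not in morpheme_db and t not in emotion_vocab_db]
    # S40: evaluate each new unregistered word once and store its substitutes.
    for word in unregistered:
        if word not in emotion_db:
            emotion_db[word] = evaluate(word)  # list of standard-language words
        emotional.extend(emotion_db[word])
    # S50: extract the main emotion components from both sources.
    components = extract(emotional)
    # S60: hand the result to the display step.
    display(components)
    return components
```

When no unregistered word is present, the S40 loop body never runs and the flow goes straight from S30 to S50, matching the two branches in FIG. 4.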

As described above, the social data analysis apparatus and method according to the present invention are not limited to the configurations and methods of the embodiments described; all or part of each embodiment may be selectively combined so that various modifications can be made.

10; server group
20; communication network
110; data collecting unit
120; message analysis unit
121; morpheme DB
122; emotional vocabulary DB
123; input filter module
124; part-of-speech tagger module
125; unregistered word processing module
130; emotion evaluation unit
131; emotion DB
140; emotion analysis unit
150; analysis result display unit

Claims

An apparatus for analyzing social data, comprising:
a data collecting unit that collects and stores data on content present in a server group through a communication network;
a message analysis unit that analyzes the data in morpheme units matched against a morpheme DB and an emotional vocabulary DB, tags parts of speech by morpheme unit, and searches out and stores emotional vocabulary;
an emotion evaluation unit that performs emotion evaluation on unregistered vocabulary not tagged with a part of speech by the message analysis unit and stores, in an emotion DB, the standard-language emotional vocabulary corresponding to the unregistered vocabulary obtained from the emotion evaluation;
an emotion analysis unit that, based on the emotional vocabulary and the standard-language emotional vocabulary provided by the message analysis unit and the emotion evaluation unit, performs an emotional engineering analysis of an emotion measurement target and extracts a main emotion component representative of the emotion measurement target; and
an analysis result display unit that displays the emotion measurement target and the main emotion component on an image space in real time.

The apparatus according to claim 1,
wherein the data collecting unit receives a keyword or a data collection period as input, collects and stores data matching the input, and classifies and stores the data by source or by uploader.

The apparatus according to claim 1,
wherein the data collecting unit supports a parallel processing technique that collects and stores data by simultaneously using multiple computers, multiprocessors, or multiple threads, deletes duplicate data from the data collected through the parallel processing technique, integrates the data, and, when the parallel processing is interrupted partway, automatically resumes collecting and storing from the data that follows the point of interruption.

The apparatus according to claim 1,
wherein the message analysis unit comprises:
an input filter module that filters out, or abbreviates before analysis, repetitions of short patterns without spaces and repeated use of meaningless special symbols; and
an unregistered word processing module that passes to the emotion evaluation unit unregistered vocabulary that has not been tagged with a part of speech and corresponds to at least one of profanity, slang, abbreviations, and emoticons.

The apparatus according to claim 1,
wherein the message analysis unit supports a parallel processing technique that simultaneously uses multiple computers, multiprocessors, or multiple threads.

The apparatus according to claim 1,
wherein the message analysis unit identifies vocabulary related to sensibility and emotion based on the morpheme DB and the emotional vocabulary DB, and separately stores the subject and object of that vocabulary.

The apparatus according to claim 1,
wherein the emotion evaluation unit has an evaluator perform the emotion evaluation using standard-language emotional vocabulary pairs, the evaluator rating the standard-language emotional vocabulary pairs through an online connection.

The apparatus according to claim 1,
wherein the emotion evaluation unit has an evaluator select, through an online connection, the standard language that can replace the unregistered vocabulary not tagged with a part of speech as a weighted sum of two or more standard words.

The apparatus according to claim 1,
wherein the emotion evaluation unit receives the intensity of sensibility and emotion of the unregistered vocabulary not tagged with a part of speech directly as input, or evaluates it through an online survey.

The apparatus according to claim 1,
wherein the emotion analysis unit extracts the main emotion component representative of the emotion measurement target by sparse principal component analysis (SPCA), based on a combination of the emotional vocabulary for the emotion measurement target and the frequency and emotional intensity of that vocabulary.

The apparatus of claim 10,
wherein the extraction of the main emotion component is performed for a set point in time or a set period.

The apparatus according to claim 1,
wherein the analysis result display unit constructs two- or three-dimensional axes from the main emotion components extracted by the emotion analysis unit, and displays in real time the position of the emotion measurement target in the image space at a specific point in time or displays the analysis result in an animation format.

The apparatus according to claim 1,
wherein the analysis result display unit simultaneously displays a plurality of emotion measurement targets on an image space composed of common sensibility and emotion components.

A method for analyzing social data, comprising:
collecting and storing data on content present in a server group through a communication network;
analyzing the data in morpheme units matched against a morpheme DB and an emotional vocabulary DB, tagging parts of speech by morpheme unit, and searching out and storing emotional vocabulary;
performing emotion evaluation on unregistered vocabulary not tagged with a part of speech in the searching out and storing step, and storing, in an emotion DB, the standard-language emotional vocabulary corresponding to the unregistered vocabulary obtained from the emotion evaluation;
based on the emotional vocabulary and the standard-language emotional vocabulary provided by the searching out and storing step and the step of storing in the emotion DB, performing an emotional engineering analysis of an emotion measurement target and extracting a main emotion component representative of the emotion measurement target; and
displaying the emotion measurement target and the main emotion component on an image space in real time.

The method of claim 14,
wherein collecting and storing the data comprises collecting and storing data corresponding to a received input, supporting a parallel processing technique that collects and stores data by simultaneously using multiple computers, multiprocessors, or multiple threads, deleting duplicate data from the data collected through the parallel processing technique, integrating the data by time and by type, and, when the parallel processing is interrupted, automatically resuming collecting and storing from the data that follows the point of interruption.

The method of claim 14,
wherein the searching out and storing step identifies vocabulary related to sensibility and emotion based on the morpheme DB and the emotional vocabulary DB, separately stores the subject and object of that vocabulary, and supports a parallel processing technique that simultaneously uses multiple computers, multiprocessors, or multiple threads.

The method of claim 14,
wherein the searching out and storing step excludes, or abbreviates before analysis, repetitions of short patterns without spaces and repeated use of meaningless special symbols, and passes unregistered vocabulary corresponding to at least one of profanity, slang, abbreviations, and emoticons to the step of storing in the emotion DB.

The method according to claim 14,
wherein the storing in the emotion DB comprises having an evaluator perform the emotion evaluation using standard-language emotional vocabulary pairs, the evaluator rating the standard-language emotional vocabulary pairs through an online connection; having the evaluator select, through the online connection, the standard language that can replace the unregistered vocabulary not tagged with a part of speech as a weighted sum of two or more standard words; and receiving the intensity of sensibility and emotion of the unregistered vocabulary not tagged with a part of speech directly as input or evaluating it through an online survey.

The method according to claim 14,
wherein the extracting comprises extracting the main emotion component representative of the emotion measurement target by sparse principal component analysis (SPCA), based on a combination of the emotional vocabulary for the emotion measurement target and the frequency and emotional intensity of that vocabulary,
and wherein the extraction of the main emotion component is performed for a set point in time or a set period.

The method according to claim 14,
wherein the displaying comprises constructing two- or three-dimensional axes from the main emotion components extracted in the extracting step, displaying in real time the position of the emotion measurement target in the image space at a specific point in time, and simultaneously displaying a plurality of emotion measurement targets on an image space composed of common sensibility and emotion components.