KR101100830B1 - Entity searching and opinion mining system of hybrid-based using internet and method thereof

Info

Publication number: KR101100830B1
Application number: KR20090102129A
Authority: KR
Inventors: 나승훈; 남상협
Original assignee: 주식회사 버즈니
Priority date: 2009-10-27
Filing date: 2009-10-27
Publication date: 2012-01-02
Also published as: KR20110045519A

Abstract

The present invention relates to an entity search using the Internet and to a hybrid-based opinion analysis system and method for that search. The system includes: a first server that collects web document data from the Internet; a data analysis server that receives the collected web document data from the first server, extracts meta information for each entity, and uses that meta information to analyze positive/negative opinion statistics for each target of each entity; a second server that stores the per-entity meta information and the positive/negative opinion statistics for each target of each entity in a database and indexes them; and a web server that receives a user search keyword sent from a user terminal connected over the Internet, works with the second server to determine whether the user search keyword contains pre-stored meta information or a target keyword, and, if it does, displays on the screen of the user terminal a list of entities related to that meta information or target keyword. The system thus directly finds the entity indicated by a user search keyword, so that the entity result list and the opinion statistics can be searched and monitored at a glance.

Entity search, opinion analysis, Internet, user search keyword, meta information keyword, target keyword, web server

Description

ENTITY SEARCHING AND OPINION MINING SYSTEM OF HYBRID-BASED USING INTERNET AND METHOD THEREOF

The present invention relates to an entity search using the Internet and to a hybrid-based opinion analysis system and method for that search. More specifically, the invention directly finds the entity indicated by a specific search keyword entered by an Internet user and displays, on the user's terminal, a list of entity results rather than documents and/or opinion statistics extracted by a hybrid-based method that uses both rule-based and machine learning approaches, so that Internet users can search and monitor the entities related to a specific search keyword at a glance.

As Internet use has grown in recent years, many people express their opinions online through media such as blogs and wikis. Demand is also growing to consult the opinions that others have posted on the Internet when judging the value of a particular piece of information.

For example, user opinions on the Internet range from product reviews to movie reviews. These opinions are useful to ordinary users who want to see what others think before buying a product or watching a movie, and to marketers or stock traders who want to know the range of user opinions about a product or company. In particular, ordinary users tend to check other users' evaluations before buying a particular product.

However, these opinions are scattered across individual websites, so a user who wants to use them has to search every such website manually, one by one.

It is difficult for a user to visit all of these websites. A general search for other users' opinions returns a mixture of web documents that merely contain opinions, documents with positive opinions, and documents with negative opinions, which makes it hard to find other users' opinions effectively.

To solve these problems, techniques for extracting user opinions are being actively studied, mainly in academia in Korea and abroad, and the field of information retrieval has also advanced greatly since the early 2000s, with various techniques under study.

However, existing information retrieval technology only provides search based on whether a keyword is present. It does not provide higher-level search based on what is evaluated positively or negatively in the documents or sentences where each keyword appears. Attempts have recently been made to apply user opinion extraction to information retrieval, but they have not yet gone beyond separating positive documents from negative ones.

The present invention was devised to solve the problems described above. Its object is to provide an entity search using the Internet, and a hybrid-based opinion analysis system and method for that search, that directly find the entity indicated by a specific search keyword entered by an Internet user and display, on the user's terminal, a list of entity results rather than documents and/or opinion statistics extracted by a hybrid-based method that uses both rule-based and machine learning approaches, so that Internet users can search and monitor the entities related to a specific search keyword at a glance.

To achieve this object, a first aspect of the present invention provides an entity search using the Internet and a hybrid-based opinion analysis system for that search, comprising: a first server that collects web document data from the Internet; a data analysis server that receives the collected web document data from the first server, extracts meta information for each entity, and uses that meta information to analyze positive/negative opinion statistics for each target of each entity; a second server that stores the per-entity meta information and the positive/negative opinion statistics for each target of each entity in a database and indexes them; and a web server that receives a user search keyword sent from a user terminal connected over the Internet, works with the second server to determine whether the user search keyword contains pre-stored meta information or a target keyword, and, if it does, displays on the screen of the user terminal a list of entities related to that meta information or target keyword.

Here, the first server preferably collects and stores RSS addresses on the Internet, retrieves the RSS files at the stored addresses, and collects web document data by following the link information that each RSS file provides.
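The link-extraction step of such a collector might look like the following minimal sketch. It assumes RSS 2.0 feeds in which each `<item>` carries a `<link>` element; the function name `extract_item_links` and the sample feed are hypothetical, and fetching the feed and the linked pages over the network is left out.

```python
import xml.etree.ElementTree as ET


def extract_item_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Hypothetical feed used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example blog</title>
<item><title>Post A</title><link>http://example.com/a</link></item>
<item><title>Post B</title><link>http://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in extract_item_links(SAMPLE_RSS):
        print(url)
```

A crawler built on this would then download each returned URL and hand the page to the analysis stage.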

Preferably, the data analysis server extracts the meta information for each entity from predetermined web document data by using regular expressions that describe strings of a predetermined form.
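As an illustration of regular-expression-based meta extraction, the sketch below pulls two fields out of a page with a known layout. The field names, the `Director:` and `Released:` labels, and the patterns themselves are assumptions made for the example; the source does not specify any concrete pattern.

```python
import re

# Hypothetical patterns for a page whose layout is known in advance.
META_PATTERNS = {
    "director": re.compile(r"Director:\s*(.+)"),
    "release_year": re.compile(r"Released:\s*(\d{4})"),
}


def extract_meta(text):
    """Return a dict of the meta fields whose pattern matches the text."""
    meta = {}
    for field, pattern in META_PATTERNS.items():
        match = pattern.search(text)
        if match:
            meta[field] = match.group(1).strip()
    return meta


if __name__ == "__main__":
    page = "Title: Foo\nDirector: Jane Doe\nReleased: 2009\n"
    print(extract_meta(page))
```

Fields with no matching pattern are simply omitted, so a partially structured page still yields whatever meta information it contains.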

Preferably, the data analysis server includes: a first module that classifies the collected web document data by domain using a preset machine learning model for each domain; a second module that performs language processing on the collected web document data to extract opinion sentences and classifies each extracted opinion sentence as a positive or negative opinion expression; a third module that uses the per-entity meta information to determine which entity the collected web document data refers to; and a fourth module that uses the words and part-of-speech information surrounding each opinion sentence extracted by the second module to determine which target the sentence refers to.

Preferably, the second module includes: a language processing unit that splits the collected web document data into sentences and performs language processing on each sentence to extract linguistic features; an opinion/non-opinion classification unit that uses the linguistic features of each sentence to separate opinion sentences from non-opinion sentences; and an opinion expression classification unit that classifies the linguistic features of each opinion sentence as a positive or negative opinion expression.
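The three units can be pictured as a small pipeline. The sketch below is a toy stand-in, not the patented classifiers: it assumes English text, uses lowercase word tokens as the only "linguistic features", and replaces both classification steps with a tiny hypothetical lexicon. Ties between positive and negative words are resolved as positive purely for simplicity.

```python
import re

# Hypothetical opinion lexicon standing in for trained models.
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def split_sentences(text):
    """Language processing, step 1: split the document into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def features(sentence):
    """Language processing, step 2: extract (very simple) features."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def analyze(text):
    """Keep opinion sentences only and label each as positive or negative."""
    results = []
    for sentence in split_sentences(text):
        tokens = features(sentence)
        pos = len(tokens & POSITIVE)
        neg = len(tokens & NEGATIVE)
        if pos == 0 and neg == 0:
            continue  # non-opinion sentence
        results.append((sentence, "positive" if pos >= neg else "negative"))
    return results


if __name__ == "__main__":
    print(analyze("The screen is great. It ships in a box. Battery life is poor."))
```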

Preferably, when a predefined rule exists for an extracted opinion sentence, the second module applies a preset rule-based model to classify the sentence as a positive or negative opinion expression; when no predefined rule exists for the sentence, it applies a preset machine learning model to make that classification.
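This rule-first, model-as-fallback arrangement can be sketched as follows. The phrase rules and the word weights are hypothetical, and `ml_model` is a trivial scoring function standing in for whatever trained classifier a real system would use.

```python
# Hypothetical phrase rules: phrase -> polarity.
RULES = {"highly recommend": "positive", "waste of money": "negative"}

# Hypothetical word weights standing in for a trained model.
ML_WEIGHTS = {"love": 1.0, "nice": 0.5, "slow": -0.5, "broken": -1.0}


def rule_based(sentence):
    """Return a polarity if a predefined rule matches, otherwise None."""
    text = sentence.lower()
    for phrase, label in RULES.items():
        if phrase in text:
            return label
    return None


def ml_model(sentence):
    """Stand-in classifier: sign of the summed word weights."""
    score = sum(ML_WEIGHTS.get(tok, 0.0) for tok in sentence.lower().split())
    return "positive" if score >= 0 else "negative"


def classify(sentence):
    """Use the rule when one exists, otherwise fall back to the model."""
    label = rule_based(sentence)
    return label if label is not None else ml_model(sentence)


if __name__ == "__main__":
    print(classify("I highly recommend this phone"))
    print(classify("The app is slow and broken"))
```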

Preferably, the second module applies a preset rule-based model and a machine learning model to the extracted opinion sentence at the same time to judge whether the opinion expression is positive or negative, assigns a different confidence score depending on whether the two models' results agree, and classifies the sentence as a positive or negative opinion expression based on that confidence score.
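One way to read this agreement-based scoring is sketched below. The source does not give score values, a threshold, or a tie-breaking policy, so the numbers `1.0`, `0.5`, and `0.7`, the choice to keep the rule-based label on disagreement, and the choice to drop low-confidence results are all assumptions.

```python
def combine(rule_label, ml_label, agree_score=1.0, disagree_score=0.5):
    """Return (label, confidence) from the two models' outputs.

    Assumption: on disagreement the rule-based label is kept, but with a
    lower confidence score.
    """
    if rule_label == ml_label:
        return rule_label, agree_score
    return rule_label, disagree_score


def final_label(rule_label, ml_label, threshold=0.7):
    """Accept the label only if its confidence reaches the threshold."""
    label, score = combine(rule_label, ml_label)
    return label if score >= threshold else None


if __name__ == "__main__":
    print(combine("positive", "positive"))
    print(combine("positive", "negative"))
    print(final_label("positive", "negative"))
```

Keeping the score alongside the label also lets downstream statistics weight each opinion expression instead of discarding uncertain ones.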

Preferably, the second module determines whether the extracted opinion sentence is a predefined exception-handling rule candidate sentence and, if it is, applies either a predefined exception-handling rule-based model or a machine learning model, depending on whether a predefined exception-handling rule exists, to classify the sentence as a positive, negative, or neutral opinion expression.

Preferably, when the extracted opinion sentence is not a predefined exception-handling rule candidate sentence, the second module applies a preset rule-based model and a machine learning model to the sentence at the same time to judge whether the opinion expression is positive or negative, assigns a different confidence score depending on whether the two models' results agree, and classifies the sentence as a positive or negative opinion expression based on that confidence score.

Preferably, when the extracted opinion sentence is a predefined exception-handling rule candidate sentence and a predefined exception-handling rule exists, the second module applies the predefined exception-handling rule-based model to classify the sentence as a positive, negative, or neutral opinion expression; when no predefined exception-handling rule exists for the sentence, it applies a preset machine learning model to classify it as a positive, negative, or neutral opinion expression.
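The routing between the exception-handling path and the ordinary two-model path can be sketched as a single dispatch function. Every helper below is a toy stand-in with hypothetical logic (for example, treating any sentence containing "not" as a candidate); only the order of the decisions follows the text.

```python
def route(sentence, is_candidate, exception_rule, ml_three_way, hybrid_two_way):
    """Dispatch one opinion sentence to the appropriate classifier."""
    if is_candidate(sentence):
        label = exception_rule(sentence)  # None when no exception rule exists
        return label if label is not None else ml_three_way(sentence)
    return hybrid_two_way(sentence)


# Toy stand-ins, all hypothetical.
def is_candidate(sentence):
    return " not " in f" {sentence.lower()} "


def exception_rule(sentence):
    return "positive" if "not bad" in sentence.lower() else None


def ml_three_way(sentence):
    return "neutral"  # placeholder for a positive/negative/neutral model


def hybrid_two_way(sentence):
    return "positive" if "good" in sentence.lower() else "negative"


if __name__ == "__main__":
    for s in ["It is not bad", "I am not sure", "Very good value"]:
        print(s, "->", route(s, is_candidate, exception_rule,
                              ml_three_way, hybrid_two_way))
```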

Preferably, the positive/negative opinion statistics stored in the second server for each target of each entity consist of at least one of: an entity ID, a target, the number of positive/negative opinion expressions for each target of each entity, the total number of opinion expressions, and the content of the opinion expressions that use each target.
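A record holding these fields might be modeled as below. The class name `OpinionStats`, the field names, and the choice to derive the total from the two counts are assumptions; the source lists only which pieces of information may be stored.

```python
from dataclasses import dataclass, field


@dataclass
class OpinionStats:
    """Opinion statistics for one target of one entity (hypothetical layout)."""
    entity_id: str
    target: str
    positive: int = 0
    negative: int = 0
    expressions: list = field(default_factory=list)

    @property
    def total(self):
        """Total number of opinion expressions recorded for this target."""
        return self.positive + self.negative

    def add(self, expression, polarity):
        """Record one opinion expression and update the matching counter."""
        if polarity == "positive":
            self.positive += 1
        elif polarity == "negative":
            self.negative += 1
        else:
            raise ValueError(f"unknown polarity: {polarity}")
        self.expressions.append(expression)


if __name__ == "__main__":
    stats = OpinionStats(entity_id="r1", target="taste")
    stats.add("the pasta was great", "positive")
    stats.add("too salty", "negative")
    print(stats.positive, stats.negative, stats.total)
```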

Preferably, when the user search keyword contains no pre-stored meta information or target keyword, the web server searches the second server, in which entity mappings are stored, using the user search keyword and displays the list of related entity results.

Preferably, the web server displays the entity result list in descending order of the number of positive/negative opinions, according to the opinion analysis results for the user search keyword.

Preferably, the web server includes: a keyword analysis module that analyzes the user search keyword sent from the user terminal, determines whether it contains meta information or a target keyword stored in the second server, and selects a keyword search method according to the result; and a keyword search module that, following the search method selected by the keyword analysis module, works with the second server to search for entities related to the meta information or target keyword and displays the entity list results on the screen of the user terminal.
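The keyword analysis module's decision can be sketched as a small classifier over query tokens. The keyword sets, the mode names, and the whitespace tokenization are assumptions for illustration; a real system would look the keywords up in the second server's index.

```python
# Hypothetical keyword sets; in practice these come from the second server.
META_KEYWORDS = {"gangnam", "italian"}
TARGET_KEYWORDS = {"taste", "service"}


def analyze_query(query):
    """Split a query into meta, target, and remaining terms and pick a mode."""
    tokens = query.lower().split()
    meta = [t for t in tokens if t in META_KEYWORDS]
    target = [t for t in tokens if t in TARGET_KEYWORDS]
    rest = [t for t in tokens
            if t not in META_KEYWORDS and t not in TARGET_KEYWORDS]
    if meta and target:
        mode = "meta+target"
    elif meta:
        mode = "meta"
    elif target:
        mode = "target"
    else:
        mode = "fulltext"
    return {"mode": mode, "meta": meta, "target": target, "rest": rest}


if __name__ == "__main__":
    print(analyze_query("Gangnam taste"))
    print(analyze_query("cozy date spot"))
```

The `rest` list carries the terms that match no stored keyword, which the search module can use for document/paragraph lookup.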

Preferably, when the user search keyword contains both pre-stored meta information and a target keyword, the web server searches for the entities related to the meta information keyword and then re-sorts the entity result list by the positive, negative, or total opinion count for the target keyword before displaying it.
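A minimal sketch of this filter-then-re-sort behavior follows. The in-memory dictionaries `ENTITIES` and `STATS`, the entity IDs, and the counts are invented sample data standing in for the second server's index.

```python
# Hypothetical sample data standing in for the second server's index.
ENTITIES = {
    "r1": {"meta": {"gangnam", "italian"}},
    "r2": {"meta": {"gangnam", "korean"}},
    "r3": {"meta": {"hongdae", "italian"}},
}
# (entity_id, target) -> (positive count, negative count)
STATS = {
    ("r1", "taste"): (12, 3),
    ("r2", "taste"): (30, 10),
    ("r3", "taste"): (50, 1),
}


def search(meta_kw, target_kw, order="positive"):
    """Find entities by meta keyword, then sort by the target's opinion count."""
    hits = [eid for eid, info in ENTITIES.items() if meta_kw in info["meta"]]

    def sort_key(eid):
        pos, neg = STATS.get((eid, target_kw), (0, 0))
        return {"positive": pos, "negative": neg, "total": pos + neg}[order]

    return sorted(hits, key=sort_key, reverse=True)


if __name__ == "__main__":
    print(search("gangnam", "taste"))
    print(search("italian", "taste", order="negative"))
```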

Preferably, the positive/negative opinion statistics stored in the second server for each target of each entity are stored so that each entity is mapped to documents/paragraphs. When the user search keyword contains neither pre-stored meta information nor a target keyword, the web server performs morphological analysis on the user search keyword, compares the analyzed keyword against the documents/paragraphs stored in the second server to retrieve a document/paragraph result list, and displays the entity result list mapped to that document/paragraph result list on the user terminal.

Preferably, the positive/negative opinion statistics stored in the second server for each target of each entity are stored so that each entity is mapped to documents/paragraphs. When the user search keyword contains a portion that matches neither pre-stored meta information nor a target keyword and, at the same time, contains a pre-stored meta information keyword, the web server performs morphological analysis on the user search keyword, compares the analyzed keyword, excluding the meta information keyword, against the documents/paragraphs stored in the second server to retrieve a document/paragraph result list, retrieves the entity result list mapped to that document/paragraph result list, and then filters it down to the entities related to the meta information keyword before displaying it on the user terminal.

Preferably, the positive/negative opinion statistics stored in the second server for each target of each entity are stored so that each entity is mapped to documents/paragraphs. When the user search keyword contains a portion that matches neither pre-stored meta information nor a target keyword and, at the same time, contains a pre-stored target keyword, the web server performs morphological analysis on the user search keyword, compares the analyzed keyword, excluding the target keyword, against the documents/paragraphs stored in the second server to retrieve a document/paragraph result list, retrieves the entity result list mapped to that document/paragraph result list, and then re-sorts it in descending order of the positive or total opinion count for the target keyword before displaying it on the user terminal.

Preferably, the positive/negative opinion statistics stored in the second server for each target of each entity are stored so that each entity is mapped to documents/paragraphs. When the user search keyword contains a portion that matches neither pre-stored meta information nor a target keyword and, at the same time, contains both a pre-stored meta information keyword and a target keyword, the web server performs morphological analysis on the user search keyword, compares the analyzed keyword, excluding the meta information and target keywords, against the documents/paragraphs stored in the second server to retrieve a document/paragraph result list, retrieves the entity result list mapped to that document/paragraph result list, filters it down to the entities related to the meta information keyword, and then re-sorts the filtered list in descending order of the positive or total opinion count for the target keyword before displaying it on the user terminal.
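The document/paragraph lookup, with its optional meta filter and target-based re-sort, can be sketched in one function. Everything here is assumed for illustration: the sample documents, the entity mappings, the requirement that a document contain every remaining term, and above all the whitespace `tokenize` function, which merely stands in for a real morphological analyzer.

```python
# Hypothetical sample data: documents mapped to entities.
DOCS = {
    "d1": {"text": "cozy place for a date night", "entity": "r1"},
    "d2": {"text": "loud but cheap lunch", "entity": "r2"},
    "d3": {"text": "cozy terrace and quiet date spot", "entity": "r3"},
}
ENTITY_META = {"r1": {"gangnam"}, "r2": {"gangnam"}, "r3": {"hongdae"}}
POSITIVE_COUNTS = {("r1", "taste"): 12, ("r3", "taste"): 50}


def tokenize(text):
    """Placeholder for morphological analysis (assumption: whitespace split)."""
    return text.lower().split()


def search(free_terms, meta_kw=None, target_kw=None):
    """Map matching documents to entities, then filter and re-sort."""
    terms = set(free_terms)
    entities = []
    for doc in DOCS.values():
        if terms <= set(tokenize(doc["text"])) and doc["entity"] not in entities:
            entities.append(doc["entity"])
    if meta_kw is not None:
        entities = [e for e in entities if meta_kw in ENTITY_META[e]]
    if target_kw is not None:
        entities.sort(key=lambda e: POSITIVE_COUNTS.get((e, target_kw), 0),
                      reverse=True)
    return entities


if __name__ == "__main__":
    print(search(["cozy", "date"]))
    print(search(["cozy", "date"], meta_kw="gangnam"))
    print(search(["cozy", "date"], target_kw="taste"))
```

Passing neither optional argument corresponds to the plain free-text case, `meta_kw` alone to the filtering case, `target_kw` alone to the re-sorting case, and both together to the combined case.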

Preferably, the web server displays the opinion statistics for each entity on the user terminal together with the entity list results.

A second aspect of the present invention provides an entity search using the Internet and a hybrid-based opinion analysis method for that search, comprising: (a) collecting web document data from the Internet; (b) receiving the collected web document data, extracting meta information for each entity, and using that meta information to analyze positive/negative opinion statistics for each target of each entity; (c) storing the analyzed per-entity meta information and the positive/negative opinion statistics for each target of each entity in a database and indexing them; and (d) determining whether a user search keyword sent from a user terminal connected over the Internet contains pre-stored meta information or a target keyword and, if it does, displaying on the screen of the user terminal a list of entities related to that meta information or target keyword.

Here, step (b) preferably includes: (b-1) classifying the collected web document data by domain using a preset machine learning model for each domain; (b-2) performing language processing on the collected web document data to extract opinion sentences and classifying each extracted opinion sentence as a positive or negative opinion expression; (b-3) using the per-entity meta information to determine which entity the collected web document data refers to; and (b-4) using the words and part-of-speech information surrounding each opinion sentence extracted in step (b-2) to determine which target the sentence refers to.

Preferably, step (b-2) includes: splitting the collected web document data into sentences and performing language processing on each sentence to extract linguistic features; using the linguistic features of each sentence to separate opinion sentences from non-opinion sentences; and classifying the linguistic features of each opinion sentence as a positive or negative opinion expression.

Preferably, in step (b-2), when a predefined rule exists for an extracted opinion sentence, a preset rule-based model is applied to classify the sentence as a positive or negative opinion expression; when no predefined rule exists for the sentence, a preset machine learning model is applied to make that classification.

Preferably, in step (b-2), a preset rule-based model and a machine learning model are applied to the extracted opinion sentence at the same time to judge whether the opinion expression is positive or negative, a different confidence score is assigned depending on whether the two models' results agree, and the sentence is classified as a positive or negative opinion expression based on that confidence score.

Preferably, in step (b-2), it is determined whether the extracted opinion sentence is a predefined exception-handling rule candidate sentence and, if it is, either a predefined exception-handling rule-based model or a machine learning model is applied, depending on whether a predefined exception-handling rule exists, to classify the sentence as a positive, negative, or neutral opinion expression.

Preferably, when the extracted opinion sentence is not a predefined exception-handling rule candidate sentence, a preset rule-based model and a machine learning model are applied to the sentence at the same time to judge whether the opinion expression is positive or negative, a different confidence score is assigned depending on whether the two models' results agree, and the sentence is classified as a positive or negative opinion expression based on that confidence score.

Preferably, when the extracted opinion sentence is a predefined exception-handling rule candidate sentence and a predefined exception-handling rule exists, the predefined exception-handling rule-based model is applied to classify the sentence as a positive, negative, or neutral opinion expression; when no predefined exception-handling rule exists for the sentence, a preset machine learning model is applied to classify it as a positive, negative, or neutral opinion expression.

Preferably, in step (d), when the user search keyword contains no pre-stored meta information or target keyword, a database (DB) in which entity mappings are stored is searched using the user search keyword, and the list of related entity results is displayed.

Preferably, in step (d), the entity result list is displayed in descending order of the number of positive/negative opinions, according to the opinion analysis results for the user search keyword.

Preferably, in step (d), when the user search keyword contains both pre-stored meta information and a target keyword, the entities related to the meta information keyword are searched for, and the entity result list is then re-sorted by the positive, negative, or total opinion count for the target keyword before being displayed.

Preferably, in step (c), the positive/negative opinion statistics for each target of each entity are stored so that each entity is mapped to documents/paragraphs. In step (d), when the user search keyword contains neither pre-stored meta information nor a target keyword, morphological analysis is performed on the user search keyword, the analyzed keyword is compared against the stored documents/paragraphs to retrieve a document/paragraph result list, and the entity result list mapped to that document/paragraph result list is displayed on the user terminal.

Preferably, in step (c), the positive/negative opinion statistics for each sub-theme of each entity are stored so that each entity is mapped at the document/paragraph level. In step (d), if the user search keyword is not made up solely of pre-stored meta information and sub-theme keywords but does contain a pre-stored meta information keyword, the user search keyword may be morphologically analyzed; the analyzed keyword, excluding the meta information keyword, may be compared against the stored documents/paragraphs to retrieve a matching document/paragraph result list; the entity result list mapped to that document/paragraph result list may be retrieved; and the entities related to the meta information keyword may be filtered from that entity result list and displayed on the user terminal.

Preferably, in step (c), the positive/negative opinion statistics for each sub-theme of each entity are stored so that each entity is mapped at the document/paragraph level. In step (d), if the user search keyword is not made up solely of pre-stored meta information and sub-theme keywords but does contain a pre-stored sub-theme keyword, the user search keyword may be morphologically analyzed; the analyzed keyword, excluding the sub-theme keyword, may be compared against the stored documents/paragraphs to retrieve a matching document/paragraph result list; the entity result list mapped to that document/paragraph result list may be retrieved; and that entity result list may be re-sorted in descending order of the positive or total opinion count of the sub-theme keyword and displayed on the user terminal.

Preferably, in step (c), the positive/negative opinion statistics for each sub-theme of each entity are stored so that each entity is mapped at the document/paragraph level. In step (d), if the user search keyword is not made up solely of pre-stored meta information and sub-theme keywords but contains both a pre-stored meta information keyword and a pre-stored sub-theme keyword, the user search keyword may be morphologically analyzed; the analyzed keyword, excluding the meta information and sub-theme keywords, may be compared against the stored documents/paragraphs to retrieve a matching document/paragraph result list; the entity result list mapped to that document/paragraph result list may be retrieved; the entities related to the meta information keyword may be filtered from that entity result list; and the filtered entity result list may be re-sorted in descending order of the positive or total opinion count of the sub-theme keyword and displayed on the user terminal.
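The retrieve, filter, and re-sort flow of step (d) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `search`, the in-memory `paragraphs` and `entities` structures, and the two vocabularies are all hypothetical, and the query is assumed to have been morphologically analyzed into tokens already.

```python
def search(query_tokens, paragraphs, entities, meta_vocab, theme_vocab, order="positive"):
    """Hypothetical sketch of step (d).

    paragraphs: list of (tokens, entity_name) pairs (paragraph-to-entity mapping)
    entities:   {name: {"meta": set_of_keywords, "stats": {sub_theme: (pos, neg)}}}
    """
    meta_terms = [t for t in query_tokens if t in meta_vocab]
    theme_terms = [t for t in query_tokens if t in theme_vocab]
    rest = [t for t in query_tokens if t not in meta_vocab and t not in theme_vocab]

    # 1. Search the stored paragraphs with the remaining tokens and
    #    collect the entities mapped to the matching paragraphs.
    names = []
    for tokens, entity_name in paragraphs:
        if all(t in tokens for t in rest) and entity_name not in names:
            names.append(entity_name)

    # 2. Keep only the entities related to the meta information keywords.
    names = [n for n in names if all(t in entities[n]["meta"] for t in meta_terms)]

    # 3. Re-sort by the positive or total opinion count of the sub-theme keyword.
    if theme_terms:
        theme = theme_terms[0]

        def count(name):
            pos, neg = entities[name]["stats"].get(theme, (0, 0))
            return pos + neg if order == "total" else pos

        names.sort(key=count, reverse=True)
    return names
```

When the query contains no meta or sub-theme keyword, steps 2 and 3 leave the list unchanged, which corresponds to the plain document/paragraph search case.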

Preferably, in step (d), the opinion statistics of each entity may be displayed on the user terminal together with the entity result list.

Preferably, in step (d), when the search results for the user keyword are shown on the screen of the user terminal, the information on each entity and the sub-theme keyword statistics automatically extracted for that entity may be arranged in descending order of positive, total, or negative opinion count. The positive/negative figures for each sub-theme keyword may be shown together with symbols representing them, and the positive and negative opinions on each retrieved entity may be displayed side by side (left and right) or made selectable through positive/negative tabs.

As described above, the Internet-based entity search and the hybrid-based opinion analysis system and method of the present invention directly find what a specific search keyword entered by an Internet user refers to. They then display, on the user terminal, an entity result list rather than a document list, and/or opinion statistics extracted by a hybrid approach that uses both rule-based and machine learning methods. Internet users can therefore search and monitor the entities related to a specific search keyword at a glance.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments illustrated below may, however, be modified into various other forms, and the scope of the present invention is not limited to them. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

First, the term 'sub-theme' used throughout this specification refers to an intrinsic quality of an entity, also called a property. It is the aspect about which opinions are most often expressed, and it is a broad notion applicable to every field of Internet search (e.g., movies, politics, economy, games, sports).

For example, sub-themes in the movie domain may include emotional impact, fun, actors, acting, story, plot twist, graphics, music, and scenes; sub-themes in the consumer electronics domain may include price, design, battery, and after-sales service (A/S); and sub-themes in the restaurant domain may include taste, price, and atmosphere.

An 'entity' is a higher-level concept than a sub-theme: related sub-themes are grouped together to form a single unit of information.

FIG. 1 is an overall block diagram illustrating an Internet-based entity search and hybrid-based opinion analysis system according to an embodiment of the present invention, and FIG. 2 is a block diagram illustrating in detail the data analysis server used in that embodiment.

Referring to FIGS. 1 and 2, the Internet-based entity search and hybrid-based opinion analysis system according to an embodiment of the present invention may broadly comprise a data collection server 100, a data analysis server 200, an indexing server 300, a web server 400, and a user terminal 500.

The data collection server 100 collects web document data existing on the Internet 10. It collects and stores RSS addresses from the Internet, receives the RSS files corresponding to the stored RSS addresses, and collects web document data using the link information that each RSS file provides.

The data collection server 100 is connected to the Internet 10, collects a large number of RSS addresses by a conventional automatic expansion method, and transmits them for storage in a separate database (DB).

The automatic expansion method uses the address notation of common Internet feed resources (e.g., RSS or ATOM) to automatically extract RSS addresses from web documents (HTML files) on the Internet (e.g., the ordinary web under IPv4, or a web that includes electronic appliances under IPv6), and extracts RSS addresses in the same way from the links in those web documents.

That is, starting from preset major portal or blog web documents, the server progressively follows their outbound links and automatically extracts RSS addresses from the pages it visits. Alternatively, it periodically visits the latest RSS files provided by major meta sites, visits the link addresses they contain, and extracts RSS addresses from those web documents.
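One step of such an expansion can be sketched as below. The patent does not say how an RSS address is recognized inside an HTML file; this sketch assumes the common feed autodiscovery convention, in which a page advertises its feed through a `<link>` tag whose `type` is `application/rss+xml` or `application/atom+xml`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: feeds are advertised with these MIME types on <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed addresses and outbound links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []      # RSS/Atom addresses found on the page
        self.outlinks = []   # links to follow in the next expansion step

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        if tag == "link" and (attributes.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append(urljoin(self.base_url, href))
        elif tag == "a":
            self.outlinks.append(urljoin(self.base_url, href))


def extract_feeds(html, base_url):
    """Return (feed addresses, outbound links) for one downloaded page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds, parser.outlinks
```

A crawler would store the returned feed addresses and repeat the same extraction on each outbound link, starting from the preset portal or blog pages.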

In addition, the data collection server 100 receives the RSS files corresponding to the RSS addresses previously stored in the separate database (DB) and collects web document data using the link information that each RSS file provides.

That is, the data collection server 100 works with the separate database (DB) to periodically receive the list of previously collected and stored RSS addresses, visits each RSS address, and downloads the corresponding RSS file. Among the RSS information that each RSS file provides (e.g., title, link, description, category, publication date), it visits the link found in the source link information, collects the corresponding web document data (e.g., RSS address, original link, date, title, body, tags, blog name, category, thumbnail, images, videos, and character/image/video counts), and transmits it to the database (DB) of the data analysis server 200 or the indexing server 300.

When visiting the links found in the source link information, it is preferable to compare the RSS file list previously stored in the separate database (DB) with the downloaded RSS file, and to visit and collect only the links found in the source link information of the RSS items that have been updated.
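As a rough sketch of this incremental collection, the snippet below parses a downloaded RSS 2.0 file and keeps only the item links that have not been collected before. It assumes plain RSS 2.0 without XML namespaces; the function names and the `seen_links` set, which stands in for the separate database, are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Extract title, link, and publication date from each <item> of an RSS 2.0 file."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


def new_links(items, seen_links):
    """Return only the links that are not yet in the stored set (the updated items)."""
    return [it["link"] for it in items if it["link"] and it["link"] not in seen_links]
```

Only the links returned by `new_links` would then be visited to download the underlying web documents.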

Meanwhile, when the data collection server 100 collects web document data, it may also perform an availability check that verifies whether the link provided by each RSS file is alive; a spam RSS check (e.g., commercial RSS such as advertising or adult content, RSS that contains only links to other sites such as report shops, and RSS whose posts are updated too quickly); and a duplicate RSS check (e.g., a single blog that offers RSS 1.0, RSS 2.0, and Atom at the same time, or RSS republished through feedburner or a meta blog).

In addition, in the manner proposed in Patent Application No. 2008-93125 (Opinion Search System and Method Using the Internet), previously filed by the present applicant, the data collection server 100 may download in real time the HTML (Hyper Text Markup Language) information of each website on the Internet 10, extract from the downloaded web document data at least one kind of required information, such as text, images, or video, and store it in a separate data storage means.

The data collection server 100 may also selectively collect web document data that contains opinion information data (i.e., general sentence/document data together with the positive/negative ratings assigned to it).

To selectively collect only the web document data that contains opinion information data, specific web document data containing opinion information data is first selected, a web document selection model is generated from it using a machine learning algorithm described later (e.g., SVM, K-NN, Bayesian), and the generated model is then used to selectively collect, from all Internet web pages, only the web document data that contains opinion information data.

Moreover, the web document data collected by the data collection server 100 may be used as is, or it may be classified by domain using a domain classification module (not shown) before use.

Meanwhile, the Internet 10 refers to the worldwide open computer network architecture that provides the TCP/IP protocol and the various services in its upper layers, namely HTTP (Hyper Text Transfer Protocol), Telnet, FTP (File Transfer Protocol), DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), SNMP (Simple Network Management Protocol), NFS (Network File System), and NIS (Network Information Service). It provides an environment in which the user terminal 500 can easily connect to the web server 400 described later. The Internet 10 may be a wired or wireless Internet, or it may be a core network integrated with a wired public network, a wireless mobile communication network, a portable Internet, or the like.

The data analysis server 200 receives the web document data collected by the data collection server 100, extracts meta information for each entity, and uses that meta information to analyze the positive/negative opinion statistics for each sub-theme of each entity.

Here, the data collection server 100 designates in advance specific websites that periodically update the meta information on each entity (e.g., actors, director, and release date for a movie), such as http://movie.daum.net/moviedetail/moviedetailMain.do?movield=52800, and collects specific web document data from them. The data analysis server 200 then extracts the meta information for each entity from that web document data using regular expressions that describe strings of a preset form.

A regular expression is a notation that designates strings of a certain form. For example, in the regular expression 『a href="(.*?)"』, 『( )』 denotes a string group, 『.』 matches any character, 『*』 means that the character occurs zero or more times, and 『?)"』 means that the group extends only up to the next 『"』 character.

Accordingly, the string 『http://test.com』 can be found in the string <a href="http://test.com">. Regular expressions of this kind make it possible to extract the meta information on each entity. The extracted information is transmitted to the indexing server 300 so that it can be stored immediately.
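The example above can be reproduced directly with Python's standard `re` module. The pattern is the one given in the text; the function name `extract_hrefs` is only illustrative.

```python
import re

# The pattern from the text: a non-greedy group that stops at the next double quote.
HREF_PATTERN = re.compile(r'a href="(.*?)"')


def extract_hrefs(html):
    """Return the contents of every href group matched in the given HTML."""
    return HREF_PATTERN.findall(html)
```

Because the group is non-greedy, each match ends at the first closing quote, so several links on one line are returned separately instead of being merged into one long match.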

In this embodiment the data analysis server 200 extracts the per-entity meta information and transmits it to the indexing server 300, but the invention is not limited to this arrangement: the data collection server 100 may extract the per-entity meta information and transmit it directly for storage in the indexing server 300 or in a separate database (DB).

In addition, the data analysis server 200 may include: a first module 210 that classifies the web document data collected by the data collection server 100 by field, using machine learning models preset for each field (e.g., movies, politics, economy, games, sports); a second module 220 that performs language processing on the collected web document data to extract opinion sentences and classifies the extracted opinion sentences into positive/negative opinion expressions; a third module 230 that uses the extracted per-entity meta information to determine which entity the collected web document data relates to; a fourth module 240 that uses the words, part-of-speech information, and the like surrounding the opinion sentences extracted by the second module 220 to determine which sub-theme they relate to; and a fifth module 250 that uses the extracted per-entity meta information together with the information output by the first to fourth modules 210 to 240 to analyze the positive/negative opinion statistics for each sub-theme of each entity.

The second module 220 may include a language processing unit 221, an opinion/non-opinion classification unit 222, and an opinion expression classification unit 223.

The language processing unit 221 splits the web document data collected by the data collection server 100, or stored in a separate database (DB), into sentences and performs language processing on each sentence to extract linguistic features.

The language processing is preferably performed as morphological analysis or word segmentation, but particle (postposition) processing for feature (or index term) extraction, Korean inflection processing, or restoration to base form may also be performed.

The opinion/non-opinion classification unit 222 uses the linguistic features of each sentence extracted by the language processing unit 221 to distinguish opinion sentences from non-opinion sentences.

That is, some of the sentences extracted by the language processing unit 221 contain opinions and others are ordinary sentences without opinions. The opinion/non-opinion classification unit 222 separates the sentences that contain an opinion from those that do not.

The opinion/non-opinion classification unit 222 can be readily implemented with the conventional machine learning algorithms mentioned above. Specifically, a data set consisting of opinions and a data set consisting only of factual information are first collected. Morphological analysis, word segmentation, or the like is then performed to extract suitable linguistic features.

Here, segmentation is the process of dividing an input sentence into meaningful units. For example, if the input sentence is "나는 영화를 재밌게 봤다" ("I enjoyed watching the movie"), the output is "나 는 영화 를 재밌 게 보 았 다".

Morphological analysis is the task of finding the part-of-speech information carried by each of the segmented units. For example, if the input sentence is "나는 영화를 재밌게 봤다", the output is "나 (CTP1, first-person pronoun) + 는 (fjb, auxiliary particle) 영화 (CMCN, non-predicative common noun) + 를 (fjco, object case particle) 재밌 (YBDO, general verb) + 게 (fmoca, auxiliary connective ending) 보 (YBDO, general verb) + 았 (fmbtp, past-tense pre-final ending) + 다 (fmofd, declarative final ending)".

Next, using the extracted linguistic features, training is performed with a conventional machine learning algorithm such as Naïve Bayesian, SVM, K-NN, or another model.

Once training is complete, the result is an opinion/non-opinion classification model, that is, the opinion/non-opinion classification unit 222, which can determine whether any input sentence or document is opinion data or factual data.
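The training procedure just described can be illustrated with a small multinomial Naive Bayes classifier written from scratch. This is a sketch under stated assumptions rather than the classifier used in the patent: the class name, the add-one smoothing, and the English toy tokens are all illustrative, and each sentence is assumed to arrive as a token list produced by the language processing step.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on one set of opinion sentences and one set of factual sentences, as the text describes, yields a model whose `predict` method plays the role of the opinion/non-opinion decision.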

The opinion expression classification unit 223 classifies the linguistic features of the opinion sentences identified by the opinion/non-opinion classification unit 222 into positive/negative opinion expressions.

That is, the opinion expression classification unit 223 finds the parts of an input opinion sentence that express a positive or negative opinion and marks them. Alternatively, the opinion expression classification unit 223 may be applied directly, without the opinion/non-opinion classification unit 222, to mark the positive/negative expressions in an input sentence.

The opinion expression classification unit 223 quantifies the degree of positivity/negativity of all words, including not only collocations but also ordinary independent words and word phrases (eojeol), uses the result as a resource, and may be used to generate a machine learning model for finding positive/negative expressions within a sentence.

In addition, when classifying the extracted opinion sentences into positive/negative opinion expressions, the second module 220 may perform hybrid opinion mining (e.g., cascading, interpolation, or mixed opinion analysis) by suitably applying a rule-based model and/or a machine learning model.

The cascading opinion analysis method is a rules-first, statistics-second approach: a rule set with very high accuracy is defined, the rules are applied first to the web document data crawled by the data collection server 100 to determine the opinion, and a learned model is applied to the examples that no rule covers. In other words, the cascading method uses machine learning to handle the exceptional cases that rules cannot cover, which makes it highly useful in practice.

When the cascading opinion analysis method is applied, the second module 220 may classify an extracted opinion sentence into a positive/negative opinion expression by applying the preset rule-based model if a predefined rule exists for that sentence, and by applying the preset machine learning model if no predefined rule exists for it.
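A minimal sketch of this rules-first cascade follows. The rule format (a substring paired with a polarity) and the callable `ml_classifier` are assumptions made for illustration; the patent does not specify how rules are represented.

```python
def cascade_classify(sentence, rules, ml_classifier):
    """Apply high-accuracy rules first; fall back to the learned model otherwise.

    rules:         list of (substring, polarity) pairs (hypothetical rule format)
    ml_classifier: callable mapping a sentence to a polarity label
    Returns (polarity, source), where source is "rule" or "ml".
    """
    for pattern, polarity in rules:
        if pattern in sentence:
            return polarity, "rule"
    return ml_classifier(sentence), "ml"
```

The second element of the result records which stage produced the label, which is useful when checking how many sentences the rule set actually covers.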

The interpolation opinion analysis method determines the positive or negative polarity of an opinion document by two different methods, rule-based and machine learning, and assigns high confidence when the two polarities agree. Using two different methods in this way raises the reliability of the positive/negative judgment. By presenting the opinion polarities with high confidence first, the interpolation method can earn a high level of trust from users.

When the interpolation opinion analysis method is applied, the second module 220 may apply the preset rule-based model and the machine learning model to an extracted opinion sentence at the same time to determine the positive/negative opinion expression, assign different confidence scores depending on whether the two results agree, and classify the sentence into a positive/negative opinion expression on the basis of the confidence score.
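The agreement-based scoring can be sketched as follows. The concrete scores (1.0 and 0.5) and the choice to keep the machine learning label when the two models disagree are assumptions; the text only states that the confidence differs depending on whether the results agree.

```python
def interpolate_classify(sentence, rule_model, ml_model,
                         agree_score=1.0, disagree_score=0.5):
    """Run both models and attach a confidence score based on their agreement.

    rule_model, ml_model: callables mapping a sentence to a polarity label.
    Returns (polarity, confidence).
    """
    rule_label = rule_model(sentence)
    ml_label = ml_model(sentence)
    if rule_label == ml_label:
        return ml_label, agree_score
    # Assumption: on disagreement, keep the learned model's label at lower confidence.
    return ml_label, disagree_score
```

Results can then be sorted by confidence so that the polarities on which both methods agree are presented first.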

The mixed opinion analysis method uses the cascading method, which applies rules first, in exceptional situations where applying a rule yields high performance, and uses the interpolation method, which relies on the two different methods (rule-based and machine learning), in ordinary situations. It is strong in the areas that are hard to handle by machine learning but where definite rules can be set, while still retaining the notion of confidence.

When the mixed opinion analysis method is applied, the second module 220 determines whether an extracted opinion sentence is a predefined exception-handling rule candidate sentence. If it is, the second module may classify the sentence into a positive/negative/neutral opinion expression by applying either the predefined exception-handling rule-based model or the machine learning model, depending on whether a predefined exception-handling rule exists.

If the extracted opinion sentence is not a predefined exception-handling rule candidate sentence, the second module (220) applies the preset rule-based model and the machine learning model to it simultaneously to judge the positive/negative opinion expression, assigns different confidence scores depending on whether the two results agree, and classifies the expression as positive or negative on the basis of those scores.

Further, if the extracted opinion sentence is a predefined exception-handling rule candidate sentence and a predefined exception-handling rule exists, the second module (220) applies the predefined exception-handling rule-based model to classify it as a positive, negative, or neutral opinion expression. If no predefined exception-handling rule exists for the sentence, it applies the preset machine learning model to classify it as a positive, negative, or neutral opinion expression.

The indexing server (300) indexes the per-entity meta information collected and/or analyzed by the data collection server (100) and/or the data analysis server (200), together with the positive/negative opinion statistics for each sub-theme of each entity, so that they are stored in a database (DB).

Here, indexing is a technique commonly used in search engines. The rank referred to below is computed with an information retrieval algorithm that uses, among other things, the number of backlinks and outbound links of a web page. Under this algorithm, a link with many backlinks is favored, and each link distributes its own score among the links that extend from it.
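The ranking idea described above (backlinks are favored, and each link shares its score among its outbound links) resembles a PageRank-style iteration. The following is a minimal sketch under that reading only; the function name `link_rank`, the damping factor, and the handling of pages without outbound links are assumptions, not details given in the source.

```python
def link_rank(links, iterations=20, damping=0.85):
    """PageRank-style scores; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score (assumed damping scheme).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its own score among the links extending from it.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outbound links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: "c" has two backlinks, "a" has one, "b" has none.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
example_rank = link_rank(example_links, iterations=50)
```

In this toy graph the page with the most backlinks ends up with the highest score, which is the behavior the paragraph describes.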

For a compound search keyword rather than a single search keyword, the links common to all morphemes of the user's search keyword are extracted first. A score is then computed by combining how close together the morphemes occur within each linked document with the rank of that link, and the results are shown to the user in score order.

In brief, for the morphemes of each link, an inverted list keyed by morpheme is built and stored in rank order. When the user enters a search keyword, the links matching it are retrieved from the inverted list and shown to the user in order of a score computed using Boolean operations, distance operations, TF (term frequency), IDF (inverse document frequency), and the like.
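The retrieval flow above can be sketched roughly as follows. This is an illustration only: the function names, the particular TF-IDF formula, and the proximity bonus are assumptions, and the link rank mentioned in the text is left out for brevity.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each morpheme to {doc_id: [positions]}; docs maps doc_id -> morpheme list."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for pos, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, docs, query_terms):
    """Boolean AND over the query terms, then rank by TF-IDF plus a proximity bonus."""
    postings = [index.get(t, {}) for t in query_terms]
    if not postings:
        return []
    common = set(postings[0])
    for p in postings[1:]:
        common &= set(p)  # Boolean operation: keep documents containing every term
    results = []
    for doc_id in common:
        score = 0.0
        for p in postings:
            tf = len(p[doc_id]) / len(docs[doc_id])
            idf = math.log(len(docs) / len(p)) + 1.0
            score += tf * idf
        if len(postings) > 1:
            # Distance operation: smaller gaps between adjacent query terms score higher.
            gaps = [min(abs(x - y) for x in a[doc_id] for y in b[doc_id])
                    for a, b in zip(postings, postings[1:])]
            score += 1.0 / (1.0 + sum(gaps))
        results.append((doc_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical, already morpheme-analyzed documents.
docs = {
    "d1": ["good", "story", "movie"],
    "d2": ["story", "bad", "long", "movie", "good"],
    "d3": ["movie"],
}
index = build_inverted_index(docs)
ranked = search(index, docs, ["good", "story"])
```

Here `d1` ranks above `d2` because the two query terms are adjacent and make up a larger share of the document.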

The positive/negative opinion statistics for each sub-theme of each entity stored in the indexing server (300) may consist of, for example, at least one of: an entity ID, a sub-theme, the number of positive/negative opinion expressions for each sub-theme of each entity, the total number of opinion expressions, or the content of the opinion expressions that use each sub-theme.

The indexing server (300) may also group the sub-themes by clustering before storing them. The clustering uses information on which opinion expressions each sub-theme appeared with. For example, sub-themes such as tale, content, story, and plot may be grouped as "story", and sub-themes such as action, spectacle, and scene may be grouped as "scene".
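One way to picture clustering sub-themes by the opinion expressions they co-occur with is a greedy pass over co-occurrence vectors. The source does not name a clustering algorithm, so the cosine similarity, the threshold, and the sample counts below are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sub_themes(cooccurrence, threshold=0.5):
    """Greedy clustering: a sub-theme joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, members)
    for sub_theme, vector in cooccurrence.items():
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(sub_theme)
                break
        else:
            clusters.append((vector, [sub_theme]))
    return [members for _, members in clusters]


# Hypothetical counts of opinion expressions seen together with each sub-theme.
cooccurrence = {
    "story": Counter({"solid": 3, "boring": 2}),
    "plot": Counter({"solid": 2, "boring": 1}),
    "action": Counter({"spectacular": 4, "flashy": 1}),
    "scene": Counter({"spectacular": 2, "flashy": 2}),
}
groups = cluster_sub_themes(cooccurrence)
```

With these sample counts, "story" and "plot" share the same opinion expressions and fall into one group, while "action" and "scene" form another.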

The positive/negative opinion statistics for each sub-theme of each entity stored in the indexing server (300) may also be stored so that each entity is mapped at the document/paragraph level.

The web server (400) receives the user search keywords transmitted from each user terminal (500) over the Internet (10), works with the indexing server (300) to determine whether the user search keyword contains pre-stored meta information and/or a sub-theme keyword, and, if it does, displays on the screen of the user terminal (500) the entity list results related to that meta information and/or sub-theme keyword.

If the user search keyword contains no pre-stored meta information or sub-theme keyword, the web server (400) may search the entity-mapped indexing server (300) with the user search keyword and display the related entity result list.

The web server (400) may include a keyword analysis module (410) and a keyword search module (420). The keyword analysis module (410) analyzes the user search keyword transmitted from the user terminal (500) to determine whether it contains meta information or a sub-theme keyword stored in the indexing server (300), and classifies the keyword search method according to the result. The keyword search module (420), following the search method classified by the keyword analysis module (410), works with the indexing server (300) to search for entities related to the meta information or sub-theme keyword and displays the entity list results on the screen of the user terminal (500).

If the user search keyword transmitted from the user terminal (500) contains both pre-stored meta information and a sub-theme keyword, the web server (400) may work with the indexing server (300) to search for entities related to the meta information keyword, then re-sort the entity result list for the retrieved entities by the positive, negative, or total opinion count of the sub-theme keyword and display it.

If the user search keyword transmitted from the user terminal (500) contains neither pre-stored meta information nor a sub-theme keyword, the web server (400) may perform morphological analysis on the user search keyword, compare the analyzed keyword with the documents/paragraphs stored in the indexing server (300) to retrieve a document/paragraph result list, and display on the user terminal (500) the entity result list mapped to the retrieved document/paragraph result list.

If the user search keyword transmitted from the user terminal (500) as a whole does not match the pre-stored meta information and sub-theme keywords, but a pre-stored meta information keyword is present within it, the web server (400) may perform morphological analysis on the user search keyword and compare the analyzed keyword, excluding the meta information keyword, with the documents/paragraphs stored in the indexing server (300) to retrieve a document/paragraph result list. It then retrieves the entity result list mapped to that document/paragraph result list, filters it down to the entities related to the meta information keyword, and displays the result on the user terminal (500).

If the user search keyword transmitted from the user terminal (500) as a whole does not match the pre-stored meta information and sub-theme keywords, but a pre-stored sub-theme keyword is present within it, the web server (400) may perform morphological analysis on the user search keyword and compare the analyzed keyword, excluding the sub-theme keyword, with the documents/paragraphs stored in the indexing server (300) to retrieve a document/paragraph result list. It then retrieves the entity result list mapped to that document/paragraph result list, re-sorts it in descending order of the positive or total opinion count for the sub-theme keyword, and displays the result on the user terminal (500).

If the user search keyword transmitted from the user terminal (500) as a whole does not match the pre-stored meta information and sub-theme keywords, but both a pre-stored meta information keyword and a sub-theme keyword are present within it, the web server (400) may perform morphological analysis on the user search keyword and compare the analyzed keyword, excluding the meta information and sub-theme keywords, with the documents/paragraphs stored in the indexing server (300) to retrieve a document/paragraph result list. It then retrieves the entity result list mapped to that document/paragraph result list, filters it down to the entities related to the meta information keyword, re-sorts the filtered list in descending order of the positive or total opinion count for the sub-theme keyword, and displays the result on the user terminal (500).
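The search cases described above can be read as a small routing decision made from which kinds of keyword appear in the query. The sketch below is one possible interpretation, not the patented logic itself: the function `plan_search`, the step names, and the treatment of a query that contains only sub-theme keywords are assumptions.

```python
def plan_search(query_tokens, meta_keywords, sub_theme_keywords):
    """Return an ordered list of (step, terms) describing how a query would be served."""
    meta = [t for t in query_tokens if t in meta_keywords]
    subs = [t for t in query_tokens if t in sub_theme_keywords]
    rest = [t for t in query_tokens
            if t not in meta_keywords and t not in sub_theme_keywords]
    steps = []
    if rest:
        # Remaining morphemes are matched against stored documents/paragraphs,
        # and the entities mapped to those documents form the candidate list.
        steps.append(("search_documents", rest))
        if meta:
            steps.append(("filter_by_meta", meta))
    elif meta:
        steps.append(("lookup_by_meta", meta))
    elif subs:
        # Assumption: with only sub-theme keywords, entities are looked up by sub-theme.
        steps.append(("lookup_by_sub_theme", subs))
    if subs:
        # Re-sort by the positive or total opinion count for the sub-theme keyword.
        steps.append(("sort_by_opinion", subs))
    return steps


# Hypothetical stored keywords and queries.
META = {"Song Kang-ho"}
SUBS = {"story"}
plan_both = plan_search(["Song Kang-ho", "story"], META, SUBS)
plan_mixed = plan_search(["sad", "Song Kang-ho", "story"], META, SUBS)
plan_free = plan_search(["sad", "ending"], META, SUBS)
```

Keeping the plan as data rather than executing it directly makes each case easy to compare with the corresponding paragraph.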

The web server (400) may also display each entity's opinion statistics on the user terminal (500) together with the entity list results.

The web server (400) may also display the entity result list in descending order of positive/negative opinion count, according to the opinion analysis results for the user search keyword transmitted from the user terminal (500).

The user terminal (500) connects to the web server (400) through a wired or wireless communication network such as a network or the Internet, and can receive the various services provided by the web server (400) through an ordinary web browser.

The user terminal (500) is preferably implemented as a personal computer (PC), but is not limited thereto. It may be implemented as any device with a communication function that can connect to the Internet, such as a notebook computer, a personal digital assistant (PDA), a PDA phone, or a DMB (Digital Multimedia Broadcasting) phone with a communication function.

An entity search using the Internet and a hybrid-based opinion analysis method for it, according to an embodiment of the present invention, are described in detail below.

FIG. 3 is an overall flowchart illustrating an entity search using the Internet and a hybrid-based opinion analysis method for it according to an embodiment of the present invention, and FIGS. 4 and 5 are flowcharts detailing the process of analyzing the positive/negative opinion statistics for each sub-theme of each entity as applied in an embodiment of the present invention.

Referring to FIGS. 3 to 5, for Internet users to be provided with an entity result list and opinion statistics for a particular user search keyword, the meta information for each entity {for example, for a movie, actors (Song Kang-ho, Lee Byung-hun, etc.), directors (Bong Joon-ho, Kang Je-gyu, etc.), release date, etc.} and the keywords for the sub-themes must first be extracted and stored in advance.

The meta information for each entity is obtained by extracting in advance, and storing, all the values in each field of the database (DB) of the indexing server (300, see FIG. 1). The sub-themes are the information held when the clustering was performed.

That is, web document data on the Internet is collected through the data collection server (100, see FIG. 1) (S100). The data analysis server (200, see FIG. 1) then extracts per-entity meta information from the web document data collected in step S100 and uses it to analyze the positive/negative opinion statistics for each sub-theme of each entity (S200).

For the per-entity meta information, websites that periodically update such information on the Internet are designated in advance for the web crawler of the data collection server (100), and the information is updated from the designated websites using regular expressions.

The positive/negative opinion statistics for each sub-theme of each entity are built as follows. First, web document data is collected through the data collection server (100). The data analysis server (200) then classifies which domain (for example, movies, politics, economy, games, sports) the collected web document data belongs to using a preset machine-learning-based automatic classification model, and finds the positive/negative opinion expressions in each sentence of the collected data through opinion mining.

The data analysis server (200) also determines which entity the collected web document data refers to. That is, it uses the meta information of each entity (for example, for a movie: director, actors) to determine which entity the data is closest to.

For example, if entity A has meta information "a, b, c", the probability that a collected web document (Doc1) refers to entity A is computed from, among other things, the occurrence frequency of the entity name and of each piece of meta information in Doc1.

Whereas the comparison above is made per document, the web document data (Doc1) may instead be divided into sentences, paragraphs, or arbitrary passages, and the probability that each unit refers to a given entity may be computed using the meta information as described above.

After the entity candidates referred to by each data unit (document/sentence/paragraph/arbitrary passage) are obtained, the entity with the highest probability may be selected, or the top N entities by probability may be selected.

Then, for the entity selected for each data unit, the opinion information appearing in that unit is regarded as opinion information about that entity and is stored in the database (DB) of the indexing server (300).

The data analysis server (200) also finds which sub-theme each opinion in the collected web document data refers to. The sub-theme is classified using the meta information, words, and part-of-speech information surrounding it.

For example, in the sentence 『이번/NNG 영화/NNG + 는/JX 스토리/NNG + 가/JKS 정말/MAG <positive>괜찮/VA + 았/EP + 다/EF</positive>』 ("The story of this movie was really good"), the sub-theme of the opinion "괜찮았다" ("was good") must be found. Here the subject particle 『가(JKS)』 indicates that the noun "스토리/NNG" ("story") immediately before it is the sub-theme.
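The particle-based heuristic in this example can be sketched as a scan to the left of the opinion expression. This is a simplified illustration: the function name `find_sub_theme` and the assumption that the sentence is already tagged as (morpheme, tag) pairs are not from the source.

```python
def find_sub_theme(tagged, opinion_index):
    """Return the common noun (NNG) right before the nearest subject particle (JKS)
    to the left of the opinion expression, or None if there is none."""
    for i in range(opinion_index - 1, 0, -1):
        if tagged[i][1] == "JKS" and tagged[i - 1][1] == "NNG":
            return tagged[i - 1][0]
    return None


# The tagged example sentence from the text; the opinion starts at 괜찮/VA.
tagged_sentence = [
    ("이번", "NNG"), ("영화", "NNG"), ("는", "JX"),
    ("스토리", "NNG"), ("가", "JKS"), ("정말", "MAG"),
    ("괜찮", "VA"), ("았", "EP"), ("다", "EF"),
]
sub_theme = find_sub_theme(tagged_sentence, opinion_index=6)
```

For the example sentence the scan stops at 가/JKS and returns 스토리, matching the reading given in the text.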

To give a more concrete example, suppose that for the movie entity "신기전" there are four pieces of meta information in total: "Director: 김유진" and "Actors: 정재영, 한은정, 허준호". For a document/paragraph A, the system first checks for surrounding words that indicate an entity, such as "신기전", [신기전], '신기전', 영화 신기전 (the movie 신기전), 신기전 후기 (신기전 review), or 신기전 감상 (impressions of 신기전), and decides whether the document/paragraph is likely to mention the entity "신기전".

Once the surrounding context indicates that the text refers to "신기전" to some degree, the system checks how many of the four pieces of meta information above are present. If two of them, "정재영" and "김유진", are present, then 2 of the 4, that is 50%, are found. An arbitrary threshold is set, and a document/paragraph at or above it is taken to refer to that entity.
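The worked example (2 of 4 meta values found, so 50%) can be expressed as a simple ratio check. The sketch below assumes plain substring matching and a threshold of 0.5; both are illustrative choices, as the source leaves the threshold arbitrary.

```python
def meta_match_ratio(text, meta_values):
    """Fraction of an entity's meta values that occur in the text."""
    if not meta_values:
        return 0.0
    found = [m for m in meta_values if m in text]
    return len(found) / len(meta_values)


def refers_to_entity(text, entity_name, meta_values, threshold=0.5):
    """True if the entity name appears and enough meta values are present."""
    if entity_name not in text:
        return False
    return meta_match_ratio(text, meta_values) >= threshold


# Meta values from the example; the paragraph text itself is hypothetical.
meta = ["김유진", "정재영", "한은정", "허준호"]
paragraph = "영화 신기전 후기: 김유진 감독, 정재영 주연"
ratio = meta_match_ratio(paragraph, meta)
```

Raising `threshold` above 0.5 would reject this paragraph, which shows how the cut-off trades recall for precision.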

As described above, the sub-theme can be predicted from the opinion expression and the part-of-speech information around it. Both a rule-based method, which specifies each case directly, and a machine learning method, which finds the words most likely to be a sub-theme in a given context, are possible.

Before the opinion statistics are stored, the sub-themes may first be grouped by clustering. For example, sub-themes such as tale, content, story, and plot may be grouped as "story". The clustering uses information on which opinion expressions each sub-theme appeared with.

The positive/negative opinion statistics for each sub-theme of each entity may consist of, for example, at least one of: an entity ID, a sub-theme, the number of positive/negative opinion expressions for each sub-theme of each entity, the total number of opinion expressions, or the content of the opinion expressions that use each sub-theme.

For example, if an expression with a positive opinion is found concerning the "감동" (emotional impact) aspect of entity A, the positive count for the sub-theme "감동" in the database (DB) record for entity A is incremented by 1 and stored.
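The counting step can be pictured with an in-memory table keyed by entity and sub-theme. This is only a sketch: the field names and the helpers `new_stats` and `record_opinion` are hypothetical, and a real system would write to the indexing server's database instead.

```python
from collections import defaultdict


def new_stats():
    """Opinion statistics keyed by (entity_id, sub_theme)."""
    return defaultdict(lambda: {"positive": 0, "negative": 0, "total": 0, "expressions": []})


def record_opinion(stats, entity_id, sub_theme, polarity, expression):
    """Increment the counters for one opinion expression found in a text unit."""
    row = stats[(entity_id, sub_theme)]
    if polarity in ("positive", "negative"):
        row[polarity] += 1
    row["total"] += 1
    row["expressions"].append(expression)
    return row


stats = new_stats()
# Hypothetical expression; "A" and "감동" follow the example in the text.
record_opinion(stats, "A", "감동", "positive", "정말 감동적이었다")
```

After this call the positive count for the sub-theme "감동" of entity A is 1, as in the example.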

That is, as shown in FIGS. 4 and 5, step S200 may include: classifying the web document data collected in step S100 by domain using preset per-domain machine learning models (S210); performing language processing on the web document data collected in step S100 to extract opinion sentences and classifying each extracted opinion sentence as a positive/negative opinion expression (S220); determining which entity the web document data collected in step S100 corresponds to using the per-entity meta information (S230); and determining which sub-theme is concerned using the words and part-of-speech information around the opinion sentence extracted in step S220 (S240).

Here, step S220 may include: splitting the web document data collected in step S100 into sentences and performing language processing on each sentence to extract linguistic features (S220-1); distinguishing opinion sentences from non-opinion sentences using the linguistic features of each sentence extracted in step S220-1 (S220-2); and classifying the linguistic features of the opinion sentences identified in step S220-2 as positive/negative opinion expressions (S220-3).

In step S220, when the extracted opinion sentence is classified as a positive/negative opinion expression, a hybrid opinion mining scheme (for example, cascading, interpolation, or mixed opinion analysis) may be carried out through appropriate application of the rule-based model and/or the machine learning model.

FIGS. 6 to 9 are flowcharts detailing the hybrid-based opinion analysis method according to an embodiment of the present invention: FIG. 6 is a flowchart of the cascading opinion analysis method, FIG. 7 is a flowchart of the interpolation opinion analysis method, FIG. 8 shows the confidence score for each opinion expression of FIG. 7, and FIG. 9 is a flowchart of the mixed opinion analysis method.

Referring to FIG. 6, it is determined whether a predefined rule exists for the opinion sentence extracted in step S220 (S221). If the determination in step S221 shows that a predefined rule exists for the extracted opinion sentence, the preset rule-based model is applied to classify it as a positive/negative opinion expression (S222).

For example, the opinion sentence "이 영화는 더 좋을 수 없다." ("This movie couldn't be better.") is classified as a positive opinion by the preset rule-based model (더 "more" + positive word + 없 "not" → positive).

Otherwise, if the determination in step S221 shows that no predefined rule exists for the extracted opinion sentence, the preset machine learning model is applied to classify it as a positive/negative opinion expression (S223).

For example, if no predefined rule exists for the opinion sentence "이 영화는 그냥 뭐 그렇다." ("This movie is just so-so."), the opinion is analyzed phrase by phrase with the preset machine learning model, which estimates the probability of the opinion being positive or negative, and the sentence is classified as a negative opinion.
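The cascading flow of FIG. 6 (rule first, machine learning as the fallback) can be sketched as below. The rule is a rough regular-expression rendering of the "더 + positive word + 없" pattern from the example; the positive-word list and the stand-in classifier passed as `ml_model` are hypothetical.

```python
import re

# Hypothetical lexicon of positive stems used by the single illustrative rule.
POSITIVE_STEMS = ("좋", "괜찮", "훌륭")
# 더 + positive word + 없  ->  positive
RULE = re.compile(r"더\s*(?:" + "|".join(POSITIVE_STEMS) + r").*없")


def apply_rules(sentence):
    """Return a polarity if a predefined rule matches, else None (S221/S222)."""
    return "positive" if RULE.search(sentence) else None


def cascade_classify(sentence, ml_model):
    """Apply rules first; fall back to the machine learning model (S223)."""
    label = apply_rules(sentence)
    if label is not None:
        return label, "rule"
    return ml_model(sentence), "ml"


def toy_ml_model(sentence):
    """Stand-in for a trained classifier; always answers 'negative' here."""
    return "negative"


by_rule = cascade_classify("이 영화는 더 좋을 수 없다.", toy_ml_model)
by_ml = cascade_classify("이 영화는 그냥 뭐 그렇다.", toy_ml_model)
```

Returning the source of the decision ("rule" or "ml") alongside the label makes it easy to audit which branch handled a sentence.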

Referring to FIGS. 7 and 8, the preset rule-based model and machine learning model are applied simultaneously to the opinion sentence extracted in step S220, each judging the positive/negative opinion expression (S224a and S224b). Different confidence scores are then assigned depending on whether the results of the two models agree (S225), and the expression is classified as positive or negative on the basis of the confidence scores (S226).

For example, for the opinion sentence "이번에 본 영화 A는 전반적인 스토리가 너무 괜찮았다." ("Movie A, which I saw this time, had a really good overall story."), the preset rule-based model classifies it as a positive opinion (e.g., 이번에 본 영화 A는 전반적인 스토리가 너무 <positive>괜찮았다</positive>.). The preset machine learning model, analyzing the opinion phrase by phrase and estimating the probability of it being positive or negative, also classifies it as a positive opinion (e.g., 이번에 본 영화 A는 전반적인 스토리가 <positive>너무 괜찮았다</positive>.).

Next, as shown in FIG. 8, different confidence scores are assigned depending on whether the results of the rule-based model and the machine learning model agree: 2 points for Positive-Positive or Negative-Negative ("괜찮았다" in the example); 1 point for results in which one model is neutral, that is Positive-Objective or Negative-Objective ("너무" in the example); and -1 point when the two models give completely opposite results, that is Positive-Negative or Negative-Positive. An expression is then classified as a positive/negative opinion expression only when its confidence score is at or above a specified value.
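The scoring table of FIG. 8 maps directly onto a small function. In this sketch the 2, 1, and -1 values follow the text, while the score of 0 for the case where both models say "objective", the default `min_score`, and the function names are assumptions.

```python
POLAR = {"positive", "negative"}


def confidence(rule_label, ml_label):
    """Confidence score from the agreement of the rule-based and ML results."""
    if rule_label in POLAR and ml_label in POLAR:
        return 2 if rule_label == ml_label else -1  # agreement vs. opposite results
    if rule_label in POLAR or ml_label in POLAR:
        return 1  # one model is neutral (objective)
    return 0  # assumption: the source gives no score when both are objective


def interpolate(rule_label, ml_label, min_score=1):
    """Return (label, score); label is None when the score is below min_score."""
    score = confidence(rule_label, ml_label)
    if score < min_score:
        return None, score
    label = rule_label if rule_label in POLAR else ml_label
    return label, score


agree = interpolate("positive", "positive")
partial = interpolate("objective", "positive")
conflict = interpolate("positive", "negative")
```

With `min_score=2` only full agreement would be accepted, which corresponds to a stricter confidence cut-off.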

Referring to FIG. 9, it is first determined whether the opinion sentence extracted in step S220 is a predefined exception-handling rule candidate sentence (S227). Examples of such candidates are "It couldn't be better than this", "I saw the movie '우리 생애 최고의 순간' (The Best Moment of Our Lives)", and "... The protagonist A enjoyed this situation. ...". If the sentence is a candidate, either a predefined exception-handling rule-based model or a machine learning model is applied, depending on whether a predefined exception-handling rule exists, to classify the sentence as a positive/negative/neutral opinion expression.

Specifically, if the determination in step S227 shows that the extracted opinion sentence is a predefined exception-handling rule candidate sentence and a predefined exception-handling rule exists for it, the predefined exception-handling rule-based model is applied to classify it as a positive/negative/neutral opinion expression. If no predefined exception-handling rule exists for the sentence, the preset machine learning model is applied instead (S228).

If, on the other hand, the determination in step S227 shows that the extracted opinion sentence is not a predefined exception-handling rule candidate sentence, the preset rule-based model and the machine learning model are applied simultaneously to determine the positive/negative opinion expression, different confidence scores are assigned according to whether the two results agree, and the expression is classified as positive or negative based on the confidence score (S229).
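
The branching of FIG. 9 (S227 to S229) can be sketched as a small dispatcher. The four callables passed in are hypothetical stand-ins for components the text names but does not specify (the candidate test, the exception-rule lookup, the machine learning model, and the combined rule-based/ML procedure of FIGS. 7 and 8).

```python
from typing import Callable, Optional

Classifier = Callable[[str], str]


def classify_opinion(
    sentence: str,
    is_exception_candidate: Callable[[str], bool],
    find_exception_rule: Callable[[str], Optional[Classifier]],
    ml_model: Classifier,
    hybrid_model: Classifier,
) -> str:
    """Route one opinion sentence through the branches of FIG. 9."""
    if is_exception_candidate(sentence):          # S227
        rule = find_exception_rule(sentence)
        if rule is not None:
            return rule(sentence)                 # S228: exception rule exists
        return ml_model(sentence)                 # S228: no rule, use ML model
    return hybrid_model(sentence)                 # S229: both models + scoring
```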

Next, the entity-specific meta information analyzed in step S200, together with the positive/negative opinion statistics for each target of each entity, is indexed by the indexing server (300, see FIG. 1) so that it is stored in a database (S300).

In step S300, the positive/negative opinion statistics for each target of each entity may be stored so that entities are mapped to documents/paragraphs. That is, the indexing server 300 may store records in a database (DB) with fields such as document/paragraph ID, title, content, tags, category (e.g., movies, politics, economy), and a list of entity names.
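
One possible shape for such a record is shown below. The class and field names are illustrative assumptions; the text lists the fields only informally and does not describe the storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedPassage:
    """One document/paragraph record with its mapped entities (hypothetical)."""
    passage_id: str                     # document/paragraph ID
    title: str
    content: str
    tags: list[str] = field(default_factory=list)
    category: str = ""                  # e.g. "movie", "politics", "economy"
    entity_names: list[str] = field(default_factory=list)
```

Keeping the entity list on the passage record is what later allows a plain text search over passages to be turned into an entity result list.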

Finally, the web server 400 determines whether the user search keyword transmitted from the user terminal (500, see FIG. 1) contains pre-stored meta information or a target keyword. If it does, the entity list results related to that meta information or target keyword are displayed on the screen of the user terminal 500 (S400). In step S400, the entity result list may be displayed in descending order of positive/negative opinion count, according to the result of opinion analysis on the user search keyword.

For example, a search can be centered on field names, that is, meta information keywords such as movie title, actor name, production year, genre, and country. When the user searches for "movies directed by Bong Joon-ho", the system determines that the meta information keyword "Bong Joon-ho" is present and retrieves the movies directed by Bong Joon-ho from the indexing server 300.

Likewise, when the user searches for "Song Kang-ho movies", the system determines that the meta information keyword "Song Kang-ho" is present and retrieves the movies starring Song Kang-ho from the indexing server 300.

Searches can also be centered on target keywords such as overall review, story, acting, actors, fun, emotional impact, plot twist, and music, with results sorted by the positive/negative opinions for each target. That is, it becomes possible to search for the movies with the most positive opinions about "emotional impact" or the movies with the most negative opinions about "fun".

For example, when the user searches for "story movie", the system determines that the target keyword "story" is present, and the indexing server 300 retrieves movies with many opinions, or many positive/negative opinions, about "story". The results are sorted by the number of total, positive, or negative opinions on "story".

Likewise, when the user searches for "movies with a good story", the system determines that the target keyword "story" is included and, by applying opinion analysis to the search sentence itself, recognizes that it carries a positive opinion. The indexing server 300 therefore returns movies in descending order of positive opinions about "story".

When the user searches for "movies with a terrible story", the system determines that the target keyword "story" is included and, by applying opinion analysis to the search sentence itself, recognizes that it carries a negative opinion. The indexing server 300 therefore returns movies in descending order of negative opinions about "story".
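
The target-based ordering described above can be sketched as a sort over per-entity opinion counts. The nested-dictionary layout, the function name, and the sample counts in the usage note are invented for illustration; the text states only that results are ordered by total, positive, or negative opinion count for the target.

```python
def rank_by_target(opinion_stats: dict, target: str, order: str = "total") -> list:
    """Order entities by opinion count for one target.

    opinion_stats maps entity -> target -> {"positive": n, "negative": m}.
    order is "positive", "negative" or "total" (chosen from the polarity
    detected in the query, or "total" when the query carries no opinion).
    """
    def count(entity: str) -> int:
        c = opinion_stats[entity].get(target, {})
        pos, neg = c.get("positive", 0), c.get("negative", 0)
        if order == "positive":
            return pos
        if order == "negative":
            return neg
        return pos + neg

    return sorted(opinion_stats, key=count, reverse=True)
```

For instance, with `{"A": {"story": {"positive": 5, "negative": 1}}, "B": {"story": {"positive": 2, "negative": 9}}}`, a query such as "movies with a good story" would use `order="positive"` and list A before B, while "movies with a terrible story" would use `order="negative"` and list B first.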

FIGS. 10 to 14 are detailed flowcharts illustrating the different keyword search methods used depending on the user search keyword.

Referring to FIG. 10, the web server (400, see FIG. 1) determines whether the user search keyword transmitted from the user terminal (500, see FIG. 1) contains a combination of pre-stored meta information and a target keyword (S401). If the determination in step S401 shows that both pre-stored meta information and a target keyword are present, the entities related to the meta information keyword are retrieved using the indexing server (300, see FIG. 1) (S402). The entity result list retrieved in step S402 is then re-sorted by the positive, negative, or total opinion count for the target keyword and displayed on the screen of the user terminal 500 (S403).

For example, the search "Lee Byung-hun's moving movies" combines the meta information keyword "Lee Byung-hun" with the target (sub-theme) "emotional impact". The indexing server 300 first retrieves the entities in which Lee Byung-hun appears, and the entity result list is then re-sorted by the positive, negative, or total opinion count for the target "emotional impact" and displayed on the screen of the user terminal 500.

Referring to FIGS. 10 and 11, the web server 400 determines whether the user search keyword transmitted from the user terminal 500 contains a combination of pre-stored meta information and a target keyword (S401). If the determination in step S401 shows that neither pre-stored meta information nor a target keyword is present, the process proceeds to step (A) and the user search keyword is morphologically analyzed (S404).

The user search keyword morphologically analyzed in step S404 is then compared against the documents/paragraphs stored in the indexing server 300, in which an entity candidate list is mapped to each document/paragraph (S405).

Next, the user search keyword morphologically analyzed in step S404 is searched for in the document/paragraph result list (S406), and the entity result list mapped to the document/paragraph result list found in step S406 is displayed on the screen of the user terminal 500 (S407).

For example, the search "a movie about chasing and being chased" contains neither a meta information keyword nor a target keyword, so the search is performed, as described above, on the database (DB) in which an entity candidate list is mapped to each document/paragraph.

That is, the morphologically analyzed form of the user search keyword (쫓/VV + 고/EC, 쫓기/VV + 는/ETM, 영화/NNG) is searched for in fields of this database such as the title, tags, and body text. Results are ranked by the relevance between the user search keyword and each document/paragraph.

In more detail, given the user search keyword "a movie about chasing and being chased", a document that contains many occurrences of the individual words "chasing", "being chased", and "movie" is considered more relevant (TF: Term Frequency, the number of times a term occurs in the document). In addition, extra weight is given to documents that contain words such as "chasing" and "being chased", which occur less often than "movie" (IDF: Inverse Document Frequency, the total number of documents divided by the number of documents in which the term occurs).
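
A minimal sketch of this TF-IDF relevance score follows. The text defines IDF as the ratio of total documents to documents containing the term; taking the logarithm of that ratio, as done here, is the common convention and is an assumption, as are the function name and the token-list input format.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms: list, docs: dict) -> dict:
    """Score each document against the query terms.

    docs maps a document/paragraph ID to its list of tokens
    (assumed to be already morphologically analyzed).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] * math.log(n_docs / df[t])   # log of the stated ratio (assumed)
            for t in query_terms
            if df[t]                           # skip terms absent from the collection
        )
    return scores
```

A term such as "movie" that occurs in every document gets an IDF of log(1) = 0 and contributes nothing, so the rarer terms "chasing" and "being chased" decide the ranking, which matches the weighting the text describes.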

In this way, the documents/paragraphs related to the user search keyword are retrieved, and the entity list mapped to the entries of the document/paragraph result list is returned as the entity search result.

Once N documents/paragraphs have been retrieved, three methods are possible: a first method that returns the entities mapped to the N results in result order; a second method that counts the frequency of each entity within the top K results and returns entities in descending order of frequency; and a third method that returns entities by interpolating the search rank score and the frequency.
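
The first two methods can be sketched as below. The input is assumed to be the ranked passage list reduced to the entity names mapped to each passage; removing duplicates in the first method and the tie order in the second are assumptions the text does not address.

```python
from collections import Counter


def entities_in_rank_order(ranked_passages: list) -> list:
    """First method: entities in the order their passages were ranked."""
    seen, ordered = set(), []
    for entities in ranked_passages:
        for entity in entities:
            if entity not in seen:      # assumed: each entity listed once
                seen.add(entity)
                ordered.append(entity)
    return ordered


def entities_by_frequency(ranked_passages: list, k: int) -> list:
    """Second method: entities by how often they occur in the top K passages."""
    counts = Counter(e for entities in ranked_passages[:k] for e in entities)
    return [entity for entity, _ in counts.most_common()]
```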

In the third method, the rank score may, for example, be normalized before use. The rank score can be normalized to a value in [0, 1] as "(rank score of Entity[i] - lowest rank score) / (highest rank score - lowest rank score)".

The frequency score can likewise be normalized to a value in [0, 1] as "(frequency of Entity[i] - frequency of the lowest-frequency Entity) / (frequency of the highest-frequency Entity - frequency of the lowest-frequency Entity)".

The two normalized values can then be combined as "Score(Entity[i]) = (1 - lambda) * (normalized rank score of Entity[i]) + lambda * (normalized frequency of Entity[i])".
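
The three formulas above translate directly into code. In this sketch the function names, the default lambda of 0.5, and the handling of the degenerate case where all values are equal (mapped to 0 to avoid division by zero) are assumptions; the text does not state a lambda value or cover that case.

```python
def min_max(values: dict) -> dict:
    """Min-max normalize a mapping of entity -> value into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}   # assumed handling of the flat case
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def interpolated_scores(rank_scores: dict, frequencies: dict, lam: float = 0.5) -> dict:
    """Score(Entity[i]) = (1 - lambda) * norm_rank + lambda * norm_frequency."""
    norm_rank = min_max(rank_scores)
    norm_freq = min_max(frequencies)
    return {
        e: (1 - lam) * norm_rank[e] + lam * norm_freq[e]
        for e in rank_scores
    }
```

With `lam=0` the result reduces to ordering by rank score alone, and with `lam=1` to ordering by frequency alone, so lambda controls the balance between the first two methods.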

Referring to FIG. 12, the web server 400 determines whether the user search keyword transmitted from the user terminal 500 contains pre-stored meta information and a target keyword (S408). If the determination in step S408 shows that the pre-stored meta information and target keyword are not both present, it is determined whether the user search keyword contains a pre-stored meta information keyword (S409).

If the determination in step S409 shows that the user search keyword contains a pre-stored meta information keyword, the user search keyword is morphologically analyzed (S410). The user search keyword analyzed in step S410, excluding the meta information keyword, is then compared against the documents/paragraphs stored in the indexing server 300, in which an entity candidate list is mapped to each document/paragraph (S411).

Next, the user search keyword analyzed in step S410, excluding the meta information keyword, is searched for in the document/paragraph result list (S412), and the entity result list mapped to the document/paragraph result list found in step S412 is retrieved (S413).

The entity result list retrieved in step S413 is then filtered to the entities related to the meta information keyword and displayed on the user terminal 500 (S414).

For example, when the user searches for "a Ha Jung-woo movie about chasing and being chased", the system recognizes that the meta information keyword "Ha Jung-woo" is present and that there is a further search keyword, "chasing and being chased". The word "movie" is common in movie entity searches, so it may be included in an exception word list.

First, an entity search is performed with "chasing and being chased" to obtain an entity result list, and the obtained list is then filtered by the meta information keyword "Ha Jung-woo". Finally, the filtered result is returned.
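
The search-then-filter flow of FIG. 12 can be sketched as follows. The `retrieve` callable stands in for the passage search of steps S411 to S413, and `entity_meta`, the contents of `EXCEPTION_WORDS`, and the placeholder names in the usage note are all hypothetical.

```python
from typing import Callable

# Assumed contents of the exception word list mentioned in the text.
EXCEPTION_WORDS = {"movie"}


def search_and_filter(
    query_terms: list,
    meta_keyword: str,
    retrieve: Callable[[list], list],
    entity_meta: dict,
) -> list:
    """Search with the non-meta terms, then keep entities matching the meta keyword."""
    terms = [
        t for t in query_terms
        if t != meta_keyword and t not in EXCEPTION_WORDS
    ]
    candidates = retrieve(terms)                       # ranked entity list
    return [
        e for e in candidates
        if meta_keyword in entity_meta.get(e, set())   # S414: filter by meta info
    ]
```

For example, with `entity_meta = {"Film1": {"Actor X"}, "Film2": {"Actor Y"}}` and a `retrieve` function that returns `["Film2", "Film1"]`, the query terms `["chase", "chased", "Actor X", "movie"]` are searched as `["chase", "chased"]` and only `"Film1"` is returned. Filtering preserves the relevance order produced by the search.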

Referring to FIGS. 12 and 13, the web server 400 determines whether the user search keyword transmitted from the user terminal 500 contains pre-stored meta information and a target keyword (S408). If the determination in step S408 shows that the pre-stored meta information and target keyword are not both present, it is determined whether the user search keyword contains a pre-stored meta information keyword (S409).

If the determination in step S409 shows that the user search keyword does not contain a pre-stored meta information keyword, the process proceeds to step (B), where it is determined whether the user search keyword contains a pre-stored target keyword (S415).

If the determination in step S415 shows that the user search keyword contains a pre-stored target keyword, the user search keyword is morphologically analyzed (S416). The user search keyword analyzed in step S416, excluding the target keyword, is then compared against the documents/paragraphs stored in the indexing server 300, in which an entity candidate list is mapped to each document/paragraph (S417).

Next, the user search keyword analyzed in step S416, excluding the target keyword, is searched for in the document/paragraph result list (S418), and the entity result list mapped to the document/paragraph result list found in step S418 is retrieved (S419).

The entity result list retrieved in step S419 is then re-sorted in descending order of the positive or total opinion count for the target keyword and displayed on the screen of the user terminal 500 (S420).

For example, when the user searches for "moving war movies", the system recognizes that the target (sub-theme) "emotional impact" is present and that the query also contains the entity search element "war". In other words, the keywords used for entity search are those that are neither meta information keywords nor target keywords.

First, an entity search is performed with the word "war", and the resulting entity list is then re-sorted in descending order of the total or positive opinion count for the target "emotional impact". Finally, the re-sorted result is returned.

Referring to FIGS. 12 and 14, the web server 400 determines whether the user search keyword transmitted from the user terminal 500 contains pre-stored meta information and a target keyword (S408). If the determination in step S408 shows that the pre-stored meta information and target keyword are not both present, it is determined whether the user search keyword contains a pre-stored meta information keyword (S409).

If the determination in step S409 shows that the user search keyword contains a pre-stored meta information keyword, the process proceeds to step (C), where it is determined whether the user search keyword contains a pre-stored target keyword (S421).

If the determination in step S421 shows that the user search keyword contains a pre-stored target keyword, the user search keyword is morphologically analyzed (S422). The user search keyword analyzed in step S422, excluding the meta information and target keywords, is then compared against the documents/paragraphs stored in the indexing server 300, in which an entity candidate list is mapped to each document/paragraph (S423).

Next, the user search keyword analyzed in step S422, excluding the meta information and target keywords, is searched for in the document/paragraph result list (S424), and the entity result list mapped to the document/paragraph result list found in step S424 is retrieved (S425).

The entity result list retrieved in step S425 is then filtered to the entities related to the meta information keyword (S426), and the list filtered in step S426 is re-sorted in descending order of the positive or total opinion count for the target keyword and displayed on the screen of the user terminal 500 (S427).

For example, when the user searches for "moving war movies with Jung Jae-young", the system recognizes that the target (sub-theme) "emotional impact" is present and that the meta information keyword "Jung Jae-young" is present. It also recognizes that the query contains the entity search element "war".

First, an entity search is performed with the word "war", and the entity result list is then filtered by the meta information keyword "Jung Jae-young". The filtered entity result list is then re-sorted in descending order of the total or positive opinion count for the target "emotional impact".
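
The filter-then-re-sort stage of FIG. 14 (S426 and S427) can be sketched as follows, taking the candidate list from the entity search as given. The data layout, the function name, and the placeholder names in the usage note are illustrative assumptions.

```python
def filter_and_rerank(
    candidates: list,
    meta_keyword: str,
    target: str,
    entity_meta: dict,
    opinion_counts: dict,
    use_total: bool = False,
) -> list:
    """Keep entities matching the meta keyword, then sort by opinions on the target.

    opinion_counts maps entity -> target -> {"positive": n, "negative": m}.
    Sorts by positive count, or by total count when use_total is True.
    """
    filtered = [
        e for e in candidates
        if meta_keyword in entity_meta.get(e, set())      # S426
    ]

    def count(entity: str) -> int:
        c = opinion_counts.get(entity, {}).get(target, {})
        pos, neg = c.get("positive", 0), c.get("negative", 0)
        return pos + neg if use_total else pos

    return sorted(filtered, key=count, reverse=True)      # S427
```

Given candidates `["F1", "F2", "F3"]` where only F1 and F2 carry the meta keyword, and F2 has more positive "moving" opinions than F1, the result is `["F2", "F1"]`; switching to `use_total=True` can reverse that order if F1 has more opinions overall.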

FIGS. 15 to 19 show keyword search result screens displayed on the screen of a user terminal using the entity search using the Internet, and the hybrid-based opinion analysis method therefor, according to an embodiment of the present invention.

FIGS. 15 and 16 show search result screens for the meta information keywords "Transformers" and "Ha Jung-woo". The search result screen may include a first display window 1000 showing basic information on the retrieved movie (e.g., poster, title, director, actors); a second display window 2000 showing opinion statistics (e.g., overall rating, rating per target item, and a way of presenting the ratings such as a graph) for each target item (e.g., overall review, scenes, story, fun, actors, emotional impact, acting); a third display window 3000 showing the retrieved entity result list with related images, videos, and positive/negative opinion statistics; and a fourth display window 4000 showing the original text of opinions about the retrieved basic movie (e.g., the movie with the most opinions overall).

In the third display window 3000, the retrieved entity result list may be re-sorted and shown in descending order of total opinion count or of positive/negative opinion count.

FIG. 17 shows another search result screen for the meta information keyword "Ha Jung-woo". Unlike the screen shown in FIG. 16, this display screen is laid out to be simpler and easier to read.

FIG. 18 shows a search result screen for the target (sub-theme) keyword query "movies with a twist". The search results can be sorted in descending order of total or positive opinions about the target "twist" to present the entity result list. The display screen of FIG. 18 is sorted in descending order of positive opinions about "twist".

FIG. 19 shows the entity search result screen for the user search keyword "a movie about chasing and being chased". The left side of the screen shows the entity search list for that keyword (e.g., The Chaser, The Art of Fighting, No Country for Old Men, The Good, the Bad, the Weird).

Preferred embodiments of the entity search using the Internet, and of the hybrid-based opinion analysis system and method therefor, according to the present invention have been described above. The present invention is not limited to these embodiments, however; it can be modified and practiced in various ways within the scope of the claims, the detailed description of the invention, and the accompanying drawings, and such modifications also belong to the present invention.

FIG. 1 is an overall block diagram illustrating an entity search using the Internet and a hybrid-based opinion analysis system therefor according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating in detail the data analysis server applied to an embodiment of the present invention.

FIG. 3 is an overall flowchart illustrating an entity search using the Internet and a hybrid-based opinion analysis method therefor according to an embodiment of the present invention.

FIGS. 4 and 5 are flowcharts illustrating in detail the process of analyzing the positive/negative opinion statistics for each target of each entity, as applied to an embodiment of the present invention.

Claims

delete

A first server for collecting web document data present on the Internet;

A data analysis server that receives the web document data collected by the first server, extracts meta information for each entity, and uses the entity-specific meta information to analyze positive/negative opinion statistics for each target of each entity;

A second server that indexes the entity-specific meta information analyzed by the data analysis server, together with the positive/negative opinion statistics for each target of each entity, so that it is stored in a database; and

A web server that receives a user search keyword transmitted from a user terminal connected through the Internet, determines in conjunction with the second server whether the user search keyword contains pre-stored meta information or a target keyword, and, if it does, displays on the screen of the user terminal the entity list results related to that meta information or target keyword,

wherein the first server collects and stores RSS addresses on the Internet,

receives the RSS files corresponding to the stored RSS addresses, and collects web document data using the link information provided by each RSS file; the whole constituting an entity search using the Internet and a hybrid-based opinion analysis system therefor.

The system of claim 2,

wherein the data analysis server extracts the meta information for each entity by using regular expressions that represent strings of a predetermined form in the web document data.

A first server for collecting web document data present on the Internet;

The data analysis server,

A first module for classifying the collected web document data by sector using a preset machine learning model for each sector;

A second module for performing language processing on the collected web document data to extract opinion sentences, and dividing the extracted opinion sentences into positive / negative opinion expressions;

A third module for determining which entity the collected web document data corresponds to, using the entity-specific meta information; and

A fourth module for determining which target the opinion concerns, using the words and part-of-speech information around the opinion sentence extracted by the second module; the whole constituting an entity search using the Internet and a hybrid-based opinion analysis system therefor.

The system of claim 4,

wherein the second module comprises:

A language processing unit that splits the collected web document data into sentences and performs linguistic processing on each sentence to extract linguistic features;

An opinion/non-opinion classification unit that classifies sentences as opinion or non-opinion sentences using the extracted linguistic features; and

An opinion expression classification unit that classifies the opinion sentences as positive/negative opinion expressions using their linguistic features.

The system of claim 4,

wherein the second module, if a predefined rule exists for the extracted opinion sentence, applies a preset rule-based model to classify the sentence into a positive / negative opinion expression, and

if no predefined rule exists for the extracted opinion sentence, applies a preset machine learning model to classify the sentence into a positive / negative opinion expression.
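The rule-first, machine-learning-fallback order described above can be sketched as below. The rule table and the `ml_model` function are hypothetical stand-ins (a keyword lookup and a trivial cue check), not the models of the patent; only the control flow is the point.

```python
# Hypothetical rule table: word -> polarity.
RULES = {"delicious": "positive", "kind": "positive",
         "dirty": "negative", "rude": "negative"}


def rule_based(sentence):
    """Return a polarity if a predefined rule matches, otherwise None."""
    for word in sentence.lower().split():
        label = RULES.get(word.strip(".,!?"))
        if label:
            return label
    return None


def ml_model(sentence):
    """Stand-in for a trained classifier (assumption: not a real model)."""
    words = sentence.lower().split()
    return "negative" if any(w in words for w in ("not", "never", "bad")) else "positive"


def classify(sentence):
    """Apply the rule-based model first; fall back to the ML model if no rule exists."""
    label = rule_based(sentence)
    if label is not None:
        return label, "rule"
    return ml_model(sentence), "ml"


if __name__ == "__main__":
    print(classify("The staff was rude."))
    print(classify("I would never come back"))
```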

The system of claim 4,

wherein the second module determines a positive / negative opinion expression by applying a rule-based model and a machine learning model to the extracted opinion sentence simultaneously, assigns different reliability scores according to whether the results of the rule-based model and the machine learning model match, and classifies the sentence into a positive / negative opinion expression based on the reliability scores.
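A minimal sketch of the agreement-based scoring follows. The score values (1.0 and 0.5), the choice to keep the rule-based label on disagreement, and the two toy models are all assumptions made for illustration; the claim states only that the reliability score differs depending on whether the two results match.

```python
def toy_rule_model(sentence):
    """Hypothetical rule-based model."""
    return "negative" if "rude" in sentence.lower() else "positive"


def toy_ml_model(sentence):
    """Hypothetical stand-in for a trained classifier."""
    text = sentence.lower()
    return "negative" if ("rude" in text or "never" in text) else "positive"


def classify_with_reliability(sentence, rule_model, ml_model,
                              agree_score=1.0, disagree_score=0.5):
    """Run both models on the same sentence and score the result by agreement."""
    rule_label = rule_model(sentence)
    ml_label = ml_model(sentence)
    if rule_label == ml_label:
        return rule_label, agree_score
    # Assumption: on disagreement the rule-based label is kept, with a lower score.
    return rule_label, disagree_score


if __name__ == "__main__":
    for s in ("The waiter was rude", "I will never return"):
        print(s, classify_with_reliability(s, toy_rule_model, toy_ml_model))
```

A downstream step could, for example, count only expressions whose score exceeds a chosen threshold; that threshold would be a further design decision not fixed by the claim.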

The system of claim 4,

wherein the second module determines whether the extracted opinion sentence is a predefined exception handling rule candidate sentence and, if it is, classifies the sentence into a positive / negative / neutral opinion expression by applying either a predefined exception-handling rule-based model or a machine learning model, depending on whether a predefined exception handling rule exists.

The system of claim 8,

wherein, when the extracted opinion sentence is not a predefined exception handling rule candidate sentence, the second module determines a positive / negative opinion expression by simultaneously applying a preset rule-based model and a preset machine learning model to the extracted opinion sentence, assigns different reliability scores according to whether the results of the rule-based model and the machine learning model match, and classifies the sentence into a positive / negative opinion expression based on the reliability scores.

The system of claim 8,

wherein, when the extracted opinion sentence is a predefined exception handling rule candidate sentence and a predefined exception handling rule exists, the second module classifies the sentence into a positive / negative / neutral opinion expression by applying a predefined exception-handling rule-based model, and

when no predefined exception handling rule exists for the extracted opinion sentence, the second module classifies the sentence into a positive / negative / neutral opinion expression by applying a preset machine learning model.
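The exception-handling branch can be sketched as a three-way dispatch. The cue words, the exception rule table and both model stand-ins are hypothetical; the sketch shows only the order of the checks: candidate test, then exception rule, then machine learning fallback, with non-candidates going to the ordinary positive / negative path.

```python
# Hypothetical cues that mark a sentence as an exception handling rule candidate.
CANDIDATE_CUES = {"not", "nothing"}

# Hypothetical exception handling rules: phrase -> polarity.
EXCEPTION_RULES = {"not bad": "positive", "nothing special": "neutral"}


def is_exception_candidate(sentence):
    return any(w in CANDIDATE_CUES for w in sentence.lower().split())


def apply_exception_rule(sentence):
    text = sentence.lower()
    for phrase, label in EXCEPTION_RULES.items():
        if phrase in text:
            return label
    return None


def ml_three_way(sentence):
    """Stand-in for a positive / negative / neutral classifier (assumption)."""
    return "neutral"


def hybrid_two_way(sentence):
    """Stand-in for the ordinary positive / negative path (assumption)."""
    return "negative" if "bad" in sentence.lower() else "positive"


def classify(sentence):
    if is_exception_candidate(sentence):
        label = apply_exception_rule(sentence)
        if label is not None:
            return label, "exception-rule"
        return ml_three_way(sentence), "ml"
    return hybrid_two_way(sentence), "hybrid"


if __name__ == "__main__":
    for s in ("The food was not bad", "Nothing to say", "Great food"):
        print(s, classify(s))
```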

The system according to claim 2 or 4,

wherein the positive / negative opinion statistical information for each target of each entity stored in the second server consists of at least one of: the entity ID, the target, the number of positive / negative opinion expressions for each target of each entity, the total number of opinion expressions, or the opinion contents for each target.
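One possible in-memory shape for such a statistics record is sketched below. The class name `OpinionStats` and its field names are hypothetical; the claim lists the kinds of information stored but does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class OpinionStats:
    """Opinion statistics for one target of one entity (hypothetical schema)."""
    entity_id: str
    target: str
    positive: int = 0
    negative: int = 0
    opinions: list = field(default_factory=list)  # raw opinion contents

    @property
    def total(self):
        return self.positive + self.negative

    def add(self, polarity, text):
        """Record one opinion expression and update the counters."""
        if polarity == "positive":
            self.positive += 1
        elif polarity == "negative":
            self.negative += 1
        else:
            raise ValueError("polarity must be 'positive' or 'negative'")
        self.opinions.append(text)


if __name__ == "__main__":
    stats = OpinionStats(entity_id="E001", target="taste")
    stats.add("positive", "The pasta was delicious")
    stats.add("negative", "The soup was too salty")
    print(stats.positive, stats.negative, stats.total)
```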

A first server for collecting web document data present on the Internet;

An entity search and hybrid-based opinion analysis system using the Internet, characterized in that, when the user search keyword contains no pre-stored meta information or target keyword, the web server searches the second server, in which the entities are mapped, with the user search keyword and displays the related entity result list.

The system according to any one of claims 2, 4 or 12,

wherein the web server displays the entity result list in descending order of the number of positive / negative opinions, according to the opinion analysis results for the user search keyword.

A first server for collecting web document data present on the Internet;

The web server,

A keyword analysis module that analyzes a user search keyword transmitted from the user terminal to determine whether there is meta information or a target keyword stored in the second server, and classifies a keyword search method according to the determination result; And

A keyword search module that interworks with the second server according to the keyword search method determined by the keyword analysis module, searches for entities related to the corresponding meta information or target keyword, and displays the entity list result on the screen of the user terminal, in an entity search and hybrid-based opinion analysis system using the Internet.
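The keyword analysis step can be illustrated as a small classifier over the query tokens. The keyword sets, the method names and whitespace tokenization are assumptions for the example; in the described system the stored meta information and target keywords would come from the second server.

```python
# Hypothetical keyword sets, standing in for data stored on the second server.
META_KEYWORDS = {"gangnam", "italian"}
TARGET_KEYWORDS = {"taste", "service", "price"}


def analyze_keyword(query):
    """Split a query into meta, target and remaining tokens and pick a search method."""
    tokens = query.lower().split()
    meta = [t for t in tokens if t in META_KEYWORDS]
    target = [t for t in tokens if t in TARGET_KEYWORDS]
    rest = [t for t in tokens if t not in META_KEYWORDS and t not in TARGET_KEYWORDS]
    if meta and target:
        method = "meta+target"
    elif meta:
        method = "meta"
    elif target:
        method = "target"
    else:
        method = "fulltext"
    return {"method": method, "meta": meta, "target": target, "rest": rest}


if __name__ == "__main__":
    print(analyze_keyword("Gangnam taste"))
    print(analyze_keyword("cozy brunch"))
```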

The system according to any one of claims 2, 4, 12 or 14,

wherein, if the user search keyword contains both pre-stored meta information and a pre-stored target keyword, the web server searches for the entities related to the meta information keyword and then displays the entity result list rearranged in any one of the positive, negative, or total opinion order of the target keyword for the retrieved entities.
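A sketch of the search-then-rearrange behaviour is given below. The entity records, the counts and the function name `search_by_meta_and_target` are invented for illustration; the sketch assumes that "order" means descending opinion count.

```python
# Hypothetical indexed entities: meta keywords plus (positive, negative) counts per target.
ENTITIES = [
    {"name": "A", "meta": {"gangnam", "italian"}, "opinions": {"taste": (30, 5)}},
    {"name": "B", "meta": {"gangnam", "korean"}, "opinions": {"taste": (12, 20)}},
    {"name": "C", "meta": {"hongdae"}, "opinions": {"taste": (50, 1)}},
]


def search_by_meta_and_target(meta_kw, target_kw, order="positive"):
    """Find entities matching the meta keyword, then sort by the target's opinion counts."""
    hits = [e for e in ENTITIES if meta_kw in e["meta"]]

    def key(entity):
        pos, neg = entity["opinions"].get(target_kw, (0, 0))
        if order == "positive":
            return pos
        if order == "negative":
            return neg
        return pos + neg  # "total"

    return [e["name"] for e in sorted(hits, key=key, reverse=True)]


if __name__ == "__main__":
    for order in ("positive", "negative", "total"):
        print(order, search_by_meta_and_target("gangnam", "taste", order))
```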

A first server for collecting web document data present on the Internet;

An entity search and hybrid-based opinion analysis system using the Internet, characterized in that the positive / negative opinion statistical information for each target of each entity is stored in the second server so that each entity is mapped on a document / paragraph basis, and

when the user search keyword contains neither pre-stored meta information nor a pre-stored target keyword, the web server morphologically analyzes the user search keyword, compares the analyzed keyword with the documents / paragraphs stored in the second server to obtain a document / paragraph result list for that keyword, and displays on the user terminal the entity result list mapped to the retrieved document / paragraph result list.
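The document / paragraph lookup can be sketched as follows. The paragraph store is invented, and a simple lowercase tokenizer stands in for morphological analysis, which for Korean text would require a real morphological analyzer; matching on any shared term is likewise an assumption.

```python
import re

# Hypothetical paragraph store: each paragraph is mapped to one entity.
PARAGRAPHS = [
    {"entity": "Cafe A", "text": "Cozy brunch place with a quiet terrace"},
    {"entity": "Bistro B", "text": "Loud bar, great cocktails"},
    {"entity": "Cafe A", "text": "Brunch menu changes weekly"},
]


def tokenize(text):
    """Stand-in for morphological analysis: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_entities(query):
    """Return the entities mapped to paragraphs that share a term with the query."""
    terms = set(tokenize(query))
    entities = []
    for paragraph in PARAGRAPHS:
        if terms & set(tokenize(paragraph["text"])):
            if paragraph["entity"] not in entities:
                entities.append(paragraph["entity"])
    return entities


if __name__ == "__main__":
    print(search_entities("cozy brunch"))
    print(search_entities("cocktails"))
```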

A first server for collecting web document data present on the Internet;

An entity search and hybrid-based opinion analysis system using the Internet, characterized in that, when the user search keyword does not contain both pre-stored meta information and a pre-stored target keyword but contains an entity keyword together with a pre-stored meta information keyword, the web server morphologically analyzes the user search keyword, compares the analyzed keyword, excluding the meta information keyword, with the documents / paragraphs stored in the second server to obtain a document / paragraph result list for that keyword, retrieves the entity result list mapped to the retrieved document / paragraph result list, filters from it the entity result list related to the meta information keyword, and displays the filtered list on the user terminal.

A first server for collecting web document data present on the Internet;

An entity search and hybrid-based opinion analysis system using the Internet, characterized in that, when the user search keyword does not contain both pre-stored meta information and a pre-stored target keyword but contains an entity keyword together with a pre-stored target keyword, the web server morphologically analyzes the user search keyword, compares the analyzed keyword, excluding the target keyword, with the documents / paragraphs stored in the second server to obtain a document / paragraph result list for that keyword, retrieves the entity result list mapped to the retrieved document / paragraph result list, rearranges the retrieved entity result list in the order of positive or total opinions for the target keyword, and displays it on the user terminal.

A first server for collecting web document data present on the Internet;

An entity search and hybrid-based opinion analysis system using the Internet, characterized in that, when the user search keyword contains an entity keyword together with pre-stored meta information and a pre-stored target keyword, the web server morphologically analyzes the user search keyword, compares the analyzed keyword, excluding the meta information and target keywords, with the documents / paragraphs stored in the second server to obtain a document / paragraph result list for that keyword, retrieves the entity result list mapped to the retrieved document / paragraph result list, filters from it the entity result list related to the meta information keyword, rearranges the filtered entity result list in the order of positive or total opinions for the target keyword, and displays it on the user terminal.
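The combined case (entity keyword plus meta information keyword plus target keyword) can be sketched as a three-stage pipeline: paragraph search on the remaining terms, filtering by meta information, then sorting by the target's opinion counts. All data, names and the whitespace tokenization are hypothetical, and descending order is assumed.

```python
META_KEYWORDS = {"gangnam"}
TARGET_KEYWORDS = {"taste"}

# Hypothetical entity index: meta keywords plus (positive, negative) counts per target.
ENTITIES = {
    "Cafe A": {"meta": {"gangnam"}, "taste": (30, 5)},
    "Bistro B": {"meta": {"gangnam"}, "taste": (40, 2)},
    "Diner C": {"meta": {"hongdae"}, "taste": (90, 1)},
}

# Hypothetical paragraph store: (entity, paragraph text).
PARAGRAPHS = [
    ("Cafe A", "quiet brunch spot"),
    ("Bistro B", "brunch with live music"),
    ("Diner C", "all-day brunch"),
    ("Bistro B", "late night cocktails"),
]


def search(query, order="positive"):
    tokens = query.lower().split()
    meta = [t for t in tokens if t in META_KEYWORDS]
    target = [t for t in tokens if t in TARGET_KEYWORDS]
    rest = [t for t in tokens if t not in META_KEYWORDS and t not in TARGET_KEYWORDS]

    # 1. Paragraph search with the terms left after removing meta/target keywords.
    hits = []
    for entity, text in PARAGRAPHS:
        if any(t in text.lower().split() for t in rest) and entity not in hits:
            hits.append(entity)

    # 2. Keep only entities related to the meta information keyword(s).
    hits = [e for e in hits if all(m in ENTITIES[e]["meta"] for m in meta)]

    # 3. Rearrange by positive or total opinion count for the target keyword.
    if target:
        def key(entity):
            pos, neg = ENTITIES[entity].get(target[0], (0, 0))
            return pos if order == "positive" else pos + neg
        hits.sort(key=key, reverse=True)
    return hits


if __name__ == "__main__":
    print(search("gangnam brunch taste"))
```

Dropping stage 2 or stage 3 gives the two simpler cases in which only a target keyword or only a meta information keyword accompanies the entity keyword.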

The system according to any one of claims 2, 4, 12, 14, 16, 17, 18, or 19,

wherein the web server displays the opinion information of each entity together with the entity list result on the user terminal.

delete

(a) collecting web document data residing on the internet;

(b) receiving the collected web document data, extracting meta information for each entity, and analyzing positive / negative opinion statistical information for each target of each entity using the entity-specific meta information;

(c) indexing the analyzed entity-specific meta information and the positive / negative opinion statistical information for each target of each entity so that they are stored in a database; and

(d) determining whether a user search keyword transmitted from a user terminal connected through the Internet contains pre-stored meta information or a pre-stored target keyword and, if it does, displaying on the screen of the user terminal the entity list result related to that meta information or target keyword,

Step (b) is,

(b-1) classifying the collected web document data by sector using a preset machine learning model for each sector;

(b-2) extracting an opinion sentence by performing language processing on the collected web document data, and dividing the extracted opinion sentence into positive / negative opinion expressions;

(b-3) determining which entity corresponds to the collected web document data using the entity-specific meta information; And

(b-4) determining which object an opinion concerns, using the words and part-of-speech information surrounding the opinion sentence extracted in step (b-2), in an entity search and hybrid-based opinion analysis method using the Internet.

The method of claim 22,

wherein step (b-2) comprises:

Dividing the collected web document data into sentences and performing linguistic processing on each separated sentence to extract linguistic features;

Classifying each sentence as an opinion or non-opinion sentence using the extracted linguistic features; and

Classifying the opinion sentences into positive / negative opinion expressions using their linguistic features.

The method of claim 22,

wherein, in step (b-2), if a predefined rule exists for the extracted opinion sentence, a preset rule-based model is applied to classify the sentence into a positive / negative opinion expression, and if no predefined rule exists for the extracted opinion sentence, a preset machine learning model is applied to classify the sentence into a positive / negative opinion expression.

The method of claim 22,

wherein, in step (b-2), a rule-based model and a machine learning model are applied simultaneously to the extracted opinion sentence to determine a positive / negative opinion expression, different reliability scores are assigned according to whether the results of the rule-based model and the machine learning model match, and the sentence is classified into a positive / negative opinion expression based on the reliability scores.

The method of claim 22,

wherein, in step (b-2), it is determined whether the extracted opinion sentence is a predefined exception handling rule candidate sentence and, if it is, the sentence is classified into a positive / negative / neutral opinion expression by applying either a predefined exception-handling rule-based model or a machine learning model, depending on whether a predefined exception handling rule exists.

The method of claim 26,

wherein, when the extracted opinion sentence is not a predefined exception handling rule candidate sentence, a preset rule-based model and a preset machine learning model are applied simultaneously to the extracted opinion sentence to determine a positive / negative opinion expression, different reliability scores are assigned according to whether the results of the rule-based model and the machine learning model match, and the sentence is classified into a positive / negative opinion expression based on the reliability scores.

The method of claim 26,

wherein, when the extracted opinion sentence is a predefined exception handling rule candidate sentence and a predefined exception handling rule exists, the sentence is classified into a positive / negative / neutral opinion expression by applying a predefined exception-handling rule-based model, and when no predefined exception handling rule exists for the extracted opinion sentence, the sentence is classified into a positive / negative / neutral opinion expression by applying a preset machine learning model.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (d), when the user search keyword contains no pre-stored meta information or target keyword, the database in which the entities are mapped is searched with the user search keyword and the related entity result list is displayed.

The method of claim 22 or 29,

wherein, in step (d), the entity result list is displayed in descending order of the number of positive / negative opinions, according to the opinion analysis results for the user search keyword.

The method of claim 22 or 29,

wherein, in step (d), if the user search keyword contains both pre-stored meta information and a pre-stored target keyword, the entities related to the meta information keyword are searched, and the entity result list is then displayed rearranged in any one of the positive, negative, or total opinion order of the target keyword for the retrieved entities.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (c), the positive / negative opinion statistical information for each target of each entity is stored so that each entity is mapped on a document / paragraph basis, and

in step (d), when the user search keyword contains neither pre-stored meta information nor a pre-stored target keyword, the user search keyword is morphologically analyzed, the analyzed keyword is compared with the stored documents / paragraphs to obtain a document / paragraph result list for that keyword, and the entity result list mapped to the retrieved document / paragraph result list is displayed on the user terminal.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (c), the positive / negative opinion statistical information for each target of each entity is stored so that each entity is mapped on a document / paragraph basis, and

in step (d), when the user search keyword does not contain both pre-stored meta information and a pre-stored target keyword but contains an entity keyword together with a pre-stored meta information keyword, the user search keyword is morphologically analyzed, the analyzed keyword, excluding the meta information keyword, is compared with the stored documents / paragraphs to obtain a document / paragraph result list for that keyword, the entity result list mapped to the retrieved document / paragraph result list is retrieved, and the entity result list related to the meta information keyword is filtered from it and displayed on the user terminal.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (d), when the user search keyword does not contain both pre-stored meta information and a pre-stored target keyword but contains an entity keyword together with a pre-stored target keyword, the user search keyword is morphologically analyzed, the analyzed keyword, excluding the target keyword, is compared with the stored documents / paragraphs to obtain a document / paragraph result list for that keyword, the entity result list mapped to the retrieved document / paragraph result list is retrieved, and the retrieved entity result list is rearranged in the order of positive or total opinions for the target keyword and displayed on the user terminal.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (d), when the user search keyword contains an entity keyword together with pre-stored meta information and a pre-stored target keyword, the user search keyword is morphologically analyzed, the analyzed keyword, excluding the meta information and target keywords, is compared with the stored documents / paragraphs to obtain a document / paragraph result list for that keyword, the entity result list mapped to the retrieved document / paragraph result list is retrieved, the entity result list related to the meta information keyword is filtered from it, and the filtered entity result list is rearranged in the order of positive or total opinions for the target keyword and displayed on the user terminal.

The method according to any one of claims 22, 29, 32, 33, 34 or 35,

wherein, in step (d), the opinion information of each entity is displayed together with the entity list result on the user terminal.

(a) collecting web document data residing on the internet;

An entity search and hybrid-based opinion analysis method using the Internet, characterized in that, in step (d), when the search result for the user search keyword is displayed on the screen of the user terminal, the entity keyword information and the target keyword statistical information automatically extracted for each entity can be sorted in descending order of any one of the positive, total, or negative opinion counts; the positive / negative value of each target keyword is displayed with a symbol indicating it; and the positive / negative opinions about the retrieved entity are displayed side by side, left and right, or selected through positive / negative tabs.