KR100996997B1

KR100996997B1 - User ordered blogger analysis system and method

Info

Publication number: KR100996997B1
Application number: KR1020100020147A
Authority: KR
Inventors: 박성배; 박세영; 손정우
Original assignee: 경북대학교 산학협력단
Priority date: 2010-03-05
Filing date: 2010-03-05
Publication date: 2010-11-25
Also published as: WO2011108845A2; WO2011108845A3

Abstract

PURPOSE: A system and method for user-customized blogger analysis are provided. Feature information is extracted from blogger communities and used to give the user a customized analysis of bloggers. CONSTITUTION: A server (100) classifies blogger information and extracts blogger communities. The server analyzes the user's features by comparing user information with the feature information of the blogger communities, and analyzes the features of the user and the communities according to the flow of events over time. A browser terminal includes a service UI (User Interface) (10), through which the server's analysis results are presented to the user.

Description

User ordered blogger analysis system and method

The present invention relates to a system and method for user-customized blogger analysis, and more particularly to a system and method that perform user-customized blogger analysis through diachronic/synchronic analysis, using the user's information together with the feature information of blogger communities found from blogger information stored in an ontology, in Internet services such as blogs and mini homepages.

Recently, with the popularization of blogs and mini homepages, users can easily share the content they create (UCC) with others. This lets Internet users easily obtain a variety of information that is not available from a particular news outlet or portal, improving their access to information. To obtain such information, existing systems rely on keyword-based search. Unless users check every piece of content themselves, such search-based systems are their only means of accessing the information.

Content created and shared by users not only provides useful information in itself; taken together, it can also reveal information about the particular user who created it. For example, content about the "iphone" posted by a particular blogger provides useful information about the "iphone", and a collection of such content can reveal which fields the blogger is interested in and which products the blogger favors.

However, existing search-based systems offer no way to find such information. In the above example, information on the "iphone" is easily obtained through a keyword search, but information such as "the areas of interest of bloggers who own an iphone" cannot be obtained. Consequently, with existing search-based systems, users can neither access nor exploit all of the information contained in the diverse user-generated content on the Internet.

To solve the above problems, an object of the present invention is to provide a user-customized blogger analysis system and method that extract feature information of blogger communities and use the extracted feature information to provide the user with a customized analysis, so that the user can access more information on the Internet and easily make use of high-quality information.

Another object of the present invention is to provide a user-customized blogger analysis system and method that find communities matching the characteristics of the information the user needs and present the content those communities have generated, and that, to maximize the use of the extracted community and feature information, compare the extracted community features with the user's information diachronically and synchronically so that the user can easily identify differences and commonalities.

To achieve the above objects, a user-customized blogger analysis system according to the present invention comprises: a server that extracts blogger communities and feature information from the blogger information of retrieved bloggers, and performs user-customized blogger analysis using the extracted blogger communities and feature information together with user information; and a browser having a service UI that is connected to the server over the Internet, receives the analysis results from the server, and displays them to the user.

In addition, to achieve the above objects, a user-customized blogger analysis method in the user-customized blogger analysis system comprises: extracting blogger information of retrieved bloggers; finding blogger communities from the extracted blogger information and extracting feature information of the found communities; performing user-customized blogger analysis using the extracted community feature information and user information; and displaying the result of the user-customized blogger analysis to the user.

The present invention extracts feature information of blogger communities and uses it to provide a customized blogger analysis through diachronic/synchronic analysis. As a result, the user can access more information on the Internet and easily make use of high-quality information, and can select, among bloggers, those having the desired characteristics and obtain more useful information from the content they have generated.

In addition, because the present invention can present, tailored to the individual, community information that is generally difficult to obtain through Internet search, the user can easily find informative blogs or items such as books, travel destinations, and movies that are more useful to him or her.

FIG. 1 is a diagram showing the structure of a user-customized blogger analysis system according to an embodiment of the present invention;
FIG. 2 is a diagram showing a method for analyzing user-customized bloggers in the user-customized blogger analysis system according to an embodiment of the present invention; and
FIG. 3 is a diagram showing the detailed clustering process for user-customized blogger analysis according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted when they might unnecessarily obscure the subject matter of the present invention.

If blog users could be analyzed and searched by their characteristics, a user could select, among bloggers, those having the desired characteristics and obtain more useful information from the content they generate. For example, analyzing "bloggers in their 20s living in Seoul" might reveal that these bloggers form four communities, one of which enjoys "movies" and favors the "adventure, action" genres. With such information, a user who wants information about a specific product ("iphone") can easily obtain it not from all bloggers, but from the specific users he or she is interested in or has selected ("provided by adventure/action fans in their 20s living in Seoul"). Such analysis of blogger information not only increases the amount of information available to users but also improves the quality of the searches through which they access it.

Analyzing blogger information in this way requires two technologies: first, a technique for finding communities from bloggers' information (their individual information and the content they generate) and extracting the features of those communities; and second, a technique for analyzing the found communities in a manner tailored to the user.

Hereinafter, a user-customized blogger analysis system and method according to an embodiment of the present invention are described in detail. Using the above two technologies, the system obtains bloggers' information from the content they generate, extracts blogger communities and their features, and presents them from the user's point of view through diachronic/synchronic analysis.

First, a user-customized blogger analysis system according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the structure of a user-customized blogger analysis system according to an embodiment of the present invention.

The user-customized blogger analysis system can be broadly divided into the service user interface (hereinafter, "service UI") 10 of a browser and the server 100. The system basically uses Internet web services: the service UI 10 and the server 100 communicate over the Internet, and AJAX (Asynchronous JavaScript and XML) is used for real-time updates. The user-customized blogger analysis system is not limited to the web and may be used in any environment in which bloggers are provided.

The service UI 10 includes an analysis UI 11 and a search UI 12, and refers to the user interface portion of the Internet content service (user information, blogs, etc.) to which user-customized blogger analysis is applied. The service UI 10 also transmits the user's information to the server at login so that user-customized analysis can be performed. For convenience of description, the main service line of the Internet content service is not shown in FIG. 1; for example, the results after search and analysis and the events in each UI are omitted.

The search UI 12 is the user interface portion through which the user searches for bloggers via the browser, and it presents the search results so that the user can recognize them. In the embodiment of the present invention, descriptions of common search operations used in existing systems, such as query input and input events, are omitted.

The analysis UI 11 is the user interface portion through which the user views, via the browser, the result of comparing himself or herself with the bloggers found by the search. It presents the analysis results produced by the server through diachronic/synchronic analysis so that the user can examine them.

The server 100 may include a blogger information extraction unit 110, a blogger community and feature extraction unit 120, an analysis unit 130, a user information unit 140, a blogger search unit 150, a blog database (DB) 160, a tag search unit 170, and a tag database (DB) 180.

The blogger information extraction unit 110 extracts blogger information, such as the bloggers' member information, the content they are interested in, and their postings, from the blogger search unit 150 and the tag search unit 170, and delivers the extracted blogger information to the blogger community and feature extraction unit 120. Blogger information (the constructed information) can be divided into static information and dynamic information. Static information is the member information (age, name, gender, etc.) entered when a blog is created, and it has also been used in existing search/analysis methods. Dynamic information describes, for example, which products the blogger has created content about and which areas of interest the blogger has; it is easily obtained when the blogger tags his or her content. In the embodiment of the present invention, the content generated by bloggers is assumed to carry semantic tags that can attach various kinds of information, and the blogger information extraction unit 110 extracts the blogger information by extracting both the dynamic and the static information.

The blogger community and feature extraction unit 120 extracts the communities and features of the retrieved bloggers from the blogger information extracted by the blogger information extraction unit 110, and delivers the extracted blogger communities and feature information to the analysis unit 130. Specifically, the unit 120 classifies the extracted blogger information into sets of preset types according to their characteristics, calculates a distance value for each set, and extracts the blogger communities and feature information from the final distance value obtained by combining the per-set distance values. Blogger communities are divided into explicit communities and implicit communities. For example, bloggers belonging to a specific club belong to an explicit community, whereas an implicit community is a group of bloggers that the bloggers themselves have not explicitly designated. Because bloggers obtained through a search are very unlikely to belong to a few explicit communities, discovering implicit communities and extracting their features is very important for a practical service.
Therefore, in the embodiment of the present invention, implicit communities are found from the static and dynamic information of the bloggers, and the features of a community are defined as the information that appears with significance among the users belonging to it.

The analysis unit 130 performs user-customized blogger analysis by a preset analysis method (the diachronic/synchronic analysis method) based on the user information delivered from the user information unit 140 and the blogger community and feature information delivered from the blogger community and feature extraction unit 120, and transmits the results to the user's analysis UI 11 over the Internet using AJAX.

The user information unit 140 manages the user information required by the analysis unit 130.

The blogger search unit 150 searches for bloggers through the blog DB 160, that is, it retrieves blogger information and related postings from particular blogs, and delivers the retrieved bloggers to the blogger information extraction unit 110.

The tag search unit 170 searches the tag database 180 for the tags needed to extract blogger information.

The tag database 180 holds a complete record of which tags, in which combinations, formed each tag set.

Next, a method for diachronically/synchronically analyzing user-customized bloggers in the user-customized blogger analysis system having the above structure will be described in detail with reference to the accompanying drawings.

Existing systems cannot access community and feature information that has already been extracted. For the user to make use of such information, communities matching the characteristics of the information the user needs must be found, and the content those communities have generated must be shown. Accordingly, the embodiment of the present invention describes a user-customized blogger analysis method that, to maximize the use of the extracted community and feature information, compares the extracted community information with the user's information diachronically and synchronically so that the user can easily find differences and commonalities.

The user-customized blogger analysis method can be broadly divided into a process of extracting bloggers' implicit communities and their features, and a process of user-customized diachronic/synchronic analysis. Here, an implicit community is a cluster of people with similar information, grouped on the basis of the bloggers' member information and event information.

In the process of extracting bloggers' implicit communities and features, the set of information each blogger has is classified into four types, and a distance value is measured for each set using a different calculation method. The four distance values are then summed into a single distance value, and bloggers with small distance values between them are assigned to the same cluster. This is described in detail below with reference to FIG. 3.
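The summing and grouping step can be illustrated by a minimal sketch. It assumes the four type-specific distance functions are supplied by the caller, and it interprets "bloggers with small distance values are assigned to the same cluster" as threshold-based single-link clustering via union-find; the threshold, the function names `combined_distance` and `cluster`, and the toy data are hypothetical and not taken from the patent.

```python
def combined_distance(a, b, type_fns):
    """Sum the per-type distances into one scalar.

    type_fns: one callable per information type (numeric, category,
    string, Gaussian); each takes two bloggers and returns a distance.
    """
    return sum(fn(a, b) for fn in type_fns)


def cluster(items, type_fns, threshold):
    """Put two bloggers in the same cluster when their combined distance
    is <= threshold (single linkage, merged with union-find).
    The threshold rule is an assumption made for illustration."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if combined_distance(items[i], items[j], type_fns) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy usage with two of the four types (hypothetical data and scaling).
bloggers = [
    {"age": 21, "region": "Seoul"},
    {"age": 23, "region": "Seoul"},
    {"age": 45, "region": "Busan"},
]
type_fns = [
    lambda a, b: abs(a["age"] - b["age"]) / 50,               # toy numeric distance
    lambda a, b: 0.0 if a["region"] == b["region"] else 1.0,  # toy category distance
]
print(cluster(bloggers, type_fns, threshold=0.5))  # [[0, 1], [2]]
```

Single linkage is chosen only because it follows directly from a pairwise "small distance" rule; any clustering that consumes the summed distance would fit the description equally well.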

The user-customized diachronic/synchronic analysis process uses the communities, each composed of multiple bloggers, and the features extracted from them to analyze the features of the target user from diachronic and synchronic perspectives.

These processes are described in more detail as follows.

FIG. 2 is a diagram showing a method for analyzing user-customized bloggers in the user-customized blogger analysis system according to an embodiment of the present invention.

Referring to FIG. 2, in step 210, the server 100 of the user-customized blogger analysis system searches the blog DB 160 for bloggers and their tags through the blogger search unit 150, and extracts blogger information from the retrieved bloggers. A blogger's information basically comprises member information, such as age, name, and gender, entered when the blog is first created, and event information about the user's experiences, extracted from the content the user registers while using the blog. Member information is static information that remains fixed regardless of the passage of time or the user's experiences, whereas event information is dynamic information that is added to or modified over time. Such dynamic information includes, for example, the genre, author, and publisher of books read; the genre, director, and actors of movies watched; and the type and manufacturer of IT devices purchased, all of which can be extracted from the content the user registers. In the embodiment of the present invention, this static and dynamic information is constructed as an ontology, and the constructed ontology is represented in vector form to facilitate information extraction and computation.
When expressed as a vector, the information is divided into three depth levels: Depth 1 is the member information, which is static; Depth 2 is the type and date of each event; and Depth 3 is the object of each event.

In this way, each user is represented by a vector whose features are the extracted items of information, and a separate object module is created to represent each event.
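The three-depth representation described above can be sketched with plain data classes. This is a minimal illustration, not the patent's actual ontology schema: the class names, the `d1:`/`d2:`/`d3:` feature-key convention, and the choice to flatten events into counts are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EventObject:
    """Depth 3: the object of an event (attribute names are hypothetical)."""
    attributes: dict  # e.g. {"genre": "fantasy", "author": "..."}


@dataclass
class EventRecord:
    """Depth 2: event type and event date, owning its Depth-3 object."""
    event_type: str   # e.g. "read_book", "watched_movie"
    when: date
    obj: EventObject


@dataclass
class UserVector:
    """One user: Depth-1 static member info plus a list of events."""
    member_info: dict                           # Depth 1: age, gender, region, ...
    events: list = field(default_factory=list)  # Depth-2 records

    def features(self):
        """Flatten into a sparse feature dict for distance computation.
        Dates stay on the records (for diachronic analysis) and are not
        flattened here."""
        feats = {f"d1:{k}": v for k, v in self.member_info.items()}
        for ev in self.events:
            key = f"d2:{ev.event_type}"
            feats[key] = feats.get(key, 0) + 1  # event-type counts
            for attr, val in ev.obj.attributes.items():
                k3 = f"d3:{ev.event_type}:{attr}={val}"
                feats[k3] = feats.get(k3, 0) + 1  # object-attribute counts
        return feats


u = UserVector({"age": 24, "gender": "F"})
u.events.append(EventRecord("read_book", date(2009, 5, 3), EventObject({"genre": "fantasy"})))
u.events.append(EventRecord("read_book", date(2009, 6, 9), EventObject({"genre": "fantasy"})))
print(u.features())
# {'d1:age': 24, 'd1:gender': 'F', 'd2:read_book': 2, 'd3:read_book:genre=fantasy': 2}
```

Keeping each event as its own object mirrors the "separate object module per event" description, while the flattened dict gives the per-user vector.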

Then, in step 220, the server 100 finds blogger communities from the extracted blogger information through the blogger community and feature extraction unit 120, and extracts the feature information of the found communities. The detailed process of extracting the blogger communities and their feature information is described below with reference to FIG. 3.

Next, in step 230, the server 100, through the analysis unit 130, uses the communities composed of multiple bloggers and the features extracted from them by the blogger community and feature extraction unit 120 to analyze the features of the target user from diachronic and synchronic perspectives. The meanings of synchronic and diachronic analysis are explained in detail below.

Then, in step 240, the server 100 delivers the analysis results to the analysis UI 11 of the service UI 10, which displays the received results so that the user can understand them.

The synchronic and diachronic analyses of step 230 are described in detail below.

First, synchronic analysis treats the blogger community and feature information and the user information each as a single piece of data regardless of time, and analyzes the user's features by comparing them; that is, it analyzes the features between the two on the basis of the distance between the two data items. For example, it comprises analyses from three perspectives: analysis by the position of the user relative to each community, analysis of the user's features by the user's event frequency, and social network analysis.

The position-based analysis rests on the premise that the user's features will resemble those of the communities located close to the user; it compares the distance between the user and each community and analyzes the user's features based on that position. This makes it easy not only to analyze the user's features but also to judge whether the user fits the typical profile of the people being compared. The result can be shown easily by placing each community and the user on a two-dimensional plane at positions proportional to their distances. The distance between the user and each community is calculated from that community's events and the user's own event information, using data of the Depth 1, 2, and 3 types described above.
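A minimal sketch of the position-based view follows, assuming the user-to-community distances have already been computed. The radial layout (user at the origin, each community on its own ray at a radius equal to its distance) is a simplifying assumption chosen for illustration: it preserves user-to-community distances exactly but says nothing about distances between communities. The community names and values are hypothetical.

```python
import math


def radial_layout(distances):
    """Place the user at (0, 0) and each community on its own ray at a
    radius equal to its user-community distance. Community-to-community
    spacing is not meaningful in this simplified layout."""
    n = len(distances)
    pos = {"user": (0.0, 0.0)}
    for k, (name, d) in enumerate(sorted(distances.items())):
        angle = 2 * math.pi * k / n
        pos[name] = (d * math.cos(angle), d * math.sin(angle))
    return pos


def nearest_community(distances):
    """The community whose features the user is presumed to resemble most."""
    return min(distances, key=distances.get)


# Hypothetical user-to-community distances.
dist = {"movie_fans": 0.8, "travelers": 2.5, "gadget_fans": 1.6}
print(nearest_community(dist))  # movie_fans
layout = radial_layout(dist)
```

A layout that also preserves community-to-community distances (for example, multidimensional scaling) would be a natural alternative; the radial form is used here only because it is exact for the user-centred comparison the text describes.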

The event-frequency analysis characterizes the user by event frequency, comparing frequencies for the same event type. The overall frequency of an event or object measures the degree of interest in it, so the user's features can be analyzed from the difference between the community's interest in a particular area and the user's own interest. The gap between the user and a community can be seen easily by plotting each community's event frequencies alongside the user's in a bar graph, which shows whether their interests are broadly similar or entirely different.
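The frequency comparison can be sketched as follows. The patent describes plotting raw frequencies side by side; normalizing each side by its total is an assumption added here so that a single user and a much larger community can be compared on one scale. The function names and event labels are hypothetical.

```python
from collections import Counter


def event_frequencies(events):
    """Relative frequency of each event type (or object) in a list of labels."""
    counts = Counter(events)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def interest_gap(user_events, community_events):
    """Per-type difference (user minus community) in relative frequency.
    Positive: the user shows more interest than the community;
    negative: less. These are the bar heights one would compare."""
    u = event_frequencies(user_events)
    c = event_frequencies(community_events)
    return {k: u.get(k, 0.0) - c.get(k, 0.0) for k in sorted(set(u) | set(c))}


user = ["movie", "movie", "book", "travel"]
community = ["movie", "book", "book", "book"]
print(interest_gap(user, community))
# {'book': -0.5, 'movie': 0.25, 'travel': 0.25}
```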

The social network method builds a social network linking the members of a community and the user through various events or objects. The objects of these relationships can be domains such as travel, books, and IT devices, or particular products. From a social network composed of such detailed relationships, the user's relationship with each member of the community can be determined. On a two-dimensional plane, each member and the user are represented as nodes, and the relationships between them are shown by connecting the nodes with relationship lines.
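One way to build such a network is sketched below, under the assumption that a relationship line exists between two people whenever they share at least one event or object, labeled by the shared items. The input format (person mapped to a set of object labels) and the labels themselves are hypothetical.

```python
from itertools import combinations


def build_social_network(holdings):
    """holdings: person -> set of objects/events (products, book genres, ...).
    Returns {(p, q): shared_objects} for every pair of people who share at
    least one object; the shared set labels the relationship line."""
    edges = {}
    for p, q in combinations(sorted(holdings), 2):
        shared = holdings[p] & holdings[q]
        if shared:
            edges[(p, q)] = shared
    return edges


holdings = {
    "user": {"iphone", "travel:jeju"},
    "m1": {"iphone", "book:fantasy"},
    "m2": {"travel:jeju"},
    "m3": {"book:fantasy"},
}
net = build_social_network(holdings)
# The user is linked to m1 through "iphone" and to m2 through "travel:jeju".
```

Restricting `holdings` to a single domain (for example, only IT devices) yields the domain-specific networks the text mentions.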

Next, diachronic analysis treats the information of the communities and the user as continuous data that changes over time, and analyzes the features of the user and the communities based on the flow of events over time. An example is trend analysis.

A trend shows how interest in a particular object or event changes over time for the user and a community. It allows comparison and analysis of what each community's members have recently become interested in and how closely the user follows the flow of their interests. Trend analysis compares the similarity between the community and the user on the basis of the ratio of their events in each time period, revealing how similar their flows of interest are and with what time gap. The similarity used here is computed in the same way as the distance values for the four types described below.

Through such trend analysis, the analysis unit 130 finds, for each time interval of the user's data, the most similar interval of the community's data, identifies the matching months, and from these determines whether the user follows (lags behind) or precedes the community. The result can be displayed by drawing two line graphs on the same plane, which makes it easy to see how similar the two event flows are.
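The lead/lag decision can be sketched as a search over time shifts. This simplification is an assumption, not necessarily the analysis unit's exact procedure: it slides the user's monthly series against the community's and keeps the shift with the smallest mean absolute difference over the overlapping months.

```python
def best_lag(user, community, max_lag):
    """Return (lag, distance) minimising the mean absolute difference between
    user[t] and community[t - lag] over the overlapping months.

    lag > 0: the user follows (lags behind) the community by `lag` months.
    lag < 0: the user precedes (leads) the community.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(user[t], community[t - lag])
                 for t in range(len(user))
                 if 0 <= t - lag < len(community)]
        if not pairs:
            continue  # no overlap for this shift
        d = sum(abs(u - c) for u, c in pairs) / len(pairs)
        if best is None or d < best[1]:
            best = (lag, d)
    return best
```

A per-interval matching, as described above, could replace the single global shift; the global shift is used here only because it is short and easy to check.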

The detailed process of step 220, in which the blogger communities and their feature information are extracted as part of the user-customized blogger analysis described above, is explained with reference to FIG. 3.

In step 310, the blogger community and feature extractor 120 classifies the extracted set of blogger information into types preset according to their characteristics. Because blogger information items differ both in how their values are expressed and in which user characteristics they reflect, the embodiment of the present invention classifies them into four types: numeric type, category type, string type, and Gaussian type. Other types may also be applied, depending on the characteristics of the blogger information.

The numeric type covers data with numeric values (e.g., age, height). The category type covers data that take one of several selectable values (e.g., gender, region). The string type covers data with string values (e.g., e-mail address, homepage). The Gaussian type covers data in which several dimensions together form one set with a distribution (e.g., book authors, device manufacturers).
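One possible way to route attribute values to the four types is sketched below. The rules are assumptions for illustration: numbers are numeric, dictionaries of counts are Gaussian, values found in a known option list (`categories`, hypothetical) are categorical, and every other value is treated as a string.

```python
from numbers import Number


def classify_attribute(value, categories=frozenset()):
    """Assign a blogger attribute value to one of the four types.

    categories: hypothetical set of values known to come from a fixed list
    of options (e.g. genders, regions).
    """
    if isinstance(value, bool):
        return "category"   # yes/no flags are a two-option category
    if isinstance(value, Number):
        return "numeric"    # e.g. age, height
    if isinstance(value, dict):
        return "gaussian"   # e.g. {"Author A": 3, "Author B": 1}
    if value in categories:
        return "category"   # e.g. gender, region
    return "string"         # e.g. e-mail address, homepage
```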

Because the four types differ in how their values are expressed and in the characteristics they represent, a single criterion cannot be used to calculate distance values. Therefore, in step 320, the blogger community and feature extractor 120 calculates a distance value separately for each type, as follows, and then integrates the calculated values. Here, a distance value means the distance between individual bloggers.

For the numeric type, the difference between two values is obtained using Equation 1 below. Equation 1 is expressed in terms of the value of the blogger attribute, the i-th cluster, and k, the number of bloggers in that cluster.

[Equation 1]
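Because Equation 1 itself is not reproduced here, the following sketch assumes one plausible form consistent with its symbols: the absolute difference between a blogger's attribute value and the mean of that attribute over the k bloggers of the cluster.

```python
def numeric_distance(value, cluster_values):
    """Assumed form of Equation 1: |value - mean of the k cluster values|."""
    k = len(cluster_values)
    if k == 0:
        raise ValueError("cluster must contain at least one blogger")
    mean = sum(cluster_values) / k
    return abs(value - mean)
```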

For the category type, Equation 2 below checks whether two values are equal, using an indicator function.

[Equation 2]
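A sketch of the indicator-based comparison, assuming the distance is 0 for matching category values and 1 otherwise; the original Equation 2 is not reproduced here, and averaging over a cluster's bloggers is likewise an assumption.

```python
def indicator(condition):
    """I[condition]: 1 if the condition holds, else 0."""
    return 1 if condition else 0


def category_distance(a, b):
    """Assumed form of Equation 2: 0 for equal category values, 1 otherwise."""
    return 1 - indicator(a == b)


def category_distance_to_cluster(value, cluster_values):
    """Average pairwise indicator distance to the bloggers of a cluster
    (an assumed way of comparing one blogger against a whole cluster)."""
    return sum(category_distance(value, v) for v in cluster_values) / len(cluster_values)
```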

For the string type, Equation 3 below measures how much two string values differ.

[Equation 3]
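Equation 3 is not reproduced here, so this sketch substitutes a common string measure: Levenshtein edit distance normalized by the length of the longer string, giving a value between 0 (identical) and 1. Treat the choice of measure as an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def string_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))
```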

For the Gaussian type, Equation 4 below calculates the degree to which two values agree, taking into account the size of the set for the corresponding dimension, that is, the number of dimensions of the Gaussian-type set currently being compared in the cluster.

[Equation 4]
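A sketch of a distribution comparison in the spirit of Equation 4, assuming each Gaussian-type attribute is a dictionary of counts (e.g., books read per author). Counts are normalized to proportions, and the summed absolute difference is divided by the number of dimensions compared; this normalization by set size is an assumed reading of the equation, not its reproduced form.

```python
def gaussian_distance(p, q):
    """p, q: dicts mapping a dimension (e.g. an author's name) to a count.

    Returns the mean absolute difference of the two normalized
    distributions over the union of their dimensions.
    """
    dims = set(p) | set(q)
    if not dims:
        return 0.0
    sp = sum(p.values()) or 1  # avoid dividing by zero for empty profiles
    sq = sum(q.values()) or 1
    diff = sum(abs(p.get(d, 0) / sp - q.get(d, 0) / sq) for d in dims)
    return diff / len(dims)
```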

If the distance values calculated separately for each type were used as they are, accurate information could not be extracted, because the types differ both in the number of information items they contain and in the scale that must be considered. For example, numeric-type information consists of numerical items such as age, height, and weight, and there are not many of them. Gaussian-type information, by contrast, is extremely numerous, because every item that can be represented as a distribution (per genre, per actor, per director, per writer) becomes a data item. Consequently, if every type were given the same weight, the result would be dominated by the types with few items, such as the numeric type.

Accordingly, in step 330, the blogger community and feature extractor 120 integrates the per-type distance values by applying a weight to each of them and summing the results into a final distance value, as shown in Equation 5 below. Here, w_n, w_c, w_s, and w_g denote the weights for the numeric, category, string, and Gaussian types, respectively.

[Equation 5]
$$d(m, X_j) = w_n\,Nd(m, X_j) + w_c\,Cd(m, X_j) + w_s\,Sd(m, X_j) + w_g\,Gd(m, X_j)$$
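Equation 5 can be sketched directly. The per-type distances are taken as inputs and the weights are free parameters. The helper `weights_by_attribute_count` is a hypothetical heuristic, not taken from the source: it weights each type in proportion to how many attributes it contributes, one way to keep types with few attributes (such as the numeric type) from dominating.

```python
TYPES = ("numeric", "category", "string", "gaussian")


def final_distance(per_type, weights):
    """Equation 5: d(m, X_j) = w_n*Nd + w_c*Cd + w_s*Sd + w_g*Gd.

    per_type: {"numeric": Nd, "category": Cd, "string": Sd, "gaussian": Gd}
    weights:  {"numeric": w_n, "category": w_c, "string": w_s, "gaussian": w_g}
    """
    return sum(weights[t] * per_type[t] for t in TYPES)


def weights_by_attribute_count(counts):
    """Hypothetical heuristic: weight each type by its share of all attributes."""
    total = sum(counts[t] for t in TYPES)
    return {t: counts[t] / total for t in TYPES}
```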

Once the final distance values have been calculated, in step 340 the blogger community and feature extractor 120 clusters the bloggers by assigning bloggers with small final distance values between them to the same cluster. In other words, it finds the bloggers' communities (implicit communities).
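Assigning each blogger to the closest community can be sketched as an argmin over cluster representatives. The `distance` argument is a placeholder for the weighted final distance, and the one-dimensional toy data in the test are purely illustrative.

```python
def assign_to_clusters(bloggers, clusters, distance):
    """Put each blogger into the cluster whose representative is nearest.

    bloggers: dict name -> profile
    clusters: dict cluster_id -> representative profile
    distance: callable(profile, representative) -> float, e.g. the weighted
              final distance (placeholder)
    Returns dict cluster_id -> list of blogger names.
    """
    groups = {cid: [] for cid in clusters}
    for name, profile in bloggers.items():
        best = min(clusters, key=lambda cid: distance(profile, clusters[cid]))
        groups[best].append(name)
    return groups
```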

Then, in step 350, the blogger community and feature extractor 120 extracts the feature information of the blogger communities found.

The clustering of bloggers described above is explained in more detail with reference to FIG. 4.

In general, a very large number of information items is used to cluster bloggers. Even for book-related events alone, there are thousands of book genres and tens of thousands of book authors. If clustering considers all of this information, the number of items to consider is large while the data actually present are few, which leads to data sparsity.

To solve this problem, clustering does not compute distance values over all available information. Instead, it finds, for each cluster, the information that best reveals that cluster's characteristics, and computes distance values using only that information. The distance between a found blogger community and the user is obtained in the same way. This not only addresses data sparsity but also, importantly, identifies more meaningful information for each cluster.

Referring to FIG. 4, in step 410 the blogger community and feature extractor 120 converts the blogger information into vectors in order to perform clustering. Each information item becomes one dimension of the vector, holding that item's value. For example, if a blogger's e-mail address is aa@aa.com, the "e-mail" dimension holds the value aa@aa.com.
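A minimal sketch of step 410, assuming a blogger profile is a dictionary of attributes. Scalar attributes become one dimension each, as in the e-mail example. Expanding distribution-valued attributes into one dimension per item (for example `book_author:Author A`) is an assumption about how Gaussian-type data would be laid out.

```python
def to_vector(profile):
    """Flatten a blogger profile into {dimension: value}.

    Scalar attributes keep their name as the dimension ("email" -> "aa@aa.com");
    distribution-valued (Gaussian-type) attributes are expanded into one
    dimension per item ("book_author:Author A" -> 3).
    """
    vector = {}
    for attr, value in profile.items():
        if isinstance(value, dict):
            for item, count in value.items():
                vector[f"{attr}:{item}"] = count
        else:
            vector[attr] = value
    return vector
```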

Next, in step 420, the blogger community and feature extractor 120 uses a first algorithm (the K-means clustering algorithm) to generate the desired number of clusters, K × A. Here, A is a user-specified constant that determines how many small clusters are created initially. The first algorithm initially groups the bloggers into small clusters and computes distance values over all generated vector dimensions.
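Step 420 can be sketched with plain Lloyd's k-means over all dimensions, asking for K × A clusters. The sketch assumes every dimension has already been encoded as a number (the text does not say how string or category values are encoded for K-means) and uses a seeded random initialization for reproducibility.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance over all dimensions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, iterations=100, seed=0):
    """Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def initial_clusters(points, k, a, seed=0):
    """Step 420 (sketch): K * A small clusters, A being the user constant."""
    return kmeans(points, k * a, seed=seed)
```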

In step 430, the blogger community and feature extractor 120 uses the mean and variance of each generated cluster to find the M vector dimensions with low variance and high mean. Dimensions with low variance and high mean are assumed to be meaningful.
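Step 430 ranks dimensions by mean and variance within one cluster. How the two criteria are combined is not specified, so the sketch assumes a simple score, mean minus variance, and keeps the M highest-scoring dimensions.

```python
from statistics import fmean, pvariance


def select_dimensions(members, m):
    """Return the sorted indices of the m most meaningful dimensions.

    members: the cluster's bloggers as equal-length numeric vectors.
    Score (an assumption): mean - variance, so high-mean, low-variance
    dimensions rank first.
    """
    columns = list(zip(*members))
    stats = [(i, fmean(col), pvariance(col)) for i, col in enumerate(columns)]
    ranked = sorted(stats, key=lambda s: s[1] - s[2], reverse=True)
    return sorted(i for i, _, _ in ranked[:m])
```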

In step 440, the blogger community and feature extractor 120 uses a second algorithm (the hierarchical clustering algorithm), considering only the dimensions found, to merge clusters into one. The second algorithm finds mutually similar clusters among the small clusters formed by the first algorithm and merges them. In the second algorithm, clustering is performed by finding the meaningful vector dimensions of each cluster and computing distance values using only those selected dimensions.
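One merge of step 440 can be sketched as follows. Clusters are lists of numeric vectors, and each cluster's selected dimensions from step 430 are supplied as input; the pair whose centroids are closest on the union of their selected dimensions is merged. Using the union of both clusters' dimensions and a centroid distance are assumptions.

```python
from itertools import combinations


def centroid(members):
    """Component-wise mean of a cluster's vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def merge_closest(clusters, selected):
    """Merge the two most similar clusters, comparing only selected dimensions.

    clusters: list of clusters, each a list of numeric vectors.
    selected: list of dimension-index lists, one per cluster.
    Returns a new list in which the merged cluster comes last.
    """
    cents = [centroid(c) for c in clusters]

    def dist(pair):
        i, j = pair
        dims = set(selected[i]) | set(selected[j])
        return sum((cents[i][d] - cents[j][d]) ** 2 for d in dims)

    i, j = min(combinations(range(len(clusters)), 2), key=dist)
    merged = clusters[i] + clusters[j]
    return [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
```

Because unselected dimensions are ignored, two clusters that differ only in a dimension neither cluster considers meaningful can still be merged first.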

In step 450, the blogger community and feature extractor 120 again finds the M vector dimensions with low variance and high mean for each cluster, and in step 460 it reassigns the bloggers to the clusters considering only the dimensions found.

Then, in step 470, the blogger community and feature extractor 120 checks whether K clusters remain. If so, the clustering ends; otherwise, the process returns to step 440 and repeats the subsequent steps.
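Putting steps 440 to 470 together, the sketch below repeats merging, dimension re-selection, and reassignment until K clusters remain. It assumes a mean-minus-variance score for dimension selection and a merge rule that compares centroids on the union of two clusters' selected dimensions; both are illustrative choices. Clusters left empty after reassignment are dropped.

```python
from itertools import combinations
from statistics import fmean, pvariance


def centroid(members):
    return [sum(col) / len(members) for col in zip(*members)]


def select_dimensions(members, m):
    """Indices of the m dimensions with the highest (mean - variance) score."""
    cols = list(zip(*members))
    scores = [fmean(c) - pvariance(c) for c in cols]
    return sorted(range(len(cols)), key=lambda d: scores[d], reverse=True)[:m]


def restricted_sq_dist(u, v, dims):
    """Squared Euclidean distance using only the given dimensions."""
    return sum((u[d] - v[d]) ** 2 for d in dims)


def refine_clusters(clusters, k, m):
    """Repeat steps 440-460 until only k (>= 1) clusters remain (step 470).

    clusters: list of non-empty clusters of numeric vectors, e.g. the
    K * A small clusters from step 420.
    """
    while len(clusters) > k:
        # Step 440: merge the closest pair on their selected dimensions.
        dims = [select_dimensions(c, m) for c in clusters]
        cents = [centroid(c) for c in clusters]

        def pair_dist(pair):
            i, j = pair
            return restricted_sq_dist(cents[i], cents[j], set(dims[i]) | set(dims[j]))

        i, j = min(combinations(range(len(clusters)), 2), key=pair_dist)
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]

        # Step 450: re-select the M meaningful dimensions of every cluster.
        dims = [select_dimensions(c, m) for c in clusters]
        cents = [centroid(c) for c in clusters]

        # Step 460: reassign each blogger to the nearest cluster, measured on
        # that cluster's selected dimensions only.
        new_clusters = [[] for _ in clusters]
        for point in (p for c in clusters for p in c):
            best = min(range(len(clusters)),
                       key=lambda n: restricted_sq_dist(point, cents[n], dims[n]))
            new_clusters[best].append(point)
        clusters = [c for c in new_clusters if c]  # drop emptied clusters
    return clusters
```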

As a result of this clustering, the embodiment of the present invention obtains K clusters. For each cluster, the following can be obtained: the bloggers in the cluster, the mean and variance of the whole cluster, the M selected dimensions, and the mean and variance of each selected dimension. The M selected dimensions are those with low variance and high mean within the cluster; because the cluster's bloggers have similar values in these dimensions, they are meaningful dimensions.

The clusters obtained in this way contain a great deal of information that cannot be found by conventional search, and they reveal preference relationships between particular events as well as trends over time.

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the appended claims and their equivalents.

10: service UI
11: analysis UI
12: search UI
100: server
110: blogger information extraction unit
120: blogger community and feature extraction unit
130: analysis unit
140: user information unit
150: blogger search unit
160: blog DB
170: tag search unit
180: tag DB

Claims

A custom blogger analysis system comprising: a server that classifies the blogger information of retrieved bloggers into preset types, namely a numeric type, a category type, a string type, and a Gaussian type; calculates a distance value for each type of blogger information; integrates the calculated per-type distance values into a final distance value; clusters the bloggers based on the calculated final distance value to extract blogger communities; and, using the feature information of the extracted blogger communities, performs user-customized blogger analysis either by a synchronic analysis method, which treats the user information and the feature information of the blogger communities each as a single time-independent datum and compares them to analyze the user's characteristics, or by a diachronic analysis method, which treats the user information and the feature information of the blogger communities as continuous data that change over time and analyzes the characteristics of the user and the communities based on the flow of events over time; and
a browser terminal that is connected to the server via the Internet and has a service UI that receives the analysis result from the server and displays it to the user.

The custom blogger analysis system of claim 1, wherein the server comprises:
a blogger information extracting unit that extracts the blogger information of the retrieved bloggers.

The custom blogger analysis system of claim 2, wherein the server further comprises:
a blogger searching unit that searches for bloggers in the blogs;
a tag searching unit that searches for the tags needed to extract the blogger information; and
a user information unit that manages the user information required for custom blogger analysis.

The custom blogger analysis system of claim 1, wherein the server applies a weight to each of the per-type distance values and obtains the final distance value by summing the weighted distance values, according to the following equation:
[Equation]
$$d(m, X_j) = w_n\,Nd(m, X_j) + w_c\,Cd(m, X_j) + w_s\,Sd(m, X_j) + w_g\,Gd(m, X_j)$$
(where w_n, w_c, w_s, and w_g denote the weights of the distance functions between attributes of the numeric, category, string, and Gaussian types, respectively; Nd(m, X_j) is the function giving the distance between numeric-type attributes; Cd(m, X_j) is the function giving the distance between category-type attributes; Sd(m, X_j) is the function giving the distance between attributes having string values; and Gd(m, X_j) is the function giving the distance between attributes having a distribution.)

delete

The custom blogger analysis system of claim 1 or 2, wherein the blogger information consists of static information, which is the member information entered when the blog is created, and dynamic information, which is event information about the user's experiences extracted from the content the user registers while using the blog.

A custom blogger analysis method in a custom blogger analysis system, the method comprising:
extracting, by a server, blogger information of retrieved bloggers;
classifying, by the server, the extracted blogger information into preset types, namely a numeric type, a category type, a string type, and a Gaussian type; calculating a distance value for each type of blogger information; integrating the calculated per-type distance values into a final distance value; and clustering the bloggers based on the calculated final distance value to find blogger communities and extract their feature information;
performing, by the server, user-customized blogger analysis using the extracted feature information of the blogger communities, either by a synchronic analysis method, which treats the user information and the feature information of the blogger communities each as a single time-independent datum and compares them to analyze the user's characteristics, or by a diachronic analysis method, which treats the user information and the feature information of the blogger communities as continuous data that change over time and analyzes the characteristics of the user and the communities based on the flow of events over time; and
displaying, by a browser terminal, the result of the user-customized blogger analysis.

delete

The method of claim 9, wherein calculating the final distance value comprises applying a weight to each of the per-type distance values and summing the weighted distance values to obtain the final distance value, according to the following equation:
[Equation]
$$d(m, X_j) = w_n\,Nd(m, X_j) + w_c\,Cd(m, X_j) + w_s\,Sd(m, X_j) + w_g\,Gd(m, X_j)$$
(where w_n, w_c, w_s, and w_g denote the weights of the distance functions between attributes of the numeric, category, string, and Gaussian types, respectively; Nd(m, X_j) is the function giving the distance between numeric-type attributes; Cd(m, X_j) is the function giving the distance between category-type attributes; Sd(m, X_j) is the function giving the distance between attributes having string values; and Gd(m, X_j) is the function giving the distance between attributes having a distribution.)

The method of claim 9, wherein clustering the bloggers comprises:
converting the blogger information into vectors;
generating a user-specified number of clusters using a first algorithm;
searching, for each generated cluster, for the vector dimensions having a low variance and a high mean, using the mean and variance of the cluster;
merging the generated clusters into one cluster using a second algorithm, considering only the retrieved vector dimensions;
re-searching, for each cluster, for the vector dimensions having a low variance and a high mean; and
reassigning the bloggers to the clusters using only the re-searched vector dimensions.

The method of claim 13, wherein clustering the bloggers repeats the merging into one cluster using the second algorithm and the reassigning of the bloggers to each cluster until a predetermined number of clusters remain.

The method of claim 13, wherein the first algorithm initially groups the bloggers into small clusters.

The method of claim 13, wherein the second algorithm finds mutually similar clusters among the small clusters formed by the first algorithm.

delete

The method of claim 9, wherein the blogger information consists of static information, which is the member information entered when the blog is created, and dynamic information, which is event information about the user's experiences extracted from the content the user registers while using the blog.