KR101013761B1

KR101013761B1 - Blog search apparatus and method using authority estimation in blog space

Info

Publication number: KR101013761B1
Application number: KR1020090027594A
Authority: KR
Inventors: 이동만; 정윤재
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2008-10-27
Filing date: 2009-03-31
Publication date: 2011-02-14
Also published as: KR20100047108A

Abstract

The present invention relates to a blog search apparatus and method using a blog authority value estimation technique. In a fast egocentric search for important documents in the neighbor blogs of a user's blog, the authority value of each neighbor blog is estimated and the neighbor blogs with high authority values are searched first. This narrows the search space to the relatively important blogs among the neighbors, reduces the time overhead of finding important documents, and thereby improves the speed of blog search.

Blog information search, blog search engine, social network search

Description

BLOG SEARCH APPARATUS AND METHOD USING AUTHORITY ESTIMATION IN BLOG SPACE

The present invention relates to a blog search apparatus and method using a blog authority value estimation technique, and more particularly to a blog search apparatus and method that searches blogs sequentially according to a priority calculated from the authority value estimated for each search target blog and from whether a document matching the query exists.

Blogs are a new form of media that has recently become widespread. A blog is a kind of web page, but one in which the social network among users is strengthened, so search among connected users has become an important element. Two approaches support search across connected blogs: centralized web search and egocentric search.

Egocentric search aims to find blog documents that match the user's interests by searching the documents of blogs connected to the user's blog. However, when the user's blog network contains many blogs, finding important documents takes a long time. In addition, because the retrieved documents are not sorted by importance, the user cannot tell which of them are the important documents that match their interests.

In contrast, centralized web search collects and ranks all blog documents, so it can return results sorted by importance for the user's query. However, the top results cover only a very small part of the whole blog space and are limited to documents that are very popular across that space, so they may not match the specific interests of an individual user.

The present invention is proposed to solve these problems of the prior art. It provides a blog search method using a blog authority value estimation technique that combines the advantages of centralized web search with egocentric search, improving both the speed of egocentric search and the quality of the results.

The present invention also provides a blog search apparatus using a blog authority value estimation technique that can search blogs sequentially according to a priority calculated from the authority value estimated for each search target blog and from whether a document matching the query exists.

According to a first aspect of the present invention, a blog search method using a blog authority value estimation technique includes: estimating an authority value using local information of a search target blog; calculating a priority according to the authority value and whether a document matching a query exists; and searching the search target blogs sequentially according to the priority.

Here, the step of estimating the authority value estimates the authority value using an estimation function for the normalized actual authority value.

The estimation function is a heuristic function.

The step of estimating the authority value uses, as the local information, at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment.

To estimate the authority value that the EigenRumor algorithm computes for each blog from data on the whole blog space, the step of estimating the authority value calculates, through linear regression, weights for the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment, and then uses those weights.

The step of calculating the priority applies a weight to the authority value when a document matching the query exists.

The searching step searches the blogs within a preset range among the search target blogs.

The searching step sets at least one of a distance range from the user's blog and a range of the number of blogs to search.

The searching step visits and searches the blogs within the preset range one after another in a greedy search manner.

According to a second aspect of the present invention, a blog search apparatus using a blog authority value estimation technique includes: an authority value estimation unit that estimates an authority value using local information of a search target blog; a priority calculation unit that calculates a priority according to the authority value and whether a document matching a query exists; and a blog search unit that searches the search target blogs sequentially according to the priority.

Here, the authority value estimation unit estimates the authority value using an estimation function for the normalized actual authority value.

The estimation function is a heuristic function.

The authority value estimation unit uses, as the local information, at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment.

To estimate the authority value that the EigenRumor algorithm computes for each blog from data on the whole blog space, the authority value estimation unit calculates, through linear regression, weights for the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment, and then uses those weights.

The priority calculation unit applies a weight to the authority value when a document matching the query exists.

The blog search unit searches the blogs within a preset range among the search target blogs.

The blog search unit sets at least one of a distance range from the user's blog and a range of the number of blogs to search.

The blog search unit visits and searches the blogs within the preset range one after another in a greedy search manner.

According to the present invention, in a fast egocentric search for important documents in the neighbor blogs of a user's blog, the authority value of each neighbor blog is estimated and the neighbor blogs with high authority values are searched first. This narrows the search space to the relatively important blogs among the neighbors and reduces the time overhead of finding important documents, which improves the speed of blog search.

Hereinafter, some embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.

The present invention is an apparatus and method for fast blog search in an egocentric blog search environment where document data for the whole blog space is not available. It estimates the authority of blogs and performs a fast search by limiting the blogs included in the egocentric search to those with high estimated authority values. That is, unlike conventional egocentric blog search, the authority value of a blog is estimated from local information that the blog itself holds (the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment), and the blog search is performed based on the estimated authority and on whether the blog matches the query.

FIG. 1 is a block diagram of a blog search apparatus using a blog authority value estimation technique according to the present invention.

As shown in FIG. 1, the blog search apparatus according to the present invention includes an authority value estimation unit 110, a priority calculation unit 120, a blog search unit 130, and the like.

The authority value estimation unit 110 estimates an authority value using local information of each search target blog. It estimates the normalized authority value with a heuristic estimation function. As local information it uses at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment, and it may use both. To approximate the authority value that the EigenRumor algorithm computes for each blog from data on the whole blog space, weights for the two counts are calculated through linear regression, and the authority value is estimated using those weights.

The priority calculation unit 120 calculates a priority according to the authority value and whether a document matching the query exists. When such a document exists, it applies a weight greater than 1 to the authority value.

The blog search unit 130 searches the search target blogs sequentially in priority order. It searches the blogs within a preset range among the search target blogs. The range is set as at least one of a distance range from the user's blog and a range of the number of blogs to search, and both may be set. The blogs within the preset range are visited and searched one after another in a greedy search manner.

The blog search method performed by the blog search apparatus configured as above is described below with further reference to FIGS. 2 to 6.

First, the search range of the search target blogs is set through the blog search unit 130 (S210). The search range may be set as at least one of a distance range from the user's blog and a range of the number of blogs to search, and both may be set. Taking a direct connection to the user's blog by trackback or comment as one unit of distance, the distance range specifies how many units away a connected blog may be and still be searched. The count range specifies how many blogs are searched at most.
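The distance range can be pictured as a hop count over the trackback and comment links. The following is a minimal Python sketch, assuming the links are available as an in-memory adjacency dictionary; the names `hops_from`, `links`, and `max_distance` are hypothetical and do not come from the patent.

```python
from collections import deque


def hops_from(user_blog, links, max_distance):
    """Return {blog: hop count} for blogs within max_distance hops of user_blog.

    links maps a blog to the blogs it is directly connected to by
    trackback or comment (hypothetical in-memory representation).
    """
    distance = {user_blog: 0}
    frontier = deque([user_blog])
    while frontier:
        blog = frontier.popleft()
        if distance[blog] == max_distance:
            continue  # neighbors would fall outside the distance range
        for neighbor in links.get(blog, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[blog] + 1
                frontier.append(neighbor)
    return distance


links = {"me": ["a", "b"], "a": ["c"], "c": ["d"]}
within_two = hops_from("me", links, max_distance=2)
```

With a distance range of 2, blog `d` (three hops away) is left out of the candidate set.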

Next, the authority value estimation unit 110 estimates the authority value using the local information of each search target blog, that is, at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment (S220). A heuristic estimation function is used to estimate the normalized authority value, and both counts may be used as local information. To approximate the authority value that the EigenRumor algorithm computes for each blog from data on the whole blog space, weights for the two counts are calculated through linear regression, and the authority value is estimated using those weights.

According to the EigenRumor algorithm, as shown in FIG. 3, the authority score of a blog is determined by the reputation scores of the blog documents it contains, and the reputation score of a document is determined by the hub scores of the blogs that have trackbacked to or commented on that document. This means that a blog holding many documents that are linked from many blogs with high hub scores has a high authority score.
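The dependency chain described here (hub scores feed document reputation, which feeds blog authority) can be illustrated with a small fixed-point iteration. The sketch below is not the EigenRumor algorithm itself: the hub update rule, the normalization, and all names (`reinforce`, `owner`, `evaluations`) are assumptions made only to show the direction in which the scores flow.

```python
def reinforce(owner, evaluations, iterations=50):
    """Iterate authority, reputation and hub scores over a toy blog graph.

    owner: {document: blog that published it}
    evaluations: list of (blog, document) pairs, one per trackback or comment
    """
    blogs = set(owner.values()) | {b for b, _ in evaluations}
    hub = {b: 1.0 for b in blogs}
    authority = {b: 0.0 for b in blogs}
    reputation = {d: 0.0 for d in owner}
    for _ in range(iterations):
        # A document's reputation comes from the hubs that evaluated it.
        reputation = {d: 0.0 for d in owner}
        for blog, doc in evaluations:
            reputation[doc] += hub[blog]
        # A blog's authority comes from the reputation of its documents.
        authority = {b: 0.0 for b in blogs}
        for doc, blog in owner.items():
            authority[blog] += reputation[doc]
        # Assumption: a hub is as good as the documents it points at.
        hub = {b: 0.0 for b in blogs}
        for blog, doc in evaluations:
            hub[blog] += reputation[doc]
        # Normalize so the scores stay bounded.
        total = sum(hub.values()) or 1.0
        hub = {b: h / total for b, h in hub.items()}
    return authority, reputation, hub


owner = {"d1": "A", "d2": "A", "d3": "B"}
evaluations = [("B", "d1"), ("C", "d1"), ("C", "d2"), ("A", "d3")]
authority, reputation, hub = reinforce(owner, evaluations)
```

In this toy graph blog `A` owns the two documents that attract most evaluations, so it ends with the highest authority, while `C`, which publishes nothing, has none.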

In egocentric search, however, information on the whole blog space is not available, so the authority value must be estimated using only information within the target blog. As FIG. 3 shows, the number of blogs that have commented or trackbacked strongly influences the authority value, so the authority value estimation function of Equation 2 below is used.

Here, the authority value of a blog, the number of blogs that have trackbacked to the target blog, and the number of blogs that have commented on it do not follow a normal distribution, so they need to be normalized before the estimation function is computed. FIG. 4 shows the distribution of blog authority values over the whole blog space, where a denotes the authority value of a blog: (a) is the distribution of a, (b) is the distribution of ln(a), and (c) is the distribution of -1/ln(a).

Equation 1 below shows the normalization method for each value, where a is the actual authority value of a blog and na is the normalized authority value.
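Equation 1 itself is not reproduced in this text. FIG. 4(c) plots -1/ln(a), which suggests that transform for the authority value. The Python sketch below assumes it, and further assumes that a lies strictly between 0 and 1. How the trackback and comment counts are normalized cannot be recovered from the text, so they are left out. The name `normalize_authority` is hypothetical.

```python
import math


def normalize_authority(a):
    """Map a raw authority value a in (0, 1) to -1 / ln(a), as plotted in FIG. 4(c)."""
    if not 0.0 < a < 1.0:
        raise ValueError("expected an authority value strictly between 0 and 1")
    return -1.0 / math.log(a)


# exp(-4) has a natural logarithm of -4, so the transform gives about 0.25.
na = normalize_authority(math.exp(-4))
```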

The authority value of a blog is calculated from the number of neighbor blogs that have trackbacked to the blog and the number of neighbor blogs that have commented on it. Both counts can be read off a single blog, so the authority value can be estimated without knowing the data of the whole blog space. In Equation 2, na is the estimated normalized authority value of the blog, n_c is the number of neighbor blogs that have commented on the blog, and n_t is the number of neighbor blogs that have trackbacked to it.

β denotes constant weights: β₁₀ and β₁₁ are the weights for blogs that have only comments, β₂₀ and β₂₁ are the weights for blogs that have only trackbacks, and β₃₀, β₃₁, and β₃₂ are the weights for blogs that have both comments and trackbacks.
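Equation 2 is not reproduced in this text. Reading the symbol and weight definitions above literally, one consistent form is a separate linear function for each of the three cases. This is a reconstruction from those definitions, not the published equation; in particular, pairing β₃₁ with n_c and β₃₂ with n_t is an assumption.

```latex
na =
\begin{cases}
\beta_{10} + \beta_{11}\, n_c & \text{blog has comments only} \\
\beta_{20} + \beta_{21}\, n_t & \text{blog has trackbacks only} \\
\beta_{30} + \beta_{31}\, n_c + \beta_{32}\, n_t & \text{blog has both}
\end{cases}
```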

Equation 3 below gives the weights obtained by first computing the actual authority value of each blog from data on the whole blog space with the EigenRumor algorithm and then applying linear regression.
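The fitted weights of Equation 3 are not reproduced in this text, so no numeric values are given here. The Python sketch below only illustrates how an intercept and a slope for the comments-only case could be fitted by ordinary least squares. The training pairs are made-up toy numbers, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Toy data: normalized comment counts and normalized reference authority values.
n_c = [1.0, 2.0, 3.0, 4.0]
na_true = [0.3, 0.5, 0.7, 0.9]
beta_10, beta_11 = fit_line(n_c, na_true)
```

The same procedure would be repeated on the trackback-only blogs, and a two-variable regression would be needed for blogs that have both kinds of link.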

Next, the priority calculation unit 120 calculates a priority according to the authority value and whether a document matching the query exists (S230). When such a document exists, a weight greater than 1 is applied to the authority value. That is, both the estimated authority value of a neighbor blog and its relevance to the query are considered when computing the priority for the user's query. The priority function is given by Equation 4, where x is the target blog, q is the user's query, r is a weight greater than 1, and h_a is the normalized authority estimate. According to Equation 4 below, a blog x that has a document matching the query q receives a priority r times the authority h_a of the blog.
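Equation 4 is not reproduced in this text. From the description, one consistent form is the following. The function name priority is an assumption, and the second branch is inferred: the text only states the case in which a matching document exists.

```latex
\mathrm{priority}(x, q) =
\begin{cases}
r \cdot h_a(x) & \text{if blog } x \text{ has a document matching } q \\
h_a(x) & \text{otherwise}
\end{cases}
\qquad r > 1
```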

Finally, the blog search unit 130 searches the search target blogs set in step S210 sequentially in priority order, visiting the blogs within the preset range one after another in a greedy search manner (S240). FIG. 5 illustrates the search process of the blog search unit 130. Conventional egocentric blog search visits the neighbor blogs in the order ① → ② → ③ → ④ → ⑤ → ⑥ → ⑦. The blog search of the present invention visits only the blogs with high authority values, that is, the high-priority blogs, and therefore searches the neighbor blogs in the order ② → ⑤ → ⑥.

The blog search method using a blog authority value estimation technique according to the present invention can be written as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program is stored on computer-readable media and, when read and executed by a computer, implements the blog search method using a blog authority value estimation technique. The information storage media include magnetic recording media, optical recording media, and carrier wave media.

FIG. 6 shows the algorithm obtained when the blog search method using a blog authority value estimation technique according to the present invention is written as a computer program.

In lines 3 to 7, the address of the user's own blog, the search distance range, the search count range, the query, and the weight are set.

In lines 12 to 13, the user's blog is put into a priority queue.

In lines 16 to 17, the current blog is taken from the priority queue and searched for documents related to the query.

In lines 19 to 27, the related documents found are stored as search results, and the distance between the user's blog and the current blog is checked against the search distance range.

In lines 30 to 47, if the current blog is within the search range, its neighbor blogs are put into the priority queue. The priority is calculated according to Equation 4.

Lines 16 to 47 are repeated as many times as the specified search space size, that is, the search count range (the number of blogs to search) set in line 5.
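The listing of FIG. 6 is not reproduced in this text. The steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's program: the blog layout, the substring match used as the query test, the decision to count the user's own blog toward the count range, and the names `egocentric_search`, `estimate`, and `r` as a parameter are all hypothetical. It also assumes a neighbor can be checked for matching documents at the moment it is queued.

```python
import heapq


def egocentric_search(user_blog, blogs, query, max_distance, max_blogs, r, estimate):
    """Visit at most max_blogs blogs in priority order and collect matching documents.

    blogs: {name: {"docs": [str], "neighbors": [str]}} (hypothetical layout)
    estimate: callable returning the normalized authority estimate of a blog
    """
    def matches(name):
        return [d for d in blogs[name]["docs"] if query in d]

    def priority(name):
        h_a = estimate(name)
        return r * h_a if matches(name) else h_a

    # heapq is a min-heap, so priorities are negated to pop the largest first.
    queue = [(0.0, 0, user_blog)]
    seen = {user_blog}
    visited, results = [], []
    while queue and len(visited) < max_blogs:
        _, distance, current = heapq.heappop(queue)
        visited.append(current)
        results.extend(matches(current))
        if distance >= max_distance:
            continue  # neighbors would be outside the distance range
        for neighbor in blogs[current]["neighbors"]:
            if neighbor not in seen:
                seen.add(neighbor)
                heapq.heappush(queue, (-priority(neighbor), distance + 1, neighbor))
    return visited, results


blogs = {
    "me": {"docs": [], "neighbors": ["a", "b", "c"]},
    "a": {"docs": ["python tips"], "neighbors": ["d"]},
    "b": {"docs": ["cooking"], "neighbors": []},
    "c": {"docs": ["python web"], "neighbors": []},
    "d": {"docs": ["python async"], "neighbors": []},
}
authority = {"me": 0.0, "a": 0.4, "b": 0.5, "c": 0.2, "d": 0.9}
visited, results = egocentric_search(
    "me", blogs, "python", max_distance=1, max_blogs=3, r=2.0, estimate=authority.get
)
```

With `r = 2.0`, blog `a` (estimate 0.4, matching document) outranks blog `b` (estimate 0.5, no match), so `a` is visited first. Blog `c` is never reached because the count range of three is exhausted.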

The present invention has been described above with reference to some embodiments. Those of ordinary skill in the art to which the invention pertains will understand that it can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

FIG. 1 is a block diagram of a blog search apparatus using a blog authority value estimation technique according to the present invention.

FIG. 2 is a flowchart illustrating a blog search method using a blog authority value estimation technique according to the present invention.

FIG. 3 is a conceptual diagram of the blog authority value used in the present invention.

FIG. 4 is a graph showing the distribution of the blog authority value used in the present invention.

FIG. 5 is a conceptual diagram illustrating the blog search process performed by the blog search apparatus according to the present invention.

FIG. 6 is an algorithm showing an embodiment in which the blog search method using a blog authority value estimation technique according to the present invention is written as a computer program.

Claims

1. A blog search method using a blog authority value estimation technique, comprising: estimating an authority value using local information of a search target blog; calculating a priority according to the authority value and whether a document matching a query exists; and sequentially searching the search target blogs according to the priority.

2. The blog search method of claim 1, wherein the estimating of the authority value estimates the authority value using an estimation function for a normalized actual authority value.

3. The blog search method of claim 2, wherein the estimation function is a heuristic function.

4. The blog search method of claim 1, wherein the estimating of the authority value uses, as the local information, at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment.

5. The blog search method of claim 4, wherein, to estimate the authority value of each blog computed by an EigenRumor algorithm from data on the whole blog space, the estimating of the authority value calculates, through linear regression, weights for the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment, and then uses the weights.

6. The blog search method of claim 1, wherein the calculating of the priority applies a weight to the authority value when a document matching the query exists.

7. The blog search method of claim 1, wherein the searching searches blogs within a preset range among the search target blogs.

8. The blog search method of claim 7, wherein the searching sets at least one of a distance range from a user blog and a range of the number of blogs to search.

9. The blog search method of claim 7, wherein the searching visits and searches the blogs within the preset range one after another in a greedy search manner.

10. A blog search apparatus using a blog authority value estimation technique, comprising: an authority value estimation unit that estimates an authority value using local information of a search target blog; a priority calculation unit that calculates a priority according to the authority value and whether a document matching a query exists; and a blog search unit that sequentially searches the search target blogs according to the priority.

11. The blog search apparatus of claim 10, wherein the authority value estimation unit estimates the authority value using an estimation function for a normalized actual authority value.

12. The blog search apparatus of claim 11, wherein the estimation function is a heuristic function.

13. The blog search apparatus of claim 10, wherein the authority value estimation unit uses, as the local information, at least one of the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment.

14. The blog search apparatus of claim 13, wherein, to estimate the authority value of each blog computed by an EigenRumor algorithm from data on the whole blog space, the authority value estimation unit calculates, through linear regression, weights for the number of neighbor blogs connected by trackback and the number of neighbor blogs connected by comment, and then uses the weights.

15. The blog search apparatus of claim 10, wherein the priority calculation unit applies a weight to the authority value when a document matching the query exists.

16. The blog search apparatus of claim 10, wherein the blog search unit searches blogs within a preset range among the search target blogs.

17. The blog search apparatus of claim 16, wherein the blog search unit sets at least one of a distance range from a user blog and a range of the number of blogs to search.

18. The blog search apparatus of claim 16, wherein the blog search unit visits and searches the blogs within the preset range one after another in a greedy search manner.