KR101218141B1

KR101218141B1 - Method and system for calculating ranking

Info

Publication number: KR101218141B1
Application number: KR1020110097261A
Authority: KR
Inventors: 유성준; 강한훈
Original assignee: (주)레드테이블
Priority date: 2011-09-27
Filing date: 2011-09-27
Publication date: 2013-01-03

Abstract

PURPOSE: To provide a ranking calculation method and system that calculate rankings among objects using documents that have no hyperlink structure. CONSTITUTION: A restaurant information collection unit (100) collects information on restaurants found on the web and stores it in a restaurant database. A review collection unit (110) collects restaurant review documents found on the web and stores them in a review database. An intermediate table generation unit (120) generates an intermediate calculation table and includes an intermediate database that manages the table. A ranking calculation unit (130) calculates a final ranking value for each restaurant using the intermediate calculation table. [Reference numerals] (100) Restaurant information collection unit; (102) Restaurant DB; (110) Review collection unit; (112) Review DB; (120) Intermediate table generation unit; (130) Ranking calculation unit; (140) Ranking result storage unit; (142) Ranking DB; (A) Web

Description

Ranking calculation method and system {METHOD AND SYSTEM FOR CALCULATING RANKING}

The present invention relates to a ranking calculation method and system, and more particularly to a ranking calculation method and system that calculate rankings using review documents that have no hyperlink structure.

Ranking technology is the technique a search engine uses to score documents so that it can return those relevant to a user's query.

Early ranking techniques developed abroad determined the ranking of general web documents from the hyperlink structure or from the frequency with which words appear in a document; later techniques were developed as applications of these.

Ranking techniques developed in Korea either adapt those early foreign techniques or target blog documents.

Among the foreign techniques, web document ranking methods fall into two groups: those that analyze the hyperlinks contained in documents, and those that analyze the content of web documents and determine rank from a similarity score between the query and each document. In link analysis, a hyperlink from one web document to another is treated as a recommendation, so a document that receives many hyperlinks is judged to have received many recommendations and is given a high ranking.

Link analysis represents web documents as a graph made up of nodes and directed edges and works on that graph.

FIG. 1 shows an example of web documents represented as a graph.

Here, a node is a web document and an edge is a hyperlink.

In this case, web document u is said to reference web document v, and web document v is said to be referenced by web document u. An edge has a direction depending on whether a document makes or receives the reference: the edge seen from u as it references v is called a forward edge, and the same edge seen from v as it is referenced by u is called a backward edge.

In FIG. 1, web document v is referenced by web document u and in turn references web document w.

Among the foreign ranking algorithms, InDegree treats a document that receives many references as a high-quality document and ranks it accordingly.

For web document v, the ranking is computed not from its forward edges (such as the one to web document w) but from the number of incoming backward edges (such as the one from web document u), that is, from the number of other web documents, including u, that reference v.
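By way of illustration only, the InDegree count described above can be sketched in Python for the three documents of FIG. 1. The edge list and the helper name `in_degree` are assumptions made for this sketch and are not taken from the cited algorithm.

```python
from collections import Counter

# Directed edges (source, target) for the FIG. 1 example: u -> v -> w.
edges = [("u", "v"), ("v", "w")]


def in_degree(edge_list):
    """Count incoming edges per node; nodes with no incoming edge get 0."""
    counts = Counter({node: 0 for edge in edge_list for node in edge})
    counts.update(target for _, target in edge_list)
    return dict(counts)


scores = in_degree(edges)
# Documents ordered from most referenced to least referenced.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this small graph v and w each receive one reference and u receives none, so InDegree alone cannot separate v from w.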

PageRank scores a web document as high quality when it is referenced by many high-quality web documents. To define document quality, PageRank assumes a model called the random surfer, who browses by visiting pages along forward links chosen at random.

For example, if page A links to three pages B, C, and D, then B receives 1/3 of A's PageRank value. A surfer who visits page A may be satisfied with it and stop browsing, or may be unsatisfied and move on to another page.

If the probability of moving on (the damping factor) is α, page B receives that 1/3 share of A's PageRank scaled by α. When pages repeatedly exchange PageRank values in this way, the values of all web pages converge to specific values, and PageRank takes these as the final rank of each page. Equation 1 gives the PageRank of the i-th page p_i.

In Equation 1, d is the damping factor, N is the total number of web pages, M(p_i) is the set of pages that link to page p_i, and L(p_j) is the number of outgoing links on page p_j.

As described above, the damping factor is the probability that the surfer moves on by clicking a link, and d is usually set to 0.85. Equation 1 shows that a web page divides its own PageRank evenly among the pages it links to. A page that receives links sums the shares it is given, weighted by the damping factor, and the result is its PageRank value. This process is repeated until the PageRank values converge with no significant change, which yields the final PageRank values.
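The iteration described above can be sketched as follows. Equation 1 is not reproduced in this text; the sketch assumes the commonly published form PR(p_i) = (1 - d)/N + d · Σ PR(p_j)/L(p_j), summed over p_j in M(p_i), which matches the symbol definitions given above. The function name `pagerank`, the convergence tolerance, and the treatment of pages without outgoing links are assumptions of this sketch. The example graph takes page A linking to B, C, and D from the preceding example and adds return links (an assumption) so that every page has an outgoing link.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page starts each round with the (1 - d) / N base share.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page divides its damped rank evenly among its out-links.
                share = d * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                for q in pages:
                    new_rank[q] += d * rank[p] / n
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


links = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
ranks = pagerank(links)
```

Because B, C, and D each receive the same one-third share of A's rank, they end with equal values, and A, which all three link back to, ends with the highest value.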

The HITS ranking algorithm assigns each web page an authority score and a hub score, and uses these scores to determine document quality and compute rank. In HITS, the authority score expresses the quality of a web document's content, and the hub score expresses how well the document references good documents. A web document has a higher authority score the more it is referenced by web documents with high hub scores, and a higher hub score the more it references web documents with high authority scores. The two scores therefore influence each other: when the authority scores are updated the hub scores are updated in turn, and when the hub scores are updated the authority scores are updated again. As this updating is repeated, the authority and hub scores eventually stop changing, and the resulting values are assigned as the scores that evaluate the quality of the web document.
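The mutual update of authority and hub scores described above can be sketched as follows. This is a minimal sketch, not the reference formulation: the function name `hits`, the fixed iteration count, the Euclidean normalization after each update, and the four-page example graph are all assumptions made for illustration.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that reference it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        auth = _normalize(auth)
        # Hub: sum of the authority scores of the pages it references.
        hub = _normalize({p: sum(auth[q] for q in links.get(p, [])) for p in pages})
    return auth, hub


# h1 references both a1 and a2; h2 references only a1.
auth, hub = hits({"h1": ["a1", "a2"], "h2": ["a1"]})
```

Here a1 is referenced by both hubs and so earns the higher authority score, while h1 references both authorities and so earns the higher hub score.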

The SALSA ranking algorithm also assigns each web document an authority score and a hub score. SALSA views the authority score computation as a random surfer moving at random along forward edges, and the hub score computation as a random surfer moving at random along backward edges. The authority and hub scores of a web document are then determined by the probability that the random surfer stays on that page.

The Web Page Scoring System (WPSS) algorithm identifies as a weakness that PageRank, as used by the Google search engine, computes rankings from hyperlink structure alone without considering the content of web documents. It also notes that Google supports search within specific domains (vertical search) as well as general, all-encompassing search (horizontal search), and that vertical portal sites have recently been increasing. As a ranking method applicable to such sites, it describes an approach that takes page content into account and improves on the previously proposed random surfer model.

Among domestic research, one study proposed a post ranking algorithm for the blog environment. The study indicates that previously proposed link-based web document ranking algorithms such as PageRank and HITS cannot be applied to post ranking, because blog posts do not contain hyperlinks. Instead, the algorithm treats the blogger who owns a blog as the node in PageRank terms, treats actions that occur on blogs (scrapping, commenting, writing trackbacks, and so on) as the edges, and determines the ranking of posts using a modified PageRank computation.

Recently, Internet users have come to look up other people's opinions online before visiting a restaurant. Such opinions can be found on sites such as Wingbus and Menupan, and also in blog posts. Restaurant search sites such as Wingbus rank restaurants by the average of the star ratings that users submit. However, when star ratings are the only ranking factor, restaurants with the same rating cannot be ordered precisely, and the ratings themselves may be distorted: for example, someone may maliciously rate a restaurant far too high or far too low.

Conventional ranking techniques, however, are used to rank general web documents on the Internet. They are meant to apply to every kind of web document, on the premise that a document referenced through many hyperlinks is likely to interest many people. Consequently, they cannot be applied to users' restaurant review documents, which have no hyperlink structure. This is the same reason that ranking blog documents without hyperlinks requires the number of scraps, comments, trackbacks, and the like.

The present invention was conceived to solve the above problems, and its object is to provide a ranking calculation method and system that calculate rankings using review documents that have no hyperlink structure.

To achieve this object, a ranking calculation system according to the present invention comprises: a restaurant information collection unit that collects information on a plurality of restaurants found on the web and stores and manages that information in a restaurant database; a review collection unit that collects a plurality of review documents written about the restaurants found on the web and stores and manages them in a review database; an intermediate table generation unit that generates at least one intermediate calculation table used to produce the final ranking result and includes at least one intermediate database for storing and managing the intermediate calculation table; a ranking calculation unit that calculates a final ranking value for each restaurant using the at least one intermediate calculation table; and a ranking result storage unit that stores and manages each final ranking value in a ranking database.

Preferably, the intermediate calculation table reflects an influence score that quantifies the influence of the users who wrote review documents about each restaurant.

Preferably, the intermediate calculation table also reflects a star score that quantifies the rating given to each restaurant.

The influence score preferably reflects a recent activity score that quantifies how recently each user has been active.

The influence score preferably also reflects a persistence score that quantifies each user's degree of sustained participation.

More preferably, the influence score reflects a propagation score, which gauges the influence of the user who wrote a particular review document from the number of review documents written after it.

A ranking calculation method according to another aspect of the present invention is performed by a ranking calculation system comprising a restaurant information collection unit, a review collection unit, an intermediate table generation unit, a ranking calculation unit, and a ranking result storage unit. The method comprises: a restaurant information collection step in which information on a plurality of restaurants scattered across the web is collected by the restaurant information collection unit and stored in its restaurant database; a review document collection step in which at least one review document written about a specific restaurant is collected by the review collection unit and stored in its review database; an intermediate table generation step in which at least one intermediate calculation table used to produce the final ranking result for each restaurant is generated by the intermediate table generation unit; a ranking calculation step in which the final ranking value of each restaurant is calculated by the ranking calculation unit using the at least one intermediate calculation table; and a ranking result storage step in which the final ranking value of each restaurant calculated in the ranking calculation step is stored in the ranking database of the ranking result storage unit.

The present invention makes it possible to implement a ranking calculation method and system that can calculate rankings among the objects being ranked using documents that have no hyperlink structure.

FIG. 1 is a diagram showing an example of web documents represented as a graph;
FIG. 2 is a block diagram showing an example of a ranking calculation system according to the present invention; and
FIG. 3 is a flowchart showing an example of a ranking calculation method according to the present invention.

To describe the ranking calculation method and system according to the present invention, the objects to be ranked are taken below to be a plurality of restaurants.

That is, the following describes how to identify ranking factors from review documents scattered across the Internet and apply them to calculate rankings for a plurality of restaurants.

FIG. 2 is a block diagram showing an example of a ranking calculation system according to the present invention.

The ranking calculation system (10) comprises a restaurant information collection unit (100), a review collection unit (110), an intermediate table generation unit (120), a ranking calculation unit (130), and a ranking result storage unit (140).

The restaurant information collection unit (100) collects information on the many restaurants found on the web (A). To this end, it further includes a restaurant database (DB) (102) that stores and manages the information on the plurality of restaurants.

The review collection unit (110) collects and manages review documents about the restaurants through the web (A). To this end, it further includes a review database (DB) (112) that stores and manages the collected review documents.

The intermediate table generation unit (120) generates the intermediate calculation tables used to produce the final ranking result. The information generated as intermediate calculation tables includes, for example, CS information, RFM information, and M information. To store and manage each of these, the intermediate table generation unit (120) further includes a CS database (122), an RFM database (124), and an M database (126).

The ranking calculation unit (130) calculates the final ranking value for each restaurant using the intermediate calculation table information generated by the intermediate table generation unit (120).

The ranking result storage unit (140) stores and manages the final ranking value of each restaurant calculated by the ranking calculation unit (130). To this end, it further includes a ranking database (142).

FIG. 3 is a flowchart showing an example of a ranking calculation method according to the present invention.

As shown in FIG. 3, the ranking calculation method is performed by a ranking calculation system comprising a restaurant information collection unit, a review collection unit, an intermediate table generation unit, a ranking calculation unit, and a ranking result storage unit, and comprises a restaurant information collection step (S100), a review document collection step (S110), an intermediate table generation step (S120), a ranking calculation step (S130), and a ranking result storage step (S140).

In the restaurant information collection step (S100), information on a plurality of restaurants scattered across the web is collected by the restaurant information collection unit of the ranking calculation system and stored in its restaurant database.

In the review document collection step (S110), a plurality of review documents written about restaurants and spread across the web are collected by the review collection unit and stored in its review database.

In the intermediate table generation step (S120), the intermediate calculation tables used to produce the final ranking result are generated by the intermediate table generation unit. For example, CS information, RFM information, and M information are generated as intermediate calculation tables and stored in the CS database, the RFM database, and the M database, respectively.

In the ranking calculation step (S130), the final ranking value of each restaurant is calculated by the ranking calculation unit from the collected review documents. The intermediate calculation tables generated in the intermediate table generation step (S120) are used to calculate the final ranking values.

In the ranking result storage step (S140), the final ranking value of each restaurant calculated by the ranking calculation unit is managed by the ranking result storage unit. The final ranking values are stored in the ranking database.

The user-written review documents that serve as the basis for ranking calculation in the present invention contain no hyperlinks, so existing ranking techniques cannot be applied to them.

The present invention therefore uses Recency, Frequency, Influence, and Customer Satisfaction as ranking factors.

Recency evaluates how recent a user's reviews are.

Frequency evaluates the number of reviews a user has written.

Influence gauges a particular user's influence by comparing the number of reviews a restaurant had when that user reviewed it with the number of reviews the restaurant has now.

Existing methods considered only star ratings as a ranking factor. To reduce the resulting distortion, the following describes a method that calculates restaurant rankings by applying other factors in addition to star ratings.

Table 1 is an example of a table for calculating restaurant rankings from review data.

| no | shop | star | title | contents | name | create_time |
|---|---|---|---|---|---|---|
| 1 | Store1 | 4.5 | Title 1 | Content 1 | people1 | 2011-05-01 |
| 2 | Store2 | 3 | Title 2 | Content 2 | people1 | 2011-04-30 |
| 3 | Store1 | 2 | Title 3 | Content 3 | people1 | 2011-04-15 |
| 4 | Store1 | 2.5 | Title 4 | Content 4 | people2 | 2011-04-20 |
| 5 | Store2 | 5 | Title 5 | Content 5 | people2 | 2011-04-15 |
| 6 | Store1 | 4 | Title 6 | Content 6 | people3 | 2011-04-01 |
| 7 | Store2 | 3.5 | Title 7 | Content 7 | people3 | 2011-03-30 |
| 8 | Store1 | 4 | Title 8 | Content 8 | people3 | 2011-02-01 |
| 9 | Store2 | 4.5 | Title 9 | Content 9 | people4 | 2011-02-20 |
| 10 | Store1 | 3 | Title 10 | Content 10 | people4 | 2011-01-10 |
| 11 | Store1 | 2 | Title 11 | Content 11 | people4 | 2011-01-01 |
| 12 | Store1 | 5 | Title 12 | Content 12 | people5 | 2011-01-01 |
| 13 | Store2 | 5 | Title 13 | Content 13 | people5 | 2010-12-30 |
| 14 | Store2 | 4 | Title 14 | Content 14 | people5 | 2010-11-30 |

The review collection unit needs a fixed rule in order to collect review documents.

A user-written review RT, as in Table 1, is defined by Equation 2 to hold the following data: no is the serial number, shop is the store, star is the star rating, title is the review title, contents is the review body, name is the name of the person who wrote the review, and createDate is the date the review was written (the create_time column of Table 1).
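For illustration, the review record described above could be held in a structure such as the following. Equation 2 is not reproduced in this text, so the class name `Review`, the field types, and the helper `make_review` are assumptions of this sketch; only the field names and the three sample rows come from the description and Table 1.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Review:
    no: int            # serial number
    shop: str          # store being reviewed
    star: float        # star rating
    title: str         # review title
    contents: str      # review body
    name: str          # reviewer name
    create_date: date  # date the review was written (createDate)


def make_review(no, shop, star, name, iso_day):
    """Build a Review using Table 1's placeholder titles and contents."""
    return Review(no, shop, star, f"Title {no}", f"Content {no}",
                  name, date.fromisoformat(iso_day))


# First three rows of Table 1.
RT = [
    make_review(1, "Store1", 4.5, "people1", "2011-05-01"),
    make_review(2, "Store2", 3.0, "people1", "2011-04-30"),
    make_review(3, "Store1", 2.0, "people1", "2011-04-15"),
]
```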

The expression the ranking calculation unit uses to calculate a restaurant's ranking is defined by Equation 3.

In Equation 3, PI (People Influence) is an expression that evaluates the influence of the people who wrote the reviews, and CS is the star rating of restaurant j; the average of the star ratings given by multiple people is reflected in the final ranking calculation.

That is, as Equation 3 shows, calculating the final ranking of restaurants requires intermediate data such as PI and CS for each individual restaurant.
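As a small worked example of the CS term, the average star rating per store can be computed from the star column of Table 1 as sketched below. The function name `customer_satisfaction` is an assumption of this sketch, and how CS is combined with PI in Equation 3 is not shown because that equation is not reproduced in this text.

```python
from collections import defaultdict
from statistics import mean

# (shop, star) pairs transcribed from Table 1, in row order.
ratings = [
    ("Store1", 4.5), ("Store2", 3.0), ("Store1", 2.0), ("Store1", 2.5),
    ("Store2", 5.0), ("Store1", 4.0), ("Store2", 3.5), ("Store1", 4.0),
    ("Store2", 4.5), ("Store1", 3.0), ("Store1", 2.0), ("Store1", 5.0),
    ("Store2", 5.0), ("Store2", 4.0),
]


def customer_satisfaction(pairs):
    """Average the star ratings of each shop."""
    stars = defaultdict(list)
    for shop, star in pairs:
        stars[shop].append(star)
    return {shop: mean(values) for shop, values in stars.items()}


cs = customer_satisfaction(ratings)
```

With the Table 1 data, Store1 averages 27/8 = 3.375 over eight reviews and Store2 averages 25/6 ≈ 4.17 over six reviews.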

The intermediate table generation unit produces this intermediate data using the techniques described below.

The influence score PI of the users who wrote reviews of a store is calculated by Equation 4.

In Equation 4, R is a value that gauges recent activity based on the RT most recently written by P_i; R(P_i) is defined by Equation 5.

In Equation 5, N is the total number of people, and T is the current date at the time the ranking is calculated. The Rank_asc function gives the rank of a value when a set of numbers is sorted in ascending order. The MAX function returns the date of the most recent review written by P_i.

Table 2 shows the R of each person calculated with Equation 5.

| P_i | N | T | MAX(t_i) | T-MAX(t_i) | Rank_asc | R(P_i) |
|---|---|---|---|---|---|---|
| people1 | 5 | '2011-05-05' | '2011-05-01' | 4 | 1 | 5-1+1=5 |
| people2 | 5 | '2011-05-05' | '2011-04-20' | 15 | 2 | 5-2+1=4 |
| people3 | 5 | '2011-05-05' | '2011-04-01' | 34 | 3 | 5-3+1=3 |
| people4 | 5 | '2011-05-05' | '2011-02-20' | 74 | 4 | 5-4+1=2 |
| people5 | 5 | '2011-05-05' | '2011-01-01' | 124 | 5 | 5-5+1=1 |
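Equation 5 is not reproduced in this text. The rows of Table 2 are consistent with R(P_i) = N - Rank_asc(T - MAX(t_i)) + 1, and the following sketch assumes that reading. The function names and the tie rule (tied values share the best rank) are assumptions; the review dates come from Table 1 and T from Table 2.

```python
from datetime import date

# (name, create_time) pairs transcribed from Table 1.
reviews = [
    ("people1", "2011-05-01"), ("people1", "2011-04-30"), ("people1", "2011-04-15"),
    ("people2", "2011-04-20"), ("people2", "2011-04-15"),
    ("people3", "2011-04-01"), ("people3", "2011-03-30"), ("people3", "2011-02-01"),
    ("people4", "2011-02-20"), ("people4", "2011-01-10"), ("people4", "2011-01-01"),
    ("people5", "2011-01-01"), ("people5", "2010-12-30"), ("people5", "2010-11-30"),
]


def rank_asc(values):
    """Rank with 1 = smallest value; tied values share the same rank."""
    return {k: 1 + sum(other < v for other in values.values())
            for k, v in values.items()}


def recency_scores(rows, today):
    """R(P_i) = N - Rank_asc(days since P_i's latest review) + 1."""
    latest = {}
    for name, day in rows:
        d = date.fromisoformat(day)
        latest[name] = max(latest.get(name, d), d)
    age_days = {name: (today - d).days for name, d in latest.items()}
    ranks = rank_asc(age_days)
    n = len(latest)
    return {name: n - ranks[name] + 1 for name in latest}


R = recency_scores(reviews, date(2011, 5, 5))
```

The most recently active reviewer (people1, 4 days before T) receives the maximum score N = 5, and the least recently active (people5, 124 days) receives 1.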

In Equation 4, F is a value that gauges a user's degree of sustained participation based on the number of reviews written by P_i; it is defined by Equation 6.

In Equation 6, N is the total number of people. The COUNT function counts the reviews in RT written by P_i. The Rank_desc function gives the rank of a value when a set of numbers is sorted in descending order.

Table 3 shows the F of each person calculated with Equation 6.

| P_i | N | COUNT(P_i, RT) | Rank_desc | F(P_i) |
|---|---|---|---|---|
| people1 | 5 | 3 | 1 | 5-1+1=5 |
| people2 | 5 | 2 | 5 | 5-5+1=1 |
| people3 | 5 | 3 | 1 | 5-1+1=5 |
| people4 | 5 | 3 | 1 | 5-1+1=5 |
| people5 | 5 | 3 | 1 | 5-1+1=5 |
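Equation 6 is likewise not reproduced in this text. Table 3 is consistent with F(P_i) = N - Rank_desc(COUNT(P_i, RT)) + 1, where tied counts share the best rank and the next rank is skipped (four people tie at rank 1 and people2 is ranked 5). The sketch below assumes that reading; the function names are hypothetical.

```python
from collections import Counter

# Author of each of the 14 reviews in Table 1, in row order.
authors = (["people1"] * 3 + ["people2"] * 2 + ["people3"] * 3
           + ["people4"] * 3 + ["people5"] * 3)


def rank_desc(values):
    """Rank with 1 = largest value; tied values share the same rank."""
    return {k: 1 + sum(other > v for other in values.values())
            for k, v in values.items()}


def frequency_scores(names):
    """F(P_i) = N - Rank_desc(number of reviews by P_i) + 1."""
    counts = Counter(names)
    ranks = rank_desc(counts)
    n = len(counts)
    return {name: n - ranks[name] + 1 for name in counts}


F = frequency_scores(authors)
```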

In Equation 4, I is a value that gauges the influence of P_i from the volume of reviews that other people wrote after P_i reviewed a shop; it is defined by Equation 7.

In Equation 8, N is the total number of people. The Rank_desc function gives the rank of a value when a set of numbers is sorted in descending order.

In Equation 7, M is the total number of shops that P has reviewed. COUNT(RT_i) is the total number of reviews of shop s at the time P_i first reviewed it, and COUNT(RT_p) is the total number of reviews of shop s at the time the ranking is calculated.

Table 4 shows the result of calculating Equation 8.

P_i     | s      | COUNT(RT_i) | COUNT(RT_p) | I₁(P_i, j)
people1 | Store1 | 6           | 8           | 0.5
people1 | Store2 | 6           | 6           | 0
people1 | Sum    |             |             | 0.5
people2 | Store1 | 7           | 8           | 0.125
people2 | Store2 | 5           | 6           | 0.167
people2 | Sum    |             |             | 0.292
people3 | Store1 | 4           | 8           | 2
people3 | Store2 | 4           | 6           | 0.667
people3 | Sum    |             |             | 2.667
people4 | Store1 | 2           | 8           | 4.5
people4 | Store2 | 3           | 6           | 1.5
people4 | Sum    |             |             | 6
people5 | Store1 | 2           | 8           | 4.5
people5 | Store2 | 1           | 6           | 4.17
people5 | Sum    |             |             | 8.67

Table 5 shows the result of calculating Equation 7.

P_i     | N | I₁(P_i) | Rank_desc | I(P_i)
people1 | 5 | 0.50    | 4         | 5-4+1=2
people2 | 5 | 0.29    | 5         | 5-5+1=1
people3 | 5 | 2.67    | 3         | 5-3+1=3
people4 | 5 | 6.00    | 2         | 5-2+1=4
people5 | 5 | 8.67    | 1         | 5-1+1=5
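The two-stage propagation calculation behind Tables 4 and 5 can be sketched in Python as below. The formula for I₁ is not reproduced in this text, so the per-store term used here, (COUNT(RT_p) - COUNT(RT_i))² / COUNT(RT_p), is an assumption inferred from the values in Table 4, where it matches every row. All function and variable names are hypothetical.

```python
def rank_desc(values):
    """Rank with 1 = largest; tied values share the best rank (assumed)."""
    return [1 + sum(other > v for other in values) for v in values]


def raw_propagation(store_counts):
    """I1(P_i): sum of a per-store term over the stores a reviewer reviewed.

    store_counts holds (count_at_first_review, count_now) pairs.
    The squared-difference term is inferred from Table 4, not quoted.
    """
    return sum((now - first) ** 2 / now for first, now in store_counts)


# (COUNT(RT_i), COUNT(RT_p)) for Store1 and Store2, taken from Table 4.
reviewers = {
    "people1": [(6, 8), (6, 6)],
    "people2": [(7, 8), (5, 6)],
    "people3": [(4, 8), (4, 6)],
    "people4": [(2, 8), (3, 6)],
    "people5": [(2, 8), (1, 6)],
}

raw = {name: raw_propagation(counts) for name, counts in reviewers.items()}
n = len(raw)
ranks = rank_desc(list(raw.values()))
# I(P_i) = N - Rank_desc(I1(P_i)) + 1
propagation = {name: n - r + 1 for name, r in zip(raw, ranks)}
print(propagation)
```

Under this assumption, a reviewer who wrote early, when a store had few reviews, and was followed by many later reviews receives a larger I₁ and therefore a higher I.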

Table 6 shows the final PI value of each P_i calculated using Equation 4.

P_i     | R(P_i) | F(P_i) | I(P_i) | PI(P_i)
people1 | 5      | 5      | 2      | 12
people2 | 4      | 1      | 1      | 6
people3 | 3      | 5      | 3      | 11
people4 | 2      | 5      | 4      | 11
people5 | 1      | 5      | 5      | 11
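As a quick check of Table 6, the influence score is the sum of the three component scores, PI = R + F + I, as stated in the claims. The sketch below uses hypothetical variable names and the component values listed in Table 6.

```python
# Component scores for people1..people5, taken from Table 6.
r_values = [5, 4, 3, 2, 1]  # recent activity R(P_i)
f_values = [5, 1, 5, 5, 5]  # persistence F(P_i)
i_values = [2, 1, 3, 4, 5]  # propagation I(P_i)

# PI(P_i) = R(P_i) + F(P_i) + I(P_i)
pi_values = [r + f + i for r, f, i in zip(r_values, f_values, i_values)]
print(pi_values)       # [12, 6, 11, 11, 11]
print(sum(pi_values))  # 51
```

The total of 51 is the summed influence of all five reviewers.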

In this way, the ranking calculation unit computes the final ranking of each restaurant from the intermediate data produced by the intermediate table generation unit.

Next, Table 7 shows the result of calculating the restaurant rankings using Equation 3.

j      | Σ PI(P_i) | AVG(CS_j) | AVG(CS_j) * ω        | RS
Store1 | 51        | 3.38      | 3.38 * 15 = 50.63    | 101.63
Store2 | 51        | 4.17      | 4.17 * 15 = 62.50    | 113.50
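The final score in Table 7 can be reproduced with the sketch below, which adds the summed reviewer influence to the average star score scaled by the weight ω = 15 shown in the table. The function name `ranking_score` is hypothetical. The unrounded averages 3.375 and 25/6 are assumptions chosen because they are consistent with the rounded values 3.38 and 4.17 and with the products 50.63 and 62.50 printed in the table.

```python
def ranking_score(influence_sum, avg_star, weight):
    """RS_j = sum of PI over reviewers + AVG(CS_j) * weight."""
    return influence_sum + avg_star * weight


WEIGHT = 15          # omega, as used in Table 7
INFLUENCE_SUM = 51   # sum of PI(P_i) over the five reviewers

# Unrounded average star scores (assumed; the table prints 3.38 and 4.17).
rs_store1 = ranking_score(INFLUENCE_SUM, 3.375, WEIGHT)
rs_store2 = ranking_score(INFLUENCE_SUM, 25 / 6, WEIGHT)

# Table 7 lists these scores as 101.63 and 113.50.
print(f"{rs_store1:.3f} {rs_store2:.3f}")
```

Because both stores share the same reviewers in this example, the influence term is identical and the weighted star average alone decides the order, placing Store2 above Store1.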

As described above, whereas existing ranking techniques rely on the structure or form of web documents, the present invention calculates rankings from the influence of each person who wrote a review, and can therefore produce more reasonable ranking results than existing ranking techniques.

Claims

A ranking calculation system comprising:
a restaurant information collection unit that collects information about a plurality of restaurants found on the web and manages the information in a restaurant database;
a review collection unit that collects a plurality of review documents written about the restaurants found on the web and manages the review documents in a review database;
an intermediate table generation unit that includes at least one intermediate calculation table used to generate the final ranking result, and at least one intermediate database that stores and manages the intermediate calculation table;
a ranking calculation unit that calculates a final ranking value for each restaurant using the at least one intermediate calculation table; and
a ranking result storage unit that stores and manages each final ranking value in a ranking database,
wherein the ranking calculation unit calculates the ranking score RANK of a specific restaurant among the plurality of restaurants, using the following formula, as the sum of (i) the total of the influence scores of all reviewers of the specific restaurant and (ii) AVG(CS), the average of the star scores for the specific restaurant,

[formula image not reproduced in this text]

wherein the influence score PI of each reviewer is calculated as the sum of the reviewer's recent activity score, persistence score, and propagation score,
wherein the influence score, the star score, the recent activity score, the persistence score, and the propagation score are calculated in the intermediate calculation table and provided to the ranking calculation unit,
wherein the recent activity score quantifies the recent activity of each user,
wherein the persistence score quantifies the degree of sustained participation of each user, and
wherein the propagation score quantifies the influence of the user who wrote a specific review document by the number of review documents written after that review document was created.

(Deleted)

The ranking calculation system of claim 1,
wherein the intermediate calculation table reflects a star score that quantifies the evaluation of each restaurant, the star score being the average of the star scores given by all reviewers.

The ranking calculation system of claim 1,
wherein the recent activity score R(P_i) is
R(P_i) = {N - Rank_asc(T - MAX(t_i))} + 1,
where N is the total number of users, T is the current date at the time the ranking is calculated, Rank_asc is the rank of a value when a given set of numbers is sorted in ascending order, and MAX(t_i) is the date of the most recent review written by P_i.

The ranking calculation system of claim 1,
wherein the persistence score F(P_i) is
F(P_i) = {N - Rank_desc(COUNT(P_i, RT))} + 1,
where N is the total number of users, COUNT(P_i, RT) is the number of reviews in RT written by P_i, and Rank_desc is the rank of a value when a given set of numbers is sorted in descending order.

The ranking calculation system of claim 1,
wherein the propagation score I(P_i) is
I(P_i) = {N - Rank_desc(I₁(P_i))} + 1,
where I₁(P_i) is given by

[formula image not reproduced in this text]

and where N is the total number of users, Rank_desc is the rank of a value when a given set of numbers is sorted in descending order, M is the total number of stores reviewed by P, COUNT(RT_P) is the total number of reviews of store s at the time P_i first wrote a review, and COUNT(RT_p) is the total number of reviews of store s at the time the ranking is calculated.

A ranking calculation method using a ranking calculation system that includes a restaurant information collection unit, a review collection unit, an intermediate table generation unit, a ranking calculation unit, and a ranking result storage unit, the method comprising:
a restaurant information collection step in which information on a plurality of restaurants scattered across the web is collected by the restaurant information collection unit and stored in a restaurant database of that unit;
a review document collection step in which at least one review document written about a specific restaurant is collected by the review collection unit and stored in a review database of that unit;
an intermediate table generation step of generating at least one intermediate calculation table used to produce the final ranking result of each restaurant;
a ranking calculation step in which the ranking calculation unit calculates a final ranking value for each restaurant using the at least one intermediate calculation table; and
a ranking result storage step in which the final ranking value of each restaurant calculated in the ranking calculation step is stored in a ranking database of the ranking result storage unit,
wherein, in the ranking calculation step, the ranking score RANK of a specific restaurant among the plurality of restaurants is calculated, using the formula below, as the sum of (i) the total of the influence scores of all reviewers of the specific restaurant and (ii) AVG(CS), the average of the star scores for the specific restaurant,

[formula image not reproduced in this text]

wherein the method further includes calculating the influence score PI of each reviewer as the sum of the reviewer's recent activity score (R), persistence score (F), and propagation score (I),
wherein the intermediate table generation step further includes calculating the influence score, the star score, the recent activity score, the persistence score, and the propagation score and providing them to the ranking calculation unit, and
wherein the recent activity score quantifies the recent activity of each user, the persistence score quantifies the degree of sustained participation of each user, and the propagation score quantifies the influence of the user who wrote a specific review document by the number of review documents written after that review document was created.