KR100818553B1

KR100818553B1 - Document ranking granting method and computer readable record medium thereof

Info

Publication number: KR100818553B1
Application number: KR1020060079176A
Authority: KR
Inventors: 홍석후; 정경석; 이상호; 이길재; 장성민; 정영신; 유현애
Original assignee: 에스케이커뮤니케이션즈 주식회사
Priority date: 2006-08-22
Filing date: 2006-08-22
Publication date: 2008-04-01
Also published as: KR20080017685A; WO2008023904A1

Abstract

The present invention relates to a method for assigning document rankings. When a user enters a search query through the search box of a search engine, the weight of each document matching the query is combined with the user action ranking of that document's author, and the document's creation time is applied, so that a document ranks higher when many other users interested in the query have performed actions on it and its author is a guru on the subject.

The invention exposes documents and users that match a specific query and, by using user feedback information, achieves markedly better search performance than conventional simple keyword-based search ranking methods.

Document, author, action, ranking, rank

Description

Document ranking granting method and computer-readable recording medium containing a program capable of performing the same {DOCUMENT RANKING GRANTING METHOD AND COMPUTER READABLE RECORD MEDIUM THEREOF}

FIG. 1 is a structural diagram of the ranking calculation method applied to the present invention.

FIG. 2 is a table showing the types of actions applied to the present invention and whether each is used.

FIG. 3 shows the time weight as a function of document age (d_c - d_w).

FIG. 4 is a diagram of the method applied at search time in the present invention.

FIG. 5 is an example of a search result screen to which the present invention is applied.

FIG. 6 is an example of an expert result screen to which the present invention is applied.

The present invention relates to a method for assigning document rankings. When a user enters a search query through the search box of a search engine, the method combines the weight of each document matching the query with the action ranking of the document's author and applies the document's creation time, so that a document ranks higher when many other users interested in the query have acted on it and its author is a guru on the subject. The method thereby exposes documents and users that match a specific query and, by using user feedback information, achieves markedly better search performance than conventional simple keyword-based search ranking methods.

When users look for information with a search engine, ranking is the most important of the engine's quality measures. Ranking here means ordering the documents relevant to a given keyword; placing relevant documents at the top reduces the effort users spend acquiring information.

Domestic (Korean) portals currently rank results by the positions at which the query terms appear in a result document or by how frequently they appear. These signals alone, however, do not make it easy to find the desired document in a vast collection.

Outside Korea, in addition to word-based occurrence information, search engines use a variety of weighting schemes, led by PageRank, which exploits the link structure between documents, and including various kinds of HTML meta information. Meta search engines also select a search system according to the query. These approaches are costly relative to the limited performance gains they deliver, and because they were validated on web pages they cannot be applied uniformly to the varied content that domestic portals provide.

Recently published literature and papers report studies that assign appropriate weights to the various factors significant for search ranking. For web documents, PageRank had the greatest effect, and it was shown that the strongest factors differ from domain to domain.

These methods, however, are poorly suited to UCC (user-created content) documents, the main type of document currently served in Korea, and their high cost makes them difficult to apply.

The present invention is intended to solve the above problems of the prior art. The main factors for ranking the content of domestic portals are subject relevance and user action information. Serving documents and gurus whose subject matches a specific query requires automatic classification: a fixed list of subjects is defined, each document is analyzed, and the document is matched to the subjects that best represent it. The object of the invention is to provide a method that uses user action information, in conjunction with this subject classification relevance, to calculate a ranking value that places the documents users prefer at the top.

The present invention proposes a ranking (hereinafter "action ranking") that uses documents whose subject matches or resembles the user's input keyword, together with various kinds of user feedback information (hereinafter "actions"), to promote documents that fit the subject well or that users prefer. It also exposes a list of gurus (experts) suited to the entered keyword, in order to maximize search efficiency.

The document ranking granting method according to the present invention comprises:

a document creation step in which a user creates a document; a query input step in which a searcher enters a query; a document weight extraction step of extracting a document weight that combines the probability that the document created in the document creation step matches the query with the sum of the plus influence, redundancy influence, and click influence accumulated for the document up to the search date; a user ranking extraction step of extracting a user ranking from the total document weight that other people's actions have given to the user's documents related to the query, together with the totals of the plus weights, redundancy weights, and click weights of the user's own actions on other people's documents related to the query; and a document ranking extraction step of extracting a document ranking from the document weight and the user ranking.

The document ranking is preferably calculated as the product of the document weight and the user ranking:

AR_A(s, d_j) = W(s, d_j) × AR_A(s, u_i)

where

W(s, d_j) is the document weight, and

AR_A(s, u_i) is the user ranking.

The document weight is preferably calculated by

[equation not reproduced in the source text]

The user ranking is preferably calculated by

[equation not reproduced in the source text]

where the terms denote the document weight, the plus weight, the redundancy weight, and the click weight.

A time weight is preferably applied to the user ranking, the document ranking, or both, so that rankings fall as time passes.

The time weight is defined as

0.998^(d_c - d_w)

and "applied" preferably means that the user ranking and the document ranking are multiplied by the time weight,

where

d_c is the time at which the action ranking is calculated, and

d_w is the document creation time.

A final document ranking is preferably calculated by applying a system rank to the document ranking.

The final document ranking is preferably calculated by

[equation not reproduced in the source text]

where

[symbol not reproduced] : final document ranking for query q

s_j : the j-th subject value

d_i : the i-th document

q : the query

ave : the average value

[symbol not reproduced] : cosine between subject vector s_j and query vector q

[symbol not reproduced] : A-type action rank score of the i-th document belonging to the j-th subject

α, β : weights of the action rank score and of SR, respectively

SR : system rank value (the search engine's ranking score)

For a document whose author is unknown, the method preferably includes a document ranking extraction step that extracts a document ranking for the authorless document from the influence of the document's plus count and click count together with the system rank value.

The document ranking of an authorless document is preferably calculated by

[equation not reproduced in the source text]

A time weight is preferably applied to the document ranking of an authorless document so that its ranking falls as time passes.

The time weight is defined as

0.998^(d_c - d_w)

and "applied" preferably means that the document ranking of the authorless document is multiplied by the time weight,

where

d_c is the time at which the action ranking is calculated, and

d_w is the document creation time.

Another document ranking granting method according to the present invention comprises:

calculating a weight of a first document according to user actions performed on the first document; calculating, using the calculated weight, a first relevance of the first document to a first keyword; and, when a query is issued with the first keyword, sorting the first document and a second document, the second document having a second relevance to the first keyword of at least a predetermined magnitude, according to the magnitudes of the first relevance and the second relevance.

The weight of the first document is preferably proportional to the ranking of the user who created the first document.

The weight of the first document is preferably proportional to the number of times users who read the first document bookmarked it or added replies to it.

The weight of the first document is preferably proportional to the number of users who read the first document.

When the first document and the second document contain the same content and the first document is the original source of that content, the method preferably further comprises setting the relevance of the first document to the first keyword higher than the relevance of the second document.

The weight of the first document is preferably proportional to the temporal order of the user actions occurring on the first document.

To achieve the above object, the present invention also provides a computer-readable recording medium containing a program capable of performing the method.

The invention is described in detail below with reference to the drawings. The drawings are for illustration only, and the invention is not limited to them.

FIG. 1 consists broadly of a database that stores information on the documents to be searched and user feedback information, an automatic classifier that performs automatic classification, and a ranking calculation module that calculates ranking values from the classification information and the user action information.

Actions are all implicit or explicit activities during information seeking. They fall into two classes: Creation actions, performed as an information provider, and Delivery actions, performed as an information seeker. The invention treats both as significant, but in the actual action weights Creation actions are given more importance than Delivery actions.

FIG. 2 is a table of the action types and which of them are adopted.

The Write action is the act of creating a document. It covers posts the user wrote, for example in Tong, in Paper, or on a Cyworld bulletin board. It is the most important of all actions, and during subject classification the Write action value, which is the value with which the document is classified into a specific subject, is readjusted.

Wiki is a service to be launched alongside the Search Plus service, and the Wiki action is the weight for text the user wrote in the wiki. Because only a limited set of users can write in the wiki, giving it the same weight as the Write action appears problematic: it violates the condition that all users compete on equal terms. It would make gurus even more dominant and reduce ordinary users' chances of entering the guru group, so a rich-get-richer effect could arise here as well. In addition, several people can write a single wiki document, which makes it hard to reflect in the action ranking. The wiki is therefore limited to use as an encyclopedia and handled separately from action ranking.

The RSS action is a score that absorbs, as the user's own influence, the influence index of other users who regularly read the user's archive through RSS. RSS can be reflected only when a feed registers documents of a specific subject; a feed that registers the whole archive cannot be mapped to a specific subject, so RSS is excluded from action ranking for now.

The Plus action is the concept of bookmarking a good document found in search results or while browsing the web. A document the user plusses is reflected in the action score of the original's owner, and the content of the original is also weakly reflected in the plussing user's own action score. A Plus action also includes an evaluation of the document, and the degree of good or bad affects the ranking value.

The Comment action reflects the number of comments others leave on a document in the action ranking of the document's owner. Depending on the service in which a document was created, a comment system may not exist, so it is hard to assign comment values uniformly across all documents. Moreover, the system cannot easily judge whether a comment is positive or negative, so letting comments affect ranking directly is risky and not cost-effective.

The Query action is the weight for keywords a user has used as search queries. It helps identify which subjects a user is interested in, but issuing many queries does not imply having much knowledge, so it is not useful as a measure of expertise.

The Click action refers to other people's documents that the user clicked in search results. The user's influence is reflected in the clicked document, and the document's content is weakly reflected in the user's own profile and action score.

The Duplicate action applies only to A-type documents (netizen information best). Its purpose is to detect scraps and partial paragraph copying so that the original author receives a higher score, and to give a higher Write action value to documents that are duplicated more often. In the actual computation it can be projected onto the Plus action: for example, if user A scraps a document of user B, it is counted as user A having plussed B's document.
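The projection of Duplicate onto Plus described above can be sketched as a simple event conversion. This is a minimal illustration only; the `ScrapEvent` and `PlusEvent` types and the `scraps_as_plus` function are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapEvent:
    """Hypothetical record: `scrapper` copied the document `original_doc`."""
    scrapper: str
    original_doc: str


@dataclass(frozen=True)
class PlusEvent:
    """Hypothetical record: `user` plussed (bookmarked) `doc`."""
    user: str
    doc: str


def scraps_as_plus(scraps):
    """Count each scrap as if the scrapper had plussed the original document."""
    return [PlusEvent(user=s.scrapper, doc=s.original_doc) for s in scraps]


# User A scraps a document owned by user B: treated as A plussing B's document.
events = scraps_as_plus([ScrapEvent(scrapper="A", original_doc="doc_of_B")])
```

Routing duplicates through the Plus path means the original author is credited by whatever Plus weighting is in use, without a separate code path for duplicates.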

Calculating action rankings per subject requires first automatically classifying the user's documents (or actions) by subject. The influence of other people who took some action on a classified document is then reflected.

The calculation procedure is as follows.

- Automatically classify the actions

- Reflect other people's influence in the probability values of the classified actions

- Aggregate the scores of the influence-adjusted actions

Actions are based on documents and keywords. Even for plus and click, the object is a document, so these are the basic source for automatic classification of actions. Each automatically classified action has its relevance to a subject expressed as a probability value. The influence of the actions that others took on the document is calculated together with this probability value.

A user's influence is divided by subject. A user's influence on a specific subject is calculated primarily as the sum of the document actions belonging to that subject. Here, the sum of actions means that every action the user took is automatically classified to obtain the probability that it matches the subject, and these probability values, adjusted by action weights, are taken as the user's influence.

The action rank algorithm sums the quantified results for the action types described above.

The weight calculation for each action is as follows.

(1) Weight of a created document

The influence of the documents a user created is calculated as the total of the weights of the user's documents belonging to a specific subject and the influence of others.

The weight of the j-th document for a specific subject s is denoted W(s, d_j).

A document's weight is fundamentally determined by the subject into which automatic classification places it. P(s, d_j) is the per-subject probability value of the document produced by automatic classification. The document weight takes as its base value the probability that the document matches its subject, and this probability is then adjusted by the influence of the actions other users performed on the document.

Other users' influence is of three kinds: plus, redundancy, and click. In consideration of system resources and time, it is calculated once a day by adding that day's (d) influence from others to the total influence of each action accumulated up to the previous day (d-1). Finally, the per-action weights used in the rest of this document are weights that reflect regression coefficients evaluated through simulation.
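The once-a-day accumulation described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the per-action weight values are placeholders, since the text says only that the real weights come from regression coefficients obtained by simulation and gives no numbers.

```python
def update_influence(total_until_yesterday, todays_influence):
    """Add the influence received on day d to the totals accumulated through d-1.

    Both arguments map an action name ("plus", "duplicate", "click")
    to an influence value; a new dict is returned.
    """
    updated = dict(total_until_yesterday)
    for action, value in todays_influence.items():
        updated[action] = updated.get(action, 0.0) + value
    return updated


def weighted_influence(totals, action_weights):
    """Combine the per-action totals using per-action weights."""
    return sum(action_weights.get(a, 0.0) * v for a, v in totals.items())


# Placeholder weights (assumption): the source gives no concrete values.
ACTION_WEIGHTS = {"plus": 0.5, "duplicate": 0.25, "click": 0.125}

totals = {"plus": 2.0, "duplicate": 0.5, "click": 4.0}    # through day d-1
today = {"plus": 1.0, "click": 2.0}                       # received on day d
totals = update_influence(totals, today)
score = weighted_influence(totals, ACTION_WEIGHTS)
```

Keeping only running totals means each daily batch touches one day of new actions rather than the full history, which matches the stated concern for system resources and time.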

(2) Plus weight (pluses made by the user)

When a Plus action occurs, a portion of the plussing user's influence is granted to the document's author when the document weight is calculated. In turn, a portion of the document's weight is granted to the plussing user as influence on the subject to which the plussed document belongs. The plus weight is determined by the following formula.

The plus weight for the l-th document belonging to a specific subject s is denoted W(s, p_l).

The influence of a document within a subject is reflected as the product of the probability that the document belongs to the subject and the influence of the document's owner.

Several people can plus a single document; in that case the plus influence is assumed to be distributed equally.

(3) Redundancy weight (duplications made by the user)

The Duplicate action is calculated in the same way as Plus. The redundancy weight is determined by the following formula.

The redundancy weight for the o-th document belonging to a specific subject s is denoted W(s, r_o).

(4) Click weight (clicks made by the user)

Clicks are also calculated in the same way as Plus, although Plus is expected to be far more important. Clicks must be de-duplicated so that malicious clicking can be prevented (the suggestion is to allow one click per user per document per day).
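The suggested rule of one counted click per user per document per day can be sketched with a set of seen keys. The `ClickLog` class and its method name are hypothetical; a production system would presumably persist this state and expire old days.

```python
from datetime import date


class ClickLog:
    """Counts at most one click per (user, document, day)."""

    def __init__(self):
        self._seen = set()

    def record(self, user_id, doc_id, day):
        """Return True if the click counts, False if it is a same-day repeat."""
        key = (user_id, doc_id, day)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


log = ClickLog()
first = log.record("u1", "d1", date(2006, 8, 22))     # counted
repeat = log.record("u1", "d1", date(2006, 8, 22))    # ignored: same user, doc, day
next_day = log.record("u1", "d1", date(2006, 8, 23))  # counted again on a new day
```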

The click weight for the m-th document belonging to a specific subject s is denoted W(s, c_m).

The final action ranking combines the three weights described above and is formulated in two types, depending on whether the document's author is known.

(1) Type with a known document author (hereinafter "A type")

In A type, the action ranking is divided into a user ranking and a document ranking.

The user action ranking uses, in combination, the weights of the documents the user created, the user's plus weights, and the user's click weights. The A-type user action ranking is as follows.

The above expression means that the ranking of a user (u_j) interested in a specific subject (s) rises when many actions have been performed by other users on each of the documents that user wrote, and when the user has also performed many actions on documents matching the subject.

The document action ranking merges the weight of the created document, the user action ranking, and the system ranking. The A-type document action ranking is as follows.

The above expression multiplies the document weight (W(s, d_j)) by the action score of the document's author (AR_A(s, u_j)), so that document d_j ranks higher when many other users interested in the specific subject (s) have acted on it and its author is a guru on that subject.
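The product described above, document weight times the author's action score, can be illustrated with a short sketch. The function name and the sample numbers are hypothetical; the document weight and the author score are taken as given inputs because their own formulas are not reproduced in this text.

```python
def document_action_rank(doc_weight, author_rank):
    """A-type document action rank: W(s, d_j) multiplied by the author's AR_A."""
    return doc_weight * author_rank


# Hypothetical (document weight, author action score) pairs for one subject s.
docs = {
    "d1": (0.5, 4.0),    # moderately weighted document by a strong author
    "d2": (0.75, 2.0),   # well-weighted document by an average author
    "d3": (0.25, 4.0),   # weakly weighted document by a strong author
}

ranked = sorted(docs, key=lambda d: document_action_rank(*docs[d]), reverse=True)
```

Because the two factors multiply, a document needs both community actions and a well-regarded author to reach the top; a high value in one cannot fully compensate for a low value in the other.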

(2) Type with an unknown author (hereinafter "B type")

B-type documents carry no subject information, so only the counts of the action features considered are used.

(3) Reflecting the date

The period from the document's creation date to the date on which the current action ranking is calculated is quantified and multiplied into the action ranking value, so that more recent posts tend to rank higher. Multiplying AR_A(s, d_j) and AR_B(s, d_j) by 0.998^(d_c - d_w) yields the curve shown in FIG. 3. The minimum time weight is 0.5; from one year onward, a uniform factor of 0.5 is applied to reflect age. Here d_c is the time at which the action ranking is calculated and d_w is the document creation time.
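The time weight above can be sketched as an exponential decay with a floor. The base 0.998 and the floor 0.5 come from the text; measuring d_c - d_w in whole days and the function names are assumptions made for this illustration.

```python
DECAY_BASE = 0.998   # from the text
MIN_WEIGHT = 0.5     # from the text: the time weight never drops below 0.5


def time_weight(days_since_creation):
    """Return 0.998 ** (d_c - d_w), floored at 0.5 (day granularity assumed)."""
    if days_since_creation < 0:
        raise ValueError("document cannot be created after the ranking date")
    return max(MIN_WEIGHT, DECAY_BASE ** days_since_creation)


def apply_time_weight(action_rank, days_since_creation):
    """Multiply an action rank (A type or B type) by the time weight."""
    return action_rank * time_weight(days_since_creation)


fresh = time_weight(0)       # 1.0: a document created today is not discounted
old = time_weight(365)       # 0.5: the floor applies at one year
```

With these constants the unfloored curve falls below 0.5 a little before the one-year mark (0.998 ** 365 is roughly 0.48), so taking the maximum with 0.5 is consistent with the text's statement that 0.5 applies from one year onward.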

The per-type action ranks with the final time weight taken into account are as follows.

The method applied at search time is as follows.

Search is likewise divided into A type and B type and is performed with the system configuration of FIG. 4. B-type search uses the existing search method and simply orders results by B-type action ranking, whereas A-type search differs in that it retrieves from a subject search server the subject information that best matches the query and combines it with the existing ranking. The following describes how document search and user search are applied for A type.

(1) Document search

This follows the same lines as traditional search, but internally results are extracted by combining keyword matching with subject matching.

Search algorithm

1) Map the query keywords to subjects (at most 3).

--> Uses the cosine coefficient between the query vector and the subject vectors

2) Extract the documents that contain the query vector as the result set

--> Uses query-document keyword matching

3) Calculate the final ranking reflecting the action ranking

where

[symbol not reproduced] : final document ranking for query q

s_j : the j-th subject value

d_i : the i-th document

q : the query

ave : the average value

[symbol not reproduced] : cosine between subject vector s_j and query vector q

[symbol not reproduced] : A-type action rank score of the i-th document belonging to the j-th subject

∴ As a matter of policy, the weights control whether more emphasis is placed on the system rank or on the actions that take the subject relationship with the query into account.

∴ α is set with reference to the class regression coefficient, and β uses the system regression coefficient.

A search query can be a compound query, and depending on the query or the subject classification the results may span a set of several subjects rather than a single subject. Of these, at most the three subjects with the highest cosine coefficients are extracted, to keep the engine's search response time from degrading. The documents obtained by keyword matching are finally ordered by action rank and returned.
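Selecting at most three subjects by cosine coefficient can be sketched as below. Representing the query and the subjects as sparse term-weight dictionaries, and dropping subjects with zero similarity, are assumptions made for this illustration; the text does not specify how the vectors are built. The sketch covers only the subject-mapping step, not the final ranking formula, which is not reproduced in this text.

```python
import math


def cosine(a, b):
    """Cosine between two sparse term-weight vectors given as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_subjects(query_vec, subject_vecs, limit=3):
    """Return up to `limit` (subject, cosine) pairs, highest cosine first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in subject_vecs.items()]
    scored = [(c, name) for c, name in scored if c > 0.0]  # skip unrelated subjects
    scored.sort(reverse=True)
    return [(name, c) for c, name in scored[:limit]]


# Hypothetical subject vectors for illustration only.
subjects = {
    "photo": {"camera": 1.0},
    "travel": {"camera": 1.0, "trip": 1.0},
    "gadgets": {"camera": 3.0, "phone": 4.0},
    "film": {"camera": 1.0, "movie": 2.0},
    "cooking": {"recipe": 1.0},
}
matches = top_subjects({"camera": 1.0}, subjects)
names = [name for name, _ in matches]
```

Capping the list at three keeps the number of per-subject lookups bounded regardless of how many subjects a compound query touches, which is the response-time concern the text raises.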

(2) People search

This resembles document search, except that only the people within the matched subjects are extracted.

Search algorithm

1) Map the query keywords to subjects.

2) Extract the users belonging to the mapped subjects as the full result set

3) Calculate the final ranking reflecting the action ranking for the query

도5와 도6은 위의 알고리즘으로 서비스되는 문서와 사용자의 목록을 예시한다.5 and 6 illustrate a list of documents and users served by the above algorithm.
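The people-search steps can be sketched in the same way. This is a minimal illustration, assuming that a subject is mapped when it shares at least one keyword with the query and that a user's score is the sum of that user's action rank over the mapped subjects; neither rule is stated in the text, and all names below are hypothetical.

```python
def rank_users(query_keywords, subject_keywords, subject_members, user_action_rank):
    """Rank the users who belong to the subjects mapped from the query.

    subject_keywords maps a subject to its set of keywords, subject_members
    maps a subject to its users, and user_action_rank maps (subject, user)
    to that user's action rank score within the subject.
    """
    wanted = set(query_keywords)
    # Step 1 (assumed rule): a subject is mapped if it shares any query keyword.
    subjects = [name for name, kws in subject_keywords.items() if kws & wanted]
    # Steps 2 and 3: collect the members of the mapped subjects and
    # accumulate their action rank (ASSUMPTION: summed across subjects).
    scores = {}
    for name in subjects:
        for user in subject_members.get(name, ()):
            scores[user] = scores.get(user, 0.0) + user_action_rank.get((name, user), 0.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Users outside the mapped subjects never enter the result set, which is the one difference from document search noted above.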

As described above, the present invention exposes the documents and users that match a specific query and, by using user feedback information, achieves markedly better search performance than existing simple keyword-based search ranking methods.

Claims

A method for assigning a document ranking, comprising:

a document creation step in which a user creates a document;

a query input step in which a searcher inputs a query;

a document weight extraction step of extracting a document weight by applying, to the probability that the document created in the document creation step matches the query, the sum of the plus influence, the duplication influence, and the click influence on the document up to the search date;

a user ranking extraction step of extracting a user ranking from the sum of the document weights, given by others, of the query-related documents among the documents created by the user, together with the sum of the plus weights, the sum of the duplicate weights, and the sum of the click weights given by other people in relation to the query; and

a document ranking extraction step of extracting a document ranking from the document weight and the user ranking.

The method of claim 1,

wherein the document ranking AR_A(s, d_j) is calculated by the following formula.

Here,

W(s, d_j) is the document weight, and

AR_A(s, u_i) is the user ranking.

The method of claim 2,

wherein the document weight W(s, d_j) is calculated by the following equation.

here,

The method of claim 2,

wherein the user ranking AR_A(s, u_i) is calculated by the following equation.

here,

Document weight

Plus weight

Redundancy Weight

Click weights

The method according to any one of claims 1 to 4,

wherein a time weight is applied to one or both of the user ranking and the document ranking, so that the ranking decreases as time passes.

The method of claim 5,

wherein the time weight is defined by the following equation, and applying it means multiplying the user ranking or the document ranking by the time weight.

Here,

d_c is the time at which the action ranking is calculated, and

d_w is the time at which the document was created.

The method of claim 5, wherein the final document ranking is calculated by applying the system ranking to the document ranking according to claim 5.

The method of claim 7,

wherein the final document ranking is calculated by the following equation.

Here,

: Final document ranking for query q

s_j : value of the j-th subject

d_i : i-th document

q : query

ave : mean value

: Cosine of the subject vector s_j and the query vector q

: A-type action rank score of the i-th document belonging to the j-th subject

α, β : weights for the action rank score and the SR, respectively

SR : system rank value (ranking score of the search engine)

The method of claim 8,

wherein, for a document whose user is unknown, a step of extracting a document ranking for a document without a user is performed, in which the document ranking is extracted in consideration of the influence of the plus count, the click count, and the system rank value for the document.

The method of claim 9,

wherein the document ranking AR_B(s, d_j) of the document without the user is calculated by the following equation.

The method of claim 10,

wherein a time weight is applied to the document ranking of the document without the user, so that the ranking decreases as time passes.

The method of claim 11,

wherein the time weight is defined by the following equation, and applying it means multiplying the document ranking of the document without the user by the time weight.

Here,

d_c is the time at which the action ranking is calculated, and

d_w is the time at which the document was created.

calculating a weight of a first document according to user actions performed on the first document, wherein the weight of the first document is proportional to the temporal order of the user actions occurring on the first document;

calculating a first goodness of fit from the fit of the first document with a first keyword and the calculated weight of the first document; and

when a query is made with the first keyword and a second document has a second goodness of fit with the first keyword, ordering the document rankings of the first document and the second document according to the magnitudes of the first goodness of fit and the second goodness of fit.

The method of claim 13,

wherein the weight of the first document is proportional to the ranking of the user who created the first document.

The method of claim 13,

wherein the weight of the first document is proportional to the number of times users who read the first document bookmarked it, or to the number of replies they added.

The method of claim 13,

wherein the weight of the first document is proportional to the number of users who read the first document.

The method of claim 13,

further comprising, when the first document and the second document contain the same content and the first document is the original source of that content, setting the goodness of fit of the first document with the first keyword higher than that of the second document.

delete

A computer-readable recording medium containing a program capable of performing the method of any one of claims 1 to 17.