KR101290000B1

KR101290000B1 - Method and apparatus for ranking paper

Info

Publication number: KR101290000B1
Application number: KR1020110009325A
Authority: KR
Inventors: 김상욱; 배덕호; 황세미
Original assignee: 한양대학교 산학협력단
Priority date: 2011-01-31
Filing date: 2011-01-31
Publication date: 2013-08-07
Also published as: KR20120088170A

Abstract

A paper ranking method and apparatus are provided. Ranking the papers detects the core papers in a specific field. The paper ranking method and apparatus are based on a single integrated probabilistic model. The probabilistic model can take into account the reputation of a paper, the reference relationships between papers, the difference in publication year between papers in a reference relationship, and the recency of a paper. An approach for increasing the execution speed and scalability of the paper ranking method and apparatus is also presented.

Description

Paper ranking method and apparatus {METHOD AND APPARATUS FOR RANKING PAPER}

The following embodiments relate to a method and apparatus for ranking papers.

Disclosed are an apparatus and method for detecting core papers through paper ranking.

Certain ranking algorithms for web pages, such as PageRank and Hyperlink-Induced Topic Search (HITS), use only hyperlinks to assign a rank to a web page.

Paper ranking algorithms, however, aim to measure a paper's authority more accurately. They measure a paper's influence using not only the reference relationships between papers but also information such as the paper's authors and the conference at which it was published, and they assign a rank to the paper based on the measured influence.

Techniques such as the browsing-based model and PopRank use author-paper relationships as well as the reference relationships between papers. PopRank additionally uses conference-paper relationships.

To address a typical problem of ranking algorithms, namely that recent papers are not ranked highly even when their quality is high, CiteRank assigns ranks in a way that reflects how recent each paper is.

The prior art described above does not focus on finding core papers in a particular field. It has therefore favored papers that are referenced by important papers, or simply by other papers, regardless of the field of the paper.

However, detecting the core papers of a specific field requires a ranking method that can reflect the support a paper receives within that field.

In addition, most of the prior art described above focuses only on the problem of finding authoritative papers.

That is, the prior art is very useful for finding older core papers but is not suitable for finding recent core papers.

However, identifying recent core papers is very important for understanding research trends in a specific field, so a ranking method that can extract such papers is required.

One embodiment of the present invention may provide an apparatus and method that calculate the rank of a paper by reflecting the difference in publication year between papers in a reference relationship.

One embodiment of the present invention may provide an apparatus and method that calculate the rank of a paper by reflecting the recency of the paper.

According to one aspect of the present invention, there is provided a paper rank calculation apparatus including a reference relationship modeling unit that extracts reference relationships between a plurality of papers, and a rank calculation unit that calculates the rank of a first paper among the plurality of papers based on the reference relationships, wherein the rank calculation unit calculates the rank of the first paper by reflecting the difference between the publication time of a second paper that references the first paper and the publication time of the first paper.

The paper rank calculation apparatus may further include a core paper extraction unit that extracts core papers from the one or more papers based on the rank of each of the one or more papers.

The core paper extraction unit may extract, as core papers, those of the one or more papers whose rank value is greater than or equal to a specific threshold.

The reference relationship modeling unit may model the reference relationships between the plurality of papers as a graph data structure, in which a node represents one of the plurality of papers and an edge represents a reference relationship between papers.

The plurality of papers may be papers belonging to the same field.

The rank calculation unit may calculate the rank of the first paper by reflecting the number of second papers that reference the first paper, or the number of third papers that reference those second papers.

The rank calculation unit may calculate the rank of the first paper by reflecting the number of fourth papers that reference the first paper, where the fourth papers may be papers in the same field as the first paper.

The rank calculation unit may calculate the rank of the first paper by reflecting the publication time of the first paper.

The rank calculation unit may calculate the rank of the first paper based on Random Walk with Restart (RWR), which calculates the rank from a restart vector and a reference relationship matrix.

By using a restart vector whose values are set based on similarity to the first paper, the rank calculation unit may calculate the rank of the first paper by reflecting the number of fourth papers, which reference the first paper and belong to the same field as the first paper.

By using a reference relationship matrix modified to reflect the similarity between papers in a reference relationship, the rank calculation unit may calculate the rank of the first paper by reflecting the degree of relevance to the first paper of each fourth paper that references it.

By using a restart vector that reflects the difference between the publication time of a fifth paper that references the first paper and the publication time of the first paper, the rank calculation unit may calculate the rank of the first paper in a way that reflects that difference.

The rank calculation unit may reflect the difference in publication time between two papers in a reference relationship in the restart vector by using an exponential growth function within the restart vector to increase the restart probability for papers with a large difference in publication time.

By using a customized restart vector in which the restart probability for papers published before the first paper is 0, the rank calculation unit may calculate the rank of the first paper by reflecting its publication time.

The rank calculation unit may calculate and store a restart vector coefficient, and may calculate the rank of the first paper by multiplying the stored coefficient by the restart vector of the first paper.

According to another aspect of the present invention, there is provided a paper rank calculation method including a reference relationship modeling operation that extracts reference relationships between a plurality of papers, and a rank calculation operation that calculates the rank of a first paper among the plurality of papers based on the reference relationships, wherein the rank calculation operation calculates the rank of the first paper by reflecting the difference between the publication time of a second paper that references the first paper and the publication time of the first paper.

According to yet another aspect of the present invention, there is provided a document rank calculation apparatus including a reference relationship modeling unit that extracts reference relationships between a plurality of documents, and a rank calculation unit that calculates the rank of a first document among the plurality of documents based on the reference relationships, wherein the rank calculation unit calculates the rank of the first document by reflecting the difference between the creation time of a second document that references the first document and the creation time of the first document.

The document rank calculation apparatus may further include a core document extraction unit that extracts core documents from the one or more documents based on the rank of each of the one or more documents.

The rank calculation unit may calculate the rank of the first document by reflecting the number of second documents that reference the first document, or the number of third documents that reference those second documents.

The rank calculation unit may calculate the rank of the first document by reflecting the number of fourth documents that reference the first document, where the fourth documents may be documents in the same field as the first document.

The rank calculation unit may calculate the rank of the first document by reflecting the creation time of the first document.

An apparatus and method are provided that calculate the rank of a paper by reflecting the difference in publication year between papers in a reference relationship.

An apparatus and method are provided that calculate the rank of a paper by reflecting the recency of the paper.

FIG. 1 is a structural diagram of a paper rank calculation apparatus 100 according to an embodiment of the present invention.
FIG. 2 illustrates an example of the criteria the rank calculation unit uses to calculate the rank of a paper.
FIG. 3 illustrates a method for calculating a core paper detection score according to an example of the present invention.
FIG. 4 illustrates a pre-compute method according to an example of the present invention.
FIG. 5 is a flowchart of a paper rank calculation method according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings. However, the present invention is not limited to or by the embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a structural diagram of a paper rank calculation apparatus 100 according to an embodiment of the present invention.

The paper rank calculation apparatus 100 includes a reference relationship modeling unit 110 and a rank calculation unit 120. The paper rank calculation apparatus 100 may further include a core paper extraction unit 130.

The reference relationship modeling unit 110 extracts the reference relationships between a plurality of papers.

The plurality of papers may be papers belonging to the same field.

The reference relationship modeling unit 110 may model the reference relationships between the plurality of papers as a graph data structure. A node in the graph represents one paper. An edge in the graph represents a reference relationship between papers. That is, a directed edge from a second node, corresponding to a second paper, to a first node, corresponding to a first paper, may indicate that the second paper references the first paper.
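The graph model described above can be sketched as a pair of adjacency maps. This is only an illustration under stated assumptions: the function name `build_reference_graph`, the `(citing, cited)` input format, and the paper identifiers are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_reference_graph(citations):
    """Build a directed reference graph from (citing, cited) pairs.

    An edge citing -> cited means that the citing paper references
    the cited paper, matching the edge direction described in the text.
    """
    references = defaultdict(set)      # paper -> papers it references
    citing_papers = defaultdict(set)   # paper -> papers that reference it
    for citing, cited in citations:
        references[citing].add(cited)
        citing_papers[cited].add(citing)
    return references, citing_papers


# Hypothetical example: p2 and p3 reference p1, and p3 also references p2.
edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
references, citing_papers = build_reference_graph(edges)
```

Keeping both directions makes it cheap to answer "which papers does this paper reference" and "which papers reference this paper", both of which the ranking criteria rely on.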

Based on the reference relationships, the rank calculation unit 120 calculates the rank of each of one or more papers (for example, a first paper) among the plurality of papers and assigns the calculated rank to the corresponding paper.

An example of the criteria the rank calculation unit 120 uses to calculate the rank of a paper is described in detail below.

The core paper extraction unit 130 extracts core papers from the one or more ranked papers based on the rank of each of those papers.

A core paper may be 1) a paper that has long been heavily referenced and respected because it proposed an original and excellent idea. Alternatively, a core paper may be 2) a paper that is attracting attention because it proposes an idea that has recently become an issue (a rising star).

The core paper extraction unit 130 may extract, as core papers, those of the one or more ranked papers whose rank value is greater than or equal to a specific threshold.
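The threshold-based extraction can be sketched in a few lines. The function name `extract_core_papers`, the dictionary input, and the example scores are hypothetical; the patent does not specify a threshold value or an output order, so sorting by descending rank is an assumption made for readability.

```python
def extract_core_papers(ranks, threshold):
    """Return papers whose rank is >= threshold, highest rank first.

    ranks maps a paper identifier to its rank value.
    """
    selected = [paper for paper, rank in ranks.items() if rank >= threshold]
    return sorted(selected, key=lambda paper: ranks[paper], reverse=True)


# Hypothetical rank values for three papers.
example_ranks = {"a": 0.5, "b": 0.1, "c": 0.3}
core = extract_core_papers(example_ranks, threshold=0.3)
```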

FIG. 2 illustrates an example of the criteria the rank calculation unit uses to calculate the rank of a paper.

The rank of a paper may be calculated and assigned in order to extract the core papers described above. Accordingly, the rank of the first paper may be calculated based on one or more of criteria 1) to 4) below. Criteria 1) to 4) are referred to, in order, as the first criterion, the second criterion, the third criterion, and the fourth criterion.

1) The rank calculation unit 120 may calculate the rank of the first paper by reflecting the number of second papers that reference the first paper, or the number of third papers that reference those second papers.

That is, a paper that is referenced by many other papers, or by papers that are themselves heavily referenced, may be given a high rank and may be extracted as a core paper.

If the first paper is referenced by many papers, or by papers that are themselves heavily referenced, the first paper is likely to be of good quality and highly influential in its field.

2) The rank calculation unit 120 may calculate the rank of the first paper by reflecting the number of fourth papers that reference the first paper. Here, the fourth papers are papers in the same field as the first paper. Alternatively, the fourth papers are papers that are highly relevant to the first paper.

Alternatively, the rank calculation unit 120 may calculate the rank of the first paper by reflecting the degree of relevance to the first paper of each fourth paper that references it.

That is, a paper that is heavily referenced by highly relevant papers in the same field may be given a high rank and may be extracted as a core paper.

Similar papers (that is, highly relevant papers in the same field) published after the first paper are obliged to reference the first paper in order to explain how they differ from it. If such a paper does not reference the first paper, this may indirectly indicate that the first paper is of low quality or is not very important in its field. A paper that is heavily referenced by similar papers can therefore be regarded as highly influential in its field.

3) The rank calculation unit 120 may calculate the rank of the first paper by reflecting the difference between the publication time of a fifth paper that references the first paper and the publication time of the first paper.

That is, a paper that is heavily referenced by papers with a large difference in publication time may be given a high rank and may be extracted as a core paper.

The publication time may be the publication year of the paper.

The author of a paper (for example, the fifth paper) wants to include recent, high-quality papers related to the author's own paper in its references. Referencing papers on the latest good research in the field shows that the author's paper has surveyed the literature thoroughly, up to the most recent work.

If the author nevertheless references a first paper published long ago, this means that the first paper has won out over the many papers published between the first paper and the author's own paper (that is, the author of the fifth paper selected it as a valuable paper).

This competition becomes more intense as the difference in publication year grows. Therefore, if the first paper has received many references from papers whose publication year differs greatly from its own, the first paper may be regarded as a core paper in its field.

4) The rank calculation unit 120 may calculate the rank of the first paper by reflecting the publication time of the first paper.

That is, a recent paper may be given a high rank and may be extracted as a core paper.

Compared with an older paper, a recent paper has relatively fewer papers published after it. A recent paper therefore has fewer opportunities to be referenced by other papers than an older paper does.

As a result, under a simple reference-based ranking scheme that uses, for example, the absolute number of references as its ranking criterion, recent papers are unlikely to be extracted as core papers.

However, extracting recent papers as core papers is very important, because it makes it possible to identify current topics of interest and the latest research trends, and to predict the direction in which the field will develop.

FIG. 3 illustrates a method for calculating a core paper detection score according to an example of the present invention.

The reference relationship modeling unit 110 and the rank calculation unit 120 may calculate a seminal paper finding (SPF) score that represents the rank of a paper.

The SPF score of a particular paper is a measure of how well the paper satisfies the four criteria described above with reference to FIG. 2. The SPF score of each paper can therefore be used to extract the core papers from among the papers.

By analyzing the reference relationship graph generated by the reference relationship modeling unit 110, the rank calculation unit 120 may obtain the SPF scores of all or some of the papers used to generate the graph.

An example of the present invention may adopt a probabilistic model in order to derive interpretable results for the ranks (that is, the SPF scores) of papers. The SPF score according to an example of the present invention may be calculated based on Random Walk with Restart (RWR) 310.

That is, the rank calculation unit 120 may calculate the rank of the first paper (that is, its score for being detected as a core paper) by using a modified RWR 340 derived from RWR 310.

RWR 310 is a method widely used for ranking web pages, blogs, and papers.

As illustrated, RWR 310 may be expressed by Equation 1 below.
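The image for Equation 1 is not reproduced in this text. As a hedged reconstruction rather than the patent's exact notation, the standard RWR recurrence that is consistent with the surrounding definitions can be written as follows. The symbol names $\vec{r}$ (for the score vector 330) and $\vec{q}$ (for the restart vector 322) are assumptions, and the transpose assumes that $W_{ij}$ is the probability of moving from paper $i$ to a paper $j$ that it references.

```latex
\vec{r} = (1 - c)\, W^{\top} \vec{r} + c\, \vec{q},
\qquad q_j = \frac{1}{n} \quad (j = 1, \dots, n)
```

If $W$ is instead defined with the opposite orientation, the transpose is dropped.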

Here, W (320) is an n×n row-normalized citation matrix, and n is the total number of papers to which RWR 310 is applied. These papers may be papers in a particular field.

The restart vector (322) is an n×1 vector, and each of its elements is 1/n.

c (324) is the restart probability.

The seminality score vector (330) is an n×1 vector.

That is, the RWR 310 method may generate, based on the restart vector (322) and the reference relationship matrix (320), a score vector (330) that contains the SPF score of the first paper.

RWR 310 effectively calculates an authority score for each paper in the reference relationship graph.

Suppose a researcher picks an arbitrary paper and starts reading it. With probability 1-c, the researcher goes on to read one of the other papers referenced by that paper; with probability c, the researcher reads a paper chosen according to the restart vector. When this step is repeated, a stationary probability distribution is obtained that gives, for every paper in the reference relationship graph, the probability that the researcher reads it. In this distribution, r_i denotes the probability that the researcher reads paper i.

A high r_i means that the corresponding paper i is of good quality and has therefore been respected by many good papers. Calculating the influence of a paper based on RWR thus satisfies the first criterion.
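The reading process described above can be sketched as a power iteration. This is a minimal sketch and not the patent's implementation: the function name `rwr`, the default restart probability of 0.15, the fixed iteration count, and the handling of papers with no references (treated here as a forced restart) are all assumptions.

```python
def rwr(W, q, c=0.15, iters=100):
    """Approximate the stationary vector r of r = (1 - c) * W^T r + c * q.

    W is an n x n row-normalized matrix (list of lists); W[i][j] is the
    probability of moving from paper i to a paper j that i references.
    q is the restart vector (its entries sum to 1).
    c is the restart probability.
    """
    n = len(q)
    r = list(q)
    for _ in range(iters):
        nxt = [c * q[j] for j in range(n)]
        for i in range(n):
            if sum(W[i]) == 0:
                # Assumption: a paper with no references restarts via q,
                # so that no probability mass is lost.
                for j in range(n):
                    nxt[j] += (1 - c) * r[i] * q[j]
            else:
                for j in range(n):
                    nxt[j] += (1 - c) * r[i] * W[i][j]
        r = nxt
    return r


# Hypothetical graph: paper 1 references paper 0,
# paper 2 references papers 0 and 1, paper 0 references nothing.
W = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
q = [1 / 3] * 3
scores = rwr(W, q)
```

In this toy graph, paper 0 ends up with the highest score because it is referenced by both other papers, which is the behavior the first criterion asks for.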

Two approaches are proposed below to satisfy the second criterion.

1) The rank calculation unit 120 may use a customized RWR (cRWR) 340 to calculate the SPF score of an individual paper i.

To this end, instead of the uniform value 1/n, the rank calculation unit 120 uses a restart vector (352) whose values are set based on similarity to paper i. The restart vector (352) is personalized for paper i so that similarity to paper i is taken into account.

That is, by using the restart vector (352) whose values are set based on similarity to the first paper, the rank calculation unit 120 may calculate the rank of the first paper by reflecting the number of fourth papers, which reference the first paper and belong to the same field as the first paper.

In this case, the restart vector (352) may be defined as in Equation 2 below.

Here, sim(i, j) denotes the similarity between paper i and paper j, and Z is a normalization factor.

Z may be defined as in Equation 3 below.

Here, N is the set of all papers in the topic.
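The images for Equations 2 and 3 are not reproduced in this text. One reading that is consistent with the definitions of sim(i, j), Z, and N is that the entry of the restart vector for paper j equals sim(i, j) / Z, where Z is the sum of sim(i, k) over all papers k in N. The sketch below follows that reading. The function name `similarity_restart_vector`, the uniform fallback when all similarities are zero, and the toy similarity values are assumptions; the patent text here does not say how sim is computed.

```python
def similarity_restart_vector(i, sim, papers):
    """Restart vector personalized for paper i.

    Assumed form: q_i[j] = sim(i, j) / Z, with Z = sum of sim(i, k)
    over every paper k in `papers` (the set N in the text).
    """
    z = sum(sim(i, k) for k in papers)
    if z == 0:
        # Assumption: fall back to the uniform vector 1/n.
        return {j: 1.0 / len(papers) for j in papers}
    return {j: sim(i, j) / z for j in papers}


# Hypothetical similarity values between paper "a" and each paper.
toy_sim = {("a", "a"): 1.0, ("a", "b"): 0.5, ("a", "c"): 0.5}
q_a = similarity_restart_vector("a", lambda x, y: toy_sim[(x, y)], ["a", "b", "c"])
```

Because the entries are divided by Z, the vector sums to 1 and can be used directly as a restart distribution.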

As Equations 2 and 3 show, through the customized restart vector (352), cRWR 340 allows a high SPF score to be assigned to paper i when paper i has received many references from similar papers. In other words, references from similar papers are treated as more important than references from dissimilar papers.

2) The rank calculation unit 120 may use a reference relationship matrix W (354) modified to reflect the similarity between papers in a reference relationship.

The modified reference relationship matrix W (354) may be defined as in Equation 4 below.

Here, CitedBy(i) is the set of papers cited (that is, referenced) by paper i.

With the modified reference relationship matrix W (354), among the papers that reference the first paper, those that are more relevant to it contribute a larger score to it. Paper i therefore obtains a higher SPF score when it receives many references from highly relevant papers.

That is, by using the reference relationship matrix W (354) modified to reflect the similarity between papers in a reference relationship, the rank calculation unit 120 may calculate the rank of the first paper by reflecting the degree of relevance to the first paper of each fourth paper that references it.
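The image for Equation 4 is not reproduced in this text. One plausible reading, given that CitedBy(i) is the set of papers referenced by paper i, is that row i of the matrix spreads weight over CitedBy(i) in proportion to similarity: the entry for (i, j) is sim(i, j) divided by the sum of sim(i, k) over k in CitedBy(i), and 0 for papers that i does not reference. The sketch below implements that reading; the function name `similarity_weighted_matrix`, its arguments, and the toy data are assumptions.

```python
def similarity_weighted_matrix(papers, references, sim):
    """Row-normalized reference matrix weighted by similarity.

    papers     : list of paper identifiers (fixes the row/column order)
    references : dict mapping a paper to the papers it references,
                 playing the role of CitedBy(i) in the text
    sim        : function sim(i, j) returning a non-negative similarity
    """
    index = {p: k for k, p in enumerate(papers)}
    n = len(papers)
    W = [[0.0] * n for _ in range(n)]
    for i in papers:
        cited = [j for j in references.get(i, ()) if j in index]
        total = sum(sim(i, j) for j in cited)
        if total == 0:
            continue  # no references (or zero similarity): row stays all zero
        for j in cited:
            W[index[i]][index[j]] = sim(i, j) / total
    return W


# Hypothetical data: paper "c" references "a" and "b" and is more similar to "a".
toy_sim = {("c", "a"): 3.0, ("c", "b"): 1.0}
W = similarity_weighted_matrix(
    ["a", "b", "c"], {"c": ["a", "b"]}, lambda x, y: toy_sim[(x, y)]
)
```

Here paper "c" passes three quarters of its weight to "a" and one quarter to "b", so a reference from a more similar paper counts for more.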

An approach for satisfying the third criterion is described below.

To apply the third criterion when detecting core papers, the papers that are referenced by papers with a large difference in publication time must be found.

The rank calculation unit 120 may satisfy the third criterion by reflecting the difference in publication time between two papers in a reference relationship in the customized restart vector (352).

That is, by using a restart vector (352) that reflects the difference between the publication time of a fifth paper that references the first paper and the publication time of the first paper, the rank calculation unit 120 may calculate the rank of the first paper in a way that reflects that difference.

The rank calculation unit 120 can reflect the difference in publication time between two papers in a reference relationship in the restart vector 352 by using an exponential growth function within the restart vector 352, which increases the restart probability to papers with a large difference in publication time (e.g., publication year). This strategy may be defined as in Equation 5 below.

Here, yr(i) is the publication time (e.g., publication year) of paper i.

t is a scaling parameter for tuning so that appropriate values are obtained.

Because of the exponential growth function in Equation 5, a paper with a large difference in publication time has a high restart probability.

Therefore, when the first paper receives many references from papers with a large difference in publication time, the rank calculation unit 120 can give the first paper a high rank (i.e., the SPF score of the first paper in the SPF score vector 350).
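The exact form of Equation 5 is not reproduced in this text. As a rough illustration only, the sketch below assumes the restart weight for paper j grows as exp((yr(j) - yr(i)) / t) and is then normalized into a probability vector. The function name and the default value of `t` are hypothetical.

```python
import numpy as np

def exponential_restart_vector(years, i, t=5.0):
    """Restart vector for paper i that favors large publication-year gaps.

    years[j] is the publication year of paper j.
    t is the scaling parameter; larger t flattens the distribution.
    The exp((yr(j) - yr(i)) / t) form is an assumption, not Equation 5 itself.
    """
    years = np.asarray(years, dtype=float)
    gap = years - years[i]          # year difference relative to paper i
    weights = np.exp(gap / t)       # exponential growth in the gap
    return weights / weights.sum()  # normalize to a probability vector
```

Under this assumed form, a paper published ten years after paper i receives a restart probability e^(10/t) times that of paper i's own year, so tuning t controls how strongly long-range citations are rewarded.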

A method for satisfying the fourth criterion is described below.

To detect recent core papers, the customized restart vector 352 can be modified so that the restart probability to papers published before paper i is 0. The purpose is to consider, when calculating the SPF score of paper i, only the papers that could reference paper i (i.e., papers published after the publication time of paper i), rather than all papers in the field of paper i.

That is, by using a customized restart vector 352 in which the restart probability to papers published before the first paper is 0, the rank calculation unit 120 can calculate the rank of the first paper (i.e., the SPF score of the first paper in the SPF score vector 350) with the publication time of the first paper taken into account.

This method of reflecting the publication time may be defined as in Equation 6 below.

Reflecting the publication time as in Equation 6 allows all papers to compete fairly regardless of their publication time.

The problem that arises when the rank of the first paper is calculated without reflecting its publication time is as follows.

An older paper has relatively more papers published after its own publication time. This means that if paper i is an older paper, the restart vector 352 has many papers to restart to, but the restart probability to each of those papers is relatively low.

Thus, paper i must be referenced by many papers (compared with a recently published paper) to obtain a high SPF score 330.

Conversely, a recently published paper has relatively fewer papers published after its own publication time. This means that a recently published paper can obtain a high SPF score 330 even if it is referenced by only a few papers.

To solve the problem described above, the rank calculation unit 120 can calculate the rank of the first paper (i.e., the SPF score of the first paper in the SPF score vector 350) by reflecting the proportion of papers that actually reference the first paper among the papers that could reference it, rather than the absolute number of papers that reference it.

Thus, fair competition between older papers and recent papers becomes possible, and the fourth criterion can be satisfied.
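Equation 6 is not reproduced in this text. The sketch below illustrates only the stated idea of zeroing the restart probability to papers published before paper i and renormalizing what remains. The function name is hypothetical, and treating papers from the same year as still eligible is an assumption.

```python
import numpy as np

def mask_earlier_papers(q, years, i):
    """Zero the restart probability to papers published before paper i.

    q is an existing restart vector for paper i.
    years[j] is the publication year of paper j.
    Papers from the same year as paper i are kept (an assumption).
    """
    q = np.asarray(q, dtype=float).copy()
    years = np.asarray(years)
    q[years < years[i]] = 0.0  # earlier papers cannot reference paper i
    total = q.sum()
    return q / total if total > 0 else q
```

After masking, the restart mass is shared only among the papers that could have referenced paper i, so the score depends on the share of eligible papers that cite it rather than on the raw citation count.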

As described above, the restart vector 352 is personalized for each individual paper. That is, each individual paper has a different restart vector 352.

Therefore, to calculate the rankings (i.e., SPF scores) of n papers, the cRWR must be repeated n times in total. This repetition incurs a large performance overhead compared with the conventional RWR 310.

A method for resolving this additional performance overhead is presented below.

Methods for calculating scores using the RWR 310 fall broadly into two categories.

1) One is the on-the-fly (OnTheFly) method, which repeatedly evaluates Equation 1 until the RWR scores converge.

2) The other is the pre-compute (Pre-Compute) method, as in Equation 7 below.

Here, I is the identity matrix.

The illustrated pre-compute formula 360 is the formula of the cRWR 340 described above, rearranged with respect to the SPF score vector 350.
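Equations 1 and 7 are not reproduced in this text. The sketch below assumes the commonly used RWR form r = c·W·r + (1 − c)·q, where c is taken to be the probability of following a reference and W is column-normalized; the patent's own parameterization may differ. Under that assumption, rearranging for r gives r = (1 − c)·(I − c·W)⁻¹·q, which is what the pre-compute function evaluates. All names are hypothetical.

```python
import numpy as np

def rwr_on_the_fly(W, q, c=0.85, tol=1e-12, max_iter=10_000):
    """Iterate r = c*W*r + (1-c)*q until the scores stop changing."""
    r = q.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * q
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def restart_coefficient(W, c=0.85):
    """Closed form: the matrix that multiplies the restart vector."""
    n = W.shape[0]
    return (1 - c) * np.linalg.inv(np.eye(n) - c * W)

# Small column-normalized example graph (assumed data).
W = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
q = np.array([1.0, 0.0, 0.0])
scores_iterative = rwr_on_the_fly(W, q)
scores_closed_form = restart_coefficient(W) @ q
```

Because the coefficient matrix depends only on W and c, it can be computed once and reused for every restart vector, whereas the iterative version must be rerun for each one.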

Here, the term 374 is called the restart vector coefficient 374, which is multiplied by the restart vector. The SPF score of paper i is the product of the restart vector coefficient 374 and the restart vector 352 of paper i.

Therefore, the rank calculation unit 120 can calculate the ranking of the first paper (i.e., the SPF score of the first paper in the SPF score vector 350) by multiplying the restart vector coefficient 374 by the restart vector 352 of the first paper.

To use the pre-compute method, the rank calculation unit 120 calculates and stores the restart vector coefficient 374 in advance, before calculating the SPF score for each paper i, including the first paper. The rank calculation unit 120 can then quickly calculate the ranking of the first paper (i.e., the SPF score of the first paper in the SPF score vector 350) by multiplying the stored restart vector coefficient 374 by the restart vector 352 of the first paper.

FIG. 4 illustrates a pre-compute method according to an example of the present invention.

The pre-compute method requires higher pre-computation and storage costs than the on-the-fly method. However, the pre-compute method executes much faster than the on-the-fly method.

The complexity of the pre-compute method is O(n³ + n^2.376) = O(n³), and the complexity of the on-the-fly method is O(j × n³), where j is the number of iterations. Thus, the complexity of the pre-compute method is much lower than that of the on-the-fly method.

According to the cRWR 340 proposed with reference to FIG. 3, the reference relationship graph between the papers does not change while the cRWR 340 is performed for each of the plurality of papers. Thus, the pre-compute method can be applied.

In addition, according to an embodiment of the present invention, only the papers belonging to the same field as the first paper whose rank is to be calculated, rather than all papers in the database, may be used for the rank calculation. Thus, the pre-computation and storage costs of the restart vector coefficient 374 are not very large.

In addition, according to an embodiment of the present invention, the rank calculation unit 120 determines the ranking of paper i using only the SPF score for paper i in the SPF score vector 350. Thus, the entire SPF score vector 350 need not be calculated.

FIG. 4 shows the product of a matrix 420 representing the restart vector coefficient 374 of Equation 7 and a vector 430 representing the customized restart vector 352, together with a vector 410 corresponding to the SPF score vector 350 that results from the product.

In the conventional pre-compute method, the product of the entire matrix 420 and the vector 430 must be calculated.

However, the SPF score for paper i can be calculated by multiplying the i-th row of the matrix 420 by the vector 430. That is, the rank calculation unit 120 can obtain the desired result, namely the rank of the first paper, with a vector-vector multiplication alone rather than a matrix-vector multiplication.

Thus, the computational overhead can be greatly reduced.

The complexity of the method described above is O(n² + n^2.376) = O(n^2.376), so the execution speed can be greatly improved.
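The row-times-vector shortcut can be sketched as follows. The names `coeff` (standing for the precomputed matrix 420) and `spf_score` are hypothetical, and the numbers are made-up illustration data rather than values from the embodiment.

```python
import numpy as np

def spf_score(coeff, q_i, i):
    """Score of paper i: dot product of row i of the coefficient matrix
    with paper i's own restart vector, instead of a full matrix-vector product."""
    return float(coeff[i] @ q_i)

# Assumed example: a 2x2 precomputed coefficient and the restart vector of paper 1.
coeff = np.array([[0.5, 0.2],
                  [0.1, 0.7]])
q_1 = np.array([0.25, 0.75])
score_1 = spf_score(coeff, q_1, 1)
```

Each call touches n entries of the matrix rather than n², so scoring all n papers with their individual restart vectors costs about as much as one ordinary matrix-vector product.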

FIG. 5 is a flowchart of a method for calculating the rank of a paper according to an embodiment of the present invention.

In operation 510, a reference relationship between a plurality of papers is extracted, for example by the reference relationship modeling unit 110.

In operation 520, the rank of a first paper among the plurality of papers is calculated based on the extracted reference relationship, for example by the rank calculation unit 120.

The first to fourth criteria described above may also be applied in this embodiment.

That is, the rank of the first paper may be calculated by reflecting the number of second papers that reference the first paper, or the number of third papers that reference the second papers referencing the first paper.

The rank of the first paper may be calculated by reflecting the number of fourth papers that reference the first paper. The fourth papers are papers in the same field as the first paper.

The rank of the first paper may be calculated by reflecting the difference between the publication time of a second paper that references the first paper and the publication time of the first paper.

The rank of the first paper may be calculated by reflecting the publication time of the first paper.

In operation 530, a core paper is extracted from the one or more papers based on the rank of each of the one or more papers, for example by the core paper extraction unit 130.

The technical details according to an embodiment of the present invention described above with reference to FIGS. 1 to 4 apply equally to this embodiment. A more detailed description is therefore omitted.
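The three operations of the flowchart can be sketched as a small pipeline. All function names and the input format are hypothetical. The ranking step here is a plain in-citation count used only as a stand-in so the example runs on its own; it is not the cRWR. The extraction step assumes a simple threshold rule.

```python
from collections import defaultdict

def extract_references(records):
    """Operation 510: build a graph mapping each paper to the set of papers it cites."""
    graph = defaultdict(set)
    for paper, cited in records:
        graph[paper]               # make sure the citing paper is a node
        for target in cited:
            graph[paper].add(target)
            graph[target]          # make sure the cited paper is a node
    return dict(graph)

def citation_count_rank(graph):
    """Stand-in for operation 520: rank each paper by how many papers cite it."""
    counts = {paper: 0 for paper in graph}
    for cited in graph.values():
        for target in cited:
            counts[target] += 1
    return counts

def extract_core_papers(ranks, threshold):
    """Operation 530: keep papers whose rank is at least the threshold."""
    return sorted(p for p, r in ranks.items() if r >= threshold)

records = [("A", ["B", "C"]), ("B", ["C"]), ("C", [])]
graph = extract_references(records)
ranks = citation_count_rank(graph)
core = extract_core_papers(ranks, threshold=1)
```

Keeping the three steps as separate functions mirrors the division into the modeling, rank calculation, and extraction units, so a different ranking function can be substituted without touching the other two steps.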

The technical details according to an embodiment of the present invention described with reference to FIGS. 1 to 4 may also be used to calculate the rank of a general document rather than a paper.

A document may be a document within a specific computer system, a document within a specific network group such as an intranet, or a document on the Internet.

In the descriptions above, 'paper' may be replaced with 'document'.

A 'reference' (or 'reference relationship', etc.) of a 'paper' may be replaced with a 'reference' of a 'document'. A 'reference' of a 'document' may mean that one document provides an access path to another document by using hypertext, a link, or the like.

The 'publication time' of a 'paper' may be replaced with the 'creation time' of a 'document'. The 'creation time' of a 'document' may mean the time at which the document was created in a file system, or may mean a specific publication time associated with the document (e.g., the filing date of a patent document, as stated in that patent document).

Accordingly, the reference relationship modeling unit 110 can extract a reference relationship between a plurality of documents, and the rank calculation unit 120 can calculate the rank of a first document among the plurality of documents based on the reference relationship. In addition, a core document extraction unit corresponding to the core paper extraction unit 130 can extract a core document from one or more ranked documents based on the rank of each of those documents.

The first to fourth criteria described above may also be used to calculate the rank of a document.

That is, the rank calculation unit 120 can calculate the rank of the first document by reflecting the number of second documents that reference the first document, or the number of third documents that reference the second documents referencing the first document.

The rank calculation unit 120 can calculate the rank of the first document by reflecting the number of fourth documents that reference the first document. Here, the fourth documents are documents in the same field as the first document.

The rank calculation unit 120 can calculate the rank of the first document by reflecting the difference between the creation time of a second document that references the first document and the creation time of the first document.

The rank calculation unit 120 can calculate the rank of the first document by reflecting the publication time of the first document.

The method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by their equivalents.

100: paper rank calculation device
110: reference relationship modeling unit
120: rank calculation unit
130: core paper extraction unit

Claims

A paper rank calculation device comprising:
a reference relationship modeling unit that extracts a reference relationship between a plurality of papers; and
a rank calculation unit that calculates a rank of a first paper among the plurality of papers based on the reference relationship,
wherein the rank calculation unit calculates the rank of the first paper by reflecting a difference between a publication time of a second paper that references the first paper and a publication time of the first paper,
the rank calculation unit gives a higher rank to the first paper as the difference is greater,
the rank calculation unit calculates the rank of the first paper based on a random walk with restart (RWR) that calculates the rank of the first paper based on a restart vector and a reference relationship matrix,
the rank calculation unit calculates the rank of the first paper by reflecting the difference through use of a customized restart vector of the RWR that reflects the difference, and
the rank calculation unit reflects the difference in publication time between two papers in a reference relationship in the restart vector by using an exponential growth function in the restart vector to increase the restart probability to papers with a large difference in publication time.

The paper rank calculation device of claim 1,
further comprising a core paper extraction unit that extracts a core paper from the one or more papers based on the rank of each of the one or more papers.

The paper rank calculation device of claim 2,
wherein the core paper extraction unit extracts, as the core papers, papers among the one or more papers whose rank value is greater than or equal to a specific threshold value.

The paper rank calculation device of claim 1,
wherein the reference relationship modeling unit models the reference relationship between the plurality of papers in the form of a graph data structure, a node in the graph represents one paper among the plurality of papers, and an edge in the graph represents a reference relationship between papers among the plurality of papers.

The paper rank calculation device of claim 1,
wherein the plurality of papers are papers belonging to the same field.

The paper rank calculation device of claim 1,
wherein the rank calculation unit calculates the rank of the first paper by reflecting the number of second papers that reference the first paper or the number of third papers that reference the second papers referencing the first paper.

The paper rank calculation device of claim 1,
wherein the rank calculation unit calculates the rank of the first paper by reflecting the number of fourth papers that reference the first paper, and
the fourth papers are papers in the same field as the first paper.

The paper rank calculation device of claim 1,
wherein the rank calculation unit calculates the rank of the first paper by reflecting the publication time of the first paper.

delete

A paper rank calculation method performed by a paper rank calculation device, the method comprising:
a reference relationship modeling operation of extracting a reference relationship between a plurality of papers; and
a rank calculation operation of calculating a rank of a first paper among the plurality of papers based on the reference relationship,
wherein the rank calculation operation calculates the rank of the first paper by reflecting a difference between a publication time of a second paper that references the first paper and a publication time of the first paper,
the rank calculation operation gives a higher rank to the first paper as the difference is greater,
the rank calculation unit calculates the rank of the first paper based on a random walk with restart (RWR) that calculates the rank of the first paper based on a restart vector and a reference relationship matrix,
the rank calculation unit calculates the rank of the first paper by reflecting the difference through use of a customized restart vector of the RWR that reflects the difference, and
the rank calculation unit reflects the difference in publication time between two papers in a reference relationship in the restart vector by using an exponential growth function in the restart vector to increase the restart probability to papers with a large difference in publication time.

delete

A document rank calculation device comprising:
a reference relationship modeling unit that extracts a reference relationship between a plurality of documents; and
a rank calculation unit that calculates a rank of a first document among the plurality of documents based on the reference relationship,
wherein the rank calculation unit calculates the rank of the first document by reflecting a difference between a creation time of a second document that references the first document and a creation time of the first document,
the rank calculation unit gives a higher rank to the first document as the difference is greater,
the rank calculation unit calculates the rank of the first paper based on a random walk with restart (RWR) that calculates the rank of the first paper based on a restart vector and a reference relationship matrix,
the rank calculation unit calculates the rank of the first paper by reflecting the difference through use of a customized restart vector of the RWR that reflects the difference, and
the rank calculation unit reflects the difference in publication time between two papers in a reference relationship in the restart vector by using an exponential growth function in the restart vector to increase the restart probability to papers with a large difference in publication time.

delete