KR20100056068A

KR20100056068A - System and method for visualization and extraction of core literature using citation network

Info

Publication number: KR20100056068A
Application number: KR1020080115042A
Authority: KR
Inventors: 권오진; 이방래; 여운동; 최현규
Original assignee: Korea Institute of Science and Technology Information (한국과학기술정보연구원)
Priority date: 2008-11-19
Filing date: 2008-11-19
Publication date: 2010-05-27
Also published as: KR101047108B1

Abstract

PURPOSE: A system and method for extracting and visualizing core literature using a citation network are provided, so that the core technologies of a given subject can be found and technical trends recognized. CONSTITUTION: A core document extraction server (200) extracts citation information from the documents stored in a document information database (100), and selects a start document and the documents derived from it. The server creates a tree graph with the start document as the root node, and selects core documents by calculating an importance index for each node of the tree graph. The server then visualizes the connections among the selected core documents.

Description

System and Method for visualization and extraction of core literature using citation network

The present invention relates to a system and method for extracting and visualizing core literature using a citation network. More particularly, it relates to a system and method that select, from the documents stored in a document information database, a start document and the derivative documents derived from it; generate a tree graph with the start document as the root node; calculate an importance index for each node of the tree graph to select core documents; and visualize the links between the selected core documents.

As the volume of publications grows steadily and the information available on the Internet reaches a state of surplus, this excess of information actually hinders sound judgment for those who need it to identify research trends, secure a competitive edge over other companies, set research and development directions in line with the mainstream, and prepare for future challenges.

Traditional information analysis, in which users gather and analyze information from every available source based on their own knowledge and experience, has drawbacks: it consumes a great deal of experts' time, and collection and analysis tend to be biased by each expert's point of view.

To remedy these shortcomings, research is under way on computer-based information analysis systems that take over part of the work performed by people.

In addition, many new researchers who want to begin work in a particular research field have great difficulty finding their starting point.

If they are lucky, active and dedicated senior colleagues or professors help and support them through the difficulties of getting started; most, however, must solve these problems on their own.

Collecting information, the preparatory work for conducting research in earnest, is still regarded as a very difficult and time-consuming task even now that the Internet is well developed.

In particular, when the volume of information is huge and the search result set for a given topic is large, engineers and researchers must invest a great deal of time to grasp how the technology has developed.

Moreover, most researchers and engineers want to discover the core technologies of a particular topic or to understand the historical flow of a technology. Because they must find these by themselves, the process takes a long time, and even when they do find a core technology, they cannot tell whether it is objectively the right one.

An object of the present invention is to provide a system and method for extracting and visualizing core literature using a citation network, which make it possible to discover the core technologies of a particular subject and to grasp the flow of the technology.

Another object of the present invention is to provide a system and method for extracting and visualizing core literature using a citation network, which extract the most influential core documents from a directed document citation network by considering direct and indirect citations together.

Still another object of the present invention is to provide a system and method for extracting and visualizing core literature using a citation network, which can visualize and display the linkage relationships among core documents.

To achieve these objects, according to one aspect of the present invention, there is provided a system for extracting and visualizing core literature using a citation network, including: a document information database storing documents in which citation information is indicated; and a core document extraction server that extracts citation information from the documents stored in the document information database, selects from those documents a start document and the derivative documents derived from it, generates a tree graph with the start document as the root node, selects core documents by calculating an importance index for each node of the tree graph, and visualizes the connections among the selected core documents.

The document information database stores documents in at least one specialized field among domestic and international academic papers, patents, and research reports.

The core document extraction server selects the start document either through an expert in the relevant field or by citation count, and uses the extracted citation information to select the derivative documents, which include the documents that directly cite the start document and the documents that in turn cite those citing documents (indirectly citing documents).

In addition, the core document extraction server represents the generated tree graph as an adjacency matrix, computes powers of the adjacency matrix, and calculates the importance index of each node from these power matrices.

Alternatively, the core document extraction server represents the generated tree graph as an adjacency matrix and calculates the importance index by applying a depth-first search (DFS) over the adjacency matrix.

According to another aspect of the present invention, there is provided a core document extraction server including: a data collection module that extracts citation information from the documents stored in a document information database and uses that citation information to extract hot spot documents; a core document selection module that selects, from the documents collected by the data collection module, a start document and the derivative documents derived from it, generates a tree graph with the start document as the root node, and selects core documents by calculating an importance index for each node of the tree graph; and a visualization module that visualizes the connections among the core documents selected by the core document selection module.

The data collection module includes: a citation information extraction unit that extracts cited-by and citing information for the documents of each field stored in the document information database; and a hot spot document extraction unit that, for the documents of a given period, collects the documents each one cites, calculates forward citation counts, and selects hot spot documents using those counts.

The core document selection module includes: a start document selection unit that selects a start document from the documents collected by the data collection module; a derivative document extraction unit that extracts the documents derived from the selected start document, using the citation information extracted by the data collection module; a tree graph generation unit that generates, from the extracted derivative documents, a tree graph with the start document as the root node; an importance index calculation unit that calculates an importance index for each node of the generated tree graph; and a core document selection unit that, based on the calculated importance indices, selects as core documents the document with the highest importance index in each year.

The start document selection unit selects the start document either through an expert in the relevant field or by using the citation counts extracted by the data collection module.

The importance index calculation unit calculates the importance index $I_i$ of node $i$ as

$$I_i = \sum_{r=1}^{m} w(r-1) \sum_{j=1}^{n} a_{ij}^{(r)}, \qquad w(k) = \frac{1}{k+1},$$

where $r$ is the power of the adjacency matrix, $m$ is the maximum power, $k$ is the number of intermediate nodes on a path (a path of length $r$ has $k = r-1$ intermediate nodes), $a_{ij}^{(r)}$ is the $(i, j)$ entry of the $r$-th power $A^r$ of the adjacency matrix $A$, and $n$ is the total number of nodes.

Alternatively, the importance index calculation unit represents the tree graph generated by the tree graph generation unit as an adjacency matrix and calculates the importance index by applying a depth-first search (DFS) over the adjacency matrix.

The visualization module searches for direct links between the nodes of the core documents selected by the core document selection module. For a node that has no direct link, it selects a lower node with a high importance index, examines the path from that node to the previously selected nodes, and adds that path.

According to still another aspect of the present invention, there is provided a method for extracting and visualizing core literature using a citation network, including: (a) extracting cited-by/citing information from the documents stored in a document information database; (b) selecting a start document from those documents and extracting the derivative documents derived from it; (c) generating, from the extracted derivative documents, a tree graph with the start document as the root node; (d) calculating an importance index for each node of the generated tree graph; (e) selecting one or more core documents based on the calculated importance indices; and (f) visualizing the connections among the selected core documents.

Step (b) includes: for the documents of a given period stored in the document information database, collecting the documents each one cites, calculating forward citation counts, and selecting hot spot documents using those counts; selecting a start document from among the selected hot spot documents; and, using the cited-by/citing information extracted in step (a), extracting the derivative documents, which include the documents that directly cite the start document and the documents that in turn cite those citing documents (indirectly citing documents).

Step (d) includes: representing the tree graph generated in step (c) as an adjacency matrix that indicates the direct connections between nodes; raising the adjacency matrix to successive powers to obtain power matrices; and calculating the importance index of each node from the power matrices, weighting each path length.

Alternatively, step (d) includes: representing the tree graph generated in step (c) as an adjacency matrix that indicates the direct connections between nodes; and calculating the importance index by applying a depth-first search (DFS) to that adjacency matrix.

In step (e), based on the importance indices obtained in step (d), the document with the highest importance index in each year is selected as a core document.
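As a minimal sketch of step (e), the per-year selection can be written as below. It assumes the importance indices are already available as a dictionary keyed by document ID and that each document's publication year is known; the function name `core_documents_by_year` and the tie rule (the first document seen keeps the slot) are illustrative choices, not part of the original text.

```python
def core_documents_by_year(scores, year_of):
    """Return {year: document} holding, for each year, the document with the
    highest importance index (hypothetical helper; ties keep the first seen)."""
    best = {}
    for doc, score in scores.items():
        year = year_of[doc]
        if year not in best or score > scores[best[year]]:
            best[year] = doc
    return dict(sorted(best.items()))  # ordered by year for display


scores = {"S": 3, "D1": 1, "D2": 1.5, "D3": 0}
years = {"S": 2001, "D1": 2003, "D2": 2003, "D3": 2005}
print(core_documents_by_year(scores, years))
# {2001: 'S', 2003: 'D2', 2005: 'D3'}
```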

In step (f), direct links are searched for among the nodes of the selected core documents. For a node that has no direct link, a lower node with a high importance index is selected, its path to the previously selected nodes is examined, and that path is added, so that the connections among the core documents are visualized.

According to the present invention, the most influential core documents are extracted from a directed document citation network by considering direct and indirect citations together, so researchers can easily find the core technologies of a specific subject and grasp the flow of the technology.

In addition, because the linkage relationships among core documents are visualized, the invention can be used for patent valuation, tracing links to core technologies, drawing technology development diagrams in patent maps, and generating candidate groups for possible patent disputes.

The above objects, technical configurations, and resulting effects of the present invention will be more clearly understood from the following detailed description with reference to the accompanying drawings.

FIG. 1 shows a system for extracting core documents using a citation network according to the present invention, and FIG. 2 illustrates direct and indirect paths and the importance index of a node using a tree graph according to the present invention.

Referring to FIG. 1, the core document extraction system using a citation network includes a document information database 100 storing documents in which citation information is indicated, and a core document extraction server 200 that extracts and visualizes core documents from the documents stored in the document information database 100.

The document information database 100 stores documents in specialized fields such as domestic and international academic papers, patents, and research reports, and each document indicates the documents it cites.

The document information database 100 may be divided, for example, into a patent document information database for patent documents and an academic document information database for academic documents.

The core document extraction server 200 extracts citation information from the documents stored in the document information database 100, selects from those documents a start document and the derivative documents derived from it, generates a tree graph with the start document as the root node, selects core documents by calculating an importance index for each node of the tree graph, and visualizes the connections among the selected core documents.

Here, the core document extraction server 200 selects the start document from the documents stored in the document information database 100, either through an expert in the relevant field or by citation count, and uses the extracted citation information to select the derivative documents, which include the documents that directly cite the start document and the documents that in turn cite those citing documents (indirectly citing documents).

In addition, the core document extraction server 200 represents the generated tree graph as an adjacency matrix, computes powers of the adjacency matrix, and calculates the importance index of each node from these power matrices, weighting each path length.

Alternatively, the core document extraction server 200 may represent the generated tree graph as an adjacency matrix and calculate the importance index by applying a depth-first search (DFS) over the adjacency matrix.
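A DFS-based variant can be sketched as follows. The text does not state how the DFS variant weights paths, so this assumes the same rule as the power-matrix method (a node reached over a path of length L contributes 1/L) and an acyclic citation graph, which lets the search enumerate every path without a visited set and still terminate. The function name `importance_dfs` is hypothetical.

```python
from fractions import Fraction


def importance_dfs(A, start):
    """Importance index of `start` by iterative depth-first enumeration of every
    directed path leaving it in adjacency matrix A; each node reached at path
    length L adds 1/L. Assumes A describes an acyclic graph."""
    n = len(A)
    total = Fraction(0)
    stack = [(start, 0)]  # (node, length of the path that reached it)
    while stack:
        node, depth = stack.pop()
        if depth > 0:
            total += Fraction(1, depth)
        for nxt in range(n):
            if A[node][nxt]:  # direct edge node -> nxt
                stack.append((nxt, depth + 1))
    return total


# Hypothetical graph: node 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
A = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
print([str(importance_dfs(A, i)) for i in range(4)])
# ['3', '1', '1', '0']
```

Node 0 gets 1 for each of its two direct successors and 1/2 for each of the two length-2 paths to node 3, the same total the power-matrix formulation gives.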

The core document extraction server 200 may select core documents using the importance index, the degree (the number of edges directly connecting a node to other nodes), closeness, and similar measures.

The method by which the core document extraction server 200 selects core documents using the importance index is proposed by the present invention and is described in detail with reference to FIG. 3.

The importance index is chosen as the criterion for selecting core documents for the following reason. A suitable indicator must increase as the number of paths from a node to other nodes in the network increases, and a direct connection must yield a higher value than an indirect one. Because the importance index satisfies both properties, it is used as the indicator for selecting core documents.

For example, assume that the more a patent is cited by other patents, directly or indirectly, the more important it is considered, and that a direct citation exerts more influence on the patent than an indirect citation that passes through several steps. Under these assumptions, direct and indirect paths and the importance index of a node are explained with reference to FIG. 2.

FIG. 2 shows tree graph patterns in which node 1 is connected to node 2.

When node 1 is connected directly to node 2, as in (a), 1 is added to the importance index of node 1. If n nodes are directly connected from node 1, n is added to its importance index.

In case (b), the path from node 1 to node 2 has length 2; here 1/2 is added to the importance index of node 1 for each independent path. In case (c), the path from node 1 to node 2 has length 3, so 1/3 is added to the importance index of node 1 for each path.

Next, consider how the core document extraction server 200 selects core documents using closeness. The server 200 computes closeness with Equation 1 and selects core documents using the resulting closeness values.

$$C_i = \frac{J_i / (g-1)}{\sum_{j} d(i,j) \,/\, J_i} \qquad \text{(Equation 1)}$$

Here, $\sum_{j} d(i,j)$ is the sum of the shortest-path lengths from node $i$ to all other nodes reachable from it, $J_i$ is the number of other nodes reachable from node $i$, and $g$ is the total number of nodes.
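A minimal sketch of the closeness computation follows. It assumes Equation 1 has the reachability-normalized form (J_i / (g - 1)) / (sum of shortest-path lengths / J_i), which combines exactly the three quantities defined in the text; that form, the hypothetical function name `closeness`, and the convention of returning 0 for a node that reaches nothing are all assumptions. Shortest paths are found by breadth-first search along the directed edges of an adjacency matrix.

```python
from collections import deque


def closeness(A, i):
    """Closeness of node i in the directed graph given by adjacency matrix A."""
    g = len(A)  # total number of nodes
    dist = {i: 0}
    queue = deque([i])
    while queue:  # BFS gives shortest path lengths in an unweighted graph
        u = queue.popleft()
        for v in range(g):
            if A[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reachable = len(dist) - 1  # J_i: other nodes reachable from i
    if reachable == 0:
        return 0.0
    total = sum(dist.values())  # sum of shortest-path lengths (dist[i] is 0)
    return (reachable / (g - 1)) / (total / reachable)


# Hypothetical graph: node 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
A = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
print([round(closeness(A, i), 3) for i in range(4)])
# [0.75, 0.333, 0.333, 0.0]
```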

The document information database 100 and the core document extraction server 200 are connected through a communication network.

For example, when core patents are to be extracted and visualized, the core document extraction server 200 extracts citation information from the patent documents stored in the document information database 100, selects from those patent documents a start patent and the derivative patents derived from it, generates a tree graph with the start patent as the root node, selects core patents by calculating an importance index for each node of the tree graph, and visualizes the connections among the selected core patents.

The core document extraction server 200, which performs the above functions, is described in detail with reference to FIG. 3.

FIG. 3 is a block diagram schematically showing the configuration of the core document extraction server according to the present invention, FIG. 4 is a block diagram showing in detail the configuration of the core document selection module shown in FIG. 3, and FIGS. 5 and 6 show examples of representing a tree graph according to the present invention as an adjacency matrix.

Referring to FIG. 3, the core document extraction server 200 includes a data collection module 210, a core document selection module 230, and a visualization module 250.

The data collection module 210 extracts citation information from the documents stored in the existing document information database and uses that citation information to extract hot spot documents.

To this end, the data collection module 210 includes a citation information extraction unit 212, which extracts cited-by and citing information for the documents of each field stored in the document information database, and a hot spot document extraction unit 214, which extracts hot spot documents from among the documents of a given period.

Because each document stored in the document information database indicates the documents it cites, the citation information extraction unit 212 extracts citation information by querying the cited-reference field of each document.

The hot spot document extraction unit 214 collects, for the documents of a given period, the documents each one cites, calculates forward citation counts, and extracts hot spot documents using those counts. The period used for extracting hot spot documents can be set arbitrarily.

For example, to extract hot spot patents for the last five years, the hot spot document extraction unit 214 may collect all patents cited by patents registered during those five years, calculate their forward citation counts, and extract the top 1% of patents in each IPC class. The extracted top 1% of patents may then be taken as hot spot patents.
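The hot spot step can be sketched as follows, assuming the citation data is available as a mapping from each patent registered in the chosen window to the patents it cites, plus an IPC class for each cited patent. The function and parameter names are hypothetical. Each citing patent is counted once per cited patent, and at least one patent is kept per IPC class so that small classes are not emptied by the 1% cutoff; both choices are assumptions rather than details from the text.

```python
import math
from collections import Counter, defaultdict


def hot_spot_patents(citations_in_window, ipc_of, top_fraction=0.01):
    """citations_in_window: {citing_patent: [cited_patent, ...]} for patents
    registered in the window. ipc_of: {patent: IPC class}.
    Returns {IPC class: [hot spot patents]} ranked by forward citation count."""
    forward = Counter()
    for cited_list in citations_in_window.values():
        forward.update(set(cited_list))  # one count per citing patent
    by_ipc = defaultdict(list)
    for patent, count in forward.items():
        by_ipc[ipc_of.get(patent, "UNKNOWN")].append((count, patent))
    hot = {}
    for ipc, items in by_ipc.items():
        items.sort(key=lambda t: (-t[0], t[1]))  # most cited first, then by ID
        k = max(1, math.ceil(len(items) * top_fraction))
        hot[ipc] = [patent for _, patent in items[:k]]
    return hot


citations = {"P10": ["A", "B"], "P11": ["A"], "P12": ["A", "C"]}
ipc = {"A": "G06F", "B": "G06F", "C": "H04L"}
print(hot_spot_patents(citations, ipc))
```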

The core document selection module 230 uses the documents and citation information collected by the data collection module 210 to select a start document and the derivative documents derived from it, generates a tree graph with the start document as the root node, and selects core documents by calculating an importance index for each node of the tree graph.

Referring to FIG. 4, the core document selection module 230 includes a start document selection unit 232, a derivative document extraction unit 234, a tree graph generation unit 236, an importance index calculation unit 238, and a core document selection unit 240.

The start document selection unit 232 selects a start document from the hot spot documents collected by the data collection module 210. The start document may be chosen by an expert in the relevant field, or the document with the highest citation count may be chosen.

The derivative document extraction unit 234 extracts the documents derived from the start document selected by the start document selection unit 232, using the citation information extracted by the data collection module 210. That is, using the cited-by/citing information extracted by the data collection module 210, the derivative document extraction unit 234 extracts both the documents that directly cite the start document and the documents that in turn cite those citing documents (indirectly citing documents). These directly and indirectly citing documents are called derivative documents.

The derivative document extraction unit 234 builds a hash table from the derivative documents extracted from the start document.
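A minimal sketch of derivative extraction follows. It assumes the extracted citation information has been inverted into a hypothetical `cited_by` mapping (document to the documents that cite it) and walks that mapping breadth-first from the start document. A Python dictionary plays the role of the hash table, recording each derivative's generation (1 for direct citers, 2 or more for indirect ones); the citation edges collected along the way would feed the tree graph. The function name is also hypothetical.

```python
from collections import deque


def extract_derivatives(start, cited_by):
    """Collect every document that cites `start` directly or indirectly.
    Returns ({doc: generation}, [(cited, citing), ...])."""
    generation = {start: 0}  # hash table: document -> citation distance
    edges = []
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for citer in cited_by.get(doc, ()):
            edges.append((doc, citer))
            if citer not in generation:  # first time reached: record and expand
                generation[citer] = generation[doc] + 1
                queue.append(citer)
    return generation, edges


cited_by = {"S": ["D1", "D2"], "D1": ["D3"], "D2": ["D3"], "X": ["Y"]}
gen, edges = extract_derivatives("S", cited_by)
print(gen)    # {'S': 0, 'D1': 1, 'D2': 1, 'D3': 2}
print(edges)  # [('S', 'D1'), ('S', 'D2'), ('D1', 'D3'), ('D2', 'D3')]
```

Documents X and Y are never reached because they are not connected to the start document S by citations.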

The tree graph generation unit 236 generates, from the derivative documents extracted by the derivative document extraction unit 234, a tree graph with the start document as the root node.

The importance index calculation unit 238 calculates an importance index for each node of the tree graph generated by the tree graph generation unit 236.

Specifically, the importance index calculation unit 238 represents the tree graph generated by the tree graph generation unit 236 as an adjacency matrix that indicates the direct connections between nodes, raises the adjacency matrix to successive powers to obtain power matrices, and then calculates the importance index of each node from those power matrices, weighting each path length.

FIG. 5 shows how the importance index calculator 238 represents a tree graph as an adjacency matrix. If V is the set of nodes and E is the set of edges, the tree graph is written G = (V, E), meaning that it consists of a node set and an edge set.

An adjacency matrix represents the direct connections between nodes. A cell value of 1 means there is a direct connection (edge) between the two nodes, and 0 means there is none. The diagonal entries are 0, because no node is connected to itself.
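The following Python sketch builds such a matrix as a plain list of lists. It assumes edges are directed from a document to the document that cites it (parent to child), which matches the FIG. 6 walk-through later in this section, where row 2 has 1 entries only in columns 4 and 5. The edge list is read off that walk-through; the function name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix for 1-based directed (parent, child) edges.

    A[i][j] == 1 means a direct edge from node i+1 to node j+1; the diagonal
    stays 0 because no node is connected to itself.
    """
    A = [[0] * n for _ in range(n)]
    for parent, child in edges:
        A[parent - 1][child - 1] = 1
    return A


# The 10-node tree of FIG. 6(a); node 1 is the start document.
FIG6_EDGES = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6),
              (4, 7), (5, 8), (5, 9), (6, 10)]
A = adjacency_matrix(10, FIG6_EDGES)
```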

The power matrices obtained by raising the adjacency matrix to successive powers are explained with reference to Table 1.

Let A be the adjacency matrix. Each cell of $A^r$, the matrix obtained by raising A to the r-th power, is the number of distinct paths of length r from one node to another. In other words, the number of distinct paths of length r from node $V_i$ to node $V_j$ is the (i, j) entry of the power matrix $A^r$. Table 1 shows an example of the power matrix $A^r$.

In Table 1, the (1, 2) cell marked a means that there are three distinct paths of length r from node 1 to node 2. Adding up all the values marked b gives the number of distinct paths of length r from node 1 to all other nodes in the network.
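The path-counting property can be checked numerically with a plain-Python matrix power. The four-node graph below is a hypothetical example, not Table 1: node 1 reaches node 4 through both node 2 and node 3, so the (1, 4) entry of $A^2$ is 2, and the row sum of $A^2$ counts every length-2 path leaving node 1. Such converging paths can arise in a citation network when one document cites two others from the same lineage.

```python
def mat_mult(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(A, r):
    """Return A^r (r >= 1) by repeated multiplication."""
    result = A
    for _ in range(r - 1):
        result = mat_mult(result, A)
    return result


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4 (0-based indices below).
A = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
A2 = matrix_power(A, 2)
paths_1_to_4 = A2[0][3]    # distinct length-2 paths from node 1 to node 4
paths_from_1 = sum(A2[0])  # all length-2 paths leaving node 1
```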

The node importance index proposed in the present invention is calculated from the power matrices of the adjacency matrix, with a weight applied to each path length. The weight may be the reciprocal of the path length.

Accordingly, the importance index calculator computes the importance index $f_i$ of node i using Equation 2:

$$f_i = \sum_{r=1}^{m} w(k) \sum_{j=1}^{n} a_{ij}^{(r)}, \qquad k = r - 1$$

Here r is the power of the adjacency matrix, and m is the maximum power, which is also the maximum path length. w(k) is a weighting function built from k, the number of intermediate nodes on each path. It can also be expressed through the path length: a path of length r passes through $k = r - 1$ intermediate nodes, so when the weight is the reciprocal of the path length, $w(k) = 1/(k+1) = 1/r$. n is the size of the network (the total number of nodes), and $a_{ij}^{(r)}$ is the (i, j) entry of $A^r$, the r-th power of the adjacency matrix A. Node i is the node whose importance index is being calculated, and $f_i$ is the resulting importance index.
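A minimal sketch of Equation 2 in Python, assuming the reciprocal-path-length weight $w = 1/r$ and 0-based node indices; the function name `importance_index` is hypothetical. Instead of forming each full power matrix, it propagates only row i of $A^r$, because Equation 2 needs nothing else, and `Fraction` keeps the result exact. The default $m = n - 1$ is an assumption: in an acyclic graph no path has more than n - 1 edges, so higher powers add nothing. On the 10-node tree of FIG. 6 it returns 29/6 for node 1, the same total as the depth-first walk-through of that figure.

```python
from fractions import Fraction


def importance_index(A, i, m=None):
    """f_i = sum over r = 1..m of (1/r) * sum_j (A^r)[i][j].

    A: 0/1 adjacency matrix (list of lists); i: 0-based node index.
    """
    n = len(A)
    if m is None:
        m = n - 1  # no path in an acyclic graph has more than n - 1 edges
    row = list(A[i])  # row i of A^1
    f = Fraction(0)
    for r in range(1, m + 1):
        f += Fraction(sum(row), r)  # length-r paths, weighted by 1/r
        # row i of A^(r+1) = (row i of A^r) times A
        row = [sum(row[k] * A[k][j] for k in range(n)) for j in range(n)]
    return f


# FIG. 6(a) tree (parent -> child edges, 1-based node numbers)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6),
         (4, 7), (5, 8), (5, 9), (6, 10)]
A = [[0] * 10 for _ in range(10)]
for p, c in edges:
    A[p - 1][c - 1] = 1
f1 = importance_index(A, 0)  # 2/1 + 3/2 + 4/3 = 29/6
```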

These properties show that the proposed importance index examines whether a given node can reach every other node, and that it uses the reciprocal of the path length as the weight.

Alternatively, the importance index calculator 238 may represent the tree graph generated by the tree graph generator 236 as an adjacency matrix and calculate the importance index by applying a depth-first search (DFS) technique to that adjacency matrix.

The way the importance index calculator 238 calculates the importance index using depth-first search is described with reference to FIG. 6.

FIG. 6(a) is a tree graph in which the start document is connected to two documents by direct paths and to seven documents by indirect paths. The nodes are numbered from 1 at the top, increasing from top to bottom and from left to right.

Representing this tree graph as an adjacency matrix gives (b).

To calculate the importance index of node 1 using (b), record the positions of the 1 entries in row 1 (columns 2 and 3) and add all of these 1 values. The importance index of node 1 is then updated as f₁ += (1 + 1).

Next, move to row 2, the first of the positions recorded in the previous step, record the positions of its 1 entries (4 and 5), and add all of these 1 values. This time the path length is 2, so the sum is multiplied by the weight 1/2: f₁ += (1/2)×(1 + 1).

Next, move to row 4, the first of the positions recorded in the previous step, record the position of its 1 entry (7), and add it. The path length is now 3, so the weight is 1/3: f₁ += (1/3)×(1).

Next, move to row 7, recorded in the previous step. It contains no 1 entries, so the search backtracks one level and moves to row 5, which was recorded earlier.

In row 5, record the positions of the 1 entries (8 and 9), multiply each 1 by the weight, and add them: f₁ += (1/3)×(1 + 1).

Next, rows 8 and 9, recorded in the previous step, contain no 1 entries. Backtracking two levels leads to the recorded row 3; record the position of its 1 entry (6), multiply by the weight, and add: f₁ += (1/2)×(1).

Next, move to row 6, recorded in the previous step, record the position of its 1 entry (10), multiply by the weight, and add: f₁ += (1/3)×(1).

Finally, move to row 10, recorded in the previous step. It contains no 1 entries and no recorded rows remain, so the search ends.

In this way, the importance index calculator 238 can calculate the importance index of node 1 by applying a depth-first search (DFS) technique to the adjacency matrix. Summing the terms above gives f₁ = 2 + 1 + 1/3 + 2/3 + 1/2 + 1/3 = 29/6 (about 4.83), which equals the power-matrix sum of Equation 2 for the 2, 3, and 4 paths of lengths 1, 2, and 3: 2 + 3/2 + 4/3.
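The walk above can be written as an iterative depth-first search with an explicit stack, as in this Python sketch (the function name `importance_index_dfs` is hypothetical). Each popped node contributes the number of 1 entries in its row, weighted by the reciprocal of the path length at which those entries sit. No visited set is kept, so when several paths reach the same node each path is counted, just as the power matrices count them; this assumes the citation graph is acyclic, since on a cyclic graph the loop would not terminate.

```python
from fractions import Fraction


def importance_index_dfs(A, i):
    """Importance index of 0-based node i by depth-first search over A."""
    n = len(A)
    f = Fraction(0)
    stack = [(i, 0)]  # (node, path length from node i)
    while stack:
        node, length = stack.pop()
        children = [j for j in range(n) if A[node][j] == 1]
        # every 1 in this row is a node at path length `length + 1`
        f += Fraction(len(children), length + 1)
        # push in reverse so the lowest-numbered child is expanded first,
        # following the row order of the walk-through above
        for j in reversed(children):
            stack.append((j, length + 1))
    return f


edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6),
         (4, 7), (5, 8), (5, 9), (6, 10)]
A = [[0] * 10 for _ in range(10)]
for p, c in edges:
    A[p - 1][c - 1] = 1
f1 = importance_index_dfs(A, 0)  # 2 + 1 + 1/3 + 2/3 + 1/2 + 1/3 = 29/6
```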

Based on the importance indices calculated by the importance index calculator 238, the core document selection unit 240 selects, for each year, the document with the highest importance index as a core document.
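A short Python sketch of this per-year selection, assuming hypothetical `scores` (document to importance index) and `years` (document to year) dictionaries. Ties go to the document seen first, which is an assumption; the text does not specify a tie-breaking rule.

```python
def select_core_by_year(scores, years):
    """Return {year: document with the highest importance index that year}.

    scores: hypothetical mapping doc ID -> importance index
    years:  hypothetical mapping doc ID -> year
    """
    best = {}
    for doc, score in scores.items():
        year = years[doc]
        # keep the first document seen unless a later one scores strictly higher
        if year not in best or score > scores[best[year]]:
            best[year] = doc
    return best


scores = {"A": 3.0, "B": 5.0, "C": 1.0, "D": 2.0}
years = {"A": 2001, "B": 2001, "C": 2003, "D": 2003}
core = select_core_by_year(scores, years)
```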

The visualization module 250 visualizes the connections between the core documents selected by the core document selection module 230. That is, the visualization module 250 searches for direct links to the nodes of the core documents selected by the core document selection module 230; for a node that has no direct link, it selects a lower node with a high importance index, examines that node's path to a previously selected node, and adds the path so that the core documents are connected.
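One possible reading of this linking rule, sketched in Python under stated assumptions: the tree is given as a hypothetical `children` dictionary, the selected (solid-line) nodes form a set, and, starting from each selected node that has no selected child, the sketch repeatedly moves to the highest-scoring child that still leads to some selected node, marking every node it passes through as an added (dotted-line) connector. Following the whole chain until a selected node is reached is an interpretation of "adding the path"; the text does not spell out that step.

```python
def reaches_selected(node, children, selected):
    """True if some selected node lies strictly below `node` in the tree."""
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current in selected:
            return True
        stack.extend(children.get(current, []))
    return False


def add_connector_nodes(children, selected, scores):
    """Return the connector (dotted-line) nodes that link selected nodes.

    children: hypothetical mapping node -> list of child nodes in the tree
    selected: set of core (solid-line) nodes
    scores:   mapping node -> importance index
    """
    connectors = set()
    for start in selected:
        node = start
        while True:
            kids = children.get(node, [])
            if any(k in selected for k in kids):
                break  # a direct link to a selected node already exists
            candidates = [k for k in kids
                          if reaches_selected(k, children, selected)]
            if not candidates:
                break  # nothing selected further down this branch
            node = max(candidates, key=lambda k: scores[k])
            connectors.add(node)
    return connectors


# Node 3 scores higher than node 2 but leads to no selected node.
children = {1: [2, 3], 2: [4], 3: [5], 4: [6]}
selected = {1, 6}
scores = {2: 5, 3: 9, 4: 1, 5: 2, 6: 3}
connectors = add_connector_nodes(children, selected, scores)
```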

FIG. 7 is a flowchart illustrating a method in which the core document extraction apparatus according to the present invention extracts and visualizes core documents, and FIG. 8 is an exemplary diagram visualizing the connections between core documents according to the present invention.

Referring to FIG. 7, the core document extraction apparatus extracts cited/citing information from the documents stored in the document information database (S700).

The core document extraction apparatus then collects, for the documents stored in the document information database over a given period, the documents cited by each document, calculates forward citation counts, and selects hot spot documents using those counts (S702).
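Forward citation counts can be tallied from each document's reference list, as in the Python sketch below. The `references` input and the top-k cut-off are assumptions for illustration; the text says only that hot spot documents are selected using the forward citation counts, not which threshold is applied.

```python
from collections import Counter


def forward_citation_counts(references):
    """references: hypothetical mapping doc ID -> list of doc IDs it cites.

    Returns a Counter: cited doc ID -> number of documents citing it.
    """
    counts = Counter()
    for cited in references.values():
        counts.update(set(cited))  # each citing document counts once
    return counts


def hot_spot_documents(references, top_k):
    """Return the top_k most frequently cited documents (assumed criterion)."""
    counts = forward_citation_counts(references)
    return [doc for doc, _ in counts.most_common(top_k)]


references = {"A": ["X", "Y"], "B": ["X"], "C": ["X", "Z", "Y"]}
hot = hot_spot_documents(references, 2)
```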

Next, the core document extraction apparatus selects a start document from the selected hot spot documents (S704) and, using the extracted cited/citing information, extracts the derivative documents derived from the start document (S706). That is, the apparatus extracts derivative documents including the documents that directly cite the start document and the documents that cite those citing documents (indirectly citing documents).

After step S706, the core document extraction apparatus generates, for the extracted derivative documents, a tree graph with the start document as the root node (S708) and calculates an importance index for each node of the generated tree graph (S710).

That is, the core document extraction apparatus represents the generated tree graph as an adjacency matrix indicating direct connections between nodes, raises the adjacency matrix to successive powers to obtain power matrices, and calculates each node's importance index from the power matrices, applying a weight for each path length.

Alternatively, the core document extraction apparatus may represent the generated tree graph as an adjacency matrix indicating direct connections between nodes and calculate the importance index by applying a depth-first search (DFS) technique to that adjacency matrix.

After step S710, the core document extraction apparatus selects, for each year, the document with the highest calculated importance index as a core document (S712) and visualizes the connections between the selected core documents (S714).

That is, the core document extraction apparatus searches for direct links to the node of each selected core document; for a node with no direct link, it selects a lower node with a high importance index, examines that node's path to a previously selected node, and adds the path, thereby visualizing the connections between the core documents.

As an example, the following describes how the table of importance indices for core patents in FIG. 8(a) is visualized as shown in FIG. 8(b).

(a) is a table of the importance index results for the core patents derived from U.S. Patent No. 5349655, and (b) visualizes the links between the core patents listed in (a).

In (b), the size of each node represents its importance index (weight). Core nodes are selected for each year based on their importance index values, chosen so that the selection forms a diamond shape when viewed year by year: a single start patent at the top, several patents in the middle, and only a few patents at the bottom.

In (b), nodes drawn with solid lines were selected because of their high importance index (weight), and nodes drawn with dotted lines were added to create a path to a previously selected node when a solid-line node had no connection to a lower node. In that case, among the lower nodes of the solid-line node that have a path to a previously selected node, the one with the largest importance index is added as a dotted-line node. Nodes 4, 9, 14, 16, 22, 25, and 32 are the dotted-line nodes added in this way.

As such, those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above are therefore to be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the following claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

FIG. 1 is a diagram illustrating a core document extraction and visualization system using a citation network according to the present invention.

FIG. 2 is a diagram for explaining direct and indirect paths and the importance index of nodes using a tree graph according to the present invention.

FIG. 3 is a block diagram schematically showing the configuration of the core document extraction server according to the present invention.

FIG. 4 is a block diagram showing in detail the configuration of the core document selection module shown in FIG. 3.

FIGS. 5 and 6 are exemplary diagrams representing a tree graph according to the present invention as an adjacency matrix.

FIG. 7 is a flowchart illustrating a method in which the core document extraction apparatus according to the present invention extracts and visualizes core documents.

FIG. 8 is an exemplary diagram visualizing the connections between core documents according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

100: document information database    200: core document extraction server

210: data collection module    212: citation information extraction unit

214: hot spot document extraction unit    230: core document selection module

232: start document selection unit    234: derivative document extraction unit

236: tree graph generator    238: importance index calculator

240: core document selection unit    250: visualization module

Claims

A core document extraction and visualization system using a citation network, comprising:

a document information database in which document information is stored; and

a core document extraction server that extracts citation information from the documents stored in the document information database, selects a start document and derivative documents derived from the start document, generates a tree graph with the start document as the root node, selects core documents by calculating an importance index for each node of the tree graph, and visualizes the connections between the selected core documents.

The system of claim 1,

wherein the document information database stores documents of at least one type among domestic and international academic papers, patents, and research reports.

The system of claim 1,

wherein the core document extraction server selects the start document on the basis of an expert in the relevant field or of citation counts, and uses the extracted citation information to select derivative documents including documents that directly cite the start document and documents that cite those citing documents (indirectly citing documents).

The system of claim 1,

wherein the core document extraction server represents the generated tree graph as an adjacency matrix, raises the adjacency matrix to successive powers to obtain power matrices, and calculates the importance index of each node from the power matrices.

The system of claim 1,

wherein the core document extraction server represents the generated tree graph as an adjacency matrix and calculates the importance index by applying a depth-first search (DFS) technique to the adjacency matrix.

A core document extraction server, comprising:

a data collection module that extracts citation information from documents stored in a document information database and extracts hot spot documents using the citation information;

a core document selection module that selects, from the documents collected by the data collection module, a start document and derivative documents derived from the start document, generates a tree graph with the start document as the root node, and selects core documents by calculating an importance index for each node of the tree graph; and

a visualization module that visualizes the connections between the core documents selected by the core document selection module.

The core document extraction server of claim 6,

wherein the data collection module comprises: a citation information extraction unit that extracts cited and citing information for each document stored in the document information database; and

a hot spot document extraction unit that collects the documents cited by each document over a given period, calculates forward citation counts, and selects hot spot documents using the forward citation counts.

The core document extraction server of claim 6,

wherein the core document selection module comprises:

a start document selection unit that selects a start document from the documents collected by the data collection module;

a derivative document extraction unit that extracts the derivative documents derived from the start document selected by the start document selection unit, using the citation information extracted by the data collection module;

a tree graph generator that generates, for the derivative documents extracted by the derivative document extraction unit, a tree graph with the start document as the root node;

an importance index calculator that calculates an importance index for each node of the tree graph generated by the tree graph generator; and

a core document selection unit that selects, for each year, the document with the highest importance index calculated by the importance index calculator as a core document.

The core document extraction server of claim 8,

wherein the start document selection unit selects the start document on the basis of an expert or of citation counts extracted by the data collection module.

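The core document extraction server of claim 8,

wherein the importance index calculator calculates the importance index $f_i$ using

$$f_i = \sum_{r=1}^{m} w(k) \sum_{j=1}^{n} a_{ij}^{(r)}$$

where r is the power of the adjacency matrix, m is the maximum power, w(k) is the weighting function of k, k is the number of intermediate nodes on a path ($k = r - 1$ for a path of length r), $a_{ij}^{(r)}$ is the (i, j) entry of $A^r$, the r-th power of the adjacency matrix A, and n is the total number of nodes.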

The core document extraction server of claim 8,

wherein the importance index calculator represents the tree graph generated by the tree graph generator as an adjacency matrix and calculates the importance index by applying a depth-first search (DFS) technique to the adjacency matrix.

The core document extraction server of claim 6,

wherein the visualization module searches for direct links to the nodes of the core documents selected by the core document selection module and, for a node with no direct link, selects a lower node with a high importance index, examines its path to a previously selected node, and adds that path.

A core document extraction and visualization method using a citation network, comprising:

(a) extracting cited and citing information from documents stored in a document information database;

(b) selecting a start document from the documents and extracting derivative documents derived from the start document;

(c) generating, for the derivative documents, a tree graph with the start document as the root node;

(d) calculating an importance index for each node of the generated tree graph;

(e) selecting one or more core documents based on the calculated importance index; and

(f) visualizing the connections between the selected core documents.

The method of claim 13,

wherein step (b) comprises:

collecting, for the documents stored in the document information database, the documents cited by each document, calculating forward citation counts, and selecting hot spot documents using the forward citation counts;

selecting a start document from the selected hot spot documents; and

extracting, using the cited and citing information extracted in step (a), derivative documents including documents that directly cite the start document and documents that cite those citing documents (indirectly citing documents).

The method of claim 13,

wherein step (d) comprises:

representing the tree graph generated in step (c) as an adjacency matrix indicating direct connections between nodes;

obtaining power matrices by raising the adjacency matrix to successive powers; and

calculating the importance index of each node from the obtained power matrices, with a weight applied to each path length.

The method of claim 13,

wherein step (d) comprises calculating the importance index by applying a depth-first search (DFS) technique to an adjacency matrix constructed from the tree graph generated in step (c).

The method of claim 13,

wherein step (e) selects, for each year, the document with the highest importance index obtained in step (d) as a core document.

The method of claim 13,

wherein step (f) searches for direct links to the node of each selected core document and, for a node with no direct link, selects a lower node with a high importance index, examines its path to a previously selected node, and adds that path, thereby visualizing the connections between the core documents.