KR101061075B1

KR101061075B1 - Method for Creating Contour Map for Research Trend Analysis

Info

Publication number: KR101061075B1
Application number: KR1020090115837A
Authority: KR
Inventors: 예상준; 장현철; 김상균; 김철; 김진현; 송미영
Original assignee: 한국 한의학 연구원
Priority date: 2009-11-27
Filing date: 2009-11-27
Publication date: 2011-08-31
Also published as: KR20110059185A

Abstract

A contour map generation method is disclosed. The method according to the present invention includes: a first step of extracting keywords from a plurality of research materials stored in a database using a text mining technique; a second step of generating a vector space for the plurality of research materials and the extracted keywords; a third step of applying correspondence analysis to the generated vector space to analyze the relationships between the research materials and the keywords and generate two-dimensional coordinates; a fourth step of converting the two-dimensional coordinates into a contour map of three-dimensional coordinates with height, using kernel density estimation; a fifth step of detecting peak positions in the contour map using local maxima detection; and a sixth step of applying k-nearest neighbor detection to match keywords to the detected peak positions and display them.

Contour map, two-dimensional coordinates, three-dimensional coordinates, research trends, keywords

Description

Method for creating contour map for research trend analysis

The present invention relates to a method for generating a contour map for research trend analysis and, more particularly, to a method for generating a contour map used to analyze research trends across a plurality of research materials.

A defining feature of the Internet-based knowledge society is the oversupply of information, often called the information flood. However, it is becoming increasingly difficult to efficiently collect and analyze the information one needs from the vast amount of rapidly changing information.

Science and technology are now advancing very quickly, and research fields are diverse and constantly changing, so the related information is growing faster than ever before, and researchers are spending more and more time on preliminary surveys to decide on a research field. Researchers therefore need a process for analyzing research trends in order to avoid duplicated research and improve efficiency.

Researchers have conventionally analyzed research trends using patent map analysis and paper map analysis, and have used the results as an important indicator in deciding on a research field.

A patent map classifies the bibliographic details of patents (applicant, patent number, etc.) and their technical content (claims, detailed description of the invention, drawings, etc.) and charts the resulting data so that various information is easy to grasp. A paper map is a similar chart prepared from paper information so that research information is easy to analyze.

As described above, patent maps and paper maps target foreign patents and SCI (Science Citation Index) papers. Because foreign patents and SCI papers account for a smaller share in Korean medicine than in other fields, it has been difficult to analyze research trends in Korean medicine using patent maps or paper maps. A method suited to analyzing Korean medicine research trends using the many research materials related to Korean medicine is therefore needed.

The present invention has been made to solve the above problem. Its object is to provide a contour map generation method that can present research trends visually by analyzing a plurality of research materials and generating a contour map of three-dimensional coordinates in which the research materials are displayed according to the degree of research activity.

To achieve this object, a contour map generation method according to an embodiment of the present invention includes: a first step of extracting keywords from a plurality of research materials stored in a database using a text mining technique; a second step of generating a vector space for the plurality of research materials and the extracted keywords; a third step of applying correspondence analysis to the generated vector space to analyze the relationships between the research materials and the keywords and generate two-dimensional coordinates; a fourth step of converting the two-dimensional coordinates into a contour map of three-dimensional coordinates with height, using kernel density estimation; a fifth step of detecting peak positions in the contour map using local maxima detection; and a sixth step of applying k-nearest neighbor detection to match keywords to the detected peak positions and display them.

In this case, the first step includes: a first process of extracting related text from the plurality of research materials; a second process of generating keywords by converting related text written in English or Chinese characters into Korean, using Korean, English, and Chinese dictionaries of Korean-medicine and Western-medicine terms; a third process of deleting related text written in English or Chinese characters that cannot be converted into Korean; and a fourth process of updating a hash map by analyzing the keywords converted into Korean.

The second step includes: a first process of receiving a search keyword; a second process of searching the updated hash map for research materials containing the search keyword and for the keywords extracted from those materials; a third process of sorting the keywords found in the second process in descending order of frequency and generating vectors for the top n keywords and the research materials; and a fourth process of detecting the frequency of each keyword in the vectors, storing it in a first matrix if it exceeds a preset threshold, and storing it in a second matrix otherwise.

In the third step, the first matrix is reduced in dimension to generate the two-dimensional coordinates for the research materials and for the keywords exceeding the preset threshold.

The contour map generation method according to the present invention may further include a seventh step of analyzing the heights of the peaks to which keywords are matched and classifying the research materials and keywords into three stages (introduction, growth, and maturity) for display.

The seventh step includes: a first process of calculating the height of the highest peak in the contour map of three-dimensional coordinates and dividing it into three equal parts; a second process of classifying the three height bands, from lowest to highest, as the introduction stage, the growth stage, and the maturity stage; and a third process of classifying and displaying the research materials and keywords matched to the peaks according to the introduction, growth, and maturity stages.

The keywords preferably consist of the keywords appearing in the title, abstract, and index entries of each of the plurality of research materials.

The contour map is preferably displayed in colors that vary with height.

According to the present invention, a contour map of three-dimensional coordinates labeled with keywords is generated by analyzing a plurality of research materials, so that research trends can be analyzed from the contour heights. Presenting research trends visually as a contour map helps researchers avoid duplicating existing work and improves efficiency when they decide on a research field.

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a contour map generation method according to an embodiment of the present invention. FIG. 2 is a flowchart illustrating the process of extracting keywords using a text mining technique according to an embodiment of the present invention. FIG. 3 is a flowchart illustrating the process of generating a vector space for research materials and keywords according to an embodiment of the present invention. FIG. 4 is a flowchart illustrating the research trend analysis and display process according to an embodiment of the present invention. FIG. 5 is a diagram showing the vector space generated by the process shown in FIG. 3.

Referring to FIG. 1, the contour map generation method according to an embodiment of the present invention first extracts keywords from a plurality of research materials stored in a database using a text mining technique (S110). This process is described in more detail with reference to FIG. 2.

Referring to FIG. 2, related text is first extracted from a plurality of research materials stored in a large-scale database (for example, the OASIS database) (S111). Here, the research materials are all materials related to a given line of research and may include, specifically, research papers, research reports, and research documents.

The related text is text related to the content of each research material and may be the nouns (both simple and compound nouns) contained in its title, abstract, index, and so on.

The related text is extracted as follows. Because the research materials include PDF files whose body text exists only as an image, the related text can be extracted from the title, abstract, and index of each research material. The title, abstract, and index are extracted first, and then adjectives, particles, verbs, and the like are removed so that only nouns remain.

Next, related text written in Chinese characters or English is translated into Korean using dictionaries (S112). Even research materials on Korean medicine contain many terms written in English or Chinese characters, so these terms must be translated into Korean. For this purpose, the large-scale database may hold dictionaries in addition to the research materials. The dictionaries may be a Korean-medicine term dictionary of 27,836 words in Korean, Chinese, and English and a Western-medicine term dictionary of 2,144 words in Korean and English.

While step S112 is performed, if related text written in Chinese characters or English is a word (term) that cannot be translated into Korean (S113), that related text is deleted (S114). When the translation into Korean is complete, the hash map (Hashmap<Key, Value>) is updated using the translated keyword.

Specifically, the hash map consists of a "Key" and a "Value". In the present invention, the "Key" may be a keyword, and the "Value" is the number of times the keyword has been detected (its cumulative count). This update may be performed through steps S115, S116, and S117.

Next, it is checked whether the keyword translated into Korean has been detected before (S115); if so, the value for that keyword is increased by 1 (S116). For example, if the keyword is "ginseng" and "ginseng" has been detected once before, 1 is added and the hash map is updated to <ginseng, 2>. Likewise, if "ginseng" has been detected twice before, 1 is added and the hash map is updated to <ginseng, 3>.

If the keyword is found not to have been detected before, that is, it is detected for the first time (S115), its value is set to 1 (S117). In other words, when a keyword is first detected, the hash map is updated to <keyword, 1>.

Steps S111 to S117 may be performed for every research material stored in the database. When keyword extraction by the text mining technique is completed through steps S111 to S117, step S120 shown in FIG. 1 is performed.
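The translate, drop, and count logic of steps S112 to S117 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `TERM_DICT`, `is_hangul`, and `update_hashmap` are hypothetical names, the three-entry dictionary stands in for the term dictionaries described above, and treating a term as already Korean when it consists only of Hangul syllables is an assumption.

```python
# Hypothetical toy dictionary: English or Chinese-character term -> Korean term.
TERM_DICT = {"ginseng": "인삼", "人蔘": "인삼", "acupuncture": "침"}


def is_hangul(term):
    """Return True if the term is non-empty and made only of Hangul syllables."""
    return bool(term) and all("\uac00" <= ch <= "\ud7a3" for ch in term)


def update_hashmap(hashmap, related_terms, term_dict=TERM_DICT):
    """Update a {keyword: cumulative count} map from one material's nouns."""
    for term in related_terms:
        if is_hangul(term):
            keyword = term                            # already Korean
        else:
            keyword = term_dict.get(term.lower())     # S112: translate
            if keyword is None:                       # S113: not translatable
                continue                              # S114: drop the term
        # S115-S117: add 1 if seen before, otherwise start the count at 1.
        hashmap[keyword] = hashmap.get(keyword, 0) + 1
    return hashmap


counts = update_hashmap({}, ["Ginseng", "人蔘", "인삼", "moxa"])
print(counts)
```

Here the three spellings of ginseng collapse into one Korean key, and "moxa" is dropped because the toy dictionary has no entry for it.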

That is, a vector space is generated for the plurality of research materials and the extracted keywords (S120). This is described in detail with reference to FIG. 3.

First, when a search keyword is entered (S121), the updated hash map is searched for the research materials containing the search keyword and for the keywords extracted from those materials (S122). Specifically, when a researcher enters a keyword for the field of interest, that is, a search keyword, all research materials containing the search keyword are retrieved from the updated hash map, together with all keywords extracted from the retrieved materials.

Next, the keywords retrieved in step S122 are sorted in descending order of frequency, and vectors are generated for the top n keywords and the research materials (S123). The research materials are listed as rows and the top n keywords as columns, and each cell records the frequency of the keyword in the corresponding research material. A vector generated in this way is shown in FIG. 5.

Referring to FIG. 5, only the keywords ranked within the top n by frequency are retrieved first. The research materials are listed as rows (research material 1, research material 2, research material 3, …, research material m) and the extracted keywords as columns (keyword 1, keyword 2, keyword 3, …, keyword n) to form the vector. The value of keyword 1 in research material 1, that is, the number of times keyword 1 appears in research material 1 (C₁₁), is then recorded. In the same way, the number of times each of keywords 1 to n is detected in research materials 2 to m is determined, producing the vector shown in FIG. 5.

After the vector is generated in step S123, if the frequency of a keyword in the vector exceeds a preset threshold (S124), the research material and the keyword's frequency are stored in a first matrix (S125); otherwise, they are stored in a second matrix (S126).

For example, a first matrix and a second matrix are each created with the research materials as rows and the keywords as columns, as in FIG. 5. If the frequency of a keyword extracted from a research material exceeds the preset threshold, the frequency is stored in the first matrix at the position of that research material and keyword; otherwise, it is stored in the second matrix. The first matrix, which holds the keywords exceeding the preset threshold, can be regarded as the main matrix and the second matrix as the sub-matrix; both the first and second matrices can serve as vector spaces.
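Steps S123 to S126 can be sketched as follows. This is an illustrative sketch only: `build_matrices` and its argument layout are hypothetical, and it assumes the threshold is applied to each material-keyword cell, which is one reading of the description above.

```python
from collections import Counter


def build_matrices(doc_keywords, n, threshold):
    """Split material-by-keyword frequencies into a first and second matrix.

    doc_keywords: {material id: list of keywords extracted from it}
    Returns (materials, top_keywords, first_matrix, second_matrix).
    """
    # S123: rank keywords by total frequency and keep the top n as columns.
    totals = Counter()
    for keywords in doc_keywords.values():
        totals.update(keywords)
    top_keywords = [kw for kw, _ in totals.most_common(n)]

    materials = list(doc_keywords)                    # rows
    first = [[0] * len(top_keywords) for _ in materials]
    second = [[0] * len(top_keywords) for _ in materials]

    for i, material in enumerate(materials):
        counts = Counter(doc_keywords[material])
        for j, kw in enumerate(top_keywords):
            freq = counts[kw]
            if freq > threshold:                      # S124 -> S125
                first[i][j] = freq
            else:                                     # S124 -> S126
                second[i][j] = freq
    return materials, top_keywords, first, second


docs = {
    "d1": ["인삼", "인삼", "인삼", "침"],
    "d2": ["인삼", "침", "침", "뜸"],
}
materials, keywords, first, second = build_matrices(docs, n=2, threshold=1)
print(keywords, first, second)
```

Cells that do not pass the threshold stay at zero in the first matrix, so the two matrices keep the same shape and row and column order.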

Once the vector space has been generated through step S120 (specifically, steps S121 to S126 shown in FIG. 3), correspondence analysis is applied to the vector space to analyze the relationships between the research materials and the keywords and to generate two-dimensional coordinates (S130).

When the relationships between the research materials and the keywords are analyzed and plotted, the dimensionality is determined by the number of keywords. For example, if the vector space contains 50 keywords, 50-dimensional coordinates may result. Therefore, correspondence analysis is applied to the vector space with m (= p) rows and n (= q) columns to analyze the relationships between research materials and keywords, between research materials, and between keywords, and to generate low-dimensional, that is, two-dimensional, coordinates for the research materials and keywords.

Two-dimensional coordinates could also be generated with principal component analysis (principal coordinate analysis) or multidimensional scaling, but these techniques can analyze only the relationships between research materials; they cannot analyze the relationships between research materials and keywords or between keywords. For a more accurate analysis, it is therefore preferable to use correspondence analysis, which can analyze both.

Two-dimensional coordinates may be generated by correspondence analysis through the following procedure.

When the row vectors a₁, a₂, a₃, …, a_p are projected onto a unit vector [symbol missing in source] in the (q-1)-dimensional simplex S^q, the projection is [formula missing in source]. The loss caused by dimension reduction is smallest when the sum of squares of the orthogonal projections of the a_i is largest. The expression that determines the first axis can therefore be written as Equation 1 below.

[Equation 1 missing in source]

In Equation 1, [formula missing in source] and [formula missing in source]. Defining the matrix G as [formula missing in source], Equation 1 can be rewritten as Equation 2 below.

[Equation 2 missing in source]

Equation 2 equals the largest eigenvalue of [formula missing in source], and the sum of all eigenvalues is proportional to the chi-square statistic, as in Equation 3 below.

[Equation 3 missing in source]

In Equation 3, the eigenvalue [symbol missing in source] of [formula missing in source] is called the principal inertia, and [symbol missing in source] is called the singular value of [formula missing in source]. In the linear subspace determined by the eigenvectors [symbol missing in source], the rows a₁, …, a_p are each represented by the coordinate values [formula missing in source], and the columns e₁, …, e_q are represented by [formula missing in source]. Such coordinates can be computed for every nonzero eigenvalue [symbol missing in source] and its corresponding eigenvector [symbol missing in source], but effective dimension reduction uses only the first [symbol missing in source] axes. When s = 2, the rows and columns are therefore represented in a two-dimensional space as in Equation 4 below.

Representation of the row samples: [Equation 4, row part, missing in source]

Representation of the column categories: [Equation 4, column part, missing in source]

Because axis k accounts for [formula missing in source] of the total chi-square variation, the quality of the s-axis approximation can be defined as the cumulative proportion of the eigenvalues, [formula missing in source].

As in principal component analysis, the number of dimensions (s) can be determined by considering how the eigenvalues [symbol missing in source] decrease. Because rows and columns are easiest to read in a plane, s = 2 may be preferred. In Equation 4, the representation of the row samples is called the principal coordinates of the rows, and the representation of the column categories is called the standard coordinates.
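For orientation, the textbook form of correspondence analysis can be sketched with NumPy as below. This is an assumption-laden illustration, not a reconstruction of Equations 1 to 4: it uses the standard singular value decomposition of the standardized residuals, the function name `correspondence_analysis` is hypothetical, and it assumes every row and column of the frequency matrix has a nonzero sum. It does reproduce the properties named in the text: the squared singular values are the principal inertias, their sum equals the chi-square statistic divided by the grand total, rows are given in principal coordinates, and columns in standard coordinates.

```python
import numpy as np


def correspondence_analysis(N, s=2):
    """Textbook correspondence analysis of a frequency matrix N (rows x cols).

    Returns (row principal coords, column standard coords,
             principal inertias, share of inertia kept by the first s axes).
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)    # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sigma ** 2                  # principal inertias (eigenvalues)
    rows = U[:, :s] * sigma[:s] / np.sqrt(r)[:, None]   # principal coordinates
    cols = Vt.T[:, :s] / np.sqrt(c)[:, None]            # standard coordinates
    explained = inertia[:s].sum() / inertia.sum()       # cumulative proportion
    return rows, cols, inertia, explained


# Hypothetical 4-material by 3-keyword frequency matrix.
freq = [[10, 2, 1], [3, 8, 2], [1, 3, 9], [4, 4, 4]]
rows, cols, inertia, explained = correspondence_analysis(freq, s=2)
print(rows.shape, cols.shape)
```

With s = 2, each research material and each keyword gets an (x, y) pair, which is the input the later contour steps need.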

Once the two-dimensional coordinates have been generated in step S130, they are converted into a contour map of three-dimensional coordinates with height by applying kernel density estimation (the Parzen window method) (S140).

In the two-dimensional coordinates generated in step S130, contours form according to the relationships between the research materials and the keywords. Specifically, research materials and keywords with similar relationships are plotted close together. Applying kernel density estimation to these two-dimensional coordinates and displaying the result as contour height yields a contour map of three-dimensional coordinates.

Kernel density estimation is a method for estimating the probability density function of a random variable: given a sample of data from a population, it estimates the distribution of the whole population. In the present invention, the population is all research materials relevant to the search keyword, and the sample is the research materials retrieved by the search keyword, for example, the materials retrieved by the search keyword "ginseng" from the many research materials stored in the database. Applying the probability density function to the materials retrieved by "ginseng" and displaying it as contours therefore produces three-dimensional coordinates from which research trends for all materials relevant to "ginseng" can be analyzed.

That is, research materials with a high probability density are displayed with high contours, and those with a low probability density with low contours. Contour height may be indicated by varying the color at each contour level. Specifically, the color may be made progressively lighter from the lowest contour to the highest, or a different color may be used for each height.
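The conversion from two-dimensional points to heights can be sketched as follows. The kernel and bandwidth are not specified in the text, so this sketch assumes an isotropic Gaussian kernel with a hand-picked bandwidth; `kde_height` and `height_grid` are hypothetical names.

```python
import math


def kde_height(points, x, y, bandwidth=0.5):
    """Gaussian kernel density estimate at (x, y) from 2D sample points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


def height_grid(points, xs, ys, bandwidth=0.5):
    """Evaluate the density on a grid; grid[i][j] is the height at (xs[j], ys[i])."""
    return [[kde_height(points, x, y, bandwidth) for x in xs] for y in ys]


# Three tightly clustered points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
xs = ys = [i * 0.5 for i in range(-2, 9)]
grid = height_grid(points, xs, ys)
print(len(grid), len(grid[0]))
```

The cluster near the origin produces a taller hill than the isolated point, which is the behavior the contour heights are meant to convey.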

When the two-dimensional coordinates have been converted into the contour map of three-dimensional coordinates, local maxima detection is applied to the contour map to detect the peak positions (S150). Let f denote the height on the contour map. If a point x satisfies f(x) ≥ f(x') for the surrounding points x', then f(x) is called a local maximum. That is, for a given ε (> 0), every x' satisfying |x - x'| < ε must satisfy f(x) ≥ f(x'). Because the contour map is in three-dimensional coordinates, local maxima must be detected over the y coordinate as well as the x coordinate. Accordingly, all (x, y) coordinates satisfying f(x, y) ≥ f(x', y') for the surrounding points (x', y') must be detected. All (x, y) coordinates detected in this way are peak positions in the contour map of three-dimensional coordinates.
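On a discretized height grid, peak detection can be sketched as a neighbor comparison. This is one possible realization, not the patent's procedure: `find_peaks` is a hypothetical name, the neighborhood is fixed to the eight surrounding cells, and a strict comparison is used so that flat plateaus are not reported as many separate peaks.

```python
def find_peaks(grid):
    """Return (row, col, height) for each cell strictly higher than its neighbors."""
    n_rows, n_cols = len(grid), len(grid[0])
    peaks = []
    for i in range(n_rows):
        for j in range(n_cols):
            height = grid[i][j]
            # Collect the up-to-eight surrounding cells, clipped at the border.
            neighbors = [
                grid[a][b]
                for a in range(max(0, i - 1), min(n_rows, i + 2))
                for b in range(max(0, j - 1), min(n_cols, j + 2))
                if (a, b) != (i, j)
            ]
            if neighbors and all(height > v for v in neighbors):
                peaks.append((i, j, height))
    return peaks


# Hypothetical height grid with two hills.
grid = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 1, 2, 0],
    [0, 0, 2, 9, 1],
    [0, 0, 0, 1, 0],
]
print(find_peaks(grid))
```

The grid indices can be mapped back to (x, y) coordinates through the axis values used when the heights were evaluated.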

Next, k-nearest neighbor detection is applied to match keywords to the detected peak positions and display them (S160). Specifically, the positions of the keywords neighboring each detected peak are calculated, and a similarity is determined from the distance between the peak position and each keyword position. Only the top k keywords in descending order of similarity are then matched to the peak position and displayed.

아래의 수학식 5를 이용하여 봉우리 위치와 산출된 연구 자료의 위치 간의 거리에 따른 유사도

를 산출할 수 있다. Similarity according to distance between peak location and location of calculated research data using Equation 5 below

Can be calculated.

수학식 5에서, X와 Y는 각각 k개의 키워드로 구성되는 봉우리 위치이며, x_i와 y_i를 각각 i번째 키워드 위치이다. In Equation 5, X and Y are peak positions each composed of k keywords, and x _i and y _i are respectively the i th keyword position.

In Equation 5, the weight term is the weight at the target peak position and may be the distance between the target peak position and the keywords. The remaining term is the standard coordinate transformation of the i-th keyword position (x_i, y_i).

Once the contour map of three-dimensional coordinates with matched keywords has been generated, the heights of the peaks matched to the keywords are analyzed, and the keywords are classified into three stages, an introduction stage, a growth stage, and a maturity stage, and displayed (S170). This step is described in detail with reference to FIG. 4.

Referring to FIG. 4, the height of each peak is first calculated from the contour map of three-dimensional coordinates (S171). Here, the distance from the base of the contour map (or from the lowest peak) to the top of the highest peak is divided into three equal parts.

The three height bands are then classified into the introduction stage, the growth stage, and the maturity stage (S172). The bands are assigned in order of increasing height: the lowest band is the introduction stage, the middle band is the growth stage, and the highest band is the maturity stage.

Next, the stage to which the peak matched with each keyword belongs is determined (S173), and each keyword is classified into the introduction stage, the growth stage, or the maturity stage accordingly (S174). For example, it is determined which of the three stages the peak matched with the keyword "ginseng" belongs to. If "ginseng" is found to belong to the growth stage, the research materials containing the keyword "ginseng" are classified into that stage. This process can be applied to every keyword matched on the three-dimensional contour map.
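Steps S171 to S174 amount to splitting the height range into three equal bands and looking up the band of each keyword's peak. A minimal sketch follows, assuming the base of the map is at height 0; the function name `classify_stage`, the peak heights, and the keyword names other than "ginseng" are hypothetical.

```python
def classify_stage(height, lowest, highest):
    """Map a peak height to one of three equal bands between lowest and highest."""
    band = (highest - lowest) / 3.0
    if height < lowest + band:
        return "introduction"
    if height < lowest + 2 * band:
        return "growth"
    return "maturity"


# Hypothetical heights of the peaks matched to each keyword
peak_heights = {"ginseng": 0.55, "keyword_a": 0.10, "keyword_b": 0.90}

lowest = 0.0                          # base of the contour map (assumption)
highest = max(peak_heights.values())  # top of the highest peak

stages = {kw: classify_stage(h, lowest, highest)
          for kw, h in peak_heights.items()}
```

With these values "ginseng" falls in the middle band and is labeled as growth stage, mirroring the example in the text. Using the lowest peak instead of the base as `lowest` is the alternative the description allows.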

As a result, the screen shows which keywords fall in the introduction, growth, and maturity stages, together with the number of research materials. This makes it possible to identify both the fields that are currently most actively researched and the blank technology fields.

The research stage analysis method provided by the present invention uses a device such as a computer to process research materials stored in a database (for example, the OASIS database) through steps S110 to S170, so that the medical fields currently being studied and their research stages can be classified and displayed as introduction, growth, and maturity stages. The method therefore not only reveals the current research stage but can also be of great help in deciding future research directions.

FIG. 6 shows a contour map of three-dimensional coordinates according to an embodiment of the present invention, FIG. 7 shows a contour map of three-dimensional coordinates with matched keywords, and FIG. 8 shows a contour map of three-dimensional coordinates in which the research stage of each keyword is analyzed and displayed as one of three stages: introduction, growth, and maturity.

FIG. 6 shows the two-dimensional coordinates after conversion into a contour map of three-dimensional coordinates by kernel density estimation, with a different color displayed for each contour height. The points displayed on the contour map in FIG. 5 may represent keywords or research materials.
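The conversion from two-dimensional coordinates to a height surface can be sketched with a kernel density estimate evaluated on a grid. The Gaussian kernel, the fixed `bandwidth`, the function name `kde_height_map`, and the sample points are all assumptions made for illustration; this section does not state which kernel or bandwidth the method uses.

```python
import numpy as np


def kde_height_map(points, grid_x, grid_y, bandwidth=0.5):
    """Evaluate a 2-D Gaussian kernel density estimate on a regular grid."""
    points = np.asarray(points, dtype=float)
    X, Y = np.meshgrid(grid_x, grid_y)  # rows follow y, columns follow x
    # Squared distance from every grid cell to every point: shape (ny, nx, n)
    d2 = (X[..., None] - points[:, 0]) ** 2 + (Y[..., None] - points[:, 1]) ** 2
    kernel = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    # The height at each cell is the average kernel contribution of all points
    return kernel.mean(axis=2)


# Hypothetical 2-D coordinates of keywords or research materials
points = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
grid = np.linspace(-4.0, 7.0, 111)  # grid step of 0.1

Z = kde_height_map(points, grid, grid)
row, col = np.unravel_index(np.argmax(Z), Z.shape)
highest_xy = (grid[col], grid[row])  # lies near the two clustered points
```

The resulting array `Z` can be passed to a contour plotting routine and colored by height. A smaller bandwidth yields more, sharper peaks, while a larger one merges neighboring clusters.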

FIG. 7 shows the state in which the similarity between keywords has been determined and the keywords have been matched on the contour map of three-dimensional coordinates; the keywords are displayed at the peak positions they match.

In FIG. 8, the contour map is divided into three equal parts relative to the height of the highest peak, the parts are classified from lowest to highest as the introduction stage, the growth stage, and the maturity stage, and the keywords matched to each part are displayed. That is, referring to FIG. 8, the left side lists the keywords classified under the mature field, the growth field, and the blank field (that is, the introduction field), along with the number of research materials in each field. With this classified display, a researcher can check the current research stage in the medical field separately for mature, growth, and blank fields, and can decide on future research directions.

The method provided by the present invention is not limited to the medical field; it is applicable to any field in which analysis of research stages is needed.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

FIG. 1 is a flowchart illustrating a contour map generation method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a process of extracting keywords using a text mining technique according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a process of generating a vector space for research materials and keywords according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a research trend analysis and display process according to an embodiment of the present invention;

FIG. 5 is a diagram showing a vector space generated by the process illustrated in FIG. 3;

FIG. 6 is a diagram showing a contour map generated according to an embodiment of the present invention;

FIG. 7 is a diagram showing a contour map with matched keywords according to an embodiment of the present invention; and

FIG. 8 is a diagram showing a contour map in which research trends are analyzed and displayed according to an embodiment of the present invention.

Claims

A contour map generation method comprising:

a first step of extracting keywords, using a text mining technique, from a plurality of research materials stored in a database;

a second step of generating a vector space for the plurality of research materials and the extracted keywords;

a third step of generating two-dimensional coordinates by applying a correspondence analysis technique to the generated vector space to analyze the correlation between the plurality of research materials and the keywords;

a fourth step of converting the two-dimensional coordinates, using a kernel density estimation technique, into a contour map of three-dimensional coordinates in which height is indicated;

a fifth step of detecting peak positions in the contour map using a local maxima detection technique; and

a sixth step of matching keywords to the detected peak positions by applying a k-nearest neighbor detection technique and displaying the matched keywords.

The method of claim 1,

wherein the first step comprises:

a first process of extracting relevant text from the plurality of research materials;

a second process of generating keywords by converting relevant text written in English or Chinese characters into Korean, using a Korean dictionary, an English dictionary, and a Chinese dictionary for Chinese and Western terms;

a third process of deleting relevant text written in English or Chinese characters that could not be converted into Korean; and

a fourth process of updating a hash map by analyzing the keywords converted into Korean.

The method of claim 2,

wherein the second step comprises:

a first process of receiving a search keyword;

a second process of searching the updated hash map for research materials containing the search keyword and for the keywords extracted from those research materials;

a third process of sorting the keywords found in the second process in descending order of frequency and generating a vector for the top n keywords and the research materials; and

a fourth process of detecting the frequency of each keyword in the vector, storing the frequency in a first matrix if it exceeds a preset threshold, and storing the frequency in a second matrix if it does not exceed the preset threshold.

The method of claim 3,

wherein the third step comprises

reducing the dimensionality of the generated first and second matrices to generate the two-dimensional coordinates for the research materials and for the keywords exceeding the preset threshold.

The method of claim 1,

further comprising a seventh step of analyzing the heights of the peaks matched with the keywords and classifying the research materials and the keywords into three stages: an introduction stage, a growth stage, and a maturity stage.

The method of claim 5,

wherein the seventh step comprises:

a first process of calculating the height of the highest peak in the contour map of three-dimensional coordinates and dividing that height into three equal parts;

a second process of classifying the three height bands of the contour map, from lowest to highest, as the introduction stage, the growth stage, and the maturity stage; and

a third process of classifying and displaying the research materials and the keywords matched to the peaks according to the introduction stage, the growth stage, and the maturity stage.

The method of claim 1,

wherein the keywords

consist of keywords taken from the title, the abstract, and the index terms of each of the plurality of research materials.

The method of claim 1,

wherein the contour map is displayed in colors that vary with height.