KR20130057715A

KR20130057715A - Method for providing deep domain knowledge based on massive science information and apparatus thereof

Info

Publication number: KR20130057715A
Application number: KR1020110123596A
Authority: KR
Inventors: 전홍우; 최성필; 최윤수; 정창후; 성원경; 송사광; 이원구; 정도헌
Original assignee: 한국과학기술정보연구원
Priority date: 2011-11-24
Filing date: 2011-11-24
Publication date: 2013-06-03
Also published as: KR101374195B1

Abstract

PURPOSE: A method and apparatus for providing deep knowledge based on a scientific knowledge memory are provided. They use natural language processing and mining technology to emulate the complex process by which a person analyzes the literature of a specific scientific field and acquires knowledge, thereby automatically extracting and accumulating specialized knowledge. CONSTITUTION: A knowledge memory (304) stores relational knowledge, structural knowledge, and procedural knowledge about documents. A deep knowledge providing unit (306) receives a query, then searches for and presents triples that contain the query term together with the documents related to those triples. The deep knowledge providing unit uses a Generalized Concordance Lists (GCL) query, which can search for a specific term, a group of terms, or a relation between terms. [Reference numerals] (302) Multidimensional knowledge generating technology; (304a) Relational knowledge memory; (304b) Structural knowledge memory; (304c) Procedural knowledge memory; (306) Deep knowledge providing technology; (AA) Large-scale scholarly information; (BB) Deep knowledge delivery by field

Description

Method for providing deep knowledge based on an academic scientific knowledge memory, and apparatus suitable therefor

The present invention relates to text analysis and knowledge processing technology based on large-scale digital academic resources. In particular, it relates to a method, and an apparatus suitable for it, for providing in-depth knowledge based on an academic scientific knowledge memory: target documents such as academic papers, patents, and reports are analyzed to extract multidimensional knowledge consisting of relational knowledge, structural knowledge, and procedural knowledge; the multidimensional knowledge for the target documents is stored; and in-depth knowledge corresponding to a query is provided, so that large-scale digital academic resources can be easily used and shared.

The greatest challenge facing science and technology researchers and policy makers is to efficiently analyze the explosively growing body of technical material, such as papers, patents, and reports, and to reflect it in their research and policies or use it to set new research directions.

In this regard, Sophia Ananiadou, director of the UK's National Centre for Text Mining (NaCTeM), remarked in a 2008 interview with Biomedical Computation Review that "scientists are now drowning in text," and emphasized the importance of text analysis and knowledge processing technology based on large-scale resources as a way to overcome this problem.

The in-depth analysis of scientific and technical information that could address these concerns accounts for a considerable part of the entire R&D process, and most of it is carried out through highly skilled manual work.

As a result, the deep domain knowledge embedded in the enormous volume of specialized knowledge resources produced across a wide range of science and technology fields is not properly identified, and errors or irrationalities frequently arise in R&D decision making and execution, such as (1) inefficiency in setting the direction of national R&D policy, (2) patent disputes between companies and between countries, and (3) duplicated or outdated research.

Science and technology researchers in each field (corporate and public researchers, graduate students, professors, and so on) want to quickly acquire the latest high-quality in-depth knowledge being produced in their fields and reflect it in their research. However, they spend a great deal of time, before or during a research project, gaining confidence in the originality and justification of their hypotheses or ideas, which undermines the efficiency of R&D activity.

In addition, R&D decision makers and policy makers want to easily obtain reliable knowledge of key technical trends in the research fields that concern them. Their core issue is securing intuitive, practical trend knowledge that they can easily grasp using only the knowledge they already have, without additional work.

The problems with the manual acquisition of in-depth expertise in science and technology, which researchers, R&D decision makers, and policy makers have mainly relied on until now, are (1) excessive use of time and resources, (2) the difficulty of ensuring comprehensiveness, and (3) the difficulty of reusing, sharing, and disseminating individually acquired in-depth expertise.

There are various retrieval systems that extract in-depth knowledge from documents and search on the basis of that information. Three of them are briefly introduced here.

The first is PubMed from the US National Library of Medicine, which sought to differentiate itself from general search systems through features such as advanced search with a variety of options and the presentation of documents related to the result documents.

The second is MEDIE from the Tsujii Laboratory at the University of Tokyo, Japan, which supports advanced search using up to three query terms: a subject, a verb, and an object. Unlike a general search system that indexes and searches the target documents, this system applies advanced natural language processing techniques such as sentence splitting, syntactic parsing, and named entity recognition to all of its documents, extracts the resulting in-depth knowledge, and provides this information as content.

The last is LEGENDA from Japan's AIST (National Institute of Advanced Industrial Science and Technology), which determines co-occurrence information between entities in advance. When a search is made for one entity, the names of entities that frequently co-occur with it are listed in order of frequency, together with the sentences in which they co-occur.

These conventional retrieval systems apply advanced natural language processing to extract in-depth knowledge from documents. However, they do not depart significantly from information retrieval methods for general documents and present only fragmentary results, so these results alone do not support diverse analyses and ultimately do not allow users to select research topics or analyze research trends.

The present invention has been devised to solve at least some of the above problems, and one object thereof is to provide a method for providing in-depth knowledge based on an academic scientific knowledge memory for the efficient sharing and dissemination of academic science and technology knowledge.

Another object of the present invention, likewise devised to solve at least some of the above problems, is to provide an apparatus for providing in-depth knowledge based on an academic scientific knowledge memory for the efficient sharing and dissemination of academic science and technology knowledge.

To achieve the above object, the method for providing in-depth knowledge based on an academic scientific knowledge memory according to the present invention is

a knowledge providing method based on a knowledge memory in which multidimensional knowledge is stored in the form of triples (subject/event verb/object), the multidimensional knowledge consisting of relational knowledge representing the relationships between entities contained in a target document, structural knowledge representing the pragmatic roles of the sentences contained in the target document, and procedural knowledge covering the purpose, actions, and methods of the target document, the method preferably comprising:

retrieving from the knowledge memory, and presenting, triples that contain a query term entered for a search; and

retrieving and presenting documents related to a triple selected from among the presented triples.
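The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` class, the `KNOWLEDGE_MEMORY` list, the sample gene names, the document identifiers, and the exact-match rule are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, event verb, object) triple plus the documents it came from."""
    subject: str
    event_verb: str
    obj: str
    doc_ids: tuple


# Hypothetical sample data standing in for the knowledge memory.
KNOWLEDGE_MEMORY = [
    Triple("p53", "activate", "p21", ("doc1", "doc3")),
    Triple("p53", "inhibit", "MDM2", ("doc2",)),
    Triple("MDM2", "degrade", "p53", ("doc2", "doc4")),
    Triple("BRCA1", "repair", "DNA", ("doc5",)),
]


def search_triples(memory, query):
    """Step 1: return every triple whose subject, event verb, or object equals the query."""
    q = query.lower()
    return [
        t for t in memory
        if q in (t.subject.lower(), t.event_verb.lower(), t.obj.lower())
    ]


def documents_for(triple):
    """Step 2: return the documents related to the selected triple."""
    return list(triple.doc_ids)


hits = search_triples(KNOWLEDGE_MEMORY, "p53")
selected = hits[0]              # the user picks one of the presented triples
docs = documents_for(selected)
```

A real system would match terms more loosely (morphological variants, similar terms) and would index the triples rather than scanning a list; the sketch only shows the order of the two steps.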

Here, the step of retrieving and presenting the triples may comprise:

taking the query term as the subject, retrieving triples whose event verbs have a co-occurrence relation with it, and presenting those co-occurring event verbs; and

when one of the presented event verbs is selected, performing the search again with a search triple of the form "subject - selected event verb".

Preferably, the method further comprises presenting similar terms for the subject or object among the entities contained in the triples and, when a similar term is selected, performing the search again with a search triple containing the selected similar term.

Preferably, the method further comprises, each time the search triple is reconstructed, retrieving from the knowledge memory by instant search, and presenting, the documents related to the reconstructed search triple.
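The event-verb refinement described above can be illustrated as follows. This is a minimal sketch, assuming triples are plain tuples and that co-occurrence simply means "appears in a triple with the same subject"; the sample data and the function names `cooccurring_event_verbs` and `refine` are hypothetical.

```python
from collections import Counter

# Hypothetical (subject, event_verb, object) triples.
TRIPLES = [
    ("p53", "activate", "p21"),
    ("p53", "activate", "BAX"),
    ("p53", "inhibit", "MDM2"),
    ("MDM2", "degrade", "p53"),
]


def cooccurring_event_verbs(triples, subject):
    """Event verbs that appear together with the subject, most frequent first."""
    counts = Counter(verb for s, verb, _ in triples if s == subject)
    return [verb for verb, _ in counts.most_common()]


def refine(triples, subject, event_verb):
    """Re-run the search with the search triple (subject, selected event verb, any object)."""
    return [t for t in triples if t[0] == subject and t[1] == event_verb]


verbs = cooccurring_event_verbs(TRIPLES, "p53")   # presented to the user
refined = refine(TRIPLES, "p53", verbs[0])        # the user selects the first verb
```

Each call to `refine` corresponds to one reconstruction of the search triple, so an instant-search interface would re-query the related documents after every such call.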

Alternatively, the step of retrieving and presenting the triples may comprise:

retrieving from the knowledge memory triples that have the entered query term as their subject;

extracting from the retrieved triples the entities corresponding to the object (object entities); and

providing a network that represents each triple relation by a line connecting the subject and the object.

Preferably, the method further comprises expanding the network by retrieving from the knowledge memory triples that have said object as their subject.
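One way to picture the network construction and its expansion is a breadth-first walk over the triples, where each object reached becomes a subject in the next round. The sketch below assumes tuple triples and an edge-list output; the sample data, the `build_network` name, and the `expansions` parameter are illustrative assumptions rather than details from the patent.

```python
from collections import defaultdict

# Hypothetical (subject, event_verb, object) triples.
TRIPLES = [
    ("p53", "activate", "p21"),
    ("p53", "inhibit", "MDM2"),
    ("p21", "inhibit", "CDK2"),
    ("MDM2", "degrade", "p53"),
    ("BRCA1", "repair", "DNA"),
]


def build_network(triples, query, expansions=1):
    """Return edges (subject, event_verb, object) reachable from the query.

    Round 0 uses the query as subject; each further round treats the objects
    found in the previous round as new subjects.
    """
    by_subject = defaultdict(list)
    for s, verb, o in triples:
        by_subject[s].append((verb, o))

    edges = []
    seen = {query}
    frontier = [query]
    for _ in range(expansions + 1):
        next_frontier = []
        for node in frontier:
            for verb, o in by_subject.get(node, []):
                edges.append((node, verb, o))
                if o not in seen:          # expand each entity only once
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return edges


initial = build_network(TRIPLES, "p53", expansions=0)
expanded = build_network(TRIPLES, "p53", expansions=1)
```

The edge list can be handed to any graph drawing component; the lines between subject and object in the patent's network correspond to the entries of `edges`.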

As a further alternative, the step of retrieving and presenting the triples may comprise:

retrieving from the knowledge memory triples that have the entered query term as their subject; and

classifying the retrieved triples by entity and listing them according to the frequency of occurrence of each entity.
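The frequency-ordered listing can be sketched with a counter over the object entities. The sample triples and the function name `objects_by_frequency` are assumptions made for illustration; the patent does not specify how ties are ordered.

```python
from collections import Counter

# Hypothetical (subject, event_verb, object) triples.
TRIPLES = [
    ("p53", "activate", "p21"),
    ("p53", "induce", "p21"),
    ("p53", "inhibit", "MDM2"),
    ("p53", "bind", "p21"),
    ("MDM2", "degrade", "p53"),
]


def objects_by_frequency(triples, query):
    """Group triples with the query as subject by object entity, most frequent first."""
    hits = [t for t in triples if t[0] == query]
    counts = Counter(obj for _, _, obj in hits)
    return counts.most_common()


ranking = objects_by_frequency(TRIPLES, "p53")
```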

The method may also comprise:

retrieving from the knowledge memory triples that contain the entered query term;

extracting entities from the retrieved triples;

presenting a list of the extracted entities; and

reconstructing the query to include an entity selected from the entity list.

To achieve the other object of the present invention, the apparatus for providing in-depth knowledge based on an academic scientific knowledge memory comprises:

a knowledge memory that stores relational knowledge, structural knowledge, and procedural knowledge about documents; and

an in-depth knowledge provider that receives a query and retrieves from the knowledge memory, and presents, triples containing the entered query term and the documents related to those triples.

Here, the in-depth knowledge provider uses a Generalized Concordance Lists (GCL) query, which can search for a specific term, a group of terms, or a relation between terms.

In addition, the in-depth knowledge provider provides slide navigation, which lets the user move easily through continuously linked triple-based knowledge on a single screen with mouse clicks while dynamically viewing the document information for the corresponding triple.

In addition, the in-depth knowledge provider further provides dynamic triple analysis browsing: when the user enters a keyword and runs a search, the subject/event verb/object information associated with that keyword is presented as statistics in order of frequency (up to the top five), together with a chart.
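The statistics behind such a view could be computed roughly as follows. This is a sketch under assumptions: a keyword matches a triple when it equals any of the three slots, the chart itself is left out, and the sample data and the `triple_statistics` name are hypothetical.

```python
from collections import Counter

# Hypothetical (subject, event_verb, object) triples.
TRIPLES = [
    ("p53", "activate", "p21"),
    ("p53", "induce", "p21"),
    ("p53", "inhibit", "MDM2"),
    ("p53", "activate", "BAX"),
    ("p53", "activate", "PUMA"),
    ("p53", "activate", "NOXA"),
    ("p53", "induce", "GADD45"),
    ("MDM2", "degrade", "p53"),
]


def triple_statistics(triples, keyword, top_n=5):
    """Count subjects, event verbs, and objects in triples that contain the keyword.

    Returns, for each slot, at most top_n (value, count) pairs in descending
    order of frequency.
    """
    hits = [t for t in triples if keyword in t]
    slots = {"subject": 0, "event_verb": 1, "object": 2}
    return {
        name: Counter(t[i] for t in hits).most_common(top_n)
        for name, i in slots.items()
    }


stats = triple_statistics(TRIPLES, "p53")
```

The three lists in `stats` map directly onto three bar charts, one per slot, which is one plausible way to render the chart mentioned above.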

In addition, the in-depth knowledge provider further provides a dynamic table-based search, which displays on screen the lists of entities and events that are associated with each other on the basis of two entered keywords.

In addition, the in-depth knowledge provider further provides in-depth knowledge navigation, which navigates detailed element knowledge on the basis of knowledge triples displayed radially.

The method and apparatus for providing in-depth knowledge according to the present invention enable the automatic extraction and accumulation of academic expertise. To build the academic scientific knowledge memory effectively, they partially emulate, with advanced natural language processing and mining technology, the complex process by which a human analyzes the literature of a specific scientific field and acquires and digests knowledge.

By building an academic scientific knowledge memory in which flat bibliographic information is structured in multiple dimensions and the element knowledge contained in each unit of information can be identified and managed, the present invention allows users to use and share in-depth scientific knowledge by field in innovative ways.

FIG. 1 schematically illustrates the concept of multidimensional knowledge in the present invention.
FIG. 2 schematically illustrates the relationships among the types of multidimensional knowledge in the present invention.
FIG. 3 illustrates the configuration of an apparatus for providing in-depth knowledge according to the present invention.
FIG. 4 illustrates the configuration of the integrated language analysis module factory for identifying and extracting surface domain knowledge in the present invention.
FIG. 5 schematically illustrates the modeling process for procedural knowledge according to the present invention.
FIG. 6 illustrates a procedural knowledge extraction method according to the present invention.
FIG. 7 shows the initial search screen of the knowledge providing apparatus according to the present invention.
FIG. 8 shows an example of the slide navigation service screen in the present invention.
FIGS. 9 and 10 show examples of the in-depth knowledge navigation service screen in the present invention.
FIG. 11 shows an example of the dynamic triple analysis service screen in the present invention.
FIG. 12 shows an example of the dynamic table search service screen in the present invention.

Hereinafter, the configuration and operation of the present invention will be described in detail with reference to the accompanying drawings.

In information retrieval, an entity is defined as a thing or concept that generates information or is associated with information and that can be classified into a specific group, such as a person, organization, or location.

For example, entities belonging to the person group include "Eulji Mundeok", "Yi Sun-sin", and "Bill Gates"; entities belonging to the organization group include "Seoul National University" and the "Ministry of Education, Science and Technology"; and entities belonging to the location group include the "Manchurian Plain", "Seoul", "Daejeon", and "Guro-gu".

Ontology refers to the field and techniques that deal with human knowledge. In particular, an ontology as a computer-based knowledge representation is an explicit specification of a conceptualization: it describes the knowledge of a field explicitly and logically so that it can be processed by a computer, and it makes that knowledge shareable and reusable. In an ontology, a class provides a way to group resources that have the same properties and to express their common properties logically. The nature of a class can be expressed by specifying conditions on the attributes that the class has. An instance, meanwhile, is an individual that belongs to a concept.

An ontology is built by analyzing patterns and components in documents sampled from each field and modeling the ontology on the basis of the analysis results.

Multidimensional knowledge (multifaceted knowledge) as defined in the present invention divides the knowledge that can be extracted from academic information into three levels according to its depth. It refers to knowledge with a multi-layered structure comprising (1) relational knowledge, (2) structural knowledge, and (3) procedural knowledge.

First, relational knowledge identifies the relationships between entities present in a document. More specifically, it identifies the relationships between key science and technology entities, such as the technical terms and named entities that serve as the focal points of a document's content. It is comparatively easy to identify and extract.

Second, structural knowledge classifies the pragmatic role (discourse role) of the syntactically varied sentences in a document. More specifically, it is knowledge structured by analyzing the attributes of sentences and the relationships between sentences within a specific document.

For example, the abstract of an academic paper contains sentences that state the problem or field the paper addresses (goal), sentences that briefly describe the research results (result), and sentences that explain the research method (method). Once these sentence-level pragmatic roles have been classified, extracting various kinds of in-depth knowledge becomes much easier.

Third, procedural knowledge is derived by structuring the R&D activities described in academic papers and other specialized literature in a specific science and technology field, along with the purpose, methods, and verification procedures of academic research. More specifically, it serves to identify R&D activities through knowledge-based in-depth analysis by field, and it is structured knowledge about the purpose, methods, and verification procedures of academic research.

FIG. 1 schematically illustrates the concept of multidimensional knowledge in the present invention.

As FIG. 1 shows, relational knowledge, which is comparatively easy to recognize and extract and is currently the subject of active research worldwide, is called Surface Domain Knowledge (SDK). Structural knowledge and procedural knowledge, which are comparatively difficult to extract and require various underlying processing steps, are classified as Deep Domain Knowledge (DDK).

Dividing knowledge into levels in this way, according to depth and processing difficulty, allows a systematic approach to research on in-depth knowledge based on an academic scientific knowledge memory and has the advantage that the goals of each research stage can be set clearly.

FIG. 2 schematically shows the relationships among the types of multidimensional knowledge in the present invention. It presents a concrete example, from the biomedical field, of the knowledge about degenerative myelopathy that a user can obtain by using each type of knowledge.

For example, with relational knowledge, information on the causes and effects of degenerative myelopathy and on the genes or proteins related to the disease can easily be obtained.

Structural knowledge yields information on the methodologies used to identify the endogenous and exogenous factors of degenerative myelopathy. For example, a user can learn which approaches are applied to identify myelopathy (spondylopathy) and its endogenous/exogenous factors.

Procedural knowledge, in turn, offers the opportunity to obtain detailed knowledge such as treatment methods for myelopathy or experimental research procedures. For example, a user can obtain research material on cases in which a patient shows adverse reactions to tramadol, or information on the experimental procedure for studying spontaneous spondylopathy using birds.

As FIG. 2 shows, the deeper the knowledge, the greater the flexibility with which users can apply it and the more opportunities they have to acquire dynamic in-depth knowledge.

FIG. 3 shows the configuration of the knowledge providing apparatus according to the present invention. On one side, the knowledge providing apparatus 300 shown in FIG. 3 takes in large-scale academic information, generates multidimensional knowledge, and stores it in the knowledge memory; on the other side, in response to a query, it retrieves from the knowledge memory and provides the multidimensional knowledge described with reference to FIGS. 1 and 2.

To generate multidimensional knowledge from the input large-scale academic information, the knowledge providing apparatus 300 performs natural language processing on each input target document, uses the various features obtained from that processing to generate relational, structural, and procedural knowledge from the document, and stores this knowledge in the knowledge memory.

In addition, to provide a search service over the multidimensional knowledge, the knowledge providing apparatus 300 offers query auto-completion and a variety of search and presentation methods.

Referring to FIG. 3, the knowledge providing apparatus 300 according to the present invention includes a multidimensional knowledge generator 302, a knowledge memory 304, and an in-depth knowledge provider 306.

The knowledge memory 304, also called the virtual science brain, may consist of a relational knowledge memory 304a, a structural knowledge memory 304b, and a procedural knowledge memory 304c, and provides in-depth knowledge corresponding to a query.

The multidimensional knowledge generator 302 extracts from academic information the multidimensional knowledge it can yield, consisting of relational knowledge, structural knowledge, and procedural knowledge. The relational, structural, and procedural knowledge extracted by the multidimensional knowledge generator 302 is stored in the relational knowledge memory 304a, the structural knowledge memory 304b, and the procedural knowledge memory 304c, respectively.
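The three-part memory can be pictured as a small container with one store per knowledge type. The sketch below is an assumption-laden illustration: the `KnowledgeMemory` class, its `store` and `search` methods, the predicate names such as `has_role`, and the sample triples are all invented for this example, and extraction itself (the work of generator 302) is not shown.

```python
from dataclasses import dataclass, field

KINDS = ("relational", "structural", "procedural")


@dataclass
class KnowledgeMemory:
    """Toy stand-in for knowledge memory 304 with its three sub-memories."""
    relational: list = field(default_factory=list)   # corresponds to 304a
    structural: list = field(default_factory=list)   # corresponds to 304b
    procedural: list = field(default_factory=list)   # corresponds to 304c

    def store(self, kind, doc_id, triple):
        """Append a (doc_id, triple) record to the sub-memory named by kind."""
        if kind not in KINDS:
            raise ValueError(f"unknown knowledge type: {kind}")
        getattr(self, kind).append((doc_id, triple))

    def search(self, query):
        """Return, per knowledge type, the records whose triple contains the query."""
        return {
            kind: [(d, t) for d, t in getattr(self, kind) if query in t]
            for kind in KINDS
        }


memory = KnowledgeMemory()
# Hypothetical output of the multidimensional knowledge generator for one document.
memory.store("relational", "doc1", ("p53", "activate", "p21"))
memory.store("structural", "doc1", ("sentence 3", "has_role", "method"))
memory.store("procedural", "doc1", ("p53", "measured_by", "western blot"))

found = memory.search("p53")
```

Keeping the document identifier next to each triple is what later allows the documents related to a selected triple to be retrieved.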

The in-depth knowledge provider 306 extracts in-depth knowledge for a query by consulting the knowledge memory 304. A query auto-completion function may be provided to help with query entry, and various interfaces, including a slide-style interface, may be provided to present the extracted in-depth knowledge.

The knowledge providing apparatus 300 according to the present invention shown in FIG. 3 combines a large-scale academic scientific knowledge memory, in which surface domain knowledge (SDK) and deep domain knowledge (DDK) are fused into a single form, with in-depth knowledge providing technology that makes use of it.

To build the academic scientific knowledge memory effectively, the knowledge providing apparatus 300 shown in FIG. 3 emulates, with advanced natural language processing and mining technology, the complex process by which a human analyzes the literature of a specific scientific field and acquires and digests knowledge, and thereby enables the automatic extraction and accumulation of digital expertise.

By building an academic scientific knowledge memory in which flat bibliographic information is structured in this way and the element knowledge contained in each unit of information can be identified and managed, users can use and share in-depth scientific knowledge by field in innovative ways.

The present invention gives scientific and technical knowledge a practical definition and divides it into levels according to depth and extraction difficulty. This secures the feasibility of the research results and, by differentiating the work from existing related research, creates the opportunity to produce high-quality, internationally competitive research results.

The multidimensional knowledge generation unit 302 extracts multidimensional knowledge from academic information using term dictionaries, ontologies, a term recognition engine based on term dictionaries and ontologies, a machine-learning-based term learning engine, the Enju parsing system, and a deep language analysis system based on support vector machines (SVMs) and maximum entropy. In extracting multidimensional knowledge, a framework for sharing and jointly using the underlying technologies (engines), databases, and language resources is very important.

In the multidimensional knowledge generation unit 302, a kernel-based model is implemented in detail to enable deep language analysis based on support vector machines (SVMs) and maximum entropy.

In an embodiment of the present invention, in order to adopt and use MEDIE (Intelligent Search Engine for MEDLINE), developed at the University of Tokyo, Japan, the ontology-based terminology recognition engine for the biology domain and the Enju parsing system were analyzed in detail.

The terminology recognition engine developed to build the MEDIE database is based on biology-domain ontologies and a set of terminology dictionaries. It identifies protein names, gene names, disease names, and the like that appear in the literature and links them to descriptors in MeSH or UMLS.

The Enju parsing system, which is needed to estimate the relations between technical terms in a document, is a high-speed language analysis system based on HPSG (Head-driven Phrase Structure Grammar). It identifies the various grammatical dependencies in a sentence and expresses them as a predicate-argument structure, which has the advantage of yielding a variety of features that are important for relation extraction.

The basic query structure for accessing the MEDIE database is called a Generalized Concordance Lists (GCL) query. GCL has a structure similar to SQL and includes a variety of similarity comparison operators that make it easy to search for a specific term, a group of terms, or even the relations between terms.

The multidimensional knowledge generation unit 302 performs a surface knowledge extraction function and a deep knowledge extraction function.

To extract surface domain knowledge (SDK), (a) a language resource utilization technique for surface knowledge extraction and (b) a language processing and machine learning application technique for SDK identification are used.

First, the language resource utilization technique for surface knowledge extraction uses various corpora and dictionaries (MeSH, UMLS, OntoNotes 2.0, MUC, etc.) that have been collected, analyzed, and structured.

The language processing and machine learning application technique for surface knowledge identification uses a general-purpose interface that exposes, through a single API, the various language processing platforms in active use worldwide for deep language analysis, together with an integrated language analysis module factory built on that interface.

The factory currently includes three representative parsing systems: the Charniak Parser, the Stanford Parser, and the Enju Parser. Finally, the kernel-based model, known to perform best among models for extracting relational knowledge, is implemented in detail to enable deep language analysis based on support vector machines (SVMs) and maximum entropy.

FIG. 4 shows the configuration of the integrated language analysis module factory for identifying and extracting surface domain knowledge in the present invention.

Referring to FIG. 4, the integrated language analysis module factory uses a JNI (Java Native Interface) interface on the JVM (Java Virtual Machine) so that it can be used widely, independent of development environment and programming language, and it runs with the three parsing modules that are currently the most widely used worldwide.

Table 1 shows the functions of the parsers used in the integrated language analysis module factory.

Parser                  Parse tree   Dependency tree   Predicate-argument structure
Charniak Parser (US)    O            -                 -
Stanford Parser (US)    O            O                 -
Enju Parser (Japan)     O            -                 O
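The idea of reaching several parsers through one API can be sketched as a small registry. This is a minimal illustration only, not the factory of the invention: the class names `ParserSpec` and `ParserFactory`, the `_stub` adapter, and the capability sets assigned to each parser are all assumptions made for the example, and a real adapter would call the native parser (for instance through JNI) instead of merely splitting the sentence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Output kinds corresponding to the columns of Table 1.
PARSE_TREE = "parse_tree"
DEPENDENCY_TREE = "dependency_tree"
PRED_ARG = "predicate_argument"


@dataclass(frozen=True)
class ParserSpec:
    name: str
    outputs: FrozenSet[str]
    run: Callable[[str], dict]  # hypothetical adapter around the real parser


class ParserFactory:
    """Registry that hides parser-specific details behind one call."""

    def __init__(self) -> None:
        self._specs: Dict[str, ParserSpec] = {}

    def register(self, spec: ParserSpec) -> None:
        self._specs[spec.name] = spec

    def supporting(self, output: str) -> List[str]:
        """Names of registered parsers that can produce the given output."""
        return sorted(n for n, s in self._specs.items() if output in s.outputs)

    def parse(self, name: str, sentence: str) -> dict:
        return self._specs[name].run(sentence)


def _stub(name: str) -> Callable[[str], dict]:
    # Placeholder adapter: only tokenizes on whitespace.
    return lambda sentence: {"parser": name, "tokens": sentence.split()}


factory = ParserFactory()
# Capability sets below are illustrative assumptions.
factory.register(ParserSpec("charniak", frozenset({PARSE_TREE}), _stub("charniak")))
factory.register(
    ParserSpec("stanford", frozenset({PARSE_TREE, DEPENDENCY_TREE}), _stub("stanford"))
)
factory.register(ParserSpec("enju", frozenset({PARSE_TREE, PRED_ARG}), _stub("enju")))
```

With such a registry, calling code asks for a capability (`factory.supporting(PRED_ARG)`) rather than hard-coding a parser, which is one way to keep downstream modules independent of any single parsing engine.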

Information extraction is recognized as a core area of natural language processing and text mining. Its ultimate goal is to identify the important, related information present in text so that structured, tabular data can be extracted from unstructured text data.

The component technologies of information extraction include (1) named-entity recognition, (2) relation extraction, and (3) co-reference resolution.

Various relation extraction techniques based on supervised learning have been introduced to improve relation extraction performance. They can be classified into three types: (1) rule-based methods, (2) feature-based methods, and (3) kernel-based methods.

Among these, the kernel-based method, developed relatively recently, is drawing attention: a kernel function specialized for relation extraction is newly constructed and then applied to a support vector machine (SVM).

A characteristic of kernel-based methods in relation extraction is that performance is very high provided one constructs a kernel that best represents the relation between two entities in a sentence and most effectively computes the similarity between two sentences that each contain such a relation.
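To make the notion of a kernel between two relation-bearing sentences concrete, the following sketch computes a cosine-normalized bag-of-words kernel over the words lying between the two entities. It is a deliberately simple stand-in, not the kernel used in the invention; the function names and the choice of the between-entity context are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

# A relation instance: (tokens of the sentence, first entity, second entity).
RelationInstance = Tuple[List[str], str, str]


def between(tokens: List[str], e1: str, e2: str) -> List[str]:
    """Tokens strictly between the first occurrences of the two entities."""
    i, j = sorted((tokens.index(e1), tokens.index(e2)))
    return tokens[i + 1 : j]


def context_kernel(a: RelationInstance, b: RelationInstance) -> float:
    """Cosine-normalized bag-of-words kernel over the between-entity context."""
    ca, cb = Counter(between(*a)), Counter(between(*b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(
        sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


x = (["diabetes", "is", "associated", "with", "hypertension"], "diabetes", "hypertension")
y = (["obesity", "is", "associated", "with", "stroke"], "obesity", "stroke")
z = (["aspirin", "treats", "headache"], "aspirin", "headache")
```

Here `context_kernel(x, y)` is high because both sentences express the relation with the same words, while `context_kernel(x, z)` is zero. A matrix of such pairwise values could be handed to an SVM that accepts a precomputed kernel; richer kernels would compare parse trees or dependency paths instead of bags of words.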

In the multidimensional knowledge generation unit 302 shown in FIG. 3, the techniques for deep knowledge extraction are broadly divided into (1) SDA (Structured Digital Abstracts) based technology, (2) application and modification of a large-scale activity ontology, and (3) domain-specific models for extracting and utilizing deep expert knowledge.

First, in the SDA-based technology, models such as support vector machines (SVM) and conditional random fields (CRF) are used to automatically extract and classify the key sentences that express the purpose, method, and so on of a study, in order to extract structural knowledge from the abstracts of academic literature.

In the application and modification of a large-scale activity ontology, procedural knowledge in academic literature is modeled in detail as relations between multiple processes in which targets, actions, methods, and the like can be expressed. This part also uses an evaluation set for procedural knowledge extraction, a procedural knowledge tagging support tool for building the evaluation set, and a machine-learning-based technique for automatically extracting procedural knowledge. In particular, the automatic extraction of procedural knowledge from academic information was modeled in detail as a process of seven steps in total.

Research on extracting information such as technical terms, entities, and concepts using dictionaries, thesauri, and ontologies has progressed steadily up to the present, and research on extracting information such as relations between entities or events, for example relations between protein entities, has been under way in recent years.

However, research that goes beyond such fragmentary knowledge to extract structured, procedural knowledge is extremely rare. There are studies that extract procedural information from web documents such as eHow and wikiHow, but they amount to no more than extracting an ontology from sets of sentences that people have already structured and ordered.

Procedural knowledge is a collection of sequential or structural actions, or a collection of unit procedural knowledge, for achieving a specific purpose. That is, the structure of a document under analysis can be expressed as a purpose sentence stating why the document was written and a set of solution sentences for achieving that purpose, and the solution sentences can in turn be expressed as a collection of sentences containing concrete solution procedures.

The unit procedures within a solution can be assumed to stand in sequential, parallel, or independent relations to one another, and each unit procedure can in turn be assumed to consist of a triple of the target, the method, and the action of the procedure.

The target is the object on which the experiment in a unit procedure is performed, the method is the specific experimental method applied to the target, and the action indicates how the method was applied to the target.

In an embodiment of the present invention, the target is limited to disease names, symptoms, effects, numerical values, and the like, and the method is defined as a treatment method, a surgical method, a medication method, and the like. The action is the predicate that connects the target and the method, so that a unit procedure can be read as "the 'action' was performed by applying the 'method' to the 'target'". Generic predicates such as 'have' and 'be' are not considered.

This procedural knowledge model is based on an analysis of the content and characteristics of the documents carried out together with medical professionals, who stand to benefit most from the results of analyzing medical documents.

FIG. 5 is a schematic representation of procedural knowledge.

Referring to FIG. 5, procedural knowledge can be expressed as relations between multiple processes, and each process includes a "Target", an "Action", and a "Method".

A single process is characterized by these three elements, "Target", "Action", and "Method", and is called a TAM after their initial letters.

Accordingly, procedural knowledge can be defined as a triple with the structure Process(TAM)-Relation-Process(TAM).

The relations that can be assigned between two processes are classified into seven types, defined in Table 2.

Relation                  Description
Temporal Relation         The two TAMs are in a sequential relation
Parallel Relation         The two TAMs are in a parallel relation
Causal Relation           The two TAMs are cause and effect
Comparable Relation       The two TAMs are compared with each other
Explanatory Relation      The second TAM explains the first TAM
Targetable Relation       The second TAM corresponds to the Target item of the first TAM
Methodological Relation   The second TAM corresponds to the Method item of the first TAM
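The Process(TAM)-Relation-Process(TAM) structure and the seven relation types can be written down as a small data model. The sketch below is only one possible encoding; the class names `TAM`, `Relation`, and `ProceduralKnowledge` and the sample values are assumptions chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The seven relation types between two processes (Table 2)."""
    TEMPORAL = "Temporal Relation"
    PARALLEL = "Parallel Relation"
    CAUSAL = "Causal Relation"
    COMPARABLE = "Comparable Relation"
    EXPLANATORY = "Explanatory Relation"
    TARGETABLE = "Targetable Relation"
    METHODOLOGICAL = "Methodological Relation"


@dataclass(frozen=True)
class TAM:
    """A single process: Target, Action, Method."""
    target: str
    action: str
    method: str


@dataclass(frozen=True)
class ProceduralKnowledge:
    """A Process(TAM)-Relation-Process(TAM) triple."""
    first: TAM
    relation: Relation
    second: TAM


# Made-up example values, for illustration only.
diagnose = TAM(target="diabetes", action="diagnose", method="blood glucose test")
treat = TAM(target="diabetes", action="treat", method="insulin therapy")
pk = ProceduralKnowledge(first=diagnose, relation=Relation.TEMPORAL, second=treat)
```

Making the records immutable (`frozen=True`) lets them be used as dictionary keys or set members, which is convenient when the same TAM has to be linked to several others.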

FIG. 6 schematically shows the procedural knowledge extraction process according to the present invention.

Referring to FIG. 6, procedural knowledge is extracted from a tagged abstract through (1) extraction of candidate sentences containing a target (T), an action (A), and a method (M), (2) TAM extraction, (3) TAM normalization, (4) TAM linking, and (5) extraction of relations between processes (TAMs).

The extracted procedural knowledge is stored in the knowledge memory 304.

Various natural language processing techniques can be applied to the abstract of an academic paper, including part-of-speech tagging, syntactic parsing, predicate-argument structure analysis, and terminology extraction.

A predicate-argument structure expresses the meaningful relations among the words in a sentence in terms of predicates and their arguments.

Terminology extraction produces tags on single-word or multi-word terms in a document, based on well-known biomedical ontologies such as UMLS (Unified Medical Language System), UniProt, and GO (Gene Ontology). Because the terms in the target documents are technical terms of the field, this information is essential for efficient deep knowledge extraction.

The tagged abstract is analyzed to extract candidate sentences containing a target (T), an action (A), and a method (M). Because a unit procedure consists of a triple of Target (target disease), Action, and Method (method applied), extracting candidate sentences means extracting the sentences that contain such a triple.

For example, in biomedical literature the "Target" is the research problem a given study sets out to solve, such as a disease name, symptom, or effect, and the "Method" is the method used to solve it, such as a treatment, surgical, or medication method. Finally, the "Action" serves as the predicate that connects the "Target" and the "Method".

The extraction of the semantic entities corresponding to Target, Action, and Method from a sentence is described below.

To extract Target, Action, and Method, the basic elements of unit procedure extraction, features of the word itself, context features, predicate-argument structure features, and terminology features can be used.

● Word features: the word itself, its base form, its part-of-speech tag, its part-of-speech class (verb, noun, symbol, etc.), and whether the word starts with a capital letter or is entirely capitalized

● Context features: the N preceding and following words and their part-of-speech tags

● Predicate-argument structure features: the predicate and the phrases that serve as its arguments

● Terminology features: UMLS/UniProt/GO ontology tagging information

The available features are extracted from the tagged abstracts, and a CRF model is trained on them. The input to the CRF is the set of features above, and the output is the unit procedure element (Target, Action, or Method).
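The word and context features listed above can be illustrated with a per-token feature function of the kind commonly fed to a CRF. This is a minimal sketch under stated assumptions: the function name `token_features`, the `(word, lemma, POS)` token format, and the use of the first letter of a Penn-style tag as the coarse part-of-speech class are all choices made for the example. Predicate-argument and ontology features are omitted because they would require a parser and a term tagger.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, part-of-speech tag)


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, object]:
    """Word-level and context features for the token at position i."""
    word, lemma, pos = sent[i]
    feats: Dict[str, object] = {
        "word": word,
        "lemma": lemma,
        "pos": pos,
        "pos_class": pos[:1],           # assumed coarse class, e.g. N or V
        "init_cap": word[:1].isupper(),  # starts with a capital letter
        "all_caps": word.isupper(),      # entirely capitalized
    }
    # Context features: words and tags within +/- window positions.
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(sent):
            feats[f"word[{off:+d}]"] = sent[j][0]
            feats[f"pos[{off:+d}]"] = sent[j][2]
    return feats


sentence = [
    ("Insulin", "insulin", "NN"),
    ("treats", "treat", "VBZ"),
    ("diabetes", "diabetes", "NN"),
]
features = [token_features(sentence, k) for k in range(len(sentence))]
```

Each sentence becomes a sequence of feature dictionaries, paired during training with a label sequence over Target, Action, Method, and a label for all other tokens.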

As described above, the knowledge providing apparatus 300 according to the present invention offers pointers toward deep knowledge services in a specific field, and thus gives users of an integrated platform for scientific and technological knowledge discovery a basis on which to refine the services they have in mind.

The deep knowledge providing unit 306 retrieves the deep knowledge corresponding to an input query from the knowledge memory 304, processes it to suit various presentation formats, and outputs it.

The present invention provides various search methods, including (1) slide navigation, (2) deep knowledge navigation, (3) dynamic triple analysis, and (4) dynamic table-based search. Every service is built on the set of semantic triples extracted from the MEDIE database, and various forms of deep relational knowledge can be provided.

The search methods proposed in the present invention offer not only simple semantic triple search but also a variety of analysis results, enabling users to discover and create new knowledge.

FIG. 7 shows the initial search screen of the knowledge providing apparatus according to the present invention.

The knowledge providing apparatus 300 according to the present invention is a document-based search system in which, as in a general search system, a search starts from a single query. As the query is entered, candidate queries can be suggested by an auto-complete function, as shown in FIG. 7.

(1) Slide navigation

Slide navigation lets the user move through chains of linked triple-based knowledge on a single screen with mouse clicks while dynamically viewing detailed bibliographic information for the current triple. Starting from a specific entity, it is easy to expand along the relations between that entity and the entities associated with it, and then to expand again around the object entity.

Another feature is automatic query generation. For the query first entered, candidate query terms that can co-occur with it are presented in slide navigation form, so the user can easily expand the query into a full triple. As with Google's instant search, the results can be presented at the same moment the query is reconstructed.

FIG. 8 shows an example of the slide navigation search screen in the present invention.

As in a general search system, the search results show ranked documents for the query (e.g., 'diabete'). There are two differences: six types of entity information are each highlighted in their own color, and the original text or other services can be accessed from the results.

The search method using slide navigation according to the present invention is as follows.

First, triples containing the input query are retrieved from the knowledge memory.

For each of the entities that make up a retrieved triple, namely its subject, event verb, and object, a similar word is selected through the navigation function and the triple is reconstructed. A navigation box is provided for each entity, and each box has a slide bar for selecting similar words. Navigation boxes can be extended to cover entities beyond those in the triple.

Documents containing the reconstructed triple are retrieved, listed, and presented.
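The three steps of this search method (retrieve triples containing the query, reconstruct a triple by changing one element, list the documents for the reconstructed triple) can be sketched with an in-memory index. Everything below is a toy assumption: the `MEMORY` dictionary, its contents, and the function names are invented for illustration and say nothing about how the knowledge memory is actually stored.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, event verb, object)

# Toy knowledge memory: triple -> ids of documents containing it (made-up data).
MEMORY: Dict[Triple, Set[str]] = {
    ("diabete", "associate", "hypertension"): {"doc1", "doc2"},
    ("diabete", "cause", "neuropathy"): {"doc3"},
    ("obesity", "associate", "hypertension"): {"doc2", "doc4"},
}


def triples_containing(query: str) -> List[Triple]:
    """Step 1: retrieve triples in which the query appears in any slot."""
    return sorted(t for t in MEMORY if query in t)


def reconstruct(triple: Triple, slot: int, choice: str) -> Triple:
    """Step 2: replace one slot (0=subject, 1=verb, 2=object) with a chosen word."""
    parts = list(triple)
    parts[slot] = choice
    return (parts[0], parts[1], parts[2])


def documents_for(triple: Triple) -> List[str]:
    """Step 3: list the documents that contain the reconstructed triple."""
    return sorted(MEMORY.get(triple, set()))


found = triples_containing("diabete")
changed = reconstruct(found[0], 0, "obesity")
docs = documents_for(changed)
```

In an interface, `reconstruct` would be driven by the slide bar of a navigation box, and `documents_for` would be re-run on every change to give the instant-search behavior.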

The interface screen 900 for slide navigation search shown in FIG. 8 includes a query input window 902, a navigation window 904, and a list window 906. The query input window 902 displays the query entered by the user.

The navigation window 904 displays navigation boxes 904a to 904d for expanding each element of the triple that contains the user's query. Each navigation box includes a slide bar for moving among similar words.

The navigation boxes 904a to 904c correspond respectively to the subject, event verb, and object of the triple 'diabete (subject) - associate (event verb) - hypertension (object)', which contains the query 'diabete'. At the start of a search, a triple having the user's query as its subject can be displayed. As another example, if the user enters two query terms, a triple containing both can be displayed. In other words, the automatic query generation function supplies candidate query terms that can co-occur with the query the user entered.

This automatic query generation function is based on multidimensional knowledge, structural knowledge in particular. That is, the triple-structured multidimensional knowledge (structural knowledge in particular) that contains the user's query is retrieved, and the retrieved triples are presented through the automatic query generation function.

Each navigation box 904a to 904d has a slide bar for navigating among similar words. For example, sliding the bar up or down displays, one after another and in alphabetical order, words whose meaning is similar to 'diabete'.

The structural knowledge for the search triple formed by the words selected in the navigation boxes 904a to 904d is retrieved by the instant search function, and a list of the documents corresponding to the retrieved structural knowledge is displayed in the list window 906.

Each entry in the document list includes the title and a sentence expressing the structural knowledge extracted from that document. This sentence represents a unit process, that is, structural knowledge with the TAM triple structure obtained by analyzing the document. In the sentence, the words corresponding to Target, Action, and Method are displayed in different colors.

Selecting a title in the document list opens the original text directly.

The navigation boxes 904a to 904d can be extended to include words other than the subject, predicate, and object that make up the triple. The navigation box 904d in FIG. 8 is an example: FIG. 8 shows a search for documents that have the triple "diabete-associate-hypertension" and also contain the word 'treat'.

By operating the horizontal slider at the bottom of the navigation window 904, the user can add navigation boxes or bring the preceding and following navigation boxes into view.

Selecting an entity in a navigation box pops up the deep knowledge network browsing screen described below.

(2) Deep knowledge network browsing

Deep knowledge network browsing provides a way to explore a single entity in the form of a network.

Knowledge triples can naturally be drawn radially. By building a system and service that navigates the detailed element knowledge on this basis, the user can find an entity by search (both keyword-based and identifier-based search are provided) and then continue navigating while checking the relations between entities on the screen.

To navigate a large set of knowledge triples efficiently, deep knowledge navigation includes a function for adjusting how many of the other entities and relations associated with a given entity are displayed at a given position.

In the present invention, the search method using deep knowledge navigation is as follows.

First, triples having the input query as their subject are retrieved from the knowledge memory.

The entity corresponding to the object of each retrieved triple (the object entity) is extracted.

A network is provided that expresses the triple relation by a line connecting the input query and the object entity. The line connecting the query and the object represents the predicate that relates subject and object, and the predicate can be read by placing the mouse over the line.

The network can be extended to include sub-networks that have an object entity as their subject.

FIGS. 9 and 10 show examples of the deep knowledge navigation service screen in the present invention.

Deep knowledge navigation is a service that explores a single entity as a network: when an entity is selected on the search result screen, deep knowledge network browsing starts from that entity. The network can be expanded up to three levels, and a zoom function allows the results to be examined closely. FIG. 9 shows a one-level search network and FIG. 10 a two-level search network.
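Expanding a network outward from a selected entity, level by level, amounts to a breadth-first traversal over triples in which each object becomes the subject of the next level. The sketch below illustrates this with a depth limit; the `TRIPLES` list contains made-up values and the function name `build_network` is an assumption for the example.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]     # (subject, predicate, object)
Edge = Tuple[str, str, str, int]  # triple plus the level at which it appears

# Made-up triples, for illustration only.
TRIPLES: List[Triple] = [
    ("diabetes", "associate", "hypertension"),
    ("diabetes", "cause", "neuropathy"),
    ("hypertension", "increase", "stroke"),
    ("stroke", "cause", "paralysis"),
    ("paralysis", "require", "rehabilitation"),
]


def build_network(root: str, max_depth: int = 3) -> List[Edge]:
    """Collect edges reachable from root, expanding at most max_depth levels."""
    edges: List[Edge] = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the deepest allowed level
        for s, p, o in TRIPLES:
            if s == node:
                edges.append((s, p, o, depth + 1))
                if o not in seen:  # avoid expanding the same entity twice
                    seen.add(o)
                    queue.append((o, depth + 1))
    return edges


level1 = build_network("diabetes", max_depth=1)
full = build_network("diabetes")
```

With `max_depth=1` only the entities directly linked to the root are returned, which corresponds to a one-level network; raising the limit adds the sub-networks rooted at each object entity.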

Referring to FIG. 9, a network for the entity UMLS:C0011849 (database:number) is shown. The entity UMLS:C0011849 is diabetes. This network may be for an entity selected in the slide navigation window 902 of FIG. 8. For example, while 'diabetes' is displayed in the slide navigation box 904a of FIG. 8, the slide navigation box 904a may have been selected by a double-click of the mouse.

The selected entity (diabetes, 1002) is displayed at the center of the network, and the associated entities are displayed radially around it. These associated entities 1004 may be the objects of triples that have the selected entity 1002 as their subject. Lines showing the connections between the selected entity (diabetes) and its associated entities are displayed as well. Such a line may be "associate", representing the relationship between the selected entity (diabetes) and an associated entity (hypertension). That is, the selected entity (diabetes), the associated entity (hypertension), and the line connecting them (associate) together express the triple "diabetes associate hypertension". When the user places the mouse over a line, the relationship represented by the line, that is, the predicate of the triple, is displayed.

The category 1006 shown at the upper left of FIG. 9 indicates that the attributes of the entities in the network are displayed by color. Meanwhile, the network control box 1008 shown at the upper right of FIG. 8 lets the user adjust how many of the other entities and relations associated with a particular entity are output at a given position, so that navigating a wide range of knowledge triples stays efficient.

In addition, when an entity (node) is selected, its synonyms are shown in order of frequency, and when a relation (line) is selected, the possible relations between the two entities and the sentences that support them can be checked. The original text can also be accessed. Every entity carries ID information from well-known external databases, and because this information appears in the results, those databases are also easy to access.

Referring to FIG. 10, together with the network of entities associated with the selected entity 1002 (the one-level network), the network of other entities 1010 associated with an associated entity 1004 (the two-level network) is displayed.

(3) Dynamic triple analysis

Dynamic triple analysis is a service for checking the entities or relations that co-occur with a query word. When one entity is entered as the query word, the other entities and relations that co-occur with it are analyzed and presented in order of frequency. When two query words are entered, the other entities or relation verbs that co-occur with both query words are presented in order of frequency.

In the present invention, the search method for dynamic triple analysis is as follows.

First, triples that have the input query word as their subject are retrieved from the knowledge memory.

The retrieved triples are classified by entity and listed according to each entity's frequency of occurrence. The frequency of occurrence and the share of each entity are listed, and they may be presented as a chart or as a table.
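The frequency and share listing described above can be sketched with a simple counter. This is only an illustration under stated assumptions: triples are plain `(subject, verb, object)` tuples, the function name `analyze_triples` and its parameters are hypothetical, and the sample data is invented. The share is computed here as an element's count divided by the number of triples whose subject is the query word, which is one plausible reading of the text.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, event verb, object)


def analyze_triples(query: str, triples: List[Triple],
                    field: str = "object",
                    top_n: int = 5) -> List[Tuple[str, int, float]]:
    """Rank the verbs or objects of triples whose subject is the query word.

    Returns (value, frequency, share) rows, most frequent first.
    """
    index = {"verb": 1, "object": 2}[field]
    # Step 1: retrieve triples that have the query word as subject.
    matches = [t for t in triples if t[0] == query]
    # Step 2: classify by verb or object and count occurrences.
    counts = Counter(t[index] for t in matches)
    total = sum(counts.values())
    # Step 3: list frequency and share for the most frequent values.
    return [(value, n, n / total) for value, n in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = [
        ("diabetes", "associate", "hypertension"),
        ("diabetes", "induce", "hypertension"),
        ("diabetes", "associate", "obesity"),
        ("insulin", "treat", "diabetes"),
    ]
    for value, freq, share in analyze_triples("diabetes", sample, "object"):
        print(f"{value}\t{freq}\t{share:.1%}")
```

The returned rows map directly onto a table, and the same rows can be fed to any charting library to draw a pie or bar chart.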

FIG. 11 shows an example of the dynamic triple analysis service screen according to the present invention.

Referring to FIG. 11, when the user enters a desired query word ('diabete') and runs a search, the subject/event verb/object information associated with the query word is presented in order of frequency (up to the top five) as statistical results together with a chart.

The upper part of FIG. 11 shows, as a pie chart and a table respectively, the frequency and share of the five most frequent verbs associated with the query word 'diabete': induce, associate, inhibit, treat, and affect.

Meanwhile, the lower part of FIG. 11 shows, as a pie chart and a table respectively, the frequency and share of the five most frequent objects associated with the query word 'diabete': insulin, obesity, hypertension, atherosclerosis, and cardiovascular disease.

Here, various chart types can be specified according to the user's preference. For example, overall statistical information can be obtained on all the drugs or therapies that can treat a particular disease. The embodiment of the present invention shows examples of the pie type and the bar type. Korean equivalents of the relation verbs can also be checked.

(4) Dynamic table search

Dynamic table search is a search method that analyzes the entities or event verbs commonly related to two input query words and re-searches the semantic triples containing these common factors, so that new surrounding relationships can be discovered.

In the present invention, the dynamic table search method is as follows.

First, triples that include the input query word are retrieved from the knowledge memory.

Entities are extracted from the retrieved triples.

Lists of the extracted entities are presented. Each list contains two or more entities. These lists present the user with the possible combinations of entities related to the input query word, and by reconstructing the query word with reference to the lists, browsing can be extended to new relationships that cannot be found through network browsing or dynamic triple analysis.

The query word is reconstructed to include an entity selected from the entity list.
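The dynamic table search steps can be sketched as follows for the two-keyword case. This is a hedged illustration only: triples are plain `(subject, verb, object)` tuples, and the helper names `related_entities`, `related_verbs`, `common_lists`, and `reconstruct_query` are hypothetical. The specification does not say how common factors are computed, so a simple intersection of the entities and event verbs found around each keyword is assumed here.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, event verb, object)


def related_entities(query: str, triples: Sequence[Triple]) -> List[str]:
    """Entities that appear in a triple together with the query word."""
    found: List[str] = []
    for subject, _verb, obj in triples:
        if query in (subject, obj):
            other = obj if subject == query else subject
            if other != query and other not in found:
                found.append(other)
    return found


def related_verbs(query: str, triples: Sequence[Triple]) -> List[str]:
    """Event verbs of the triples that include the query word."""
    found: List[str] = []
    for subject, verb, obj in triples:
        if query in (subject, obj) and verb not in found:
            found.append(verb)
    return found


def common_lists(q1: str, q2: str,
                 triples: Sequence[Triple]) -> Tuple[List[str], List[str]]:
    """Entity list and event list shared by both keywords (assumed: intersection)."""
    e2, v2 = related_entities(q2, triples), related_verbs(q2, triples)
    entities = [e for e in related_entities(q1, triples) if e in e2]
    verbs = [v for v in related_verbs(q1, triples) if v in v2]
    return entities, verbs


def reconstruct_query(queries: Sequence[str], selected: str) -> List[str]:
    """New query made of the original keywords plus the selected entity."""
    return list(queries) + [selected]


if __name__ == "__main__":
    sample = [
        ("diabetes", "associate", "hypertension"),
        ("obesity", "induce", "hypertension"),
        ("diabetes", "induce", "insulin resistance"),
        ("insulin", "treat", "diabetes"),
        ("obesity", "associate", "diabetes"),
    ]
    entities, verbs = common_lists("diabetes", "obesity", sample)
    print(entities, verbs)
    print(reconstruct_query(["diabetes", "obesity"], entities[0]))
```

The reconstructed query can then be run through the same triple search again, which is how the lists let a user reach relationships that the first search did not surface.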

FIG. 12 shows an example of the dynamic table search service screen according to the present invention.

Referring to FIG. 12, the user enters two keywords and runs the search, and an entity list and an event list that are related to each other on the basis of the two keywords are output on the screen.

By dragging and dropping items from the retrieved entity list and event list into the dynamic table box, the user can find a desired object or discover entirely new object information.

300 ... Knowledge providing device
302 ... Multidimensional knowledge generator
304 ... Knowledge memory
306 ... In-depth knowledge provider

Claims

A method of providing in-depth knowledge based on a knowledge memory in which multidimensional knowledge is stored in the form of (subject / event verb / object), the multidimensional knowledge consisting of relational knowledge representing the relationships between entities contained in a target document, structural knowledge representing the pragmatic roles of sentences contained in the target document, and procedural knowledge including the purpose, actions, and methods of the target document, the method comprising:
searching the knowledge memory for triples that include a query word input for the search, and presenting the triples; and
searching for and presenting documents related to a triple selected from among the presented triples.

The method of claim 1, wherein the searching and presenting of the triples comprises
searching for triples having event verbs that are in a co-occurrence relation with the query word.

The method of claim 2, further comprising:
presenting the event verbs that are in a co-occurrence relation; and
when one of the presented event verbs is selected, repeating the search with a search triple of the form "subject - selected event verb".

The method of claim 3,
further comprising presenting similar words for a subject or an object among the entities included in the triples and, when one of the similar words is selected, re-searching with a search triple that includes the selected similar word.

The method according to claim 3 or 4,
further comprising, whenever the search triple is reconstructed, searching the knowledge memory by instant search for documents related to the reconstructed search triple and presenting the documents.

The method of claim 1, wherein the searching and presenting of the triples comprises:
searching the knowledge memory for triples having the input query word as a subject;
extracting entities corresponding to the object (object entities) from the retrieved triples; and
providing a network that expresses the triple relationships by lines connecting subjects and objects.

The method according to claim 6,
further comprising searching the knowledge memory for triples having an object entity as a subject, and expanding the network.

The method of claim 1, wherein the searching and presenting of the triples comprises:
searching the knowledge memory for triples having the input query word as a subject; and
classifying the retrieved triples by entity and listing them according to each entity's frequency of occurrence.

The method of claim 1, wherein the searching and presenting of the triples comprises:
searching the knowledge memory for triples that include the input query word;
extracting entities from the retrieved triples;
presenting a list of the extracted entities; and
reconstructing a query word that includes an entity selected from the entity list.

A device for providing in-depth knowledge based on an academic scientific knowledge memory, comprising:
a knowledge memory that stores relational knowledge, structural knowledge, and procedural knowledge about documents; and
an in-depth knowledge provider that receives a query word, searches the knowledge memory for triples that include the input query word and for documents related to those triples, and presents them.

The device of claim 10, wherein the in-depth knowledge provider uses a Generalized Concordance Lists (GCL) query that searches for a specific term or for a relationship between term groups or between terms.

The device of claim 11, wherein the in-depth knowledge provider provides slide navigation that lets the user dynamically view document information for a triple while easily moving, by mouse click, through successively connected triple-based knowledge information on the same screen.

The device of claim 12, wherein the in-depth knowledge provider further provides dynamic triple analysis information browsing in which, when the user enters a desired keyword and runs a search, the subject / event verb / object information associated with the keyword is presented in order of frequency (up to the top five) as statistical results together with a chart.

The device of claim 13, wherein the in-depth knowledge provider further provides a dynamic table-based search that outputs an entity list and an event list related to each other on the basis of two input keywords.

The device of claim 14, wherein the in-depth knowledge provider further provides in-depth knowledge navigation for navigating detailed element knowledge on the basis of knowledge triples displayed radially.