KR20090078986A

KR20090078986A - Ontology-based semantic annotation system and method thereof

Info

Publication number: KR20090078986A
Application number: KR1020080004880A
Authority: KR
Inventors: 김홍기; 양경모; 송승재; 김동범
Original assignee: 재단법인서울대학교산학협력재단
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2009-07-21
Also published as: KR100966651B1

Abstract

An ontology-based semantic annotation system and method are provided that subdivide document information into document components rather than whole documents and annotate those components using a semantic information knowledge base composed of an ontology that provides semantic interoperability, thereby enabling meaning-based document search. A semantic information management unit (31) stores semantic information in an ontology repository to build the semantic information knowledge base. A document management unit (32) loads a document and attaches semantic information to it. If semantic information related to a component of the document exists in the semantic information knowledge base, an automatic annotation processing unit (33) annotates that component with the semantic information. An annotation creation unit (34) annotates a document component designated by a user with semantic information designated by the user.

Description

Ontology-based Semantic Annotation System and Method

The present invention belongs to the technical field of semantic annotation systems and methods that build a semantic information knowledge base by structuring semantic information as an ontology and annotate documents using that knowledge base.

The present invention also belongs to the technical field of semantic annotation systems and methods that subdivide document information into document components rather than whole documents and use the semantic information knowledge base to find components suitable for annotation and annotate them automatically.

In general, web technology expresses information in a simple markup language so that it can be shared across the Internet. This was a breakthrough for network-based information sharing and delivery: it created a vast information space interconnecting diverse information resources, and as a result the Internet spread rapidly into everyday life and transformed the information society. However, because conventional web technology can retrieve information only by keyword, a simple keyword search also returns unnecessary information. This stems from a structural problem: conventional web technology is a human-centered information processing technology and does not give computers sufficient means to extract, interpret, and process the information they need on their own. The problem is becoming more pronounced as the volume of web information grows.

To solve this problem, the Semantic Web emerged as a technology that complements existing web technology, ensures semantic interoperability so that computers can understand information, and enables effective cooperation between humans and computers. The Semantic Web is a web technology proposed to provide a structure in which machines as well as people can grasp the meaning of information on the web, to support meaning-based search that returns only results suited to a user's request, and to enable automated services on behalf of people through smooth collaboration between people and machines or between machines.

In other words, the Semantic Web is a next-generation web technology that lets computers understand, automate, integrate, and reuse the meaning of information resources. It consists of three main elements: an ontology, a semantically annotated web, and agents. An ontology is a formal specification of a shared conceptualization and provides the semantic information of a domain vocabulary. An ontology is a form of knowledge representation; a computer can understand the concepts expressed in it and process knowledge. Processing such as inference requires the ontology's axioms and rules. A semantically annotated web is a kind of knowledge base. On the Semantic Web, a vast knowledge base can be built that semantically integrates the distributed information resources of the Internet; in a narrower sense, a knowledge base can be built for the information resources of a company or institution. An agent here means an intelligent agent that collects, retrieves, and reasons over information resources on behalf of a human user and exchanges information with other agents. Intelligent agents are the core of Semantic Web-based application systems. The Semantic Web uses ontology and agent technology to realize semantic interoperability, and presents a new paradigm in which the existing presentation-centered web can evolve into a knowledge-based, meaning-centered web.

In addition, there have been attempts to build a knowledge base, similar in concept to the Semantic Web, by creating semantically annotated documents for general documents as well as web documents. In particular, Adobe has developed and supports XMP, a metadata representation format, so that PDF documents can be annotated.

In summary, large collections of web documents and general documents have traditionally been written for people, with an emphasis on closed, visual layout. Recent research is moving toward authoring documents with semantic annotations based on ontologies that provide semantic interoperability, so that machine agents can search such collections more effectively and efficiently and build them into knowledge bases, and so that agents as well as people can grasp the semantic relationships of documents in order to search, classify, and manage them.

Standardization technologies for semantically linking scattered web documents and general documents have developed in their respective fields and are offered in various forms such as RDF, OWL, TopicMaps, KIF, OWL-S, WSDL-S, AJAX, XMP, Dublin Core, SIOC, FOAF, SKOS, and SCOT. What is needed above all, however, is an integrated tool and system for creating and managing semantically structured documents, such as web documents, and for applying such forms to documents that already exist.

An object of the present invention, devised to solve the problems described above, is to provide a semantic annotation system and method that annotate a document using a semantic information knowledge base in which the semantic information of documents is structured as an ontology.

Another object of the present invention is to provide a semantic annotation system and method that subdivide document information into document components rather than whole documents and annotate those components using a semantic information knowledge base composed of an ontology that provides semantic interoperability, so that machines as well as people can grasp meaning and find suitable results, enabling meaning-based document search.

Yet another object of the present invention is to provide a semantic annotation system and method that organize the metadata used for annotation as an ontology capable of ensuring semantic interoperability, and that use this semantic information knowledge base to find document components that can appropriately be annotated and annotate them automatically.

To achieve the above objects, the present invention relates to a semantic annotation system having an ontology repository, comprising: a semantic information management unit that stores semantic information in the ontology repository to build a semantic information knowledge base; a document management unit that loads a document and attaches semantic information to the document; an automatic annotation processing unit that, if semantic information related to a component of the document exists in the semantic information knowledge base, annotates that component with the semantic information; and an annotation creation unit that, for semantic information and a document component designated by a user, annotates the component with the semantic information.

Further, in the semantic annotation system according to the present invention, before a component of the document is automatically annotated with the semantic information, the user is notified that the annotation can be made.

Further, in the semantic annotation system according to the present invention, the components of the document include words, and the automatic annotation processing unit determines that a word and semantic information are related when the word in the document is identical to the value of the name attribute of a semantic information instance, or when a morpheme of the word is identical to the value of the morpheme attribute of a semantic information class.

Further, in the semantic annotation system according to the present invention, if a semantic information class exists whose morpheme attribute value is identical to a morpheme of a word in the document, the automatic annotation processing unit creates an instance of that class, sets the value of the created instance's name attribute to the word, and annotates the word with the instance.

Further, in the semantic annotation system according to the present invention, if the semantic information with which the document is annotated exists as an instance in the semantic information knowledge base, the document management unit attaches only link information pointing to that instance.

Further, in the semantic annotation system according to the present invention, the document management unit attaches the semantic information to the document expressed in any one of the XML, XMP, and RDF formats.
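As a minimal sketch of attaching only link information for instances that already exist in the knowledge base, the following Python builds a plain XML fragment. The element names (`annotations`, `annotation`, `instanceRef`, `instance`, `property`) and the function `build_annotation_xml` are hypothetical and are not taken from the patent or from the XMP or RDF specifications; embedding the full semantic information when no knowledge-base instance exists is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def build_annotation_xml(annotations, kb_instance_ids):
    """Serialize annotations; each is a dict with 'component', 'instance_id', 'properties'."""
    root = ET.Element("annotations")
    for ann in annotations:
        node = ET.SubElement(root, "annotation", component=ann["component"])
        if ann["instance_id"] in kb_instance_ids:
            # The instance lives in the knowledge base: attach only a link to it.
            ET.SubElement(node, "instanceRef", href=ann["instance_id"])
        else:
            # Assumption: otherwise embed the semantic information itself.
            inline = ET.SubElement(node, "instance", id=ann["instance_id"])
            for key, value in ann["properties"].items():
                ET.SubElement(inline, "property", name=key).text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping only a reference means later changes to the instance in the knowledge base are visible from every document that links to it, at the cost of requiring access to the knowledge base when the document is read.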

Further, in the semantic annotation system according to the present invention, the semantic information management unit loads an ontology designated by the user from the semantic information knowledge base stored in the ontology repository, and the automatic annotation processing unit checks for related semantic information only within the loaded semantic information knowledge base.
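The restriction to user-loaded ontologies can be illustrated with a small in-memory sketch. The class `OntologyRepository` and its methods are hypothetical names chosen for illustration, and an ontology is reduced here to a list of instance names, which is a simplifying assumption.

```python
class OntologyRepository:
    """Toy repository: stores ontologies but searches only the loaded ones."""

    def __init__(self):
        self._stored = {}    # ontology name -> list of instance names
        self._loaded = set()  # names of ontologies the user has loaded

    def store(self, ontology, instance_names):
        self._stored[ontology] = list(instance_names)

    def load(self, ontology):
        if ontology not in self._stored:
            raise KeyError(f"unknown ontology: {ontology}")
        self._loaded.add(ontology)

    def find_instances(self, word):
        """Return (ontology, instance name) pairs, looking only at loaded ontologies."""
        return [
            (ontology, name)
            for ontology in sorted(self._loaded)
            for name in self._stored[ontology]
            if name == word
        ]
```

Limiting the search scope this way keeps automatic annotation from matching words against ontologies that are irrelevant to the document at hand.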

Further, the semantic annotation system according to the present invention further comprises an interface unit that transmits to the user's computer a screen composed of an ontology view showing the semantic information knowledge base, a document view showing the document, and an annotation view showing annotation information, wherein the annotation information is information about the semantic information activated in the ontology view that annotates the document component activated in the document view.

Further, in the semantic annotation system according to the present invention, the interface unit highlights on the screen the document components that the automatic annotation processing unit reports as annotatable.
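One way to picture the highlighting of annotatable components is to compute character spans for the reported words and wrap them in markers. This is a sketch for plain text only; the function names and the `[[ ]]` markers are hypothetical, and an actual interface would use whatever highlighting its document view supports.

```python
import re


def highlight_spans(text, annotatable_words):
    """Return sorted (start, end, word) spans for every occurrence of each word."""
    spans = []
    for word in annotatable_words:
        for match in re.finditer(re.escape(word), text):
            spans.append((match.start(), match.end(), word))
    return sorted(spans)


def render(text, spans, open_mark="[[", close_mark="]]"):
    """Wrap each span in markers, skipping spans that overlap an earlier one."""
    out, pos = [], 0
    for start, end, _ in spans:
        if start < pos:
            continue
        out.append(text[pos:start])
        out.append(open_mark + text[start:end] + close_mark)
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

For example, rendering "Kim works in Seoul." with the words "Kim" and "Seoul" marked yields "[[Kim]] works in [[Seoul]]."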

Further, the semantic annotation system according to the present invention further comprises a faceted browser that receives search conditions on attributes, searches for semantic information instances that satisfy the conditions, and displays the results.
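The core of such attribute-based search can be sketched in a few lines. Instances are modeled here as plain dictionaries of attribute values, and `facet_search` and `facet_counts` are hypothetical helper names; the patent does not specify how the faceted browser is implemented.

```python
from collections import Counter


def facet_search(instances, conditions):
    """Return instances whose attributes satisfy every (attribute, value) condition."""
    return [
        inst
        for inst in instances
        if all(inst.get(attr) == value for attr, value in conditions.items())
    ]


def facet_counts(instances, attribute):
    """Count how many instances carry each value of one attribute (one facet)."""
    return Counter(inst[attribute] for inst in instances if attribute in inst)
```

A browser built on these two functions would typically show `facet_counts` for each attribute next to the current result list, so the user can narrow the search one attribute at a time.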

Further, the semantic annotation system according to the present invention further comprises a semantic information publishing unit that converts the annotations created in the document into the SCOT (Social Semantic Cloud of Tags) format and stores them.

Further, in the semantic annotation system according to the present invention, the document is any one of a web document, a PDF document, an image file, a video file, and an audio file, and the components of the document include at least one of the entire document, text or words, images, audio, video, and links.

Further, in the semantic annotation system according to the present invention, the document is a document stored on the user's computer or a document retrieved over the Internet.

The present invention also relates to a semantic annotation system having an ontology repository, comprising: a semantic information management unit that stores semantic information (metadata) in the ontology repository to build a semantic information knowledge base; a document management unit that loads a document and attaches semantic information to the document; an automatic processing rule management unit that stores and manages automatic processing rules which, in order to determine whether a component and semantic information are related, define an attribute of a document component, specify the attribute of semantic information to be compared with that component attribute, and specify the comparison method used to decide whether they are related; an automatic annotation processing unit that, for each component of the document, if the automatic processing rules specify a semantic information attribute to be compared with the component's attribute, searches the semantic information knowledge base for semantic information having that attribute, compares the attribute value of the retrieved semantic information with the attribute value of the component, and, if they are determined to be related, annotates the component with the retrieved semantic information; and an annotation creation unit that, for semantic information and a document component designated by a user, annotates the component with the semantic information.

Further, in the semantic annotation system according to the present invention, the components of the document include words, and the automatic processing rule management unit stores as an automatic processing rule either (i) a rule that defines the word itself as the word's name attribute, specifies the name attribute as the semantic information attribute to be compared with it, and determines that the two are related if the attribute values are identical, or (ii) a rule that defines the morphemes contained in the word as the word's morpheme attribute, specifies the morpheme attribute as the semantic information attribute to be compared with it, and determines that the two are related if the attribute values are identical.
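An automatic processing rule as described above has three parts: how to derive an attribute from a document component, which semantic information attribute to compare it with, and how to compare them. The sketch below models that triple in Python. All names (`AutoRule`, `NAME_RULE`, `MORPHEME_RULE`, `related`) are hypothetical, semantic information is reduced to dictionaries, and `split_morphemes` is a toy whitespace splitter standing in for a real morphological analyzer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AutoRule:
    component_attr: Callable[[str], list]  # derives attribute value(s) from a word
    semantic_attr: str                     # semantic information attribute to compare
    compare: Callable[[str, str], bool]    # decides whether two values are related


def split_morphemes(word):
    """Toy stand-in for a morphological analyzer (assumption)."""
    return word.split()


# Rule (i): the word itself is compared with the name attribute.
NAME_RULE = AutoRule(lambda w: [w], "name", lambda a, b: a == b)
# Rule (ii): each morpheme of the word is compared with the morpheme attribute.
MORPHEME_RULE = AutoRule(split_morphemes, "morpheme", lambda a, b: a == b)


def related(word, semantic_items, rules):
    """Return the semantic items (dicts) related to `word` under any rule."""
    hits = []
    for rule in rules:
        values = rule.component_attr(word)
        for item in semantic_items:
            if rule.semantic_attr not in item:
                continue  # this item lacks the attribute the rule compares
            if any(rule.compare(v, item[rule.semantic_attr]) for v in values):
                if item not in hits:
                    hits.append(item)
    return hits
```

Because the comparison is a pluggable function, a rule other than exact equality (for example, case-insensitive matching) could be added without changing the matching loop; that extension is not described in the source.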

The present invention also relates to a semantic annotation method using an ontology repository, comprising the steps of: (a) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base; (b) loading part of the stored semantic information knowledge base as designated by a user; (c) loading a document designated by the user; (d) for every component of the document, if semantic information related to the component exists in the semantic information knowledge base, annotating the component with that semantic information; (e) for semantic information and a document component designated by the user, annotating the component with the semantic information; and (f) attaching the semantic information to the document.

Further, in the semantic annotation method according to the present invention, in step (d), before a component of the document is annotated with the semantic information, notice is given that the annotation can be made.

Further, in the semantic annotation method according to the present invention, step (d) comprises: (d1) for each component of the document, if the component is a word, performing steps (d2) to (d5) below; (d2) if a semantic information instance exists whose name attribute value is identical to the word, annotating the word with that instance and proceeding to step (d5), and otherwise proceeding to step (d3); (d3) if a semantic information class exists whose morpheme attribute value is identical to a morpheme of the word, creating an instance of that class and setting the value of the created instance's name attribute to the word; (d4) annotating the word with the created instance; and (d5) performing steps (d2) to (d4) for all components of the document.
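Steps (d1) to (d5) above can be sketched as a single loop over the words of a document. The data classes, the function `auto_annotate`, and the lookup-table analyzer `toy_morphemes` are hypothetical illustrations, not the patent's implementation; in particular, a real system would use an actual morphological analyzer in place of the table.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticClass:
    name: str
    morpheme: str  # value of the class's morpheme attribute


@dataclass
class Instance:
    cls: SemanticClass
    name: str  # value of the instance's name attribute


@dataclass
class KnowledgeBase:
    classes: list = field(default_factory=list)
    instances: list = field(default_factory=list)


def toy_morphemes(word):
    """Stand-in for a real morphological analyzer (assumption)."""
    table = {"Seoul University": ["Seoul", "University"]}
    return table.get(word, [word])


def auto_annotate(words, kb, morphemes=toy_morphemes):
    """Return {word: Instance} for every word that could be annotated."""
    annotations = {}
    for word in words:  # (d1), (d5): visit each word in turn
        # (d2) look for an instance whose name attribute equals the word
        inst = next((i for i in kb.instances if i.name == word), None)
        if inst is None:
            # (d3) look for a class whose morpheme attribute equals a morpheme of the word
            parts = morphemes(word)
            cls = next((c for c in kb.classes if c.morpheme in parts), None)
            if cls is None:
                continue  # nothing related; leave the word unannotated
            inst = Instance(cls=cls, name=word)  # new instance named after the word
            kb.instances.append(inst)
        annotations[word] = inst  # (d2) or (d4): annotate the word with the instance
    return annotations
```

Note that an instance created in step (d3) is added to the knowledge base, so a later occurrence of the same word is resolved directly by the name match in step (d2).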

Further, in the semantic annotation method according to the present invention, in step (f), if the semantic information with which the document is annotated exists as an instance in the semantic information knowledge base, only link information pointing to that instance is attached.

Further, in the semantic annotation method according to the present invention, in step (f), the semantic information is attached to the document expressed in any one of the XML, XMP, and RDF formats.

Further, in the semantic annotation method according to the present invention, in step (d), the document components reported as annotatable are highlighted on the screen.

Further, the semantic annotation method according to the present invention further comprises the step of: (f0) before step (f), converting the annotations created in the document into the SCOT (Social Semantic Cloud of Tags) format and storing them.

Further, in the semantic annotation method according to the present invention, the document is any one of a web document, a PDF document, an image file, a video file, and an audio file, and the components of the document include at least one of the entire document, text or words, images, audio, video, and links.

Further, in the semantic annotation method according to the present invention, the document is a document stored on the user's computer or a document retrieved over the Internet.

The present invention also relates to a semantic annotation method using an ontology repository, comprising the steps of: (g) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base; (h) creating and storing automatic processing rules which, in order to determine whether a document component and semantic information are related, define an attribute of the document component, specify the attribute of semantic information to be compared with that component attribute, and specify the comparison method used to decide whether they are related; (i) loading part of the stored semantic information knowledge base as designated by a user; (j) loading a document designated by the user; (k) for each component of the document, if the automatic processing rules specify a semantic information attribute to be compared with the component's attribute, searching the semantic information knowledge base for semantic information having that attribute, comparing the attribute value of the retrieved semantic information with the attribute value of the component, and, if they are determined to be related, annotating the component with the retrieved semantic information; (l) for semantic information and a document component designated by the user, annotating the component with the semantic information; and (m) attaching the semantic information to the document.

Further, in the semantic annotation method according to the present invention, the components of the document include words, and step (h) stores as an automatic processing rule either (i) a rule that defines the word itself as the word's name attribute, specifies the name attribute as the semantic information attribute to be compared with it, and determines that the two are related if the attribute values are identical, or (ii) a rule that defines the morphemes contained in the word as the word's morpheme attribute, specifies the morpheme attribute as the semantic information attribute to be compared with it, and determines that the two are related if the attribute values are identical.

The present invention also relates to a computer-readable recording medium on which the semantic annotation method described above is recorded.

As described above, the semantic annotation system and method according to the present invention subdivide document information into document components rather than whole documents and annotate them using a semantic information knowledge base in which the semantic information of documents is structured as an ontology. This makes possible meaning-based document search in which machines as well as people can grasp meaning and find suitable results.

Further, the semantic annotation system and method according to the present invention annotate the large body of existing web documents and general documents, automatically or semi-automatically, with semantic information built as an ontology, so that such documents can be annotated more quickly and accurately.

Further, the semantic annotation system and method according to the present invention support, in addition to web documents, PDF documents, which are now widely used as a standard document format, and formats such as SCOT, which is widely used as a standard for meta tags. This improves compatibility with other standard technologies and makes the system more convenient to use.

Before describing specific embodiments of the present invention, the concepts of a document, semantic information, a semantic information knowledge base, and annotation, which are referred to below, are explained.

A document according to the present invention means an electronic file created to convey information through human senses such as sight or hearing. General electronic documents such as PDF files and Hangul word processor files, as well as image files and video files, therefore qualify. While a document conveys information easily to people, machines such as search agents have great difficulty interpreting document information semantically. Metadata is intended to solve this.

Semantic information according to the present invention is metadata that describes the information in a document so that machines such as search agents can also interpret the document's meaning. The term does not refer to metadata that structures a document's information, nor to standards for expressing metadata. Accordingly, metadata notation standards such as HTML, XML, and RDF are not semantic information. The present invention does, however, select a suitable known metadata notation standard and use it to write semantic information into a document.

The semantic information knowledge base according to the present invention is the entire body of semantic information that is structured as an ontology and stored in an ontology repository. Like metadata in general, semantic information must be written and structured according to fixed rules so that machines can interpret it. The semantic information is therefore organized as an ontology, written and structured according to fixed rules, and built into a semantic information knowledge base. Once it is built as a knowledge base, semantic information is not simply written for one document, attached, and then finished with; it can be reused to annotate other documents. If semantic information being written for a document is not yet in the knowledge base, new semantic information is created and added to the knowledge base. The semantic information knowledge base is thus an ontology repository that keeps evolving through additions and changes.

Annotation according to the present invention means writing semantic information and attaching it to a document.

Hereinafter, specific details for carrying out the present invention are described with reference to the drawings.

In describing the present invention, identical parts are given identical reference numerals, and repeated descriptions are omitted.

As shown in FIG. 1A, the semantic annotation system 30 according to the present invention may be built into a computer and used directly by a user 15, or, as shown in FIG. 1B, may be connected to the Internet 5 and accessed from a user computer 15.

In the former case, the semantic annotation system 30 is installed on a computer and used directly by the user 15. The computer may be either a general-purpose computer or a dedicated computer specialized for semantic annotation. Such computers are well known in the art, so a detailed description is omitted.

The document 25 may be a document read in from outside or a document already on the computer. It may be any document that an ordinary computer can process, such as a web document or a PDF. The present invention is described mainly in terms of web documents and PDF documents as one embodiment. Such documents are well known in the art, so a detailed description is omitted. The semantic annotation system 30 reads the document 25 and either processes it automatically or attaches annotations to it at the command of the user 15. The document 25 may have no annotations at all, or it may already be annotated. In other words, the semantic annotation system 30 can add new annotations to the document 25 and can modify or delete existing ones.

The ontology repository 40 is a general ontology repository that stores information organized as an ontology. It has a conventional ontology DB engine that can define classes, declare attributes and enter attribute values, create and delete instances, and enter and change attribute values. The ontology repository 40 is also well known in the art, so a detailed description is omitted.

As shown in FIG. 1B, the semantic annotation system 30 according to the present invention may be connected to the Internet 5 and annotate documents scattered across the Internet. On the Internet 5, the participants are divided into a service provider server 20, which provides the documents to be annotated, and a service user computer 10, which uses that information. The semantic annotation system 30 functions as the apparatus of a service intermediary that supports annotation so that the service user 10 can more easily search and use the information provided by the service provider 20. That is, the service intermediary supplies the basic ontology repository and annotation system needed to annotate documents. Using these, the service provider 20 either provides documents to which it has written and attached annotations itself, or provides documents that can be annotated. The service user 10 mainly uses the documents to obtain information but can also annotate them directly. Both service users and service providers can therefore use the semantic annotation system 30, and its functions can be restricted according to the role of each party. For example, creating new classes may be prohibited for service users or service providers. Such policies are well known in the art, so a detailed description is omitted.

Next, a semantic annotation system according to an embodiment of the present invention is described with reference to FIG. 2.

FIG. 2 is a block diagram of the configuration of a semantic annotation system according to an embodiment of the present invention.

As shown in FIG. 2, the semantic annotation system 30 according to an embodiment of the present invention comprises a semantic information management unit 31, a document management unit 32, an automatic annotation processing unit 33, and an annotation preparation unit 34. The semantic annotation system 30 operates in conjunction with the ontology repository 40. It may additionally include an interface unit 36, a semantic information publishing unit 37, and a facet browser 38.

Another embodiment can be formed by adding an automatic processing rule management unit 35 to the semantic annotation system 30. It is not described as a separate embodiment here but is described together with the rest below.

The semantic information management unit 31 organizes semantic information as an ontology, stores it in the ontology repository, and builds the semantic information knowledge base. Semantic information is metadata that describes a document semantically. An ordinary document consists of sentences written in natural language, and a machine cannot grasp their meaning unless it interprets natural language. For example, when a paper is written as a PDF document, a person can immediately recognize its title, author, abstract, body, and notes, but a machine cannot interpret them. Metadata that records the author, title, and so on of such a document as separate items is semantic information. Besides semantic information about the document as a whole, semantic information can also be recorded for phrases, words, images, video, audio, and other content within the document.

The semantic information is organized as an ontology for semantic interoperability. As in the example above, grouping title, author, abstract, and so on as attributes of one class establishes semantic relationships among them. Semantic information may also be organized differently depending on the type of document, such as general books, manuals, reports, and dictionaries. These types are therefore classified into superclasses and subclasses so that the ontology carries meaning in its structure. In the semantic information knowledge base, the classes form a tree, instances can be created in each class, and the instances created in a class are managed as that class's list. The attributes belonging to each class or instance are also managed as a list.
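The class tree, per-class instance list, and attribute list described above can be sketched as a small data structure. This is only an illustrative sketch: the names `OntologyClass`, `Instance`, `add_subclass`, `all_attributes`, and `create_instance` are hypothetical, and the assumption that a subclass inherits its superclass's attributes is not stated in the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instance:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


@dataclass
class OntologyClass:
    name: str
    attributes: list = field(default_factory=list)  # attribute names declared here
    parent: Optional["OntologyClass"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)    # subclasses (tree structure)
    instances: list = field(default_factory=list)   # instances managed per class

    def add_subclass(self, name, attributes=()):
        """Create a subclass node under this class and return it."""
        child = OntologyClass(name, list(attributes), parent=self)
        self.children.append(child)
        return child

    def all_attributes(self):
        """Attributes of this class plus those of its ancestors (assumed inheritance)."""
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.attributes

    def create_instance(self, name, values):
        """Create an instance in this class, rejecting undeclared attributes."""
        unknown = set(values) - set(self.all_attributes())
        if unknown:
            raise ValueError(f"undeclared attributes: {sorted(unknown)}")
        instance = Instance(name, dict(values))
        self.instances.append(instance)
        return instance
```

For example, a `Document` class with `title`, `author`, and `summary` attributes could have a `Paper` subclass, and each paper's metadata would then be one instance in the `Paper` class's list.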

The semantic information may be built into one or more semantic information knowledge bases, each organized as an ontology. Ontologies can be divided by type, such as domain ontologies, ontologies for concept representation, and everyday ontologies, and several ontologies can be built for different fields (domain ontologies in particular). An ontology that has already been built may also be imported and used. In such cases, ontologies based on different conceptualizations may exist for the same semantic information. Likewise, semantic information knowledge bases may be implemented as a variety of knowledge bases, and knowledge bases based on different conceptualizations may be built for the same semantic information. The ontology repository therefore stores the knowledge bases for all semantic information, but it must be possible to load and use only part of them as the user specifies. That is, the semantic information management unit 31 can individually select and load the semantic information knowledge bases relevant to the document to be annotated, and when only some are designated and loaded, the semantic information processed afterward is limited to the loaded knowledge bases.

The document management unit 32 loads a document and attaches semantic information to it. Just as different authoring software produces different kinds of documents, the documents handled may be of any format. It is preferable that the document format supports attaching annotations. For example, web documents support ontologies and meta tags such as RDF, XML, and SCOT, so they already provide a platform for attaching annotation information. PDF likewise provides the XMP platform, which supports editing and searching semantic information within the document itself. When the document is a PDF document, the document management unit 32 attaches the semantic information in XMP format. Otherwise, the semantic information may be written and attached in any one of the XML, XMP, and RDF formats. Hangul documents, Word documents, image files, video files, and the like have no such facility, so a separate means of handling annotations is needed for them. For example, the annotation information for such a document can be managed separately, with the document holding only link information to it. Even so, it is not easy to handle annotations on words and phrases at specific positions in the document, or on descriptions of image or video files.

The document management unit 32 attaches the annotated semantic information to the document. If the semantic information used for the annotation exists as an instance in the semantic information knowledge base, however, it is preferable to attach only link information to that instance; that is, the document management unit 32 may attach only the link to the instance. This is because attaching the whole instance, including its ontology information, increases the size of the document. Alternatively, rules can be defined so that only the required attribute values of the instance are attached. The document then becomes larger than with link information alone, but later document searches can run faster, so partial attachment is also one of the preferred embodiments.
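The trade-off between attaching only a link and attaching selected attribute values can be sketched as follows. The function name `build_attachment`, the payload keys `ref` and `attributes`, and the identifier format `kb:paper-1` are all hypothetical; the description does not fix a serialization.

```python
def build_attachment(instance_ref, instance_attributes, include=()):
    """Build the payload attached to a document for one annotation.

    With no `include` rule, only the link to the knowledge-base instance is
    attached (smallest document). With an `include` rule, the named attribute
    values are copied as well, which costs space but lets a later search read
    them without consulting the knowledge base.
    """
    payload = {"ref": instance_ref}
    selected = {key: instance_attributes[key]
                for key in include if key in instance_attributes}
    if selected:
        payload["attributes"] = selected
    return payload
```

Calling `build_attachment("kb:paper-1", attrs)` yields a link-only payload, while passing `include=("title",)` also copies the title value into the document.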

As noted above, the document loaded by the document management unit 32 may be a document stored on the user's computer or a document retrieved over the Internet. Where the document is stored does not matter.

The automatic annotation processing unit 33 annotates a component of the document with semantic information when semantic information related to that component exists in the semantic information knowledge base.

In general, a document consists of the document as a whole, pages, text or words, images, audio, video, links, and so on. These elements are referred to here as the components of a document. An ordinary document may contain most of these components, whereas a document file such as an image or video file may contain only an image or a video. The document as a whole refers to the document itself. Text is whatever is written as text in the document, normally a natural-language description. Text can be subdivided further into paragraphs, phrases, and words, each of which can be treated as a component of the document. Paragraphs can be divided by kind into title, abstract, body, notes, and so on, and these can either be treated as separate components or as kinds of the paragraph component. In other words, one may say that a document is composed of paragraphs and that the paragraph component has kinds such as abstract, body, and notes, or one may say that a document is composed of abstract paragraphs, body paragraphs, note paragraphs, and so on. Images, audio, video, tables, and links are special-form elements other than text. The special-form element may likewise be defined as a single document component, with audio, video, and so on as its kinds.

The present invention can thus be practiced with each of these various components as a separate embodiment. The components of a document are distinguished in this way because semantic information can be added as an annotation for each component, and such annotations make search richer.

Each component of a document has its own characteristics. For example, the characteristics of the document as a whole may include title, author, creation date, document type, and size; those of a paragraph may include kind, size, and order; and those of a word may include the word itself, number of characters, part of speech, prefix, suffix, and morphemes. In the present invention, these characteristics are called the attributes of the document component. In particular, among the characteristics of a word, the word itself is called the "name attribute" of the word. For example, the value of the name attribute of the word "바다" ("sea") is "바다". That is, the word "바다" has characteristics such as the word itself ("바다"), a length of two characters, and the part of speech noun, and the present invention calls the characteristic that is the word itself the name attribute of "바다".

Whether a document component and a piece of semantic information are related is determined by comparing the attributes of the component with the attributes of the semantic information. For example, the attributes may be identical, or one attribute may belong to the other. If the attributes are similar according to a similarity measure such as those used in natural-language search, they may also be judged to be related. Because semantic information is organized as an ontology, it is either a class or an instance. Classes and instances have attributes, and the attributes of a class or instance are called the attributes of the semantic information. More precisely, when the semantic information is an instance, the attributes of the semantic information are the attributes of that instance.

To annotate the components of a document automatically, the automatic annotation processing unit 33 examines the attributes of each component. For example, suppose the word "banana" appears and is to be annotated automatically with the "banana" instance in the semantic information. The name attribute of the word "banana" is "banana", and the name attribute of the "banana" instance is also "banana". So when the name attribute of a word equals the name attribute of a piece of semantic information, the word is annotated with that semantic information.
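Name matching of this kind amounts to a lookup keyed on the name attribute. The sketch below assumes the document text has already been split into words and that the knowledge base exposes its instances as a mapping from name attribute to an instance identifier; `annotate_by_name` and the identifier `kb:Banana` are hypothetical.

```python
def annotate_by_name(words, instances_by_name):
    """Return (word_index, instance_id) pairs for every word whose name
    attribute (the word itself) equals the name attribute of an instance."""
    annotations = []
    for index, word in enumerate(words):
        instance_id = instances_by_name.get(word)
        if instance_id is not None:
            annotations.append((index, instance_id))
    return annotations
```

Words with no matching instance are simply left unannotated.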

In addition, if a semantic information class exists whose morpheme attribute value equals a morpheme of a word in the document, the automatic annotation processing unit 33 creates an instance of that class, sets the value of the new instance's name attribute to the word, and annotates the word with the instance. For example, "서울대학교" (Seoul National University) is a word that denotes a school and contains the morpheme "학교" (school). An instance "서울대학교" can therefore be created in the class "학교" and used to annotate the word automatically. To do this, a morpheme attribute is defined on the "학교" semantic information class and its value is set to "학교". When "학교" appears among the morphemes of a word and a class whose morpheme attribute is "학교" exists, the word is annotated. The name attribute of the instance is filled in automatically as "서울대학교", which is also the name attribute of the word "서울대학교".
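The morpheme-based path can be sketched as below. Real morphological analysis is outside the scope of the description, so `naive_morphemes` is a deliberately crude stand-in that reports which known morphemes occur as substrings of the word. `SemanticClass` and `annotate_by_morpheme` are hypothetical names, and reusing an existing instance with the same name instead of creating a duplicate is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticClass:
    name: str
    morpheme: str                                   # value of the morpheme attribute
    instances: dict = field(default_factory=dict)   # name attribute -> instance


def naive_morphemes(word, known_morphemes):
    """Stand-in for a morphological analyzer: known morphemes found in the word."""
    return [m for m in known_morphemes if m in word]


def annotate_by_morpheme(word, classes):
    """If a class's morpheme attribute matches a morpheme of the word, create an
    instance named after the word in that class and return it; else return None."""
    found = set(naive_morphemes(word, [c.morpheme for c in classes]))
    for cls in classes:
        if cls.morpheme in found:
            # Reuse an instance with the same name if one exists (assumption).
            return cls.instances.setdefault(word, {"class": cls.name, "name": word})
    return None
```

With a `School` class whose morpheme attribute is "학교", the word "서울대학교" produces an instance whose name attribute is "서울대학교".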

The two representative ways in which the automatic annotation processing unit 33 scans the words of a document and annotates them automatically are thus matching by name and matching by morpheme. When both are used together, name matching must be tried first, and morpheme matching is applied only when no semantic information corresponds to the name. If a name match exists, the word can be annotated immediately and no morpheme matching is needed.
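The ordering rule, name matching first and morpheme matching only as a fallback, can be sketched in a few lines. The function `auto_annotate`, the two dictionary arguments, and the `Class/word` identifier format are hypothetical, and substring search again stands in for real morphological analysis.

```python
def auto_annotate(word, instances_by_name, class_by_morpheme):
    """Annotate one word: try name matching, then fall back to morpheme matching.

    Returns (method, instance_id), or None when neither method matches.
    Note that `instances_by_name` is updated when a new instance is created.
    """
    # 1) Name match: an instance with the same name attribute already exists.
    if word in instances_by_name:
        return ("name", instances_by_name[word])
    # 2) Morpheme match: create an instance in the matching class.
    for morpheme, class_name in class_by_morpheme.items():
        if morpheme in word:
            instance_id = f"{class_name}/{word}"
            instances_by_name[word] = instance_id
            return ("morpheme", instance_id)
    return None
```

One consequence of this ordering is that a word annotated once through the morpheme path is found by name on every later occurrence, so no second instance is created.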

Another embodiment of the automatic annotation processing unit 33 according to the present invention is now described. The embodiment described above annotates with semantic information automatically, but in another embodiment the unit does not annotate automatically and instead annotates semi-automatically at the user's command.

That is, when semantic information related to a component of the document exists in the semantic information knowledge base, the automatic annotation processing unit 33 reports that the component can be annotated with that semantic information. Instead of annotating automatically, the unit only tells the user which words can be annotated and lets the user confirm whether to annotate each one. The unit finds semantic information related to a document component in the same way as in the embodiment described above. The only differences are that it does not perform the annotation itself and, in particular, that it does not create an instance of the matching class when it judges relatedness by comparing morphemes.
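The semi-automatic mode separates finding candidates from applying them. In the sketch below, `suggest` only reports candidates and `annotate_with_confirmation` applies those the user accepts. Both function names are hypothetical, the `confirm` callback stands in for whatever user interaction the system provides, and only name matching is shown.

```python
def suggest(words, instances_by_name):
    """List (index, word, instance_id) candidates without annotating anything."""
    return [(index, word, instances_by_name[word])
            for index, word in enumerate(words)
            if word in instances_by_name]


def annotate_with_confirmation(words, instances_by_name, confirm):
    """Apply only the candidate annotations that the user confirms."""
    accepted = []
    for index, word, instance_id in suggest(words, instances_by_name):
        if confirm(word, instance_id):
            accepted.append((index, instance_id))
    return accepted
```

In an interactive tool, the output of `suggest` would drive the highlighting of annotatable words, and `confirm` would be wired to the user's accept or reject action.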

Both embodiments of the automatic annotation processing unit 33 may be implemented in the semantic annotation system 30, with the user selecting through an option which of the two is performed.

The annotation preparation unit 34 annotates a document component with semantic information, where both the semantic information and the component are specified by the user. As a tool for making annotation convenient, the semantic annotation system 30 according to an embodiment of the present invention supports direct annotation by the user. The user selects a component of the document, finds the semantic information to annotate with in the semantic information knowledge base, and annotates the component with it. If necessary, the user can create an instance, enter the attribute values it requires, and annotate with that instance. As noted above, any document component can be selected, including the document as a whole, text, words, audio, video, and images.

The following concerns a semantic annotation system according to another embodiment of the present invention, in which the automatic processing function is extended by managing the rules for automatic processing separately so that rules can be added or modified.

To determine whether a document component and a piece of semantic information are related, the automatic processing rule management unit 35 stores and manages automatic processing rules. Each rule defines an attribute of a document component, specifies the attribute of the semantic information to be compared with it, and specifies the comparison method used to judge relatedness. For each component of the document, if the automatic processing rules specify a semantic information attribute to compare with an attribute of that component, the automatic annotation processing unit 33 searches the semantic information knowledge base for semantic information having that attribute, compares the attribute value of each result with the attribute value of the component, and, if they are judged to be related, annotates the component with the retrieved semantic information.

In this extended embodiment, the other elements are unchanged; the automatic processing rule management unit 35 is added, and the function of the automatic annotation processing unit 33 changes accordingly.

The automatic processing rule management unit 35 allows the attributes of document components discussed above to be defined. For example, the name attribute of a word can be defined as the word itself, and the morpheme attribute of a word can be defined as the morphemes the word contains. The latter uses a conventional lexical analyzer, which is well known in the art, so a detailed description is omitted. The rule then specifies the semantic information attribute to compare with the word's attribute and defines a comparison operator, selected from operators such as "is identical to", "belongs to", "includes", "is greater than", "is less than", and "equals". An automatic processing rule therefore consists of a definition of a document component attribute, a semantic information attribute, and a comparison operator.
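A rule with these three parts can be represented directly as a small record. The sketch below is one possible encoding, not the patent's own format: `Rule`, `OPERATORS`, `related`, and the operator names are hypothetical, semantic information is modeled as a plain dictionary of attribute values, and the morpheme rule uses substring containment in place of a real lexical analyzer.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Comparison operators a rule may name (an illustrative subset).
OPERATORS = {
    "equals": operator.eq,
    "contains": lambda component_value, semantic_value: semantic_value in component_value,
    "belongs_to": lambda component_value, semantic_value: component_value in semantic_value,
    "greater_than": operator.gt,
    "less_than": operator.lt,
}


@dataclass(frozen=True)
class Rule:
    component_attribute: Callable[[str], object]  # derives the attribute from a component
    semantic_attribute: str                       # attribute of the semantic information
    comparison: str                               # key into OPERATORS


def related(rule, component, semantic_info):
    """Apply one rule: is the component related to this semantic information?"""
    if rule.semantic_attribute not in semantic_info:
        return False
    compare = OPERATORS[rule.comparison]
    return bool(compare(rule.component_attribute(component),
                        semantic_info[rule.semantic_attribute]))
```

Under this encoding, the name rule is `Rule(lambda word: word, "name", "equals")`, and a morpheme rule can be approximated as `Rule(lambda word: word, "morpheme", "contains")`. New rules can be added to the stored set without changing the matching code.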

Taking the examples above, the automatic processing rule management unit 35 creates and stores either an automatic processing rule that defines the name attribute of a word (a document component) as the word itself and specifies the name attribute as the semantic information attribute to compare it with, or an automatic processing rule that defines the morpheme attribute of a word as the word's morphemes and specifies the morpheme attribute as the semantic information attribute to compare it with.

As described above, with automatic processing rules the automatic processing capability can keep improving as additional rules are discovered. For example, when an image has a description attribute that does not appear in the document itself, the vocabulary items contained in that description can be processed automatically and annotated. Likewise, for paragraphs, if the document is of a type such as an academic paper, rules can be created that automatically determine from a paragraph's order, length, or position whether it is the title, the author, the abstract, and so on.

The interface unit (36) transmits to the user's computer a screen composed of an ontology view showing the semantic information knowledge base, a document view showing the document, and an annotation view showing annotation information. The interface unit (36) also highlights on the screen the document components that the annotation automatic processing unit reports as annotatable.

FIG. 3 illustrates the screen layout of a semantic annotation system according to an embodiment of the present invention. As shown in FIG. 3, the screen consists of an ontology view on the left showing the semantic information knowledge base, a document view at the upper right showing the document, and an annotation view at the lower right showing annotation information. The ontology view is split into upper and lower parts: the upper part shows classes as a tree, and the lower part shows instances or attributes as a list. The document view is given a larger area because it displays the document. The annotation view shows the attributes of semantic information; in particular, it shows information associated with the document active in the document view and the semantic information active in the ontology view.

The semantic information publishing unit (37) converts the annotations created for the document into SCOT (Social Semantic Cloud of Tags) format and stores them.

SCOT (Social Semantic Cloud of Tags) is an ontology for representing tags used in online communities; it represents tags structurally in RDF and at the same time defines them semantically. This lets users reuse tags or share them with other users, turning a social network into an environment in which people, content, and tags are semantically connected. The SCOT ontology is linked to FOAF, SIOC, and SKOS, and all activities of creating and sharing SCOT ontologies take place within the Semantic Web environment.

For compatibility with tag ontologies, the semantic annotation system according to an embodiment of the present invention converts annotations into the SCOT format, the most representative tag ontology, and stores and manages them in that form.

The facet browser (38) searches semantic information and annotations by specifying a value to match against at least one of a class, an instance, and an attribute. Faceted browsing is a method of searching by ontology component, exploiting the characteristics of the ontology. For example, to search for movies starring "Gam Woo Sung", the user specifies the Person class and enters the condition that the name attribute is "Gam Woo Sung" to find the "Gam Woo Sung" instance, and then finds the movie instances in the Movie class whose actor attribute is the "Gam Woo Sung" instance.

Accordingly, the facet browser (38) searches using a condition made up of a type, which determines the class to be searched; a facet, which is the attribute to be matched; and a value, which is the comparison value, namely the instance or attribute value that the attribute actually holds. The search results are semantic information instances or documents. For compound searches, the sequence of search steps is recorded.

In summary, the faceted browser receives a search condition on an attribute, searches for the semantic information instances that satisfy the condition, and displays the results. That is, the primary search is by attribute, and the class and so on can be specified to narrow the search further.
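The (type, facet, value) search and the recorded search history can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory knowledge base, the function name `facet_search`, the instance identifiers, and the movie titles are all made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Instance = Dict[str, object]

# Toy knowledge base; identifiers, attribute names, and titles are hypothetical.
KB: List[Instance] = [
    {"id": "p1", "class": "Person", "name": "Gam Woo Sung"},
    {"id": "p2", "class": "Person", "name": "Another Actor"},
    {"id": "m1", "class": "Movie", "name": "Movie A", "actor": ["p1"]},
    {"id": "m2", "class": "Movie", "name": "Movie B", "actor": ["p2"]},
]


def facet_search(
    kb: List[Instance],
    type_: str,
    facet: str,
    value: object,
    history: Optional[List[Tuple[str, str, object]]] = None,
) -> List[Instance]:
    """Return instances of class `type_` whose attribute `facet` holds `value`."""
    results = []
    for inst in kb:
        if inst.get("class") != type_:
            continue
        actual = inst.get(facet)
        if actual is None:
            continue
        # An attribute may hold a single value or a list of values.
        candidates = actual if isinstance(actual, list) else [actual]
        if value in candidates:
            results.append(inst)
    if history is not None:
        history.append((type_, facet, value))  # record each step of a compound search
    return results


# Compound search: find the person first, then the movies that reference that instance.
history: List[Tuple[str, str, object]] = []
people = facet_search(KB, "Person", "name", "Gam Woo Sung", history)
movies = facet_search(KB, "Movie", "actor", people[0]["id"], history)
```

The second call uses the instance found by the first call as its comparison value, which mirrors the way the description chains a Person search into a Movie search.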

Next, the screens presented by the semantic annotation system according to an embodiment of the present invention are described with reference to FIGS. 4 to 7.

FIG. 4 illustrates a screen for loading a semantic information knowledge base according to an embodiment of the present invention. As shown in FIG. 4, when a knowledge base is loaded, the semantic information classes are displayed as a tree.

FIG. 5 illustrates a screen for opening a PDF document according to an embodiment of the present invention. As shown in FIG. 5, when a PDF document is opened, it is displayed at the upper right of the screen.

FIG. 6 illustrates a screen for automatic annotation processing according to an embodiment of the present invention. As shown in FIG. 6a, once annotation has been performed automatically, the annotated document components are activated in the document view, and the attributes of the semantic information are shown in the annotation view below. In addition, as shown in FIG. 6b, document components that can be annotated semi-automatically are highlighted in yellow.

FIG. 7 illustrates a screen for faceted browsing according to an embodiment of the present invention. As shown in FIG. 7a, the Person class is selected as the type, name is selected as the facet, and the name "Gam Woo Sung" is selected from the list that appears, which retrieves the "Gam Woo Sung" instance. The result is shown in the result pane (Information) at the bottom. As shown in FIG. 7b, the Movie class is then selected, and the name attribute is selected as the facet. At this point, the "Gam Woo Sung" instance retrieved immediately beforehand in the search history is used as the condition to find the movie instances in which "Gam Woo Sung" appears. The result appears in the result pane (Information) at the bottom.

Next, semantic annotation methods according to embodiments of the present invention are described with reference to FIGS. 8 and 9. FIG. 8 is a flowchart of a semantic annotation method according to an embodiment of the present invention, and FIG. 9 is a flowchart of a method of annotating the vocabulary items of a document with semantic information according to an embodiment of the present invention.

As shown in FIG. 8, a semantic annotation method according to an embodiment of the present invention comprises the following steps:

(a) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base (S10);

(b) loading some of the stored semantic information knowledge bases as designated by the user (S20);

(c) loading a document designated by the user (S30);

(d) for every component of the document, annotating the component with semantic information if semantic information related to the component exists in the semantic information knowledge base (S40);

(e) for semantic information and a document component designated by the user, annotating the document component with the semantic information (S50); and

(f) attaching the semantic information to the document (S60).

As shown in FIG. 9, step (d) is further divided into the following steps:

(d1) for each component of the document, performing steps (d2) to (d5) if the component is a vocabulary item (S41);

(d2) if a semantic information instance exists whose name attribute (value) is identical to the vocabulary item, annotating the vocabulary item with that instance and proceeding to step (d5); otherwise, proceeding to step (d3) (S42);

(d3) if a semantic information class exists whose morpheme attribute (value) is identical to the morpheme of the vocabulary item, creating an instance of that class and setting the value of the name attribute of the created instance to the vocabulary item (S43);

(d4) annotating the vocabulary item with the created instance (S44); and

(d5) performing steps (d2) to (d4) for all components of the document (S45).
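The flow of steps (d1) to (d5) can be sketched in Python as follows. The classes `SemClass`, `SemInstance`, and `KnowledgeBase`, and the dictionary layout of a document component (with a precomputed `morpheme` field standing in for the output of a lexical analyzer), are assumptions introduced for this illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SemClass:
    name: str
    morpheme: str  # morpheme attribute of the class


@dataclass
class SemInstance:
    cls: str   # name of the class this instance belongs to
    name: str  # name attribute of the instance


@dataclass
class KnowledgeBase:
    classes: List[SemClass] = field(default_factory=list)
    instances: List[SemInstance] = field(default_factory=list)


def annotate_words(
    components: List[Dict[str, str]], kb: KnowledgeBase
) -> List[Tuple[str, SemInstance]]:
    """Annotate word components with instances, following steps (d1) to (d5)."""
    annotations: List[Tuple[str, SemInstance]] = []
    for comp in components:  # (d1)/(d5): visit every component, handle only words
        if comp.get("type") != "word":
            continue
        word = comp["text"]
        # (d2): look for an existing instance whose name equals the word
        inst = next((i for i in kb.instances if i.name == word), None)
        if inst is None:
            # (d3): look for a class whose morpheme attribute equals the word's morpheme
            cls = next((c for c in kb.classes if c.morpheme == comp.get("morpheme")), None)
            if cls is None:
                continue  # no related semantic information; leave the word unannotated
            inst = SemInstance(cls=cls.name, name=word)
            kb.instances.append(inst)
        annotations.append((word, inst))  # (d2)/(d4): annotate the word with the instance
    return annotations
```

In this sketch a word that matches neither an instance name nor a class morpheme is simply skipped; the source does not state what happens in that case, so that behaviour is an assumption.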

In step (d), before a document component is annotated with the semantic information, the user is notified that the component can be annotated (S41).

In step (f), if the semantic information with which the document is annotated exists as an instance in the semantic information knowledge base, only link information for that instance is attached.

In step (f), the semantic information is attached to the document expressed in any one of the XML, XMP, and RDF formats.
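As a rough illustration of step (f), the sketch below serializes annotations to XML and writes only a reference when the annotated instance already exists in the knowledge base. The element and attribute names (`annotations`, `annotation`, `ref`, `property`) are invented for this example, and embedding the full properties when no such instance exists is an assumption; the source specifies only the link-only case.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def build_attachment(annotations: List[Dict], kb_instance_ids: Set[str]) -> str:
    """Serialize annotations to an XML string for attachment to a document."""
    root = ET.Element("annotations")
    for ann in annotations:
        el = ET.SubElement(root, "annotation", {"component": ann["component"]})
        if ann["instance_id"] in kb_instance_ids:
            # The instance lives in the knowledge base: attach link information only.
            el.set("ref", ann["instance_id"])
        else:
            # Assumed fallback: embed the semantic information itself.
            for name, value in ann["properties"].items():
                prop = ET.SubElement(el, "property", {"name": name})
                prop.text = value
    return ET.tostring(root, encoding="unicode")
```

Attaching a link instead of a copy keeps the attached data small and avoids a second copy drifting out of date when the knowledge base instance changes.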

In step (d), the document components reported as annotatable are highlighted on the screen.

The semantic annotation method further includes step (f0), performed before step (f), of converting the annotations created for the document into SCOT (Social Semantic Cloud of Tags) format and storing them.

The document is any one of a web document, a PDF document, an image file, a video file, and an audio file, and the components of the document include at least one of the whole document, text or vocabulary items, images, audio, video, and links.

The document is either a document stored on the user's computer or a document retrieved over the Internet.

Steps (b) and (c) may be performed simultaneously, or their order may be reversed.

Next, a semantic annotation method according to other embodiments of the present invention is described with reference to FIG. 10. FIG. 10 is a flowchart of a semantic annotation method according to another embodiment of the present invention.

As shown in FIG. 10, this semantic annotation method, which uses an ontology repository, comprises the following steps:

(g) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base (S210);

(h) creating and storing an automatic processing rule that, in order to determine whether a document component and semantic information are related, defines an attribute of the document component, specifies the attribute of semantic information to be compared with that attribute, and specifies the method of comparing the attributes that decides whether they are related (S215);

(i) loading some of the stored semantic information knowledge bases as designated by the user (S220);

(j) loading a document designated by the user (S230);

(k) for each component of the document, if an automatic processing rule specifies a semantic-information attribute to be compared with an attribute of the component, searching the semantic information knowledge base for semantic information having that attribute, comparing the attribute (value) of the retrieved semantic information with the attribute (value) of the component, and, if they are determined to be related, annotating the component with the retrieved semantic information (S240);

(l) for semantic information and a document component designated by the user, annotating the document component with the semantic information (S250); and

(m) attaching the semantic information to the document (S260).

The components of the document include vocabulary items, and in step (h) one of the following is created and stored as an automatic processing rule: a rule that defines the vocabulary item itself as its name attribute, sets the name attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical; or a rule that defines the morphemes contained in the vocabulary item as its morpheme attribute, sets the morpheme attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical.
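Step (k) amounts to a nested search: for each component and each stored rule, find the semantic information that carries the attribute named by the rule, then compare values. The sketch below shows that loop under simplifying assumptions: components and semantic information are plain dictionaries, every semantic information entry has an `id` key, and the `Rule` tuple and `auto_annotate` function are names invented for this example.

```python
import operator
from typing import Callable, Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    component_attr: str                        # attribute defined on the document component
    semantic_attr: str                         # attribute of semantic information to compare
    compare: Callable[[object, object], bool]  # comparison method that decides relatedness


def auto_annotate(
    components: List[Dict[str, object]],
    knowledge_base: List[Dict[str, object]],
    rules: List[Rule],
) -> List[Tuple[int, object]]:
    """Return (component index, semantic information id) pairs found by the rules."""
    annotations: List[Tuple[int, object]] = []
    for idx, comp in enumerate(components):
        for rule in rules:
            if rule.component_attr not in comp:
                continue  # the rule does not apply to this component
            for info in knowledge_base:
                if rule.semantic_attr not in info:
                    continue  # only semantic information having the attribute is searched
                if rule.compare(comp[rule.component_attr], info[rule.semantic_attr]):
                    annotations.append((idx, info["id"]))
    return annotations


# Example rule: the word's name must equal the name of the semantic information.
NAME_EQUALS = Rule("name", "name", operator.eq)
```

A linear scan is used here for clarity; a real knowledge base would more likely answer the attribute lookup with an index or a query language, which the source leaves unspecified.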

For any part of the semantic annotation method that is not fully described here, refer to the description of the semantic annotation system given above.

The invention made by the present inventors has been described in detail above with reference to the embodiments; however, the invention is not limited to those embodiments and may of course be modified in various ways without departing from its gist.

The present invention is applicable to searching large collections of documents. In particular, for documents that are commonplace on the Internet, such as web documents and PDFs, it is applicable to finding desired documents effectively by performing searches based on semantic interoperability in addition to simple keyword search.

The present invention is also applicable to building a single overall knowledge base by classifying and annotating large collections of documents on the basis of semantic interoperability.

FIG. 1a shows the overall system configuration for implementing the semantic annotation system and method according to the present invention in a PC-embedded form.

FIG. 1b shows the overall system configuration for implementing the semantic annotation system and method according to the present invention over a network.

FIG. 3 illustrates the screen layout of a semantic annotation system according to an embodiment of the present invention.

FIG. 4 illustrates a screen for loading a semantic information knowledge base according to an embodiment of the present invention.

FIG. 5 illustrates a screen for opening a PDF document according to an embodiment of the present invention.

FIG. 6 illustrates a screen for automatic annotation processing according to an embodiment of the present invention.

FIG. 7 illustrates a screen for faceted search according to an embodiment of the present invention.

FIG. 8 is a flowchart of a semantic annotation method according to an embodiment of the present invention.

FIG. 9 is a flowchart of a method of annotating the vocabulary items of a document with metadata according to an embodiment of the present invention.

FIG. 10 is a flowchart of a semantic annotation method according to a second embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings *

5: Internet    10: user terminal

20: document server    25: document

30: semantic annotation system    40: ontology DB

31: semantic information management unit    32: document management unit

33: annotation automatic processing unit    34: annotation creation unit

35: automatic processing rule management unit    36: interface unit

37: semantic information publishing unit

Claims

A semantic annotation system having an ontology repository, the system comprising:

a semantic information management unit that stores semantic information in the ontology repository to build a semantic information knowledge base;

a document management unit that loads a document and attaches semantic information to the document;

an annotation automatic processing unit that, when semantic information related to a component of the document exists in the semantic information knowledge base, annotates that component of the document with the semantic information; and

an annotation creation unit that, for semantic information and a document component designated by a user, annotates the document component with the semantic information.

The system of claim 1, wherein the annotation automatic processing unit

notifies the user that a component of the document can be annotated before automatically annotating it with the semantic information.

The system of claim 1,

wherein the components of the document include vocabulary items, and

the annotation automatic processing unit

determines that a relationship exists when a vocabulary item of the document is identical to the name attribute (value) of a semantic information instance, or

when a morpheme of a vocabulary item of the document is identical to the morpheme attribute (value) of a semantic information class.

The system of claim 3, wherein the annotation automatic processing unit,

if a semantic information class exists whose morpheme attribute (value) is identical to a morpheme of a vocabulary item of the document, creates an instance of the class, sets the value of the name attribute of the created instance to the vocabulary item of the document, and annotates the vocabulary item with the instance.

The system of claim 1, wherein the document management unit,

if the semantic information with which the document is annotated exists as an instance in the semantic information knowledge base, attaches only link information for the instance.

The system of claim 1, wherein the document management unit

attaches the semantic information to the document expressed in any one of the XML, XMP, and RDF formats.

The system of claim 1,

wherein the semantic information management unit loads a knowledge base designated by the user from among the semantic information knowledge bases stored in the ontology repository, and

the annotation automatic processing unit checks for related semantic information only in the loaded semantic information knowledge base.

The system of claim 1,

further comprising an interface unit that transmits to the user's computer a screen composed of an ontology view showing the semantic information knowledge base, a document view showing the document, and an annotation view showing annotation information,

wherein the annotation information is information about the semantic information active in the ontology view with which the document component active in the document view is annotated.

The system of claim 8, wherein the interface unit

highlights on the screen the document components that the annotation automatic processing unit reports as annotatable.

The system of claim 1,

further comprising a faceted browser that receives a search condition on an attribute, searches for semantic information instances that satisfy the search condition, and displays the results.

The system of claim 1,

further comprising a semantic information publishing unit that converts the semantic information created for the document into SCOT (Social Semantic Cloud of Tags) format and stores it.

The system of claim 1,

wherein the document is any one of a web document, a PDF document, an image file, a video file, and an audio file, and

the components of the document include at least one of the whole document, text or vocabulary items, images, audio, video, and links.

The system of claim 1,

wherein the document is a document stored on the user's computer or a document retrieved over the Internet.

A semantic annotation system having an ontology repository, the system comprising:

a semantic information management unit that stores semantic information (metadata) in the ontology repository to build a semantic information knowledge base;

an automatic processing rule management unit that stores and manages automatic processing rules, each of which, in order to determine whether a document component and semantic information are related, defines an attribute of the document component, specifies the attribute of semantic information to be compared with that attribute, and specifies the method of comparing the attributes that decides whether they are related; and

an annotation automatic processing unit that, for each component of the document, if an automatic processing rule specifies a semantic-information attribute to be compared with an attribute of the component, searches the semantic information knowledge base for semantic information having that attribute, compares the attribute (value) of the retrieved semantic information with the attribute (value) of the component, and, if they are determined to be related, annotates the component with the retrieved semantic information.

The system of claim 14,

wherein the components of the document include vocabulary items, and

the automatic processing rule management unit stores, as an automatic processing rule, a rule that defines the vocabulary item itself as its name attribute, sets the name attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical;

or a rule that defines the morphemes contained in the vocabulary item as its morpheme attribute, sets the morpheme attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical.

A semantic annotation method using an ontology repository, the method comprising:

(a) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base;

(b) loading some of the stored semantic information knowledge bases as designated by a user;

(c) loading a document designated by the user;

(d) for every component of the document, annotating the component with semantic information if semantic information related to the component exists in the semantic information knowledge base;

(e) for semantic information and a document component designated by the user, annotating the document component with the semantic information; and

(f) attaching the semantic information to the document.

The method of claim 16, wherein step (d)

includes notifying the user that a component of the document can be annotated before annotating it with the semantic information.

The method of claim 16, wherein step (d) comprises:

(d1) for each component of the document, performing steps (d2) to (d5) if the component is a vocabulary item;

(d2) if a semantic information instance exists whose name attribute (value) is identical to the vocabulary item, annotating the vocabulary item with that instance and proceeding to step (d5); otherwise, proceeding to step (d3);

(d3) if a semantic information class exists whose morpheme attribute (value) is identical to the morpheme of the vocabulary item, creating an instance of that class and setting the value of the name attribute of the created instance to the vocabulary item;

(d4) annotating the vocabulary item with the created instance; and

(d5) performing steps (d2) to (d4) for all components of the document.

The method of claim 16, wherein step (f) comprises:

if the semantic information with which the document is annotated exists as an instance in the semantic information knowledge base, attaching only link information for the instance.

The method of claim 16, wherein step (f) comprises:

attaching the semantic information to the document expressed in any one of the XML, XMP, and RDF formats.

The method of claim 17, wherein step (d)

highlights on the screen the document components reported as annotatable.

The method of claim 16,

further comprising (f0) converting the semantic information created for the document into SCOT (Social Semantic Cloud of Tags) format and storing it, before step (f).

The method of claim 16,

wherein the components of the document include at least one of the whole document, text or vocabulary items, images, audio, video, and links.

The method of claim 16,

wherein the document is a document stored on the user's computer or a document retrieved over the Internet.

A semantic annotation method using an ontology repository, the method comprising:

(g) storing semantic information (metadata) in the ontology repository to build a semantic information knowledge base;

(h) creating and storing an automatic processing rule that, in order to determine whether a document component and semantic information are related, defines an attribute of the document component, specifies the attribute of semantic information to be compared with that attribute, and specifies the method of comparing the attributes that decides whether they are related;

(i) loading some of the stored semantic information knowledge bases as designated by a user;

(j) loading a document designated by the user;

(k) for each component of the document, if an automatic processing rule specifies a semantic-information attribute to be compared with an attribute of the component, searching the semantic information knowledge base for semantic information having that attribute, and, if the attribute (value) of the retrieved semantic information and the attribute (value) of the component are determined to be related, annotating the component with the retrieved semantic information;

(l) for semantic information and a document component designated by the user, annotating the document component with the semantic information; and

(m) attaching the semantic information to the document.

The method of claim 25,

wherein the components of the document include vocabulary items, and

in step (h), one of the following is stored as an automatic processing rule:

a rule that defines the vocabulary item itself as its name attribute, sets the name attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical;

or a rule that defines the morphemes contained in the vocabulary item as its morpheme attribute, sets the morpheme attribute as the semantic-information attribute to be compared with it, and determines that a relationship exists if the attributes are identical.

A computer-readable recording medium having recorded thereon the semantic annotation method of claim 16.