KR20090119383A

KR20090119383A - System and method for providing terminology resource

Info

Publication number: KR20090119383A
Application number: KR1020080045388A
Authority: KR
Inventors: 정도헌; 최희윤; 신기정; 김환민; 이상환; 김혜선; 최호남; 예용희
Original assignee: 한국과학기술정보연구원
Priority date: 2008-05-16
Filing date: 2008-05-16
Publication date: 2009-11-19
Also published as: KR100945495B1

Abstract

PURPOSE: A multilingual terminology resource providing system is provided that extends specialized terminology to a multilingual environment and offers the user search results that reflect similarity. CONSTITUTION: A multilingual semantic network construction device (130) generates translation-equivalent files by referring to the keyword field of each document stored in a bibliographic information database (100). When a search term is entered, a multilingual terminology inference device (150) extracts the semantic network for the search term from the multilingual semantic network construction device. The multilingual terminology inference device then computes the similarity between the constituent keywords relative to the search term and provides the resulting relationships as visualization information.

Description

System and Method for providing terminology resource

The present invention relates to a system and method for providing multilingual terminology resources. More specifically, it relates to a system and method that generate translation-equivalent files from the keyword fields of a bibliographic database and organize the individual translation-equivalent files into a triple structure according to similarity, so that semantics-based automatic inference can be performed over the resulting multilingual terminology semantic network.

As information and communication technology advances, a wide range of research on building semantic language resources is under way. This research aims to improve the efficiency of information retrieval by interpreting semantic and conceptual characteristics on the basis of the lexical, syntactic, and discourse meaning of various language resources.

Most previous research has focused on developing systems and environments that use external language resources, such as dictionaries and thesauri, to close the gap that arises between users and information retrieval systems during search. In other words, building a knowledge base that makes terms interpretable and studying semantic interpretation techniques are the central concerns of a language resource system. However, when language resources imported from outside are used, it is difficult to match a user's query precisely, in semantic terms, with the contents of the database that has actually been built. Applied research is therefore needed to match index terms extracted from the literature with the concept terms of a thesaurus.

Moreover, retrieval performance experiments based on language resources, including automatic classification and automatic indexing experiments, are difficult to carry out in real large-scale database environments, so they are mostly conducted at an experimental level on curated language resources.

Improving retrieval performance requires high-quality language resources suited to the characteristics of each database and service. However, building specialized terminology for each field requires domain experts to do the work themselves, which demands enormous human resources, budget, and time.

For this reason, terminology is currently being built only for a few specific fields, and building broad language resources covering many fields, such as all of science and technology, is very difficult in practice. In addition, there is a severe shortage of usable terminology resources that are built in Korean, or in multiple languages centered on Korean.

Manual construction methods, such as those used for dictionaries and thesauri, define the semantic relationships between terms by explicitly declaring the relationships between technical terms. With such conventional methods, however, the semantic relationships between terms depend heavily on the level of expertise of the expert involved, and the degree of similarity between the constructed terms cannot be determined. This can reduce the usability of the language resource.

Furthermore, most experiments on term processing have been carried out on resources of limited size, so questions about the scale and reliability of the language resources remain when they are applied to a real system.

In addition, it is known to be nearly impossible to integrate language resources built for separate fields and reorganize them into a macro-level resource unless the integration problem has been considered in advance. And because terminology is a dynamic resource that changes frequently, the linked language resources should be updated automatically whenever the source data changes, but conventional approaches do not achieve this.

Finally, with regard to multilingual systems, most information retrieval services currently in operation support mainly English and the local language. To search for information scattered across the Internet with such search engines, the user must know exactly how the word in question is written in the relevant language, and the language supported by the search engine must be available on the user's system. If these conditions are not met, the user either cannot access the desired information at all when searching for the word or needs an additional system.

The present invention has been devised to solve the problems described above. Its object is to provide a multilingual terminology resource providing system and method that can automatically extend terminology to a multilingual environment and infer a semantics-based network of similar language resources, automatically perform semantic interpretation according to the semantic similarity between terms, and provide search results that reflect similarity to a user who searches on a specific term.

Another object of the present invention is to provide a multilingual terminology resource providing system and method that form individual translation-equivalent file information into the triple data structure, a standard description rule of the Semantic Web, and enable semantics-based automatic inference over the multilingual terminology network formed in this way.

Still another object of the present invention is to provide a multilingual terminology resource providing system and method that can automatically update the semantic network of multilingual terminology by reflecting updates to the bibliographic information database in real time.

To achieve the above objects, according to one aspect of the present invention, there is provided a multilingual terminology resource providing system comprising: a bibliographic information database; a multilingual semantic network construction device that generates translation-equivalent files by referring to the keyword field of each document stored in the bibliographic information database, converts the translation-equivalent files into a triple structure, and then generates a semantic network by connecting the keywords as nodes according to the property information of each keyword; and a multilingual terminology inference device that, when a user enters a search term, extracts the semantic network for the search term from the multilingual semantic network construction device, computes the similarity between the constituent keywords of the extracted semantic network relative to the search term, and provides the resulting relationships to the user as visualization information.

In the bibliographic information database, each document is stored after being converted to Unicode.

The multilingual semantic network construction device detects in real time whether the bibliographic information database has been updated and, if it has, regenerates the semantic network.

The multilingual terminology inference device extracts, from the multilingual semantic network construction device, the keywords for which a triple structure has been generated with the search term entered by the user, generates a semantic network by connecting the extracted keywords as nodes according to their property information with respect to the search term, computes the similarity between the constituent keywords of the generated semantic network relative to the search term, and provides the resulting relationships to the user as visualization information.

According to another aspect of the present invention, there is provided a multilingual semantic network construction device for providing multilingual terminology resources, which builds a multilingual semantic network for documents stored in a bibliographic information database, the device comprising: a translation-equivalent file generation unit that generates translation-equivalent files by referring to the keyword field of the bibliographic information database; a triple structure generation unit that generates, as a triple structure, the relationships between the translation equivalents generated by the translation-equivalent file generation unit; a semantic network generation unit that generates a semantic network by connecting the keywords of the triple structure generated by the triple structure generation unit as nodes according to property information; and a semantic information mapping unit that generates semantic information for each keyword in the semantic network generated by the semantic network generation unit and stores it mapped to the corresponding keyword.

The multilingual semantic network construction device further includes a bibliographic information update detection unit that detects in real time whether the bibliographic information database has been updated and, if it has, requests the translation-equivalent file generation unit to generate translation-equivalent files.
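The update detection described above can be sketched as follows. The source does not say how updates are detected, so this minimal example assumes a polling approach that fingerprints the keyword fields and triggers a rebuild when the fingerprint changes. The names `fingerprint`, `UpdateDetector`, and the `rebuild` callback are hypothetical.

```python
import hashlib


def fingerprint(records):
    """Hash all (document id, keyword list) entries in a stable order."""
    digest = hashlib.sha256()
    for doc_id, keywords in sorted(records.items()):
        digest.update(repr((doc_id, keywords)).encode("utf-8"))
    return digest.hexdigest()


class UpdateDetector:
    """Hypothetical stand-in for the update detection unit (132)."""

    def __init__(self, rebuild):
        self._rebuild = rebuild  # callback that regenerates the network
        self._last = None        # fingerprint seen on the previous check

    def check(self, records):
        """Call the rebuild callback only when the records have changed."""
        current = fingerprint(records)
        if current != self._last:
            self._last = current
            self._rebuild(records)
            return True
        return False


rebuild_calls = []
detector = UpdateDetector(rebuild_calls.append)
database = {"doc1": ["의미망", "semantic network"]}
detector.check(database)   # first check always rebuilds
detector.check(database)   # unchanged, no rebuild
database["doc2"] = ["시소러스", "thesaurus"]
detector.check(database)   # changed, rebuilds again
```

A production system would more likely rely on database triggers or change logs than on rehashing every record, but the control flow would be the same: detect a change, then rerun the pipeline from file generation onward.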

The multilingual semantic network construction device further includes a translation-equivalent file information database that stores the translation-equivalent files generated by the translation-equivalent file generation unit, a triple data database that stores the triple structure data generated by the triple structure generation unit, and a semantic network information database in which semantic information is mapped to each keyword.

The multilingual semantic network construction device also further includes a format conversion unit that converts the bibliographic information stored in the bibliographic information database to Unicode format and stores it.

The translation-equivalent file generation unit generates translation-equivalent files by extracting multilingual keyword pairs from the keyword field of the bibliographic information database.

In addition, the translation-equivalent file generation unit selects as valid, from among the generated translation-equivalent files, those whose frequency of occurrence is at or above a fixed threshold set for each domain.
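The frequency filter described above can be illustrated with a short sketch. The source gives no threshold values or data layout, so the per-domain thresholds, the default threshold, and the `(domain, pair)` observation format are assumptions made only for illustration.

```python
from collections import Counter


def select_valid_pairs(observations, thresholds, default_threshold=2):
    """Keep pairs whose count within a domain reaches that domain's threshold.

    observations: iterable of (domain, pair) tuples, one per occurrence.
    thresholds:   dict mapping a domain to its minimum frequency (assumed).
    """
    counts = Counter(observations)
    return {
        key: n
        for key, n in counts.items()
        if n >= thresholds.get(key[0], default_threshold)
    }


observations = [
    ("computing", ("의미망", "semantic network")),
    ("computing", ("의미망", "semantic network")),
    ("computing", ("의미망", "meaning net")),
    ("library", ("시소러스", "thesaurus")),
]
valid = select_valid_pairs(observations, {"computing": 2, "library": 1})
```

Here the pair seen only once in the `computing` domain is dropped, while the single `library` observation survives because that domain's threshold is lower.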

The triple structure generation unit uses the standard description rules of the Semantic Web to generate, as a triple structure, the relationships between the translation-equivalent keywords generated by the translation-equivalent file generation unit. A relationship between translation equivalents in different languages is given a symmetric property (Symmetry Property), and a relationship between keywords in the same language is given a dummy property (Dummy Property).

In addition, the triple structure generation unit assigns a unique URI to each keyword in the generated triple structure.
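A minimal sketch of the triple generation and URI assignment described above follows. The namespace `http://example.org/term/`, the property labels `symmetricProperty` and `dummyProperty`, and the `(language, keyword)` input format are hypothetical; the source names the two property types but does not specify a URI scheme or serialization.

```python
from urllib.parse import quote

BASE = "http://example.org/term/"  # hypothetical namespace, not from the source


def term_uri(lang, keyword):
    """Build a unique URI from the language code and the escaped keyword."""
    return f"{BASE}{lang}/{quote(keyword)}"


def make_triple(a, b):
    """Return (subject, property, object) for two (language, keyword) items.

    Different languages get the symmetric property; the same language
    gets the dummy property.
    """
    (lang_a, kw_a), (lang_b, kw_b) = a, b
    prop = "dummyProperty" if lang_a == lang_b else "symmetricProperty"
    return (term_uri(lang_a, kw_a), prop, term_uri(lang_b, kw_b))


cross = make_triple(("ko", "의미망"), ("en", "semantic network"))
same = make_triple(("en", "thesaurus"), ("en", "thesauri"))
```

In an RDF/OWL setting the symmetric relation would naturally be declared as an `owl:SymmetricProperty`, though the source does not state which vocabulary is used.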

The semantic network generation unit interprets keywords linked by a dummy property (Dummy Property) in the relationships generated by the triple structure generation unit as the same instance (Instance), and generates the semantic network by connecting, around those keywords, the keywords linked by a symmetric property (Symmetry Property).

In addition, for keywords that are interpreted as the same instance (Instance) because they share a dummy property (Dummy Property), the semantic network generation unit stores the URIs of all of those keywords together.
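The network construction described above can be sketched as follows. This example assumes triples of the form `(subject, property, object)` with hypothetical property labels, and uses a union-find structure to merge keywords linked by the dummy property; the source does not name a merging algorithm, so union-find is a substitute chosen for illustration.

```python
def build_network(triples):
    """Merge dummy-linked keywords into instances, then add symmetric edges."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pass 1: keywords linked by the dummy property become one instance.
    for s, p, o in triples:
        find(s)
        find(o)
        if p == "dummyProperty":
            parent[find(s)] = find(o)

    # Each instance keeps every URI that was merged into it.
    instances = {}
    for uri in parent:
        instances.setdefault(find(uri), set()).add(uri)

    # Pass 2: symmetric relations become undirected edges between instances.
    edges = set()
    for s, p, o in triples:
        if p == "symmetricProperty":
            a, b = find(s), find(o)
            if a != b:
                edges.add(frozenset((a, b)))
    return instances, edges


triples = [
    ("en:thesaurus", "dummyProperty", "en:thesauri"),
    ("ko:시소러스", "symmetricProperty", "en:thesaurus"),
    ("ko:시소러스", "symmetricProperty", "en:thesauri"),
]
instances, edges = build_network(triples)
```

Because the two English spellings collapse into one instance, the two symmetric triples yield a single edge, and the merged instance still records both original URIs.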

The semantic information mapping unit uses the URI assigned by the triple structure generation unit to determine the domain classification information of each keyword, computes a similarity vector containing the frequency of occurrence and the weight of the keyword within that classification, and stores the vector mapped to the keyword.
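One possible shape for such a similarity vector is sketched below. The source states only that the vector contains a frequency and a weight per classification; the weighting used here (relative frequency across the keyword's classifications) and the `(keyword, category)` input format are assumptions.

```python
def similarity_vectors(occurrences):
    """Map each keyword to {category: (frequency, weight)}.

    occurrences: iterable of (keyword, category) observations.
    The weight is the keyword's relative frequency in that category,
    which is an assumed scheme, not one taken from the source.
    """
    freq = {}
    for keyword, category in occurrences:
        by_category = freq.setdefault(keyword, {})
        by_category[category] = by_category.get(category, 0) + 1

    vectors = {}
    for keyword, by_category in freq.items():
        total = sum(by_category.values())
        vectors[keyword] = {
            category: (n, n / total) for category, n in by_category.items()
        }
    return vectors


vectors = similarity_vectors([
    ("ontology", "computing"),
    ("ontology", "computing"),
    ("ontology", "library science"),
    ("thesaurus", "library science"),
])
```

Other weightings, such as TF-IDF over the classifications, would fit the same structure.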

According to still another aspect of the present invention, there is provided a multilingual terminology inference device for providing multilingual terminology resources, which provides information on a search term queried by a user, the device comprising: a search term extraction unit that, when a user enters a search term, extracts the search term from the database of the multilingual semantic network construction device; a semantic network information extraction unit that extracts from the database the semantic network in which nodes having a related property (Property) with the search term are set; a similarity measurement unit that measures the similarity between nodes in the extracted semantic network relative to the search term; and a visualization information generation unit that generates visualization information in which the measured similarity is displayed on the nodes of the semantic network and provides it to the user.

According to still another aspect of the present invention, there is provided a multilingual terminology inference device for providing multilingual terminology resources, which provides information on a search term queried by a user, the device comprising: a search term extraction unit that, when a user enters a search term, extracts the search term from the database of the multilingual semantic network construction device; a triple structure information extraction unit that extracts from the database the keywords connected to the search term in a triple structure; a semantic network information generation unit that generates a semantic network by connecting the extracted keywords as related nodes according to their property information relative to the search term, generates semantic information for each keyword in the generated semantic network, and maps it to the corresponding keyword; a similarity measurement unit that measures the similarity between nodes in the generated semantic network relative to the search term; and a visualization information generation unit that calculates the position of each node in the semantic network according to the measured similarity, generates visualization information, and provides it to the user.

The similarity measurement unit computes the similarity between the search term and the keyword at each node.
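The source does not name a similarity measure. As a minimal sketch, assuming each keyword is represented by a sparse vector of per-category weights, cosine similarity relative to the search term could be computed as follows; `cosine` and `rank_by_similarity` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, vectors):
    """Score every other keyword against the query, most similar first."""
    q = vectors[query]
    scored = [(kw, cosine(q, v)) for kw, v in vectors.items() if kw != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)


weights = {
    "ontology": {"computing": 2.0, "library science": 1.0},
    "thesaurus": {"library science": 1.0},
    "compiler": {"computing": 1.0},
}
ranked = rank_by_similarity("ontology", weights)
```

Cosine similarity is used here only because it is a common choice for frequency-weighted vectors; any measure that maps a keyword pair to a score would fit the same role.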

The visualization information generation unit uses the inter-keyword similarity values measured by the similarity measurement unit to generate visualization information with an optimal semantic network and provides it to the user.

The visualization information includes the semantic network and a similarity threshold control. When the threshold is adjusted using the similarity threshold control, the visualization information generation unit controls the size of the semantic network according to that threshold.
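The effect of the threshold control can be sketched as a simple filter over per-node similarity scores. The score dictionary and the function name `prune_network` are hypothetical; the source says only that the threshold governs the size of the displayed network.

```python
def prune_network(similarities, threshold):
    """Keep only the nodes whose similarity to the search term meets the threshold."""
    return {kw: s for kw, s in similarities.items() if s >= threshold}


scores = {"thesaurus": 0.9, "ontology": 0.6, "compiler": 0.2}
wide = prune_network(scores, 0.1)    # low threshold, larger network
narrow = prune_network(scores, 0.5)  # higher threshold, smaller network
```

Raising the threshold shrinks the network toward the closest terms, and lowering it brings more distant terms back into view.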

According to still another aspect of the present invention, there is provided a method of constructing a multilingual semantic network for providing multilingual terminology resources, in which a multilingual semantic network construction device builds a multilingual semantic network using documents stored in a bibliographic information database, the method comprising: (a) generating translation-equivalent files by referring to the keyword field of the bibliographic information database; (b) generating the relationships between the generated translation equivalents as a triple structure; (c) interpreting keywords that have a dummy property (Dummy Property) in the relationships between the translation equivalents of the generated triple structure as the same instance (Instance), and generating a semantic network by connecting, around those keywords, the keywords that have a symmetric property (Symmetry Property); and (d) generating semantic information for each keyword in the generated semantic network and storing it mapped to the corresponding keyword.

There is also provided a method of constructing a multilingual semantic network for providing multilingual terminology resources that further includes performing steps (a) through (d) when an update of the bibliographic information database is detected.

Step (a) includes extracting keywords from the keyword field of each document stored in the bibliographic information database and, when the extracted keywords are in two or more languages, generating a translation-equivalent file by matching the extracted keywords with one another.
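Step (a) can be sketched as follows. The source does not say how keywords in different languages are matched, so this example assumes that each record lists its keywords per language in the same order and pairs them by position. The record layout and the name `extract_pairs` are hypothetical.

```python
from itertools import combinations


def extract_pairs(record):
    """Pair keywords across languages for one document record.

    record: dict mapping a language code to its keyword list.
    Assumption: the lists are aligned by position across languages.
    """
    langs = sorted(lang for lang, keywords in record.items() if keywords)
    if len(langs) < 2:
        return []  # a single-language record yields no translation pairs
    pairs = []
    for lang_a, lang_b in combinations(langs, 2):
        for kw_a, kw_b in zip(record[lang_a], record[lang_b]):
            pairs.append(((lang_a, kw_a), (lang_b, kw_b)))
    return pairs


record = {
    "ko": ["의미망", "시소러스"],
    "en": ["semantic network", "thesaurus"],
}
pairs = extract_pairs(record)
```

Positional alignment breaks down when authors list keywords in different orders per language, which is one reason a frequency threshold over the whole collection is useful as a later filter.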

In step (b), the triple structure is generated using the standard description rules of the Semantic Web, giving a symmetric property (Symmetry Property) to translation-equivalent files in different languages and a dummy property (Dummy Property) to translation-equivalent files in the same language.

Step (d) includes determining classification information using the URI assigned to each keyword in the triple structure, and computing a similarity vector containing the frequency of occurrence and the weight of the keyword within the determined classification and storing it mapped to the keyword.

According to still another aspect of the present invention, there is provided a multilingual terminology inference method for providing multilingual terminology resources, in which a multilingual terminology inference device searches for a query entered by a user, the method comprising: (a) when a user enters a search term, extracting from a database the semantic network containing keywords for which a similarity relationship with the search term is set; (b) measuring the similarity between nodes in the extracted semantic network relative to the search term; and (c) calculating the position of each node in the semantic network according to the measured similarity, generating visualization information, and providing the generated visualization information to the user.
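Step (c) does not specify how node positions are derived from similarity. One simple possibility, shown here as an assumption, is a radial layout that places the search term at the origin and each related keyword at a distance of one minus its similarity; the function name `radial_layout` is hypothetical.

```python
import math


def radial_layout(similarities):
    """Place each keyword on a circle of radius (1 - similarity).

    similarities: dict mapping a keyword to its similarity in [0, 1]
    relative to the search term, which sits at the origin (0, 0).
    """
    positions = {}
    count = len(similarities)
    for index, (keyword, sim) in enumerate(sorted(similarities.items())):
        angle = 2 * math.pi * index / count  # spread nodes evenly by angle
        radius = 1.0 - sim                   # more similar means closer
        positions[keyword] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


layout = radial_layout({"thesaurus": 0.9, "ontology": 0.6, "compiler": 0.2})
```

Force-directed layouts are a common alternative when edges between the related keywords also need to be drawn without overlap.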

According to still another aspect of the present invention, there is provided a multilingual terminology inference method for providing multilingual terminology resources, in which a multilingual terminology inference device searches for a query entered by a user, the method comprising: (a) when a user enters a search term, searching a database and extracting the translation equivalents for which a triple structure has been generated with the search term; (b) generating a semantic network for the extracted translation equivalents using their property information with respect to the search term; (c) generating semantic information for each keyword in the generated semantic network and mapping it to the corresponding keyword; (d) measuring the similarity between keywords in the generated semantic network relative to the search term; and (e) calculating the position of each node in the semantic network according to the measured similarity, generating visualization information, and providing the generated visualization information to the user.

In step (b), among the extracted translation equivalents, those that have a dummy property (Dummy Property) with the search term are interpreted as the same instance (Instance), and the semantic network is generated by connecting, around the search term, the translation equivalents that have a symmetric property (Symmetry Property).

In step (c), the domain classification information of each keyword is determined, and a similarity vector containing the frequency of occurrence and the weight of the keyword within that classification is computed and mapped to the keyword.

The present invention automatically builds a multilingual terminology semantic network by clustering, according to similarity, the individual translation-equivalent file information extracted automatically from the bibliographic database. It can therefore provide a multilingual terminology resource providing system and method in which the constructed language resources are more reliable, language resources covering a wide variety of fields can be built automatically, and the semantic network of multilingual terminology is updated automatically by reflecting changes to the actual database in real time.

In addition, the invention can provide a multilingual terminology resource providing system and method that automatically extend terminology to a multilingual environment and infer a semantics-based network of similar language resources, automatically perform semantic interpretation according to the semantic similarity between terms, and provide search results that reflect similarity to a user who searches on a specific term.

The above objects, technical configurations, and resulting effects of the present invention will be understood more clearly from the following detailed description based on the drawings attached to this specification.

FIG. 1 is a diagram showing the configuration of a multilingual terminology resource providing system according to the present invention.

Referring to FIG. 1, the multilingual terminology resource providing system includes a bibliographic information database (100), a multilingual semantic network construction device (130), and a multilingual terminology inference device (150).

The bibliographic information database (100) stores bibliographic information in specialized fields, such as domestic and international academic papers, patents, and research reports. The bibliographic information is stored after being converted to Unicode. For example, Korean code (KSC5601), Japanese code (Shift-JIS), and Simplified Chinese code (GB2312) are each converted to UTF-8 in real time.
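The conversion described above can be sketched with Python's standard codecs. The mapping from the charset labels used in the text to Python codec names is an assumption for illustration (Python's `euc_kr` codec covers KS C 5601 text), and `to_utf8` is a hypothetical helper name.

```python
# Assumed mapping from the charset labels in the text to Python codec names.
SOURCE_CODECS = {
    "KSC5601": "euc_kr",
    "Shift-JIS": "shift_jis",
    "GB2312": "gb2312",
}


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from a legacy charset and re-encode them as UTF-8."""
    text = raw.decode(SOURCE_CODECS[charset])
    return text.encode("utf-8")


korean = "의미망".encode("euc_kr")
japanese = "用語".encode("shift_jis")
chinese = "术语".encode("gb2312")

converted = [
    to_utf8(korean, "KSC5601"),
    to_utf8(japanese, "Shift-JIS"),
    to_utf8(chinese, "GB2312"),
]
```

Normalizing everything to one Unicode encoding is what lets keywords from different languages be stored, compared, and linked in a single network.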

The multilingual semantic network construction device (130) generates translation-equivalent files by referring to the keyword field of each document stored in the bibliographic information database (100), converts the translation-equivalent files into a triple structure, and then generates a semantic network by connecting the nodes according to the property information (Property) between them.

In addition, the multilingual semantic network construction device (130) detects in real time whether the bibliographic information database (100) has been updated and, when the bibliographic information has been updated, regenerates the semantic network.

The multilingual semantic network construction device (130) that performs this role is described in detail with reference to FIG. 2.

When a user (170) enters a search term, the multilingual terminology inference device (150) extracts the semantic network for the search term from the multilingual semantic network construction device (130), computes the similarity between the constituent keywords of the extracted semantic network relative to the search term, generates visualization information representing the resulting relationships, and provides it to the user (170).

Alternatively, when a user enters a search term, the multilingual terminology inference device (150) searches the database of the multilingual semantic network construction device (130), extracts the keyword information for which a triple structure has been generated with the search term, and generates a semantic network by connecting the extracted keywords as nodes according to their similarity to the search term. The multilingual terminology inference device (150) then computes the similarity between the constituent nodes of the generated semantic network relative to the search term and provides the resulting relationships to the user as visualization information.

The multilingual terminology inference device (150) that performs these functions is described in detail with reference to FIGS. 5 and 6.

FIG. 2 is a block diagram schematically showing the configuration of the multilingual semantic network construction device according to the present invention, FIG. 3 shows an example of a triple structure generated by the triple structure generation unit of FIG. 2, and FIG. 4 shows an example of a semantic network generated by the semantic network generation unit of FIG. 2.

Referring to FIG. 2, the multilingual semantic network construction device (130) includes: a bibliographic information update detection unit (132); a translation-equivalent file generation unit (134) that generates translation-equivalent files by referring to the keyword fields of the bibliographic information database; a triple structure generation unit (136) that expresses the relationships between the translation equivalents generated by the translation-equivalent file generation unit (134) as triples; a semantic network generation unit (138) that generates a semantic network by connecting the translation equivalents in those triples as nodes according to their property information; a semantic information mapping unit (140) that generates semantic information for each keyword in the generated semantic network and stores it mapped to that keyword; and a database (142).

The bibliographic information update detection unit (132) monitors the bibliographic information database for updates in real time and, when an update is detected, requests the translation-equivalent file generation unit (134) to generate translation-equivalent files.

The translation-equivalent file generation unit (134) generates translation-equivalent files by referring to the keyword fields of the bibliographic information database, either when it receives a generation request from the bibliographic information update detection unit (132) or when the multilingual semantic network is being built for the first time.

That is, the translation-equivalent file generation unit (134) extracts multilingual keyword pairs from the keyword fields of the bibliographic information database to generate translation-equivalent files, assigns a URI (Uniform Resource Identifier) to each file, and stores it.

The translation-equivalent file generation unit (134) extracts the keywords from the keyword field of each document and, when the extracted keywords are in two or more languages, matches each keyword with its translation equivalent.

When the bibliographic database has separate keyword fields for two languages, the keywords extracted from the two fields are matched against each other to generate the translation-equivalent file. If the translation pairs cannot be formed exactly, the matching is invalidated.

When a single keyword field in the bibliographic database contains two or more languages, the total number of keywords in the field is counted first. If the count is even, the translation-equivalent file generation unit (134) splits the extracted keywords into two halves and automatically detects the language code of each. If the two halves are in different languages (for example, Korean in the first half and English in the second), the keywords of the first half are matched with the keywords of the second half to form keyword pairs, from which the translation-equivalent file is generated.

As an example of this case, suppose the following is extracted from a keyword field: '단어 중의성 해소; 자동 태깅; 의미 분류; 사전 추출 정보 기반 태깅; 연어 공기 기반 태깅; word sense disambiguation; automatic tagging; sense classification; dictionary Information-based tagging; collocation co-occurrence-based tagging'.

The number of extracted keywords is even, so splitting them into two halves yields five candidate pairs. Language detection shows that the first five keywords are all Korean and the last five are all English, so matching is attempted.

That is, the five Korean keywords in the first half, '단어 중의성 해소; 자동 태깅; 의미 분류; 사전 추출 정보 기반 태깅; 연어 공기 기반 태깅', are matched against the five English keywords in the second half, 'word sense disambiguation; automatic tagging; sense classification; dictionary Information-based tagging; collocation co-occurrence-based tagging'.

This produces translation-equivalent files for '단어 중의성 해소 = word sense disambiguation', '자동 태깅 = automatic tagging', '의미 분류 = sense classification', '사전 추출 정보 기반 태깅 = dictionary Information-based tagging', and '연어 공기 기반 태깅 = collocation co-occurrence-based tagging'.

Because the Korean half may itself contain English keywords, the translation-equivalent file generation unit (134) in that case verifies that the English keywords in the two halves match. If they do not match, the matching performed so far is invalidated and no translation-equivalent file is generated.
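The single-field matching procedure described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `detect_language` and `pair_keywords`, the semicolon separator, and the script-based language check (Hangul versus Latin letters) are all assumptions. The sketch also simply rejects a half that mixes languages, rather than modeling the check on English keywords appearing in the Korean half.

```python
import re

# Assumption: language is guessed from the script of the characters only.
HANGUL = re.compile(r"[\uac00-\ud7a3]")
LATIN = re.compile(r"[A-Za-z]")


def detect_language(keyword):
    """Return 'ko' if the keyword contains Hangul, 'en' if it has Latin letters only."""
    if HANGUL.search(keyword):
        return "ko"
    if LATIN.search(keyword):
        return "en"
    return "unknown"


def pair_keywords(field, sep=";"):
    """Split a keyword field into two halves and pair them position by position.

    Returns a list of (first_half_keyword, second_half_keyword) tuples,
    or an empty list when the matching has to be invalidated.
    """
    keywords = [k.strip() for k in field.split(sep) if k.strip()]
    # The count must be even so that the field can be split into two halves.
    if len(keywords) < 2 or len(keywords) % 2 != 0:
        return []
    half = len(keywords) // 2
    first, second = keywords[:half], keywords[half:]
    first_langs = {detect_language(k) for k in first}
    second_langs = {detect_language(k) for k in second}
    # Each half must be in exactly one known language, and the two must differ.
    if len(first_langs) != 1 or len(second_langs) != 1:
        return []
    if "unknown" in first_langs | second_langs or first_langs == second_langs:
        return []
    return list(zip(first, second))
```

Applied to the ten-keyword example field, `pair_keywords` returns five Korean-English tuples; a field with an odd number of keywords, or one whose halves are in the same language, yields an empty list.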

In addition, the translation-equivalent file generation unit (134) keeps as valid only those translation-equivalent files whose frequency of occurrence is at or above a threshold set for each domain, which makes the extracted files more reliable. For example, for domestic (Korean) journals the threshold for 'Korean-English' files is set to 3, so that only pairs repeated three or more times are turned into triple data. For Chinese journals, in view of quality concerns, the threshold for 'Chinese-English' files is set to 10, so that only pairs repeated ten or more times are turned into triple data.
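A minimal sketch of this per-domain frequency filter is shown below. The thresholds 3 and 10 come from the text; the domain keys `korean_journals` and `chinese_journals` and the function name `filter_reliable_pairs` are hypothetical names chosen for illustration.

```python
from collections import Counter

# Thresholds from the description; the dictionary keys are illustrative names.
DOMAIN_THRESHOLDS = {"korean_journals": 3, "chinese_journals": 10}


def filter_reliable_pairs(pairs, domain, thresholds=DOMAIN_THRESHOLDS):
    """Keep only the translation pairs that occur at least `threshold` times.

    pairs: iterable of (source_term, target_term) tuples, one per occurrence.
    Returns a dict mapping each retained pair to its occurrence count.
    """
    counts = Counter(pairs)
    minimum = thresholds[domain]
    return {pair: n for pair, n in counts.items() if n >= minimum}
```

With three occurrences of one pair and two of another, only the first pair survives under the Korean-journal threshold, and neither survives under the stricter Chinese-journal threshold.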

The triple structure generation unit (136) uses the standard description rules of the Semantic Web to express the relationships between the translation equivalents generated by the translation-equivalent file generation unit (134) as triples. A relationship between translation equivalents in different languages is given the symmetric property 'has Synonym', and terms that are identical (the same language and the same string value) are given the dummy property 'has Dummy'.

The triple structure generation unit (136) also assigns a unique URI to each keyword in a triple. The URI has the form 'domain name.term' or 'domain name.term ID value', where the domain name may be the name or an alias of the unique database table from which the term was extracted.

The triple structure that the triple structure generation unit (136) generates for 'ship' is described with reference to FIG. 3.

Suppose the first, second, and third SHIP have different URIs and their stored translation equivalents are 선박, 배, and 船舶, respectively. The pairs (first SHIP, 선박), (second SHIP, 배), and (third SHIP, 船舶) are each given the symmetric property, and because the first, second, and third SHIP have the same string value, they are given the dummy property, which produces the triple structure shown in FIG. 3.

Here, the symmetric property declares that two terms are synonyms (near-synonyms), and the dummy property indicates that two terms are identical.
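The SHIP example can be sketched as plain (subject, property, object) tuples. This is an illustration under assumptions: the property identifiers `hasSynonym` and `hasDummy`, the URI strings, and the choice to link every pair of identical strings (rather than, say, a chain) are not taken from the patent figures.

```python
from itertools import combinations

HAS_SYNONYM = "hasSynonym"  # symmetric property: translation equivalents
HAS_DUMMY = "hasDummy"      # dummy property: identical string values


def build_triples(pairs):
    """Build triples from (term_uri, term, equiv_uri, equiv) records.

    Returns (triples, labels), where triples is a list of
    (subject_uri, property, object_uri) and labels maps each URI to its string.
    """
    triples = []
    labels = {}
    for term_uri, term, equiv_uri, equiv in pairs:
        labels[term_uri] = term
        labels[equiv_uri] = equiv
        # A term and its translation equivalent are declared synonyms.
        triples.append((term_uri, HAS_SYNONYM, equiv_uri))
    # Any two URIs that carry the same string value are linked as dummies.
    for a, b in combinations(sorted(labels), 2):
        if labels[a] == labels[b]:
            triples.append((a, HAS_DUMMY, b))
    return triples, labels


# Hypothetical URIs in the 'domain name.term' form described in the text.
ship_pairs = [
    ("tableA.SHIP", "SHIP", "tableA.선박", "선박"),
    ("tableB.SHIP", "SHIP", "tableB.배", "배"),
    ("tableC.SHIP", "SHIP", "tableC.船舶", "船舶"),
]
ship_triples, ship_labels = build_triples(ship_pairs)
```

For the three SHIP records this gives three synonym triples and three dummy triples, one for each pair of SHIP URIs.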

The semantic network generation unit (138) treats keywords that have the dummy property in the relationships generated by the triple structure generation unit (136) as a single instance, and generates the semantic network by connecting to that instance the keywords that have the symmetric property.

When the semantic network generation unit (138) generates a semantic network from the triple structure of FIG. 3, the result is the semantic network shown in FIG. 4.

Referring to FIG. 4, 선박, 배, and 船舶 are connected to a single 'SHIP' node to form the semantic network. The URIs assigned to the first, second, and third SHIP are all mapped to this 'SHIP' node.
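One way to picture this merging step is with a small union-find over the dummy links, followed by rewriting the synonym links to point at the merged nodes. The sketch below is an assumption about how such a step could be coded; the function name `build_semantic_network`, the property strings, and the node and edge representation are illustrative only.

```python
def build_semantic_network(triples, labels):
    """Merge URIs linked by 'hasDummy' and connect nodes linked by 'hasSynonym'.

    triples: list of (subject_uri, property, object_uri)
    labels:  dict mapping each URI to its string value
    Returns (nodes, edges): nodes maps a representative URI to
    {'label': str, 'uris': set of merged URIs}; edges is a set of
    (representative_subject, representative_object) pairs.
    """
    parent = {uri: uri for uri in labels}

    def find(u):
        # Follow parent links to the representative, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Keywords with the dummy property are treated as one instance.
    for s, p, o in triples:
        if p == "hasDummy":
            parent[find(s)] = find(o)

    nodes = {}
    for uri in labels:
        root = find(uri)
        node = nodes.setdefault(root, {"label": labels[root], "uris": set()})
        node["uris"].add(uri)

    # Synonym links are attached to the merged instance.
    edges = set()
    for s, p, o in triples:
        if p == "hasSynonym":
            edges.add((find(s), find(o)))
    return nodes, edges
```

For three SHIP URIs linked as dummies and three synonym links, the result is one SHIP node carrying all three URIs, with 선박, 배, and 船舶 attached to it.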

The semantic information mapping unit (140) uses the URIs assigned by the triple structure generation unit (136) to determine the domain classification information of each keyword, computes a similarity vector containing the keyword's frequency of occurrence and weight within that classification information, and stores the vector mapped to the keyword.

The weighted similarity vector of each keyword by subject field can be computed by measuring how often each classification code occurs with the keyword and how often it occurs with other keywords, and then applying the measured frequencies to a similarity coefficient.

The similarity measures used here may be chosen from those that tend to favor high-frequency terms, such as the Jaccard coefficient and the cosine coefficient, and those that tend to favor low-frequency terms, such as log-odds ratios and mutual information.

To normalize the measured similarity weights, TF*logIDF, that is, the term frequency (TF) multiplied by the logarithm of the inverse document frequency (IDF), may also be used, as in Equation 1.

Here, W(t) is the weight of a term t occurring in a document, N is the total number of documents, TF is the term frequency, and DF is the document frequency.
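Equation 1 itself is not reproduced in this text. Assuming it takes the conventional form W(t) = TF × log(N / DF), which is consistent with the symbols defined above, the weight could be computed as in the sketch below. The function name `term_weight` and the use of the natural logarithm are assumptions.

```python
import math


def term_weight(tf, df, n_docs):
    """Weight of a term, assuming W(t) = TF * log(N / DF).

    tf:     term frequency of t
    df:     number of documents that contain t
    n_docs: total number of documents N
    """
    if df <= 0 or n_docs <= 0:
        raise ValueError("df and n_docs must be positive")
    # Assumption: natural logarithm; the base is not stated in the source.
    return tf * math.log(n_docs / df)
```

Under this form, a term that appears in every document gets weight 0, and rarer terms are weighted more heavily for the same term frequency.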

The 'term-subject field' similarity vector computed in this way is stored mapped to the keyword in the semantic network, in a form such as SHIP = {machinery: 0.7, construction: 0.3}. Here 'SHIP = {machinery: 0.7, construction: 0.3}' means that SHIP has a similarity weight of 0.7 in the machinery field and a similarity weight of 0.3 in the construction field.

The database (142) includes a translation-equivalent file information database (142a) that stores the translation-equivalent files generated by the translation-equivalent file generation unit (134), a triple data database (142b) that stores the triple data generated by the triple structure generation unit (136), and a semantic network information database (142c) in which each keyword is stored mapped to the similarity vector generated by the semantic information mapping unit (140).

FIG. 5 is a block diagram schematically showing the configuration of the multilingual terminology inference device according to the present invention, FIG. 7 is an example screen showing the visualization information generated by the visualization information generation unit and provided to the user, and FIG. 8 is an example screen showing how the network changes before and after a similarity threshold is applied to a search term entered by the user.

Referring to FIG. 5, the multilingual terminology inference device (150) includes: a search term extraction unit (152) that, when the user enters a search term, retrieves the search term from the database of the multilingual semantic network construction device; a semantic network information extraction unit (154) that extracts the semantic network containing the keywords for which a similarity relationship with the retrieved search term has been established; a similarity measurement unit (156) that measures the similarity between nodes of the extracted semantic network relative to the search term; and a visualization information generation unit (158) that displays the measured similarities on the nodes of the semantic network to provide visualization information to the user.

The similarity measurement unit (156) computes either the similarity between the search term and the keyword at each node, or the similarity between the search term and the extracted semantic network as a whole.

The similarity measurement unit (156) may use either similarity measures or distance measures to compute the similarity between keyword vectors. The similarity between two keyword vectors can be computed by measuring the frequencies of the classification codes that occur with both keywords, with only one of them, and with neither, and then applying the measured frequencies to a similarity coefficient. Various measures may be used, such as the Euclidean distance coefficient, the Pearson correlation coefficient, and the cosine coefficient. Equation 2 is an example of the cosine similarity formula.
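As an illustration of the cosine coefficient named above, the sketch below computes the standard cosine similarity between two 'term-subject field' vectors stored as dictionaries. Equation 2 is not reproduced in this text, so the standard definition (dot product divided by the product of the vector norms) is assumed; the function name and the sample vectors are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as {field: weight} dicts."""
    dot = sum(weight * v.get(field, 0.0) for field, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Sample vectors: the first follows the SHIP example; the second is invented.
ship = {"machinery": 0.7, "construction": 0.3}
vessel = {"machinery": 0.6, "shipbuilding": 0.4}
similarity = cosine_similarity(ship, vessel)
```

Only the shared field (machinery) contributes to the dot product, so two keywords whose weights are concentrated in the same subject fields score close to 1, and keywords with no field in common score 0.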

For a search term entered by the user, the visualization information generation unit (158) generates visualization information in which each keyword extracted by the semantic network information extraction unit (154) is labeled with the semantic similarity measured by the similarity measurement unit (156), and displays it as shown in FIG. 7.

Referring to FIG. 7, the visualization information screen (700) provided to the user consists of a semantic network information area (702), in which the nodes are laid out with a similarity mapped to each keyword, and a similarity threshold adjustment area (704). The semantic network information area (702) displays, for each keyword, its similarity across subject fields and its similarity to other keywords.

Using the threshold adjustment key in the similarity threshold adjustment area (704), the user can adjust the similarity between the search term and the network to change the entire semantic network dynamically, observe in real time how the term clusters change, and control the quality of the data.

The change in the network when the threshold is adjusted with the similarity threshold adjustment key is described with reference to FIG. 8.

Referring to FIG. 8, when the similarity threshold is 0 (800a), the overall similarity between the keyword and the network is 0.519. As the threshold is raised, the overall similarity increases and reaches 0.981 at a threshold of 0.7 (800b), which yields the most semantically similar, optimal network.

In this way, for a keyword the user has searched, the user can use the similarity threshold adjustment key to see in real time how the network changes before and after the similarity threshold is applied.
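The effect of raising the threshold can be sketched as pruning nodes whose similarity to the search term falls below it. The text does not define how the overall keyword-to-network similarity is computed, so the sketch assumes, purely for illustration, that it is the mean similarity of the nodes that remain. The function name and the sample values are invented and do not reproduce the figures of FIG. 8.

```python
def apply_threshold(node_similarity, threshold):
    """Drop nodes below the threshold and summarize what remains.

    node_similarity: dict mapping each keyword to its similarity to the search term
    Returns (kept, overall), where overall is the mean similarity of kept nodes.
    Assumption: the mean stands in for the undefined 'overall similarity'.
    """
    kept = {k: s for k, s in node_similarity.items() if s >= threshold}
    overall = sum(kept.values()) / len(kept) if kept else 0.0
    return kept, overall


# Invented similarities for a search on 'SHIP'.
similarities = {"선박": 0.98, "배": 0.95, "boat": 0.40, "shipment": 0.10}
all_nodes, overall_before = apply_threshold(similarities, 0.0)
close_nodes, overall_after = apply_threshold(similarities, 0.7)
```

With these sample values the threshold of 0.7 removes the two weakly related nodes, and the mean similarity of the remaining network rises, which mirrors the qualitative behavior described for FIG. 8.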

FIG. 6 is a block diagram schematically showing the configuration of a multilingual terminology inference device according to another embodiment of the present invention.

Referring to FIG. 6, the multilingual terminology inference device (150) includes: a search term extraction unit (152) that, when the user enters a search term, retrieves the search term from the database of the multilingual semantic network construction device; a triple structure information extraction unit (153) that extracts the keywords that form triples with the search term; a semantic network information generation unit (155) that generates a semantic network by connecting the extracted keywords as nodes according to their similarity, generates semantic information for each keyword in the generated network, and maps it to that keyword; a similarity measurement unit (156) that measures the similarity between nodes of the generated semantic network relative to the search term; and a visualization information generation unit (158) that displays the measured similarities on the nodes of the semantic network to provide visualization information to the user.

The semantic network information generation unit (155) uses the URI assigned to each keyword in the triples to determine the domain classification information of each keyword, computes a similarity vector containing the keyword's frequency of occurrence and weight within that classification information, and stores the vector mapped to the keyword.

In this embodiment, when the multilingual semantic network construction device has not built a semantic network for the keywords stored as triples, the multilingual terminology inference device can build the semantic network for the search term in real time and provide the result to the user.

FIG. 9 is a flowchart showing how the multilingual semantic network construction device according to the present invention builds a multilingual semantic network from the documents stored in the bibliographic information database.

Referring to FIG. 9, the multilingual semantic network construction device monitors the bibliographic information database for updates in real time and, when an update is detected (S900), generates translation-equivalent files by referring to the keyword fields of the updated bibliographic information (S902).

That is, the multilingual semantic network construction device extracts the keywords from the keyword field of each document stored in the bibliographic information database and, when the extracted keywords are in two or more languages, matches the keywords with their translation equivalents to generate the translation-equivalent files.

How the multilingual semantic network construction device generates translation-equivalent files was described in detail in connection with FIG. 2, so the description is not repeated here.

The multilingual semantic network construction device then expresses the relationships between the generated translation equivalents as triples (S904). That is, following the standard description rules of the Semantic Web, it assigns the symmetric property to translation equivalents in different languages and the dummy property to terms with the same string value. At this point the device also assigns a unique URI to each keyword in the triples.

Next, the multilingual semantic network construction device treats keywords that have the dummy property in the generated triples as a single instance, and generates the semantic network by connecting to that instance the keywords that have the symmetric property (S906).

The multilingual semantic network construction device then uses the URI assigned to each keyword to determine the keyword's domain classification information, computes a similarity vector containing the keyword's frequency of occurrence and weight within that classification information, and stores the vector mapped to the keyword (S908).

According to another embodiment of the present invention, the multilingual semantic network construction device may perform the steps only up to S904, without performing S906 and S908. In that case, when the user enters a search term, the multilingual terminology inference device generates the semantic network for that search term in real time.

FIG. 10 is a flowchart showing how the multilingual terminology inference device according to the present invention searches for a query entered by the user.

Referring to FIG. 10, when the user enters a search term (S1000), the multilingual terminology inference device extracts, from the database of the multilingual semantic network construction device, the semantic network containing the keywords for which a similarity relationship with the search term has been established (S1002).

The multilingual terminology inference device then measures the similarity between nodes of the extracted semantic network relative to the search term (S1004), displays the measured similarities on the nodes of the semantic network, and provides the result to the user (S1006).

The user can adjust the similarity between the search term and the network to change the entire semantic network dynamically, observe in real time how the term clusters change, and control the quality of the data.

FIG. 11 is a flowchart showing how a multilingual terminology inference device according to another embodiment of the present invention searches for a query entered by the user.

Referring to FIG. 11, when the user enters a search term (S1100), the multilingual terminology inference device searches the database of the multilingual semantic network construction device and extracts the keywords that form triples with the search term (S1102).

Next, the multilingual terminology inference device treats keywords that have the dummy property in the extracted triples as a single instance, and generates a semantic network by connecting to that instance the keywords that have the symmetric property (S1104).

The multilingual terminology inference device then determines the domain classification information of each keyword, computes a similarity vector containing the keyword's frequency of occurrence and weight within that classification information, and maps the vector to the keyword (S1106).

Finally, the multilingual terminology inference device computes the similarity between nodes relative to the search term in the semantic network to which the similarity vectors have been mapped (S1108), displays the computed similarities on the nodes of the semantic network, and provides the result to the user (S1110).

Those skilled in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the invention.

FIG. 1 shows the configuration of a multilingual terminology resource providing system according to the present invention.

FIG. 2 is a block diagram schematically showing the configuration of the multilingual semantic network construction device according to the present invention.

FIG. 3 shows an example of a triple structure generated by the triple structure generation unit of FIG. 2.

FIG. 4 shows an example of a semantic network generated by the semantic network generation unit of FIG. 2.

도 5 및 도 6은 본 발명에 따른 다국어 전문용어 추론 장치의 구성을 개략적으로 나타낸 블럭도.5 and 6 are block diagrams schematically showing the configuration of a multilingual terminology inference apparatus according to the present invention.

도 7은 본 발명에 따른 시각화 정보 생성부에 의해 생성되어 사용자에게 제공되는 시각화 정보를 나타낸 화면 예시도.7 is an exemplary screen illustrating visualization information generated by a visualization information generating unit according to the present invention and provided to a user.

도 8은 본 발명에 따른 사용자에 의해 입력된 검색어에 대해 유사도 임계치를 적용하기 전과 유사도 임계치를 적용한 후의 네트워크 변화를 나타낸 화면 예시도. 8 illustrates an example of a network change before and after applying a similarity threshold to a search word input by a user according to the present invention.

도 9는 본 발명에 따른 다국어 의미망 구축 장치가 문헌 정보 데이터베이스에 저장된 문헌을 이용하여 다국어 의미망을 구축하는 방법을 나타낸 흐름도. 9 is a flowchart illustrating a method for constructing a multilingual semantic network by using a document stored in a bibliographic information database, according to the present invention.

도 10 및 도 11은 본 발명에 따른 다국어 전문용어 추론 장치가 사용자에 의해 입력된 질의어를 검색하는 방법을 나타낸 흐름도. 10 and 11 are flowcharts illustrating a method for retrieving a query input by a user by a multilingual terminology inference apparatus according to the present invention;

<Description of reference numerals for the main parts of the drawings>

100: bibliographic information database    130: multilingual semantic network construction device

132: bibliographic information update detection unit    134: translation-equivalent file generation unit

136: triple structure generation unit    138: semantic network generation unit

140: semantic information mapping unit    142: database

150: multilingual terminology inference device    152: search word extraction unit

153: triple structure information extraction unit    154: semantic network information extraction unit

155: semantic network information generation unit    156: similarity measurement unit

158: visualization information generation unit    170: user terminal

Claims

A multilingual terminology resource providing system comprising:

a bibliographic information database;

a multilingual semantic network construction device that generates a translation-equivalent file by referring to the keyword field of each document stored in the bibliographic information database, converts the translation-equivalent file into a triple structure, and generates a semantic network by connecting the keywords as nodes according to the attribute information of each keyword; and

a multilingual terminology inference device that, when a user inputs a search word, extracts the semantic network for the search word from the multilingual semantic network construction device, calculates the similarity between the components of the extracted semantic network relative to the search word, and provides the resulting relationships to the user as visualization information.

The system of claim 1,

wherein each document is converted into Unicode and stored in the bibliographic information database.

The system of claim 1,

wherein the multilingual semantic network construction device detects in real time whether the bibliographic information database has been updated and, when it has, regenerates the semantic network.

The system of claim 1,

wherein the multilingual terminology inference device extracts, from the multilingual semantic network construction device, the keywords that form a triple structure with the search word input by the user, generates a semantic network by connecting the extracted keywords as nodes according to their attribute information with respect to the search word, calculates the similarity between the keywords constituting the generated semantic network relative to the search word, and provides the resulting relationships to the user as visualization information.

A multilingual semantic network construction device for providing a multilingual terminology resource, which builds a multilingual semantic network for documents stored in a bibliographic information database, the device comprising:

a translation-equivalent file generation unit that generates a translation-equivalent file by referring to a keyword field of the bibliographic information database;

a triple structure generation unit that expresses the relationships between the translation equivalents generated by the translation-equivalent file generation unit as a triple structure;

a semantic network generation unit that generates a semantic network by connecting the keywords of the triple structure generated by the triple structure generation unit as nodes according to their attribute information; and

a semantic information mapping unit that generates semantic information for each keyword in the semantic network generated by the semantic network generation unit, maps the semantic information to the corresponding keyword, and stores it.

The device of claim 5,

further comprising a bibliographic information update detection unit that, when an update of the bibliographic information database is detected in real time, requests the translation-equivalent file generation unit to generate a translation-equivalent file.

The device of claim 5,

further comprising a translation-equivalent file information database that stores the translation-equivalent file generated by the translation-equivalent file generation unit, a triple data database that stores the triple structure data generated by the triple structure generation unit, and a semantic network information database in which semantic information is mapped to each keyword.

The device of claim 5,

further comprising a format conversion unit that converts the bibliographic information stored in the bibliographic information database into Unicode format and stores it.

The device of claim 5,

wherein the translation-equivalent file generation unit generates the translation-equivalent file by extracting multilingual keyword pairs from the keyword field of the bibliographic information database.

The device of claim 5,

wherein the translation-equivalent file generation unit selects, from the generated translation-equivalent files, those whose frequency of occurrence is above a predetermined threshold set for each domain.
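
The per-domain frequency filter in the preceding claim can be sketched as follows. The function name `select_frequent_pairs`, the tuple layout, the sample data, and the choice to treat the threshold as inclusive are all assumptions made for illustration.

```python
from collections import Counter


def select_frequent_pairs(pairs, thresholds, default_threshold=1):
    """Keep translation pairs whose count reaches their domain's threshold.

    pairs: iterable of (domain, source_term, target_term) tuples, one per
    occurrence found in a keyword field.
    thresholds: dict mapping a domain name to its minimum count
    (inclusive here, which is an assumption).
    """
    counts = Counter(pairs)
    return {
        pair: count
        for pair, count in counts.items()
        if count >= thresholds.get(pair[0], default_threshold)
    }


# Hypothetical observations collected from keyword fields.
observed = [
    ("physics", "양자", "quantum"),
    ("physics", "양자", "quantum"),
    ("physics", "양자", "quanta"),
    ("biology", "세포", "cell"),
]
selected = select_frequent_pairs(observed, {"physics": 2, "biology": 1})
```

Keeping a separate threshold per domain lets a sparsely indexed field retain rare pairs that would be treated as noise in a densely indexed one.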

The device of claim 5,

wherein the triple structure generation unit expresses the relationships between the translation-equivalent keywords generated by the translation-equivalent file generation unit as a triple structure using the standard description rules of the Semantic Web, assigning a symmetric property to translation-equivalent relationships between different languages and a dummy property to relationships within the same language.
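
A minimal sketch of the property assignment described in the preceding claim, assuming each term arrives with a language tag. The predicate names `ex:translationOf` and `ex:sameLanguageVariant` are hypothetical placeholders, not identifiers from the patent, and materializing both directions of the symmetric relation is one possible design choice.

```python
SYMMETRIC = "ex:translationOf"        # hypothetical symmetric property
DUMMY = "ex:sameLanguageVariant"      # hypothetical dummy property


def make_triples(term_a, term_b):
    """Return (subject, predicate, object) triples for two (text, lang) terms."""
    (text_a, lang_a), (text_b, lang_b) = term_a, term_b
    if lang_a == lang_b:
        # Same language: link the terms with the dummy property.
        return [(text_a, DUMMY, text_b)]
    # Different languages: a symmetric property holds in both directions,
    # so both triples are emitted explicitly.
    return [(text_a, SYMMETRIC, text_b), (text_b, SYMMETRIC, text_a)]


triples = make_triples(("의미망", "ko"), ("semantic network", "en"))
triples += make_triples(("semantic network", "en"), ("semantic net", "en"))
```

A triple store with symmetric-property reasoning could infer the reverse triple instead of storing it.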

The device of claim 5,

wherein the triple structure generation unit assigns a unique URI to each keyword of the generated triple structure.

The device of claim 5,

wherein the semantic network generation unit interprets keywords having a dummy property in the relationships between the translation-equivalent keywords generated by the triple structure generation unit as the same instance, and generates the semantic network by connecting, around those keywords, the keywords that have a symmetric property.

The device of claim 13,

wherein the semantic network generation unit stores the keywords that have a dummy property and are interpreted as the same instance together with the URI of each keyword.

The device of claim 5,

wherein the semantic information mapping unit determines classification information for each keyword using the URI assigned by the triple structure generation unit, obtains a similarity vector comprising the occurrence frequency and weight of the keyword within the classification information, and maps the similarity vector to the keyword and stores it.

A multilingual terminology inference device for providing a multilingual terminology resource, which provides information about a search word queried by a user, the device comprising:

a search word extraction unit that, when a user inputs a search word, extracts the search word from a database of a multilingual semantic network construction device;

a semantic network information extraction unit that extracts from the database a semantic network in which nodes having properties related to the search word are set;

a similarity measurement unit that measures, relative to the search word, the similarity between nodes in the semantic network extracted by the semantic network information extraction unit; and

a visualization information generation unit that generates visualization information in which the similarities measured by the similarity measurement unit are displayed on the nodes of the semantic network, and provides the visualization information to the user.

A triple structure information extraction unit that extracts from the database the keywords connected to the search word in a triple structure;

A semantic network information generation unit that generates the semantic network by connecting the extracted triple-structured keywords as nodes according to their attribute information with respect to the search word, and generates semantic information for each keyword in the semantic network;

A similarity measurement unit that measures, relative to the search word, the similarity between nodes in the semantic network generated by the semantic network information generation unit; and

A visualization information generation unit that generates visualization information by calculating the position of each node in the semantic network according to the similarity measured by the similarity measurement unit, and provides the generated visualization information to the user;

The device of claim 16 or 17,

wherein the similarity measurement unit calculates, relative to the search word, the similarity to the keyword at each node.

The device of claim 16 or 17,

wherein the visualization information generation unit generates visualization information having an optimal semantic network by using the similarity values between keywords measured by the similarity measurement unit, and provides the visualization information to the user.

The device of claim 16 or 17,

wherein the visualization information includes a semantic network and a similarity threshold control key, and

the visualization information generation unit, when the threshold is adjusted using the similarity threshold control key, controls the size of the semantic network according to the threshold.
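
The effect of adjusting the similarity threshold can be sketched as a filter over the network's edges. The function name `prune_network`, the edge-list representation, and the sample similarity values are assumptions for illustration only.

```python
def prune_network(edges, similarity, query, threshold):
    """Drop nodes whose similarity to the query is below the threshold.

    edges: list of (node_a, node_b) pairs.
    similarity: dict mapping each non-query node to its similarity score.
    """
    keep = {query} | {node for node, score in similarity.items() if score >= threshold}
    # An edge survives only if both of its endpoints survive.
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Hypothetical network centered on the search word "의미망".
edges = [
    ("의미망", "semantic network"),
    ("의미망", "ontology"),
    ("ontology", "taxonomy"),
]
similarity = {"semantic network": 0.95, "ontology": 0.60, "taxonomy": 0.30}

loose = prune_network(edges, similarity, "의미망", 0.2)
tight = prune_network(edges, similarity, "의미망", 0.5)
```

Raising the threshold from 0.2 to 0.5 removes the `taxonomy` node and its edge, which is the kind of before-and-after change a threshold control would produce.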

A multilingual semantic network construction method for providing a multilingual terminology resource, which builds a multilingual semantic network using documents stored in a bibliographic information database, the method comprising:

(a) generating a translation-equivalent file by referring to a keyword field of the bibliographic information database;

(b) expressing the relationships between the generated translation equivalents as a triple structure;

(c) generating a semantic network by interpreting keywords having a dummy property among the triple-structured translation equivalents as the same instance and connecting, around those keywords, the keywords that have a symmetric property; and

(d) generating semantic information for each keyword in the generated semantic network, mapping the semantic information to the corresponding keyword, and storing it.

The method of claim 21,

further comprising performing steps (a) to (d) when an update of the bibliographic information database is detected.

The method of claim 21,

wherein step (a) comprises:

extracting keywords from the keyword field of each document stored in the bibliographic information database; and

if the extracted keywords are in two or more languages, generating a translation-equivalent file by matching the extracted keywords to one another.
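
The two sub-steps of step (a) can be sketched as below. The claim does not say how keywords are matched across languages, so pairing them by position in each language's keyword list is an assumption, as are the function name `build_translation_pairs` and the dictionary layout of the keyword field.

```python
from itertools import combinations


def build_translation_pairs(keyword_field):
    """Pair keywords across languages by position (assumed alignment).

    keyword_field: dict mapping a language code to the ordered keyword
    list recorded for one document.
    """
    languages = [lang for lang, words in keyword_field.items() if words]
    if len(languages) < 2:
        return []  # a single-language field yields no translation pairs
    pairs = []
    for lang_a, lang_b in combinations(languages, 2):
        for word_a, word_b in zip(keyword_field[lang_a], keyword_field[lang_b]):
            pairs.append(((word_a, lang_a), (word_b, lang_b)))
    return pairs


# Hypothetical keyword field of one bibliographic record.
doc = {
    "ko": ["의미망", "전문용어"],
    "en": ["semantic network", "terminology"],
}
pairs = build_translation_pairs(doc)
```

Records whose keyword lists differ in length per language would need a more careful alignment than `zip`, which silently truncates to the shorter list.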

The method of claim 21,

wherein, in step (b),

the triple structure is generated using the standard description rules of the Semantic Web, by assigning a symmetric property to translation equivalents in different languages and a dummy property to translation equivalents in the same language.

The method of claim 21,

wherein step (d) comprises:

determining classification information using the URI assigned to each keyword in the triple structure; and

obtaining, from the determined classification information, a similarity vector comprising the occurrence frequency and weight of the corresponding keyword, and mapping the similarity vector to each keyword and storing it.
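
A minimal sketch of building such a similarity vector. The claim does not define the weight, so relative frequency is used here as an assumption. The lookup table `classifications_by_uri`, the example URI, and the function name `similarity_vector` are likewise hypothetical.

```python
from collections import Counter


def similarity_vector(classification_codes):
    """Build {classification: (frequency, weight)} for one keyword.

    The weight is the keyword's relative frequency in each classification,
    which is an assumed definition.
    """
    frequency = Counter(classification_codes)
    total = sum(frequency.values())
    return {code: (count, count / total) for code, count in frequency.items()}


# Hypothetical lookup: keyword URI -> classification codes of the documents
# in which the keyword occurs.
classifications_by_uri = {
    "http://example.org/term/1": [
        "computing", "computing", "computing", "linguistics",
    ],
}
vector = similarity_vector(classifications_by_uri["http://example.org/term/1"])
```

Storing `vector` alongside the keyword's URI gives a later similarity measurement step a per-classification profile to compare.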

A multilingual terminology inference method for providing a multilingual terminology resource, by which a multilingual terminology inference device searches for a query input by a user, the method comprising:

(a) when a user inputs a search word, extracting from a database a semantic network that includes keywords similar to the search word;

(b) measuring, relative to the search word, the similarity between nodes in the extracted semantic network; and

(c) generating visualization information by calculating the position of each node in the semantic network according to the measured similarity, and providing the generated visualization information to the user.

(a) when a user inputs a search word, searching a database and extracting the translation equivalents that form a triple structure with the search word;

(b) generating a semantic network for the extracted translation equivalents using their attribute information with respect to the search word;

(c) generating semantic information for each keyword in the generated semantic network and mapping the semantic information to the corresponding keyword;

(d) measuring, relative to the search word, the similarity between the keywords in the generated semantic network; and

(e) generating visualization information by calculating the position of each node in the semantic network according to the measured similarity, and providing the generated visualization information to the user;

The method of claim 27,

wherein, in step (b), the semantic network is generated by interpreting, among the extracted translation equivalents, those having a dummy property with the search word as the same instance and connecting those having a symmetric property around the search word.

The method of claim 27,

wherein, in step (c), classification information is determined for each keyword, and a similarity vector comprising the occurrence frequency and weight of the keyword within the classification information is obtained and mapped to each keyword.