KR100963667B1

KR100963667B1 - Apparatus of semantic technological intelligence language mining system for large size database

Info

Publication number: KR100963667B1
Application number: KR1020080040595A
Authority: KR
Inventors: 최윤수; 최성필; 김광영; 이민호; 정창후; 조민희; 윤화묵; 한선화; 진두석
Original assignee: 한국과학기술정보연구원
Priority date: 2008-04-30
Filing date: 2008-04-30
Publication date: 2010-06-15
Also published as: KR20090114778A

Abstract

The present invention relates to an apparatus for mining semantic-based technical terms from a large-scale database. The apparatus comprises: ARM means for retrieving from a management database, and outputting, query terms based on new and seed technical terms and context information, with which information on a specific technical field is to be searched; TRS means for extracting, from a scientific information database, a document set containing the query terms received from the ARM means, together with the corresponding posting information; analysis means for extracting technical terms and context information from the document set and posting information provided by the TRS means and analyzing the associations among the technical terms; tracking means for receiving the technical terms, context information, association information, and document set from the analysis means and tracing and extracting technical knowledge in terms of occurrence frequency, association, and extension relationships, including each term's time of occurrence, place of occurrence, and author; ARES means for receiving the technical terms, context information, and association information extracted by the analysis means, extracting new technical terms and context information, and recording them in the management database; and ERA means, connected to the ARES means, for extracting technical terms, context information, associations, and technical documents from external resources and providing them. This configuration raises the overall search efficiency and usability of a large-scale database. By analyzing and accumulating the relationships among retrieved technical terms, it also allows associations, time-series analyses, and classifications of technical information to be searched and traced quickly in real time, which speeds up technology review, development, and decision making.

Patent, paper, database, search, query term, technical term, context information, extraction

Description

Apparatus for mining semantic-based technical terms from a large-scale database {APPARATUS OF SEMANTIC TECHNOLOGICAL INTELLIGENCE LANGUAGE MINING SYSTEM FOR LARGE SIZE DATABASE}

The present invention relates to mining technical terms from a database that records and manages science, technology, and patent information. In particular, when the raw text database under management holds a large amount of information, searching it in the conventional way takes a long time. The invention therefore relates to an apparatus for mining semantic-based technical terms from a large-scale database that shortens search time while refining, arranging, linking, extending, and analyzing the text content and extracting it as processed technical terms.

Humanity has accumulated its acquired experience, knowledge, and skills in records, and each succeeding generation has used and further developed that accumulated information. Repeating this process has advanced culture and supported a prosperous life.

The volume of accumulated information keeps growing over time. Books are the usual means of accumulation, and it is important both to manage systematically the many books in which diverse knowledge and information are recorded and to find the needed knowledge and information quickly and accurately at the moment it is wanted.

With the development of computers, knowledge and information came to be recorded and managed as text, and database (DB) technology was developed to find the needed content accurately and quickly among such computer-managed information.

Today the amount of knowledge and information accumulated as text is expanding rapidly in every field, including the humanities, society, and science, and computers are indispensable for recording and managing it.

In particular, as modern society industrializes, the need for information retrieval in support of technology development is growing, and mining (MINING) the desired or required technical information from large accumulated stores of technical information has become an independent technical field in its own right.

The general concept of mining technical information is described with reference to FIG. 1. Technical information is typically contained in patent documents, papers, technical reports, and the like, and is recorded and managed in text-oriented databases (DB).

The required technical information is retrieved by using software (S/W) to analyze (ANALYZE) and represent (REPRESENTATION) the text information of the database in which the technical information is recorded.

The technical information retrieved in this way is a technology intelligence product (TECHNOLOGY INTELLIGENCE PRODUCT), that is, a repertoire (REPERTOIRE) of the results of analyzing technologies. It must be understandable to executives and managers, and it supports their judgments and decisions.

That is, the provided information is checked, technically analyzed, and studied by experts, and the results serve as reference material for decisions and judgments by top executives or managers. Mining technical information is therefore a very important decision-making tool in modern industrialized society.

However, as the amount of text-based technical information accumulated and managed by computer has grown very large, searching, analyzing, and using the desired information takes a long time.

In addition, as technology develops, the meaning of particular words and terms changes: a term may come to be used in a different sense, its meaning may broaden or narrow, a similar term may replace it, or the same term may be used with different meanings. At the same time, a similar technology may already have been developed or be in use elsewhere.

There is therefore a need for a technique that raises search efficiency by automatically processing large volumes of managed text information in advance, scientific and technical information in particular, and analyzing and providing it as technical knowledge that has been refined (CLARIFICATION), linked (LIAISON), arranged (ARRANGEMENT), and extended (EXTENSION).

The present invention was devised to address the above problems and needs. One object is to provide an apparatus for mining semantic-based technical terms from a large-scale database that raises search efficiency by analyzing the text information of a database, extracting technical terms by keyword, repeatedly extracting each technical term and the compound technical terms formed by linking several technical terms, and managing the result as technological intelligence (TECHNOLOGICAL INTELLIGENCE).

Another object is to provide an apparatus for mining semantic-based technical terms from a large-scale database that analyzes the technical terms of scientific and technical information managed in a database, including papers and patents, so that the relationships among items of technical information can be classified in real time.

To achieve the above objects, the present invention provides a configuration comprising: ARM means for retrieving from a management database, and outputting, query terms based on new and seed technical terms and context information, with which information on a specific technical field is to be searched; TRS means for extracting, from a scientific information database, a document set containing the query terms received from the ARM means, together with the corresponding posting information; analysis means for extracting technical terms and context information from the document set and posting information provided by the TRS means and analyzing the associations among the technical terms; tracking means for receiving the technical terms, context information, association information, and document set from the analysis means and tracing and extracting technical knowledge in terms of occurrence frequency, association, and extension relationships, including each term's time of occurrence, place of occurrence, and author; ARES means for receiving the technical terms, context information, and association information extracted by the analysis means, extracting new technical terms and context information, and recording them in the management database; and ERA means, connected to the ARES means, for extracting technical terms, context information, associations, and technical documents from external resources and providing them.

Preferably, the configuration further comprises: DRS means for analyzing the technical knowledge that the tracking means has traced and accumulated over a predetermined time and providing a service scenario; a knowledge database for recording and managing the new technical knowledge analyzed and traced by the analysis means and the tracking means; a management database that receives technical terms and context information from the ARES means, records and manages them, and provides them to the ARM means and the analysis means; and a scientific information database, connected to the TRS means, that records and manages patents, papers, technical reports, and the technical documents extracted from outside by the ERA means, and provides them in response to searches.

Further, the ARM means extracts, based on the new and seed technical terms input from the management database, query terms that have been selected relatively often and that retrieve document sets for a specific technical area and technical pattern; extracts, from the new and seed context information input from the management database, query terms for frequently occurring lexical patterns; and provides these query terms to the TRS means.

Further, the TRS means receives the technical-term and context-information query terms from the ARM means and, while the scientific information database is operating with a load at or below a predetermined value, extracts from it the set of documents containing the query terms and the designated parts of speech and sentence constituents, together with the posting information of each document.

Further, the posting information includes weight information for the index terms that make up each document.
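As a minimal sketch of what a document set with posting information might look like, the following Python builds a tiny inverted index whose postings carry a weight per index term. The weighting scheme (raw term frequency), the whitespace tokenizer, and the names `build_index` and `retrieve` are illustrative assumptions; the patent does not specify them.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each index term to postings {doc_id: weight}.

    Assumption: the weight is the raw term frequency in the document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def retrieve(index, query):
    """Return ids of documents containing every query term, plus their postings."""
    terms = query.lower().split()
    if not terms:
        return [], {}
    doc_sets = [set(index.get(t, {})) for t in terms]
    doc_ids = sorted(set.intersection(*doc_sets))
    postings = {d: {t: index[t][d] for t in terms} for d in doc_ids}
    return doc_ids, postings

# Hypothetical three-document collection.
docs = {
    "p1": "fuel cell membrane fuel stack",
    "p2": "solar cell efficiency",
    "p3": "fuel injection nozzle",
}
index = build_index(docs)
doc_ids, postings = retrieve(index, "fuel cell")
```

Here `doc_ids` plays the role of the document set and `postings` the per-document index-term weights that would be handed to the analysis step.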

Further, the analysis means comprises: TAS means for receiving the document set and posting information from the TRS means and extracting technical terms and context information; TAMA means for receiving the document set and posting information from the TRS means, receiving the set of new technical terms from the knowledge database, extracting technology association information between documents, and providing it to the ARES means and the knowledge database; and TLA means for receiving the results extracted by the TAS means and the TAMA means and, using a thesaurus, an ontology, and a lexical intelligence network, providing technical-term candidates to the TAS means and technical-term associations to the TAMA means.

Further, the TAS means extracts technical terms from the document set input from the TRS means by pattern analysis of part-of-speech sequences, classifies new technical terms not recorded in the management database both automatically and manually, extracts context information for the classified technical terms, and provides it to the ARES means. It likewise extracts the technical terms used together with the queried context information, classifies those not recorded in the management database as new technical terms both automatically and manually, extracts context information for the classified technical terms, and provides it to the ARES means.
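Part-of-speech sequence pattern analysis of this kind can be sketched as follows. This is only an illustration under stated assumptions: the input is already tagged, the pattern is "a run of adjectives and nouns ending in a noun, at least two tokens long", and the tag names and the function `extract_terms` are hypothetical rather than taken from the patent.

```python
def extract_terms(tagged, known_terms=frozenset()):
    """Collect maximal adjective/noun runs that end in a noun.

    tagged      -- list of (word, tag) pairs with tags such as "ADJ", "NOUN"
    known_terms -- terms already recorded (stand-in for the management database)
    Returns (all candidates, candidates not yet recorded).
    """
    candidates, run = [], []
    # A sentinel token flushes the final run at the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Trim trailing adjectives so that the candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= 2:
            candidates.append(" ".join(w for w, _ in run))
        run = []
    new_terms = [t for t in candidates if t not in known_terms]
    return candidates, new_terms

tagged = [
    ("the", "DET"), ("solid", "ADJ"), ("oxide", "NOUN"), ("fuel", "NOUN"),
    ("cell", "NOUN"), ("uses", "VERB"), ("a", "DET"),
    ("ceramic", "ADJ"), ("electrolyte", "NOUN"),
]
candidates, new_terms = extract_terms(tagged, known_terms={"ceramic electrolyte"})
```

Terms in `new_terms` would be the ones submitted for automatic or manual classification; a real system would use a proper tagger and a richer pattern set.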

Further, the TAMA means extracts associations from the document set input from the TRS means on the basis of the part-of-speech patterns and vocabulary of the technical terms, automatically rechecks existing association information, verifies the extracted associations both automatically and manually, analyzes the associations using a thesaurus, an ontology, and a lexical intelligence network and provides the result to the ARES means, and provides the association information for a specific technology to the knowledge database.

Further, the analysis of associations using the thesaurus, ontology, and lexical intelligence network is performed by the TLA means.

Further, the tracking means comprises: SAT means for receiving the technical terms, document content, and association information extracted by the analysis means and tracing technical knowledge both manually and automatically; SAM means for receiving technical-term, association, and document-content information from the SAT means, receiving the accumulated technical-term, association, and document-content information from the knowledge database, and providing statistical analysis results; and TCM means for receiving technical-term, association, and document-content information from the SAT means, receiving the accumulated technical-term, association, and document-content information from the knowledge database, and providing the result of classifying and clustering it into technology sets.

Further, the SAT means receives the verified technical terms, document content, and association information extracted by the analysis means; receives from the knowledge database the accumulated technical terms and the linkage information between technologies and between technologies and documents; and provides to the knowledge database and the DRS means, as charts and tables, the technical knowledge obtained by tracing, for each technical term, the associations that include time of occurrence, producer information, and location information.

Further, the associations include the occurrence frequency of technical terms over time, the convergence and divergence of technologies according to relationship and distance information, the occurrence frequency by place of occurrence, and the estimation and verification of new names.
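Occurrence frequency over time and by place reduces to counting tagged records. The sketch below assumes each document has already been reduced to a record with hypothetical fields `year`, `country`, and `terms`; neither the field names nor the function `frequency_by` come from the patent.

```python
from collections import Counter

def frequency_by(records, term, key):
    """Count the records that mention `term`, grouped by the field `key`."""
    counts = Counter()
    for rec in records:
        if term in rec["terms"]:
            counts[rec[key]] += 1
    return dict(sorted(counts.items()))

# Hypothetical records standing in for analyzed documents.
records = [
    {"year": 2005, "country": "KR", "terms": {"fuel cell", "membrane"}},
    {"year": 2006, "country": "US", "terms": {"fuel cell"}},
    {"year": 2006, "country": "KR", "terms": {"fuel cell", "stack"}},
    {"year": 2007, "country": "JP", "terms": {"solar cell"}},
]
by_year = frequency_by(records, "fuel cell", "year")
by_country = frequency_by(records, "fuel cell", "country")
```

The same grouping could be applied to author or any other tracked attribute.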

Further, the SAM means receives technical-term, association, and document-content information from the SAT means and the knowledge database, and traces the large text document set provided through the SAT means by statistically analyzing it with one or more methods selected from hypothesis setting, nonlinear regression analysis of frequency information and its causes, and principal component analysis.

Further, the TCM means receives technical-term, association, and document-content information from the SAT means and the knowledge database, and classifies and clusters the large text document set provided through the SAT means according to the associations between the extracted names and technologies.
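One simple way to cluster documents by term associations is to group documents that share at least one technical term, directly or transitively. The patent does not name a clustering algorithm, so the union-find grouping below and the function `cluster_by_shared_terms` are assumptions chosen purely for illustration.

```python
def cluster_by_shared_terms(doc_terms):
    """Group documents into connected components linked by shared terms."""
    parent = {d: d for d in doc_terms}

    def find(d):
        # Follow parent links to the root, halving the path on the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    first_doc_for_term = {}
    for doc, terms in doc_terms.items():
        for term in terms:
            if term in first_doc_for_term:
                # Merge this document with the first one that used the term.
                parent[find(doc)] = find(first_doc_for_term[term])
            else:
                first_doc_for_term[term] = doc

    clusters = {}
    for doc in doc_terms:
        clusters.setdefault(find(doc), []).append(doc)
    return sorted(sorted(c) for c in clusters.values())

doc_terms = {
    "d1": {"fuel cell", "membrane"},
    "d2": {"membrane", "polymer"},
    "d3": {"solar cell"},
}
clusters = cluster_by_shared_terms(doc_terms)
```

A production system would more likely weight the links, for example by the posting weights, rather than treat every shared term as equally strong.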

Further, the management database comprises: a technical term dictionary database that accumulates, records, and manages seed technical terms; and a context information database that accumulates, records, and manages seed context information.

Further, the scientific information database comprises: a patent database that records patent information collected from Korea, the United States, Japan, and Europe; a paper database that records the English abstracts of the papers and technical reports managed as paper information; and an external database that collects from outside and records technical documents including patents, papers, and technical reports.

Further, the ARES means records and manages, in the allocated areas of the management database, the new technical terms, context information, and association information extracted by the analysis means and the technical terms, context information, and association information extracted from outside by the ERA means.

Further, the ERA means automatically extracts technical terms, context information, and association information from external web sites and databases and provides them to the knowledge database, the scientific information database, and the ARES means, respectively.

To achieve the above objects, the present invention also provides, in an apparatus for mining semantic-based technical terms that comprises ARM means, TRS means, analysis means, tracking means, a management database, a scientific information database, ARES means, ERA means, and a knowledge database, a configuration comprising: analysis means for extracting technical terms, automatically and manually, by pattern analysis of part-of-speech sequences from the document set and posting information of a specific technology provided by the TRS means, classifying new technical terms and context information, and analyzing and verifying the associations among the extracted technical terms; and tracking means for extracting and estimating, from the document set, the technical terms analyzed by the analysis means, and estimating and verifying their extended associations in terms of semantic extension, occurrence frequency over time, relationships between technical terms, technology convergence and divergence based on distance information, occurrence frequency by place of occurrence, and new names.

Preferably, the analysis and verification of the associations among the extracted technical terms uses any one selected from a thesaurus, an ontology, and a lexical intelligence network.

To achieve the above objects, the present invention also provides a method of mining semantic-based technical terms from a large-scale database by means of an apparatus comprising ARM means, TRS means, analysis means, tracking means, a management database, a scientific information database, ARES means, ERA means, and a knowledge database, the method comprising: a step in which, when the analysis means finds the operating load of the scientific information database to be lower than a predetermined ratio, the ARM means extracts query terms retrieved from the management database and provides them to the TRS means, and the TRS means uses the query terms to extract the document set for the designated technology and the corresponding posting information from the scientific information database and provides them to the analysis means; a technical term step in which the analysis means, when its TAS means detects new and seed technical terms in the document set, notifies the ARES means and registers them in the knowledge database; a context information step in which the analysis means, when its TAMA means detects new and seed context information in the document set, notifies the ARES means and registers it in the knowledge database; and a step of providing the document set and the information in the knowledge database to the tracking means to trace associated technologies, providing the traced information to the knowledge database and the DRS means, and analyzing the traced information and outputting it as a document.

Preferably, the method further comprises a recording step in which, when the operating load of the scientific information database is higher than the predetermined ratio, the ERA means collects query terms from external resources, notifies the ARES means, and registers them in the knowledge database, and the collected technical documents are recorded in the scientific information database, after which processing proceeds to the output step.
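The load-dependent branching of the method can be sketched as a small dispatcher. The patent speaks only of a "predetermined ratio", so the numeric threshold, the treatment of the equal case as low load, and the callables `mine_internal` and `collect_external` are hypothetical placeholders.

```python
def run_cycle(db_load, threshold, mine_internal, collect_external):
    """Pick the processing path for one cycle from the database's current load.

    db_load, threshold -- load as a fraction between 0.0 and 1.0 (assumed unit)
    mine_internal      -- callable for the ARM -> TRS -> analysis path
    collect_external   -- callable for the ERA collection path
    """
    if db_load <= threshold:
        # Low load: query the scientific information database itself.
        return "internal", mine_internal()
    # High load: gather query terms and documents from external resources.
    return "external", collect_external()

path_low = run_cycle(0.2, 0.5, lambda: ["fuel cell"], lambda: ["web term"])
path_high = run_cycle(0.9, 0.5, lambda: ["fuel cell"], lambda: ["web term"])
```

Either path then feeds the same tracking and output steps, which is why the function returns the chosen path together with its result.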

Further, the traced information includes associations arising from the broadening and narrowing of technical terms, analysis of occurrence frequency by time of occurrence, analysis of associations between technical terms by time of occurrence, analysis of technology convergence and divergence based on distance information between technical terms, analysis of occurrence frequency by place of occurrence, and the estimation and verification of new names.

The present invention configured as above continuously and repeatedly extracts and manages technical terms and context information based on the meaning of textual scientific and technical information, and so has the industrial benefit of raising the overall search efficiency and usability of a large-scale database.

In addition, by analyzing and accumulating the relationships among the technical terms retrieved from the text of scientific and technical information, the invention allows associations, time-series analyses, and classifications of technical information to be searched and traced quickly in real time, which makes technology review and development easier.

A preferred embodiment of the apparatus for mining semantic-based technical terms from a large-scale database according to the present invention, configured as above, is described in detail below with reference to the accompanying drawings.

Example

The following drawings are attached to describe the present invention. Each shows the apparatus or method for mining semantic-based technical terms from a database according to an example of the invention.
FIG. 2 is a diagram explaining the step-by-step operation by which the apparatus achieves its final goal.
FIG. 3 is a functional block diagram of the apparatus.
FIG. 4 is a detailed functional block diagram of the analysis means of the apparatus.
FIG. 5 is a detailed functional block diagram of the tracking means of the apparatus.
FIG. 6 is a detailed functional block diagram of the management database of the apparatus.
FIG. 7 is a detailed functional block diagram of the scientific information database of the apparatus.
FIG. 8 is a detailed functional block diagram of the knowledge database used by the analysis means of the apparatus.
FIG. 9 is a diagram explaining the associations among technical terms traced by the apparatus.
FIG. 10 is an overall detailed functional block diagram of the apparatus.
FIG. 11 is a flowchart of the method.

In describing an example of the present invention, drawings and descriptions of well-known technical matter that is not directly related to the invention are omitted, so that the gist of the invention is conveyed clearly without being obscured.

Referring to FIG. 2, the step-by-step operation by which the semantic-based technical term mining apparatus for a database achieves its final goal is as follows. First, the required technical information is retrieved using a system that searches and manages text-based technical information, including patents, papers, and technical reports.

The technical information retrieved in this way is processed in real time (REAL-TIME). The processing draws on specialized lexical resources, knowledge representation technology, language processing technology, and machine learning technology, and relies on a science and technology lexical network, automatic terminology recognition, science and technology expertise, rule-based information extraction, text-based novelty extraction, topic clustering and classification, and association rule mining. With these, the system automatically detects technologies, traces the time-series relations of a technology, discovers relations between technologies, automatically classifies technologies, recognizes and extracts technology-related information, and analyzes it in conjunction with external information.

The technical information processed in real time supports the user in several ways: it is personalized on the basis of a user profile with reference to externally linked information, its functions are advanced in a revolutionary manner on the basis of machine learning, its topic ranking adapts to user feedback, it serves as a personalized confidential means, and its components can be rearranged by user scripts.

That is, the technical information and content to be searched can be selected and changed in various ways, and the connections among them in terms of time, region, and technology are analyzed, which helps individual users make decisions easily.

The system operates in a real-time interactive manner to provide technological intelligence (TECHNOLOGICAL INTELLIGENCE). It makes information retrieval accurate, analyzes search results in real time, uses up-to-date natural language processing, pattern recognition, and data mining methods, and targets large-volume technical document resources (RESOURCE) that include patents, papers, technical reports, and web documents.

Referring to FIGS. 3 to 8, the semantic-based technical term mining apparatus for a database according to an example of the present invention comprises an ARM means (100), a TRS means (110), an analysis means (120), a tracking means (130), a management database (140), a scientific information database (150), an ARES means (160), an ERA means (170), a DRS means (180), and a knowledge database (190).

The ARM means (100) retrieves from the management database (140), and outputs, queries based on new and seed (SEED) technical terms and context information, to be used for searching information in a specific technical field. As shown in FIG. 6, the management database (140) comprises a technical term dictionary database (142), which accumulates, records, and manages the seed technical terms already under management, and a context information database (144), which accumulates, records, and manages the seed context information already under management.

That is, based on the new and seed technical terms input from the technical term dictionary database (142) of the management database (140), the ARM means (100) extracts queries that are selected relatively often and that can retrieve the document set of a specific technical area and technical pattern. Based on the new and seed context information input from the context information database (144) of the management database (140), it extracts queries consisting of frequently appearing lexical patterns. It then provides the extracted technical terms and context information to the TRS means (110).

A technical term is a term commonly used in a given technical field, whose meaning a person of ordinary skill in that field can readily recognize and understand. A new technical term is a newly retrieved technical term, and a seed technical term is a technical term that was retrieved earlier and is recorded and managed in a database (DATABASE: DB).

A query based on a technical term is a technical term that a user selects, from the many technical terms recorded and managed in the technical term dictionary database (142), for use as a query; examples are "STEM CELL" and "GENE EXPRESSION". A query based on context information is recorded and managed in the context information database (144). To extract relations among new technical terms, it is generated from a lexical pattern that appears frequently at positions where a specific technology name or relation is likely to occur; examples are "TECHNOLOGIES SUCH AS" and "AND OTHER TECHNOLOGIES".

The technical idea of the present invention is that, to extract technology names and relations, it is more effective to use queries to extract and analyze the required technologies from a "document set" covering a specific technical area or pattern than to analyze the entire database. In other words, queries are used to select and retrieve only the information of a specific technical field from a large-volume database.

The TRS means (110) extracts from the scientific information database (150) the document set containing the queries, based on technical terms and context information, that are input from the ARM means (100), together with the corresponding posting (POSTING) information. As shown in FIG. 7, the scientific information database (150) comprises a patent database (152), which records and manages patent documents from Korea, the United States, Japan, and Europe; a paper database (154), which records and manages the English abstracts of papers and technical reports; and an external database (156), which collects and records technical documents, including patents, papers, and technical reports, from external sources including the web (WEB).

The TRS means (110) receives the technical term and context information queries from the ARM means (100). While the scientific information database (150) is operating at a load (LOAD) at or below a predetermined value, it extracts from the scientific information database (150) the set of documents that contain the queries together with the specified parts of speech and sentence constituents, and the posting information of each document.

The posting (POSTING) information is the weight information of the index terms contained in each technical document.

Based on the queries generated by the ARM means (100), the TRS means (110) searches the large-volume scientific information database (150), which in one example totals about 90 gigabytes (GIGA BYTE): about 50 gigabytes in the paper database (154) and about 40 gigabytes in the patent database (152). The result of the search is the set of documents that contain a particular query.

Query-based search retrieves and extracts documents on the basis of linguistic information, namely parts of speech and sentence constituents. As one example, a search based on <noun phrase> + "AND OTHER" + "TECHNOLOGIES" retrieves, from the documents containing "AND OTHER TECHNOLOGIES", only those in which a noun phrase immediately precedes it. As another example, a search based on "TECHNOLOGIES" + "ESPECIALLY" + <noun phrase> + <noun phrase> + ... retrieves, from the documents containing "TECHNOLOGIES ESPECIALLY", only those in which noun phrases are listed in succession after it.
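The first pattern above can be sketched as follows. This is only an illustration, not the patented implementation: it assumes the text has already been chunked by some part-of-speech tagger into (token, tag) pairs in which the tag "NP" marks a noun-phrase chunk, and the function name `find_noun_phrase_before` is hypothetical.

```python
def find_noun_phrase_before(tagged, phrase):
    """Return noun-phrase chunks that immediately precede a literal phrase.

    tagged: list of (token, tag) pairs; the tag "NP" (an assumed convention)
            marks a token that is a whole noun-phrase chunk.
    phrase: literal context pattern, e.g. "AND OTHER TECHNOLOGIES".
    """
    words = [w.lower() for w in phrase.split()]
    n = len(words)
    hits = []
    # Start at 1 so that there is always a chunk before the phrase.
    for i in range(1, len(tagged) - n + 1):
        window = [tok.lower() for tok, _ in tagged[i:i + n]]
        if window == words and tagged[i - 1][1] == "NP":
            hits.append(tagged[i - 1][0])
    return hits


# Hypothetical pre-tagged sentence: "We study stem cell and other technologies"
sentence = [
    ("We", "PRP"), ("study", "VB"), ("stem cell", "NP"),
    ("and", "CC"), ("other", "JJ"), ("technologies", "NNS"),
]
candidates = find_noun_phrase_before(sentence, "AND OTHER TECHNOLOGIES")
```

A document would be kept in the result set only when `candidates` is non-empty, which mirrors the rule that a noun phrase must stand directly before the context pattern.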

In addition, internal analysis information is provided for each individual document; for example, the weight information of index terms (DOCUMENT FREQUENCY, TERM FREQUENCY, and so on) is analyzed and provided.
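As a rough illustration of such index-term weight information, the sketch below computes term frequency per document and document frequency across a document set. Whitespace tokenization and the function name `posting_info` are assumptions made for the example; the source does not specify how the weights are actually computed.

```python
from collections import Counter


def posting_info(documents):
    """Return (tf, df) for a list of document strings.

    tf: list of Counters, one per document (term -> occurrences in that document).
    df: Counter (term -> number of documents containing the term).
    Tokenization is a plain lowercase whitespace split, which is an assumption.
    """
    tf = [Counter(doc.lower().split()) for doc in documents]
    df = Counter()
    for counts in tf:
        # Each document contributes at most 1 to a term's document frequency.
        df.update(counts.keys())
    return tf, df


docs = ["stem cell stem", "gene expression cell"]
tf, df = posting_info(docs)
```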

That is, the TRS means (110) receives the queries based on technical terms and context information from the ARM means (100), searches the scientific information database (150), and extracts the document set containing the queries together with the posting information, which includes the index weight information of each document. It thereby extracts the document set of a specific technical area and provides it to the analysis means (120).

The analysis means (120) extracts technical terms and context information from the document set and posting information provided by the TRS means (110) and analyzes the relations among the technical terms. It comprises a TAS means (122), a TLA means (124), and a TAMA means (126).

The TAS means (122) receives the document set and posting information from the TRS means (110) and extracts technical terms and context information. It extracts technical terms from the document set by part-of-speech sequence pattern analysis, classifies new technical terms that are not recorded in the management database (140) by automatic and manual methods, extracts the context information of the classified technical terms, and provides it to the ARES means (160). Likewise, it extracts the technical terms used together with the queried context information, classifies new technical terms that are not recorded in the management database (140) by automatic and manual methods, extracts the context information of the classified technical terms, and provides it to the ARES means (160).

The TAS means (122) is a technology acquisition system. From the input document set and posting information it extracts technical terms, document content information including the publication year or time, author, and place of publication of each document, and context information. That is, it receives the document set extracted by the TRS means (110) and performs extraction in two (2) modes (MODE), one based on technical terms and one based on context information.

In the extraction mode based on technical terms, unspecified technical terms are extracted from each document containing the selected technical term by simple part-of-speech sequence pattern analysis. All extracted technical terms are compared with the seed technical terms recorded in the technical term dictionary (142) of the management database (140). A term identified as a new technical term that is not yet recorded or managed is verified by automatic and manual methods, and if verification confirms it as a technical term, its context information is collected and extracted. The verified new technical terms and context information are provided to the ARES means (160) to be additionally recorded and managed in the management database (140), and are provided to the knowledge database (190) to be recorded and managed in the technical knowledge database (192). The verification information is recorded and managed in the verification set database (194) of the knowledge database (190).

Automatic verification of a new technical term verifies the term in conjunction with external information; manual verification is analysis and judgment performed by hand by an expert in the field.
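The comparison against the seed dictionary that precedes verification can be sketched as a simple set lookup. This is a minimal illustration under assumptions: terms are compared case-insensitively, duplicates are dropped, and the function name `split_candidates` is hypothetical. The verification step itself (external lookup or expert review) is not modeled.

```python
def split_candidates(extracted_terms, seed_terms):
    """Split extracted terms into (new, known) relative to the seed dictionary.

    Comparison is case-insensitive and repeated terms are reported once;
    both choices are assumptions made for this sketch.
    """
    seeds = {t.lower() for t in seed_terms}
    seen = set()
    new, known = [], []
    for term in extracted_terms:
        key = term.lower()
        if key in seen:
            continue
        seen.add(key)
        (known if key in seeds else new).append(term)
    return new, known


new_terms, known_terms = split_candidates(
    ["Stem Cell", "gene therapy", "stem cell"],  # hypothetical extraction output
    ["stem cell"],                               # hypothetical seed dictionary
)
```

Only the entries in `new_terms` would go on to automatic and manual verification before being recorded.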

In the extraction mode based on context information, terms related to specific context information are extracted automatically and compared with the seed context information recorded and managed in the context information database (144) of the management database (140). Context information classified as new is verified automatically and manually in the same way as described above. The verified new context information is provided to the ARES means (160) to be additionally recorded and managed in the management database (140), and is provided to the knowledge database (190) to be recorded and managed in the technical knowledge database (192). The verification information is recorded and managed in the verification set database (194) of the knowledge database (190).

The TLA means (124) receives the results extracted by the TAS means (122) and the TAMA means (126) and, using a thesaurus (THESAURUS), an ontology (ONTOLOGY), and a lexical intelligence network, provides technical term candidates to the TAS means and the relations among technical terms to the TAMA means.

The thesaurus is a term dictionary that a computer records and manages for information retrieval. It is operated by managing, item by item, the synonyms, antonyms, near-synonyms, broader terms, narrower terms, and related terms of each term.

Ontology generally refers to the study of, or interest in, what kinds of entities exist in the universe. The word derives from a compound of the Greek 'ONTO', meaning 'being' or 'reality', and 'LOGIA', meaning 'treatise or discourse', and denotes the field that studies the essential nature of things. In computer science and information science, an ontology is a data model representing a particular domain: a set of formal vocabulary that describes the relations among concepts and supports reasoning (REASONING, INFERENCE). It is applied to the Semantic Web and Semantic Web services, where it is used to describe relations with Internet resources in a specific field.

The TLA means (124) is a set of language processing modules optimized for databases in the field of science and technology. It includes a deep sentence analysis module for extracting and verifying the relations among technologies; a SHALLOW PARSER, which applies partial parsing designed to cope with hard-to-identify sentences and varied sentence expressions; and a part-of-speech tagging system, which resolves part-of-speech ambiguity by machine learning trained on documents of specialized information and whose operating speed is optimized so that large volumes of information can be analyzed efficiently.

The TAMA means (126) receives the document set and posting information from the TRS means (110) and the set of new technical terms from the knowledge database (190), extracts technology relation information across the documents, and provides it to the ARES means (160) and the knowledge database (190). Specifically, it extracts the relation information, automatically reconfirms the validity of relation information that was extracted earlier and is under management, verifies the extracted relation information automatically and manually, analyzes the relation information using the thesaurus, the ontology, and the lexical intelligence network and provides it to the ARES means (160), and provides the relation information for specific technologies to the knowledge database (190).

From the document set extracted by the TRS means (110), the TAMA means (126) extracts, on the basis of lexical and part-of-speech patterns, the relations between the new technical terms accumulated to date in the knowledge database (190) and the seed technical terms accumulated in the management database (140). An example of extraction based on a preposition is "PROTEIN KINASE IN HEAT SHOCK PROTEIN"; an example based on a verb phrase is "SURFACE WATER ARE ACCOMPANIED BY HIGH FLUXES OF ORGANIC MATTER"; and a further example is "CHRONOLOGICAL AGE THAT IS APPLICABLE TO THE NATURAL POPULATION". In each example the technical terms are the noun phrases at either end (protein kinase, heat shock protein, surface water, organic matter, chronological age, natural population), and the words between them express the relation.
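The idea that the text between two known technical terms carries the relation can be sketched as below. This is a simplified illustration only: it matches terms by plain substring search rather than by part-of-speech patterns, considers only the first occurrence of each term, and the function name `extract_relations` is hypothetical.

```python
def extract_relations(sentence, terms):
    """Return (term_a, connecting_text, term_b) triples for adjacent known terms.

    Terms are located by case-insensitive substring search (first occurrence
    only), which is a simplification of the lexical and part-of-speech
    patterns described in the text.
    """
    lowered = sentence.lower()
    found = sorted(
        (lowered.find(t.lower()), t) for t in terms if t.lower() in lowered
    )
    triples = []
    for (pos_a, a), (pos_b, b) in zip(found, found[1:]):
        between = sentence[pos_a + len(a):pos_b].strip()
        if between:
            triples.append((a, between, b))
    return triples


relations = extract_relations(
    "Protein kinase in heat shock protein",
    ["heat shock protein", "protein kinase"],
)
```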

In addition, the TAMA means (126) re-verifies previously extracted relation information and thereby refines it automatically (CLARIFICATION); newly extracted relation information undergoes the same automatic and manual verification as described above. Relation analysis using a thesaurus and a lexical network (for example, WORDNET) classifies relations into two (2) types. Relation information of the GENERIC RELATION type includes "broader", "narrower", "effect", "cause", and the like. Relation information of the SPECIFIC RELATION type is a special form of relation that can be derived within a particular domain or between particular technical terms, such as "EIGEN-VALUE PROBLEM OF AN OPEN SYSTEM UNDER STRONG INTERACTION", in which the terms are "eigenvalue problem" and "strong interaction".

The tracking means (130) receives the technical terms, context information, relation information, and document set from the analysis means (120), and traces and extracts technical knowledge in terms of occurrence frequency, association, and extension, taking into account the time of occurrence, place of occurrence, and author of each technical term. It comprises a SAM means (132), a SATT means (134), and a TCM means (136).

The SATT means (134) receives the technical terms, document content, and relation information extracted by the TAS means (122) and the TAMA means (126) of the analysis means (120), receives from the knowledge database (190) the accumulated new technical terms and the linkage information between technologies and between technologies and documents, and traces technical knowledge manually and automatically. Each traced technical term constitutes technical knowledge of a relation that includes the time of occurrence, producer information, and location information. It is provided to the knowledge database (190) to be recorded and managed, is provided to the DRS means (180) to be applied in various services, and is output as documents containing charts and tables.

The relations cover the occurrence frequency of a technical term over time, the relations among terms, the convergence and divergence of technologies according to distance information within the database, the occurrence frequency by place of occurrence, and the estimation and verification of new names.

The SATT means (134) receives the extracted and verified technical terms, document content information such as the publication year or time, author, and place of publication of each document, and the relations among the technical terms. It provides the traced technical knowledge to the knowledge database (190) for recording and management, provides it to the DRS means (180) for use in various application services, and outputs it as technical document information in the form of charts and tables. That is, every extracted technical document contains time information such as the publication year, producer information such as the author, and location information such as the place of origin, and this document information can be used to trace relations in various forms.

As one example of such tracing, a relation can be traced in the form [technical term]:[time of occurrence]:[author]:[place of occurrence], or in the form [technical term]:[relation]:[technical term]:[time of occurrence]:[author]:[place of occurrence].
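The two trace forms can be represented with a small record type, sketched below. The class name `TrackingRecord`, its field names, and the sample values are all hypothetical; only the bracketed, colon-separated layout comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackingRecord:
    """One traced occurrence of a technical term (illustrative structure)."""
    term: str
    time: str
    author: str
    location: str
    relation: Optional[str] = None       # set only for the longer form
    related_term: Optional[str] = None   # set only for the longer form

    def render(self) -> str:
        """Render as [term]:[time]:[author]:[location], or with the
        [relation]:[term] pair inserted after the first term."""
        head = [self.term]
        if self.relation and self.related_term:
            head += [self.relation, self.related_term]
        parts = head + [self.time, self.author, self.location]
        return ":".join(f"[{p}]" for p in parts)


short_form = TrackingRecord("stem cell", "2007", "Kim", "KR").render()
long_form = TrackingRecord(
    "protein kinase", "2007", "Kim", "KR",
    relation="in", related_term="heat shock protein",
).render()
```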

The relations traced among the technical documents in this way can be represented as shown in FIG. 9.

In FIG. 9, each point is an extracted technical term, and each line connecting two technical terms is an extracted relation.

The extracted relations can be extended by inference under the concept of extension (EXTENSION). For example, when the relations among technical terms A, B, and C are traced and A -> B and B -> C hold, it follows that A -> C, so technical term A is traced as being related to technical term C. In addition, the occurrence frequency of a technical term can be traced against its time of occurrence, relations can be traced over time, and the process by which a technology converges or diverges can be traced and analyzed by computing distance information within the database. The occurrence frequency of a technology by location can also be traced and analyzed, and changes in the name of a technical term can be estimated or verified.
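The extension rule (A -> B and B -> C imply A -> C) amounts to taking the transitive closure of the relation pairs. A minimal sketch, assuming relations are given as directed (source, target) pairs and with the hypothetical function name `extend_relations`:

```python
def extend_relations(pairs):
    """Return the transitive closure of directed (source, target) pairs.

    Self-relations such as (A, A) that would arise from cycles are not added;
    that is a choice made for this sketch, not something the text specifies.
    """
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


extended = extend_relations({("A", "B"), ("B", "C")})
```

The nested loop is quadratic per pass and is only meant to show the rule; a large relation graph would call for a graph traversal instead.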

The SAM means (132) receives technical term, relation, and document content information from the SATT means (134), receives the accumulated technical term, relation, and document content information from the knowledge database (190), and provides statistical analysis results. It traces relations in the large-volume text document sets supplied through the SATT means by statistical analysis using one or more methods selected from the group consisting of hypothesis setting, nonlinear regression analysis of frequency information and its causes, and principal component analysis (PRINCIPAL COMPONENT ANALYSIS: PCA).

That is, the SAM means (132) is a back-end (BACKEND) module for collecting various kinds of statistical information; it receives large volumes of numerical or text information and analyzes them statistically as the situation requires.

The TCM means (136) receives technical term, relation, and document content information from the SATT means (134), receives the accumulated technical term, relation, and document content information from the knowledge database (190), classifies and clusters them into technology sets, and provides the traced results. It classifies (CLASSIFICATION) and clusters (CLUSTERING) the large-volume text document sets supplied through the SATT means (134) by the extracted names and by the relations among technologies.

That is, the TCM means (136) uses various models, for example the NAIVE BAYESIAN MODEL, the k-NEAREST NEIGHBOR MODEL, and the SVM MODEL, to cluster and classify the extracted technical documents by name and by relation.
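Of the models named above, the k-nearest-neighbor approach is the simplest to sketch. The example below classifies a document by majority vote among its k most similar labelled documents, using cosine similarity over bag-of-words counts. The similarity measure, the tokenization, the function names, and the sample data are all assumptions for illustration; the source does not describe how the models are configured.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, labelled, k=3):
    """Label `text` by majority vote among its k most similar labelled docs.

    labelled: list of (document_text, label) pairs.
    """
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), label)
         for doc, label in labelled),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled technical documents
training = [
    ("stem cell culture", "bio"),
    ("gene expression cell", "bio"),
    ("neural network training", "ml"),
    ("network model training", "ml"),
]
label = knn_classify("cell gene culture", training, k=3)
```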

Referring to FIG. 4, the analysis means (120) comprises the TAS means (122), the TLA means (124), and the TAMA means (126).

Referring to FIG. 5, the tracking means (130) comprises the SAM means (132), the SATT means (134), and the TCM means (136).

Referring to FIG. 6, the management database (140) comprises the technical term dictionary database (142), which accumulates, records, and manages the seed technical terms provided by the ARES means (160), and the context information database (144), which accumulates, records, and manages the seed context information.

Referring to FIG. 7, the scientific information database (150) comprises the patent database (152), which records and manages patent information collected from Korea, the United States, Japan, and Europe; the paper database (154), which records and manages the English abstracts of the managed papers and technical reports; and the external database (156), which collects and records technical documents, including patents, papers, and technical reports, from external sources and the web (WEB).

Referring to FIG. 8, the knowledge database (190) comprises a technical knowledge database (192) and a verification set database (194). FIG. 10 shows in detail all functional blocks of the semantic-based technical term discovery apparatus for a database according to the present invention.

Hereinafter, with reference to the attached FIG. 11, the semantic-based technical term discovery method for a database according to an example of the present invention is described. The method comprises a providing step, a technical term step, a context information step, an output step, and a recording step.

The following describes in detail the semantic-based technical term discovery method for a database, which uses the configuration described above: the ARM means, TRS means, analysis means, tracking means, management database, scientific information database, ARES means, ERA means, DRS means, and knowledge database.

The analysis means determines whether the operating load of the scientific information database is below a predetermined ratio (S100). That is, it determines whether many or few users are connected, so that the method of the present invention runs automatically and repeatedly while the operating load is low.

If the determination (S100) finds that the operating load of the scientific information database is below the predetermined ratio, the ARM means searches the management database and provides the extracted query terms to the TRS means (S110). The query terms are based on technical terms and context information.

The TRS means uses the query terms to extract a document set and posting (POSTING) information from the scientific information database and provides them to the analysis means (S120). That is, it searches the scientific information database for the document set classified into a specific technical field, extracts it together with the query terms and posting information, and provides the result to the analysis means. The posting information includes the weights of the index terms that make up each document.
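The idea of posting information that carries index-term weights can be sketched as a small weighted inverted index. The patent does not say how the weights are computed; the TF-IDF weighting, the function names `build_postings` and `retrieve`, and the sample documents below are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def build_postings(docs):
    """Map each index term to {doc_id: weight}, using TF-IDF as an assumed weighting."""
    n_docs = len(docs)
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    postings = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            postings[term][doc_id] = tf * math.log(n_docs / doc_freq[term])
    return postings


def retrieve(postings, query_terms):
    """Return doc ids ranked by the summed weights of the query terms they contain."""
    scores = Counter()
    for term in query_terms:
        for doc_id, weight in postings.get(term, {}).items():
            scores[doc_id] += weight
    return [doc_id for doc_id, _ in scores.most_common()]


# Hypothetical document set standing in for the scientific information database.
docs = {
    "p1": "graphene transistor graphene channel",
    "p2": "silicon transistor gate",
    "p3": "graphene oxide membrane",
}
postings = build_postings(docs)
print(retrieve(postings, ["graphene", "transistor"]))
```

In this sketch the document set returned by `retrieve` and the per-term weights in `postings` together play the role of what the TRS means hands to the analysis means.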

The analysis means checks whether new or seed technical terms are detected or extracted from the provided document set (S130). If such technical terms are detected or extracted, it notifies the ARES means and at the same time provides them to the knowledge database for registration (S140). That is, the ARES means registers them in the technical term dictionary database of the management database, and the knowledge database registers them in the corresponding technical knowledge database.

In addition, the analysis means checks whether new or seed context information is detected or extracted from the provided document set (S150). If such context information is detected or extracted, it notifies the ARES means and at the same time provides it to the knowledge database for registration (S160). That is, the ARES means registers it in the context information database of the management database, and the knowledge database registers it in the corresponding technical knowledge database.
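Steps S130 to S160 follow the same pattern for technical terms and for context information: compare what was extracted against what is already stored, then register the result in both the management database and the knowledge database. The sketch below shows that pattern under simplifying assumptions: the dictionary is a plain Python set, the knowledge database is a dict, and the name `register_items` is hypothetical.

```python
def register_items(extracted, dictionary, knowledge_db):
    """Split extracted items into seed (already stored) and new, and register the new ones.

    `dictionary` stands in for the term dictionary or context information database,
    and `knowledge_db` for the technical knowledge database (both are assumptions).
    """
    unique = list(dict.fromkeys(extracted))  # drop duplicates, keep first-seen order
    seed = [item for item in unique if item in dictionary]
    new = [item for item in unique if item not in dictionary]
    dictionary.update(new)
    for item in new:
        knowledge_db.setdefault(item, []).append("registered")
    return seed, new


term_dictionary = {"graphene", "transistor"}
knowledge_db = {}
seed, new = register_items(
    ["graphene", "memristor", "memristor", "spintronics"],
    term_dictionary,
    knowledge_db,
)
print(seed, new)
```

The same function can be called once with technical terms (S130, S140) and once with context patterns (S150, S160), each time with its own store.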

The analysis means provides the document set extracted by the query terms to the tracking means, and the knowledge database provides the tracking means with all the new knowledge information, including technical terms, that it records and manages (S170).

The tracking means receives the document set and the related information, including the technical terms, and tracks whether associated technologies exist, taking into account the technical term, time of occurrence, producer information, and location information.

Based on this tracking, it checks whether an associated technology exists (S180). If one is found, the tracked technical information is provided to the knowledge database and the DRS means to be recorded and applied, and is output as documents such as charts and tables (S190).
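One simple way to picture the tracking in S170 to S190 is to count how often pairs of technical terms occur in the same document while recording when each pair first appeared and who produced the documents. The sketch below does this; the co-occurrence criterion, the `min_count` threshold, the record layout, and the name `track_associations` are assumptions and are not taken from the patent.

```python
from collections import defaultdict
from itertools import combinations


def track_associations(documents, min_count=2):
    """Collect term pairs that co-occur in at least `min_count` documents.

    Each result records the frequency, the earliest year of occurrence,
    and the set of authors (producers) for the pair.
    """
    stats = defaultdict(lambda: {"count": 0, "first_year": None, "authors": set()})
    for doc in documents:
        for pair in combinations(sorted(set(doc["terms"])), 2):
            entry = stats[pair]
            entry["count"] += 1
            if entry["first_year"] is None or doc["year"] < entry["first_year"]:
                entry["first_year"] = doc["year"]
            entry["authors"].add(doc["author"])
    return {pair: e for pair, e in stats.items() if e["count"] >= min_count}


# Hypothetical documents with extracted terms and metadata.
documents = [
    {"terms": ["graphene", "transistor"], "year": 2006, "author": "Kim"},
    {"terms": ["transistor", "graphene", "sensor"], "year": 2004, "author": "Lee"},
    {"terms": ["sensor", "membrane"], "year": 2007, "author": "Park"},
]
for pair, info in track_associations(documents).items():
    print(pair, info["count"], info["first_year"], sorted(info["authors"]))
```

A non-empty result corresponds to the "associated technology exists" branch of S180, and the returned records are the kind of data that could be written to a table or chart in S190.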

In addition, if the operating load of the scientific information database is above the predetermined ratio, the ERA means collects query terms, technical literature information, and the like from external resources including the web, provides them to the ARES means and to the external database of the scientific information database for recording (S200), and the method proceeds to the output step (S170).
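The branching among steps S100 to S200 can be summarized as a small planning function that returns the step labels executed in one cycle. This is only a reading aid under stated assumptions: the function name `plan_cycle` and its boolean inputs are hypothetical, and the text does not say which branch applies when the load equals the threshold (the sketch sends that case to S200).

```python
def plan_cycle(load_ratio, threshold, terms_found, context_found, association_found):
    """Return the ordered step labels for one cycle of the method in FIG. 11."""
    steps = ["S100"]  # check the operating load of the scientific information database
    if load_ratio < threshold:
        steps += ["S110", "S120", "S130"]  # query, retrieve, check technical terms
        if terms_found:
            steps.append("S140")  # register technical terms
        steps.append("S150")  # check context information
        if context_found:
            steps.append("S160")  # register context information
    else:
        steps.append("S200")  # collect from external resources through the ERA means
    steps += ["S170", "S180"]  # hand over to tracking, check for associations
    if association_found:
        steps.append("S190")  # record and output tracked technical information
    return steps


print(plan_cycle(0.2, 0.5, terms_found=True, context_found=False, association_found=True))
print(plan_cycle(0.9, 0.5, terms_found=False, context_found=False, association_found=False))
```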

The present invention configured as above restricts the search and extraction of technical documents to a specific field by using natural language processing, pattern recognition, data mining, and similar techniques, and therefore has the advantage of searching and analyzing large volumes of information in real time.

The present invention has been described by way of example, but it is not necessarily limited to this example and may be modified in various ways without departing from the technical idea of the invention. The examples disclosed herein are therefore intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these examples. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

FIG. 1 is a conceptual diagram of general technical information discovery;

FIG. 2 is an explanatory diagram of the step-by-step operation for achieving the final goal with the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 3 is a functional configuration diagram of the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 4 is a detailed functional configuration diagram of the analysis means constituting the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 5 is a detailed functional configuration diagram of the tracking means constituting the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 6 is a detailed functional configuration diagram of the management database constituting the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 7 is a detailed functional configuration diagram of the scientific information database constituting the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 8 is a detailed functional configuration diagram of the knowledge database of the analysis means constituting the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 9 is an explanatory diagram of the associations among technical terms tracked by the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 10 is an overall detailed functional configuration diagram of the semantic-based technical term discovery apparatus for a database according to an example of the present invention;

FIG. 11 is a flowchart of the semantic-based technical term discovery method for a database according to an example of the present invention.

** Explanation of reference numerals for the main parts of the drawings **

100: ARM means    110: TRS means

120: analysis means    122: TAS means

124: TLA means    126: TAMA means

130: tracking means    132: SAM means

134: SATT means    136: TCM means

140: management database    142: technical term dictionary database

144: context information database    150: scientific information database

152: patent database    154: paper database

156: external database    160: ARES means

170: ERA means    180: DRS means

190: knowledge database    192: technical knowledge database

194: verification set database

Claims

A semantic-based technical term discovery apparatus for a database, comprising:

ARM means for searching a management database for, and outputting, query terms based on new and seed technical terms and context information, the query terms being used to search for information in a specific technical field;

TRS means for extracting, from a scientific information database, a document set containing the query terms based on technical terms and context information input from the ARM means, together with the corresponding posting information;

analysis means for extracting technical terms and context information from the document set and posting information provided by the TRS means, and analyzing the associations among the technical terms;

ARES means for receiving the technical terms, context information, and association information extracted by the analysis means, extracting new technical terms and context information, and recording them in the management database; and

ERA means for connecting to the ARES means and extracting and providing technical terms, context information, associations, and technical documents from external resources.

The apparatus of claim 1, further comprising:

tracking means for receiving the technical terms, context information, association information, and document set from the analysis means, and for tracking and extracting technical knowledge based on frequency-of-occurrence, association, and extension relations, including the time of occurrence, location of occurrence, and author of the technical terms;

DRS means for analyzing the technical knowledge accumulated by the tracking means over a predetermined time and providing it as a service scenario;

a knowledge database for recording and managing new technical knowledge analyzed and tracked by the analysis means and the tracking means;

the management database, which receives technical terms and context information from the ARES means, records and manages them, and provides them to the ARM means and the analysis means; and

the scientific information database, which is connected to the TRS means and records, manages, and provides upon search patents, papers, technical reports, and technical documents extracted from outside by the ERA means.

The apparatus of claim 1, wherein the ARM means,

based on the new and seed technical terms input from the management database, extracts query terms that are selected relatively often and that search for document sets of a specific technical field and technical pattern, and, based on the new and seed context information input from the management database, extracts query terms of frequently appearing vocabulary patterns, and provides the query terms to the TRS means.

The apparatus of claim 1, wherein the TRS means

receives the query terms based on technical terms and context information from the ARM means and extracts, from the scientific information database operating at a load below a predetermined value, a document set containing the query terms, the designated parts of speech, and the sentence components, together with the posting information of each document.

The apparatus of claim 2, wherein the analysis means comprises:

TAS means for receiving the document set and posting information from the TRS means and extracting technical terms and context information;

TAMA means for receiving the document set and posting information from the TRS means, receiving a new set of technical terms from the knowledge database, extracting technical association information among the documents, and providing it to the ARES means and the knowledge database; and

TLA means for receiving the results extracted by the TAS means and the TAMA means and, using a thesaurus, an ontology, and a lexical intelligent network, providing technical term candidates to the TAS means and associations among technical terms to the TAMA means.

The apparatus of claim 5, wherein the TAS means

extracts technical terms from the document set input from the TRS means by pattern analysis of part-of-speech sequences, classifies new technical terms not recorded in the management database automatically and manually, extracts context information using the classified technical terms, and provides it to the ARES means; and extracts technical terms used together with the queried context information, classifies new technical terms not recorded in the management database automatically and manually, extracts context information using the classified technical terms, and provides it to the ARES means.

The apparatus of claim 5, wherein the TAMA means

extracts associations from the document set input from the TRS means using the part-of-speech patterns and vocabulary of the technical terms, automatically reconfirms existing association information, verifies the extracted associations automatically and manually, analyzes the associations using a thesaurus, an ontology, and a lexical intelligent network and provides them to the ARES means, and provides association information for each specific technology to the knowledge database.

The apparatus of claim 2, wherein the tracking means comprises:

SATT means for receiving the technical terms, document content, and association information extracted by the analysis means, and tracking technical knowledge manually and automatically;

SAM means for receiving technical terms, associations, and document content from the SATT means, receiving the accumulated technical terms, associations, and document content from the knowledge database, and providing statistical analysis results; and

TCM means for receiving technical terms, associations, and document content from the SATT means, receiving the accumulated technical terms, associations, and document content from the knowledge database, classifying them into technology sets, and providing the clustered results.