KR20050000468A

KR20050000468A - A Method For Classifying Document Information Based On User's Definition And Storage Media Thereof

Info

Publication number: KR20050000468A
Application number: KR1020030040978A
Authority: KR
Inventors: 감문호
Original assignee: 울림정보기술(주)
Priority date: 2003-06-24
Filing date: 2003-06-24
Publication date: 2005-01-05

Abstract

PURPOSE: A learning-based method for classifying bibliographic information by classification code and user-defined classification, and a recording medium thereof, are provided to classify bibliographic information efficiently: once the user sets the desired classification items and criteria, the bibliographic information is classified automatically. CONSTITUTION: Classification target file selection information and classification code information are received from the user (S16). The bibliographic information included in the classification target file is classified according to the classification criteria information corresponding to the classification code, and the result is output to the user (S18). Selection information for the classified bibliographic information is received, and the corresponding abstract information and indexed keywords are extracted and transmitted to the user (S18). A keyword selection signal is received from the user. The selected keyword is added to, edited in, or removed from the classification criteria information (S24).

Description

A method for classifying document information based on user's definition and storage media thereof

The present invention relates to a method for classifying document information. More particularly, it relates to a classification method that classifies document information under a specific classification code, provides the user with summary information and keyword information for that document information, and lets the user easily reinforce the classification criteria corresponding to the classification code on that basis, so that the classification of document information can be processed quickly.

As industry becomes more sophisticated and the era of advanced technology arrives, the importance of intellectual property rights such as patents is emphasized more every day. Accordingly, governments and companies strive to identify new technology trends by analyzing document information, including patent information such as patent maps, for purposes such as technology development and sales strategy.

Preparing such a patent map requires analyzing and classifying a large number of patents in the relevant technical field. In conventional patent analysis and classification, however, the user opens the patent documents downloaded to a computer one by one, reads the abstract of each, and checks them repeatedly, relying on the classifier's intuition.

Consequently, the quality of the classified data varies with the ability of the classifier, and because each document must be read and processed one at a time, excessive time and effort are required.

In addition, because patent information is classified on the individual user's computer, sharing classification information with other classifiers is very difficult, and applying the same classification method to the same data set requires repeating the above process, which wastes considerable cost.

There is therefore a growing demand for a faster and more accurate document information classification method that overcomes these shortcomings of conventional methods for classifying document information, including patent information.

The present invention has been made to solve the above problems. A first object of the present invention is to provide a learning-based document information classification method based on a user classification definition, and a recording medium thereof, in which, once the user sets classification items and classification criteria suited to the user's purpose, an automatic classification program automatically classifies document information such as patent documents according to the set criteria, so that the classification of document information can be processed quickly and efficiently.

A second object of the present invention is to provide a learning-based document information classification method based on a user classification definition, and a recording medium thereof, in which the classification results are displayed on screen in a structure the user can easily understand, so that the user can review the classified results effectively while reinforcing the classification criteria information by adding, deleting, and modifying it.

A third object of the present invention is to provide a learning-based document information classification method based on a user classification definition, and a recording medium thereof, in which the classification results are fed back into the classification criteria so that the criteria are learned and evolve over time.

FIG. 1 is a block diagram of a patent information classification system according to the present invention;

FIG. 2 is a flowchart showing the functional modules of the server computer of FIG. 1 and the corresponding data processing procedure;

FIG. 3 is a diagram showing only the automatic classification module among the functions of the server computer of FIG. 2;

FIG. 4 is a flowchart showing the process by which document information is classified according to the present invention;

FIG. 5 is a flowchart showing the process by which the classification criteria for document information are learned;

FIG. 6 is an exemplary execution screen of a user computer on which the patent information classification method according to the present invention is performed.

<Brief description of the major reference numerals>

10: user computer,

20: server computer,

30: network.

According to a first aspect of the present invention for achieving the above objects, there may be provided a learning-based document information classification method based on a user classification definition, comprising: receiving classification target file selection information and classification code selection information from a user computer; classifying the document information included in the classification target file according to the classification criteria information corresponding to the selected classification code, and transmitting the result to the user computer; receiving selection information for document information classified by the classification criteria information of the classification code, extracting the corresponding abstract information and at least one indexed keyword, and transmitting them to the user computer; receiving from the user computer a keyword selection signal that selects one or more of the extracted keywords; performing any one of adding the selected keyword to, modifying it in, or deleting it from the classification criteria information; when verification request information for the changed classification criteria information is received from the user computer, reclassifying the document information on the basis of the classification criteria information and transmitting the reclassified result to the user computer; and storing the classification criteria result.

According to a second aspect of the present invention for achieving the above objects, there may be provided a recording medium on which a computer-readable program is recorded, the program causing execution of, in a learning-based document information classification method based on a user classification definition, the steps of: receiving classification target file selection information and classification code selection information from a user computer; classifying the document information included in the classification target file according to the classification criteria information corresponding to the selected classification code, and transmitting the result to the user computer; receiving selection information for document information classified by the classification criteria information of the classification code, extracting the corresponding abstract information and at least one indexed keyword, and transmitting them to the user computer; receiving from the user computer a keyword selection signal that selects one or more of the extracted keywords; performing any one of adding the selected keyword to, modifying it in, or deleting it from the classification criteria information; when verification request information for the changed classification criteria information is received from the user computer, reclassifying the document information on the basis of the classification criteria information and transmitting the reclassified result to the user computer; and storing the classification criteria result.

Other objects, specific advantages, and novel features of the present invention will become more apparent from the following detailed description and preferred embodiments, taken in conjunction with the accompanying drawings.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The present invention relates to a method for classifying document information and can classify any document information available to a user in printed or similar form, such as patent information, technical information, academic papers, news articles, and historical records. The description below, however, uses the classification of patent information as an example. The patent information classification method disclosed below can of course be applied equally to other document information.

FIG. 1 is a block diagram of a patent information classification system according to the present invention.

As shown in FIG. 1, the patent information classification system according to the present invention comprises a user computer 10 and a server computer 20.

The user computer 10 is typically an IBM PC and is connected to the server computer 20 through a network 30 using TCP/IP or PPP. The network 30 includes the Internet, a LAN, a MAN, a WAN, and the like, so the system may be implemented not only as a web-based system but also as a server/client structure based on a local network. The user computer 10 transmits the data or classification target file to be used for patent information classification to the server computer 20, and receives from the server computer 20 the patent information classification results and the keyword information for indexing the patent information.

The server computer 20 is a computer installed at a company that provides patent information analysis and classification services. It classifies the plurality of patent information items included in the classification target file received from the user computer 10 under the classification codes set by the user.

FIG. 2 is a flowchart showing the functional modules of the server computer 20 of FIG. 1 and the corresponding data processing procedure.

Referring to FIG. 2, the document processing module 50 of the server computer 20 processes raw patent and document data 40 through steps such as analysis, collection, input, convention checking, and conversion, and stores the result in the document database 60. To make the user's patent classification work easier, the processing module 50 processes the raw data 40 to generate patent information that includes bibliographic data, such as the title of the invention and the patent number, together with body summary information summarizing the constitution of the invention.

The index module 80 uses a stopword dictionary 70 and a term dictionary 90 to index the collected patent information and accompanying information according to the database table specification, so that it can be used by the search and analysis systems and the automatic classification module, and then uploads the result to the index term database 110. Indexing may be automatic or manual. Automatic indexing extracts keywords from the data processed by the processing module 50 through dictionary lookup and morphological analysis using the file information, and updates the index file, bibliographic data, and body text file with the indexed keywords; it indexes in batch without any work by an indexer. In manual indexing, an indexer indexes the processed data one record at a time at each indexing step. Manual indexing is either fully manual, in which the indexer extracts terms directly from the data, or semi-automatic, in which each record is indexed automatically and the indexer then checks and corrects it. This embodiment uses automatic indexing.
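As a minimal sketch of the automatic (batch) indexing described above, the following Python fragment drops stopwords, normalizes terms through a term dictionary, and builds an inverted index. The dictionary contents, the function names, and the use of simple regular-expression tokenization in place of morphological analysis are all assumptions made for illustration; the source does not specify them.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the stopword dictionary (70) and term dictionary (90).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "is", "to", "in"}
TERM_DICTIONARY = {"classifying": "classification", "documents": "document"}


def extract_index_terms(text):
    """Tokenize, drop stopwords, and normalize terms via the term dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOPWORDS:
            continue
        terms.append(TERM_DICTIONARY.get(token, token))
    return terms


def build_index(records):
    """Batch-index records: map each index term to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, summary in records.items():
        for term in extract_index_terms(summary):
            index[term].add(record_id)
    return dict(index)


# Illustrative records (made up for this sketch).
records = {
    "P1": "A method for classifying documents",
    "P2": "Document search system",
}
index = build_index(records)
```

A real system for Korean text would replace the tokenizer with a morphological analyzer; the inverted-index structure itself would stay the same.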

The search module 100 searches patent and document information using a search expression specified by the user or a natural language query. Depending on the type, search methods include classification search, which accesses the relevant classification cluster, such as an IPC classification or a target classification; term search, which uses logical functions of keywords; field search, which uses the index attributes of each field; text search by natural language query; and citation search, which retrieves the documents cited by, and the documents citing, a particular patent document.

The automatic classification module 120 classifies the results retrieved by the search module 100 into a plurality of categories, that is, classification codes, according to given classification criteria. The automatic classification process performed by the automatic classification module 120 is described in detail below with reference to FIG. 3.

The project management module 130 stores, in the form of a user file 140, the results that the user has produced in various ways, such as through the search and analysis systems, invention support, and the automatic classification module, and manages them in an integrated manner. A file created in this way is shared with other users in the same work group. To let other users share the user file 140 belonging to a particular user, methods that may be applied include using the user ID each user enters at login so that only users in the same group as that user can share the file, and allowing only other users designated in advance by that user to share it.
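The two sharing policies mentioned above (sharing within the owner's group, or sharing only with designated users) can be sketched as a simple access check. The `UserFile` class, its fields, and the `can_access` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class UserFile:
    """Hypothetical record describing a stored user file (140) and its sharing policy."""
    owner: str
    owner_group: str
    share_mode: str = "group"  # "group" or "designated" (assumed policy names)
    designated_users: set = field(default_factory=set)


def can_access(user_file, user_id, user_group):
    """Return True if the logged-in user may open the file under its sharing policy."""
    if user_id == user_file.owner:
        return True
    if user_file.share_mode == "group":
        return user_group == user_file.owner_group
    if user_file.share_mode == "designated":
        return user_id in user_file.designated_users
    return False


group_file = UserFile(owner="kim", owner_group="team-a")
private_file = UserFile(owner="kim", owner_group="team-a",
                        share_mode="designated", designated_users={"lee"})
```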

Although not shown in FIG. 2, an analysis module for analyzing the results retrieved by the search module 100 may also be included. Conventional analysis tools were limited in that the search data had to be downloaded and could be analyzed only on the user's own terminal (for example, a computer). The analysis module according to the present invention, by contrast, uses OLAP (Online Analytical Processing) technology to list search results on the web and analyze them in real time. The analysis module can perform analysis under various conditions, such as patent number, filing date, applicant, title, IPC code, and inventor. Analysis results can be displayed in the form of a graph or the like that compares the search results for each condition (for example, the number of applications per applicant). For example, the analysis module supports country analysis (counting the documents for each country of registration or filing), technology analysis (counting the documents for each IPC code), applicant analysis (counting the documents for each applicant), country analysis by year, and applicant analysis for a particular country (counting the documents of each applicant in each country).
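The per-condition counts listed above (by country, by IPC code, by applicant, by year and country) amount to grouping search results on one or more fields. The sketch below uses Python's `collections.Counter`; the record layout, field names, and sample values are assumptions for illustration and are not data from the source.

```python
from collections import Counter

# Illustrative search results (made-up values).
results = [
    {"applicant": "A Corp", "country": "KR", "ipc": "G06F", "year": 2001},
    {"applicant": "A Corp", "country": "US", "ipc": "G06F", "year": 2002},
    {"applicant": "B Ltd", "country": "KR", "ipc": "H04L", "year": 2002},
    {"applicant": "B Ltd", "country": "KR", "ipc": "G06F", "year": 2002},
]


def count_by(records, *fields):
    """Count records per distinct value (one field) or per tuple of values (several fields)."""
    if len(fields) == 1:
        return Counter(record[fields[0]] for record in records)
    return Counter(tuple(record[f] for f in fields) for record in records)


by_applicant = count_by(results, "applicant")            # applicant analysis
by_ipc = count_by(results, "ipc")                        # technology analysis
by_year_country = count_by(results, "year", "country")   # country analysis by year
```

The resulting counters can be passed directly to a charting layer to produce the comparison graphs the text mentions.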

FIG. 3 is a diagram showing the configuration of the classification module of the server computer 20 of FIG. 1.

The classification items set by the user are stored in the classification item database 150, and the classification criteria for each classification item are stored in the classification criteria database 160.

The patent list file 170 stores the list of patents the user wishes to classify.

The automatic classification program refers to the information stored in the classification item database 150, the classification criteria database 160, and the patent information database 200 to classify the patent information corresponding to the patent list file 170 designated by the user.

The classification results are displayed on the classification result screen 180. By repeatedly reviewing the results and then reinforcing the classification criteria, the user causes the criteria to be learned so that they gradually conform more closely to the user's classification intent.

The classification results can in turn be saved as a classification result file 190.

When a plurality of classification result files 190 have been generated by a plurality of users, the project management module may also integrate the patent information or classification result files classified by those users to generate and manage a single user file 140.
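One way to integrate several users' classification result files into a single user file is to take the union of patents under each classification code. The source does not say how overlapping results are reconciled, so the union rule, the dictionary layout, and the function name below are assumptions.

```python
def merge_classification_results(result_files):
    """Merge per-user results ({code: [patent ids]}) by taking the union per code."""
    merged = {}
    for result in result_files:
        for code, patents in result.items():
            merged.setdefault(code, set()).update(patents)
    # Sort ids so that the merged file is deterministic.
    return {code: sorted(patents) for code, patents in merged.items()}


# Illustrative result files from two users (made-up ids and codes).
user_a = {"A01": ["P1", "P2"], "B02": ["P3"]}
user_b = {"A01": ["P2", "P4"], "C03": ["P5"]}
merged = merge_classification_results([user_a, user_b])
```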

FIG. 4 is a flowchart showing the process by which document information is classified according to the present invention.

First, the user computer 10 selects the retrieved patent information (target file), the classification division, the classification code, and so on, and transmits them to the server computer 20 (S1).

For the received patent data (target file), the server computer 20 automatically classifies the patent documents that fall under the classification criteria of each classification code, on the basis of the classification definition received from the user computer 10 (S2).

In the automatic classification process, the patent documents that fall under the classification criteria of each classification code are extracted from the patent data (target file) received by the server computer 20, on the basis of the classification definition (division and code) transmitted from the user computer 10. If the user does not specify a classification code, or selects all classifications, the patent information is classified under every classification code. A given patent document may fall under several classification codes; if it falls under none, it is classified as "Unclassified". The classification criterion for each classification code consists of words or index terms chosen by the user, or a combination of such words joined by the logical operators "+" and "*"; each patent document is assigned to classification codes by evaluating these logical expressions.
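The classification rule described above can be sketched as follows, taking "*" as AND and "+" as OR (as the description of FIG. 5 states) and assuming, for this sketch only, that "*" binds more tightly than "+" and that criteria contain no parentheses. The criteria strings, patent ids, and keyword sets are made up for illustration.

```python
def matches(criterion, keywords):
    """Evaluate a flat criterion: 'a*b+c' is read as (a AND b) OR c."""
    for group in criterion.split("+"):
        terms = [term.strip() for term in group.split("*")]
        if all(term in keywords for term in terms):
            return True
    return False


def classify(patents, criteria):
    """Assign each patent to every matching code; patents matching none are 'Unclassified'."""
    result = {code: [] for code in criteria}
    result["Unclassified"] = []
    for patent_id, keywords in patents.items():
        matched = False
        for code, criterion in criteria.items():
            if matches(criterion, keywords):
                result[code].append(patent_id)
                matched = True
        if not matched:
            result["Unclassified"].append(patent_id)
    return result


# Illustrative criteria and indexed keywords (hypothetical values).
criteria = {"A01": "laser*diode+led", "B02": "battery"}
patents = {
    "P1": {"laser", "diode"},
    "P2": {"led", "battery"},
    "P3": {"laser"},
}
classified = classify(patents, criteria)
```

Here P2 lands under both A01 and B02, which mirrors the statement that a patent document may fall under several classification codes.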

When the server computer 20 transmits the automatically classified patent information to the user computer 10 (S3), the classified patent information is displayed on the screen of the user computer 10 in a tree structure (S4). That is, for the automatically classified patent data, the server computer 20 presents the number of patents under each classification code in directory form, and selecting a classification code node displays the list of corresponding patent documents as child nodes. When a particular patent document among the child nodes is selected and the selection is transmitted to the server computer 20 (S5), the server computer 20 extracts the detailed information (bibliographic data and body summary information) for that patent document (S6).
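The directory-style display, with a patent count on each classification code node and the matching patents as child nodes, might be rendered as in the sketch below. The text layout and function name are assumptions; the actual screen of FIG. 6 is a graphical tree.

```python
def render_tree(classified):
    """Render {code: [patent ids]} as an indented tree with a count per code."""
    lines = []
    for code, patents in classified.items():
        lines.append(f"{code} ({len(patents)})")
        for patent_id in patents:
            lines.append(f"  {patent_id}")
    return "\n".join(lines)


# Illustrative classification result (hypothetical codes and ids).
tree_text = render_tree({"A01": ["P1", "P2"], "Unclassified": []})
```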

When the server computer 20 transmits the extracted patent details (bibliographic data and body summary information) to the user computer 10 (S7), the bibliographic data and body summary information of the patent are displayed on the web screen of the user computer 10 (S8). In this embodiment, to allow for loading delay, multi-view is supported only when the number of automatically classified patents is 20 or fewer.

The user computer 10 transmits a keyword extraction signal to the server computer 20 (S9). That is, when a specific code of the patent abstract information displayed on the screen of the user computer 10 is selected, the server computer 20 extracts the keywords corresponding to that patent information and transmits them to the user computer 10 (S10, S11). The keywords are extracted by performing morphological analysis on the words in the body summary of the patent information to remove unnecessary words, postpositional particles, and the like, calculating the frequency and weight of each noun, and extracting the nouns that reach an appropriate value. The present invention is not limited to this, however, and various other keyword extraction methods may be used.
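The frequency-and-weight extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: regular-expression tokenization with a stopword list stands in for morphological analysis, the score is taken to be frequency multiplied by a per-term weight (default 1.0), and the threshold value is arbitrary.

```python
import re
from collections import Counter

# Hypothetical stopword list standing in for morphological filtering.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is"}


def extract_keywords(summary, weights=None, threshold=2.0):
    """Return terms whose frequency * weight reaches the threshold, best first."""
    weights = weights or {}
    tokens = [t for t in re.findall(r"[a-z]+", summary.lower()) if t not in STOPWORDS]
    frequency = Counter(tokens)
    scores = {term: count * weights.get(term, 1.0) for term, count in frequency.items()}
    selected = [term for term, score in scores.items() if score >= threshold]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(selected, key=lambda term: (-scores[term], term))


summary = "The laser emits light. The laser diode drives the light source."
keywords = extract_keywords(summary, weights={"diode": 2.0})
```

In this example "laser" and "light" qualify on frequency alone, while "diode" qualifies only because of its assumed weight.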

The process of displaying and reviewing the automatically classified results on screen and reinforcing the rules of the classification criteria is described with reference to FIG. 5.

This automatic classification process can be executed repeatedly on patent information retrieved for a particular purpose (S12).

When the user computer 10 transmits a "Save" request signal for the automatically classified patent information to the server computer 20 (S13), the server computer 20 saves the automatically classified patent information as a file in a given format (S14) and, once saving is complete, displays a completion message on the user's computer (S15).

FIG. 5 is a flowchart showing the process by which the classification criteria for document information are learned.

Referring to FIG. 5, the user computer 10 first selects the target file, classification criteria, and classification code and transmits them to the server computer 20 (S16).

In the present invention, the classification divisions and classification codes are not limited to those provided by default; various approaches may be adopted, including letting users create their own.

The server computer 20 extracts the classification criteria information corresponding to the received classification division and classification code (S17) and transmits it to the user computer 10 (S18).

In defining the classification criteria for each classification code, the user selects core keywords (S19). The user either chooses arbitrary keywords appropriate to the classification code or chooses from among the indexed keywords of the body summary of the patent data.

Using the selected core keywords, the user creates optimal classification criteria by registering new criteria or reinforcing existing ones (S20). The criteria are written around the keywords, which can be added, modified, or deleted using the logical operators * (AND) and + (OR) and parentheses.
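A criterion that combines keywords with * (AND), + (OR), and parentheses can be evaluated with a small recursive-descent parser, sketched below. The precedence rule (* binding more tightly than +) and the error handling are assumptions; the source states only which operators are available.

```python
import re


def evaluate(criterion, keywords):
    """Evaluate a criterion such as '(laser+led)*diode' against a set of keywords."""
    tokens = re.findall(r"[()+*]|[^()+*\s]+", criterion)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of criterion")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in {")", "+", "*"}:
            raise ValueError(f"unexpected token {token!r}")
        return token in keywords  # a bare keyword is true if the document has it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in criterion")
    return result
```

For example, `evaluate("(laser+led)*diode", {"led", "diode"})` is true, whereas the same keywords with `"laser+led*diode"` still match but `{"led"}` alone does not, which shows how parentheses change the grouping.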

사용자는 보강된 분류기준정보가 잘 작성되었는지 확인(테스트)하는 과정을 실행한다 테스트를 원할 경우 사용자 컴퓨터(20)는 서버 컴퓨터(20)에 해당 분류기준정보와 해당분류코드 및 분류대상특허정보를 전송한다(S21). 여기에서 서버 컴퓨터(20)는 사용자 컴퓨터(20)로부터 수신된 정보를 적용하여 대상파일의 특허 정보 자료를 자동 분류하여(S22), 사용자 컴퓨터(20)의 화면에 자동분류 결과를 표시한다(S23). 또한 저장분류기준정보가 필요에 따라 교정 및 수정, 삭제등이 필요하다면 새로운 키워드 선정부터 분류학습 과정을 반복하게 된다.The user executes a process of confirming (testing) whether the enhanced classification standard information is well prepared. If a test is desired, the user computer 20 transmits the classification standard information, the classification code, and the classification target patent information to the server computer 20. It transmits (S21). Here, the server computer 20 applies the information received from the user computer 20 to automatically classify the patent information data of the target file (S22), and displays the result of the automatic classification on the screen of the user computer 20 (S23). ). In addition, if the storage classification criteria information needs correction, correction, deletion, etc., the classification learning process is repeated from the selection of new keywords.

After checking the automatically reclassified result, the user keeps reinforcing the classification criteria information until a satisfactory result is obtained. Once the displayed reclassification result is satisfactory, the user adopts the classification criteria information (S24).

The server computer 20 stores the classification criteria information received from the user computer 20 (S25) and displays the storage result on the screen of the user computer 20 (S26).

When the user connects to the server computer 20 and selects the automatic patent information classification menu, a screen such as that of FIG. 6 is displayed.

The target file window 200 is used to designate the file that stores the patent information to be classified. When the user clicks the target file window 200, a list of target files associated with the user is displayed in pull-down form, and the user selects the file to be classified from that list.

The classification type window 210 lists IOC (functional classification), ICC (technical classification), and any classification types registered by the user.

The classification code window 220 is generated dynamically according to the selected classification type; depending on the type, method classifications and function classifications are displayed.

After the user selects the target file, classification type, and classification code and clicks classification start (230), the classification of the patent information is performed.

The classification tree window 240 displays the classified results as a folder tree organized by classification code. One piece of patent information may belong to several classification codes, and patent information that belongs to no classification code is classified as an unclassified patent. FIG. 5 shows the case where the user has selected the entire classification as the classification code, so 20 classification codes are displayed in the classification tree window 240. When the user clicks any one of the 20 classification codes, the patent information classified under that code is displayed hierarchically. FIG. 5 shows six pieces of patent information being displayed when the classification code "dividing" is clicked.
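The grouping behavior described for the classification tree can be sketched as follows. This is an illustrative assumption rather than the actual server logic: the function `classify`, the representation of criteria as predicates over a keyword set, and the `"unclassified"` key are hypothetical names chosen for the example.

```python
def classify(documents, criteria):
    """Group documents by classification code.

    documents: dict mapping a document id to its set of index keywords.
    criteria:  dict mapping a classification code to a predicate that
               takes a keyword set and returns True on a match.
    """
    tree = {code: [] for code in criteria}
    unclassified = []
    for doc_id, keywords in documents.items():
        matched = False
        for code, predicate in criteria.items():
            if predicate(keywords):
                # One document may be filed under several codes.
                tree[code].append(doc_id)
                matched = True
        if not matched:
            # Documents matching no code go to a separate bucket.
            unclassified.append(doc_id)
    tree["unclassified"] = unclassified
    return tree

docs = {
    "P1": {"laser", "cutting"},
    "P2": {"welding"},
    "P3": {"paint"},
}
rules = {
    "dividing": lambda k: "cutting" in k,
    "laser": lambda k: "laser" in k,
    "joining": lambda k: "welding" in k,
}
result = classify(docs, rules)
```

Here `P1` lands under both `dividing` and `laser`, while `P3` matches nothing and is placed in the unclassified bucket.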

The abstract output window 250 displays abstracts, including the bibliographic details and text summary information, of the patent information classified under the designated classification code. As described above, when fewer than 20 pieces of patent information belong to the designated classification code, all of them are output in multiple windows so that the user can easily search and review the patent information. This limit reflects loading speed; if necessary, the multi-view function may also be supported for 20 or more pieces of patent information. When the user clicks any one of the six pieces of patent information, only the abstract of the selected patent information is displayed, as shown in FIG. 5.

The index word window 270 displays the index words corresponding to the selected patent information, where an index word is a keyword extracted by the index module 80. As described above, keywords may be extracted through techniques such as morphological analysis and frequency analysis.
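The frequency-analysis side of keyword extraction can be sketched as below. This is only a stand-in for the index module 80: morphological analysis is replaced by simple lowercase word splitting, and the function name `extract_index_words` and the `min_count` threshold are assumptions introduced for the example.

```python
import re
from collections import Counter

def extract_index_words(text, min_count=2):
    """Return words that occur at least min_count times in the text."""
    # Crude tokenization in place of real morphological analysis.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Keep only sufficiently frequent words, sorted for a stable result.
    return sorted(word for word, count in counts.items() if count >= min_count)

index_words = extract_index_words(
    "Laser cutting uses a laser beam; the beam cuts the sheet.", min_count=2
)
```

A real system would likely reduce each word to its root before counting, so that forms such as "cutting" and "cuts" are tallied together; that step is omitted here.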

The classification criteria keyword window 260 displays the classification criteria keywords used to classify patent information under a specific classification code. When the user clicks a keyword in the index word window 270, that keyword is displayed in the classification criteria keyword window 260. When several keywords are selected, they must be combined with operators using the operator buttons 280; if keywords are entered consecutively without an operator, a message requesting operator input is output.
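The check for consecutive keywords can be sketched as follows. The function `add_token`, the token-list representation, and the message text are hypothetical; the description only states that a message requesting an operator is shown.

```python
OPERATORS = {"*", "+"}
PARENS = {"(", ")"}

def is_keyword(token):
    # Anything that is not an operator or a parenthesis counts as a keyword.
    return token not in OPERATORS and token not in PARENS

def add_token(tokens, token):
    """Append a token to the criterion unless it would place two keywords side by side.

    Returns (new_tokens, message); message is None when the token is accepted.
    """
    if tokens and is_keyword(tokens[-1]) and is_keyword(token):
        return tokens, "Please enter an operator between keywords."
    return tokens + [token], None

criterion, msg = add_token([], "laser")
criterion, msg = add_token(criterion, "cutting")   # rejected: no operator yet
criterion, _ = add_token(criterion, "*")
criterion, _ = add_token(criterion, "cutting")     # accepted after the operator
```

The sketch covers only the keyword-after-keyword case named in the description; a fuller validator would also check parenthesis balance and dangling operators.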

When the user adds keywords and then clicks the registration button 300, the new classification criteria are registered. If identical content is registered again, a message indicating that it has already been registered is output.
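The duplicate check on registration might look like the sketch below. The in-memory dictionary, the function name `register`, and the returned strings are assumptions; the description does not say how criteria are stored or how equality is decided, so whitespace-collapsed string comparison is used here as a simplification.

```python
def register(registry, code, expression):
    """Register a criterion expression under a classification code.

    registry: dict mapping a classification code to a list of expressions.
    Returns a status message for display to the user.
    """
    # Collapse runs of whitespace so trivially different spacing compares equal.
    normalized = " ".join(expression.split())
    existing = registry.setdefault(code, [])
    if normalized in existing:
        return "already registered"
    existing.append(normalized)
    return "registered"

registry = {}
first = register(registry, "dividing", "laser * cutting")
second = register(registry, "dividing", "laser  *  cutting")
```

The second call differs only in spacing, so it is treated as a repeat of the first.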

When the verification button 310 is clicked, the patent information selected in the target file window 200 is reclassified for the selected classification code, using the classification criteria as reinforced through additions, deletions, and modifications in the window 260 where the classification criteria are displayed, and the result is output as a message.

After checking the verification result, the user clicks the apply button 320 to apply the reinforced classification criteria to the classification criteria database of the server computer 20.

If the user is satisfied with the classified result and clicks the save classification result button 330, the result classified by the classification criteria is saved within project management as a file of a predetermined format, and the save result is then output as a message.

Although this embodiment has been illustrated and described only with respect to a method of classifying patent information, the present invention is not limited to this and can of course be applied to methods of classifying other kinds of information.

According to the present invention, the user can set classification definitions for document information suited to his or her own purpose (for example, material in printed form such as patent information, technical information, papers, news articles, and historical records).

In addition, documents can be classified quickly and efficiently, and the classified results can be grasped easily and effectively.

In addition, the classification results are fed back into the classification criteria, so that the criteria are learned and evolve over time.

Although the present invention has been described in connection with the preferred embodiments mentioned above, various modifications and variations are possible without departing from the spirit and scope of the invention. Accordingly, the appended claims cover such modifications and variations as fall within the spirit of the invention.

Claims

A learning-based document information classification method based on a user classification definition, the method comprising:

Receiving classification target file selection information and classification code selection information from a user computer;

Classifying the document information included in the classification target file according to classification criteria information corresponding to the selected classification code and transmitting the result to the user computer;

Receiving selection information for document information classified according to the classification criteria information of the classification code, extracting the corresponding abstract information and one or more indexed keywords, and transmitting them to the user computer;

Receiving a keyword selection signal for selecting one or more keywords among the extracted keywords from the user computer;

Performing any one of adding, modifying, or deleting the keyword selected at the user computer with respect to the classification criteria information;

Reclassifying the document information based on the classification criteria information and transmitting the reclassified result to the user computer when verification request information for the changed classification criteria information is received from the user computer; and

Storing the result of the classification criteria.

The method of claim 1,

Receiving a reclassification request signal from the user computer;

Reclassifying the document information;

Receiving a classification result storage request signal from the user computer when the reclassified document information matches the classification criteria information defined by the user computer; and

Classifying the classification result into a user classification result file.

The method of claim 1,

Receiving classification criteria information corresponding to a plurality of classification codes;

And classifying the plurality of pieces of document information into the plurality of pieces of classification codes based on the classification criteria information.

The method of claim 3, wherein

Processing and storing raw document information; and

Extracting and storing one or more index words from the processed document information according to a preset method.

The method of claim 4, wherein

The index word extraction extracts only the root from each word included in the document information, in a predetermined manner.

The method of claim 5, wherein

The index word is a word that is included in the document information a preset number of times or more.

The method of claim 1,

And configuring the screen so that all the document information classified by the classification criteria is displayed on one screen.

The method of claim 7, wherein

The bibliographic information is a patent number, IPC code, application date, publication date, title, applicant, inventor, or summary information.

The method of claim 7, wherein

The document information classified by the classification criteria is presented as the number of documents included in each classification code, with each corresponding document listed by patent number and title.

The method of claim 1,

And storing the reclassified document information as shared information of a specific project group.

The method of claim 10,

Receiving the classified or reclassified document information request signal including user information from another user computer belonging to the specific project group;

Confirming whether the other user is a member of the specific project group based on the user information; and

Transmitting the classified or reclassified document information to the other user computer when the member authentication is completed.

The method of claim 11,

And generating an integrated classification target file by integrating the document information classified by the respective user computers belonging to the specific project group.

A recording medium having a computer readable program recorded thereon,

A recording medium having recorded thereon a program for executing the step of storing the result of the classification criteria.