KR20200028561A

KR20200028561A - System for personalized pimple management service

Info

Publication number: KR20200028561A
Application number: KR1020180106650A
Authority: KR
Inventors: 박정윤
Original assignee: 주식회사 이피엠솔루션즈
Priority date: 2018-09-06
Filing date: 2018-09-06
Publication date: 2020-03-17

Abstract

The present invention relates to an automatic document classification service providing system. According to an embodiment of the present invention, the automatic document classification service providing server is interlocked with a user terminal, receives a first keyword input from a user through the user terminal, and provides a classification model related to the first keyword to the user terminal. The automatic document classification service providing server comprises: a data collection unit collecting a plurality of drawings and document information from an external server; a classification keyword extraction unit extracting classification keywords from a document included in the data collection unit and collecting user classification keywords based on user information; a classification model generation unit generating a plurality of classification models by learning the classification keywords extracted from the document and the user classification keywords and generating a customized classification model by extracting a classification model related to the first keyword among the plurality of classification models; and a classification service providing unit providing the customized classification model to the user terminal.

Description

System for providing automatic document classification service {SYSTEM FOR PERSONALIZED PIMPLE MANAGEMENT SERVICE}

The present invention relates to a system for providing an automatic document classification service.

In general, the need for automatic document classification is growing, in terms of work efficiency, for public institutions, banks, and companies as well as for individuals. An automatic document classification device that meets this need brings speed and efficiency to classifying and storing documents and to searching for and checking them after storage.

In addition, although the introduction of electronic documents has greatly reduced the volume of stored documents, payment documents, review documents, official letters, confirmations, certificates, identification cards, and the like are still used in paper form, and digitizing, classifying, and storing them remains difficult.

To address this, automatic document classification devices have been developed and are in use.

In this regard, Korean Patent Publication No. 10-2012-0017235 discloses an automatic document classification and storage device that automatically scans, classifies, and stores documents and, after storage, lets a user search, view, and print the digitized documents through a user interface.

Korean Patent Publication No. 10-2012-0017235

The present invention is intended to solve the problems of the prior art described above, and an embodiment of the present invention aims to provide a system for providing an automatic document classification service.

Specifically, it aims to automatically classify drawings and documents scattered across external servers of companies, institutions, and the like, and to provide a customized classification model that fits the user's needs.

It also aims to provide a system that generates a classification model suited to each individual user's purpose and provides it to the user, making it easy to manage or search a vast quantity of drawings or documents.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

As a technical means for achieving the above technical problem, an automatic document classification service providing server according to an embodiment of the present invention is linked to a user terminal, receives a first keyword entered by a user through the user terminal, and provides a classification model related to the first keyword to the user terminal. The server includes: a data collection unit that collects a plurality of drawings and document information from an external server; a classification keyword extraction unit that extracts classification keywords from the documents included in the data collection unit and collects user classification keywords based on user information; a classification model generation unit that generates a plurality of classification models by learning the classification keywords extracted from the documents and the user classification keywords, and generates a customized classification model by extracting, from among the plurality of classification models, a classification model related to the first keyword; and a classification service providing unit that provides the customized classification model to the user terminal.

According to any one of the means described above, an embodiment of the present invention can provide a system and method for providing an automatic document classification service.

Specifically, drawings and documents scattered across external servers of companies, institutions, and the like can be classified automatically, and a customized classification model that fits the user's needs can be provided.

In addition, by generating and providing a classification model suited to each individual user's purpose, a vast quantity of drawings or documents can be managed or searched easily.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 schematically shows the configuration of an automatic document classification service providing system according to an embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of an automatic document classification server 100 according to an embodiment of the present invention.
FIG. 3 shows an example of extracting classification keywords from drawings and documents according to an embodiment of the present invention.
FIG. 4 shows an example in which the classification model generation unit generates a plurality of classification models according to an embodiment of the present invention.
FIG. 5 illustrates an automatic classification and search service platform according to an embodiment of the present invention.
FIG. 6 is a flowchart describing in detail the automatic document classification service providing method of the automatic document classification service providing system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. Parts unrelated to the description are omitted from the drawings in order to describe the present invention clearly.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, an automatic document classification service providing system according to an embodiment of the present invention is described in detail with reference to the drawings.

FIG. 1 schematically shows the configuration of an automatic document classification service providing system according to an embodiment of the present invention.

The automatic document classification service providing system according to an embodiment of the present invention includes an automatic document classification server 100 and a database 200. The automatic document classification server 100 is linked to the database 200 and a user terminal 300 over a network.

Here, the network means a connection structure through which nodes such as terminals and servers can exchange information with one another. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (local area network), a wireless LAN (wireless local area network), a WAN (wide area network), a PAN (personal area network), a Bluetooth network, a satellite broadcast network, an analog broadcast network, and a DMB (digital multimedia broadcasting) network.

The automatic document classification server 100 collects drawing and document information from a company or external server in which a plurality of data items are stored, and stores the collected drawing and document information in the database 200.

The database 200 may include memory. Here, memory is a collective term for non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices. For example, the memory may include ROM (read only memory), PROM (programmable ROM), EPROM (erasable programmable ROM), EEPROM (electrically erasable programmable ROM), RAM (random access memory), NAND flash memory such as a compact flash (CF) card, an SD (secure digital) card, a memory stick, a solid-state drive (SSD), and a micro SD card, magnetic computer storage devices such as a hard disk drive (HDD), and optical disc drives such as CD-ROM and DVD-ROM.

In addition, the automatic document classification server 100 automatically classifies each drawing and document according to the collected drawing and document information to generate a plurality of classification models, then extracts, from among the generated classification models, the classification models related to the user and provides them to the user terminal 300.

Specifically, the automatic document classification server 100 receives from the user terminal 300 a first keyword entered by the user, extracts the classification model associated with the first keyword, and transmits it to the user terminal 300.

Meanwhile, the user terminal 300 may be a smart terminal that includes a camera module, a display, a memory, a processor, and a network communication module and that can use a network while on the move. For example, the smart terminal may include any kind of handheld wireless communication device that guarantees portability and mobility, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal. The smart terminal may also include personal computers that guarantee mobility and support wireless communication, such as notebook computers and tablet PCs, and smart devices such as smartphones.

Hereinafter, each component of the automatic document classification service providing system according to an embodiment of the present invention is described in detail.

FIG. 2 is a schematic configuration diagram of the automatic document classification server 100 according to an embodiment of the present invention.

Referring to FIG. 2, the automatic document classification server 100 includes a data collection unit 110, a classification keyword extraction unit 120, a classification model generation unit 130, and a classification service providing unit 140. The server is not limited to this configuration, however; as needed, these units may be split into finer components or the server may further include other components.

First, the data collection unit 110 is linked to a company or external server, collects information about the drawings and documents stored on that server, and stores it in the database 200. The information about a drawing or document may be, for example, at least one of the file name of the drawing or document, the content of the document, the author, and the creation date, but is not limited thereto.

Here, a drawing may be an image file with an extension such as jpg, tif, tiff, dwg, or bmp, and a document may be an electronic document with an extension such as doc, xls, xlsx, ppt, txt, or pdf.
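The extension-based distinction between drawings and documents can be illustrated with a minimal Python sketch. The function name `file_kind` and the `"other"` fallback are assumptions made for illustration; the extension sets contain only the examples named above, which the description presents as non-exhaustive.

```python
from pathlib import Path

# Example extensions named in the description ("such as ..."); not exhaustive.
DRAWING_EXTENSIONS = {"jpg", "tif", "tiff", "dwg", "bmp"}
DOCUMENT_EXTENSIONS = {"doc", "xls", "xlsx", "ppt", "txt", "pdf"}


def file_kind(filename: str) -> str:
    """Return 'drawing', 'document', or 'other' based on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in DRAWING_EXTENSIONS:
        return "drawing"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    # Hypothetical fallback: the description does not say how other files are handled.
    return "other"
```

A collector could use such a check to decide whether a file needs character recognition (drawing) or direct text extraction (document).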

The classification keyword extraction unit 120 extracts classification keywords from the plurality of drawings and documents stored in the database 200.

FIG. 3 shows an example of extracting classification keywords from drawings and documents according to an embodiment of the present invention. Specifically, FIG. 3(a) shows an example of extracting text from a drawing, and FIG. 3(b) shows an example of extracting text from an electronic document.

Referring to FIG. 3, the classification keyword extraction unit 120 may extract the words contained in the drawings and documents and calculate the frequency of each word to determine a plurality of classification keywords.

For a drawing, the text and words embedded in the drawing may be recognized using an optical character reader (OCR) or a user-defined control.

Thereafter, the classification keyword extraction unit 120 may calculate the frequency of occurrence of each extracted word and learn classification keywords according to the calculated frequencies to generate a classification model.

For example, from the words extracted from document A, the classification keyword extraction unit 120 may select words whose frequency is at or above a certain count, or may calculate each word's share of the total word occurrences and select words whose share is at or above a certain ratio, and determine the selected words as classification keywords.

For example, suppose the words "car", "auto", "unmanned", "video", and "control" are extracted from document A with frequencies of 30, 35, 10, 5, and 15, respectively. If words with a frequency of 15 or more are determined as classification keywords, the first classification keyword is "auto", the second is "car", and the third is "control".
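The count-threshold rule in this example can be sketched in a few lines of Python. The function name `classification_keywords` and the parameter `min_count` are illustrative assumptions; the word counts are the ones given for document A.

```python
from collections import Counter


def classification_keywords(words, min_count=15):
    """Return (word, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(words)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Word occurrences from the document A example.
words = (["car"] * 30 + ["auto"] * 35 + ["unmanned"] * 10
         + ["video"] * 5 + ["control"] * 15)

keywords = classification_keywords(words)
# keywords == [("auto", 35), ("car", 30), ("control", 15)]
```

The ranking by descending count reproduces the ordering of the first, second, and third classification keywords in the example.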

The classification keyword extraction unit 120 also collects user classification keywords. Here, user classification keywords are keywords related to the user, such as the name of the user of the user terminal 300, the characteristics of the institution or company to which the user belongs, the department to which the user belongs, and information about the projects the user is working on.
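As a rough illustration, user classification keywords could be derived from a simple profile record. The `UserProfile` type, its field names, and the sample values are hypothetical; the description only lists the kinds of information involved.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical record of the user information listed in the description."""
    name: str
    organization_type: str
    department: str
    projects: list = field(default_factory=list)


def user_classification_keywords(profile: UserProfile) -> list:
    """Flatten the profile into a list of user classification keywords."""
    return [profile.name, profile.organization_type,
            profile.department, *profile.projects]


# Sample values are made up for illustration.
sample = UserProfile("Kim", "manufacturer", "design", ["engine"])
sample_keywords = user_classification_keywords(sample)
```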

Next, the classification model generation unit 130 generates a plurality of classification models by using machine learning to learn the classification keywords that the classification keyword extraction unit 120 extracted from the drawings and documents, together with the user classification keywords.

FIG. 4 shows an example in which the classification model generation unit generates a plurality of classification models according to an embodiment of the present invention.

Referring to FIG. 4, the classification model generation unit 130 may generate a plurality of classification models using a clustering technique that groups similar keywords together.

For example, the classification model generation unit 130 may generate classification model 1 by clustering on the classification keyword "project" and generate classification model 2 by clustering on the classification keyword "product".

In other words, classification model 1 may be generated by clustering the drawings and documents that contain the keyword "project", and classification model 2 by clustering the drawings and documents that contain the keyword "product".
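The "project" and "product" example can be approximated by grouping files on keyword membership. This is a deliberately simplified stand-in, not the clustering technique itself, which the description does not specify; the function name, the file names, and the keyword sets are made up for illustration.

```python
def build_classification_models(doc_keywords, model_keywords):
    """Group file names under each model keyword they contain.

    doc_keywords maps a file name to the set of classification keywords
    extracted from it; the result maps each model keyword to the sorted
    list of files containing that keyword.
    """
    return {
        keyword: sorted(name for name, kws in doc_keywords.items() if keyword in kws)
        for keyword in model_keywords
    }


# Hypothetical extracted keywords per file.
doc_keywords = {
    "spec_a.doc": {"project", "design"},
    "layout.dwg": {"product", "design"},
    "plan.xlsx": {"project", "product"},
}

models = build_classification_models(doc_keywords, ["project", "product"])
# models["project"] plays the role of classification model 1,
# models["product"] the role of classification model 2.
```

A file may belong to more than one group, which matches the description's statement that each model gathers every drawing and document containing its keyword.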

The classification model generation unit 130 may also generate classification models according to user classification keywords. That is, it may generate a classification model related to the department or duties of the user, such as design or product management.

In this manner, the classification model generation unit 130 may generate as many classification models as a preset number. Alternatively, it may generate classification models for all of the classification keywords extracted from the drawings and documents. In the present invention, the entire set of classification models generated in this way is defined as the master model.

In addition, the classification model generation unit 130 may select, from among the classification models included in the master model, at least one classification model related to the user to generate a user-customized model and provide it to the user terminal.

That is, in the present invention, a user-customized model is a set of at least one classification model related to the user from among the classification models included in the master model.

Here, the classification model generation unit 130 according to an embodiment of the present invention may receive, via the user terminal, a first keyword for generating the user-customized model.

Accordingly, taking the first keyword entered through the user terminal as a reference, the classification model generation unit 130 extracts similar keywords related to the first keyword, selects classification models on that basis, and generates the user-customized model so that it includes the at least one selected classification model.
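Selecting a user-customized model from the master model can be sketched as a lookup keyed by the first keyword and its similar keywords. The description does not say how keyword similarity is measured; the sketch below assumes plain string similarity via the standard-library `difflib.get_close_matches`, and the function name, the `cutoff` value, and the sample master model are all illustrative.

```python
import difflib


def customized_model(master_model, first_keyword, cutoff=0.8):
    """Pick the classification models whose keyword resembles first_keyword.

    master_model maps a classification keyword to its classification model
    (here, simply a list of file names). String similarity is an assumed
    stand-in for the unspecified notion of "similar keywords".
    """
    if not master_model:
        return {}
    related = difflib.get_close_matches(
        first_keyword, list(master_model), n=len(master_model), cutoff=cutoff
    )
    return {keyword: master_model[keyword] for keyword in related}


# Hypothetical master model.
master = {
    "project": ["plan.xlsx", "spec_a.doc"],
    "projects": ["roadmap.ppt"],
    "product": ["layout.dwg"],
    "design": ["sketch.jpg"],
}

custom = customized_model(master, "project")
```

In practice a semantic measure (for example, a thesaurus or embedding distance) would likely be a better fit than spelling similarity, but that choice lies outside what the description states.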

Finally, the classification service providing unit 140 provides the user with the user-customized model generated by the classification model generation unit 130. The classification service providing unit 140 may provide the user-customized classification model and a customized classification service to the user through an automatic classification and search service platform.

FIG. 5 illustrates the automatic classification and search service platform according to an embodiment of the present invention.

Referring to FIG. 5, in the automatic document classification service providing system according to an embodiment of the present invention, the automatic classification and search service platform 10 may be linked over a network to a drawing management system 12, a document management system 14, and a data management system 16 hosted on a company or external server. The automatic classification and search service platform 10 is also linked to the data collection unit 110 of the automatic document classification server 100, and may transmit to it information about the drawings and documents collected through the drawing management system 12, the document management system 14, and the data management system 16, such as document name, content, author, and creation date.

Meanwhile, according to an embodiment of the present invention, the automatic classification and search service platform 10 of the automatic document classification service providing system may be provided through a web page. Alternatively, it may be downloaded to the user terminal in the form of an application and stored on the user terminal. The customized classification service can then be provided to the user when the user runs the automatic classification and search service platform application.

In other words, the classification service providing unit 140 according to an embodiment of the present invention may provide the user-customized classification model through the web-page-based automatic classification and search service platform 10, or may provide the user-customized classification service by transmitting the automatic classification and search service platform application to the user terminal.

Hereinafter, the automatic document classification service providing method of the automatic document classification service providing system according to an embodiment of the present invention is described in detail with reference to FIG. 6.

FIG. 6 is a flowchart describing in detail the automatic document classification service providing method of the automatic document classification service providing system according to an embodiment of the present invention.

Referring to FIG. 6, in the automatic document classification service providing method according to an embodiment of the present invention, the data collection unit 110 first receives, through the automatic classification and search service platform, the first keyword entered at the user terminal (S10).

Thereafter, the data collection unit 110 collects drawing and document information from an external server of a company, institution, or the like and stores it in the database 200. Here, the data collection unit 110 may collect documents according to characteristics such as the company or institution to which the user belongs and the department to which the user belongs (S20).

The information about a drawing or document may be, for example, at least one of the file name of the drawing or document, the content of the document, the author, and the creation date, but is not limited thereto.

Here, a drawing may be an image file with an extension such as jpg, tif, tiff, dwg, or bmp, and a document may be an electronic document with an extension such as doc, xls, xlsx, ppt, txt, or pdf.

Next, the classification keyword extraction unit 120 extracts classification keywords from the plurality of drawings and documents stored in the database 200 (S30).

The classification keyword extraction unit 120 may extract the words contained in the drawings and documents and calculate the frequency of each word to determine a plurality of classification keywords.

Thereafter, the classification keyword extraction unit 120 calculates the frequency of occurrence of each extracted word and determines the classification keywords according to the calculated frequencies. For example, from the words extracted from document A, the classification keyword extraction unit 120 may select words whose frequency is at or above a certain count, or may calculate each word's share of the total word occurrences and select words whose share is at or above a certain ratio, and determine the selected words as classification keywords.
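The ratio-based variant of step S30 can be sketched as follows. The function name `keywords_by_ratio`, the `min_ratio` value of 0.15, and the sample word counts are illustrative assumptions; the description only says "a certain ratio".

```python
from collections import Counter


def keywords_by_ratio(words, min_ratio=0.15):
    """Return {word: share} for words whose share of all occurrences >= min_ratio."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items() if n / total >= min_ratio}


# Sample occurrences: 95 words in total.
sample_words = (["car"] * 30 + ["auto"] * 35 + ["unmanned"] * 10
                + ["video"] * 5 + ["control"] * 15)

selected = keywords_by_ratio(sample_words)
# "control" has a share of 15/95 (about 0.158) and is kept;
# "unmanned" has a share of 10/95 (about 0.105) and is dropped.
```

Compared with a fixed count, a ratio threshold scales with document length, so short and long documents are treated more evenly.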

Subsequently, the classification keyword extraction unit 120 collects user classification keywords (S42), and the classification model generation unit 130 learns, through machine learning, the classification keywords that the classification keyword extraction unit 120 extracted from the drawings and documents together with the user classification keywords (S40).

Thereafter, the classification model generation unit 130 generates a plurality of classification models from the learned classification keywords (S50). Here, the classification model generation unit 130 may generate the plurality of classification models using a clustering technique that groups similar keywords together.
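Grouping similar keywords can be illustrated with a small sketch. The description does not name a clustering algorithm or a similarity measure, so everything here is an assumption: a greedy pass that compares each keyword with the first member of each existing group, using the string similarity ratio from Python's standard `difflib` module and a hypothetical threshold of 0.6.

```python
from difflib import SequenceMatcher
from typing import List


def cluster_keywords(keywords: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Greedily group keywords whose spelling is similar to a group's first member."""
    clusters: List[List[str]] = []
    for keyword in keywords:
        for cluster in clusters:
            # Compare against the cluster's first keyword (its representative).
            similarity = SequenceMatcher(None, keyword, cluster[0]).ratio()
            if similarity >= threshold:
                cluster.append(keyword)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([keyword])
    return clusters
```

Each resulting group could then serve as the keyword set of one classification model. A production system would more likely cluster on semantic features than on spelling; the spelling-based ratio is used only to keep the sketch self-contained.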

Thereafter, the classification model generation unit 130 determines whether to use the generated classification model (S60). Here, the classification model generation unit 130 may present the question of whether to use the generated classification model on the user terminal so that the user confirms its suitability, or may determine that the classification model is suitable when its reliability is at or above a preset level.

If the classification model is determined to be suitable, it is adopted as one of the component classification models used to generate the master classification model. If it is determined to be unsuitable, the process returns to the step of learning the classification keywords (S40), and the learning step (S40) and the step of generating classification models (S50) may be repeated.
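The loop through steps S40, S50, and S60 can be sketched as follows for the reliability-threshold branch. The callables `train_and_generate` and `reliability_of`, the default threshold of 0.8, and the `max_rounds` cap are all hypothetical; the description does not define how reliability is measured and does not state a limit on the number of repetitions.

```python
from typing import Callable, Optional, TypeVar

Model = TypeVar("Model")


def train_until_suitable(
    train_and_generate: Callable[[int], Model],
    reliability_of: Callable[[Model], float],
    min_reliability: float = 0.8,
    max_rounds: int = 5,
) -> Optional[Model]:
    """Repeat learning (S40) and model generation (S50) until S60 accepts a model."""
    for round_index in range(max_rounds):
        # S40 + S50: learn the keywords and generate a candidate model.
        model = train_and_generate(round_index)
        # S60: accept the model when its reliability reaches the preset level.
        if reliability_of(model) >= min_reliability:
            return model
    # Assumed behavior: give up after max_rounds instead of looping forever.
    return None
```

An accepted model would then be added to the set of models from which the master model is built.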

Accordingly, the classification model generation unit 130 generates a plurality of classification models, either up to a preset number or for all of the classification keywords extracted from the drawings and documents, and generates a master model (S70).

Next, the classification model generation unit 130 selects at least one classification model related to the user from among the plurality of classification models included in the master model (S82), generates a user-customized model (S80), and may provide it to the user terminal (S90).

Specifically, in the step of selecting at least one classification model related to the user (S82), the classification model generation unit 130 may select classification models by extracting, on the basis of the first keyword, similar keywords and the like related to the first keyword.
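Selecting classification models from the master model by the first keyword can be sketched as follows. The representation of the master model as a dictionary from a model name to its keyword list, the spelling-based similarity from `difflib`, and the threshold of 0.7 are all assumptions made for illustration; the description does not specify how similar keywords are identified.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def select_related_models(
    master_model: Dict[str, List[str]],
    first_keyword: str,
    threshold: float = 0.7,
) -> Dict[str, List[str]]:
    """Return the models that hold at least one keyword similar to first_keyword."""
    query = first_keyword.lower()
    selected: Dict[str, List[str]] = {}
    for model_name, keywords in master_model.items():
        # A model is related if any of its keywords resembles the first keyword.
        if any(
            SequenceMatcher(None, query, keyword.lower()).ratio() >= threshold
            for keyword in keywords
        ):
            selected[model_name] = keywords
    return selected
```

The selected subset would correspond to the user-customized model generated in step S80.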

In addition, in the step of providing the user-customized model to the user terminal (S90), the classification service providing unit 140 may provide the user-customized model generated by the classification model generation unit 130 through the web page of the automatic classification and search service platform 10, or may provide the user-customized classification service by transmitting an automatic classification and search service platform application to the user terminal.

Meanwhile, the system for providing an automatic document classification service according to an embodiment of the present invention may further include a payment unit (not shown) that authenticates the user of a user terminal 300 connected to the automatic document classification server 100 and charges a fee according to the number of times the user has been provided with the customized classification model service or according to the service period.

For example, the automatic document classification server 100 may calculate the fee for the customized classification model service and provide a payment service through the payment unit.
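Charging by the number of uses or by the service period can be sketched as follows. The `PricingPlan` class, its field names, and the per-use and per-day pricing are hypothetical; the description states only that the fee depends on the number of times the service is provided or on the service period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingPlan:
    basis: str       # "count" (per use) or "period" (per day); assumed labels
    unit_price: int  # price per use or per day, in the smallest currency unit


def calculate_fee(plan: PricingPlan, uses: int, service_days: int) -> int:
    """Compute the fee from the usage count or the service period, per the plan."""
    if uses < 0 or service_days < 0:
        raise ValueError("uses and service_days must be non-negative")
    if plan.basis == "count":
        return plan.unit_price * uses
    if plan.basis == "period":
        return plan.unit_price * service_days
    raise ValueError(f"unknown pricing basis: {plan.basis!r}")
```

Integer amounts are used so that fee arithmetic does not pick up floating-point rounding errors.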

As described above, according to an embodiment of the present invention, a system for providing an automatic document classification service can be provided.

Accordingly, by generating and providing a classification model suited to each individual user's purpose, a vast quantity of drawings or documents can be managed and searched easily.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery media.

Although the methods and systems of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: Automatic classification and search service platform
100: Automatic document classification server
110: Data collection unit
120: Classification keyword extraction unit
130: Classification model generation unit
140: Classification service providing unit
200: Database
300: User terminal

Claims

A server for providing an automatic document classification service, which interworks with a user terminal, receives a first keyword input by a user through the user terminal, and provides a classification model related to the first keyword to the user terminal, the server comprising:
a data collection unit that collects a plurality of drawings and document information from an external server;
a classification keyword extraction unit that extracts classification keywords from documents included in the data collection unit and collects user classification keywords based on user information;
a classification model generation unit that generates a plurality of classification models by learning the classification keywords extracted from the documents and the user classification keywords, and generates a customized classification model by extracting, from among the plurality of classification models, a classification model related to the first keyword; and
a classification service providing unit that provides the customized classification model to the user terminal.

The server of claim 1,
further comprising a database that stores the drawings and document information collected by the data collection unit,
wherein the drawings and document information
includes any one of a file name, content, author, and creation date for each drawing and document.

The server of claim 1,
wherein the classification keyword extraction unit extracts at least one word included in each of the drawings and documents, and
determines the classification keywords according to the frequency of occurrence of the word.

The server of claim 3,
wherein the classification keyword extraction unit determines the user classification keywords based on at least one of a department to which the user who input the first keyword belongs, a project the user is working on, and a work pattern of the user.

The server of claim 4,
wherein the classification model generation unit
generates, as a master model, the complete set of the plurality of classification models generated by learning the classification keywords and the user classification keywords, and
generates the customized classification model by extracting, from the master model, at least one classification model related to the first keyword.

The server of claim 5,
wherein the classification model generation unit
extracts similar keywords related to the first keyword and generates the customized classification model based on the similar keywords.

The server of claim 1,
further comprising a classification service fee payment unit,
wherein the classification service fee payment unit
calculates a fee according to the number of times the customized classification model is provided to the user terminal, and transmits the calculated fee information to the user terminal.