KR102219347B1

KR102219347B1 - Apparatus and method for classifying electric records based on business activity of organization

Info

Publication number: KR102219347B1
Application number: KR1020190020744A
Authority: KR
Inventors: 오효정; 건 김; 윤은하; 양동민
Original assignee: 전북대학교산학협력단
Priority date: 2018-11-07
Filing date: 2019-02-21
Publication date: 2021-02-24
Also published as: KR20200052801A

Abstract

An apparatus for classifying electronic records comprises: a business activity learning unit that trains at least one electronic record classification model by machine learning, using training data composed of electronic records and business activity information of the organization that produced them; and an electronic record classification unit that receives a new electronic record and classifies it using the electronic record classification models.

Description

An apparatus and method for classifying electronic records based on an organization's business activities {APPARATUS AND METHOD FOR CLASSIFYING ELECTRIC RECORDS BASED ON BUSINESS ACTIVITY OF ORGANIZATION}

The present invention relates to a technology for classifying electronic records produced by an organization on the basis of that organization's business activities.

With the recent development of information technology, most records produced by organizations are electronic, and the convenience of electronic records together with growing administrative workloads has led to vast numbers of them being produced. An intelligent technology that automatically classifies the produced electronic records is therefore needed. In particular, a technology is needed that learns the business activity characteristics of the organization that produced an electronic record, as well as the business activities of its members, and recommends a suitable category for the record.

Classification methods that apply natural language processing to grasp the content of a document from its text and then apply machine learning have been attempted before. Specifically, as the value placed on records has risen, so has the demand to store, manage, preserve, and use them effectively, and, aided by advances in artificial intelligence, intelligent techniques have been brought into electronic records management to make that management more efficient.

However, no attempt has been made to identify the business activities and characteristics of the organization that produced an electronic record and to use them to classify the record. The classification techniques used in most organizations rely only on text features derived through natural language processing and do not make use of the varied activity information that reflects an organization's business characteristics. They are also fixed learning methods based on predefined or pre-selected training data, and so cannot identify new tasks.

Korean Patent Application Publication No. 10-2018-0124529 (November 21, 2018)

The problem to be solved by the present invention is to provide a technology that automatically classifies produced electronic records based on the characteristics of an organization's business activities, and that recognizes new tasks by re-evaluating the suitability of the classified electronic records.

An electronic record classification apparatus according to an embodiment of the present invention includes a business activity learning unit that trains at least one electronic record classification model by machine learning, using training data composed of electronic records and business activity information of the organization that produced them, and an electronic record classification unit that receives a new electronic record and classifies it using the electronic record classification models.

The business activity information includes at least one of organization information of the organization, business information of the organization, or record information of the organization.

The business activity learning unit trains a first electronic record classification model for multi-class classification of the electronic records, using first training data composed of the electronic records, the organization information, and the business information, and trains a second electronic record classification model for heterogeneous classification of the electronic records, using second training data composed of the electronic records, the organization information, and the record information.

The electronic record classification unit generates classification information for the new electronic record by applying respective weights to a first classification result from the first electronic record classification model and a second classification result from the second electronic record classification model.

The electronic record classification apparatus further includes a new task recognition unit that determines the degree of cohesion among a plurality of electronic records having the same classification information and recommends, as a new task, at least one electronic record among them whose degree of cohesion does not fall within a preset threshold range.

The new task recognition unit generates an electronic record embedding vector for each electronic record having the same classification information using a vectorization algorithm, and determines the degree of cohesion among the plurality of electronic records by applying a vector similarity algorithm to the electronic record embedding vectors.

A method by which an electronic record classification apparatus according to an embodiment of the present invention classifies a received electronic record includes: training at least one electronic record classification model by machine learning, using training data composed of electronic records and business activity information of the organization that produced them; receiving a new electronic record; and classifying the new electronic record using the electronic record classification models.

The training step includes training a first electronic record classification model for multi-class classification of the electronic records, using first training data composed of the electronic records, the organization information, and the business information, and training a second electronic record classification model for heterogeneous classification of the electronic records, using second training data composed of the electronic records, the organization information, and the record information.

The step of classifying the new electronic record generates classification information for the new electronic record by applying respective weights to a first classification result from the first electronic record classification model and a second classification result from the second electronic record classification model.

The method further includes determining the degree of cohesion among a plurality of electronic records having the same classification information, and recommending, as a new task, at least one electronic record among them whose degree of cohesion does not fall within a preset threshold range.

The step of determining the degree of cohesion includes generating an electronic record embedding vector for each electronic record having the same classification information using a vectorization algorithm, and determining the degree of cohesion among the plurality of electronic records by applying a vector similarity algorithm to the electronic record embedding vectors.

According to the present invention, suitable categories for newly produced electronic records can be recommended and assigned automatically, enabling efficient management of electronic records.

In addition, according to the present invention, the suitability of already classified electronic records is re-evaluated and records that were classified inappropriately are surfaced as new tasks, so that electronic records can be managed with a high degree of cohesion.

FIG. 1 is a diagram illustrating an environment in which an electronic record classification apparatus according to an embodiment is implemented.
FIG. 2 is a diagram illustrating an electronic record classification apparatus according to an embodiment.
FIG. 3 is a diagram illustrating a method by which an electronic record classification unit according to an embodiment classifies a new electronic record.
FIG. 4 is a diagram illustrating a method by which a new task recognition unit according to an embodiment generates electronic record embedding vectors.
FIG. 5 is a diagram illustrating a method of calculating the similarity between electronic record embedding vectors.
FIG. 6 is a diagram illustrating a method by which the electronic record classification apparatus classifies a received electronic record.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating an environment in which an electronic record classification apparatus according to an embodiment is implemented.

Referring to FIG. 1, the environment 1000 in which the electronic record classification apparatus is implemented includes an electronic record production terminal 100, an electronic record classification apparatus 200, and an electronic record management terminal 300.

The electronic record production terminal 100 is a terminal for creating records in electronic form. Specifically, it is the terminal of a member of one of a plurality of organizations within a given institution, and the member to whom the terminal is assigned uses it to produce electronic records of various kinds.

The electronic record production terminal 100 may be a personal computer (PC), a laptop computer, a smartphone, a tablet PC, or the like, but is not limited to these and may be any of various devices capable of computing.

In this specification, an electronic record means any electronically generated record information, such as an electronic document, a web document, or an administrative information data set.

The electronic record classification apparatus 200 receives a new electronic record produced at the electronic record production terminal 100 and classifies it using the electronic record classification models. It also transmits the new electronic record and its classification information to the electronic record management terminal 300.

The electronic record classification apparatus 200 is described in detail below with reference to the drawings.

The electronic record management terminal 300 classifies and manages the new electronic record using the record and its classification information. Specifically, the electronic record management terminal 300 may be the terminal of the person responsible for managing electronic records. Like the electronic record production terminal 100, it may be a personal computer (PC), a laptop computer, a smartphone, a tablet PC, or the like, but is not limited to these and may be any of various devices capable of computing.

FIG. 2 is a diagram illustrating an electronic record classification apparatus according to an embodiment.

Referring to FIG. 2, the electronic record classification apparatus 200 includes a business activity learning unit 210, an electronic record classification unit 220, and a new task recognition unit 230.

The business activity learning unit 210 trains at least one electronic record classification model by machine learning, using training data composed of electronic records and business activity information of the organization that produced them.

Specifically, the business activity learning unit 210 uses machine learning to train at least one model for classifying electronic records, with training data that includes at least one of the organization information, business information, or record information of the organization that produced the electronic records.

Here, the organization information includes information on the organization chart, organizational structure, regulations, and the like of the organization that produced the electronic record. The business information includes information on that organization's division of duties, manuals, standard operating procedures (SOPs), and the like. The record information may include information on the content, senders and recipients, format, and the like of the electronic records produced by that organization.

The business activity learning unit 210 trains a first electronic record classification model for multi-class classification of electronic records, using first training data composed of electronic records, organization information, and business information.

That is, the first training data is the training data for the first electronic record classification model. In each training pattern pair (x, d), the electronic record is the input x, and the organization information and business information form the target d.

For example, if the electronic record is a "specification" produced by the "patent team", the first training data may take the "specification" as the input and, as the target, the organization chart information that places the "patent team" within the overall organization together with the "patent team" division-of-duties information.
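
The training pattern pair (x, d) described above can be sketched as a small data structure. This is an illustrative sketch only: the class name `TrainingPair`, the helper `make_first_training_pair`, and the string values are hypothetical and do not appear in the specification, which does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingPair:
    """One training pattern pair (x, d). Hypothetical representation."""
    x: str              # input: text of the electronic record
    d: Tuple[str, str]  # target: (organization information, business information)


def make_first_training_pair(record_text: str, org_info: str, business_info: str) -> TrainingPair:
    # For the first model, the target combines organization and business information.
    return TrainingPair(x=record_text, d=(org_info, business_info))


# Illustrative values mirroring the "specification" / "patent team" example.
pair = make_first_training_pair(
    "specification",
    "patent team (position in the organization chart)",
    "patent team division of duties",
)
```

A pair for the second model could be built the same way, with record information in place of business information in the target.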

Because the first training data includes not only information on the organization that produced the electronic record but also information on that organization's duties, the first electronic record classification model trained on it can perform multi-class classification of electronic records on the basis of duties and the like.

For example, when a "specification" is input, the first electronic record classification model can classify it as having been produced by the "patent team", based on the "patent team" organization chart information and division-of-duties information.

The business activity learning unit 210 also trains a second electronic record classification model for heterogeneous classification of electronic records, using second training data composed of electronic records, organization information, and record information.

That is, the second training data is the training data for the second electronic record classification model. In each training pattern pair (x, d), the electronic record is the input x, and the organization information and record information form the target d.

In this case, the business activity learning unit 210 may apply natural language processing to the electronic records in the second training data in order to determine their record information.

Here, natural language processing means mechanically analyzing the text of an electronic record and converting it into a form a computer can work with. It may be carried out through, for example, morphological analysis, part-of-speech tagging, phrase-level analysis, and syntactic parsing.

For example, if the electronic record is a "specification" produced by the "patent team", the second training data may take the "specification" as the input and, as the target, the "content of the specification" and "format information" obtained by applying natural language processing to the "specification".

Because the second training data includes not only information on the organization that produced the electronic record but also record information relating to the electronic records that organization produced, the second electronic record classification model trained on it can perform heterogeneous classification of electronic records on the basis of content, format, and the like.

For example, when a "specification" is input, the second electronic record classification model can classify it as having been produced by the "patent team", based on the "content of the specification" and the "format information of the specification".

The electronic record classification unit 220 receives a new electronic record and classifies it using the electronic record classification models.

FIG. 3 is a diagram illustrating a method by which an electronic record classification unit according to an embodiment classifies a new electronic record.

The electronic record classification unit 220 generates classification information for the new electronic record by applying respective weights to a first classification result from the first electronic record classification model and a second classification result from the second electronic record classification model.

Referring to FIG. 3, when the new electronic record is classified using the first electronic record classification model, the electronic record classification unit 220 performs multi-class classification based on the organization's business activities and the like (for example, the duty to which the new electronic record relates and its specific category). When the new electronic record is classified using the second electronic record classification model, the electronic record classification unit 220 performs heterogeneous classification based on format and similar records management information (for example, the content of the new electronic record, its sending and receiving departments, and its format information). Accordingly, when the electronic record classification unit 220 generates classification information for the new electronic record by applying respective weights to the first result from the first model and the second result from the second model, it performs a hybrid classification that combines multi-class and heterogeneous classification. In this case, the classification information includes identification information for the electronic file (folder) in which the new electronic record is to be stored and archived.

Specifically, the electronic record classification unit 220 compares a first result value, obtained by applying a first weight to the first result, with a second result value, obtained by applying a second weight to the second result, and determines the classification information based on the larger of the two. The first weight and the second weight may be set by the user.

For example, suppose that for a new electronic record, a "specification", the first result is "patent team" with a probability of 0.8 (on a scale of 0 to 1), the second result is "management team" with a probability of 0.6, and the first and second weights are 0.7 and 0.3, respectively. The electronic record classification unit 220 then determines the first result value as 0.8 × 0.7 = 0.56 and the second result value as 0.6 × 0.3 = 0.18. Because the first result value is the larger, the electronic record classification unit 220 determines "patent team", which corresponds to the first result value, as the classification information.
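
The weighted comparison in this example can be reproduced in a few lines. The function name `fuse_results` and the tuple layout are hypothetical, and the tie-breaking rule (preferring the first model when the two weighted values are equal) is an assumption, since the specification does not say how ties are handled.

```python
from typing import Tuple

Result = Tuple[str, float]  # (predicted label, probability in [0, 1])


def fuse_results(first: Result, second: Result, w1: float, w2: float) -> Tuple[str, float]:
    """Return the label with the larger weighted value, and that value."""
    value1 = first[1] * w1   # first result value
    value2 = second[1] * w2  # second result value
    # Assumption: ties go to the first model.
    if value1 >= value2:
        return first[0], value1
    return second[0], value2


# Values taken from the worked example in the text.
label, value = fuse_results(("patent team", 0.8), ("management team", 0.6), w1=0.7, w2=0.3)
print(label, round(value, 2))
```

With the values from the text, the weighted results are 0.56 and 0.18, so the sketch selects "patent team".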

The new task recognition unit 230 determines the degree of cohesion among a plurality of electronic records having the same classification information and recommends, as a new task, at least one electronic record among them whose degree of cohesion does not fall within a preset threshold range.

The new task recognition unit 230 may determine the degree of cohesion for the plurality of electronic records having the same classification information using a vectorization algorithm, a probability-based regression analysis such as logistic regression, or a decision tree algorithm. The method using a vectorization algorithm is described below. Specifically, the new task recognition unit 230 may generate an electronic record embedding vector for each electronic record having the same classification information using a vectorization algorithm.

For example, referring to FIG. 4, the new task recognition unit 230 may generate an electronic record embedding vector for each of first to third electronic records that have category A as their classification information by mapping them to vectors in a multidimensional space. The electronic record embedding vectors make it possible to infer the relationships among the electronic records having the same classification information.

The new task recognition unit 230 may generate the electronic record embedding vectors using any embedding algorithm, such as Word2vec or GloVe.
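
As a rough illustration of turning a record into a single embedding vector, the sketch below averages per-word vectors. Averaging is an assumption made here for illustration; the specification names Word2vec and GloVe but does not say how word vectors are combined into a record vector. The function `embed_record` and the two-dimensional `TOY_VECTORS` table are hypothetical stand-ins for vectors that a trained embedding model would supply.

```python
from typing import Dict, List

# Hypothetical, hand-written word vectors standing in for a trained model.
TOY_VECTORS: Dict[str, List[float]] = {
    "patent": [1.0, 0.0],
    "claim": [0.8, 0.2],
    "budget": [0.0, 1.0],
}


def embed_record(tokens: List[str], word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to the zero vector
    return [sum(component) / len(known) for component in zip(*known)]


record_vector = embed_record(["patent", "claim", "unknown"], TOY_VECTORS, dim=2)
```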

The new task recognition unit 230 then determines the degree of cohesion among the plurality of electronic records by applying a vector similarity algorithm to the electronic record embedding vectors.

For example, referring to FIG. 5, the new task recognition unit 230 may calculate the mathematical similarity between electronic record embedding vectors using a vector similarity algorithm such as cosine similarity, and may take the calculated similarity as the degree of cohesion between the corresponding electronic records.
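
Cosine similarity between two embedding vectors u and v is the dot product divided by the product of their norms, u·v / (‖u‖‖v‖). A minimal sketch follows; the function name is hypothetical, and returning 0.0 for a zero vector is a convention chosen here rather than something the specification states.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: undefined similarity is treated as 0
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.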

If, among the plurality of electronic records, a specific electronic record has a low degree of cohesion with the other electronic records and therefore does not fall within the threshold range, it may be determined that the specific electronic record does not belong to the previously classified category. Accordingly, the new task recognition unit 230 may recommend the specific electronic record as a new task and transmit information on the specific electronic record to the electronic record management terminal 300.
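The threshold check described above could be sketched as follows. The function name `recommend_new_tasks`, the record identifiers, and the bounds 0.5 and 1.0 are hypothetical values chosen for illustration; the description does not state concrete threshold values.

```python
from typing import Dict, List


def recommend_new_tasks(
    cohesion: Dict[str, float], lower: float = 0.5, upper: float = 1.0
) -> List[str]:
    """Return the IDs of records whose cohesion lies outside [lower, upper]."""
    return [
        record_id
        for record_id, score in cohesion.items()
        if not (lower <= score <= upper)
    ]
```

For instance, given cohesion values `{"r1": 0.9, "r2": 0.8, "r3": 0.1}`, only `"r3"` would be flagged as a candidate new task under these assumed bounds.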

FIG. 6 is a diagram illustrating a method by which the electronic record classification apparatus classifies received electronic records.

In describing FIG. 6, detailed descriptions of content identical to that of FIGS. 1 to 5 are omitted.

Referring to FIG. 6, the electronic record classification apparatus 200 trains, by machine learning, at least one electronic record classification model using learning data composed of electronic records and business activity information of the organization that produced the electronic records.

Specifically, the business activity information includes at least one of organization information of the organization, business information of the organization, or record information of the organization.
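One possible way to represent such learning data is sketched below. The class and field names are hypothetical, and plain strings stand in for whatever structured metadata an actual system would hold; the only constraint taken from the description is that at least one of the three kinds of information is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusinessActivityInfo:
    """Business activity information; at least one field must be provided."""
    organization_info: Optional[str] = None
    business_info: Optional[str] = None
    record_info: Optional[str] = None

    def __post_init__(self) -> None:
        fields = (self.organization_info, self.business_info, self.record_info)
        if all(f is None for f in fields):
            raise ValueError("at least one kind of information is required")


@dataclass
class TrainingExample:
    """One learning-data item: a record, its activity information, its category."""
    record_text: str
    activity: BusinessActivityInfo
    category: str
```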

In addition, the electronic record classification apparatus 200 trains, by machine learning, a first electronic record classification model for multi-classification of electronic records using the learning data.

In this case, the electronic record classification apparatus 200 trains the first electronic record classification model for multi-classification of electronic records using the electronic records, the organization information, and the business information included in the learning data (S100).

In addition, the electronic record classification apparatus 200 trains, by machine learning, a second electronic record classification model for heterogeneous classification of electronic records using the learning data.

In this case, the electronic record classification apparatus 200 trains the second electronic record classification model for heterogeneous classification of electronic records using second learning data composed of the electronic records, the organization information, and the record information included in the learning data (S110).
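The two models are thus trained on different views of the same learning data. A minimal sketch of how the first and second learning data might be derived is given below; the dictionary keys and the function name `split_learning_data` are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Dict[str, str]

# Fields used for the first model (S100) and the second model (S110).
FIRST_KEYS = ("record", "organization_info", "business_info")
SECOND_KEYS = ("record", "organization_info", "record_info")


def split_learning_data(
    examples: List[Example],
) -> Tuple[List[Example], List[Example]]:
    """Project each example onto the fields used by each of the two models."""
    first = [{k: ex[k] for k in FIRST_KEYS} for ex in examples]
    second = [{k: ex[k] for k in SECOND_KEYS} for ex in examples]
    return first, second
```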

The electronic record classification apparatus 200 receives a new electronic record (S120).

The electronic record classification apparatus 200 generates classification information for the new electronic record by applying respective weights to the first classification result according to the first electronic record classification model and the second classification result according to the second electronic record classification model (S130).
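One simple way to apply such weights is a weighted sum of per-category scores, as in the sketch below. This assumes each model outputs a score per category; the function name `fuse_classifications` and the default weights 0.6 and 0.4 are hypothetical, since the description does not state how the weights are chosen or combined.

```python
from typing import Dict


def fuse_classifications(
    first: Dict[str, float],
    second: Dict[str, float],
    w_first: float = 0.6,
    w_second: float = 0.4,
) -> str:
    """Return the category with the highest weighted sum of both results."""
    categories = set(first) | set(second)
    combined = {
        c: w_first * first.get(c, 0.0) + w_second * second.get(c, 0.0)
        for c in categories
    }
    # Iterate in sorted order so that ties resolve deterministically.
    return max(sorted(combined), key=lambda c: combined[c])
```

With `first = {"A": 0.7, "B": 0.3}` and `second = {"A": 0.4, "B": 0.6}`, the default weights select category A, whereas weights of 0.2 and 0.8 select category B.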

The electronic record classification apparatus 200 determines the degree of cohesion among a plurality of electronic records having the same classification information (S140).

Specifically, the electronic record classification apparatus 200 generates an electronic record embedding vector for each electronic record having the same classification information by using a vectorization algorithm, and determines the degree of cohesion among the plurality of electronic records by applying a vector similarity determination algorithm to the electronic record embedding vectors.

The electronic record classification apparatus 200 recommends, as a new task, at least one electronic record among the plurality of electronic records whose degree of cohesion does not fall within a preset threshold range (S150).

In addition, according to the present invention, previously classified electronic records are relearned to derive new tasks, and the electronic records derived as new tasks are reassigned, which makes it possible to manage collections of electronic records with a high degree of cohesion.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although the embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of rights of the present invention.

Claims

An electronic record classification apparatus, comprising:
a business activity learning unit that trains, by machine learning, at least one electronic record classification model using learning data consisting of electronic records and business activity information including at least one of organization information of the organization that produced the electronic records, business information of the organization, or record information of the organization; and
an electronic record classification unit that receives a new electronic record and classifies the new electronic record using the electronic record classification models,
wherein the business activity learning unit
obtains a first classification result by machine learning a first electronic record classification model for multi-classification of the electronic records using first learning data composed of the electronic records, the organization information, and the business information, and a second classification result by machine learning a second electronic record classification model for heterogeneous classification of the electronic records using second learning data composed of the electronic records, the organization information, and the record information.

(deleted)

The apparatus of claim 1,
wherein the electronic record classification unit
generates classification information for the new electronic record by applying respective weights to a first classification result according to the first electronic record classification model and a second classification result according to the second electronic record classification model.

The apparatus of claim 1,
further comprising a new task recognition unit that determines a degree of cohesion among a plurality of electronic records having the same classification information, and recommends, as a new task, at least one electronic record among the plurality of electronic records whose degree of cohesion does not fall within a preset threshold range.

The apparatus of claim 5,
wherein the new task recognition unit
generates an electronic record embedding vector for each electronic record having the same classification information by using a vectorization algorithm, and determines the degree of cohesion among the plurality of electronic records by applying a vector similarity determination algorithm to the electronic record embedding vectors.

A method by which an electronic record classification apparatus classifies received electronic records, the method comprising:
training, by machine learning, at least one electronic record classification model using learning data consisting of electronic records and business activity information including at least one of organization information of the organization that produced the electronic records, business information of the organization, or record information of the organization;
receiving a new electronic record; and
classifying the new electronic record using the electronic record classification models,
wherein the machine learning step comprises:
calculating a first classification result by machine learning a first electronic record classification model for multi-classification of the electronic records using first learning data composed of the electronic records, the organization information, and the business information;
calculating a second classification result by machine learning a second electronic record classification model for heterogeneous classification of the electronic records using second learning data composed of the electronic records, the organization information, and the record information; and
performing merge classification by integrating the first classification result and the second classification result.

(deleted)

The method of claim 7,
wherein the step of classifying the new electronic record comprises
generating classification information for the new electronic record by applying respective weights to a first classification result according to the first electronic record classification model and a second classification result according to the second electronic record classification model.

The method of claim 7, further comprising:
determining a degree of cohesion among a plurality of electronic records having the same classification information; and
recommending, as a new task, at least one electronic record among the plurality of electronic records whose degree of cohesion does not fall within a preset threshold range.

The method of claim 11,
wherein the step of determining the degree of cohesion comprises:
generating an electronic record embedding vector for each electronic record having the same classification information by using a vectorization algorithm; and
determining the degree of cohesion among the plurality of electronic records by applying a vector similarity determination algorithm to the electronic record embedding vectors.