KR102574459B1

KR102574459B1 - Device and method for electronic document management based on artificial intelligence having automatic notification function

Info

Publication number: KR102574459B1
Application number: KR1020230015378A
Authority: KR
Inventors: 신민영
Original assignee: 신민영
Priority date: 2023-02-06
Filing date: 2023-02-06
Publication date: 2023-09-04

Abstract

One object of the present invention is to provide an artificial intelligence-based electronic document management device that is applied to services such as contract management and report management, and that manages and analyzes electronic documents and provides automatic notification of specific events.
To this end, an artificial intelligence-based electronic document management device including an automatic notification function according to an embodiment of the present invention includes: one or more artificial neural network modules for receiving an electronic document image and outputting a plurality of pieces of item information in which the text included in the electronic document image is matched item by item; a selection module for selecting date-related item information from among the plurality of pieces of item information; and an automatic notification item determination module for determining, from among the selected item information, the automatic notification target item information for which automatic notification is to be performed.

Description

Device and method for electronic document management based on artificial intelligence having automatic notification function

The present invention relates to an artificial intelligence-based electronic document management device and method including an automatic notification function.

With the growth of the cloud market, electronic document businesses such as DocuSign and Modusign have recently begun to flourish, and they are resolving various problems of conventional paper documents. For example, in Korea, the paper documents generated at a single bank branch amount to about 100 million sheets per year, requiring some 73,000 boxes and a very large storage space of about 600 pyeong. Compared with paper documents, which occupy this much space, electronic documents allow far more documents to be managed in far less space. In addition, using electronic documents reduces the management costs associated with producing, distributing, storing, and disposing of documents, as well as printing and ink costs, and makes document management and the prevention of information leakage easier.

An important part of the transition from the paper document business to the electronic document business is digitizing existing paper documents. The technology used for this is optical character recognition (OCR), which recognizes and extracts characters from scanned images. OCR was actively adopted in the early 2000s, mainly in the financial sector. At that time, most of a bank's work consisted of processing paper documents that were printed on paper slips or filled in by hand by customers. Training the OCR engine on customer-provided documents to raise the recognition rate was an important step, and developers had to tune the engine parameters manually one by one. Correcting misrecognized parts consumed considerable resources, which lowered the accuracy of the overall work.

Since then, advances in artificial intelligence and blockchain have opened new horizons for the electronic document management business. In artificial intelligence in particular, the convolutional neural network (CNN), first proposed by Y. LeCun in 1989, ushered in the AI era in earnest starting with Alex Krizhevsky's AlexNet in 2012, and more recently a variety of deep learning models (or deep models) have been proposed, such as VGG-16, ResNet, EfficientNet, AmoebaNet, SqueezeNet, MobileNet, ShuffleNet, MNasNet, FBNet, CharmNet, DenseNet, and Xception. As these deep models came to be used in the electronic document management business, the limitations of conventional OCR technology were overcome.

Korean Patent Application Publication No. 10-2021-0086849, Method for Generating Documents, Legal Insight Co., Ltd.
Korean Registered Patent No. 10-2289935, Artificial Intelligence-Based Legal Document Analysis System and Method, Intellicon Research Institute

Accordingly, an object of the present invention is to provide an artificial intelligence-based electronic document management device and method that are applied to services such as contract management and report management to manage and analyze electronic documents.

Specific means for achieving the object of the present invention are described below.

An artificial intelligence-based electronic document management device including an automatic notification function according to an embodiment of the present invention includes: one or more artificial neural network modules for receiving an electronic document image and outputting a plurality of pieces of item information in which the text included in the electronic document image is matched item by item; a selection module for selecting date-related item information from among the plurality of pieces of item information; and an automatic notification item determination module for determining, from among the selected item information, the automatic notification target item information for which automatic notification is to be performed.
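The selection step can be illustrated with a minimal sketch. The text here does not say how the selection module decides that an item is date-related, so the regular-expression filter below is only an assumption, and the `ItemInfo` type, the item labels, and the supported date formats are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ItemInfo:
    label: str  # hypothetical item name, e.g. "contract_date"
    text: str   # text matched to that item


# Assumed date formats: 2023-02-06, 2023.02.06, 2023/02/06, and "2023년 2월 6일".
DATE_PATTERN = re.compile(
    r"\d{4}\s*[-./]\s*\d{1,2}\s*[-./]\s*\d{1,2}"
    r"|\d{4}\s*년\s*\d{1,2}\s*월\s*\d{1,2}\s*일"
)


def select_date_items(items):
    """Keep only the items whose text contains a date-like string."""
    return [item for item in items if DATE_PATTERN.search(item.text)]


items = [
    ItemInfo("lessor", "Hong Gildong"),
    ItemInfo("contract_date", "2023-02-06"),
    ItemInfo("balance_payment_date", "2023년 3월 31일"),
    ItemInfo("deposit", "100,000,000 KRW"),
]
selected = select_date_items(items)
print([item.label for item in selected])
```

In this sketch the selected items would then be passed on to the automatic notification item determination module, which decides which of them actually trigger a notification.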

In the artificial intelligence-based electronic document management device described above, the automatic notification item determination module includes a transformer-based artificial neural network module that receives the selected item information and outputs the item information for which automatic notification is to be performed. The transformer-based artificial neural network module receives an expected future reward, state information, and past outputs of the artificial neural network module as inputs, and generates as its output an optimal output including the automatic notification target item information. The expected future reward represents the sum of the rewards expected in the future as a result of the automatic notification item determination module generating the automatic notification target item information. The state information includes at least the selected item information received by the automatic notification item determination module, and the past outputs include the automatic notification target item information that the transformer-based artificial neural network module output in the past.
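The expected future reward is described as a sum of the rewards expected from the current step onward. As a minimal sketch, assuming a finite sequence of per-step rewards and no discounting (neither of which is specified here), that quantity can be computed as a suffix sum. The function name `returns_to_go` and the example reward values are illustrative only.

```python
def returns_to_go(rewards):
    """Return suffix sums: entry t is the sum of rewards from step t to the end."""
    total = 0.0
    out = []
    for reward in reversed(rewards):
        total += reward
        out.append(total)
    out.reverse()
    return out


# Hypothetical per-step rewards for four notification decisions.
rewards = [0.0, 1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 3.0, 2.0, 2.0]
```

Each step's input to the transformer-based module would then pair such a value with the state information and the module's past outputs, as described above.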

In the artificial intelligence-based electronic document management device described above, the state information includes statutory information associated with the electronic document image.

In the artificial intelligence-based electronic document management device described above, the transformer-based artificial neural network module includes a first block and a second block. The first block receives the expected future reward and the state information, outputs final encoding information, and provides it as an input to the second block. The second block receives the past outputs and generates the optimal output including the automatic notification target item information.

In the artificial intelligence-based electronic document management device described above, the one or more artificial neural network modules include: a text extraction module including an artificial neural network that receives an electronic document image, extracts the text included in the electronic document image, and outputs text information; and a text processing module including an artificial neural network that receives full text information, which is the text information merged sequentially in order of y-coordinate, and outputs item information in which the text included in the electronic document image is matched item by item.
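Merging the text information in order of y-coordinate can be sketched as a simple sort. The `TextBox` type is hypothetical, and two details are assumptions not stated here: that y grows downward as in image coordinates, and that x is used as a tie-breaker for boxes on the same line.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    x: float   # left position of the box (assumed image coordinates)
    y: float   # top position of the box, increasing downward (assumption)
    text: str  # recognized text for this box


def merge_by_y(boxes):
    """Concatenate box texts from top to bottom, left to right on ties."""
    ordered = sorted(boxes, key=lambda box: (box.y, box.x))
    return "\n".join(box.text for box in ordered)


boxes = [
    TextBox(10, 120, "Deposit: 100,000,000 KRW"),
    TextBox(10, 40, "Lease Agreement"),
    TextBox(10, 80, "Lessor: Hong Gildong"),
]
full_text = merge_by_y(boxes)
print(full_text)
```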

As described above, the present invention has the following effects.

First, according to an embodiment of the present invention, the artificial neural network for electronic document management can be trained without leaking personal information.

Second, according to an embodiment of the present invention, electronic document items can be classified into classes customized to the type of electronic document.

Third, according to an embodiment of the present invention, the item information that requires automatic notification can be selected from among the item information of an electronic document, and automatic notification can be performed for it.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of the technical idea of the present invention. The present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a schematic diagram showing the operational relationship of an artificial intelligence-based electronic document management device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the operational relationship, step by step, between an artificial intelligence-based electronic document management device and an electronic document management server according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing the configuration of an artificial intelligence-based electronic document management device according to an embodiment of the present invention.
FIG. 4 is a schematic diagram showing the detailed operational relationship of an artificial intelligence-based electronic document management device according to an embodiment of the present invention.
FIG. 5 is a schematic diagram showing the detailed configuration of a text extraction module according to an embodiment of the present invention.
FIG. 6 is a schematic diagram showing a text box generation neural network module according to an embodiment of the present invention.
FIG. 7 is a schematic diagram showing a text box segmentation module according to an embodiment of the present invention.
FIG. 8 is a schematic diagram showing the operational relationship of a segment vector generation module according to an embodiment of the present invention.
FIG. 9 is a schematic diagram showing the operational relationship of a segment vector generation module according to a modified example of the present invention.
FIG. 10 is a schematic diagram showing a text sequence generation neural network module according to an embodiment of the present invention.
FIG. 11 is a schematic diagram showing a reinforcement learning module according to an embodiment of the present invention.
FIG. 12 is a schematic diagram showing a reinforcement learning module according to a modified example of the present invention.
FIG. 13 is a flowchart showing an operation example of a reinforcement learning module according to an embodiment of the present invention.
FIG. 14 is a schematic diagram showing a text processing module according to an embodiment of the present invention.
FIG. 15 is a schematic diagram showing an electronic document classification neural network module according to an embodiment of the present invention.
FIG. 16 is a schematic diagram showing an item classification neural network module according to an embodiment of the present invention.
FIG. 17 is a schematic diagram showing a neural network processing module according to an embodiment of the present invention.
FIG. 18 is a schematic diagram showing a correction information generation module according to an embodiment of the present invention.
FIG. 19 is a schematic diagram showing the abnormal text classification process of an abnormal text classification module according to an embodiment of the present invention.
FIG. 20 is a schematic diagram showing the structure of a normal text generation module according to an embodiment of the present invention.
FIG. 21 schematically shows an artificial intelligence-based electronic document management device having an automatic notification function according to an embodiment of the present invention.
FIG. 22 schematically shows an automatic notification module according to an embodiment of the present invention.
FIG. 23 schematically shows a transformer-based artificial neural network module included in an automatic notification item determination module according to an embodiment of the present invention.

Hereinafter, embodiments that allow a person of ordinary skill in the art to which the present invention pertains to readily practice the invention are described in detail with reference to the accompanying drawings. However, in describing the operating principles of the preferred embodiments in detail, a detailed description of a related well-known function or configuration is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

In addition, the same reference numerals are used throughout the drawings for parts having similar functions and operations. Throughout the specification, when a part is said to be connected to another part, this includes not only the case where it is directly connected but also the case where it is indirectly connected with another element interposed between them. Furthermore, stating that a component is included means that other components may additionally be included, not that other components are excluded, unless specifically stated otherwise.

In the following description, an artificial neural network may be referred to by terms such as artificial neural network module, neural network module, deep learning, deep learning model, deep model, or model.

In the following description, a convolutional neural network, which is a neural network that uses the convolution operation, may be referred to as a CNN, ConvNet, or the like.

In the following description, an electronic document may mean a digitized document such as a contract, quotation, invoice, or report.

Artificial intelligence-based electronic document management device

With regard to the operation of the artificial intelligence-based electronic document management device according to an embodiment of the present invention, FIG. 1 is a schematic diagram showing the operational relationship of the device, FIG. 2 is a schematic diagram showing the operational relationship, step by step, between the device and the electronic document management server, FIG. 3 is a schematic diagram showing the configuration of the device, and FIG. 4 is a schematic diagram showing the detailed operational relationship of the device.

As shown in FIG. 1, an electronic document image of a specific document (for example, a real estate sales contract or a real estate lease agreement) is generated by photographing with the user client 100 or by scanning with a separate scanning device, and is input to the artificial intelligence-based electronic document management device 1 according to an embodiment of the present invention, which is included in the user client 100. The artificial intelligence-based electronic document management device 1 is configured to transmit the parameters of the artificial neural network included in the device 1 to the electronic document management server 200.

In addition, as shown in FIG. 2, the artificial intelligence-based electronic document management device 1 according to an embodiment of the present invention is provided in each of a plurality of user clients 100, and may be connected to the electronic document management server 200 over a wired or wireless network and configured to perform a client learning step, a parameter update step, and a main neural network download step.

In addition, as shown in FIGS. 3 and 4, the artificial intelligence-based electronic document management device 1 according to an embodiment of the present invention may be configured to include a text extraction module, a text processing module, and a neural network processing module, and the electronic document management server 200 may be configured to include a main neural network module 210 and a federated learning module 220. According to one example, the neural network processing module of the artificial intelligence-based electronic document management device 1 transmits changed parameters and correction change information to the federated learning module 220 of the electronic document management server 200, and the federated learning module 220 may be configured to perform federated learning based on the parameters and correction change information received from the plurality of user clients 100 to generate a pretrained main neural network. The pretrained main neural network may be used, through the main neural network module 210, to update the artificial neural network modules of the text extraction module and the text processing module.
As a result, the artificial neural networks of the text extraction module and the text processing module that perform electronic document management can be trained without the personal information contained in an electronic document being shared with other user clients.
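The server-side aggregation can be sketched as follows. The description does not state which aggregation rule the federated learning module 220 uses, so the weighted parameter averaging below (in the style of FedAvg) is an assumption, as are the function name `federated_average`, the parameter dictionary layout, and the use of per-client sample counts as weights. The correction change information is not modeled.

```python
import numpy as np


def federated_average(client_params, client_weights):
    """Weighted average of per-client parameter dicts (assumed FedAvg-style rule)."""
    total = float(sum(client_weights))
    keys = client_params[0].keys()
    return {
        key: sum(w * params[key] for params, w in zip(client_params, client_weights)) / total
        for key in keys
    }


# Two hypothetical clients; only parameters leave the client, never document images.
client_a = {"w": np.array([1.0, 2.0])}
client_b = {"w": np.array([3.0, 6.0])}
main_params = federated_average([client_a, client_b], client_weights=[1, 3])
print(main_params["w"])  # [2.5 5. ]
```

Because only parameters are exchanged in this sketch, the raw document content stays on each client, which is the property the paragraph above relies on.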

The operating principle of the text extraction module of the artificial intelligence-based electronic document management device 1 is described in more detail below. FIG. 5 is a schematic diagram showing the detailed configuration of a text extraction module according to an embodiment of the present invention.

As shown in FIG. 5, the text extraction module receives an electronic document image, extracts the text included in the electronic document image, and outputs text information. The text extraction module according to an embodiment of the present invention may be configured to include a text box generation neural network module, a text box segmentation module, a text sequence generation neural network module, and a text information generation module.

FIG. 6 is a schematic diagram showing the text box generation neural network module in the text extraction module according to an embodiment of the present invention.

As shown in FIG. 6, the text box generation neural network module may be an artificial neural network module that takes an electronic document image as input data and produces text box information as output data. The text box information includes: text class information, meaning the text class probability that estimates whether a text box (a bounding box surrounding text included in the electronic document image) contains text; text coordinate information, including the coordinates, height, width, and angle of the text box; and text size class information, meaning the estimated character size of the text.

The text box generation neural network module according to an embodiment of the present invention may include a multi-object detection artificial neural network module that performs classification and localization. As such a module, two-stage detectors such as RCNN (2013), OverFeat (ICLR 2014), Fast RCNN (ICCV 2015), Faster RCNN (NIPS 2015), and Mask RCNN (ICCV 2017) may be used; anchor-based one-stage detectors such as YOLO v1 (CVPR 2016), YOLO v2 (CVPR 2017), YOLO v3 (arXiv 2018), SSD (ECCV 2016), and RetinaNet (ICCV 2017) may be used; non-anchor-based one-stage detectors such as CornerNet (ECCV 2018), ExtremeNet (2019), and CenterNet (2019) may be used; and CRAFT (down-sampling/up-sampling) and the like may be used.

The output data of the text box generation neural network module according to an embodiment of the present invention may be one or more text box images for at least one object in the electronic document image. Such a text box image may include text class information (a confidence score or class probability), meaning the confidence of the text class for at least one object included in the electronic document image, and text coordinate information (coordinate data), which is the coordinate information of the corresponding text object (an object inferred to be of the text class). Depending on the configuration of the artificial neural network, the text coordinate information of an object may be configured to include the coordinates of the top-left and bottom-right corners of the bounding box, the coordinates of the central region of the bounding box, the width and height of the bounding box, and the angle of the bounding box. It may also be configured to include text size class information, meaning the estimated character size of the text within the bounding box.
For example, for a specific object included in an electronic document image, the text class information may take the form [0.8], the text coordinate information [x, y, w, h, θ], and the text size class information [3]. According to an embodiment of the present invention, because 'angle' information is additionally included in the text coordinate information, the text angle of the text box image can be standardized.
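The example output above can be pictured as a small record. The sketch below is illustrative only: the `TextBoxInfo` type is hypothetical, and it assumes that (x, y) is the box center and that θ is a rotation in radians, neither of which is fixed by the description. The `corners` method shows how the angle lets a rotated box be recovered so that the text can later be straightened.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBoxInfo:
    text_class: float  # text class probability, e.g. 0.8
    x: float           # box center x (assumption)
    y: float           # box center y (assumption)
    w: float           # box width
    h: float           # box height
    theta: float       # rotation angle in radians (assumption)
    size_class: int    # estimated character size class, e.g. 3

    def corners(self):
        """Return the four corners of the rotated box, starting at top-left."""
        cos_t, sin_t = math.cos(self.theta), math.sin(self.theta)
        offsets = [
            (-self.w / 2, -self.h / 2),
            (self.w / 2, -self.h / 2),
            (self.w / 2, self.h / 2),
            (-self.w / 2, self.h / 2),
        ]
        return [
            (self.x + dx * cos_t - dy * sin_t, self.y + dx * sin_t + dy * cos_t)
            for dx, dy in offsets
        ]


box = TextBoxInfo(text_class=0.8, x=10.0, y=20.0, w=4.0, h=2.0, theta=0.0, size_class=3)
print(box.corners())
```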

FIG. 6 is a schematic diagram showing the structure of the text box generation neural network module according to an embodiment of the present invention when it is configured with YOLO v1 (CVPR 2016). For example, as shown in FIG. 6, when the text box generation neural network module is configured with YOLO v1 (CVPR 2016), the DarkNet architecture is used: feature maps are extracted through convolution layers, and the text class probability (class confidence, text class information), coordinate data (text coordinate information), and text size class information of each bounding box are inferred directly through fully connected layers and output as output data. YOLO divides the input image, here the electronic document image, into an SxS grid and obtains, for each grid cell, the bounding boxes (SxSxB in total), the class confidence (Probability(object) × IoU(prediction, ground truth)), and the class probability map (Probability(Class_i | object)). As a specific example of the network structure, the module may be configured to output five bounding box coordinates (text coordinate information) and a confidence score (text class information) per grid cell, the electronic document image may be input at a size of 448x448x3, the activation map of the DarkNet architecture may have a size of 7x7x1024, and fully connected layers of 4096 and 7x7x30 may follow the DarkNet architecture.
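The 7x7x30 output size quoted above matches the arithmetic of the publicly documented YOLO v1 configuration, where each grid cell predicts B boxes of 5 values (x, y, w, h, confidence) plus C class probabilities. The following minimal sketch only checks that arithmetic. The values S = 7, B = 2, and C = 20 are taken from the original YOLO v1 paper and are an assumption here, since this description adds an angle and a text size class whose exact channel layout is not specified.

```python
from typing import Tuple


def yolo_v1_output_shape(s: int, b: int, c: int) -> Tuple[int, int, int]:
    """Shape of the YOLO v1 detection tensor: S x S x (B * 5 + C).

    Each of the B boxes per cell carries 5 values (x, y, w, h, confidence),
    and each cell carries C class probabilities.
    """
    return (s, s, b * 5 + c)


def total_boxes(s: int, b: int) -> int:
    """Number of candidate bounding boxes over the whole grid: S * S * B."""
    return s * s * b


# Assumed values from the original YOLO v1 paper, not from this description.
shape = yolo_v1_output_shape(s=7, b=2, c=20)
boxes = total_boxes(s=7, b=2)
print(shape, boxes)
```

A variant that also regresses an angle θ and a size class per box would change the per-box channel count, so the last dimension would no longer be 30.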

FIG. 7 is a schematic diagram showing the text box segmentation module in the text extraction module according to an embodiment of the present invention.

As shown in FIG. 7, the text box segmentation module may include a CNN module and a segment vector generation module. The CNN module includes a fully connected layer (FC layer) or, preferably, a fully convolutional layer (FCN layer). It takes as input data a text box image, which is the portion of the electronic document image corresponding to the text box information output by the text box generation neural network module in the example of FIG. 6, and outputs a feature map (activation map, activation information) for the text box image. The segment vector generation module generates as output a text box feature vector, which is integrated activation information obtained by integrating the activation information, and segments the text box feature vector into n vectors (a specific number of segments) along the length direction to generate n segment vector columns (segment vector information).

FIG. 8 is a schematic diagram showing the operation of the segment vector generation module according to an embodiment of the present invention.

As shown in FIG. 8, the segment vector generation module is connected to a plurality of convolution layers of the CNN module including the FCN layer. When the input data of that CNN module is a text box image of a specific size, the segment vector generation module receives a plurality of pieces of activation information (including an activation value for each coordinate), each containing an activation map of at least one dimension output from a convolution layer, and also receives the coordinate data (text coordinate information) of the text class for the text box image. It resizes the plurality of pieces of activation information to correspond to the text coordinate information and integrates them to generate the text box feature vector, which is integrated activation information, and then segments between the peaks of the activation values along the length direction of the text box to generate n segment vector columns (segment vector information). Here, the CNN module including the FCN layer is pre-trained with an output layer that applies global average pooling (GAP) to the FCN layer and outputs the text class probability of the text box image. In addition, the segment vector generation module according to an embodiment of the present invention may be configured to expand the size (w, h) of the input text box image by a specific weight according to the text class probability output from the CNN module including the FCN layer, and to feed the expanded image back in as input data. The specific weight may take the form, for example, of 1/[text class probability] or 1 + (1 - [text class probability]). For example, when the text class probability output for a text box image input to the CNN module including the FCN layer is 0.7, the text box image may be expanded by a weight of 1.3, which corresponds to the form 1 + (1 - 0.7), and input again as input data.
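The two weight forms described above can be compared with a small sketch. The function names and the choice to scale both w and h by the same factor are assumptions made for illustration; the description states only that the size (w, h) is expanded by the weight.

```python
from typing import Tuple


def expansion_weight(p: float, mode: str = "additive") -> float:
    """Weight used to enlarge a text box from its text class probability p.

    mode="additive" gives 1 + (1 - p); mode="inverse" gives 1 / p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("text class probability must be in (0, 1]")
    if mode == "additive":
        return 1.0 + (1.0 - p)
    if mode == "inverse":
        return 1.0 / p
    raise ValueError("unknown mode: " + mode)


def expand_box(w: float, h: float, p: float, mode: str = "additive") -> Tuple[float, float]:
    """Scale width and height by the same weight (an assumption of this sketch)."""
    k = expansion_weight(p, mode)
    return (w * k, h * k)


additive = expansion_weight(0.7)             # approximately 1.3
inverse = expansion_weight(0.7, "inverse")   # approximately 1.43
print(additive, inverse, expand_box(100.0, 20.0, 0.7))
```

Both forms leave a box unchanged at probability 1.0 and enlarge it more as the probability drops; the inverse form grows faster for low probabilities.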

According to the text box segmentation module of an embodiment of the present invention, each character included in the text box image can be precisely inferred as a separate segment vector column. In addition, because the FCN layer and GAP are used, the text box feature vector can be generated even when text box images of various sizes are input. In addition, because the text box generation neural network module and the text box segmentation module each infer a text class probability, the CNN module including the FCN layer of the text box segmentation module can be pre-trained on the basis of the text class probability already inferred by the text box generation neural network module.
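Segmenting between activation peaks along the length of the text box can be sketched as follows. The sketch assumes the integrated activation has already been reduced to a one-dimensional profile along the length direction, and it cuts at the lowest value between neighbouring peaks; the description does not specify the cut rule, so both choices are illustrative, as are the function names.

```python
from typing import List, Tuple


def find_peaks(profile: List[float]) -> List[int]:
    """Indices strictly greater than the left neighbour and >= the right one."""
    peaks = []
    for i, v in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        if v > left and v >= right:
            peaks.append(i)
    return peaks


def segment_between_peaks(profile: List[float]) -> List[Tuple[int, int]]:
    """Split [0, len(profile)) into half-open ranges, one per peak.

    The cut between two neighbouring peaks is placed at the index of the
    lowest activation between them (an assumed rule).
    """
    if not profile:
        return []
    peaks = find_peaks(profile)
    cuts = [
        min(range(a + 1, b), key=lambda i: profile[i])
        for a, b in zip(peaks, peaks[1:])
    ]
    bounds = [0] + cuts + [len(profile)]
    return list(zip(bounds, bounds[1:]))


# Hypothetical activation profile with three character-like peaks.
profile = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.3]
segments = segment_between_peaks(profile)
print(segments)
```

Each returned range contains exactly one peak, so the number of ranges plays the role of n, the number of segment vector columns.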

FIG. 9 is a schematic diagram showing the operation of the segment vector generation module according to a modified example of the present invention.

As shown in FIG. 9, according to a modified example of the present invention, the CNN module including the FCN layer is configured as an artificial neural network consisting of a single convolutional network that outputs the text class probability as output data through global average pooling, and a class activation generation module that reduces dimensionality is further included between the segment vector generation module and the CNN module including the FCN layer. The class activation generation module is connected to a plurality of convolution layers of the CNN module including the FCN layer. When the input data of that CNN module is a text box image, the class activation generation module receives activation information (including an activation value for each coordinate) containing an activation map of at least one dimension output from the convolution layers, and outputs class activation information (including an activation value for each coordinate according to the text class) containing the activation map corresponding to the text class. It may be trained together with the CNN module including the FCN layer, which is configured to output the text class probability through a global average pooling function. In addition, the class activation information and the text coordinate information are input to the segment vector generation module, which may be configured to resize the input class activation information to correspond to the text coordinate information and integrate it to generate the text box feature vector, which is integrated activation information, and to segment between the peaks of the activation values along the length direction of the text box to generate n segment vector columns (segment vector information).

Here, the class activation generation module may generate the class activation information according to Equation 1.

In Equation 1, M_c(x, y) is the activation value at position (x, y) that influences classification into class c, w_k^c is the weight of the k-th channel of the activation map for class c, and f_k(x, y) is the activation value at position (x, y) of the k-th channel of the activation map.
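Equation 1 itself does not appear in this text. The symbol definitions above match the standard class activation mapping sum, so a reconstruction under that assumption (not the verbatim equation of the patent) would be:

```latex
M_c(x, y) = \sum_{k} w_k^{c} \, f_k(x, y)
```

Under this reading, the class activation map for the text class is a weighted sum over the channels of the activation map, with the weights taken from the layer that follows global average pooling.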

According to the modified example of the present invention, the text box feature vector is output using a class activation map generated only for the text class rather than an activation map covering all classes, so each character included in the text box image can be inferred even more precisely as a separate segment vector column. In addition, because everything from the CNN module including the FCN layer to the class activation generation module can be configured as a single artificial neural network module, the efficiency of neural network training and inference is improved. In addition, because the FCN layer and GAP are used, the text box feature vector can be generated even when text box images of various sizes are input. In addition, because the text box generation neural network module and the text box segmentation module each infer a text class probability, the CNN module including the FCN layer of the text box segmentation module can be pre-trained on the basis of the text class probability already inferred by the text box generation neural network module.

The text sequence generation neural network module in the text extraction module may be a recurrent artificial neural network module that takes as input data the sequential segment vector columns for a text box image (vector columns each segmented to a single syllable) and outputs text sequence information that sequentially includes the m-th segment text information (information on the segment text, that is, the text for a vector segmented to a single syllable). The recurrent artificial neural network module may be configured with a neural network structure such as an RNN (recurrent neural network), LSTM (long short-term memory), or Bi-LSTM.

FIG. 10 is a schematic diagram showing the text sequence generation neural network module according to an embodiment of the present invention.

As shown in FIG. 10, the text sequence generation neural network module of the text extraction module according to an embodiment of the present invention is composed of a plurality of RNN blocks, and each RNN block may include a first RNN cell and a second RNN cell. The first RNN cell receives as input data the m-th segment vector column, the output data of the previous cell, and the text box feature vector, and outputs language type information for the m-th segment vector column (for example, Korean (kor), English (eng), Japanese (jp), or number (num)). The second RNN cell may be configured to receive the language type information for the m-th segment vector column and the hidden state of the first RNN cell and to output the m-th segment text information. The text sequence generation neural network module according to an embodiment of the present invention may terminate inference when the language type information output from the first RNN cell is 'eng' and the rewards calculated by the reinforcement learning module for all actions are negative. Because the text sequence generation neural network module is configured to output the language type information and the m-th segment text information sequentially in writing order (left to right), the language type information and segment text information generated in the previous step influence the segment text information generated in the next step, so segment text can be generated more accurately. In addition, because the first RNN cell receives the text box feature vector and the second RNN cell infers the segment text information for one syllable on that basis, the segment text can be inferred within the overall context of the sentence in the text box.

The RNN block of the text sequence generation neural network module according to an embodiment of the present invention may be trained by a reinforcement learning module. The reinforcement learning module that trains the RNN block may be configured as follows: the previously generated text box feature vector, language type vector, (m-1)-th segment vector column, and (m-1)-th segment text information form the Environment; each RNN block of the text sequence generation neural network module is the Agent; the situation in which the language type information and segment text information from the 1st segment vector column to the (n-1)-th segment vector column have been output is the State; the language type information and m-th segment text information that the Agent (the RNN block) outputs for the m-th segment vector column in that State are the Action; and a higher Reward is generated the higher the similarity (for example, cosine similarity) or the smaller the difference (for example, Kullback-Leibler divergence) between the output data of the current step (the language type information and m-th segment text information of the current step) and the ground truth, and this Reward updates the hidden layer of the Agent (the RNN block).
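The reward described above can be sketched with the two measures the text names. The sketch assumes the RNN block output and the ground truth are both available as probability vectors over the same set of candidates (the ground truth typically one-hot), and it turns the Kullback-Leibler divergence into a reward by negating it; both are assumptions for illustration, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def kl_divergence(target: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """KL(target || pred); terms with target probability 0 contribute nothing."""
    return sum(t * math.log(t / max(q, eps)) for t, q in zip(target, pred) if t > 0)


def reward(pred: Sequence[float], truth: Sequence[float], use_kl: bool = False) -> float:
    """Higher when the prediction is closer to the ground truth."""
    if use_kl:
        return -kl_divergence(truth, pred)  # smaller divergence, higher reward
    return cosine_similarity(pred, truth)


truth = [0.0, 1.0, 0.0]   # hypothetical one-hot ground truth
good = [0.1, 0.8, 0.1]    # prediction close to the ground truth
bad = [0.6, 0.2, 0.2]     # prediction far from the ground truth
print(reward(good, truth), reward(bad, truth))
```

Either measure ranks the closer prediction higher; the cosine form is bounded in [0, 1] for non-negative vectors, while the negated divergence is unbounded below.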

FIG. 11 is a schematic diagram showing the reinforcement learning module according to an embodiment of the present invention.

As shown in FIG. 11, the previously generated text box feature vector, language type vector, (m-1)-th segment vector column, and (m-1)-th segment text information form the Environment; each RNN block of the text sequence generation neural network module is the Agent; the situation in which the language type information and segment text information from the 1st segment vector column to the (n-1)-th segment vector column have been output is the State; the language type information and m-th segment text information that the Agent (the RNN block) outputs for the m-th segment vector column in that State are the Action; and a higher Reward is generated the higher the similarity (for example, cosine similarity) or the smaller the difference (for example, Kullback-Leibler divergence) between the output data of the current step (the language type information and m-th segment text information of the current step) and the ground truth, and this Reward may be used to update the hidden layer of the Agent (the RNN block). An RNN block whose optimization by the reinforcement learning module according to an embodiment of the present invention is complete may be configured so that its hidden layer is fixed.

Accordingly, the text sequence generation neural network module and the reinforcement learning module generate the optimal segment text information corresponding to the text box feature vector of the image in the order of the characters in the text box. In addition, because the reinforcement learning module is optimized sequentially for each character of the text box, without having to consider every possible combination of language type information and m-th segment text information that the text sequence generation neural network module could generate, the number of cases the reinforcement learning module must evaluate is reduced, which reduces the computing resources required.

The reinforcement learning module according to a modified example of the present invention may be configured as follows so that the RNN block is updated by more effective reinforcement learning. FIG. 12 is a schematic diagram showing the reinforcement learning module according to the modified example of the present invention.

As shown in FIG. 12, the reinforcement learning module according to the modified example of the present invention may include a value network, which is an artificial neural network that learns a value function outputting the value of a specific state, and a policy network, which learns a policy function outputting the probability of each piece of language type information and m-th segment text information. The policy network and the value network according to the modified example may be configured to be connected to a specific RNN block of the text sequence generation neural network module. Connected to the RNN block, the policy network and the value network can output the language type information and segment text information for a specific position in the sequence.

The policy network is an artificial neural network that determines the probability of the language type information and m-th segment text information selected in each state of the reinforcement learning module; it learns a policy function and outputs the probability of the selected language type information and m-th segment text information. The cost function of the policy network may be a function obtained by multiplying the policy function by the cost function of the value network to compute a cross entropy and then taking the policy gradient, and may be configured, for example, as in Equation 2. The policy network may be updated by backpropagation based on the product of the cross entropy and the temporal-difference (TD) error, which is the cost function of the value network.

In Equation 2, π is the policy function, θ is the policy network parameter, π_θ(a_i | s_i) is the probability of taking a specific action (language type information and m-th segment text information) in the current episode, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount factor. Here, r_{i+1} may be configured to receive the similarity between the language type information and m-th segment text information of the current step and the ground truth.
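Equation 2 itself does not appear in this text. One form consistent with the symbols defined above and with the description of a cross-entropy term multiplied by the temporal-difference error of the value network is the standard actor-critic policy loss. The following is a reconstruction under that assumption, not the verbatim equation of the patent, and J is a symbol introduced here for the policy network cost:

```latex
\nabla_\theta J(\theta) = \nabla_\theta \Big[ -\log \pi_\theta(a_i \mid s_i) \,
  \big( r_{i+1} + \gamma V_w(s_{i+1}) - V_w(s_i) \big) \Big]
```

In this form the bracketed difference is the temporal-difference error, so an action that turns out better than the value network predicted has its log-probability increased.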

Before reinforcement learning proceeds, the policy network according to an embodiment of the present invention may learn the basis of the policy through supervised learning, in which the weights of the policy network are updated based on previous language type information and m-th segment text information and the corresponding performance information (the similarity between the language type information and m-th segment text information of the current step and the ground truth). That is, the weights of the policy network may be set by supervised learning on the language type information, the m-th segment text information, and the performance information. Accordingly, the policy network can be trained very quickly from the history of language type information and m-th segment text information.

In addition, according to an embodiment of the present invention, the supervised learning of the policy network may be configured to include a random vector, so that supervised learning is performed on the basis of the operation unit type information and parameter information of the previous layer and the corresponding performance information. The random vector may be drawn, for example, from a Gaussian distribution. Accordingly, the policy network can output exploratory language type information and m-th segment text information with a random probability. If supervised learning of the policy network is based only on previous language type information and m-th segment text information and the corresponding performance information, the selection of language type information and m-th segment text information ends up optimized within the policy of the previous step. When a random vector is included in the supervised learning of the policy network according to an embodiment of the present invention, however, the policy network can learn language type information and m-th segment text information that are more effective than the policy of the previous step as reinforcement learning progresses.

The value network is an artificial neural network that derives the possibility of achieving a reward in each state that the reinforcement learning module can have, and it learns a value function. The value network indicates the direction in which the RNN block, which is the agent, is to be updated. To this end, the input variable of the value network is set to state information, that is, information on the state of the reinforcement learning module, and the output variable of the value network may be set to reward possibility information, that is, the possibility that the RNN block achieves a reward (the similarity between the language type information and m-th segment text information of the current step and the ground truth). The reward possibility information according to an embodiment of the present invention may be calculated with a Q-function as in Equation 3.

In Equation 3, Q_π denotes the total reward possibility information expected in the future for state s and action a under a specific policy π, R is the reward for a specific period, and γ (gamma) is the discount factor. S_t is the state at time t, A_t is the action at time t, and E denotes the expected value. The reward possibility information (Q value) according to an embodiment of the present invention determines the direction and magnitude of the policy network update.
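Equation 3 itself does not appear in this text. The symbols defined above match the standard action-value function of reinforcement learning, so a reconstruction under that assumption (not the verbatim equation of the patent) would be:

```latex
Q_\pi(s, a) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
  \,\middle|\, S_t = s,\ A_t = a \right]
```

With a discount factor below 1, rewards obtained further in the future contribute less to the Q value.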

Here, the cost function of the value network may be a mean squared error (MSE) function of the value function, and may be configured, for example, as in Equation 4. The value network may be updated by backpropagation based on the temporal-difference (TD) error, which is the cost function of the value network.

In Equation 4, V is the value function, w is the value network parameter, s_i is the state information of the current episode i, s_{i+1} is the state information of the next episode i+1, r_{i+1} is the reward expected to be obtained in the next episode, V_w(s_i) is the reward possibility in the current episode, V_w(s_{i+1}) is the reward possibility in the next episode, and γ is the discount factor. Here, r_{i+1} may be configured to receive the similarity between the language type information and m-th segment text information of the current step and the ground truth.
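Equation 4 itself does not appear in this text. Read together with the symbols above, a mean squared error on the temporal-difference error is the usual critic loss in actor-critic methods, and the policy network description pairs it with a log-probability term. The sketch below computes both quantities for a single step under that assumption; the function names and the numeric values are hypothetical.

```python
import math


def td_error(r_next: float, v_s: float, v_s_next: float, gamma: float = 0.99) -> float:
    """Temporal-difference error: r_{i+1} + gamma * V_w(s_{i+1}) - V_w(s_i)."""
    return r_next + gamma * v_s_next - v_s


def value_loss(delta: float) -> float:
    """Squared TD error for one step; averaging over steps gives an MSE."""
    return delta ** 2


def policy_loss(prob_action: float, delta: float) -> float:
    """Cross-entropy term -log pi(a|s) weighted by the TD error."""
    return -math.log(prob_action) * delta


# Hypothetical single step: reward 1.0, both value estimates 0.5, gamma 0.9.
delta = td_error(r_next=1.0, v_s=0.5, v_s_next=0.5, gamma=0.9)
print(delta, value_loss(delta), policy_loss(0.5, delta))
```

Because the TD error is available after every step, both networks can be updated per step rather than only once an episode has finished.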

Accordingly, when the state of the reinforcement learning module changes, the value network may be updated in the direction of gradient descent on the cost function of Equation 4.

According to an embodiment of the present invention, the value network is trained separately from the policy network, and the Q value of the value network is supervised rather than started from random values, so fast learning becomes possible. Accordingly, the exploration burden can be greatly reduced for the action of selecting a combination of language type information and m-th segment text information, which has very high complexity.

According to the reinforcement learning module of an embodiment of the present invention, once the policy network that has completed supervised learning selects the language type information and m-th segment text information for the current episode i, the value network is trained to predict the reward (the similarity between the language type information and m-th segment text information of the current step and the ground truth) that would result from proceeding with the selected language type information and m-th segment text information. The trained policy network and value network of the reinforcement learning module are combined with simulation using the RNN block and are ultimately used to select the language type information and m-th segment text information.

In addition, the value network according to an embodiment of the present invention allows the policy network, which outputs the probability of the selected language type information and m-th segment text information, to be updated every episode. In conventional reinforcement learning, the model is updated only after all episodes have ended, which made it difficult to apply to an RNN module that generates language type information and m-th segment text information sequentially.

The RNN block searches for the optimal language type information and m-th segment text information by running multiple simulations over various states and actions, based on the plurality of agents computed by the policy network and the value network. The RNN block according to an embodiment of the present invention may, for example, use Monte Carlo tree search. Each node of the tree represents a state, and each edge represents the value expected from a specific action taken in that state. The current state is the root node, and a leaf node is added each time a new action causes a transition to a new state. When Monte Carlo tree search is used, the search for the optimal language type information and m-th segment text information in the RNN block may proceed in four steps: Selection, Expansion, Evaluation, and Backup.

The Selection step of the RNN block proceeds from the current state, choosing the highest-valued action among the selectable actions, until a leaf node is reached. It uses the value-function value stored on each edge together with a visit-frequency term that balances exploration and exploitation. The equation for selecting an action in the Selection step is as follows.
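Equation 5 itself is not reproduced in this text. A form consistent with the symbol definitions given for it, assuming the conventional Monte Carlo tree search selection rule, would be:

```latex
a_t = \arg\max_{a} \bigl( Q(s_t, a) + u(s_t, a) \bigr) \tag{5}
```

This is a reconstruction under that assumption, not the patent's verbatim equation; the exact form of the exploration term u(s_t,a) is not stated beyond its inverse dependence on the visit count.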

In Equation 5 above, a_t is the action at time t (the selection of language type information and m-th segment text information), Q(s_t,a) is the value of the value function stored in the tree, and u(s_t,a) is a value inversely proportional to the visit count of the state-action pair, used to balance exploration and exploitation.

The Expansion step of the RNN block is performed when the simulation reaches a leaf node: an action is taken according to the probabilities of the policy network trained by supervised learning, and a new node is added as a leaf node.

The Evaluation step of the RNN block evaluates the value of the newly added leaf node from two quantities: the value (likelihood of reward) estimated for the leaf node by the value network, and the reward obtained by using the policy network to play out, from the leaf node, the episode of selecting language type information and m-th segment text information to its end. The equation below is an example of evaluating the value of a new leaf node.
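Equation 6 is likewise not reproduced in this text. Assuming the usual convex mixture of the value-network estimate and the rollout reward, which matches the symbols defined for it, it would read:

```latex
V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L \tag{6}
```

Which of the two terms λ weights is an assumption here; the text identifies λ only as a mixing parameter.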

In Equation 6 above, V(s_L) is the value of the leaf node, λ is the mixing parameter, v_θ(s_L) is the value obtained through the value network, and z_L is the reward obtained by continuing the simulation.

The Backup step of the RNN block re-evaluates the values of the nodes visited during the simulation to reflect the value of the newly added leaf node, and updates their visit frequencies. The equation below is an example of node value re-evaluation and visit frequency update.
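Equation 7 is not reproduced in this text either. A form consistent with the definitions of s_L^i and 1(s,a,i), assuming the standard visit-count and mean-value update over n simulations (n is a symbol introduced here for illustration), would be:

```latex
N(s, a) = \sum_{i=1}^{n} 1(s, a, i), \qquad
Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V\bigl(s_L^{i}\bigr) \tag{7}
```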

In Equation 7 above, s_L^i is the leaf node in the i-th simulation, and 1(s,a,i) indicates whether edge (s,a) was visited in the i-th simulation. When the tree search is complete, the algorithm may be configured to select the most visited edge (s,a) from the root node. With the RNN block according to an embodiment of the present invention, the plurality of candidates for language type information and m-th segment text information selected by the policy network are first simulated multiple times based on the value network, so that the optimal language type information and m-th segment text information can be selected.
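The four steps above can be sketched as a small Monte Carlo tree search loop. This is a minimal illustration under stated assumptions, not the patent's implementation: the toy target sequence, the prefix-match `similarity` reward, the uniform `policy`, and the `value` stand-in for the value network are all hypothetical, the exploration term is taken as c·P/(1+N), and expansion adds every child at once rather than sampling a single action.

```python
import random


class Node:
    """A search-tree node; statistics for outgoing edges are stored here."""

    def __init__(self, state):
        self.state = state   # tuple of actions chosen so far
        self.children = {}   # action -> Node
        self.N = {}          # action -> visit count N(s, a)
        self.Q = {}          # action -> mean value Q(s, a)
        self.P = {}          # action -> prior from the policy network


def mcts(root_state, actions, policy, value, rollout, is_terminal,
         n_sim=100, lam=0.5, c=1.0):
    root = Node(root_state)
    for _ in range(n_sim):
        node, path = root, []
        # Selection: follow argmax of Q + u until a leaf node is reached.
        while node.children:
            a = max(node.children,
                    key=lambda x: node.Q[x] + c * node.P[x] / (1 + node.N[x]))
            path.append((node, a))
            node = node.children[a]
        # Expansion: attach children with policy priors (all at once here).
        if not is_terminal(node.state):
            priors = policy(node.state)
            for a in actions:
                node.children[a] = Node(node.state + (a,))
                node.N[a], node.Q[a], node.P[a] = 0, 0.0, priors[a]
        # Evaluation: mix the value estimate with a rollout reward.
        v = (1 - lam) * value(node.state) + lam * rollout(node.state)
        # Backup: update visit counts and running mean values on the path.
        for parent, a in path:
            parent.N[a] += 1
            parent.Q[a] += (v - parent.Q[a]) / parent.N[a]
    # Final choice: the most visited edge from the root.
    return max(root.N, key=root.N.get), root


# Hypothetical toy problem: reproduce TARGET one symbol at a time.
TARGET = ("b", "a", "b")
ACTIONS = ("a", "b")


def similarity(state):
    """Fraction of TARGET matched by the leading symbols of state."""
    n = 0
    for got, want in zip(state, TARGET):
        if got != want:
            break
        n += 1
    return n / len(TARGET)


def is_terminal(state):
    return len(state) == len(TARGET)


def policy(state):
    return {a: 1 / len(ACTIONS) for a in ACTIONS}  # uniform prior


def value(state):
    return similarity(state)  # stand-in for the value network


def rollout(state):
    while not is_terminal(state):
        state = state + (random.choice(ACTIONS),)
    return similarity(state)


random.seed(0)
best_action, root = mcts((), ACTIONS, policy, value, rollout, is_terminal)
```

In this toy setting every simulation that starts with the wrong symbol scores zero, so the visit counts concentrate on the edge that matches the first symbol of the target.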

According to an embodiment of the present invention, the reinforcement learning module may be configured with a plurality of agents. In that case, for each specific state and each specific item of language type information and m-th segment text information, the selections made by the reinforcement learning module compete with one another, so that the optimal language type information and m-th segment text information can be selected.

FIG. 13 is a flowchart illustrating an example of the operation of the reinforcement learning module according to an embodiment of the present invention.

As shown in FIG. 13, when the state s(t) is input by the text sequence generation neural network module, the plurality of agents of the policy network, guided by the value network, supply various candidates for language type information and m-th segment text information to the RNN block. The language type information and m-th segment text information are then selected according to a(t), the action output by the RNN block, namely the probability of the selected language type information and m-th segment text information. This ends episode t and starts episode t+1. In episode t+1, s(t+1), the state change caused by a(t), is input by the text sequence generation neural network module, and r(t+1), the reward for a(t), is input immediately, updating the value network and the policy network.

The text information generation module may refer to a module that takes the text coordinate information and the text sequence information as input data and outputs text information in the data format [coordinates, text]. For example, the text information for a specific text box may have the structure [(x,y), 'text'].

FIG. 14 is a schematic diagram illustrating a text processing module according to an embodiment of the present invention. As shown in FIG. 14, the text processing module of the artificial intelligence-based electronic document management device (1) receives full text information, in which the text information contained in the electronic document image is merged sequentially in order of y coordinate, and outputs item information in which the text information contained in the electronic document is matched item by item. The text processing module according to an embodiment of the present invention may include an electronic document classification neural network module, an item classification neural network module, and an item information generation module. For example, the full text information may take the form [(20,2) Real Estate Sales Contract, (2,4) 1. Description of the Property, (2,8) Location...].
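The merging of per-box text information into full text information can be sketched as a sort on the y coordinate. This is a minimal sketch: the function name `build_full_text` is hypothetical, and breaking ties between boxes on the same row by x coordinate is an assumption the text does not state.

```python
def build_full_text(text_infos):
    """Order [(x, y), text] entries by y coordinate (ties broken by x, an assumption)."""
    return sorted(text_infos, key=lambda item: (item[0][1], item[0][0]))


# Text boxes in arbitrary detection order, using the example from the text.
boxes = [
    ((2, 8), "Location"),
    ((20, 2), "Real Estate Sales Contract"),
    ((2, 4), "1. Description of the Property"),
]
full_text = build_full_text(boxes)
```

The result lists the title first, then the section heading, then the field label, matching the top-to-bottom reading order of the page.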

Regarding the electronic document classification neural network module, FIG. 15 is a schematic diagram illustrating an electronic document classification neural network module according to an embodiment of the present invention.

As shown in FIG. 15, the electronic document classification neural network module may be configured as an artificial neural network module that takes as input data the full text information, in which the text information contained in the electronic document image is merged sequentially in order of y coordinate, and outputs electronic document type information comprising an electronic document type class and a confidence for each class (for example, [Real Estate Sales Contract, 0.85]). In a training session of the electronic document classification neural network module, the client learning module may update the parameters of the module based on the difference between the output electronic document type information and the ground truth.

Regarding the item classification neural network module, FIG. 16 is a schematic diagram illustrating an item classification neural network module according to an embodiment of the present invention.

As shown in FIG. 16, the item classification neural network module may be configured as an artificial neural network module that takes as input data the classification target text, which is the text among the text information that is subject to item classification, together with the full text information, and outputs item type information comprising the classified item class and its confidence (for example, [property location, 0.85]). In a training session of the item classification neural network module, the client learning module may update the parameters of the module based on the difference between the output item type information and the ground truth. Here, the item classification neural network module infers every possible item class regardless of the electronic document type class, and the output item type information may be multiplied by a per-document item vector. The per-document item vector is a preset vector in which a null value ([0]) is assigned to the items that are not included in the document type given by the electronic document type class of the electronic document type information. For example, because the module infers every possible item class regardless of document type, it may infer a confidence score for the [jeonse deposit] class for a given classification target text even when the electronic document type class is [Real Estate Sales Contract]. In that case, the per-document item vector preset for the [Real Estate Sales Contract] class holds [0] for [jeonse deposit], so when the vector is applied to the inferred item type information, the final confidence for [jeonse deposit] becomes [0]. As a result, a separate item classification neural network module need not be built for each electronic document type class, and a single item classification neural network module can classify items for a plurality of electronic document type classes.
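The masking described above amounts to an element-wise product between the raw item confidences and a per-document 0/1 vector. The sketch below assumes a three-class toy vocabulary; the class list, the "sale price" class, the lease-contract mask, and the raw confidence values are hypothetical and chosen only to illustrate the [jeonse deposit] example.

```python
ITEM_CLASSES = ["property location", "sale price", "jeonse deposit"]

# Preset per-document item vectors: 0 marks items the document type cannot contain.
DOC_ITEM_VECTORS = {
    "real estate sales contract": [1, 1, 0],
    "real estate lease contract": [1, 0, 1],
}


def mask_item_confidences(confidences, doc_class):
    """Multiply raw confidences element-wise by the document's item vector."""
    mask = DOC_ITEM_VECTORS[doc_class]
    return [c * m for c, m in zip(confidences, mask)]


def classify_item(confidences, doc_class):
    """Return [item class, confidence] after masking."""
    masked = mask_item_confidences(confidences, doc_class)
    best = max(range(len(masked)), key=masked.__getitem__)
    return [ITEM_CLASSES[best], masked[best]]


# The network alone would prefer "jeonse deposit" for this text.
raw = [0.30, 0.10, 0.60]
masked = mask_item_confidences(raw, "real estate sales contract")
item_type = classify_item(raw, "real estate sales contract")
```

With the sales-contract mask applied, the [jeonse deposit] confidence becomes zero and the next-best permitted class is reported instead, which is why one shared classifier can serve several document types.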

The item information generation module receives the classification target text and the item class (item type information) from the item classification neural network module, and generates item information for each item class by merging any other classification target texts classified into the same item class. For example, when the classification target text is [Gangnam-gu, Seoul ...] and the item class of the item type information is [property location], the item information may have a data structure of the form [property location, Gangnam-gu, Seoul ...]. Also, for example, when two or more classification target texts are classified into the [property location] item class, such as [Gangnam-gu, Seoul ...] (the classification target text) and [Samseong-dong Hillstate Complex 2, Building 2**, Unit 2***] (another classification target text), the module may be configured to merge them to generate the item information.
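Merging texts that share an item class can be sketched as a grouping step. This is a minimal sketch: `build_item_info` is a hypothetical name, and joining the merged texts with a single space in reading order is an assumption, since the text does not say how they are concatenated.

```python
def build_item_info(classified):
    """Group (text, item_class) pairs by class and join texts in input order."""
    merged = {}
    for text, item_class in classified:
        merged.setdefault(item_class, []).append(text)
    return [[cls, " ".join(texts)] for cls, texts in merged.items()]


classified = [
    ("Gangnam-gu, Seoul ...", "property location"),
    ("Samseong-dong Hillstate Complex 2, Building 2**, Unit 2***", "property location"),
]
item_info = build_item_info(classified)
```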

Regarding the neural network processing module, FIG. 17 is a schematic diagram illustrating a neural network processing module according to an embodiment of the present invention.

As shown in FIG. 17, the neural network processing module processes the training sessions of the artificial neural network modules included in the text extraction module and the text processing module to update their parameters, uploads the updated parameters to the federated learning module of the electronic document management server, and downloads from the main neural network module of the electronic document management server the main neural network that has been federatedly trained by a plurality of clients. The neural network processing module according to an embodiment of the present invention may include a correction information generation module, a client learning module, a parameter upload module, and a main neural network download module.
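The text does not specify how the server combines the parameters uploaded by the clients. One common choice, assumed here purely for illustration, is a weighted average of client parameters (FedAvg-style), with weights such as the number of local training samples. The function name and the flat-list parameter format are hypothetical.

```python
def federated_average(client_params, client_weights):
    """Weighted element-wise average of flat parameter lists from several clients."""
    total = sum(client_weights)
    size = len(client_params[0])
    return [
        sum(w * params[i] for params, w in zip(client_params, client_weights)) / total
        for i in range(size)
    ]


# Two clients upload parameters; the second trained on three times as many samples.
main_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Each client would then download `main_params` as the updated main neural network before its next local training session.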

Regarding the correction information generation module, FIG. 18 is a schematic diagram illustrating a correction information generation module according to an embodiment of the present invention.

As shown in FIG. 18, the correction information generation module outputs the electronic document image, the electronic document type information, the text coordinate information, and the item information on the display of the user client, and receives correction information reflecting the corrections the user makes to that information through user input. For example, when the electronic document type information is inferred as [Real Estate Sales Contract] and output on the display of the user client, the user may check the electronic document image and correct the electronic document type information to [Real Estate Lease Contract], thereby generating correction information. In this case, because correcting the electronic document type information changes the per-document item vector, the item type information and the item information may change as well. Also, for example, when the text coordinate information is inferred as [12,131,5,1,1] and output on the display of the user client, the user may check the electronic document image and correct the text coordinate information to [12,132,6,2,2] to generate correction information. Also, for example, when the item information is inferred as [property location, Gangnam-gu, Seoul...] and output on the display of the user client, the user may check the electronic document image and correct the item information to [buyer's address, Gangnam-gu, Seoul...] to generate correction information.

In addition, the correction information generation module may further include an abnormal text classification module that infers whether the classification target text of the item information is abnormal. When the classification target text is inferred to be abnormal, it may be flagged on the display of the user client so that the user can easily generate correction information.

The abnormal text classification module receives a multidimensional vector that combines the item type information and the classification target text of the item information, and determines whether the classification target text is abnormal. Specifically, the abnormal text classification module according to an embodiment of the present invention may include a generation module and a discrimination module. Using the discrimination module, the generation module is first trained to receive random noise (Z) and generate a normal vector, that is, a multidimensional vector combining normal item type information and classification target text. The loss function output L (Loss) of the generation module is then used as an abnormal text score: whether the data input to the abnormal text classification module is abnormal text is determined by whether L falls below a specific value as the random noise (Z) input to the generation module is varied.

The generation module of the abnormal text classification module according to an embodiment of the present invention may consist of an encoder and a decoder configured to generate a normal vector. The encoder of the generation module may consist of a plurality of consecutive ConvNets that receive a multidimensional vector combining normal item type information and classification target text and encode it into a 1 x 1 x k latent variable. The decoder of the generation module may consist of a plurality of consecutive networks that decode the 1 x 1 x k latent variable into a multidimensional vector containing item type information and classification target text. The generation module may be trained to take a normal multidimensional vector as input and output a multidimensional vector close to a normal vector, and it may be trained by means of the discrimination module, which determines whether the multidimensional vector output by the generation module is a normal vector. The discrimination module of the abnormal text classification module according to an embodiment of the present invention may be configured to determine, through a CONCAT function and a plurality of encoders, whether the multidimensional vector output by the generation module is a normal vector.

Regarding the training of the abnormal text classification module, FIG. 18 is a schematic diagram illustrating the training process of the abnormal text classification module according to an embodiment of the present invention. As shown in FIG. 18, the loss function may be constructed so that the generation module and the discrimination module play a minimax game, and the two modules may be trained simultaneously. Equation 8 below is the loss function of the generation module and the discrimination module.
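Equation 8 is not reproduced in this text. Assuming the standard adversarial objective, which agrees with the stated maximum of 0 (at D(z,y)=1, D(z,G(z))=0) and minimum of -log4 (at D(z,y)=D(z,G(z))=1/2), it would take the form:

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{y}\bigl[\log D(z, y)\bigr]
  + \mathbb{E}_{z}\bigl[\log\bigl(1 - D(z, G(z))\bigr)\bigr] \tag{8}
```

Substituting the two cases confirms the stated values: log 1 + log(1 - 0) = 0, and log(1/2) + log(1/2) = -log 4.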

In Equation 8 above, G is the generation module, D is the discrimination module, z is the random noise input as the latent variable, y is the normal vector (the multidimensional vector combining normal item type information and classification target text), and G(z) is the generated vector, that is, the generated multidimensional vector. According to Equation 8, when the generation module is not yet sufficiently trained and the discrimination module perfectly distinguishes y from G(z) given the random noise z, the loss function takes its maximum value of 0, with D(z,y)=1 and D(z,G(z))=0. When, after the generation module has been trained, the discrimination module can no longer distinguish y from G(z), the loss function takes its minimum value of -log4, with D(z,y)=1/2 and D(z,G(z))=1/2. In other words, the generation module reaches its global minimum when the generated vector G(z), which it produces from the random noise z, is identical to the normal vector y, and the generation module and the discrimination module are trained in this direction. This adversarial interdependence between the generation module and the discrimination module allows the generation module to be optimized quickly.

Regarding abnormal text classification by the abnormal text classification module, FIG. 19 is a schematic diagram illustrating the abnormal text classification process of the abnormal text classification module according to an embodiment of the present invention. As shown in FIG. 19, the generation module receives the random noise z and generates a multidimensional vector close to a normal vector (item type information and classification target text in the form of a generated vector). Whether the input item type information and classification target text are normal or abnormal is then determined from the difference between the generated vector and the multidimensional vector y of the input item type information and classification target text (the vector under test, that is, the vector to be judged as abnormal text or not).

In addition, the generation module according to an embodiment of the present invention may be configured so that its parameters are fixed after training, and so that the latent variable, the random noise z, is adjusted through backpropagation to reduce the discrimination loss function (L), which is the difference between G(z) and y. Equation 9 below gives the discrimination loss function (L) for the difference between G(z) and y, and Equation 10 concerns the adjustment of the random noise, which is the latent variable.
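Equations 9 and 10 are not reproduced in this text. Forms consistent with the definitions of L, G(z), y, z, and η, assuming a norm-based distance (the text does not say which norm) and a plain gradient descent step on z, would be:

```latex
L(z) = \bigl\lVert G(z) - y \bigr\rVert \tag{9}

z \leftarrow z - \eta\, \frac{\partial L}{\partial z} \tag{10}
```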

In Equations 9 and 10 above, L is the discrimination loss function, that is, the difference between the generated vector, which is generated to be close to a normal vector, and the vector under test (the input item type information and classification target text); G(z) is the generated vector; z is the random noise serving as the latent variable; y is the vector under test; and η is the learning rate. Accordingly, when the multidimensional vector y of the item type information and classification target text (the vector under test) is normal text, adjusting z to reduce the discrimination loss function L while the parameters of the generation module G are fixed brings the loss value L below a specific value. Conversely, when the multidimensional vector y (the vector under test) of the user data (the data under test) is abnormal text, the loss value L does not fall below the specific value even if z is adjusted to reduce L with the parameters of G fixed. That is, the loss L is relatively higher when y is abnormal text than when y is normal text. L can therefore be used as an abnormal text score (anomaly score) to classify, distinguish, and detect abnormal text.
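The scoring procedure can be illustrated with a toy frozen generator. Everything specific in this sketch is an assumption made for illustration: the generator is a fixed linear map G(z) = (z, 2z) standing in for the trained generation module, the loss is the squared Euclidean distance, and the threshold, learning rate, and step count are arbitrary.

```python
def generate(z):
    """Hypothetical frozen generator: maps scalar z onto the line (z, 2z)."""
    return (z, 2.0 * z)


def loss(z, y):
    """Squared distance between the generated vector G(z) and the vector under test y."""
    g = generate(z)
    return (g[0] - y[0]) ** 2 + (g[1] - y[1]) ** 2


def grad(z, y):
    """Analytic dL/dz for G(z) = (z, 2z)."""
    g = generate(z)
    return 2.0 * (g[0] - y[0]) + 4.0 * (g[1] - y[1])


def anomaly_score(y, eta=0.05, steps=100, z0=0.0):
    """Adjust only z by gradient descent (generator fixed); return the final loss."""
    z = z0
    for _ in range(steps):
        z -= eta * grad(z, y)
    return loss(z, y)


THRESHOLD = 0.1                                # hypothetical decision threshold
normal_score = anomaly_score((1.0, 2.0))       # lies on the generator's output line
abnormal_score = anomaly_score((2.0, -1.0))    # cannot be reached by any z
```

A vector the generator can reproduce drives the loss toward zero, while a vector outside the generator's range leaves a residual loss no matter how z is adjusted, which is the behavior the text relies on when using L as the anomaly score.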

In addition, the correction information generation module may further include a normal text generation module which, when the abnormal text classification module infers that the classification target text of the item information is abnormal, generates normal text for the abnormal text and presents it to the user client. When the classification target text is inferred to be abnormal, the normal text generated by the normal text generation module may be shown on the display of the user client so that the user can easily generate correction information.

Regarding the configuration of the normal text generation module, FIG. 20 is a schematic diagram illustrating the structure of the normal text generation module according to an embodiment of the present invention. As shown in FIG. 20, the normal text generation module may be configured as a downsampling and upsampling artificial neural network in which an encoder and a decoder are combined, taking as input data a combined vector obtained by concatenating the item type information and the classification target text, and outputting normal text as output data.

Regarding an overall embodiment of the normal text generation module, for example, the encoder according to an embodiment of the present invention may be configured as a ConvNet that includes a plurality of consecutive convolution layers, pooling layers, and fully connected layers, receives the item type information and the classification target text as input data, and encodes them into an encoding vector, a latent variable of size 1 x 1 x k, as output data. The encoder may also be connected to the decoder through a skip connection structure. In the learning session of the encoder, the item type information and classification target text input to the encoder may be fed to each convolution layer of the encoder and decoder in a channel-wise concatenation structure. Feeding the item type information and classification target text to each convolution layer in this way mitigates the vanishing gradient problem in the learning session of the encoder and decoder, strengthens feature propagation, and reduces the number of parameters and thus the computing resources required. In the learning session of the normal text generation module, the parameters of the module may be updated in a direction that reduces a loss defined as the difference between the normal text and the corresponding reference data (ground truth). The loss function of the normal text generation module may be, for example, a mean square loss or a cross entropy loss.

When the client learning module 40 receives, from the correction information generation module, correction information reflecting the user's corrections, it may run learning sessions of the text extraction module and the text processing module, using the correction information as ground truth, so as to update the parameters of the artificial neural networks included in those modules.

Regarding the learning session of the text box generation neural network module, which is a component of the text extraction module, the client learning module 40 may be configured to run this learning session when the correction information received from the correction information generation module includes a correction to the text box information. In this session, an electronic document image is input to the text box generation neural network module as input data; text box information, including the coordinates, height, width, and angle of the text box (the bounding box surrounding text in the electronic document image), is produced as output data; and the parameters of the text box generation neural network module are updated in a direction that reduces the loss (or increases the similarity) between the output text box information and the corrected text box information (ground truth) in the correction information.

Regarding the learning session of the electronic document classification neural network module, which is a component of the text processing module, the client learning module 40 may be configured to run this learning session when the correction information received from the correction information generation module includes a correction to the electronic document type class. In this session, text information is input to the electronic document classification neural network module as input data; the electronic document type class and a weight for each item are produced as output data; and the parameters of the module are updated in a direction that reduces the loss (or increases the similarity) between the output electronic document type class and the corrected electronic document type class (ground truth) in the correction information.

Regarding the learning session of the item classification neural network module, which is a component of the text processing module, the client learning module 40 may be configured to run this learning session when the correction information received from the correction information generation module includes a correction to the item class. In this session, a pair consisting of the classification target text (the text within the text information that is subject to item classification) and the full text information is input to the item classification neural network module as input data; the classified item class and its confidence are produced as output data; and the parameters of the module are updated in a direction that reduces the loss (or increases the similarity) between the item class output for a given classification target text and the corrected item class (ground truth) in the correction information.

The parameter upload module 50 is a module that, after the learning sessions of the text extraction module and the text processing module run by the client learning module 40, uploads the changed parameters of those modules to the federated learning module 220 of the electronic document management server 200. The parameters may include the gradient (g) or the weights (w) of the artificial neural network after the learning sessions. The parameter upload for the text box generation neural network module is performed when the correction information received from the correction information generation module includes a correction to the text box information; the parameter upload for the electronic document classification neural network module is performed when the correction information includes a correction to the electronic document classification class; and the parameter upload for the item classification neural network module is performed when the correction information includes a correction to the item class.

In addition, the parameter upload module 50 may be configured to apply noise (ε) to the changed parameters of the text extraction module and the text processing module before uploading them to the electronic document management server 200. For example, when the parameter is the gradient (g), it may be uploaded as g+ε, and when the parameter is the weight (w), it may be uploaded as w+ε. The noise (ε) may be assigned positive and negative values at random so that its effect is minimized during federated learning in the federated learning module 220. Because noise is applied to the parameters at the client before they are uploaded to the electronic document management server, a third party who obtains the parameters cannot obtain the electronic document information from them.
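As a minimal sketch of the g+ε upload, the following assumes that ε is drawn independently per parameter from a zero-mean Gaussian. The source only states that positive and negative values are assigned at random, so the distribution, the `scale` value, and the function name `add_upload_noise` are illustrative assumptions.

```python
import random


def add_upload_noise(params, scale=0.01, rng=None):
    """Return params + epsilon, with epsilon ~ N(0, scale^2) per entry (assumed)."""
    if rng is None:
        rng = random.Random()
    return [p + rng.gauss(0.0, scale) for p in params]


rng = random.Random(0)  # fixed seed so the example is repeatable
gradient = [0.20, -0.10, 0.05]

# Simulate 1000 uploads of the same gradient, each with fresh noise.
uploads = [add_upload_noise(gradient, rng=rng) for _ in range(1000)]

# Averaging across uploads: zero-mean noise largely cancels out.
averaged = [sum(col) / len(col) for col in zip(*uploads)]
```

Because the assumed noise has zero mean, its contribution shrinks when many uploads are averaged on the server, which matches the stated aim of keeping its influence on federated learning small.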

The main neural network download module 60 is a module that downloads the text box generation main neural network, the electronic document classification main neural network, and the item classification main neural network pre-trained by the federated learning module 220 of the electronic document management server 200, and replaces (substitutes, transfers) at least part of the networks of the text box generation neural network module, the electronic document classification neural network module, and the item classification neural network module.

For the network download of the text box generation neural network module, the pre-trained text box generation main neural network is downloaded from the main neural network module 210 of the electronic document management server 200, and the text box generation neural network module, except for its rear layers (the last n layers trained by the professional style discriminator 130), is replaced (substituted, transferred) with the downloaded text box generation main neural network. Because the neural network of each client's text box generation neural network module is not completely replaced with the main neural network, the module can be updated continuously while the style of the professional text boxes output at each client is preserved, which makes it possible to personalize the generated text boxes.
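The partial replacement described above can be sketched as follows, assuming each network is represented as an ordered list of per-layer parameter objects with matching layer counts. The function name `replace_front_layers` and the string placeholders for layers are hypothetical.

```python
def replace_front_layers(client_layers, main_layers, n_keep_last):
    """Take every layer from the main network except the client's last n layers."""
    if len(client_layers) != len(main_layers):
        raise ValueError("networks must have the same number of layers")
    if not 0 <= n_keep_last <= len(client_layers):
        raise ValueError("n_keep_last is out of range")
    cut = len(client_layers) - n_keep_last
    # Front layers come from the downloaded main network,
    # rear layers stay client-specific.
    return list(main_layers[:cut]) + list(client_layers[cut:])


client = ["c1", "c2", "c3", "c4"]  # placeholder per-layer parameters on the client
main = ["m1", "m2", "m3", "m4"]    # placeholder per-layer parameters from the server
merged = replace_front_layers(client, main, n_keep_last=1)
```

Keeping only the rear layers local is a common transfer-learning pattern: shared layers benefit from all clients, while the final layers retain client-specific behavior.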

For the network download of the electronic document classification main neural network, the pre-trained electronic document classification main neural network is downloaded from the main neural network module 210 of the electronic document management server 200, and the electronic document classification neural network module is replaced (substituted, transferred) with the downloaded electronic document classification main neural network.

For the network download of the item classification main neural network, the pre-trained item classification main neural network is downloaded from the main neural network module 210 of the electronic document management server 200, and the item classification neural network module is replaced (substituted, transferred) with the downloaded item classification main neural network.

The electronic document management server 200 may include a main neural network module 210 and a federated learning module 220. It is a server configured to collect the parameters of the text box generation neural network module, the electronic document classification neural network module, and the item classification neural network module uploaded from a plurality of user clients 100, update the main neural network module 210, and then redistribute the pre-trained main neural networks to the user clients 100.

The main neural network module 210 may include the text box generation main neural network, the electronic document classification main neural network, and the item classification main neural network, and its parameters may be updated by the federated learning module 220 with a specific gradient (g) or specific weights (w). The text box generation main neural network is the main neural network corresponding to the text box generation neural network module, the electronic document classification main neural network is the main neural network corresponding to the electronic document classification neural network module, and the item classification main neural network is the main neural network corresponding to the item classification neural network module.

The federated learning module 220 is a module configured to collect the parameters of the text box generation main neural network, the electronic document classification main neural network, and the item classification main neural network uploaded from the plurality of user clients 100, and then update the main neural network module 210 using the collected parameters. In this update, correction change information, which is the difference between the correction information received from the correction information generation module and the pre-correction information (text box information, electronic document type class, item class), may be used as the weight of each parameter: the sum of the products of each parameter and its correction change information is divided by the sum of the correction change information, so that the parameters are averaged on the basis of the correction change information.

When the parameter is the gradient (g), the federated learning module 220 may be configured to update the parameters of the main neural network module 210 according to the following equation.
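
Equation 11 is not reproduced in this text. A form consistent with the weighted averaging described for the federated learning module 220 and with the symbols w_new, w_old, α, s_n, and g_n, offered here as an assumption that uses the usual gradient-descent sign convention, would be:

```latex
w_{new} = w_{old} - \alpha \, \frac{\sum_{n} s_n \, g_n}{\sum_{n} s_n}
```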

In Equation 11 above, w_new is the weight of the main neural network module 210 after the update, w_old is the weight of the main neural network module 210 before the update, α is the learning rate, s_n is the correction change information uploaded from user client 100 n, and g_n is the gradient uploaded from user client 100 n.

When the parameter is the weight (w), the federated learning module 220 may be configured to update the parameters of the main neural network module 210 according to the following equation.
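
Equation 12 is likewise not reproduced in this text. Following the description that the sum of the products of each parameter and its correction change information is divided by the sum of the correction change information, a form consistent with the symbols w_new, s_n, and w_n, offered here as an assumption, would be:

```latex
w_{new} = \frac{\sum_{n} s_n \, w_n}{\sum_{n} s_n}
```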

In Equation 12 above, w_new is the weight of the main neural network module 210 after the update, s_n is the correction change information uploaded from user client 100 n, and w_n is the weight uploaded from user client 100 n.

Accordingly, using the correction change information as the parameter weight during federated learning in the main neural network module 210 produces a mini-batch effect. It also reduces problems arising from network topology and asynchronous communication. In addition, cases in which the difference between the correction information and the pre-correction information is large have a greater influence on the update of the main neural network module.
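The weighted aggregation can be sketched numerically as follows. The sketch assumes that each client's correction change information s_n is a non-negative scalar, that the gradient case takes the form w_new = w_old - α Σ s_n g_n / Σ s_n, and that the weight case takes the form w_new = Σ s_n w_n / Σ s_n. The function names are hypothetical.

```python
def weighted_mean(vectors, weights):
    """Average client vectors, weighting client n by its s_n."""
    total = sum(weights)
    if total == 0:
        raise ValueError("sum of correction change information must be non-zero")
    dim = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(weights, vectors)) / total
            for i in range(dim)]


def update_from_gradients(w_old, client_grads, s, alpha):
    """Gradient case (assumed form): step against the s_n-weighted mean gradient."""
    g = weighted_mean(client_grads, s)
    return [w - alpha * gi for w, gi in zip(w_old, g)]


def update_from_weights(client_weights, s):
    """Weight case (assumed form): s_n-weighted mean of the uploaded weights."""
    return weighted_mean(client_weights, s)


s = [1.0, 3.0]  # client 2 made the larger correction, so it counts more
new_from_grads = update_from_gradients(
    w_old=[1.0, 1.0], client_grads=[[1.0, 0.0], [1.0, 4.0]], s=s, alpha=0.1)
new_from_weights = update_from_weights([[0.0, 0.0], [4.0, 8.0]], s)
```

With these inputs the second client, whose correction was three times larger, pulls the aggregate three times as strongly as the first, which illustrates the stated effect that larger corrections have more influence on the main neural network.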

FIG. 21 schematically illustrates an artificial intelligence-based electronic document management device having an automatic notification function according to an embodiment of the present invention.

Referring to FIG. 21, the artificial intelligence-based electronic document management device 1 having an automatic notification function may receive an electronic document image and output a plurality of pieces of item information in which the text included in the electronic document image is matched item by item. To this end, one or more artificial neural network modules may be used, including: a text extraction module including an artificial neural network that receives the electronic document image, extracts the text included in it, and outputs text information; a text processing module including an artificial neural network that receives full text information, in which the text information is merged sequentially in order of y coordinate, and outputs item information in which the text included in the electronic document image is matched item by item; a neural network processing module that runs learning sessions of the artificial neural networks included in the text extraction module and the text processing module to update their parameters; and an automatic notification module that determines, from among the plurality of pieces of item information, the automatic notification target item information for which automatic notification is to be performed.

Among these one or more artificial neural network modules, the text extraction module, the text processing module, and the neural network processing module have been described in detail above, so redundant description is omitted.

The automatic notification module according to an embodiment of the present invention may select and determine, from among the plurality of pieces of item information, the automatic notification target item information that requires automatic notification. For example, when the document in the electronic document image is a real estate sales contract, the plurality of pieces of item information generated by the artificial intelligence-based electronic document management device 1 may include the property location, contract amount, contracting parties, contract date, balance payment date, and lease renewal date (for jeonse or monthly rent). Of these, the balance payment date and the lease renewal date are item information for which automatic notification could help the user, so the automatic notification module may select and determine such item information as automatic notification target item information through a reinforcement learning-based artificial neural network module. The automatic notification module 2 may then perform automatic notification to a user device associated with the electronic document or, more preferably, with the automatic notification target item information. The automatic notification may be delivered through the user client's own notification function or in various other ways such as SNS, messenger messages, or email, without limitation.

FIG. 22 schematically illustrates an automatic notification module according to an embodiment of the present invention.

Referring to FIG. 22, the automatic notification module 2 may include a selection module for selecting date-related item information from among the plurality of pieces of item information generated by the artificial intelligence-based electronic document management device 1, and an automatic notification item determination module for determining, from among the selected date-related item information, the item information for which automatic notification is to be performed.

The selection module may determine whether each of the plurality of pieces of item information is related to a date. For example, a date may be written in various notations such as 2022년 10월 9일 (October 9, 2022), 2022.10.9, or 22/10/9, so the selection module may determine whether each piece of item information contains numbers and, if so, whether those numbers represent a date. The selection module may then pass the selected date-related item information to the automatic notification item determination module.
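One possible way to implement this check is with regular expressions, sketched below for the three notations mentioned above. The patterns, the month and day range checks, and the function names `contains_date` and `select_date_items` are illustrative assumptions, not the selection module's actual logic.

```python
import re

# Year-month-day written with Korean unit characters, e.g. "2022년 10월 9일".
_KOREAN = re.compile(
    r"(?<!\d)(\d{4}|\d{2})\s*년\s*(\d{1,2})\s*월\s*(\d{1,2})\s*일")
# Year-month-day separated by '.', '/' or '-', e.g. "2022.10.9" or "22/10/9".
_NUMERIC = re.compile(
    r"(?<!\d)(\d{4}|\d{2})\s*[./-]\s*(\d{1,2})\s*[./-]\s*(\d{1,2})(?!\d)")


def contains_date(text):
    """Return True if the text contains numbers that plausibly form a date."""
    for pattern in (_KOREAN, _NUMERIC):
        for match in pattern.finditer(text):
            month, day = int(match.group(2)), int(match.group(3))
            if 1 <= month <= 12 and 1 <= day <= 31:
                return True
    return False


def select_date_items(items):
    """Keep only the item information whose value contains a date."""
    return {name: value for name, value in items.items() if contains_date(value)}


items = {
    "contract date": "2022년 10월 9일",
    "balance payment date": "2022.10.9",
    "lease renewal date": "22/10/9",
    "contract amount": "500,000,000",
    "property location": "Seoul",
}
selected = select_date_items(items)
```

The range check on month and day is a simple guard against number strings that merely look like dates. A production system would likely also validate the calendar date itself.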

The automatic notification item determination module may determine, from among one or more pieces of date-related item information, the item information for which automatic notification is to be performed. In doing so, it may determine the automatic notification target item information on a rule basis. For example, item information registered in advance by an administrator, such as "balance payment date", "interim payment date", and "lease expiration date", may be determined as automatic notification target item information.
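The rule-based variant amounts to a lookup against an administrator-maintained list. A minimal sketch follows, assuming item information is held as a mapping from item name to value. The constant `AUTO_NOTIFY_ITEM_NAMES` and the function name are hypothetical.

```python
# Item names registered in advance by an administrator (example values from the text).
AUTO_NOTIFY_ITEM_NAMES = frozenset({
    "balance payment date",
    "interim payment date",
    "lease expiration date",
})


def determine_notification_targets(date_items, rule_names=AUTO_NOTIFY_ITEM_NAMES):
    """Return the date-related items whose names match a registered rule."""
    return {name: value for name, value in date_items.items()
            if name in rule_names}


date_items = {
    "contract date": "2022.10.9",
    "balance payment date": "2022.12.1",
    "lease expiration date": "2024.12.1",
}
targets = determine_notification_targets(date_items)
```

Here the contract date is date-related but is not on the registered list, so only the other two items are selected for notification.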

According to another embodiment, the automatic notification item determination module may use a reinforcement learning-based artificial neural network module to determine the automatic notification target item information from among one or more pieces of date-related item information.

FIG. 23 is a diagram schematically illustrating a transformer-based artificial neural network module included in the automatic notification item determination module according to an embodiment of the present invention.

Referring to FIG. 23, the automatic notification item determination module may include a transformer-based artificial neural network module 10 and, through reinforcement learning using this module, may generate an optimal output including the automatic notification target item information (a_t).

The transformer-based artificial neural network module 10 may receive as input the future expected reward (R_t), the state information (S_t) received by the automatic notification item determination module, and the past output (a_t-1), which is the automatic notification target item information previously output by the artificial neural network module 10.

Here, the future expected reward (R_t) is the return-to-go: the sum of the rewards expected in the future as a result of the automatic notification item determination module taking a specific action, that is, generating the automatic notification target item information (a_t). For example, the automatic notification item determination module may be designed so that different rewards are given depending on whether an item actually needed automatic notification, as judged from, for example, the user's feedback when the automatic notification was performed.
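The return-to-go at each step is the sum of the rewards from that step onward, which can be computed with a single backward pass over a recorded reward sequence. In the sketch below the reward values are made up for illustration; the source does not specify a reward scale, and the function name is hypothetical.

```python
def returns_to_go(rewards):
    """Return R_t for every t, where R_t = r_t + r_(t+1) + ... + r_T."""
    running = 0.0
    result = []
    for reward in reversed(rewards):  # accumulate from the last step backward
        running += reward
        result.append(running)
    result.reverse()
    return result


# Hypothetical feedback-based rewards for three consecutive notifications.
rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)
```

For this sequence the returns-to-go are 3.0, 2.0, and 2.0: each entry counts its own reward plus everything that follows.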

The state information (S_t) is the information received by the automatic notification item determination module and may include date-related item information and, optionally, relevant legal information. As shown in FIG. 22, the legal information may be input separately to the automatic notification item determination module in its latest version (that is, reflecting amendments).

The past output (a_t-1) is, when input to and output from the transformer-based artificial neural network module 10 occur in real time or as a time series, the automatic notification target item information previously output by the module.

When the future expected reward (R_t), the state information (S_t), and the past output (a_t-1) are input to the transformer-based artificial neural network module 10 in this way, the module may generate and output the automatic notification target item information (a_t), which is the optimal output for the automatic notification item determination module to select at the current time.

The following describes the process by which the transformer-based artificial neural network module 10 processes input data and outputs a result.

Referring to FIG. 23, according to an embodiment of the present invention, the transformer-based artificial neural network module 10 for generating the automatic notification target item information (a_t) of the automatic notification item determination module may include a first block 10a (for example, an encoder) and a second block 10b (for example, a decoder).

An expected future reward (R_t) and state information (S_t) may be input to the first block 10a of the artificial neural network module 10. The input expected future reward (R_t) and state information (S_t) are fed to the embedding module of the first block 10a and vectorized to produce embedding information. The bundle of embedding information is then linearly embedded through different Linear Layers to form Q (Query feature), K (Key feature), and V (Value feature), each having a feature dimension, and Q, K, and V serve as the input data of the Multi-head Attention Layer. Q and K are input to the first MatMul operation, and the Scale and softmax operations then compute the similarity of each Q to every K, yielding a similarity vector over the sequence of embedding information; this similarity vector and V are input to the second MatMul operation. The Multi-head Attention Layer may output attention information based on the received Q, K, and V values and pass it to the Feed Forward Layer. The Multi-head Attention Layer and the Feed Forward Layer may be repeated N times, with the encoding information output by one Feed Forward Layer becoming the input of the next Multi-head Attention Layer. The last Feed Forward Layer may output final encoding information, which may be input as the K (Key) and V (Value) values to the second Multi-head Attention Layer of the second block 10b of the transformer-based artificial neural network module 10.
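The two MatMul operations with Scale and softmax between them can be sketched as follows. This is a minimal single-head illustration in plain Python, not the patented implementation: the function names, the toy Q, K, and V values, and the use of the square root of the key dimension as the scale factor (the conventional choice for transformers) are assumptions, and the learned Linear Layers and the splitting into multiple heads are omitted.

```python
import math


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    # Plain matrix product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                                # first MatMul: Q x K^T
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]  # Scale
    weights = [softmax(row) for row in scaled]                      # similarity of each Q to every K
    return matmul(weights, v), weights                              # second MatMul: weights x V


# Toy inputs (hypothetical values): two embedded tokens with feature dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(q, k, v)
```

Each row of `weights` sums to 1, so each output row is a weighted average of the rows of V, with the largest weight on the key most similar to that query.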

In addition, the past output (a_t-1) may be input to the second block 10b of the transformer-based artificial neural network module 10. The past output (a_t-1) is fed to the embedding module of the second block 10b and vectorized to produce embedding information. The bundle of embedding information is then linearly embedded through different Linear Layers to form Q (Query feature), K (Key feature), and V (Value feature), each having a feature dimension, and Q, K, and V serve as the input data of the first Multi-head Attention Layer. The first Multi-head Attention Layer may output a Q (Query) value, which is input to the second Multi-head Attention Layer. The second Multi-head Attention Layer may receive the final encoding information from the first block 10a as its K and V values. The second Multi-head Attention Layer may then output attention information based on the received Q, K, and V values and pass it to the Feed Forward Layer. The first Multi-head Attention Layer, the second Multi-head Attention Layer, and the Feed Forward Layer may be repeated N times, with the decoding information output by one Feed Forward Layer becoming the input of the next first Multi-head Attention Layer. The last Feed Forward Layer may output final decoding information, which passes through a Linear Layer and a Softmax Layer so that the transformer-based artificial neural network module 10 generates the automatic notification target item information (a_t), the optimal output for the automatic notification item determination module to select at the current time.
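The final Linear Layer and Softmax Layer step can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the candidate item names, the weight and bias values, and the greedy choice of the highest-probability item are hypothetical, since the description states only that the final decoding information passes through a Linear Layer and a Softmax Layer to produce a_t.

```python
import math


def linear(x, weight, bias):
    # One logit per candidate item: dot(x, weight[i]) + bias[i].
    return [sum(xi * wi for xi, wi in zip(x, w_row)) + b
            for w_row, b in zip(weight, bias)]


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_notification_item(decoding, weight, bias, candidates):
    probs = softmax(linear(decoding, weight, bias))
    # Greedy selection (an assumption): take the most probable candidate.
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best], probs


# Hypothetical date-related items produced by the selection module.
candidates = ["contract_start_date", "contract_end_date", "payment_due_date"]
decoding = [0.2, 1.0]                            # toy final decoding information
weight = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]    # toy Linear Layer weights
bias = [0.0, 0.0, 0.0]

item, probs = select_notification_item(decoding, weight, bias, candidates)
```

With these toy values the logits are 0.2, 2.0, and 0.6, so the second candidate receives the highest probability and is returned as the item to notify on.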

However, the specific implementation of the transformer-based artificial neural network module 10 described above may vary in many ways and is not limited to this description.

As described above, a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

The features and advantages described in this specification are not all-inclusive; in particular, many additional features and advantages will be apparent to those skilled in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and may not have been selected to delineate or circumscribe the subject matter of the invention.

The foregoing description of embodiments of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Those skilled in the art will appreciate that many modifications and variations are possible in light of the above disclosure.

Therefore, the scope of the present invention is limited not by this detailed description but by any claims that issue on an application based on it. Accordingly, the disclosure of the embodiments of the present invention is illustrative and does not limit the scope of the invention set forth in the following claims.

Claims

An artificial intelligence-based electronic document management device including an automatic notification function, the device comprising:
one or more artificial neural network modules configured to receive an electronic document image and output a plurality of pieces of item information in which text included in the electronic document image is matched item by item;
a selection module configured to select date-related item information from among the plurality of pieces of item information; and
an automatic notification item determination module configured to determine, from among the selected item information, automatic notification target item information for which automatic notification is to be performed,
wherein the automatic notification item determination module includes a transformer-based artificial neural network module that receives the selected item information and outputs the automatic notification target item information,
wherein the transformer-based artificial neural network module receives, as inputs, an expected future reward, state information, and a past output of the artificial neural network module, and generates, as an output, an optimal output including the automatic notification target item information,
wherein the expected future reward represents the total reward expected in the future as the automatic notification item determination module generates the automatic notification target item information,
wherein the state information includes at least the selected item information received by the automatic notification item determination module, and
wherein the past output includes automatic notification target item information output in the past by the transformer-based artificial neural network module.

(Deleted)

The artificial intelligence-based electronic document management device including an automatic notification function of claim 1,
wherein the state information includes legal information related to the electronic document image.

The artificial intelligence-based electronic document management device including an automatic notification function of claim 1,
wherein the transformer-based artificial neural network module includes a first block and a second block,
wherein the first block receives the expected future reward and the state information as inputs, outputs final encoding information, and provides the final encoding information to the second block as an input, and
wherein the second block receives the past output as an input and generates the optimal output including the automatic notification target item information.

The artificial intelligence-based electronic document management device including an automatic notification function of claim 1,
wherein the one or more artificial neural network modules include:
a text extraction module including an artificial neural network that receives the electronic document image, extracts the text included in the electronic document image, and outputs text information; and
a text processing module including an artificial neural network that receives full text information, in which the text information is sequentially integrated in order of y coordinate, and outputs the item information in which the text included in the electronic document image is matched item by item.