KR102486105B1 - Method for managing training data for optical character recognition

Info

Publication number: KR102486105B1
Application number: KR1020200059652A
Authority: KR
Inventors: 김동영; 곽명성; 원은지
Original assignee: 삼성생명보험주식회사
Priority date: 2020-05-19
Filing date: 2020-05-19
Publication date: 2023-01-06
Also published as: KR20210142895A

Abstract

According to an embodiment of the present disclosure, a computer program stored in a computer-readable storage medium is disclosed. When executed on one or more processors, the computer program causes the following operations for generating a document image using at least one network function to be performed, the operations including: inputting at least one digital document image to a first generation network to obtain an actualized digital document image; inputting the actualized digital document image to a second generation network to obtain a restored digital document image; inputting the actualized digital document image and a real document image to a first identification network and training the first identification network to distinguish the two document images; training the first generation network and the second generation network based at least in part on a result of comparing the digital document image with the restored digital document image and on an identification result of the first identification network; and generating a document image using the trained networks.

Description

METHOD FOR MANAGING TRAINING DATA FOR OPTICAL CHARACTER RECOGNITION

The present disclosure relates to machine learning and, more specifically, to a method for managing training data for machine learning.

An artificial neural network must be trained on data, and its performance is determined by the quality and quantity of the training data. Likewise, in the field of OCR using artificial neural networks, AI-based character recognition depends heavily on the quantity and quality of training data. Therefore, the more training data that can be secured, the more the performance of AI-based character recognition can be improved.

Korean Registered Patent Publication KR10-2003221 discloses a “handwriting image data generation system and handwriting image data generation method”.

The present disclosure has been devised in response to the background art described above, and aims to provide a method for managing training data for machine learning.

According to an embodiment of the present disclosure for achieving the object described above, a computer program stored in a computer-readable storage medium is disclosed. When executed on one or more processors, the computer program causes the following operations for generating a document image using at least one network function to be performed, the operations including: inputting at least one digital document image to a first generation network to obtain an actualized digital document image; inputting the actualized digital document image to a second generation network to obtain a restored digital document image; inputting the actualized digital document image and a real document image to a first identification network and training the first identification network to distinguish the two document images; training the first generation network and the second generation network based at least in part on a result of comparing the digital document image with the restored digital document image and on an identification result of the first identification network; and generating a document image using the trained generation network.

In an alternative embodiment of the computer program operations for generating a document image, the at least one digital document image may be generated based on digital document image generation material that includes elements for generating a digital document image.

In an alternative embodiment of the computer program operations for generating a document image, the at least one digital document image may be obtained by crawling publicly available digital document material.

In an alternative embodiment of the computer program operations for generating a document image, the actualized digital document image is an image obtained by the first generation network performing style conversion on the digital document image and has a style similar to that of the real document image, and the restored digital document image is an image obtained by the second generation network performing style conversion on the actualized digital document image and may have a style similar to that of the digital document image.

In an alternative embodiment of the computer program operations for generating a document image, the second generation network may be in an inverse-function relationship with the first generation network and may have a network structure that mirrors that of the first generation network.
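The inverse-function relationship can be illustrated with a toy example. The sketch below is not the disclosed network: `g1` and `g2` are hypothetical stand-ins (a simple contrast change and its exact inverse), and the mean absolute difference in `l1_distance` is an assumed way of comparing the digital document image with the restored one.

```python
def l1_distance(a, b):
    """Mean absolute pixel difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def g1(pixels):
    # Hypothetical stand-in for the first generation network:
    # lowers contrast and lifts the black level.
    return [0.8 * p + 0.1 for p in pixels]

def g2(pixels):
    # Hypothetical stand-in for the second generation network:
    # the exact inverse of g1.
    return [(p - 0.1) / 0.8 for p in pixels]

digital = [0.0, 0.25, 0.5, 1.0]   # flattened grayscale pixels in [0, 1]
actualized = g1(digital)          # "actualized" image
restored = g2(actualized)         # "restored" image

# Close to zero when g2 inverts g1; this is the comparison result
# that the generation networks would be trained to reduce.
cycle_error = l1_distance(digital, restored)
```

In a trained system the two networks would only approximate this inverse relationship, so the comparison result would be small rather than zero.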

In an alternative embodiment of the computer program operations for generating a document image, the first identification network may include an operation for assigning, to each of the actualized digital document image and the real document image, a probability value indicating whether it is a real document image, in order to distinguish the two input document images.

In an alternative embodiment of the computer program operations for generating a document image, the first identification network may be trained using a loss function based on the probability value indicating whether the actualized digital document image is a real document image and the probability value indicating whether the real document image is a real document image.
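The disclosure does not give the form of this loss function. As a minimal sketch, assuming the binary cross-entropy commonly used for adversarial training, a loss built from the two probability values could look as follows; the function and parameter names are hypothetical.

```python
import math

def identification_loss(p_real, p_actualized, eps=1e-12):
    """Assumed binary cross-entropy over the two probability values.

    p_real:       probability of "real document image" assigned to a real image
                  (target 1)
    p_actualized: probability of "real document image" assigned to an
                  actualized digital image (target 0)
    eps:          small constant that keeps log() away from zero
    """
    return -(math.log(p_real + eps) + math.log(1.0 - p_actualized + eps))

# A network that separates the two images well gets a lower loss
# than one that outputs 0.5 for both.
confident = identification_loss(p_real=0.9, p_actualized=0.1)
undecided = identification_loss(p_real=0.5, p_actualized=0.5)
```

Minimizing this value pushes the probability for real images toward 1 and the probability for actualized images toward 0.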

In an alternative embodiment of the computer program operations for generating a document image, the first generation network may be trained so that the first identification network cannot distinguish the actualized digital document image generated by the first generation network from the real document image.
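One common way to express this objective, offered here as an assumption rather than as the disclosed method, is to penalize the first generation network whenever the identification network assigns a low "real" probability to its output. The name `generation_adversarial_loss` is hypothetical.

```python
import math

def generation_adversarial_loss(p_actualized, eps=1e-12):
    """Assumed adversarial term for the first generation network.

    p_actualized is the probability of "real document image" that the first
    identification network assigns to an actualized digital document image.
    The loss is small when that probability is high, i.e. when the
    identification network is fooled.
    """
    return -math.log(p_actualized + eps)

fooled = generation_adversarial_loss(0.9)   # identification network is fooled
caught = generation_adversarial_loss(0.1)   # identification network is not fooled
```

Under this formulation, the first generation network lowers its loss only by producing images the identification network rates as real.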

In an alternative embodiment of the computer program operations for generating a document image, the operation of generating the document image may include inputting a digital document image to the trained first generation network to generate an actualized digital document image.

In an alternative embodiment of the computer program operations for generating a document image, the operation of generating the document image may include inputting a real document image to the trained second generation network to generate a restored real document image.

In an alternative embodiment of the computer program operations for generating a document image, the operations may further include: inputting at least one real document image to the second generation network to obtain a restored real document image; inputting the restored real document image to the first generation network to obtain an actualized real document image; inputting the restored real document image and the digital document image to a second identification network and training the second identification network to distinguish the two document images; and training the second generation network based at least in part on a result of comparing the real document image with the actualized real document image and on an identification result of the second identification network.

In an alternative embodiment of the computer program operations for generating a document image, the at least one real document image may be obtained by crawling real document images published on the Internet.

In an alternative embodiment of the computer program operations for generating a document image, the restored real document image is an image obtained by the second generation network performing style conversion on the real document image and has a style similar to that of the digital document image, and the actualized real document image is an image obtained by the first generation network performing style conversion on the restored real document image and may have a style similar to that of the real document image.

In an alternative embodiment of the computer program operations for generating a document image, the second identification network may include an operation for assigning, to each of the restored real document image and the digital document image, a probability value indicating whether it is a digital document image, in order to distinguish the two input document images.

In an alternative embodiment of the computer program operations for generating a document image, the second identification network may be trained using a loss function based on the probability value indicating whether the restored real document image is a digital document image and the probability value indicating whether the digital document image is a digital document image.

In an alternative embodiment of the computer program operations for generating a document image, the second generation network may be trained so that the second identification network cannot distinguish the restored real document image from the digital document image.
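The embodiments above can be summarized as a pair of training objectives. The disclosure states only that the losses are based on the probability values and on the comparison results, so the formulation below is an assumption: it uses logarithmic losses, an L1 norm for the comparison, and a weighting factor \lambda, none of which appear in the source. Here x denotes a digital document image, y a real document image, G_1 and G_2 the first and second generation networks, and D_1 and D_2 the first and second identification networks.

```latex
\begin{aligned}
\mathcal{L}_{D_1} &= -\,\mathbb{E}_{y}\big[\log D_1(y)\big]
                     -\,\mathbb{E}_{x}\big[\log\big(1 - D_1(G_1(x))\big)\big] \\
\mathcal{L}_{D_2} &= -\,\mathbb{E}_{x}\big[\log D_2(x)\big]
                     -\,\mathbb{E}_{y}\big[\log\big(1 - D_2(G_2(y))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}} &= \mathbb{E}_{x}\big[\lVert G_2(G_1(x)) - x \rVert_1\big]
                     + \mathbb{E}_{y}\big[\lVert G_1(G_2(y)) - y \rVert_1\big] \\
\mathcal{L}_{G} &= -\,\mathbb{E}_{x}\big[\log D_1(G_1(x))\big]
                   -\,\mathbb{E}_{y}\big[\log D_2(G_2(y))\big]
                   + \lambda\,\mathcal{L}_{\mathrm{cyc}}
\end{aligned}
```

In this reading, G_1(x) is the actualized digital document image, G_2(G_1(x)) the restored digital document image, G_2(y) the restored real document image, and G_1(G_2(y)) the actualized real document image.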

According to an embodiment of the present disclosure for achieving the object described above, a method for generating a document image is disclosed. The method may include: inputting at least one digital document image to a first generation network to obtain an actualized digital document image; inputting the actualized digital document image to a second generation network to obtain a restored digital document image; inputting the actualized digital document image and a real document image to a first identification network and training the first identification network to distinguish the two document images; training the first generation network and the second generation network based at least in part on a result of comparing the digital document image with the restored digital document image and on an identification result of the first identification network; and inputting a digital document image to the trained first generation network to obtain an actualized digital document image.

According to an embodiment of the present disclosure for achieving the object described above, a document image generator is disclosed. The generator includes: one or more processors for processing document image data; and a memory storing at least one deep-learning-based network function, wherein the one or more processors may: input at least one digital document image to a first generation network to obtain an actualized digital document image; input the actualized digital document image to a second generation network to obtain a restored digital document image; input the actualized digital document image and a real document image to a first identification network and train the first identification network to distinguish the two document images; train the first generation network based at least in part on a result of comparing the digital document image with the restored digital document image and on an identification result of the first identification network; and input a digital document image to the trained first generation network to obtain an actualized digital document image.

According to an embodiment of the present disclosure for achieving the object described above, a computer-readable recording medium storing a data structure that holds data related to the operation of an artificial neural network is disclosed. The data is generated through the following operations, the operations including: inputting at least one digital document image to a first generation network to obtain an actualized digital document image; inputting the actualized digital document image to a second generation network to obtain a restored digital document image; inputting the actualized digital document image and a real document image to a first identification network and training the first identification network to distinguish the two document images; training the first generation network and the second generation network based at least in part on a result of comparing the digital document image with the restored digital document image and on an identification result of the first identification network; and inputting a digital document image to the trained first generation network to obtain an actualized digital document image.

The present disclosure can provide a method for managing training data for machine learning.

FIG. 1 is a block diagram of a computing device for generating a document image using at least one network function according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating examples of types of document images according to an embodiment of the present disclosure.
FIG. 3 is a conceptual diagram illustrating the mutual training process of, and the relationship between, a first generation network and a first identification network according to an embodiment of the present disclosure.
FIG. 4 is a conceptual diagram illustrating the relationship between networks in a training data generator, used for AI-based character recognition, that consists of a first generation network, a second generation network, and a first identification network according to an embodiment of the present disclosure.
FIG. 5 is a conceptual diagram illustrating the relationship between networks when a second identification network is used to train the second generation network according to an embodiment of the present disclosure.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. It is apparent, however, that these embodiments may be practiced without these specific descriptions.

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless otherwise specified or clear from context, "X uses A or B" is intended to mean any of the natural inclusive permutations. In other words, "X uses A or B" applies if X uses A, if X uses B, or if X uses both A and B. Also, the term "and/or" as used herein should be understood to refer to and include all possible combinations of one or more of the listed related items.

Also, the terms "comprises" and/or "comprising" should be understood to mean that the stated features and/or components are present. However, the terms "comprises" and/or "comprising" should be understood not to exclude the presence or addition of one or more other features, components, and/or groups thereof. Also, unless otherwise specified or clear from context to indicate a singular form, the singular in this specification and the claims should generally be construed to mean "one or more".

In addition, the term “at least one of A or B” should be interpreted to mean “the case including only A”, “the case including only B”, and “the case in which A and B are combined”.

Those skilled in the art should further recognize that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in various ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The description of the presented embodiments is provided to enable any person skilled in the art of the present disclosure to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art of the present disclosure. The general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present invention is not limited to the embodiments presented herein. The present invention is to be accorded the widest scope consistent with the principles and novel features presented herein.

Throughout this specification, a digital document image includes a document generated on a computing device based on elements for generating a document. A digital document image includes a document image in which a character string created on the computing device, for example with a document editor, is captured and/or stored exactly as it is displayed on the screen of the computing device. That is, in this specification, a digital document image means a document image generated entirely inside a computing device by electrical signals.

In this specification, a real document image includes a document image obtained by photographing a document that was written outside a computing device or produced by, for example, printing out a document created with a document editor on a computing device. Real document images include photographed images of the various documents a user may encounter in everyday life. A real document image also includes, for example, an image of a document in which printed character strings are mixed with character strings handwritten by a user. In addition, a real document image may contain noise affecting the character strings. Furthermore, the illuminance distribution across a real document image may be non-uniform.

In this specification, an actualized document image includes a document image generated according to an embodiment of the present disclosure. In the present disclosure, the term actualized document image covers both an actualized digital document image and an actualized real document image. An actualized digital document image may be an image obtained as the output of inputting a digital document image to the first generation network described later. An actualized real document image may be an image obtained as the output of inputting a restored real document image to the first generation network. Throughout this specification, the word “actualized” may be used to mean “obtained as an output of the first generation network”, “generated by the first generation network”, and the like. In this specification, an actualized document image means a document image having a style similar to that of a real document image. The style includes image resolution, the degree of image noise, the temperature of the colors in the image, the kinds of colors in the image, line type, line thickness, color tone, and the like. The foregoing description of style is only an example, and the present disclosure is not limited thereto.

In this specification, the term restored document image covers both a restored digital document image and a restored real document image. A restored digital document image may be an image obtained as the output of inputting the actualized digital document image to the second generation network described later. A restored real document image may be an image obtained as the output of inputting a real document image to the second generation network. Throughout this specification, the word “restored” may be used to mean “obtained as an output of the second generation network”, “generated by the second generation network”, and the like. In this specification, a restored document image means a document image having a style similar to that of a digital document image.

FIG. 1 is a block diagram of a computing device for generating a document image using at least one network function according to an embodiment of the present disclosure.

The configuration of the computing device 100 shown in FIG. 1 is only a simplified example. In an embodiment of the present disclosure, the computing device 100 may include other components for providing the computing environment of the computing device 100, and only some of the disclosed components may constitute the computing device 100. The computing device 100 may include a processor 110, a memory 130, and a network unit 150.

The processor 110 may input at least one digital document image to the first generation network to obtain an actualized digital document image. The method of training the first generation network is described in detail later.

The at least one digital document image may be generated based on digital document image generation material that includes elements for generating a digital document image. Specifically, the digital document image generation material may include attributes such as character string, text color, text size, font, line type, and background style. For example, if N character strings (string 1, string 2, … string N), M fonts, and K digital-document background styles are stored in the memory, the processor 110 can generate up to N*M*K different digital document images. The processor 110 may use a random function to generate different digital document images from the digital document image generation material. The random function may be a function that randomly selects data from a given data set, seeded with a specific seed value or with the current time. The foregoing description of the random function is only an example, and the present disclosure is not limited thereto.
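The N*M*K count and the seeded random selection can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the sample strings, fonts, and background styles are all hypothetical, and rendering the selected combination into an actual image is omitted.

```python
import itertools
import random
import time

def enumerate_specs(strings, fonts, backgrounds):
    """Return all N*M*K (string, font, background) combinations."""
    return list(itertools.product(strings, fonts, backgrounds))

def sample_specs(strings, fonts, backgrounds, count, seed=None):
    """Randomly pick up to `count` distinct combinations.

    If no seed is given, the current time is used as the seed,
    mirroring the two seeding options described above.
    """
    if seed is None:
        seed = time.time_ns()
    rng = random.Random(seed)
    specs = enumerate_specs(strings, fonts, backgrounds)
    return rng.sample(specs, min(count, len(specs)))

# Hypothetical generation material: N = 3, M = 2, K = 4.
strings = ["Policy No. 12345", "Claim form", "Insured name"]
fonts = ["FontA", "FontB"]
backgrounds = ["plain", "lined", "form-grid", "watermark"]

all_specs = enumerate_specs(strings, fonts, backgrounds)   # 3 * 2 * 4 combinations
picked = sample_specs(strings, fonts, backgrounds, count=5, seed=42)
```

Fixing the seed makes the selection reproducible, while seeding with the current time yields a different selection on each run.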

The processor 110 may obtain at least one digital document image by crawling publicly available digital document data. The crawling technique may detect image tags in the code of a web page to identify document images, and store the relevant document images to obtain real document images. Alternatively, the crawling technique may capture the screen exactly as it is rendered for the web page, regardless of any embedded images, to obtain a digital document image. This description of the crawling technique is only an example, and the present disclosure is not limited thereto. By crawling, the processor 110 can obtain a wider variety of public digital document images and real document images more quickly and use them as training data for the generation network and the identification network; the richer training data can further improve the performance of both networks. In one embodiment of the present disclosure, for example, the digital document images may include images of documents related to the financial industry, but the present disclosure is not limited thereto.
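The image-tag detection step can be sketched with Python's standard `html.parser` module. This is a minimal sketch, assuming the page source has already been downloaded; the class name, function name, and sample markup are hypothetical, and fetching pages or saving the image files is not shown.

```python
from html.parser import HTMLParser


class ImageTagCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_sources(html_text):
    """Return the image sources found in the given HTML text, in page order."""
    collector = ImageTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


# Hypothetical page source standing in for a crawled document page.
sample_page = """
<html><body>
  <p>Claim form</p>
  <img src="forms/claim_form.png" alt="claim form">
  <img alt="decorative, no source">
  <IMG SRC="scans/receipt.jpg">
</body></html>
"""

sources = find_image_sources(sample_page)
print(sources)
```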

In training an artificial neural network, the quantity and quality of the input data are decisive for the performance of the trained network. In the present invention, the processor 110 either generates a variety of digital document images from the digital document image generation data or obtains digital document images by crawling public digital document data, so it can quickly obtain digital document images from various fields and supply the artificial neural network with a large amount of input data.

The processor 110 may input the actualized digital document image to the second generation network to obtain a restored digital document image. The actualized digital document image is the style-converted document image that the first generation network outputs when a digital document image is input to it, so that the result has a style similar to that of a real document image. The restored digital document image is the style-converted document image that the second generation network outputs when the actualized digital document image is input to it, so that the result again has a style similar to that of a digital document image.

The second generation network may be a generation network that stands in an inverse-function relationship to the first generation network. To act as the inverse of the first generation network, the second generation network may have a structure that mirrors that of the first generation network, with input and output reversed. Specifically, the second generation network may be a network function trained to take the output of the first generation network as its input and to output data similar to the input of the first generation network. That is, the first generation network receives a digital document image and outputs an actualized digital document image similar to a real document image, while the second generation network receives a real document image and outputs a restored real document image similar to a digital document image. As one embodiment of the mirrored structure, when the first generation network is a network function with an autoencoder structure consisting of an encoder followed by a decoder, the second generation network may be a network function with the mirrored decoder-encoder structure. In this case, the input and output layers of the decoder and of the encoder in the second generation network are each reversed in order (for example, the output layer of the decoder may serve as the input layer of the second generation network and the input end of the decoder as its bottleneck layer, while the output layer of the encoder may serve as the bottleneck layer and the input layer of the encoder as the output layer of the second generation network). The foregoing is only an example and does not limit the present disclosure. The method of training the second generation network is described in detail below.

The processor 110 may input the actualized digital document image and a real document image to the first identification network and train the first identification network to discriminate between the two. Here, training the first identification network to discriminate between the two document images includes learning the differences that distinguish them. The following describes how the processor 110 trains the first identification network in order to generate training data used for AI-based character recognition.

Throughout this specification, the terms computational model, neural network, and network function may be used interchangeably. A neural network may consist of a set of interconnected computational units, generally referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node. The nodes (or neurons) constituting a neural network may be interconnected by one or more links.

The first generation network and the first identification network may together form a generative adversarial network (hereinafter, GAN) and be trained adversarially against each other. The first generation network may include at least one artificial neural network and may be trained to convert an input document image so that it has a style similar to that of a real document. The first identification network may compare the document image style-converted by the first generation network, that is, the actualized document image, with a real document image to judge whether they are similar.

The first generation network, which the processor 110 trains adversarially, may use the identification result of the first identification network as its objective function when computing the error for backpropagation. The first identification network may include an operation that assigns, to each of the actualized digital document image and the real document image, a probability value indicating whether the image is a real document image. Specifically, the first identification network may be trained with a predetermined interval set for this probability value: when the style of the input document image is similar to that of a real document image, it assigns a high value within the interval according to the degree of similarity, and when the styles differ, it assigns a low value within the interval. The first identification network may also be trained with the opposite convention, that is, assigning a low value within the predetermined interval when the style is similar to that of a real document image and a high value when the styles differ.

In one embodiment according to the present disclosure, the processor 110 may train the first identification network using a loss function based on two probability values: the one it assigns to the actualized digital document image and the one it assigns to the real document image, each indicating whether the image is a real document image. The actualized digital document image may be an image output by the first generation network. The loss function is the function used to optimize the artificial neural network as a whole. In this specification, the terms loss function and objective function may be used interchangeably.

The processor 110 may use the following function as the loss function based on the probability value, assigned to each document image, that indicates whether it is a real document image.

$$\log D(y) + \log\bigl(1 - D(G(x))\bigr) \qquad \text{(loss function 1)}$$

In loss function 1, D is the function representing the first identification network and G is the function representing the first generation network. y denotes a real document image input to the first identification network. x denotes a digital document image input to the first generation network, so G(x) denotes the actualized digital document image output by the first generation network. By inputting y and G(x) to the first identification network, the processor 110 assigns each document image a probability value indicating whether it is a real document image.

The processor 110 may train the first identification network so that it discriminates well between real document images and actualized digital document images. Specifically, the processor 110 may train the first identification network to assign a high probability value to a real document image input to it and a low probability value to an actualized digital document image input to it. The processor 110 may train the first identification network so that the value of the loss function is maximized. In other words, the first identification network D is trained so that the value of $\log D(y) + \log(1 - D(G(x)))$ is maximized. For example, once training enables the identification network to discriminate well between real document images and actualized digital document images, it will assign a probability value D(y) close to 1 to the input y, a real document image, and a probability value D(G(x)) close to 0 to the input G(x), an actualized digital document image. Both log functions in loss function 1 then receive arguments close to 1 and output values close to 0. Because the maximum of the log function over arguments restricted to the interval [0, 1] is 0, the maximum value of loss function 1 is 0. The processor 110 can therefore use loss function 1 to train the first identification network so that the value of the loss function is maximized. Loss function 1 is only an example, and the present invention is not limited to it: the processor 110 may construct any loss function that assigns the two end values of a predetermined interval as the probability values for the two document images to be distinguished, such as a real document image and an actualized digital document image, and then train the first identification network so that the value of that loss function is maximized.
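A short numerical check illustrates why a well-trained identification network pushes the loss toward its maximum of 0. The sketch assumes loss function 1 is the two-term expression log D(y) + log(1 - D(G(x))) for a single pair of images; the function name and the probability values are made up for illustration.

```python
import math


def loss_function_1(d_real, d_fake):
    """log D(y) + log(1 - D(G(x))) for probabilities strictly inside (0, 1).

    d_real stands for D(y) and d_fake for D(G(x)); both are illustrative inputs.
    """
    return math.log(d_real) + math.log(1.0 - d_fake)


# An identification network that cannot tell the two images apart.
confused = loss_function_1(0.5, 0.5)

# Progressively better discrimination: D(y) -> 1 and D(G(x)) -> 0.
better = loss_function_1(0.9, 0.1)
near_ideal = loss_function_1(0.999, 0.001)

print(confused, better, near_ideal)
```

The three values increase toward 0 but never exceed it, which matches the statement that the maximum of loss function 1 is 0.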

The processor 110 may train the first generation network and the second generation network based at least in part on the result of comparing the digital document image with the restored digital document image and on the identification result of the first identification network. Here, the identification result of the first identification network may include the probability values, each indicating whether an image is a real document image, that the first identification network assigns to the actualized digital document image and to the real document image.

The following describes a method of training the first generation network based at least in part on the identification result of the first identification network.

The processor 110 may train the first identification network and the first generation network against each other based on the GAN scheme. The processor 110 may train the first generation network so that the first identification network cannot discriminate between real document images and actualized digital document images. Specifically, the first generation network may be trained to drive the same loss function used by the first identification network in the opposite direction. For example, when the first identification network is trained to maximize the loss function, the first generation network may be trained to minimize it. Conversely, when the first identification network is trained to minimize the loss function, the first generation network may be trained to maximize it.

The processor 110 may train the first generation network so that the first identification network, which derives its identification result based on loss function 1, cannot discriminate between real document images and actualized digital document images. When the first identification network is trained to maximize loss function 1, the processor 110 may train the first generation network to minimize loss function 1. In other words, the first generation network may be trained so that the value of $\log D(y) + \log(1 - D(G(x)))$ is minimized. Specifically, training the first generation network to minimize loss function 1 drives toward the case in which the first identification network D judges the actualized digital document image G(x) to be highly similar in style to a real document image and assigns it a high probability value. When the processor 110 trains the first generation network in this way, the term $\log(1 - D(G(x)))$ approaches $\log 0$, so its value diverges to negative infinity and the loss function moves toward its minimum. This example of a loss function is only an example, and the present disclosure is not limited thereto.
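The two opposing training directions can be written compactly as a minimax problem. This restatement assumes loss function 1 is the two-term log expression over a real document image y and a digital document image x; it is the usual way a GAN objective is summarized, not a formula quoted from the source.

```latex
\min_{G}\;\max_{D}\;\Bigl[\log D(y) + \log\bigl(1 - D(G(x))\bigr)\Bigr],
\qquad
\lim_{D(G(x)) \to 1^{-}} \log\bigl(1 - D(G(x))\bigr) = -\infty
```

The limit on the right is the divergence referred to above: the closer the identification network's score for G(x) gets to 1, the lower the generation network's objective becomes.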

By training the first generation network and the first identification network against each other in the GAN manner described above, the present invention for generating document images can produce the first generation network by unsupervised learning, without any labeling of ground-truth values.

The following describes a method of training the first generation network and the second generation network based at least in part on the result of comparing the digital document image with the restored digital document image.

According to one embodiment of the present disclosure, the processor 110 may train the first generation network and the second generation network based on the result of comparing the digital document image with the restored digital document image, in addition to the identification result of the first identification network. The processor 110 may construct the objective function needed to train the first and second generation networks from this comparison result. Specifically, the comparison result may be the difference in RGB values between corresponding pixels of the digital document image and the restored digital document image. The comparison result may also be the value of a loss function that takes the digital document image as the true value and the restored digital document image, produced by the first and second generation networks, as the final predicted value. This loss function may be the mean squared error (MSE) between the true value and the predicted value. Cross entropy may also be used as the loss function. Furthermore, the comparison result may be derived by a separate network function that learns the differences between the digital document image and the restored digital document image.
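The pixel-wise comparison can be sketched as a mean squared error over RGB channels. The sketch assumes both images have the same size and are given as nested lists of (R, G, B) tuples; the function name and the tiny sample images are hypothetical.

```python
def mean_squared_error(original, restored):
    """Average squared per-channel difference between two equally sized RGB images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(original, restored):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for channel_a, channel_b in zip(pixel_a, pixel_b):
                total += (channel_a - channel_b) ** 2
                count += 1
    return total / count


# A 1x2 "digital document image" (true value) and its slightly different
# "restored" counterpart (predicted value); the pixel values are made up.
digital = [[(255, 255, 255), (0, 0, 0)]]
restored = [[(253, 255, 255), (0, 4, 0)]]

error = mean_squared_error(digital, restored)
print(error)  # (2**2 + 4**2) / 6 channels
```

An error of 0 would mean the round trip through both generation networks reproduced the input exactly.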

According to one embodiment of the present disclosure, training the first and second generation networks based on the comparison between the digital document image and the restored digital document image means that the second generation network constrains the first generation network: the first generation network is trained in such a way that, at a minimum, the style of its output document image can still be converted back, or restored, to the style of the input document image.

In general, a network function derives output data as a prediction for its input data and corrects the output by comparing the predicted value with the true value. As a concrete example, suppose the digital document image input to the first generation network contains the character string “가”. The first generation network then outputs an actualized document image of “가”. If this actualized document image of “가” and a real document image of some arbitrary string “나” are input to the first identification network for discrimination, the first generation network could be trained to change “가” into “나”, that is, to alter the character string itself, so that the first identification network cannot tell the two document images apart.

According to the present disclosure, however, the first generation network learns the style change while its output remains restorable to the style of the input data by the second generation network. It is therefore trained to change style while avoiding changes severe enough to lose the intrinsic characteristics of the input data, such as altering the content of the character string itself. Accordingly, the first generation network can actualize a digital document image and can be trained even when no ground-truth image exists for the actualized digital document image. This has the advantage that an AI-based character recognition training data generator can be trained even when training data is scarce, including the case in which no ground-truth post-conversion image exists for a pre-conversion document image.

The processor 110 may train the first generation network and the second generation network simultaneously by backpropagating the error obtained from comparing the digital document image with the restored digital document image. In other words, if the first and second generation networks are regarded as a single network function, that function receives a digital document image as input and outputs a restored digital document image. The single network function can therefore be trained on the error between its input, the digital document image, and its output, the restored digital document image. Training this single network function trains the first and second generation networks at the same time.

The processor 110 may also train only the second generation network based on the result of comparing the digital document image with the restored digital document image. Specifically, the processor 110 may fix the weights, bias values, and other parameters of the first generation network model, and train the second generation network based on the comparison between the digital document image, which is the input data of the first generation network, and the restored digital document image, which is the output data of the second generation network.
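The difference between training both generation networks together and training only the second one with the first one frozen can be illustrated with a deliberately tiny stand-in. In this sketch, which is an assumption made purely for illustration, each network is a single scalar weight (first network: z = a * x, second network: b * z), the comparison result is the squared reconstruction error, and the function name and hyperparameters are hypothetical.

```python
def train_cycle(a, b, inputs, steps=200, lr=0.05, freeze_first=False):
    """Gradient descent on mean((b * a * x - x)**2) over the inputs.

    a plays the role of the first generation network's weight and b the
    second's. With freeze_first=True only b is updated.
    """
    n = len(inputs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x in inputs:
            restored = b * (a * x)        # input -> first net -> second net
            error = restored - x          # comparison with the original input
            grad_a += 2.0 * error * b * x / n   # error backpropagated to a
            grad_b += 2.0 * error * a * x / n   # error backpropagated to b
        if not freeze_first:
            a -= lr * grad_a
        b -= lr * grad_b
    return a, b


inputs = [0.5, 1.0, 1.5]

# Joint training: both weights move until b * a is close to 1.
joint_a, joint_b = train_cycle(2.0, 0.2, inputs)

# Frozen first network: a stays at 2.0 and b alone learns the inverse 0.5.
frozen_a, frozen_b = train_cycle(2.0, 0.2, inputs, freeze_first=True)

print(joint_a * joint_b, frozen_a, frozen_b)
```

In both modes the composite mapping approaches the identity; the modes differ only in which parameters receive the backpropagated error.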

The processor 110 may also train the second generation network based on the identification result of the first identification network. Specifically, the processor 110 may input a real document image to the second generation network to obtain a restored real document image, and then input that image to the first generation network to obtain an actualized real document image. The processor 110 may then input the actualized real document image and the real document image to the trained first identification network and backpropagate the resulting error to the second generation network, thereby training the second generation network.

By providing a second generation network that changes the style of the first generation network's output back to the style of its input, the present disclosure prevents excessive loss of information while the first generation network converts the original input, a digital document image, into the style of a real document image, and prevents overfitting to the real document image style.

The processor 110 may generate a document image using the trained generation networks, which include the first generation network and the second generation network. The operation of generating the document image may consist of inputting a digital document image to the trained first generation network to generate an actualized digital document image. An actualized digital document image obtained in this way cannot be distinguished from a real document image by the artificial neural network, and can be supplied as training data for AI-based character recognition. In another embodiment, the operation of generating the document image may consist of inputting a real document image to the trained second generation network to generate a restored real document image. As described above, a document image restored by the second generation network has a style similar to that of a digital document image, so restoring a real document image can remove the noise present in it. The second generation network also makes it possible to standardize the style of collected real document images. Here, style standardization includes restoring real document images that carry differing styles to the form of digital document images, thereby standardizing them to a digital document format. Furthermore, when a real document image is input to the second generation network and the output is passed to OCR technology, including character position recognition and character recognition, the data benefits from cleaning such as noise removal on the real document image, so the performance of the OCR technology can be maximized. In other words, the second generation network has the advantage of serving as a preprocessor for effective character recognition.

FIG. 2 is a diagram illustrating examples of types of document images according to one embodiment of the present disclosure.

When the computing device inputs a digital document image 201 generated for the character string “충복형” to the first generation network, it can obtain as output an actualized digital document image 202 for the string “충복형”. Alternatively, the computing device can input a real document image 204 of the character string “/h.h/o” to the second generation network and obtain as output a restored real document image 203 for the string “/h.h/o”. In this way, inputting a digital document image to the first generation network produces an actualized digital document image with a style similar to that of a real document image, and inputting a real document image to the second generation network produces a restored real document image with a style similar to that of a digital document image. These character strings are only examples, and the present disclosure is not limited thereto.

FIG. 3 is a conceptual diagram illustrating the mutual training process of the first generation network and the first identification network and the relationship between the networks according to an embodiment of the present disclosure. The solid line 400 in FIG. 3 represents the flow of data input to and output from each network, and the broken line 401 represents that the generation network is trained based on the identification network's identification result for the two document images. When at least one digital document image 301 is input to the first generation network 311, an actualized digital document image 303 is output. The actualized digital document image 303 may then be input to the first identification network 312 together with the actual document image 302. The first identification network assigns each input document image a probability value indicating whether it is an actual document image, and may be trained to distinguish the two better. Here, training the first identification network to distinguish the two document images well may include, as described above, an operation for assigning a high probability value to the actual document image and a low probability value to the actualized document image. Based on the result of the first identification network 312, the first generation network 311 may be trained so that the first identification network cannot distinguish the two images well. That is, the first generation network may be trained so that the first identification network assigns the actualized digital document image a high probability value of being an actual document image.
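The opposing objectives of the first identification network and the first generation network can be illustrated with a small numeric sketch. The disclosure does not state which loss function is used; binary cross-entropy is assumed here only because it is the common choice for GAN training, and the function names `bce`, `identification_loss`, and `generation_adversarial_loss` are hypothetical.

```python
import math


def bce(probability, target):
    """Binary cross-entropy for one probability and a 0/1 target (assumed loss)."""
    eps = 1e-12
    p = min(max(probability, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def identification_loss(p_actual, p_actualized):
    # The identification network is rewarded for a high probability on the
    # actual document image (target 1) and a low one on the actualized image (target 0).
    return bce(p_actual, 1) + bce(p_actualized, 0)


def generation_adversarial_loss(p_actualized):
    # The generation network is rewarded when the identification network
    # assigns its actualized image a high "actual document" probability.
    return bce(p_actualized, 1)
```

Under this assumption, an identification network that outputs 0.9 for the actual image and 0.1 for the actualized image has a lower loss than one that outputs 0.5 for both, while the generation network's loss falls as the probability assigned to its output rises.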

FIG. 4 is a conceptual diagram illustrating the relationship between networks in a training data generator used for artificial intelligence based character recognition, the generator consisting of the first generation network, the second generation network, and the first identification network according to an embodiment of the present disclosure. In FIG. 4, the solid line 400 represents the flow of data input to and output from each network, and the broken line 401 represents that the generation network is trained based on the identification network's identification result for the two document images. The dash-dotted line 402 represents that, when the first generation network and the second generation network are regarded as a single network function, the final output document image is compared with the initial input document image to derive a comparison result. The first generation network and the second generation network may be trained based on this comparison result. The first generation network 311 may be trained based at least in part on the result of comparing the restored digital document image 305, which is the output of the second generation network 313, with the initially input digital document image 301, and on the identification result of the first identification network 312. The first identification network 312 may receive the actualized digital document image 303, which is the output of the first generation network 311, and the actual document image 302, and identify the two document images. The second generation network 313 may receive the actualized digital document image 303, which is the output of the first generation network 311, and derive the restored digital document image 305 as its output. The second generation network 313 may be trained based on the result of comparing its output, the restored digital document image 305, with the digital document image 301 input to the first generation network. The second generation network 313 may be trained simultaneously with the first generation network 311, or may be trained alone while the first generation network 311 is held fixed.
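The comparison represented by the dash-dotted line 402 can be sketched as a pixel-wise distance between the input digital document image and the restored digital document image, added to the adversarial term. The disclosure does not specify the distance or the weighting; the mean absolute difference, the `cycle_weight` parameter, and the function names below are assumptions for illustration only. Images are represented as nested lists of pixel intensities.

```python
def l1_distance(image_a, image_b):
    """Mean absolute pixel difference between two equally sized images."""
    flat_a = [value for row in image_a for value in row]
    flat_b = [value for row in image_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(flat_a, flat_b)) / len(flat_a)


def generation_objective(digital, restored, adversarial_term, cycle_weight=10.0):
    # Combines the identification result (adversarial_term) with the
    # comparison between the digital image and its restored version.
    # cycle_weight is a hypothetical balancing factor, not taken from the source.
    return adversarial_term + cycle_weight * l1_distance(digital, restored)
```

When the restored image equals the input, the comparison term vanishes and only the adversarial term remains, which reflects the constraint that the first generation network must not lose the content of the original digital document image.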

Hereinafter, a method of using a second identification network for effective training of the second generation network is described.

According to an embodiment of the present disclosure, the processor 110 may use a second identification network to train the second generation network. Specifically, the processor 110 may train the second generation network and the second identification network symmetrically, mirroring the mutual training of the first generation network and the first identification network described above. The processor 110 may input at least one actual document image to the second generation network to obtain a restored actual document image. The processor 110 may input the restored actual document image to the first generation network to obtain an actualized actual document image. The processor 110 may input the restored actual document image and the digital document image to the second identification network to train the second identification network to identify the two document images. Furthermore, the second generation network may be trained based at least in part on the result of comparing the actual document image with the actualized actual document image and on the identification result of the second identification network.

The processor 110 may train the second generation network and the second identification network using the GAN method described above. To identify the two input document images, the second identification network, as executed by the processor 110, includes an operation for assigning each of the restored actual document image and the digital document image a probability value indicating whether it is a digital document image. That is, the second identification network identifies the two document images by assigning each of the digital document image and the restored actual document image, which is the output of the second generation network, a probability value indicating whether it is a digital document image. In addition, the processor 110 may train the second identification network using a loss function based on the probability value assigned to the restored actual document image and the probability value assigned to the digital document image, each indicating whether the image is a digital document image. This corresponds to the training method of the first identification network described above.

Hereinafter, a method of training the second generation network and the second identification network is described in detail with reference to FIG. 5. FIG. 5 is a conceptual diagram illustrating the relationship between networks when the second identification network is used to train the second generation network according to an embodiment of the present disclosure. In FIG. 5, the solid line 400 represents the type of data input to and output from each network, and the broken line 401 represents that the generation network is trained based on the identification network's identification result for the two document images. The dash-dotted line 402 represents that, when the first generation network and the second generation network are regarded as a single network function, the final output document image is compared with the initial input document image to derive a comparison result. The first generation network and the second generation network may be trained based on this comparison result.

In FIG. 5, the processor 110 may input at least one actual document image 302 to the second generation network 313 and output a restored actual document image 304. Here, the restored actual document image 304 means an image having a style similar to that of the digital document image 301 of the present specification. The processor 110 may input the restored actual document image 304 and the digital document image 301 to the second identification network 314 to train the second identification network 314 to identify the two document images. The identification result of the second identification network is reflected in the second generation network 313, so that the second generation network 313 may be trained to keep the second identification network 314 from distinguishing the two document images well; in other words, so that the second identification network judges the restored actual document image to be a digital document image and assigns it a high probability value of being a digital document image. The processor 110 may input the restored actual document image 304, which is the output of the second generation network, to the first generation network 311 to obtain an actualized actual document image 306. The actualized actual document image is a document image obtained by first restoring an actual document image so that it resembles a digital document image and then actualizing it again; that is, it is the result of the actual document image being processed sequentially by the second generation network and the first generation network. In addition to the identification result of the second identification network, the second generation network may be trained based on the result of comparing the actualized actual document image with the initially input actual document image. The specific method may correspond to the way the first generation network described above is trained based on the result of comparing the digital document image with the restored digital document image.
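The two data paths, digital to actualized to restored and actual to restored to actualized, can be traced with a short sketch. The two generation networks are stood in for by trivial hypothetical functions (`toy_g1` adds a fixed amount of "noise" to each pixel, `toy_g2` removes it); they are placeholders chosen so the flow can be followed by hand and are not the networks of the disclosure. The reference numerals in the comments follow FIGS. 4 and 5.

```python
def forward_cycle(g1, g2, digital_image):
    """Digital image 301 -> actualized 303 -> restored 305."""
    actualized = g1(digital_image)
    restored = g2(actualized)
    return actualized, restored


def backward_cycle(g1, g2, actual_image):
    """Actual image 302 -> restored actual 304 -> actualized actual 306."""
    restored_actual = g2(actual_image)
    actualized_actual = g1(restored_actual)
    return restored_actual, actualized_actual


# Hypothetical stand-ins for the first and second generation networks.
def toy_g1(image):
    return [min(1.0, pixel + 0.25) for pixel in image]


def toy_g2(image):
    return [max(0.0, pixel - 0.25) for pixel in image]


actualized, restored = forward_cycle(toy_g1, toy_g2, [0.0, 0.5])
restored_actual, actualized_actual = backward_cycle(toy_g1, toy_g2, [0.25, 1.0])
```

In both directions the final image is compared with the initial input, so each generation network is constrained by a cycle that starts in its own input domain as well as by one that starts in the other.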

As described above, when the second generation network is trained through the second identification network in a manner corresponding to the training of the first generation network, the second generation network performs better than when only the first identification network is used. In addition, the second identification network improves the performance of the second generation network, which performs style conversion from an actual document image to a digital document image. Furthermore, in addition to training the combination of the first and second generation networks, which takes a digital document image as input and produces a restored digital document image as output, it becomes possible to train the combination of the second and first generation networks, which takes an actual document image as input and produces an actualized actual document image as output, so that the performance of each generation network can be improved.

In the present disclosure, the first generation network is not only trained adversarially against the first identification network in a GAN manner, but is also trained under the constraint that its output must be convertible back into the original digital document image style by the mirror-image second generation network. As a result, it has the advantage of being able to convert a digital document image into the style of an actual document image without losing the main attributes of the initially input digital document image. Similarly, the second generation network may be trained together with the first generation network based on the first identification network, or may be trained based on an additional second identification network. The second generation network is a network that converts the style of an actual document image or an actualized document image into that of a digital document image, and can generate a digital document image to be provided to character recognition technology through style conversion that includes noise removal from the actual document image.

Document images generated according to the present disclosure may be divided into document images generated by the first generation network and document images generated by the second generation network. A document image generated by the first generation network can be used as training data for artificial intelligence based character recognition. Because a computing device cannot distinguish it from an actual document, the time and cost required to prepare a vast number of actual documents can be reduced. That is, the first generation network may function as a generator of training data to be provided to artificial intelligence based character recognition technology.

Meanwhile, a document image generated by the second generation network is an actual document image whose style has been converted into that of a digital document image, and may therefore be a document image of relatively clean picture quality. Such a second generation network may function as a preprocessor that provides high-quality document images to character recognition technology.

According to an embodiment of the present disclosure, a computer-readable medium storing a data structure is disclosed.

A data structure may refer to the organization, management, and storage of data that enables efficient access to and modification of the data. A data structure may refer to the organization of data for solving a specific problem (for example, retrieving, storing, or modifying data in the shortest time). A data structure may also be defined as a physical or logical relationship between data elements, designed to support a specific data processing function. The logical relationship between data elements may include a connection relationship between user-defined data elements. The physical relationship between data elements may include the actual relationship between data elements physically stored in a computer-readable storage medium (for example, a persistent storage device). Specifically, a data structure may include a set of data, the relationships between the data, and functions or instructions applicable to the data. Through an effectively designed data structure, a computing device can perform operations while using a minimum of its resources. Specifically, an effectively designed data structure allows the computing device to increase the efficiency of computation, reading, insertion, deletion, comparison, exchange, and search.

Data structures may be divided into linear data structures and non-linear data structures according to their form. A linear data structure may be a structure in which only one piece of data is connected after each piece of data. Linear data structures may include a list, a stack, a queue, and a deque. A list may refer to a series of data sets that has an internal order. A list may include a linked list. A linked list may be a data structure in which each piece of data has a pointer and the data are connected in a single line. In a linked list, a pointer may include connection information to the next or previous data. Depending on its form, a linked list may be expressed as a singly linked list, a doubly linked list, or a circular linked list. A stack may be a data listing structure that allows limited access to data. A stack may be a linear data structure in which data can be processed (for example, inserted or deleted) at only one end of the data structure. A stack may be a last in, first out (LIFO) data structure, in which data stored later comes out earlier. A queue is a data listing structure that allows limited access to data and, unlike a stack, may be a first in, first out (FIFO) data structure, in which data stored later comes out later. A deque may be a data structure in which data can be processed at both ends of the data structure.
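The access rules of the stack, queue, and deque described above can be illustrated with Python's built-in `list` and the standard-library `collections.deque`. This is only a usage sketch; the variable names are arbitrary.

```python
from collections import deque

# Stack (LIFO): insert and delete at one end only.
stack = []
for item in ("a", "b", "c"):
    stack.append(item)
last_in = stack.pop()          # the most recently stored item comes out first

# Queue (FIFO): insert at the back, delete from the front.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)
first_in = queue.popleft()     # the earliest stored item comes out first

# Deque: data can be processed at both ends.
both_ends = deque(["b"])
both_ends.appendleft("a")
both_ends.append("c")
left, right = both_ends.popleft(), both_ends.pop()
```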

A non-linear data structure may be a structure in which a plurality of pieces of data are connected after one piece of data. Non-linear data structures may include a graph data structure. A graph data structure may be defined by vertices and edges, and an edge may include a line connecting two different vertices. The graph data structure may include a tree data structure. A tree data structure may be a data structure in which there is exactly one path connecting any two different vertices among the plurality of vertices included in the tree. In other words, it may be a graph data structure that does not form a loop.
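The tree property described above, exactly one path between any two vertices and therefore no loop, can be checked with a short sketch. It relies on the standard fact that an undirected graph is a tree if and only if it is connected and has one fewer edge than it has vertices. The function name `is_tree` and the edge-list representation are choices made for this illustration.

```python
def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a tree."""
    vertices = list(vertices)
    # A tree on n vertices has exactly n - 1 edges.
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adjacency = {vertex: [] for vertex in vertices}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Depth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    pending = [vertices[0]]
    while pending:
        node = pending.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return len(seen) == len(vertices)
```

For example, the edges A-B, A-C, C-D over vertices A to D form a tree, whereas the triangle A-B, B-C, C-A contains a loop and does not.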

Throughout this specification, the terms computational model, neural net, network function, and neural network may be used interchangeably. Hereinafter, they are uniformly referred to as a neural network. A data structure may include a neural network, and a data structure including a neural network may be stored in a computer-readable medium. A data structure including a neural network may also include data preprocessed for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data obtained from the neural network, an activation function associated with each node or layer of the neural network, a loss function for training the neural network, and the like. A data structure including a neural network may include any of the components listed above. That is, a data structure including a neural network may be configured to include all of, or any combination of, the data preprocessed for processing by the neural network, the data input to the neural network, the weights of the neural network, the hyperparameters of the neural network, the data obtained from the neural network, the activation function associated with each node or layer of the neural network, and the loss function for training the neural network. In addition to these components, a data structure including a neural network may include any other information that determines the characteristics of the neural network. The data structure may also include any type of data used or generated in the computation process of the neural network, and is not limited to the foregoing. A computer-readable medium may include a computer-readable recording medium and/or a computer-readable transmission medium. A neural network may generally consist of a set of interconnected computational units, which may be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node.

A data structure may include data input to a neural network. A data structure including data input to a neural network may be stored in a computer-readable medium. The data input to the neural network may include training data input during the training process of the neural network and/or input data input to a neural network whose training has been completed. The data input to the neural network may include data that has undergone pre-processing and/or data that is to be pre-processed. Pre-processing may include a data processing step for inputting data to the neural network. Accordingly, the data structure may include both the data to be pre-processed and the data produced by pre-processing. The foregoing data structure is merely an example, and the present disclosure is not limited thereto.

A data structure may include the weights of a neural network. (In this specification, the terms weight and parameter may be used interchangeably.) A data structure including the weights of a neural network may be stored in a computer-readable medium. A neural network may include a plurality of weights. The weights may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the output node may determine the data value it outputs based on the values input to the input nodes connected to it and the weights set on the links corresponding to the respective input nodes. The foregoing data structure is merely an example, and the present disclosure is not limited thereto.
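The way an output node combines its input values with the link weights can be sketched as a weighted sum, optionally followed by an activation function. The disclosure only says that the output is determined "based on" the inputs and weights; the weighted sum, the optional sigmoid, and the function names below are assumptions reflecting common practice.

```python
import math


def sigmoid(x):
    """Logistic activation, used here only as an example activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(input_values, link_weights, activation=None):
    """Value of one output node given its input node values and link weights."""
    if len(input_values) != len(link_weights):
        raise ValueError("each input node needs exactly one link weight")
    # Weighted sum over the links that connect input nodes to this node.
    total = sum(value * weight for value, weight in zip(input_values, link_weights))
    return activation(total) if activation is not None else total
```

Changing a link weight changes the node's output for the same inputs, which is what allows a user or a training algorithm to adjust the network toward a desired function.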

By way of non-limiting example, the weights may include weights that vary during the training of the neural network and/or weights for which training of the neural network has been completed. The weights that vary during training may include the weights at the time a training cycle starts and/or the weights that vary during the training cycle. The weights for which training has been completed may include the weights at the completion of a training cycle. Accordingly, a data structure including the weights of a neural network may include a data structure including the weights that vary during training and/or the weights for which training has been completed. The above-described weights and/or any combination of them are therefore regarded as included in a data structure including the weights of a neural network. The foregoing data structure is merely an example, and the present disclosure is not limited thereto.

A data structure including the weights of a neural network may be stored in a computer-readable storage medium (for example, a memory or a hard disk) after undergoing serialization. Serialization may be the process of converting a data structure into a form that can be stored on the same or a different computing device and later reconstructed and used. A computing device may serialize a data structure to transmit and receive data over a network. A serialized data structure including the weights of a neural network may be reconstructed on the same computing device or on another computing device through deserialization. A data structure including the weights of a neural network is not limited to serialization. Furthermore, a data structure including the weights of a neural network may include a data structure for increasing computational efficiency while using a minimum of the computing device's resources (for example, B-Tree, Trie, m-way search tree, AVL tree, and Red-Black Tree among non-linear data structures). The foregoing is merely an example, and the present disclosure is not limited thereto.
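Serialization and deserialization of a weight data structure can be illustrated with a round trip through a text format. The disclosure does not name a format; JSON via Python's standard `json` module is assumed here, and the layer names and weight values are made up for the example.

```python
import json

# Hypothetical weights of a small two-layer network, keyed by layer name.
weights = {
    "layer1": [[0.5, -0.25], [1.0, 0.0]],
    "layer2": [[0.125], [2.0]],
}

# Serialization: convert the data structure into a storable, transmittable string.
serialized = json.dumps(weights)

# Deserialization: reconstruct the data structure, possibly on another device.
reconstructed = json.loads(serialized)
```

The reconstructed structure compares equal to the original, so the same weights can be stored, sent over a network, and reused later.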

A data structure may include the hyperparameters of a neural network. A data structure including the hyperparameters of a neural network may be stored in a computer-readable medium. A hyperparameter may be a variable that is changed by a user. Hyperparameters may include, for example, the learning rate, the cost function, the number of training cycle iterations, weight initialization (for example, setting the range of weight values subject to weight initialization), and the number of hidden units (for example, the number of hidden layers and the number of nodes in each hidden layer). The foregoing data structure is merely an example, and the present disclosure is not limited thereto.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an example of illustrative approaches. It is to be understood that, based on design priorities, the specific order or hierarchy of steps in the processes may be rearranged within the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, but are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person skilled in the art of the present disclosure to make or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art of the present disclosure, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not to be limited to the embodiments presented herein, but is to be interpreted in the widest scope consistent with the principles and novel features presented herein.

Claims

A computer program stored in a computer-readable storage medium, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform the following operations for generating a document image using at least one network function, the operations comprising:
generating an actualized digital document image by inputting a digital document image to a first generation network;
generating a restored digital document image by inputting the actualized digital document image to a second generation network;
inputting the actualized digital document image and an actual document image to a first identification network to identify a difference in a first image property; and
training the first generation network so that the digital document image and the restored digital document image match and the difference in the first image property is equal to or less than a predetermined level.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the digital document image is generated based on digital document image creation data including elements for generating a digital document image.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the digital document image is obtained by crawling open digital document data.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the actualized digital document image is an image obtained by performing style conversion on the digital document image with the first generation network, and has a style similar to that of the actual document image; and
the restored digital document image is an image obtained by performing style conversion on the actualized digital document image with the second generation network, and has a style similar to that of the digital document image.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the second generation network is in an inverse-function relationship with the first generation network and has a network structure that mirrors that of the first generation network.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the first identification network performs an operation of assigning, to each of the actualized digital document image and the real document image, a probability value indicating whether that document image is a real document image, in order to identify the two input document images.

The computer program stored in a computer-readable storage medium according to claim 1,
wherein the first identification network is trained using a loss function based on a probability value indicating whether the actualized digital document image is a real document image and a probability value indicating whether the real document image is a real document image.

delete

The computer program stored in a computer-readable storage medium according to claim 1, wherein the operations further comprise:
generating a restored real document image by inputting the real document image to the second generation network;
generating an actualized real document image by inputting the restored real document image to the first generation network;
inputting the restored real document image and the digital document image to a second identification network to identify a difference in a second image property; and
training the second generation network so that the real document image and the actualized real document image match and the difference in the second image property is equal to or less than a predetermined level.

The computer program stored in a computer-readable storage medium according to claim 11,
wherein the actual document image is obtained by crawling actual document images published on the Internet.

The computer program stored in a computer-readable storage medium according to claim 11,
wherein the restored real document image is an image obtained by performing style conversion on the real document image with the second generation network, and has a style similar to that of the digital document image; and
the actualized real document image is an image obtained by performing style conversion on the restored real document image with the first generation network, and has a style similar to that of the real document image.

The computer program stored in a computer-readable storage medium according to claim 11,
wherein the second identification network performs an operation of assigning, to each of the restored real document image and the digital document image, a probability value indicating whether that document image is a digital document image, in order to identify the two input document images.

The computer program stored in a computer-readable storage medium according to claim 11,
wherein the second identification network is trained using a loss function based on a probability value indicating whether the restored real document image is a digital document image and a probability value indicating whether the digital document image is a digital document image.

delete

A method for generating a document image, in which each step is performed by a computing device, the method comprising:
generating an actualized digital document image by inputting a digital document image to a first generation network;
generating a restored digital document image by inputting the actualized digital document image to a second generation network;
inputting the actualized digital document image and an actual document image to a first identification network to identify a difference in a first image property; and
training the first generation network so that the digital document image and the restored digital document image match and the difference in the first image property is equal to or less than a predetermined level.

A document image generator, comprising:
one or more processors for processing document image data; and
a memory for storing at least one deep learning-based network function,
wherein the one or more processors are configured to perform:
inputting a digital document image to a first generation network to generate an actualized digital document image;
inputting the actualized digital document image to a second generation network to generate a restored digital document image;
inputting the actualized digital document image and an actual document image to a first identification network to identify a difference in a first image property; and
training the first generation network so that the digital document image and the restored digital document image match and the difference in the first image property is equal to or less than a predetermined level.

delete