KR102619368B1

KR102619368B1 - Apparatus, method and user device for recognizing optimal text using ocr ai algorithm of a different kind based on deep learning

Info

Publication number: KR102619368B1
Application number: KR1020210014135A
Authority: KR
Inventors: 최현길; 최영재
Original assignee: (주)메인라인
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2024-01-02
Also published as: KR20220111019A

Abstract

An apparatus for recognizing text from an image includes: a receiving unit that receives image data including text from a user terminal; a text recognition engine selection unit that receives, from the user terminal, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data; a candidate text derivation unit that derives a candidate text recognized from the image data by each of the selected text recognition engines; a final text derivation unit that derives a final text by inputting the derived candidate texts into a pre-trained reinforcement learning model; and a providing unit that provides the derived final text to the user terminal.

Description

Device, method, and user terminal for recognizing optimal text using deep learning-based heterogeneous OCR AI algorithm {APPARATUS, METHOD AND USER DEVICE FOR RECOGNIZING OPTIMAL TEXT USING OCR AI ALGORITHM OF A DIFFERENT KIND BASED ON DEEP LEARNING}

The present invention relates to an apparatus, a method, and a user terminal for recognizing text from an image using heterogeneous text recognition engines that include OCR AI algorithms.

OCR (Optical Character Reader) refers to a device that uses light to read characters written or printed in a document. An OCR reads characters by comparing the recognized characters with pre-stored text data, and because of this reading characteristic it has the drawback that some errors occur when recognizing characters.

With regard to this OCR function, Korean Patent Application Publication No. 2008-0002084, a prior art document, discloses a system and a method for optical character reading.

OCR may use a dedicated reading device, but characters can also be recognized using a small optical scanner and dedicated software. Conventionally, characters were recognized through PC solutions built on logic, patterns, algorithms, and the like. Recently, with the development of AI (Artificial Intelligence) technology, characters can be recognized automatically by analyzing scanned images with models trained on images using machine learning and deep learning techniques.

However, various algorithms such as DBN, CNN, RNN, and GAN are used as neural networks for OCR AI. As a result, different text recognition results are derived depending on the algorithm the OCR AI uses, which has the drawback that the user's satisfaction with text recognition varies with that algorithm.

An object is to provide an apparatus, a method, and a user terminal that receive image data including text from a user terminal and receive a selection of at least two text recognition engines, each including an algorithm to be applied to the image data.

Another object is to provide an apparatus, a method, and a user terminal that derive a candidate text recognized from the image data by each of at least two text recognition engines, derive a final text by inputting the derived candidate texts into a pre-trained reinforcement learning model, and provide the derived final text to the user terminal.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above technical problems, an embodiment of the present invention may provide a text recognition device including: a receiving unit that receives image data including text from a user terminal; a text recognition engine selection unit that receives, from the user terminal, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data; a candidate text derivation unit that derives a candidate text recognized from the image data by each of the selected text recognition engines; a final text derivation unit that derives a final text by inputting the derived candidate texts into a pre-trained reinforcement learning model; and a providing unit that provides the derived final text to the user terminal.

Another embodiment of the present invention may provide a text providing method including: receiving a selection of an image in which text is to be recognized from among a plurality of images; transmitting the selected image to a text recognition device; receiving a selection of at least two text recognition engines, each including an algorithm to be applied to the selected image; and receiving, from the text recognition device, a final text derived through the selected text recognition engines. Here, the final text is derived by the text recognition device, which derives a candidate text recognized from the image with each of the selected algorithms and inputs the derived candidate texts into a pre-trained reinforcement learning model.

Yet another embodiment of the present invention may provide a user terminal including: an image selection unit that receives a selection of an image in which text is to be recognized from among a plurality of images; a transmission unit that transmits the selected image to a text recognition device; a text recognition engine selection unit that receives a selection of at least two text recognition engines, each including an algorithm to be applied to the selected image; and a providing unit that receives, from the text recognition device, a final text derived through the selected text recognition engines. Here, the final text is derived by the text recognition device, which derives a candidate text recognized from the image with each of the selected text recognition engines and inputs the derived candidate texts into a pre-trained reinforcement learning model.

Yet another embodiment of the present invention may provide a text recognition method including: receiving image data including text from a user terminal; receiving, from the user terminal, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data; deriving a candidate text recognized from the image data with each of the selected text recognition engines; deriving a final text by inputting the derived candidate texts into a pre-trained reinforcement learning model; and providing the derived final text to the user terminal.

The above means for solving the problems are merely illustrative and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to any one of the above means for solving the problems of the present invention, it is possible to provide an apparatus, a method, and a user terminal that derive more accurate text recognition results by using two or more algorithms, whereas conventionally a different text recognition result was derived depending on the single algorithm the OCR AI used for learning.

It is possible to provide an apparatus, a method, and a user terminal that deliver highly accurate text recognition results and an optimal text recognition rate by inputting the candidate texts derived with at least two algorithms into a pre-trained reinforcement learning model to derive the final text.

It is possible to provide an apparatus, a method, and a user terminal that let a user create user rule information for a comparison table generated from the candidate texts, and that learn the created user rule information as ground-truth data to derive the final text.

FIG. 1 is a configuration diagram of a text recognition system according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of a user terminal according to an embodiment of the present invention.
FIG. 3 is a configuration diagram of a text recognition device according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a process of receiving a selection of text recognition engines, each including an algorithm to be applied to image data, according to an embodiment of the present invention.
FIGS. 5A and 5B are exemplary diagrams illustrating a process of deriving a final text from image data according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method for recognizing text from an image in a text recognition device according to an embodiment of the present invention.
FIG. 7 is a flowchart of a method for receiving text recognized from an image in a user terminal according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a text recognition system according to an embodiment of the present invention. Referring to FIG. 1, the text recognition system 1 may include a user terminal 110 and a text recognition device 120. The user terminal 110 and the text recognition device 120 are illustrated as examples of components that can be controlled by the text recognition system 1.

The components of the text recognition system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the text recognition device 120 may be connected to the user terminal 110 simultaneously or at time intervals.

A network refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers, and includes a Local Area Network (LAN), a Wide Area Network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, wired and wireless television communication networks, and the like. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, Visible Light Communication (VLC), and LiFi.

The user terminal 110 may receive a selection of an image in which text is to be recognized from among a plurality of images, and may transmit the selected image to the text recognition device 120.

The user terminal 110 may receive a selection of at least two text recognition engines, each including an algorithm to be applied to the selected image. For example, the user terminal 110 may receive from the user a selection of at least two text recognition engines based on algorithms such as DBN, CNN, DNN, RNN, and GAN. Here, the at least two text recognition engines may be trained based on different algorithms.

The user terminal 110 may receive, from the text recognition device 120, the final text derived through the at least two selected text recognition engines.

The user terminal 110 may receive, from the text recognition device 120, a comparison table generated for the candidate texts.

The user terminal 110 may receive user rule information entered on the basis of the comparison table and transmit the entered user rule information to the text recognition device 120. For rules or logic that the text recognition device 120 cannot recognize by itself, entering user rule information in this way allows the user to perform a final inspection and confirmation of the recognized text.

Through this, cross-checks on horizontal and vertical entries, amounts, counts, and the like related to each candidate text are performed based on the user rule information, so that after the text recognition device 120 recognizes the text through OCR, a final check on the values is performed and the final text is derived.
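The cross-check described above can be illustrated with a small sketch. It assumes, purely for illustration, that the recognized values form a table of amounts with a recognized total for each row and each column; the function name `cross_check` and this table layout are hypothetical and are not specified in the patent.

```python
def cross_check(rows, row_totals, column_totals):
    """Return per-row and per-column flags that are True where the sums agree.

    rows          -- recognized amounts, one list per table row (assumed layout)
    row_totals    -- recognized total for each row
    column_totals -- recognized total for each column
    """
    row_ok = [sum(row) == total for row, total in zip(rows, row_totals)]
    columns = zip(*rows)  # transpose: one tuple per column
    column_ok = [sum(col) == total for col, total in zip(columns, column_totals)]
    return row_ok, column_ok
```

A flag of `False` marks a row or column whose recognized total disagrees with its recognized entries, which is where a user or a later correction step would look first.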

The user terminal 110 may display each processing step, up to the derivation of the final text, in the form of a progress bar so that the user can check the progress of each step.

Through this process, the user terminal 110 can show which algorithm is being used and how text recognition is progressing, and the user can ultimately approve and confirm the resulting final text.

The text recognition device 120 may receive image data including text from the user terminal 110. For example, the text recognition device 120 may receive image data related to receipt-based expense claims, business registration certificates, medical bill receipts, public institution documents, and the like.

The text recognition device 120 may perform preprocessing on the received image data, such as rotation correction, crease correction, noise removal, watermark removal, and edge sharpening.

The text recognition device 120 may receive, from the user terminal 110, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data, and may derive a candidate text recognized from the image data with each of the selected text recognition engines.

The text recognition device 120 may generate a comparison table for the candidate texts and provide the generated comparison table to the user terminal 110.
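One possible shape for such a comparison table is sketched below. The patent does not define the table's structure, so this sketch assumes that each engine's candidate text has already been split into tokens that line up by position; `build_comparison_table` and the row fields are hypothetical names.

```python
from itertools import zip_longest


def build_comparison_table(candidates):
    """Build one row per token position from {engine_name: [token, ...]}.

    Assumption: tokens from different engines are aligned by position.
    Missing tokens are filled with an empty string.
    """
    engines = list(candidates)
    token_lists = (candidates[name] for name in engines)
    rows = []
    for index, tokens in enumerate(zip_longest(*token_lists, fillvalue="")):
        rows.append({
            "index": index,
            "tokens": dict(zip(engines, tokens)),
            "agree": len(set(tokens)) == 1,  # True when all engines match
        })
    return rows
```

Rows where `agree` is `False` are the positions in which the engines disagree, which are the natural places for the user to supply rule information.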

The text recognition device 120 may derive the final text by inputting the derived candidate texts into a pre-trained reinforcement learning model. For example, the text recognition device 120 may derive the final text by performing basic validation on each word according to whether it is a date, a number, English, or Korean.

For example, dates appear in various formats such as "yyyy년 mm월 dd일" (year, month, day), "yyyy.mm.dd", and "yy.mm.dd". For these, the text recognition device 120 may derive the final text by checking the year (yyyy), the month (mm), and the day (dd) with the highest priority. In doing so, the text recognition device 120 may correct the date to a numeric format.
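The date check can be sketched as follows, assuming only the three formats named above and assuming that two-digit years belong to the 2000s. The regular expressions, the `century` parameter, and the choice of `yyyy.mm.dd` as the numeric output format are illustrative assumptions, not details from the patent.

```python
import datetime
import re

# Hypothetical patterns for the three formats named in the text.
_DATE_PATTERNS = [
    re.compile(r"(?P<y>\d{4})\s*년\s*(?P<m>\d{1,2})\s*월\s*(?P<d>\d{1,2})\s*일"),
    re.compile(r"(?P<y>\d{4})\.(?P<m>\d{1,2})\.(?P<d>\d{1,2})"),
    re.compile(r"(?P<y>\d{2})\.(?P<m>\d{1,2})\.(?P<d>\d{1,2})"),
]


def normalize_date(text, century=2000):
    """Return the date as 'yyyy.mm.dd', or None if it is not a valid date."""
    for pattern in _DATE_PATTERNS:
        match = pattern.fullmatch(text.strip())
        if not match:
            continue
        year = int(match.group("y"))
        if len(match.group("y")) == 2:
            year += century  # assumption: two-digit years are 20yy
        month, day = int(match.group("m")), int(match.group("d"))
        try:
            datetime.date(year, month, day)  # validates year, month, and day
        except ValueError:
            return None
        return f"{year:04d}.{month:02d}.{day:02d}"
    return None
```

Returning `None` for an impossible date (for example month 13) lets a caller fall back to another engine's candidate for the same field.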

If an error occurs in recognizing the Korean characters for year, month, and day, the text recognition device 120 may apply automatic conversion logic to correct them to 년 (year), 월 (month), and 일 (day). For example, the text recognition device 120 may derive the final text by correcting '넌' to '년', '냔' to '년', and '낟' to '년' in the recognized text.
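A minimal sketch of this conversion logic follows. It uses only the three substitutions given in the text and, as an added assumption, applies them only when the misread character directly follows a two- to four-digit number, so that ordinary uses of those characters are left alone. The function name is hypothetical.

```python
import re

# Substitutions taken from the example in the text.
YEAR_MARKER_FIXES = {"넌": "년", "냔": "년", "낟": "년"}

# Assumption: only fix the character when it follows a 2-4 digit number.
_YEAR_MARKER = re.compile(r"(\d{2,4})\s*([넌냔낟])")


def fix_year_marker(text):
    """Replace a misread year marker that follows a number with '년'."""
    return _YEAR_MARKER.sub(
        lambda m: m.group(1) + YEAR_MARKER_FIXES[m.group(2)], text
    )
```

The same pattern could be extended with tables for 월 and 일, but the text gives no example misreadings for those, so none are invented here.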

As another example, for numbers, the digits 0 to 9 form the base character set, and for a comma (,) in the middle of a number the text recognition device 120 may preserve the format of the original image. For example, the text recognition device 120 may derive the final text by correcting '123.200' to '123,200'.
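The number correction can be sketched with a simple heuristic: a token made of digit groups separated by '.' or ',', where every group after the first has exactly three digits, is treated as a thousands-separated number. This heuristic is an assumption for illustration; it cannot distinguish a genuine decimal such as 1.500 from a misread 1,500, so a real system would also need to consult the original image layout as the text describes.

```python
import re

# Digit groups of three after the first group, separated by '.' or ','.
_THOUSANDS = re.compile(r"\d{1,3}(?:[.,]\d{3})+")


def restore_thousands_commas(token):
    """Replace '.' with ',' in tokens that look like thousands-grouped numbers."""
    if _THOUSANDS.fullmatch(token):
        return token.replace(".", ",")
    return token
```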

As yet another example, for English, the text recognition device 120 may recognize the letters A to Z and may derive the final text by correcting English words using words that appear in Wikipedia, encyclopedias, and the like.
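Dictionary-based word correction of this kind can be sketched with the standard library's `difflib.get_close_matches`. The tiny `VOCABULARY` list stands in for a word list built from a source such as Wikipedia, and the similarity cutoff of 0.8 is an arbitrary illustrative value; neither comes from the patent.

```python
import difflib

# Hypothetical stand-in for a vocabulary harvested from an encyclopedia.
VOCABULARY = ["receipt", "invoice", "hospital", "total", "amount"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the input if nothing is close."""
    lowered = word.lower()
    if lowered in vocabulary:
        return word
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

Leaving unmatched words unchanged is a deliberate choice in this sketch: names and codes that are absent from the vocabulary should not be forced onto an unrelated dictionary word.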

As yet another example, for Korean, the text recognition device 120 checks various terms, word order, and sentences. To do so, it first checks whether the text of each word, excluding particles, has been extracted accurately, and then derives the final text through correction. For example, for a medical bill receipt, the text recognition device 120 may correct terms that are likely to appear, such as '청구명세서' (claim statement), '검사료' (examination fee), '진찰료' (consultation fee), and '비급여' (non-covered), based on the terminology by medical institution and by type used in the medical fee claim statement of the Health Insurance Review and Assessment Service.

As yet another example, formats such as a resident registration number or a business registration number consist of digits and '-'. The text recognition device 120 does not perform separate error checking on them, but always recognizes and corrects them as a numeric format.
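Forcing such a field into a numeric format might look like the sketch below, which maps letters that are commonly confused with digits onto those digits. The lookalike table and the function name are illustrative assumptions; the patent states only that the field is always treated as numeric.

```python
# Hypothetical table of letters often misread in place of digits.
_LOOKALIKES = str.maketrans({
    "O": "0", "o": "0",
    "I": "1", "l": "1",
    "S": "5",
    "B": "8",
})


def to_numeric_format(token):
    """Map lookalike letters to digits; return None if non-numeric text remains."""
    converted = token.translate(_LOOKALIKES)
    if converted and all(ch in "0123456789-" for ch in converted):
        return converted
    return None
```

Consistent with the text, the sketch performs no checksum or validity test on the number itself; it only normalizes the characters.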

The text recognition device 120 may receive, from the user terminal 110, user rule information created on the basis of the comparison table, derive the final text based on the received user rule information, and provide the derived final text to the user terminal 110.

The text recognition device 120 may train the reinforcement learning model by inputting the candidate texts and the user rule information into the model so that it recognizes a state and derives, from the recognized state, an action that satisfies the reward function.
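The patent does not specify the reinforcement learning algorithm, so the following is only a toy illustration of the state, action, and reward vocabulary. It assumes that the state is a field type (for example "phone"), the action is the choice of which engine's candidate to trust, and the reward is 1 when that candidate matches the token the user confirmed. The class `EngineSelector`, its epsilon-greedy policy, and the update rule are all assumptions.

```python
import random
from collections import defaultdict


class EngineSelector:
    """Tabular epsilon-greedy learner: which engine to trust for each state."""

    def __init__(self, engines, epsilon=0.1, learning_rate=0.5, seed=0):
        self.engines = list(engines)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.q = defaultdict(float)  # (state, engine) -> estimated reward
        self.rng = random.Random(seed)

    def choose(self, state, explore=True):
        """Pick an engine (the action) for the given state."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.engines)
        return max(self.engines, key=lambda e: self.q[(state, e)])

    def update(self, state, engine, reward_value):
        """Move the estimate for (state, engine) toward the observed reward."""
        key = (state, engine)
        self.q[key] += self.learning_rate * (reward_value - self.q[key])


def reward(candidate, confirmed):
    """1.0 when the engine's candidate equals the user-confirmed token."""
    return 1.0 if candidate == confirmed else 0.0
```

After repeated updates with user-confirmed tokens, `choose(state, explore=False)` returns the engine that has most often matched the user's answer for that kind of field.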

That is, by using at least two text recognition engines, each including an algorithm to be applied to the image data, the present application offers the advantage of achieving higher accuracy than when text is recognized from the image data with a single text recognition engine.

FIG. 2 is a configuration diagram of a user terminal according to an embodiment of the present invention. Referring to FIG. 2, the user terminal 110 may include an image selection unit 210, a transmission unit 220, a text recognition engine selection unit 230, a providing unit 240, and a user rule information input unit 250.

The image selection unit 210 may receive a selection of an image in which text is to be recognized from among a plurality of images. For example, when the user terminal 110 accesses a website that provides a text recognition service, the image selection unit 210 may receive from the user, through drag and drop, a selection of an image stored in the user terminal 110 or an image stored on a cloud server (not shown).

The transmission unit 220 may transmit the selected image to the text recognition device 120.

The text recognition engine selection unit 230 may receive a selection of at least two text recognition engines, each including an algorithm to be applied to the selected image. For example, the text recognition engine selection unit 230 may receive a selection of at least two text recognition engines from among a plurality of text recognition engines, each composed of an algorithm loaded in a multi-threaded manner.

The providing unit 240 may receive, from the text recognition device 120, the final text derived through the at least two selected text recognition engines.

The providing unit 240 may further receive, from the text recognition device 120, the comparison table generated for the candidate texts.

The user rule information input unit 250 may receive user rule information entered on the basis of the comparison table. For example, as user rule information, the user rule information input unit 250 may receive from the user a selection of '진료비' (medical fee), 'A병원' (Hospital A), '전화번호' (telephone number), and the like from the first candidate text 500, and a selection of '234-4563' from the second candidate text 510.
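To make the example concrete, the sketch below models user rule information as a mapping from a token position to the engine whose candidate the user selected, and assembles a final token list from it. This representation, the `default_engine` fallback, and the function name are assumptions; the patent does not define the data format of user rule information.

```python
def apply_user_rules(candidates, user_rules, default_engine):
    """Assemble final tokens from per-position engine choices.

    candidates     -- {engine_name: [token, ...]}, aligned by position (assumed)
    user_rules     -- {position: engine_name} chosen by the user
    default_engine -- engine used where the user made no choice
    """
    length = max(len(tokens) for tokens in candidates.values())
    final_tokens = []
    for position in range(length):
        engine = user_rules.get(position, default_engine)
        tokens = candidates[engine]
        final_tokens.append(tokens[position] if position < len(tokens) else "")
    return final_tokens
```

With the example from the text, the user keeps the first three tokens from the first candidate text and takes the telephone number from the second.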

The transmission unit 220 may transmit the user rule information to the text recognition device 120.

FIG. 3 is a configuration diagram of a text recognition device according to an embodiment of the present invention. Referring to FIG. 3, the text recognition device 120 may include a receiving unit 310, a preprocessing unit 320, a text recognition engine selection unit 330, a candidate text derivation unit 340, a comparison table generation unit 350, a final text derivation unit 360, a providing unit 370, and a learning unit 380.

The receiving unit 310 may receive image data including text from the user terminal 110. For example, the receiving unit 310 may receive, from the user terminal 110, medical bill receipt image data, receipt-based expense claim image data, business registration certificate image data, public institution document image data, and the like.

The preprocessing unit 320 may perform preprocessing on the received image data, such as rotation correction, crease correction, noise removal, watermark removal, and edge sharpening.

The text recognition engine selection unit 330 may receive, from the user terminal 110, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data. Here, the at least two text recognition engines may each have been trained based on a different algorithm.
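A selection check along these lines is sketched below. The registry `AVAILABLE_ENGINES` and its engine names are hypothetical, and the sketch treats "different algorithms" as a hard requirement, whereas the text says only that the engines may be trained on different algorithms.

```python
# Hypothetical registry: engine name -> algorithm the engine was trained with.
AVAILABLE_ENGINES = {
    "engine_dbn": "DBN",
    "engine_cnn": "CNN",
    "engine_rnn": "RNN",
    "engine_gan": "GAN",
}


def validate_selection(selected, available=AVAILABLE_ENGINES):
    """Return the de-duplicated selection, or raise ValueError if it is invalid."""
    names = list(dict.fromkeys(selected))  # drop duplicates, keep order
    unknown = [name for name in names if name not in available]
    if unknown:
        raise ValueError(f"unknown engines: {unknown}")
    if len(names) < 2:
        raise ValueError("select at least two text recognition engines")
    if len({available[name] for name in names}) < 2:
        raise ValueError("selected engines should use different algorithms")
    return names
```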

The candidate text derivation unit 340 may derive, using the at least two selected text recognition engines, each candidate text recognized from the image data. Here, the candidate texts derived by the candidate text derivation unit 340 may differ from one another depending on the selected text recognition engine.

The final text derivation unit 360 may derive a final text by inputting each of the derived candidate texts into a pre-trained reinforcement learning model. The process of receiving a selection of the algorithms to be applied to the image data and deriving each candidate text is described in detail with reference to FIG. 4.

FIG. 4 is an exemplary diagram illustrating a process of receiving a selection of algorithms to be applied to image data according to an embodiment of the present invention. Referring to FIG. 4, the text recognition engine selection unit 330 may receive a selection of at least two text recognition engines, each including an algorithm to be applied to the image data. Here, a text recognition engine may have been trained based on an algorithm such as a DBN (Deep Belief Network), CNN (Convolution Neural Network), GAN (Generative Adversarial Network), or RNN (Recurrent Neural Network), and the at least two text recognition engines may each have been trained based on a different algorithm.

A DBN is a generative graphical model used in machine learning. In deep learning, it refers to a deep neural network composed of multiple layers of latent variables, with connections between layers but no connections between units within a layer. Because a DBN is a generative model, it can be used for pre-training: initial weights are learned through pre-training and then fine-tuned through backpropagation or another discriminative algorithm. Pre-trained initial weights are closer to the optimal weights than arbitrarily set initial weights, which can improve the performance and speed of the fine-tuning step.

A CNN is a type of multilayer perceptron designed to require minimal preprocessing. It consists of one or more convolutional layers with ordinary artificial neural network layers on top of them, and additionally makes use of weights and pooling layers. This structure allows a CNN to take full advantage of two-dimensional input data, so it can show excellent performance in the image and speech fields.

A GAN (generative adversarial network) consists of two neural networks that compete with each other: a generator, which produces data instances, and a discriminator, which examines data. The generator produces data that is passed to the discriminator, and the discriminator determines a label indicating whether that data is real or fake.
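For reference, the competition between the generator G and the discriminator D is commonly written as the minimax objective below. The source does not state this formula; it is the standard formulation from the general GAN literature, and the symbols p_data (the distribution of real data) and p_z (the noise prior fed to the generator) are notation introduced here.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator raises V by assigning high probability to real samples and low probability to generated ones, while the generator lowers V by producing samples the discriminator accepts as real.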

An RNN is a type of artificial neural network in which hidden nodes are connected by directed edges to form a cyclic structure. Because the network can accept inputs and produce outputs regardless of sequence length, its structure can be configured flexibly and in various ways as needed. An RNN is therefore suitable for processing data in which speech, characters, and the like appear sequentially.

For example, when the receiving unit 310 receives 'Hong Gil-dong_Hospital A medical receipt image data' 400 from the user terminal 110, the text recognition engine selection unit 330 may receive, from the user terminal 110, a selection of a first text recognition engine 411 including a 'CNN algorithm' and a second text recognition engine 412 including a 'GAN algorithm' from among a plurality of text recognition engines 410.

For example, the candidate text derivation unit 340 may derive a first candidate text using the first text recognition engine 411 including the 'CNN algorithm', and a second candidate text using the second text recognition engine 412 including the 'GAN algorithm'.
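The derivation of one candidate text per selected engine can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Engine` callable type, the function `derive_candidate_texts`, and the two stand-in engines with fixed outputs (including the misrecognized strings) are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical engine interface: image bytes in, recognized text entries out.
Engine = Callable[[bytes], List[str]]


def derive_candidate_texts(
    image: bytes, engines: Dict[str, Engine]
) -> Dict[str, List[str]]:
    """Run every selected engine on the same image and collect its output."""
    if len(engines) < 2:
        raise ValueError("at least two text recognition engines must be selected")
    return {name: engine(image) for name, engine in engines.items()}


# Stand-in engines returning fixed results, for illustration only.
def cnn_engine(image: bytes) -> List[str]:
    return ["medical expenses", "Hospital A", "telephone number", "234-4568"]


def gan_engine(image: bytes) -> List[str]:
    return ["medical expanses", "Hospital A", "telephone number", "234-4563"]


candidates = derive_candidate_texts(b"", {"cnn": cnn_engine, "gan": gan_engine})
```

Keeping the results keyed by engine name preserves which engine produced which candidate, which later steps need in order to compare the candidates side by side.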

Returning to FIG. 1, the comparison table generation unit 350 may generate a comparison table for the candidate texts. For example, the comparison table generation unit 350 may generate a comparison table that compares the first candidate text derived through the first text recognition engine with the second candidate text derived through the second text recognition engine.
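One way to build such a comparison table is to align the entries of the two candidate texts and flag the rows where they disagree. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment; the source does not describe how entries are matched, so this alignment method, the `Row` layout, and the sample entries are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# (entry from first candidate, entry from second candidate, entries agree?)
Row = Tuple[Optional[str], Optional[str], bool]


def build_comparison_table(first: List[str], second: List[str]) -> List[Row]:
    """Pair up related entries of two candidate texts, row by row."""
    rows: List[Row] = []
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = first[i1:i2], second[j1:j2]
        # Pad the shorter side with None when one engine found extra entries.
        for k in range(max(len(left), len(right))):
            a = left[k] if k < len(left) else None
            b = right[k] if k < len(right) else None
            rows.append((a, b, tag == "equal"))
    return rows


table = build_comparison_table(
    ["medical expenses", "Hospital A", "telephone number", "234-4568"],
    ["medical expanses", "Hospital A", "telephone number", "234-4563"],
)
```

Rows whose flag is `False` are the ones a user would need to review when choosing the correct entry.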

The providing unit 370 may provide the generated comparison table to the user terminal 110.

The receiving unit 310 may receive, from the user terminal 110, user rule information generated based on the comparison table. For example, when at least one candidate text misrecognizes '2021.10.03' as '2021,10,03', the receiving unit 310 may receive user rule information that designates, as the correct answer data, the information recognized with "." in the first candidate text or the second candidate text.

The final text derivation unit 360 may derive the final text based on the received user rule information.
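Deriving the final text from user rule information can be sketched as follows, assuming the rule information is a mapping from a comparison-table row index to the candidate the user selected (0 for the first candidate text, 1 for the second). That encoding, the function name `derive_final_text`, the default of falling back to the first candidate, and the sample rows are hypothetical; the source only states that the final text is derived based on the received user rule information.

```python
from typing import Dict, List, Optional, Tuple

# One comparison-table row: (entry from first candidate, entry from second).
Row = Tuple[Optional[str], Optional[str]]


def derive_final_text(rows: List[Row], selections: Dict[int, int]) -> List[str]:
    """Pick one entry per row, honoring the user's selections."""
    final: List[str] = []
    for index, row in enumerate(rows):
        choice = selections.get(index, 0)  # assumed default: first candidate
        picked = row[choice]
        if picked is None:  # the chosen engine produced nothing for this row
            picked = row[1 - choice]
        if picked is not None:
            final.append(picked)
    return final


rows = [
    ("medical expenses", "medical expanses"),
    ("Hospital A", "Hospital A"),
    ("telephone number", "telephone number"),
    ("234-4568", "234-4563"),
]
# The user selects the telephone number from the second candidate text.
final_text = derive_final_text(rows, {3: 1})
```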

The providing unit 370 may provide the derived final text to the user terminal 110.

The learning unit 380 may train the reinforcement learning model so that it recognizes a state from each candidate text and the user rule information input into the model, and derives, from the recognized state, an action that satisfies a reward function.
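The source does not specify the state encoding, the reward function, or the learning algorithm. As a rough sketch under stated assumptions, the example below treats the kind of field being read as the state, the choice of engine as the action, and agreement with the user's selection as the reward, and applies a one-step (bandit-style) value update. The class `EngineChooser`, the state labels, and the learning rate are all hypothetical, and this simplified update is a stand-in, not the patent's method.

```python
from collections import defaultdict
from typing import DefaultDict, List


class EngineChooser:
    """Tabular value estimates Q[state][action], where action = engine index."""

    def __init__(self, n_engines: int, alpha: float = 0.5) -> None:
        self.alpha = alpha  # learning rate (assumed value)
        self.q: DefaultDict[str, List[float]] = defaultdict(
            lambda: [0.0] * n_engines
        )

    def update(self, state: str, candidates: List[str], user_choice: str) -> None:
        """Reward 1.0 for each engine whose output matches the user's selection."""
        for action, text in enumerate(candidates):
            reward = 1.0 if text == user_choice else 0.0
            self.q[state][action] += self.alpha * (reward - self.q[state][action])

    def act(self, state: str) -> int:
        """Return the engine index with the highest estimated value."""
        values = self.q[state]
        return max(range(len(values)), key=values.__getitem__)


chooser = EngineChooser(n_engines=2)
# User rule information: the second engine read the phone number correctly,
# and the first engine read the label correctly.
chooser.update("phone", ["234-4568", "234-4563"], "234-4563")
chooser.update("label", ["medical expenses", "medical expanses"], "medical expenses")
```

Because the user's selection reveals the correct entry, every engine's output can be scored on each example, so the sketch updates all actions at once rather than exploring one action at a time.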

FIGS. 5A and 5B are exemplary diagrams for explaining a process of deriving a final text from image data according to an embodiment of the present invention.

FIG. 5A is an exemplary diagram illustrating user rule information according to an embodiment of the present invention. Referring to FIG. 5A, assume that the text recognition device 120 has derived a first candidate text 500 using a first text recognition engine including a first algorithm, and a second candidate text 510 using a second text recognition engine including a second algorithm.

The text recognition device 120 may generate a comparison table that compares the first candidate text 500 with the second candidate text 510 and transmit it to the user terminal 110. As shown in FIG. 5A, the comparison table may be a table that associates and displays the candidate texts with one another. That is, the comparison table may link the mutually related entries among the entries constituting the first candidate text 500 and the entries constituting the second candidate text 510 (for example, by connecting them with a solid line), so that the user can easily see how they correspond.

The text recognition device 120 may receive, from the user terminal 110, user rule information generated based on the comparison table. For example, the text recognition device 120 may receive user rule information in which 'medical expenses', 'Hospital A', 'telephone number', and the like are selected from the first candidate text 500, and '234-4563' is selected from the second candidate text 510.

FIG. 5B is an exemplary diagram illustrating a final text according to an embodiment of the present invention. Referring to FIGS. 4 and 5B, when the text recognition device 120 receives the 'Hong Gil-dong_Hospital A medical receipt image data' 400, the text recognition device 120 may derive the final text recognized from the received image data.

FIG. 6 is a flowchart of a method of recognizing text from an image in a text recognition device according to an embodiment of the present invention. The method of recognizing text from an image in the text recognition device 120 shown in FIG. 6 includes steps processed in time series by the text recognition system 1 according to the embodiments shown in FIGS. 1 to 5B. Therefore, even where omitted below, the description given with reference to FIGS. 1 to 5B also applies to the method of recognizing text from an image in the text recognition device 120.

In step S610, the text recognition device 120 may receive image data including text from the user terminal 110.

In step S620, the text recognition device 120 may receive, from the user terminal 110, a selection of at least two text recognition engines, each including an algorithm to be applied to the image data.

In step S630, the text recognition device 120 may derive each candidate text recognized from the image data using the at least two selected text recognition engines.

In step S640, the text recognition device 120 may derive a final text by inputting each of the derived candidate texts into a pre-trained reinforcement learning model.

In step S650, the text recognition device 120 may provide the derived final text to the user terminal 110.

In the above description, steps S610 to S650 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as needed, and the order of the steps may be changed.
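The order of steps S610 to S650 can be summarized in a short sketch. The function `recognize`, the engine registry, and the `FinalModel` callable standing in for the pre-trained reinforcement learning model are hypothetical names introduced for illustration; the stub model below simply prefers a candidate containing "." and is not a trained model.

```python
from typing import Callable, Dict, List

Engine = Callable[[bytes], str]           # hypothetical: image bytes -> text
FinalModel = Callable[[List[str]], str]   # hypothetical: candidates -> final text


def recognize(
    image: bytes,               # S610: image data received from the user terminal
    selected: List[str],        # S620: engine names selected at the user terminal
    registry: Dict[str, Engine],
    model: FinalModel,
) -> str:
    if len(selected) < 2:
        raise ValueError("at least two text recognition engines must be selected")
    engines = [registry[name] for name in selected]
    # S630: derive one candidate text per selected engine.
    candidates = [engine(image) for engine in engines]
    # S640: feed the candidates to the (stand-in) model to get the final text.
    final_text = model(candidates)
    # S650: the return value is what would be provided to the user terminal.
    return final_text


# Stub engines and model, for illustration only.
registry: Dict[str, Engine] = {
    "cnn": lambda image: "2021,10,03",
    "gan": lambda image: "2021.10.03",
}


def stub_model(candidates: List[str]) -> str:
    return next((c for c in candidates if "." in c), candidates[0])


result = recognize(b"", ["cnn", "gan"], registry, stub_model)
```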

FIG. 7 is a flowchart of a method of receiving, at a user terminal, text recognized from an image according to an embodiment of the present invention. The method of receiving text recognized from an image at the user terminal 110 shown in FIG. 7 includes steps processed in time series by the text recognition system 1 according to the embodiments shown in FIGS. 1 to 6. Therefore, even where omitted below, the description given with reference to FIGS. 1 to 6 also applies to the method of receiving text recognized from an image at the user terminal 110.

In step S710, the user terminal 110 may receive a selection of an image in which text is to be recognized from among a plurality of images.

In step S720, the user terminal 110 may transmit the selected image to the text recognition device 120.

In step S730, the user terminal 110 may receive a selection of at least two text recognition engines, each including an algorithm to be applied to the selected image.

In step S740, the user terminal 110 may receive, from the text recognition device 120, the final text derived through the at least two selected text recognition engines.

In the above description, steps S710 to S740 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

The method of receiving, at a user terminal, text recognized from an image and the method of recognizing, at a text recognition device, text from an image, both described with reference to FIGS. 1 to 7, may also be implemented in the form of a computer program stored on a medium executed by a computer, or in the form of a recording medium containing instructions executable by a computer. In addition, these methods may also be implemented in the form of a computer program stored on a medium executed by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and non-volatile media and both removable and non-removable media. In addition, computer-readable media may include computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: User terminal
120: Text recognition device
210: Image selection unit
220: Transmission unit
230: Text recognition engine selection unit
240: Providing unit
250: User rule information input unit
310: Receiving unit
320: Preprocessing unit
330: Text recognition engine selection unit
340: Candidate text derivation unit
350: Comparison table generation unit
360: Final text derivation unit
370: Providing unit
380: Learning unit

Claims

A device for recognizing text from an image, comprising:
a receiving unit that receives image data including text from a user terminal;
a text recognition engine selection unit that receives, from the user terminal, a selection of at least two text recognition engines including algorithms to be applied to the image data;
a candidate text derivation unit that derives each candidate text recognized from the image data using the selected at least two text recognition engines;
a final text derivation unit that inputs each of the derived candidate texts into a pre-trained reinforcement learning model to derive a final text; and
a providing unit that provides the derived final text to the user terminal,
the device further comprising a comparison table generation unit that generates a comparison table for each candidate text,
wherein the providing unit provides the generated comparison table to the user terminal,
the receiving unit receives, from the user terminal, user rule information generated based on the comparison table,
the final text derivation unit derives the final text based on the received user rule information, and
the comparison table is
a table that associates and displays the candidate texts with one another.

The text recognition device of claim 1,
further comprising a preprocessing unit that performs, on the received image data, at least one preprocessing operation among rotation correction, crease correction, noise removal, watermark removal, and edge sharpness enhancement.

(Deleted)

The text recognition device of claim 1,
further comprising a learning unit that trains the reinforcement learning model to recognize a state from each candidate text and the user rule information input into the reinforcement learning model, and to derive, from the recognized state, an action that satisfies a reward function.

A method of receiving, at a user terminal, text recognized from an image, the method comprising:
receiving a selection of an image in which text is to be recognized from among a plurality of images;
transmitting the selected image to a text recognition device;
receiving a selection of at least two text recognition engines including algorithms to be applied to the selected image; and
receiving, from the text recognition device, a final text derived through the selected at least two text recognition engines,
wherein the final text is derived by the text recognition device deriving each candidate text recognized from the image using the selected at least two algorithms
and inputting each derived candidate text into a pre-trained reinforcement learning model,
the method further comprising:
receiving, from the text recognition device, a comparison table generated for each candidate text;
receiving an input of user rule information based on the comparison table; and
transmitting the input user rule information to the text recognition device,
wherein the comparison table is
a table that associates and displays the candidate texts with one another.

(Deleted)

A user terminal that receives text recognized from an image, comprising:
an image selection unit that receives a selection of an image in which text is to be recognized from among a plurality of images;
a transmission unit that transmits the selected image to a text recognition device;
an engine selection unit that receives a selection of at least two text recognition engines including algorithms to be applied to the selected image; and
a providing unit that receives, from the text recognition device, a final text derived through the selected at least two text recognition engines,
wherein the final text is derived by the text recognition device deriving each candidate text recognized from the image using the selected at least two algorithms
and inputting each derived candidate text into a pre-trained reinforcement learning model,
the providing unit receives, from the text recognition device, a comparison table generated for each candidate text,
the user terminal further comprises a user rule information input unit that receives an input of user rule information based on the comparison table,
the transmission unit transmits the input user rule information to the text recognition device, and
the comparison table is
a table that associates and displays the candidate texts with one another.

(Deleted)

A method of recognizing text from an image in a text recognition device, the method comprising:
receiving image data including text from a user terminal;
receiving, from the user terminal, a selection of at least two text recognition engines including algorithms to be applied to the image data;
deriving each candidate text recognized from the image data using the selected at least two text recognition engines;
inputting each of the derived candidate texts into a pre-trained reinforcement learning model to derive a final text; and
providing the derived final text to the user terminal,
the method further comprising:
generating a comparison table for each candidate text;
providing the generated comparison table to the user terminal;
receiving, from the user terminal, user rule information generated based on the comparison table; and
deriving the final text based on the received user rule information,
wherein the comparison table is
a table that associates and displays the candidate texts with one another.

The text recognition method of claim 12,
further comprising performing, on the received image data, at least one preprocessing operation among rotation correction, crease correction, noise removal, watermark removal, and edge sharpness enhancement.

(Deleted)

The text recognition method of claim 12,
further comprising training the reinforcement learning model to recognize a state from each candidate text and the user rule information input into the reinforcement learning model, and to derive, from the recognized state, an action that satisfies a reward function.