TWM645906U

TWM645906U - Document character recognition system

Info

Publication number: TWM645906U
Application number: TW111214321U
Authority: TW
Inventors: 詠恆鄭; 林宥樺; 林佳瑩; 倪文君; 吳昭儀; 王麒詳
Original assignee: 叡揚資訊股份有限公司
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-09-11

Abstract

The present creation provides a document character recognition system comprising a document analysis module, a pre-processing module, an OCR module, a post-processing module, and an interface module. The document analysis module receives a document image, determines a pre-processing method and a post-processing method suitable for that image, and notifies the pre-processing module and the post-processing module accordingly. The pre-processing module pre-processes the document image according to the pre-processing method; the OCR module performs text recognition on the pre-processed image and generates a corresponding document file; and the post-processing module post-processes the document file according to the post-processing method. The interface module provides a user interface for viewing and editing the contents of the document file.

Description

Document character recognition system

The present creation relates to optical character recognition (OCR).

For character recognition of documents, existing solutions generally treat OCR as a task performed on a single type of data. To recognize a specific kind of document, a model must be manually retrained on the target documents, or a dedicated strategy must be designed to improve recognition; pre-processing for different types of target documents likewise requires human intervention. In addition, how to further improve OCR results so that the resulting document files can be put to better use afterwards is also a problem to be solved in this field.

To solve the problems of the prior art described above, the present creation provides a document character recognition system comprising a plurality of hardware circuits electrically connected to one another and arranged to be configured into a plurality of modules, the modules comprising a document analysis module, a pre-processing module, an OCR module, a post-processing module, and an interface module, wherein: the document analysis module receives a document image, determines, based on a reinforcement learning model, a pre-processing method and a post-processing method suitable for the document image, and notifies the pre-processing module and the post-processing module respectively; the pre-processing module pre-processes the document image according to the pre-processing method; the OCR module performs text recognition on the pre-processed document image and generates a corresponding document file; the post-processing module post-processes the document file according to the post-processing method; and the interface module provides a user interface for viewing and editing the contents of the document file.

According to some embodiments of the present creation, the user interface allows a user to provide feedback on the processing performed by the pre-processing module, the OCR module, or the post-processing module.

In some embodiments, feedback on the processing performed by the pre-processing module or the post-processing module is provided to the document analysis module and is used to train the reinforcement learning model.

According to some embodiments of the present creation, the model used by the OCR module for text recognition includes an instance-based learning model. In some embodiments, the user interface allows a user to provide feedback on the processing performed by the OCR module. In some embodiments, feedback on the processing performed by the OCR module is provided to the OCR module and is used to train the instance-based learning model.

According to some embodiments of the present creation, the interface module provides feedback to the pre-processing module, the OCR module, or the post-processing module based on a user's edits to the content.

According to some embodiments of the present creation, the interface module provides feedback to the OCR module based on a user's edits to the content, and the feedback is used to train the instance-based learning model.

Other objects and advantages of the present creation are set out in part in the description below, or may be understood through embodiments of the present creation. It should be understood that the foregoing summary and the following detailed description are exemplary and explanatory only, and do not limit the present creation in the way the claims do.

Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which the present creation belongs.

The term "a" or "an" as used herein, unless otherwise specified, means a quantity of at least one (one or more).

The present creation provides a document character recognition system comprising a document analysis module, a pre-processing module, an optical character recognition (OCR) module, a post-processing module, and an interface module, wherein: the document analysis module receives a document image, determines, based on a reinforcement learning model, a pre-processing method and a post-processing method suitable for the document image, and notifies the pre-processing module and the post-processing module respectively; the pre-processing module pre-processes the document image according to the pre-processing method; the OCR module performs text recognition on the pre-processed document image and generates a corresponding document file; the post-processing module post-processes the document file according to the post-processing method; and the interface module provides a user interface for viewing and editing the contents of the document file.

The document character recognition system of the present creation may comprise five signal-connected modules: a document analysis module, a pre-processing module, an OCR module, a post-processing module, and an interface module. The five modules may be implemented by hardware circuits alone, or by software in combination with hardware circuits.

According to some embodiments of the present creation, the document analysis module, pre-processing module, OCR module, post-processing module, and interface module may be provided in a server.

In some embodiments, the document character recognition system of the present creation comprises a plurality of hardware circuits electrically connected to one another and arranged to be configured into a plurality of modules, the modules comprising the document analysis module, pre-processing module, OCR module, post-processing module, and interface module described above.

The user interface provided by the interface module may be displayed on the display unit of an electronic device held by the user.

As used herein, "electronic device" includes, but is not limited to, a desktop computer, a laptop computer, a tablet computer, or a smartphone.

One or more of the document analysis module, pre-processing module, OCR module, post-processing module, and interface module described above may be implemented by a software program executed by a processing unit (for example, a processor) of the server or of the electronic device. The "software program" includes, but is not limited to, a mobile application (app).

As used herein, "pre-processing" refers to image processing applied to the input image, including but not limited to skew correction, adaptive contrast and brightness adjustment, removal of stamps such as seals or watermarks, cropping of specific image regions, and reduction of noise such as lighting and shadow artifacts.
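One way to picture a "pre-processing method" is as an ordered list of named image operations. The sketch below is only an illustration under stated assumptions: the grayscale list-of-lists image representation, the two operations, and the names `PREPROCESSORS` and `preprocess` are hypothetical and do not come from the source.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale pixels in 0-255 (hypothetical representation)


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale pixel values so they span the full 0-255 range."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def crop_margin(img: Image, margin: int = 1) -> Image:
    """Drop `margin` pixels from every edge (a stand-in for region cropping)."""
    return [row[margin:len(row) - margin] for row in img[margin:len(img) - margin]]


# Hypothetical registry mapping operation names to functions.
PREPROCESSORS: Dict[str, Callable[[Image], Image]] = {
    "stretch_contrast": stretch_contrast,
    "crop_margin": crop_margin,
}


def preprocess(img: Image, method: List[str]) -> Image:
    """Apply the named operations in order; `method` is the chosen recipe."""
    for name in method:
        img = PREPROCESSORS[name](img)
    return img
```

Representing a method as a list of names keeps the set of possible recipes enumerable, which is convenient if another component has to choose among them.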

As used herein, "post-processing" mainly refers to processing of the text recognized from the image and its related information (for example, the position of a text line or the attribute category of a text line), including but not limited to alphanumeric correction, checking of monetary amounts, flagging of recognition confidence, analysis and detection of fields and tables, formatting of recognition results, and document image classification. Preferably, post-processing gives the output formatted information to facilitate subsequent use.
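Two of the listed post-processing steps, alphanumeric correction and checking of monetary amounts, can be sketched as follows. This is a minimal illustration, not the method of the source: the confusion table `CONFUSABLE` and both function names are assumptions, and amounts are assumed to be whole numbers.

```python
import re
from typing import List

# Hypothetical table of letters that OCR output may contain in place of digits.
CONFUSABLE = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def correct_amount(text: str) -> int:
    """Map look-alike letters to digits, drop separators, and parse an integer."""
    cleaned = text.translate(CONFUSABLE)
    digits = re.sub(r"[^0-9]", "", cleaned)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def check_total(line_items: List[str], total: str) -> bool:
    """Return True when the corrected line items add up to the corrected total."""
    return sum(correct_amount(item) for item in line_items) == correct_amount(total)
```

A failed check does not say which field is wrong; it only marks the document as one a user may want to review in the interface.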

If document analysis is carried out with a plain classification approach, it is constrained by the fixed set of classes: defining an entirely new kind of document requires retraining the analysis model, which also raises the question of how much data is available.

Document analysis in the present creation uses a self-supervised approach: document types are distinguished by clustering over metadata and image features (extracted with convolutional or transformer-based models). Although this approach does not assign a document to an explicitly defined type, it yields richer information from the image data of the input document for use in downstream tasks. The document may be, without limitation, a financial statement or an official document.
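The clustering step can be illustrated with plain k-means over feature vectors. The source does not name a clustering algorithm, so k-means is an assumption here. Each vector is assumed to already combine metadata and image features, and the feature extraction itself is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k points seed the centroids (deterministic for the demo)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The returned cluster index is not a named document type. It is simply a group label that downstream components can use as context.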

In addition, the document analysis module of the present creation uses a reinforcement learning model to determine or recommend suitable pre-processing and post-processing methods for an input document image. Specifically, existing document image data and the pre- and post-processing applied to it serve as the basis: the pre-/post-processing methods are treated as actions, each domain-specific kind of document is treated as the environment, and the information of the document is set as the state. The agent model then selects a processing mode on its own, and the recognition efficiency measured on the processed document is taken as the reward. Over many iterations, the agent learns which combination of pre-/post-processing (action) gives better recognition efficiency (reward) for which kind of document image (state), thereby arriving at a combination of pre- and post-processing adapted to the specific document.
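The state/action/reward framing above can be sketched as an epsilon-greedy value table. This is a one-step, bandit-style simplification chosen for brevity, not the reinforcement learning model of the source. The class name `MethodSelector`, the use of a document-type key as the state, and the reward scale are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Action = Tuple[str, str]  # (pre-processing method, post-processing method)


class MethodSelector:
    """Keeps a running average reward for every (state, action) pair."""

    def __init__(self, actions: List[Action], epsilon: float = 0.1, seed: int = 0):
        self.actions = actions
        self.epsilon = epsilon  # probability of exploring a random action
        self.rng = random.Random(seed)
        self.value: Dict[Tuple[Hashable, Action], float] = defaultdict(float)
        self.count: Dict[Tuple[Hashable, Action], int] = defaultdict(int)

    def best(self, state: Hashable) -> Action:
        """Action with the highest estimated reward for this state."""
        return max(self.actions, key=lambda a: self.value[(state, a)])

    def choose(self, state: Hashable) -> Action:
        """Explore with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(state)

    def update(self, state: Hashable, action: Action, reward: float) -> None:
        """Fold one observed reward (e.g. recognition accuracy) into the average."""
        key = (state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

In use, `choose` would be called when a document image arrives and `update` once its recognition result has been scored. User feedback collected through the interface could be fed in through the same `update` call.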

In addition, the user interface may allow the user to provide feedback on the pre-/post-processing, and the feedback received may be used to further train the reinforcement learning model.

As for OCR processing, the present creation is not limited to a specific OCR model; any model that can detect text boxes and recognize the corresponding text will do. Naturally, recognition efficiency depends on the stability and accuracy of the model used.

According to some embodiments of the present creation, the model used by the OCR module for text recognition includes an instance-based learning model (also called a memory-based learning model), and preferably the feedback provided by the user on the OCR processing is used to train the instance-based learning model.

In some embodiments, the text box information fed back by the user (including the center position of the text box and its corresponding length and width) and the fed-back recognized text are used to train the instance-based machine learning model, together with a knowledge distillation method. When the OCR model is updated from feedback, catastrophic forgetting must be taken into account; both instance-based learning and knowledge distillation serve to carry previously learned information over to subsequent models. First, instance-based learning provides protection at the data level: for example, a class-wise multivariate analysis (such as a Gaussian mixture model) is performed on historical data to select representative samples, which are stored in memory so that the model can draw on earlier data and update stably when it is updated from feedback. Second, knowledge distillation carries forward the knowledge learned by the previous model (that is, the parameter information in the model): through the design of the loss function, the previous model's predictions are blended with the new model's predictions, so that the parameter information of the previous model is carried into the new model.
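The idea of blending the previous model's predictions into the loss can be illustrated with a common form of distillation loss. The source does not give a formula, so the weighting `alpha`, the softmax `temperature`, and the function names below are assumptions. The logits stand for per-character class scores from the old and new models.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(new_logits: List[float], old_logits: List[float], label: int,
                      alpha: float = 0.5, temperature: float = 2.0) -> float:
    """(1 - alpha) * loss on the fed-back label + alpha * loss against the old model."""
    hard_target = [1.0 if i == label else 0.0 for i in range(len(new_logits))]
    hard = cross_entropy(hard_target, softmax(new_logits))
    soft = cross_entropy(softmax(old_logits, temperature),
                         softmax(new_logits, temperature))
    return (1 - alpha) * hard + alpha * soft
```

With `alpha` at 0 the update follows the user feedback only. Raising it pulls the new model toward the previous model's behavior, which is how this form of loss limits forgetting.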

In addition, if the adjustment of the pre-/post-processing modules fails to converge, the "image conversion and generation" function of the OCR module can be triggered to augment the image (data augmentation) and assist the overall feedback training process. Unlike the commonly discussed use of generation to produce more data, the approach here is to have a generative model (for example, a generative adversarial network-based model or a diffusion-based model) learn the image features and patterns that the current model is able to recognize. When the same text is printed or written in a different way and the recognition rate drops, the image to be recognized can be regenerated with this method: the text image is synthesized and enhanced while the existing text information is preserved, producing a text image on which the model achieves a higher recognition rate.

A specific embodiment of the document character recognition system of the present creation is described below with reference to FIG. 1.

As shown in FIG. 1, the document character recognition system 100 comprises a document analysis module 110, a pre-processing module 120, an OCR module 130, a post-processing module 140, and an interface module 150. The document analysis module 110 is connected (electrically or by signal) to the pre-processing module 120 and to the post-processing module 140; the OCR module 130 is likewise connected (electrically or by signal) to the pre-processing module 120 and to the post-processing module 140; and the interface module 150 is connected (electrically or by signal) to the pre-processing module 120, the OCR module 130, and the post-processing module 140.

After receiving a document image (for example, a scanned document), the document analysis module 110 determines, based on a reinforcement learning model, the pre-processing method and post-processing method suitable for the document image, and notifies the pre-processing module 120 and the post-processing module 140 respectively.

The pre-processing module 120 pre-processes the document image according to the pre-processing method; the pre-processed document image is then passed to the OCR module 130, which performs text recognition and generates a corresponding document file.

Next, the post-processing module 140 post-processes the document file according to the post-processing method. Finally, the interface module 150 may provide a user interface through an electronic device so that the user can view and edit the contents of the document file.
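The data flow through modules 110 to 140 can be summarized as a small orchestration sketch. It is illustrative only: the class `DocumentPipeline`, the use of plain strings to stand in for images and document files, and the callable signatures are assumptions. The interface module 150 is left out because it acts on the returned document file.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DocumentPipeline:
    # Each field stands in for one module; the signatures are hypothetical.
    analyze: Callable[[str], Tuple[str, str]]   # 110: image -> (pre method, post method)
    preprocess: Callable[[str, str], str]       # 120: (image, pre method) -> image
    recognize: Callable[[str], str]             # 130: image -> document file
    postprocess: Callable[[str, str], str]      # 140: (document, post method) -> document

    def run(self, image: str) -> str:
        """Analyze first, then pre-process, recognize, and post-process in order."""
        pre_method, post_method = self.analyze(image)
        cleaned = self.preprocess(image, pre_method)
        document = self.recognize(cleaned)
        return self.postprocess(document, post_method)
```

Because the analysis step runs before any processing, both method choices are fixed per document up front, matching the order in which module 110 notifies modules 120 and 140.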

In addition, the interface module 150 may allow the user to provide feedback on the processing performed by the pre-processing module 120, the OCR module 130, and/or the post-processing module 140. The feedback may be used to train the reinforcement learning model of the document analysis module 110, or to train the machine learning model in the OCR module 130.

100: Document character recognition system

110: Document analysis module

120: Pre-processing module

130: OCR module

140: Post-processing module

150: Interface module

FIG. 1 is a structural diagram of a specific embodiment of the document character recognition system of the present creation.


Claims

1. A document character recognition system, comprising: a document analysis module; a pre-processing module connected to the document analysis module; an OCR module connected to the pre-processing module; a post-processing module connected to the document analysis module and the OCR module; and an interface module connected to the pre-processing module, the OCR module, and the post-processing module; wherein: the document analysis module receives a document image, determines, based on a reinforcement learning model, a pre-processing method and a post-processing method suitable for the document image, and notifies the pre-processing module and the post-processing module respectively; the pre-processing module pre-processes the document image according to the pre-processing method; the OCR module performs text recognition on the pre-processed document image and generates a corresponding document file; the post-processing module post-processes the document file according to the post-processing method; and the interface module provides a user interface for viewing and editing the contents of the document file.

2. The document character recognition system of claim 1, wherein the user interface allows a user to provide feedback on the processing of the pre-processing module, the OCR module, or the post-processing module.

3. The document character recognition system of claim 2, wherein feedback on the processing of the pre-processing module or the post-processing module is provided to the document analysis module, and the feedback is used to train the reinforcement learning model.

4. The document character recognition system of claim 1, wherein the model used by the OCR module for text recognition includes an instance-based learning model.

5. The document character recognition system of claim 4, wherein the user interface allows a user to provide feedback on the processing of the OCR module.

6. The document character recognition system of claim 5, wherein feedback on the processing of the OCR module is provided to the OCR module, and the feedback is used to train the instance-based learning model.

7. The document character recognition system of claim 1, wherein the interface module provides feedback to the pre-processing module, the OCR module, or the post-processing module based on a user's editing of the content.

8. The document character recognition system of claim 4, wherein the interface module provides feedback to the OCR module based on a user's editing of the content, and the feedback is used to train the instance-based learning model.