TW509878B

TW509878B - Correction method for table characters and the user interface thereof

Info

Publication number: TW509878B
Application number: TW88108948A
Authority: TW
Inventors: Bai-Gang Huang; Bing-Kai Shiu; Ya-Shiuan Huang; Jung-You Suen
Original assignee: Ind Tech Res Inst
Priority date: 1999-05-31
Filing date: 1999-05-31
Publication date: 2002-11-11

Abstract

The present invention discloses a method for correcting table (form) characters and a user interface therefor. The method includes the following steps: first, possible recognition errors are corrected automatically according to the definition and range of each character; next, field character values are corrected automatically according to the recognition results of check-mark items, reducing the number of characters to be corrected; then, the connected-character attribute is determined from the position of the character image and the positions of the learned samples, improving the efficiency of the correction operation; finally, characters to be corrected from different forms and different fields are displayed on the user interface, which provides the character images, the character values, the surrounding image of each character, a connected-character processing function, and other related information, so as to improve the performance of character correction.

Description

Field of the Invention: The present invention relates to a method for correcting recognized characters, and in particular to the processing of characters to be corrected that a recognition device extracts from form documents.

Background of the Invention: Administrative agencies at all levels and industrial and commercial organizations, in order to survey or compile statistics on various types of data, print forms in a fixed format and have personnel conduct the survey and fill in the data. Take the monthly human resources survey of the Directorate-General of Budget, Accounting and Statistics of the Executive Yuan as an example: each survey produces an average of roughly 25 thousand survey forms, and each form usually contains numerous fields to be filled in. To compile statistics on this large volume of form data, the content of each form is traditionally stored as image data by an image scanning device. Form field positioning technology is then used to find the positions of the fields on the form and to extract the handwritten image data in those fields. A recognition system, for example an optical character recognition (OCR) system, then recognizes the characters that the image data represents, and the data is stored and processed to provide the information that the organizations concerned require.

Form positioning technology still fails to locate a small number of fields; these fields can be handled only later, when the whole form is corrected manually as an image. Among the fields that are located successfully, the recognition system cannot achieve a 100% character recognition rate either, so it produces recognition-failed characters, such as rejected characters or characters recognized with low confidence. These recognition-failed characters are displayed on the screen as characters to be corrected so that the user can correct them manually.

FIG. 1 shows the data processing flowchart of a conventional character correction system. As shown in the figure, the flow starts at block 2, and the recognition results produced by the recognition system are input at block 4. Note that only the failed part of the recognition results is input to the character correction system here; successfully recognized data is not included among the characters to be corrected. The characters to be corrected then enter the character correction procedure at block 6, in which the image data of each recognition-failed character is displayed individually on the screen and corrected manually, character by character. When all character corrections are complete, a field-based correction procedure is performed, as represented by block 8, followed by a correction procedure based on the whole form, as shown in block 10. Finally, the data processing flow of the character correction system ends at block 12.

The field correction procedure and the form correction procedure often need to display part or all of the form image, which tends to require a large amount of data processing time. In addition, moving the form image while correcting data requires long waits and slows down the operation. In this conventional technique, performing a character correction procedure before the field and form correction procedures helps reduce the burden of the later field and form corrections. It also allows data from several different forms and different form fields to be corrected at the same time, so that when a character image is clear the correction can be completed without inspecting other parts of the image, which speeds up the overall correction. Because the character correction procedure plays a decisive role in the processing speed of the whole form, improving its capability strongly affects the speed at which form data can be read. The existing problems of character correction technology are therefore examined more closely below.

Regarding character recognition errors, apart from the imperfection of recognition technology itself, the following main causes can be identified in practice:
(1) The form image is of poor quality and noisy when input.
(2) The field character image is partly covered by a seal stamp and is noisy.
(3) The character image area is smeared, has lines drawn through it, or is creased.
(4) The form production error is too large, causing field positioning to fail.
(5) The handwriting is too sloppy, too large, or too small.
(6) The writing instrument's color is too light or too dark, or too large or too small.
(7) Handwritten characters are connected, so that two characters are mistaken for one character.

Further analysis and comparison of these failure modes divides the recognition errors of the recognition system into the following four classes:
(1) As shown in FIG. 2A, the range captured by the recognition system contains only noise and no character image.
(2) As shown in FIG. 2B, the range captured by the recognition system contains a character image, but excessive noise causes recognition to fail.
(3) As shown in FIG. 2C, the position at which the recognition system captures the character is offset or wrong, so the character image is incomplete.
(4) As shown in FIG. 2D, the character images in the range captured by the recognition system are connected, causing recognition to fail.

The number of characters to be corrected grows rapidly as the number of forms to be processed rises, placing a heavy load on the character correction operation. Under these circumstances, a powerful form character correction system is urgently needed.

Such a system must be able to handle the problems described above.

Related research and development has been carried out in this field. For example, Republic of China Patent No. 0873 19 (Xu Yingshi et al.) proposes an image display method for computer-aided data entry in which characters are corrected first, then fields, and finally the whole form. The method this reference proposes for manually correcting rejected characters in a character field is a correction procedure based on a single form. Its character correction screen is shown in FIG. 3: the file identification information of the original form containing the character to be corrected is displayed at reference numeral 14 at the bottom of the screen, reference numeral 16 shows the image of the character to be corrected, and a cursor lets the user make the correction directly. The drawback of this method is that when a handwritten character cannot be judged correctly because the image is unclear, the extraction position is offset, or noise interferes, the operator must skip the character; and when handwritten character images are connected, the characters cannot be corrected in a single pass.

US Patent 5251273 (Betts et al.) proposes a system architecture, process flow, and data structure design for form reading, in which the errors remaining after a form is scanned and recognized are corrected in sequence. The apparatus of this reference includes recognition data processors and a machine-generated data structure that records the recognition results and the correction history and is passed to each processor. When error correction processing is complete, the field image is displayed on the workstation screen for manual correction.

Similarly, US Patent No. 5305396 (Betts et al.) proposes a system architecture, process flow, and data structure design for form reading that corrects, in sequence, the errors remaining after scanning and recognition. The character recognition flow and the recognition data correction flow can be selected for different customers' forms. In this reference a form template is input before recognition; the template contains system operating parameters set according to customer requirements.

In short, during on-screen correction the prior art above can display only the image of the character to be corrected, a description of its field, and its recognition result. This approach cannot effectively correct noisy images, connected characters, or character images extracted at wrong or offset positions. Moreover, it does not take into account the constraints that a character's field places on the definition and range of the character's attribute, so the number of characters to be corrected cannot be reduced in advance. Against this background, the present invention proposes a high-performance form character correction method and user interface.

Objects and Summary of the Invention: An object of the present invention is to provide a form character correction method and user interface that effectively reduce the number of characters to be corrected after processing by the recognition system, automatically correct misrecognized characters in advance, and allow connected characters to be corrected directly. Another object of the present invention is to provide a user interface for a form character correction system that allows character correction across form fields. In view of the problems noted in the background and the related prior art, the present invention proposes a new form character correction method and a more capable user interface, summarized as follows.

After a form is processed by the recognition system, characters that failed recognition become characters to be corrected, and these enter the on-screen character correction procedure. The procedure first corrects each character automatically according to the attribute, definition, and reasonable range of values of its corresponding field, as follows:
(1) Check whether the value of the character to be corrected falls within the definition and range of its corresponding field.
(2) If so, the value of the character to be corrected is not modified. If not, the value is corrected by selecting the best candidate that conforms to the definition and range of the field.
Here, a candidate is one of the possible recognition results that the recognition system produces for the character to be corrected.

After the definition and range check, the character correction method of the present invention checks whether the form carries a check mark that can be used; if so, the value of the character to be corrected is corrected according to the check mark, as follows:
(1) Check whether the recognition result of the check-mark item is the same as the recognition result of the field character.
(2) If so, the character value is not modified.
If not, the character value is corrected to the recognition result of the check-mark item.

Note that the field definition and range check and the check-mark check described above are both handled inside the form character correction system and are not displayed on the screen. After these two automatic correction steps, the characters still to be corrected are displayed on the screen for direct manual correction. For the connected-character problem encountered during correction, the present invention provides the following flow:
(1) Check the image position of the character and judge whether it falls on more than one character position.
(2) If so, the character is judged to have the connected-character attribute. If not, it is judged to have the unconnected-character attribute.
The result of the connected-character judgment is displayed in the user interface provided by the present invention, so that the operator can identify and directly correct connected characters. The proposed user interface provides at least the following:
(1) A character image display area, which displays the image data of multiple characters to be corrected together with their recognition results and allows the operator to correct the recognition results directly on the screen.
(2) A character surrounding-image display area, which displays the image surrounding the character to be corrected.
(3) A connected-character function display area, which provides and displays information about connected characters to be corrected.
(4) Other related character image processing information.
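The check-mark rule and the connected-character rule summarized above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the names `Box`, `apply_check_mark`, and `is_connected` are hypothetical, character positions are assumed to be axis-aligned rectangles learned from samples, and the `min_fraction` overlap threshold is an assumption, since the text says only that the image "falls on more than one character position".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        return max(self.right - self.left, 0) * max(self.bottom - self.top, 0)


def overlap_area(a: Box, b: Box) -> int:
    """Area shared by two boxes, or 0 if they do not intersect."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return max(width, 0) * max(height, 0)


def apply_check_mark(char_value: str, check_mark_value: Optional[str]) -> str:
    """Keep the character value if it agrees with the check mark,
    otherwise replace it with the check-mark recognition result."""
    if check_mark_value is None:  # no usable check mark on the form
        return char_value
    if char_value == check_mark_value:
        return char_value
    return check_mark_value


def is_connected(image_box: Box, cell_boxes: Sequence[Box],
                 min_fraction: float = 0.2) -> bool:
    """Judge the connected-character attribute: True when the character
    image covers more than one character position.

    min_fraction is an assumed threshold: a position counts as covered
    when the image overlaps at least that fraction of its area."""
    covered = 0
    for cell in cell_boxes:
        if cell.area() > 0 and overlap_area(image_box, cell) >= min_fraction * cell.area():
            covered += 1
    return covered > 1


# Two adjacent character positions, as might be learned from samples.
cells = [Box(0, 0, 10, 10), Box(10, 0, 20, 10)]
print(is_connected(Box(2, 1, 18, 9), cells))  # True: spans both positions
print(is_connected(Box(1, 1, 9, 9), cells))   # False: stays in one position
print(apply_check_mark("7", "1"))             # 1
```

An area-fraction threshold is used here so that a stroke that barely crosses a position boundary is not reported as connected; a real system would tune or replace this criterion.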
In summary, checking characters in advance effectively reduces the number of characters to be corrected, and the user interface provided by the present invention offers connected-character handling and stronger text correction capability. Therefore, the form character correction method of the present invention ...


FIG. 6A is a flowchart of correcting a character recognition result using the character attribute range in the method of the present invention.
FIG. 6B is a schematic diagram of correcting a character recognition result using the character attribute range in the method of the present invention.
FIG. 6C is a schematic diagram of correcting a character recognition result using the character attribute range in the method of the present invention.
FIG. 7A is a flowchart of correcting a character recognition result using check marks in the method of the present invention.
FIG. 7B is a schematic diagram of correcting a character recognition result using check marks in the method of the present invention.
FIG. 8 is a schematic diagram of the user interface in the method of the present invention.
FIG. 9 shows a specific embodiment of the user interface of the present invention.
FIG. 10A is a schematic diagram of the single-character images and text in the user interface of the present invention.
FIG. 10B is a schematic diagram of the field surrounding image and field description in the user interface of the present invention.
FIG. 10C is a schematic diagram of the field surrounding image and field description in the user interface of the present invention.
FIG. 11 is a schematic diagram of the connected-character images and text in the user interface of the present invention.
FIG. 12A is a schematic diagram of the criterion for judging horizontally connected characters in the present invention.
FIG. 12B is a schematic diagram of the criterion for judging vertically connected characters in the present invention.
FIG. 12C is a flowchart of the connected-character judgment in the present invention.
FIG. 13 shows the display of connected-character images in the present invention.
FIG. 14 is a schematic diagram of the surrounding image and field description of a connected-character field in the user interface of the present invention.

Detailed Description of the Invention: The present invention proposes a new character correction system that first reduces the number of recognition-failed characters produced by the recognition system. The connected-character identification method and the enhanced user interface disclosed by the present invention then improve the correction capability of the on-screen character correction procedure and lighten the later burden of checking whole fields or whole forms. The spirit of the present invention is explained below with a specific embodiment.

Refer first to FIG. 4, which shows a functional block diagram of a system for extracting data from forms.
As shown in the figure, a form 20 is scanned by a scanner 22, converted into an image, and sent to a scanning system 24 for further processing. The scanning system 24 then works with the data and programs stored on a data carrier 26 (for example, a magnetic disk or an optical disc) to extract the form data and store it. The scanning system 24 stores the documents of each batch according to the scanned forms 20. Next, the recognition system 28 extracts the image data in each field from the form image 36 according to pattern templates learned in advance and performs recognition. When recognition is complete, a recognition result 38 is produced. In general, the recognition system 28 divides its results into two classes, recognition successes and recognition failures, according to the reliability of the recognition. The failed recognition results 38 are passed to the character correction system 30, which displays the failed items on the screen so that the data can be corrected and entered. Finally, the character correction system 30 stores the form data in a database for later use.

Referring to FIG. 5, after the recognition system has read and recognized a form, the form characters that failed recognition enter the on-screen character correction procedure for further correction. In the character correction flow shown in the figure, the procedure starts at block 44; at block 46, the data sent by the recognition system is then obtained.


The data obtained at block 46 describes the characters to be corrected. Then, at block 48, a preliminary correction is performed according to the attribute or character range that the field corresponding to each character should have, reducing the number of characters to be corrected. Besides handwritten characters, forms usually also carry check marks that correspond to the character values. If a character has a corresponding check mark, its value is corrected at block 50 according to the check mark. Note that correcting characters by the field attribute range in block 48 and by check marks in block 50 are both internal computations of the system and are not displayed on the screen. Then, at block 52, the connected-character condition of each character to be corrected is determined, and the character is displayed on the screen for direct manual correction. Block 54 checks whether all characters to be corrected have been corrected: if not, the flow returns to block 46; if so, the on-screen character correction procedure ends, as shown in block 56.

Referring next to FIG. 6A, which shows a flowchart of correcting characters using the character range and attribute in the present invention, the flow is as follows:
(1) Start the flow, as in block 57.
(2) Check whether the result for the recognition-failed character falls within the definition and range of the field character, as in block 59.
(3) If the answer in step (2) is yes, the character is not corrected and block 67 is executed directly to judge whether all characters have been checked: if so, the flow stops, as in block 69; if not, the flow shown in block 59 is repeated with the next character to be corrected.
(4) If the answer in step (2) is no, block 63 is executed to check whether there is a candidate that conforms to the definition and range of the character.
(5) If the answer in step (4) is no, the character is not corrected and block 67 is executed directly to judge whether all characters have been checked: if so, the flow stops, as in block 69; if not, the flow shown in block 59 is repeated with the next character to be corrected.
(6) If the answer in step (4) is yes, block 65 corrects the character with that candidate.
(7) Judge whether all characters have been checked, as in block 67: if so, the flow stops, as in block 69; if not, the flow shown in block 57 is repeated with the next character to be corrected until all characters have been checked.

FIGs. 6B and 6C illustrate correction based on the attribute and range of the field corresponding to a character. Before the explanation, a notation is defined:

Fm-n denotes the n-th character of the field numbered m in the form.

As shown in FIG. 6B, a character image 60 cannot be recognized by the recognition system, which instead provides candidates ranked by likelihood; the first-ranked candidate 62 has the value 1. The character correction system then corrects the character according to the attribute and range of the character, together with the candidates provided by the recognition system. Candidates arise as follows: when the recognition system has low confidence in a character, it ranks the possible recognition results by likelihood and provides several candidate characters for it. The box in FIG. 6B shows that the attribute of character Fm-n is numeric and that its range is 0 to 2. The first-ranked candidate 62 produced by the recognition system is 1, which is found to fall within the range of the character, so the character is not corrected.

In another example, shown in FIG. 6C, a character image 66 has a set of candidates 68. The first-ranked candidate 70 has the value 7, the second-ranked candidate 72 has the value 1, and so on in rank order. Box 64 shows that the attribute of character Fm-n is numeric and that its range is 0 to 2. The check finds that the value 7 of the first-ranked candidate 70 is not between 0 and 2, so it cannot be the correct digit. The second-ranked candidate 72 is examined next; its value, 1, falls within the range 0 to 2, so it is selected as the correction.

• m .......I 1 B =111 · '丨丨---_裝丨丨 /請先閲讀背1¾之注意事項再填寫衣f) 訂 ΐδ 本紙張尺度適用中國國家標準（CNS ) Α4規格（210Χ297公釐）經濟部中央標準局員工消費合作社印製 509878 A7 ___B7 五、發明説明（A ) 之間故修正字元Fm_n的值為1。而透過此步驟的檢查，將可初步減少待校正字元的數量。接著參閱第7A圖，顯示了本發明中利用勾選符號修正字元的流程圖，其流程如下所示： (一）啟動流程，如方塊7 3。 (二）檢查勾選項目是否與該對應之字元值相同，如方塊7 5。 (三）若步驟（二）為是，則不更正字元直接執行方塊7 9判斷是否已檢查完所有字元：若是，則中止流程，如方塊81 ;若否，則以下一個待校正字元重新進行方塊75所示之流程。 (四）若步驟（二）為否，則進行方堍77以該勾選符號之值更正該字元。 (五）判斷是否已檢查完所有字元，如方塊79，若是，則中止流程，如方塊81 ;若否，則以下一個待校正字元重新進行方塊75 所示之流程直到檢查完所有字元為止。參閱第7B圖，顯示了本發明中利用勾選符號修 «19 本紙張尺度適用中國國家標準（CNS ) Α4規格（210Χ297公釐） — ·！1Γ (請先閱讀f·面之注意事項再填寫本頁) .裝- -訂正字元值的示意圖。由提供攔位允許作業人員查5夕表單在填寫時，除了欄位書寫内容的勾選符：寫之外，也會有相對應怒的左側具有選項1金選二M第7B圖為例，搁位7 供作業人員書寫記錄其右側的空白區域則表相對應的選項卜若是：：寫的字元值為1，則代相對應的選項2。而作業人，字元值為2,則代表會同時記錄書寫字元與勾#;真\表單時，通常都會同時針對這兩個部份進至於辨識系統則號辨識的正確率十分的*仃辨識。由於對於勾選符符號的辨識社果不^ ’因此當書寫字元與勾選寫字亓=: 將以勾選符號的值更正書 = = 如圖中所示，辨識系統將書寫 ' 、而根據勾選符號的辨識結果則為選字元的辨識結果，直接修正辨識 =的值為丨。由於辨識系統對於勾選符號的辨識率丰间&對於具有勾選符號的表單欄位步驟將可以更進-步減少待校正字元數量。之後再將無法經由字元屬性範圍或勾選符號修正的待校正字顯示於螢幕之上而透過人工進行螢幕字元校正程序。值得一提的是’第6B圖、第6c圖，與第 7B圖所說明的待校正字元的處理皆為字元校正 509878 A7• m ....... I 1 B = 111 · '丨丨 ---_ installation 丨丨 / Please read the precautions on the back 1¾ before filling in the clothes f) Order ΐ This paper size applies to Chinese national standards (CNS ) A4 specification (210 × 297 mm) Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs 509878 A7 ___B7 5. Between the description of the invention (A), the value of the modified character Fm_n is 1. Through the inspection in this step, the number of characters to be corrected can be reduced initially. Next, referring to FIG. 7A, a flowchart of using the check mark to modify characters in the present invention is shown. The flow is as follows: (1) Start-up flow, such as block 73. (2) Check whether the selected item is the same as the corresponding character value, such as box 7 5. 
(3) If step (2) is YES, directly execute the block without correcting the characters 7 9 to determine whether all characters have been checked: if yes, abort the process, such as block 81; if not, then the next character to be corrected is Repeat the process shown in block 75. (4) If step (2) is no, proceed to F77 to correct the character with the value of the check mark. (5) Determine whether all characters have been checked, such as block 79, if yes, abort the process, such as block 81; if not, then perform the process shown in block 75 again on the next character to be corrected until all characters have been checked until. Refer to FIG. 7B, which shows the use of a check mark in the present invention to modify «19 This paper size applies the Chinese National Standard (CNS) A4 specification (210 × 297 mm) — ·! 1Γ (Please read the precautions for f · face before filling out this page). Schematic diagram of correcting the character value. Provide a stop to allow the operator to check the form on the 5th. When filling out the form, in addition to the check mark of the field writing content: In addition to writing, there will also be corresponding anger on the left side with option 1 gold option 2M Figure 7B as an example. Shelf 7 is for the operator to write and record the blank area on the right side of the table. If it is: The written character value is 1, the corresponding option 2 is substituted. For the operator, the character value is 2, which means that the book characters and ticks will be recorded at the same time. Normally, the two parts will be entered into the identification system at the same time. Identify. 
Because the recognition of the check mark symbol is not ^ ', so when writing characters and check marks 亓 =: The book will be corrected with the value of the check mark = = As shown in the figure, the recognition system will write', and The recognition result according to the check mark is the recognition result of the selected character, and the value of recognition = is directly corrected to 丨. Due to the recognition rate of the check mark by the recognition system, Toyoma & for form fields with check marks, the step will be further-reducing the number of characters to be corrected. After that, the characters to be corrected that cannot be corrected through the character attribute range or check mark are displayed on the screen and the screen character correction process is performed manually. It is worth mentioning that 'Figures 6B, 6c, and 7B process the characters to be corrected are all character corrections 509878 A7
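The two automatic corrections described above (candidate selection by field attribute and range, Figs. 6A to 6C, and the check-mark override, Figs. 7A and 7B) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `FieldSpec`, `correct_by_range`, and `correct_by_check_mark` are hypothetical, candidates are assumed to arrive as strings ordered by likelihood, and only the numeric attribute used in the examples is modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FieldSpec:
    """Hypothetical definition of one field character: attribute plus range."""
    attribute: str  # only "digit" is modelled in this sketch
    low: int        # smallest allowed value
    high: int       # largest allowed value

    def accepts(self, value: str) -> bool:
        """True when the value fits the attribute and lies within the range."""
        if self.attribute == "digit":
            return value.isdecimal() and self.low <= int(value) <= self.high
        return False


def correct_by_range(candidates: Sequence[str], spec: FieldSpec) -> Optional[str]:
    """Return the highest-ranked candidate that fits the field, else None.

    None means no candidate fits, so the character stays in the list
    that is shown to the operator for manual correction.
    """
    for value in candidates:  # candidates are ordered by likelihood
        if spec.accepts(value):
            return value
    return None


def correct_by_check_mark(written: Optional[str],
                          check_mark: Optional[str]) -> Optional[str]:
    """Prefer the check-mark value whenever it disagrees with the written one."""
    if check_mark is not None and check_mark != written:
        return check_mark
    return written


spec = FieldSpec(attribute="digit", low=0, high=2)
print(correct_by_range(["1"], spec))        # Fig. 6B case: first candidate fits
print(correct_by_range(["7", "1"], spec))   # Fig. 6C case: second candidate fits
print(correct_by_check_mark("7", "1"))      # Fig. 7B case: check mark wins
```

Returning `None` instead of guessing keeps the sketch in line with the flow of Fig. 6A, where a character with no fitting candidate is left uncorrected.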


… the connected characters determined by the present invention, for further correction. When the selected character image 92 is shown in the connected-character function display area as two connected characters, both characters can be corrected in a single operation, which overcomes the limitation of the prior art, where a correction could only yield a single character. Fig. 9 shows a concrete example of the user interface 82 in use: by means of the input device, the operator can directly correct the characters to be corrected in the user interface 82.

Figs. 10A to 10C show what the peripheral image display area of the user interface makes possible. As shown in Fig. 10A, the recognition result for the character image 96 is shown at reference numeral 98. The true value of the character image may be 1, or the image may be nothing more than noise on the form. Viewing the area surrounding the character image 96 at a wider scale in the peripheral image display area of Fig. 10B shows that the value of image 96 really is 1. In the case of Fig. 10C, by contrast, the peripheral image display area 108 shows that image 96 is merely noise. In conventional on-screen form correction, the operator can judge only from the image extracted by the recognition system; if the extracted image is merely noise picked up when the recognition system captured the image, the operator has difficulty determining its value. With the wider view provided by the peripheral image display area, the operator can easily determine the correct value of each character image and make the correct correction.

Fig. 11 shows an image of connected characters. When form data is filled in, writing habits often produce connected characters. A conventional recognition system treats connected characters as a single character, and a conventional on-screen character correction system likewise offers only single-character correction. The correction of connected characters therefore has to be deferred to the subsequent correction procedures, which burdens the later stages of data processing. As shown in the figure, the image 116 of the character Fm-n shows that the character is actually two connected characters, yet the recognition system judges its value to be 2 (reference numeral 118). The conventional approach can only make a single-digit correction to the character value and cannot complete the correction in one operation.
To handle connected characters, the present invention proposes criteria for judging them, so that connected characters can be identified quickly within the on-screen character correction procedure and corrected in a single operation.

Fig. 12A illustrates the criterion used in the present invention to judge horizontally connected characters. By learning the form fields and characters, the system obtains the position that the learning sample range of each character should occupy. For the character Fm-n, the upper-left corner of the learning sample is at (Xn1, Yn1) and the lower-right corner is at (Xn2, Yn2). For the adjacent character Fm-(n+1), the upper-left corner of the learning sample is at (Xn3, Yn3) and the lower-right corner is at (Xn4, Yn4). The learning coordinates are drawn in the figure only for ease of illustration; in actual processing they are used inside the character correction system, and the computation is not displayed on the screen.

When the character correction system reads the image of the character Fm-n, it compares the position actually occupied by the image with the positions of the learning coordinates. For the horizontally connected character of Fig. 12A, the upper-left corner of the actual image of Fm-n is at (X1, Y1) and the lower-right corner is at (X2, Y2). If the position occupied by the image of Fm-n falls within both learning sample regions, Fm-n and Fm-(n+1), and the part that overlaps the learning sample region of the character adjacent to Fm-n exceeds 1/3 of the width of the learning sample field, the character is judged to be a connected character. The detailed rule is defined as follows:

Define OVERLAP(A, B): computes the overlapping area of the two regions A and B.
Area A: the area occupied by the actual image coordinates of the character Fm-n, (X1, Y1) to (X2, Y2).
Area B: the area occupied by the learning sample coordinates (Xn3, Yn3) to (Xn4, Yn4) of a field character, where that character is not the original character Fm-n but the learning sample position of a character adjacent to Fm-n.

The criterion for connected characters is as follows. Given a field character image Fi-j, whether Fi-j is a connected character is determined by:

    IF (Fm-n is not equal to Fi-j) AND OVERLAP(A, B) >= (B/3)
    THEN judge the character image Fi-j to be a connected character
    ENDIF

Fig. 12B shows vertically connected characters encountered in the character correction procedure. Such cases generally arise when the writing habits of the person filling in the form cause characters connected across fields to be mistaken by the recognition system for a single character. As in the horizontal case, the upper-left corner of the learning sample of the character Fm-n is at (Xn1, Yn1) and the lower-right corner is at (Xn2, Yn2). For a field Fm'-n' adjacent to the character Fm-n in the vertical direction, the upper-left corner of the learning sample is at (Xn3, Yn3) and the lower-right corner is at (Xn4, Yn4).
The boxes that represent the learning sample ranges of the characters Fm-n and Fm'-n' are drawn for ease of explanation; in actual operation the learning sample ranges serve only as parameters for the internal computation of the character correction system. Still referring to Fig. 12B, writing habits cause the actual image of the character Fm-n to be connected vertically across fields, so the recognition system identifies its coordinates as (X1, Y1) to (X2, Y2). To judge vertical connection, the character correction system first reads the actual image coordinates of the connected character Fm-n and compares the position actually occupied by the image of Fm-n with the position of the learning coordinates of Fm'-n'. If the position occupied by the actual image of Fm-n falls within both learning sample regions, Fm-n and Fm'-n', and the part that overlaps the learning sample region of the vertically adjacent character Fm'-n' exceeds 1/2 of the height of the learning sample field, the character is judged to be a connected character. The rule is as follows:

Define OVERLAP(A, B): computes the overlapping area of the two regions A and B.
Area A: the area occupied by the actual image coordinates of the character Fm-n, (X1, Y1) to (X2, Y2).
Area B: the area occupied by the learning sample coordinates (Xn3, Yn3) to (Xn4, Yn4) of a field character, where that character is not the original character Fm-n but the learning sample position of a character adjacent to Fm-n in the vertical direction.

The criterion for connected characters is as follows. Given a field character image Fi-j, whether Fi-j is a connected character is determined by:

    IF (Fm-n is not equal to Fi-j) AND OVERLAP(A, B) >= (B/2)
    THEN judge the character image Fi-j to be a connected character
    ENDIF

Referring to Fig. 12C, the judgment of connected characters can be expressed as the following flow:
(1) Start the flow, as shown in block 121.
(2) Check whether the result for the character that failed recognition falls within two or more field character positions in the learning samples, as shown in block 123.
(3) If the answer in step (2) is no, leave the character uncorrected and go directly to block 129 to determine whether all characters have been checked: if so, the flow ends, as shown in block 131; if not, the flow shown in block 123 is repeated for the next character to be corrected.
(4) If the answer in step (2) is yes, proceed to block 125 and check whether the character satisfies the connected-character rule.
(5) If the answer in step (4) is no, leave the character uncorrected and go directly to block 129 to determine whether all characters have been checked: if so, the flow ends, as shown in block 131; if not, the flow shown in block 123 is repeated for the next character to be corrected.
(6) If the answer in step (4) is yes, proceed to block 127 and mark the character as a connected character.
(7) Determine whether all characters have been checked, as shown in block 129: if so, the flow ends, as shown in block 131; if not, the flow shown in block 123 is repeated for the next character to be corrected, until all characters have been checked.
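The connected-character criteria of Figs. 12A to 12C can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: `Box`, `overlap`, `is_connected`, and `connected_neighbours` are hypothetical names, coordinates are assumed to be integer pixels with the origin at the upper left, and the sketch follows the area form of the rule, OVERLAP(A, B) >= B/3 for a horizontal neighbour and OVERLAP(A, B) >= B/2 for a vertical neighbour, whereas the surrounding prose states the threshold in terms of field width and height.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x1, y1) is upper-left, (x2, y2) is lower-right."""
    x1: int
    y1: int
    x2: int
    y2: int

    def area(self) -> int:
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def overlap(a: Box, b: Box) -> int:
    """OVERLAP(A, B): area shared by two rectangles, 0 when they are disjoint."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return width * height if width > 0 and height > 0 else 0


def is_connected(image: Box, neighbour_sample: Box, direction: str) -> bool:
    """Apply the rule to one neighbouring learning sample.

    image            -- region A, the actual image of the character Fm-n
    neighbour_sample -- region B, the learned position of an adjacent character
    direction        -- "horizontal" (threshold B/3) or "vertical" (threshold B/2)
    """
    divisor = {"horizontal": 3, "vertical": 2}[direction]
    shared = overlap(image, neighbour_sample)
    # Integer form of: shared >= area(B) / divisor
    return shared > 0 and shared * divisor >= neighbour_sample.area()


def connected_neighbours(image: Box,
                         neighbours: Sequence[Tuple[str, Box, str]]) -> List[str]:
    """Return the keys of all neighbouring fields the image is connected to."""
    return [key for key, sample, direction in neighbours
            if is_connected(image, sample, direction)]


# Illustrative learned positions for the right-hand and lower neighbours of Fm-n.
right = Box(30, 0, 60, 40)
below = Box(0, 40, 30, 80)
image = Box(5, 5, 45, 38)  # handwriting that spills into the right-hand field
print(connected_neighbours(image, [("Fm-(n+1)", right, "horizontal"),
                                   ("Fm'-n'", below, "vertical")]))
```

Comparing `shared * divisor` with the neighbour area keeps the test in integer arithmetic, so no rounding decision is needed at the threshold.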
Figs. 13 and 14 show the character correction system of the present invention handling connected characters. Referring first to Fig. 13, in the character image display area of the user interface the character Fm-n cannot be recognized; in practice the character image at that position may be broken, or noise may have caused the recognition system to reject the character. The image displayed for the character Fm-(n+1) is shown together with its recognized value. Symbol 119 is a special feature of the user interface of the character correction system of the present invention: it indicates whether two adjacent character images displayed in the user interface are adjacent characters on the actual form. In other words, when the characters Fm-n and Fm-(n+1) are two adjacent characters on the actual form, symbol 119 appears in the image display area of the user interface. In the prior art, there is not enough information to correct the character Fm-n, and the character Fm-(n+1) cannot be corrected directly as a connected character either.

Referring to Fig. 14, when the character Fm-(n+1) is judged by the above criteria to be a connected character, the character (reference numeral 124) and its surrounding image can be seen in the peripheral image display area 120 at the lower left of the user interface. The connected-character function display area 122 at the lower right of the user interface automatically determines whether the character Fm-(n+1) is a connected character; if it is, this is noted in the dialog box in that area.
When the operator clicks the character to be corrected with the cursor, the character correction system of the present invention automatically applies the criteria to check whether the character is a connected character and displays the result in the connected-character function display area 122. Based on the connected-character information presented in the peripheral image display area 120 and the connected-character function display area 122, the user can then directly correct the value of the character Fm-n to 2 and the value of the character Fm-(n+1) to 0. Connected characters can thus be corrected within the on-screen character correction procedure, which reduces the burden on the subsequent form correction work.

Finally, the character correction system embodying the present invention is stored on a data carrier (such as a magnetic disk, magnetic tape, optical disc, or memory) and uses a central processing unit as its computing core. The corrected characters are also stored on the data carrier for use in subsequent form recognition work. Together with peripheral devices such as terminals, scanners, printers, and communication networks, the system provides powerful form data processing capability.

The embodiments above are described mainly in terms of a character correction system that handles numeric characters in forms, but the method of the present invention can be extended to the correction of characters of any kind. The present invention has been described above by way of a preferred embodiment, which is intended only to aid understanding of how the invention is practiced and not to limit its spirit. Those skilled in the art, having understood the spirit of the present invention, may make minor modifications, refinements, and equivalent changes and substitutions without departing from that spirit.

Claims


1. A form character correction method, the method comprising at least the following steps: inputting at least one character to be corrected produced by a recognition system; checking the characters to be corrected against the definition and range of their corresponding fields, finding the characters to be corrected that fall outside the definition and range, and correcting them with candidates generated by the recognition system that fall within the definition and range, so as to reduce the number of characters to be corrected; and displaying the characters to be corrected on a screen so that they can be corrected manually.

2. The form character correction method of claim 1, wherein the step of checking and correcting the characters to be corrected is the result of internal computation by a microprocessor and is not displayed on the screen.

3. The form character correction method of claim 1, further comprising a step of performing correction by means of the check mark corresponding to the character to be corrected.

4. The form character correction method of claim 3, wherein the step of performing correction by means of the check mark comprises at least the following steps: identifying the characters to be corrected that have a corresponding check mark; determining whether the value of the character to be corrected is consistent with the value of the check mark; correcting, with the value of the check mark, any character to be corrected whose value is inconsistent with the value of the check mark; and repeating the above two steps to check and correct all characters to be corrected that have a corresponding check mark.

5. The form character correction method of claim 4, wherein the step of performing correction by means of the check mark is the result of internal computation by a microprocessor and is not displayed on the screen.

6. The form character correction method of claim 1, further comprising a step of determining connected characters to be corrected, so as to correct the connected characters to be corrected.

7. The form character correction method of claim 6, wherein the step of determining connected characters to be corrected comprises at least: checking whether the character to be corrected falls within the positions of two or more learning fields; and, when the character to be corrected extends into an adjacent learning field by more than a rated distance, determining that the character to be corrected is a connected character to be corrected.

8. The form character correction method of claim 7, wherein the learning field refers to the template position of each character that the system obtains through learning.

9. The form character correction method of claim 7, wherein the rated distance includes 1/3 of the width of the learning field horizontally adjacent to the character to be corrected.

10. The form character correction method of claim 7, wherein the rated distance includes 1/2 of the height of the learning field vertically adjacent to the character to be corrected.

11. The form character correction method of claim 7, wherein the connected character to be corrected is indicated by a symbol on a user interface, so that the operator can correct all the characters of the connected character to be corrected at the same time.

12. A user interface for form character correction, the user interface comprising at least: an input device for inputting commands and signals for character correction operations; and a display device for displaying information on the characters to be corrected produced by a recognition system, the display device having: a character image display area that displays the individual image data and the recognition results of the characters to be corrected and allows the recognition results to be corrected directly; and a peripheral image display area that displays the surrounding image of the character being corrected.

13. The user interface of claim 12, wherein the input device is used to select any one of the characters to be corrected as the character being corrected.

14. The user interface of claim 12, wherein the display device includes a connected-character function display area that provides and displays information on whether the character being corrected is a connected character.

15. The user interface of claim 12 or 14, wherein the display device includes a character description display area that displays the position on the form of the character being corrected.

16. The user interface of claim 15, wherein the character description display area displays the position of the field in which the character being corrected is located.

17. The user interface of claim 12, wherein the character image display area includes a plurality of blocks that display the images and recognition results of the characters to be corrected.

18. The user interface of claim 12, wherein the character image display area displays a symbol that indicates adjacent characters.