TWM626684U - Document proofreading device

Info

Publication number: TWM626684U
Application number: TW110209185U
Authority: TW
Inventors: 穎欣李; 邱建中; 李藝鋒; 宋政隆; 王俊權
Original assignee: 中國信託商業銀行股份有限公司
Priority date: 2021-08-04
Filing date: 2021-08-04
Publication date: 2022-05-11

Abstract

A document proofreading device obtains the text framed by a plurality of first text boxes in an original document file and the text framed by a plurality of second text boxes in a to-be-checked document image file. It pairs M (2 ≤ M ≤ N) first text boxes whose framed text is unique with M second text boxes whose framed text is unique, obtains the coordinates of the four corner points and the center point of each of the M first text boxes and the M second text boxes, and calculates a coordinate transformation matrix from the correspondence between the coordinates of the M first text boxes and those of the M second text boxes. It then uses the coordinate transformation matrix to project the first text boxes into the to-be-checked document image file, and compares the text framed by each first text box in the original document file with the text framed by that first text box in the to-be-checked document image file.

Description

Document proofreading device

The present utility model relates to a proofreading device, and in particular to a document proofreading device that identifies the differences between two documents.

In industries such as banking and insurance, documents signed by customers, such as contracts and consent forms, often have to be proofread and confirmed manually, which tends to consume a great deal of time. To find the differences between an original document file and the image file produced by scanning or photographing it, one existing alternative to manual proofreading is to first use a trained deep learning model to detect text in the image file and generate text boxes that frame the detected text images, then recognize the text image framed by each text box, and finally compare the full recognized text with the original document file. This approach does produce results quickly, but the image produced by scanning or photographing the original document often exhibits displacement, rotation, noise, handwritten text or signatures, and similar conditions. These conditions can prevent the deep learning model from detecting certain text images and framing them with text boxes, which reduces the accuracy of text recognition.

Therefore, the purpose of the present utility model is to provide a document proofreading device that can accurately find the correspondence between the text in an original document file and the text in the image file produced by scanning or photographing it, so that the differences between the two documents can be compared accurately.

Accordingly, a document proofreading device of the present utility model is used to proofread an original document file against a to-be-checked document image file, and includes a storage unit and a processing unit. The storage unit stores the original document file and the to-be-checked document image file. The processing unit can access the storage unit and includes a text capture module, a text box matching module, a conversion matrix generation module, a text box projection module, and a comparison module. The text capture module obtains, from the original document file, a plurality of first text boxes and the text framed by each first text box, and obtains, from the to-be-checked document image file, a plurality of second text boxes and the text framed by each second text box. When the text box matching module determines that the text framed by at least N (N ≥ 2, N a positive integer) first text boxes in the original document file is unique, it pairs those N first text boxes with M (2 ≤ M ≤ N, M a positive integer) second text boxes that frame uniquely occurring text in the to-be-checked document image file, and obtains the coordinates of the four corner points and the center point of each of the paired M first text boxes and M second text boxes. The conversion matrix generation module calculates a coordinate transformation matrix from the correspondence between the coordinates of the paired M first text boxes and those of the M second text boxes. The text box projection module uses the coordinate transformation matrix to project the first text boxes of the original document file into the to-be-checked document image file, and obtains the text framed by each first text box in the to-be-checked document image file. The comparison module compares the text framed by each first text box in the original document file with the text framed by that first text box in the to-be-checked document image file, and outputs a comparison result.

In some embodiments of the present utility model, when the text box matching module determines that the text framed by fewer than N first text boxes in the original document file is unique, it records the first text boxes in the original document file that frame the same text, together with their number, and records the second text boxes in the to-be-checked document image file that frame the same text, together with their number. It pairs each group of first text boxes framing the same text with the group of second text boxes that frame that same text and are equal in number to that group of first text boxes. It encloses each such group of first text boxes in a first rectangular frame and obtains the coordinates of the four corner points and the center point of at least two first rectangular frames. Likewise, it encloses each such group of second text boxes in a second rectangular frame and obtains the coordinates of the four corner points and the center point of the at least two second rectangular frames paired with the at least two first rectangular frames. The conversion matrix generation module then calculates the coordinate transformation matrix from the correspondence between the coordinates of the paired at least two first rectangular frames and those of the at least two second rectangular frames.

In some embodiments of the present utility model, the text capture module also obtains the page counts of the original document file and the to-be-checked document image file. Only after the text capture module determines that the two page counts are the same does it obtain the first text boxes and the text framed by each first text box from the original document file, and the second text boxes and the text framed by each second text box from the to-be-checked document image file.

In some embodiments of the present utility model, the coordinate transformation matrix is a homography matrix.

The effect of the present utility model is as follows: by projecting the first text boxes of the original document file into the to-be-checked document image file and then performing text recognition on the content framed by those first text boxes in the to-be-checked document image file, the device solves the problem that some text in the to-be-checked document image file cannot be detected because of displacement, rotation, noise, or handwritten text, signatures, or alterations.

1, 4: Original document file

11, 11', 41, 41': First text box

42: First rectangular frame

2, 5: To-be-checked document image file

21, 21', 51, 51': Second text box

52: Second rectangular frame

3: Document proofreading device

31: Storage unit

32: Processing unit

321: Text capture module

322: Text box matching module

323: Conversion matrix generation module

324: Text box projection module

325: Comparison module

S1~S6, S3', S4': Steps

Other features and effects of the present utility model are shown clearly in the embodiments described with reference to the drawings, in which: FIG. 1 is a block diagram of the hardware components and modules included in an embodiment of the document proofreading device of the present utility model; FIG. 2 is a schematic diagram of an original document file; FIG. 3 is a schematic diagram of a to-be-checked document image file; FIG. 4 is the main flow of document proofreading in this embodiment; FIG. 5 is a schematic diagram of detected text framed by first text boxes in the original document file; FIG. 6 is a schematic diagram of detected text framed by second text boxes in the to-be-checked document image file; FIG. 7 is a schematic diagram illustrating the pairing of some of the first text boxes with some of the second text boxes shown in FIG. 6; FIG. 8 is a schematic diagram illustrating how the coordinates of the four corner points and the center point of the paired first and second text boxes are obtained; FIG. 9 is a schematic diagram illustrating the projection of the first text boxes of the original document file into the to-be-checked document image file; FIG. 10 illustrates how the coordinates of a first text box are converted by the coordinate transformation matrix into coordinates projected into the to-be-checked document image file; FIG. 11 is a schematic diagram illustrating the pairing relationship between the first text boxes in the original document file and the second text boxes in the to-be-checked document image file; FIG. 12 illustrates enclosing the first text boxes that frame the same text in the original document file in first rectangular frames, and enclosing the second text boxes that frame the same text in the to-be-checked document image file in second rectangular frames; FIG. 13 is a schematic diagram illustrating how the coordinates of the four corner points and the center point of the first and second rectangular frames that enclose the first and second text boxes are obtained; and FIG. 14 is a schematic diagram illustrating the projection of the first text boxes of the original document file into the to-be-checked document image file.

Before the present utility model is described in detail, note that similar elements are denoted by the same reference numerals in the following description.

FIG. 1 is a block diagram of the main components of an embodiment of the document proofreading device of the present utility model, which is used to proofread an original document file 1 shown in FIG. 2 against a to-be-checked document image file 2 shown in FIG. 3, where the to-be-checked document image file 2 is produced by scanning or photographing a paper copy of the original document file 1. The document proofreading device 3 of this embodiment performs document proofreading by executing the flow shown in FIG. 4. The document proofreading device 3 is a computer device and, as shown in FIG. 1, mainly includes a storage unit 31 (for example a memory module built into, installed in, or externally connected to the computer), a processing unit 32 (for example a central processing unit) that can access the storage unit 31, and other related components not shown in FIG. 1. The storage unit 31 stores or temporarily holds the original document file 1 and the to-be-checked document image file 2 that are to be proofread. The processing unit 32 is preloaded with a program that is read from a computer-readable recording medium (for example the storage unit 31) and can be executed by the processing unit 32. The program includes a text capture module 321, a text box matching module 322, a conversion matrix generation module 323, a text box projection module 324, and a comparison module 325.

Thus, in step S1 of FIG. 4, when the processing unit 32 executes the program, the text capture module 321 reads the original document file 1 from the storage unit 31, detects whether its content contains text, and frames the detected text with first text boxes 11, thereby obtaining from the original document file 1 a plurality of first text boxes 11 and the content 12 framed by each first text box 11, as shown in FIG. 5. Specifically, when the text capture module 321 determines that the original document file 1 is in PDF format, it uses text detection software, for example but not limited to tools such as pdfminer, to frame the detected text with text boxes and read the content framed by each text box, that is, the text information. If the text capture module 321 determines that the original document file 1 is in a document format such as DOC or ODF, it first converts or re-saves the original document file 1 as a PDF file, and then uses the same text detection software to detect the text in the original document file 1, frame it with text boxes, and read the content framed by each text box, that is, the text information.

If the text capture module 321 instead determines that the original document file 1 is an image file, it uses a text detection model trained in advance by deep learning, for example but not limited to a deep learning model such as RCNN (Region-based Convolutional Neural Networks) or YOLO (You Only Look Once), to detect text in the original document file 1 and frame the detected text with text boxes. The text capture module 321 then uses a text recognition model, also trained in advance by deep learning, to recognize the content framed by each first text box 11 in the original document file 1, thereby obtaining the content framed by the first text boxes 11, that is, the text information, for example Cat and Dog in the first row of FIG. 5, Fish and King in the second row, Dog and Egg in the third row, Dog and Egg in the fourth row, and so on.

Meanwhile, as shown in FIG. 6, the text capture module 321 uses the text detection model described above to detect text in the to-be-checked document image file 2 and generate a plurality of second text boxes 21, and then uses the text recognition model to recognize the content framed by each second text box 21, thereby obtaining the content framed by each second text box 21, that is, the text information, for example Cat and Dog in the first row of FIG. 6, King in the second row, Dog and Egg in the third row, Egg in the fourth row, and so on. Because the to-be-checked document image file 2 may have been produced by scanning or photographing the original document file 1 after handwriting, signing, or alteration, its image content may exhibit displacement, rotation, noise, and handwritten text, signatures, or alterations, all of which affect the ability of the text detection model to detect text and produce text boxes. For example, when crossed diagonal lines (or noise) appear over "Dog" in the fourth row of the to-be-checked document image file 2, the text detection model wrongly concludes that there is no text there and does not generate a text box framing the "Dog" image. Because only content framed by a second text box 21 is fed into the text recognition model, the word "Dog" is not recognized.

Note that before performing step S1, the text capture module 321 may also obtain the page counts of the original document file 1 and the to-be-checked document image file 2. Step S1 is performed only after the text capture module 321 determines that the two page counts are the same. Otherwise, the module determines that the two documents may be unrelated, does not perform step S1, and outputs a document error message.
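The page-count guard described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `precheck_page_counts` and the wording of the error message are assumptions, not part of the source.

```python
def precheck_page_counts(original_pages: int, scanned_pages: int):
    """Return None when the page counts match, otherwise an error message.

    Hypothetical helper: the source only says that step S1 is skipped and a
    document error message is output when the page counts differ.
    """
    if original_pages != scanned_pages:
        return (f"document error: original has {original_pages} page(s), "
                f"scan has {scanned_pages} page(s)")
    return None


# Matching counts let step S1 proceed; differing counts produce a message.
print(precheck_page_counts(3, 3))
print(precheck_page_counts(3, 2))
```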

Next, the processing unit 32 executes step S2 of FIG. 4, in which the text box matching module 322 determines whether the text framed by at least N (N ≥ 2, N a positive integer) first text boxes 11 is unique within the original document file 1. If so, for example as shown in FIG. 7, where the text framed by five first text boxes 11' occurs only once in the original document file 1 (namely Cat, Fish, King, Car, and Apple), then in step S3 of FIG. 4 the text box matching module 322 pairs those five first text boxes 11' with M (2 ≤ M ≤ N) second text boxes 21' that frame uniquely occurring text in the to-be-checked document image file 2 (for example Cat, King, Car, and Apple in FIG. 7). Here M equals 4: four second text boxes 21' in the to-be-checked document image file 2 frame uniquely occurring text, and that text matches the text framed by four of the five first text boxes 11'. Note that the number of second text boxes 21' framing uniquely occurring text in the to-be-checked document image file 2 is not necessarily the same as the number of first text boxes 11', as in this example. It suffices that at least two second text boxes 21' frame uniquely occurring text and that their text matches the text framed by at least two of the five first text boxes 11'.
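Steps S2 and S3 can be illustrated with a minimal Python sketch. The box representation `(text, (x0, y0, x1, y1))`, the function names, and all coordinates below are hypothetical; only the word lists follow the example of FIGS. 5 to 7. Returning `None` when fewer than two pairs are found is also an assumption, since the source does not say what happens in that case.

```python
from collections import Counter


def unique_texts(boxes):
    """Map each text that occurs exactly once to its box rectangle."""
    counts = Counter(text for text, _ in boxes)
    return {text: rect for text, rect in boxes if counts[text] == 1}


def pair_unique_boxes(first_boxes, second_boxes, n_min=2):
    """Pair first and second text boxes whose text is unique in both files.

    Returns a list of (text, first_rect, second_rect), or None when the
    unique-text route is not usable.
    """
    first_unique = unique_texts(first_boxes)
    if len(first_unique) < n_min:
        return None  # fewer than N unique first text boxes
    second_unique = unique_texts(second_boxes)
    pairs = [(text, rect, second_unique[text])
             for text, rect in first_unique.items() if text in second_unique]
    return pairs if len(pairs) >= 2 else None


# Hypothetical boxes for the original file (words follow FIG. 5).
first_boxes = [
    ("Cat", (3, 3, 6, 5)), ("Dog", (8, 3, 11, 5)),
    ("Fish", (3, 7, 7, 9)), ("King", (9, 7, 13, 9)),
    ("Dog", (3, 11, 6, 13)), ("Egg", (8, 11, 11, 13)),
    ("Dog", (3, 15, 6, 17)), ("Egg", (8, 15, 11, 17)),
    ("Car", (3, 19, 6, 21)), ("Apple", (8, 19, 13, 21)),
]
# The scan is shifted by one pixel and misses "Fish" and the fourth-row "Dog".
second_boxes = [(text, tuple(v - 1 for v in rect))
                for i, (text, rect) in enumerate(first_boxes) if i not in (2, 6)]

pairs = pair_unique_boxes(first_boxes, second_boxes)
print([text for text, _, _ in pairs])
```

With this data, five first text boxes are unique (N = 5) and four of them find a unique partner in the scan (M = 4), matching the example in the text.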

Then, as shown in FIG. 8, the text box matching module 322 obtains the coordinates of the four corner points and the center point (the black dots in FIG. 8) of each of the paired four first text boxes 11' and four second text boxes 21'. The coordinates referred to here are pixel positions in the original document file 1 and in the to-be-checked document image file 2.
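A small Python sketch of extracting these five points from one box follows. It assumes an axis-aligned box stored as `(x0, y0, x1, y1)` in pixel coordinates, which is a representation chosen for illustration and not stated in the source.

```python
def box_points(rect):
    """Return the four corner points and the center point of a box.

    rect is (x0, y0, x1, y1): the top-left and bottom-right pixel positions
    (a hypothetical representation).
    """
    x0, y0, x1, y1 = rect
    return [
        (x0, y0), (x1, y0),                  # top-left, top-right
        (x0, y1), (x1, y1),                  # bottom-left, bottom-right
        ((x0 + x1) / 2, (y0 + y1) / 2),      # center point
    ]


points = box_points((3, 3, 6, 5))
print(points)
```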

Next, in step S4 of FIG. 4, the processing unit 32 causes the conversion matrix generation module 323 to calculate, from the correspondence between the coordinates of the paired four first text boxes 11' (25 coordinates in total) and the coordinates of the four second text boxes 21' (25 coordinates in total), a coordinate transformation matrix for projecting the first text boxes 11 of the original document file 1 into the to-be-checked document image file 2. In this embodiment, the coordinate transformation matrix may be, but is not limited to, a homography matrix.
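The source does not say how the homography is computed. One common way, shown here purely as an assumed sketch in dependency-free Python, is a least-squares fit of the eight free parameters (fixing the bottom-right entry to 1) over all point correspondences. The function names, the box coordinates, and the choice of normal equations are assumptions; a production system would more likely call a library routine.

```python
def box_points(rect):
    """Four corners plus center of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = rect
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1),
            ((x0 + x1) / 2, (y0 + y1) / 2)]


def solve_linear(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Least-squares homography mapping src points to dst points (h33 = 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Normal equations: (A^T A) h = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


# Hypothetical paired boxes; the scan is the original shifted by (-1, -1).
first = [(3, 3, 6, 5), (9, 7, 13, 9), (2, 11, 5, 13), (8, 15, 14, 17)]
second = [tuple(v - 1 for v in rect) for rect in first]
src = [p for rect in first for p in box_points(rect)]
dst = [p for rect in second for p in box_points(rect)]

H = estimate_homography(src, dst)
for row in H:
    print([round(value, 6) for value in row])
```

Using five points per box makes the system over-determined, so small detection errors in individual boxes are averaged out rather than fitted exactly.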

Then, in step S5 of FIG. 4, the processing unit 32 causes the text box projection module 324 to project the first text boxes 11 of the original document file 1 into the to-be-checked document image file 2 according to the coordinate transformation matrix, as shown in FIG. 9. For example, as shown in FIG. 10, take the first text box 11 that frames "Cat" in the first row of the original document file 1. After the coordinates of its four corner points, (3,3), (6,3), (3,5), and (6,5), are multiplied by the coordinate transformation matrix H, they become (2,2), (5,2), (2,4), and (5,4), and when the first text box 11 is projected into the to-be-checked document image file 2 it frames exactly the "Cat" in that file.
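The arithmetic of this example can be reproduced with a short Python sketch. The source does not give the entries of H; the matrix below is an assumption, namely the pure translation by (-1, -1) that is consistent with the four coordinate pairs quoted above. Points are multiplied in homogeneous form (x, y, 1) and divided by the third component.

```python
def project_point(h, point):
    """Apply a 3x3 homography h to a 2D point using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Assumed H: a translation by (-1, -1), chosen to match the quoted numbers.
H = [[1, 0, -1],
     [0, 1, -1],
     [0, 0, 1]]

corners = [(3, 3), (6, 3), (3, 5), (6, 5)]
projected = [project_point(H, p) for p in corners]
print(projected)  # [(2.0, 2.0), (5.0, 2.0), (2.0, 4.0), (5.0, 4.0)]
```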

Thus, even though the "Dog" in the fourth row of the to-be-checked document image file 2, which is covered by noise (crossed diagonal lines), was not detected and therefore not framed by a second text box 21, projecting all the first text boxes 11 of the original document file 1 into the to-be-checked document image file 2 causes that "Dog", which exists in the fourth row of the original document file 1, to be framed by a first text box 11. This compensates for the shortcoming that text in the to-be-checked document image file 2 cannot be accurately detected because of displacement, rotation, noise, or handwritten text, signatures, or alterations.

The text box projection module 324 then uses the text recognition model described above to recognize the content framed by the first text boxes 11 in the to-be-checked document image file 2, thereby obtaining the text information framed by each first text box 11 in the to-be-checked document image file 2.

Finally, in step S6 of FIG. 4, the processing unit 32 causes the comparison module 325 to compare the text framed by each first text box 11 in the original document file 1 with the text framed by that first text box 11 in the to-be-checked document image file 2, and to output a comparison result. The comparison result may output or mark the places where the content of the original document file 1 and the to-be-checked document image file 2 differ. It may also go further and determine whether the number of differences between the two documents exceeds a set threshold. If it does, the module determines that the two documents differ too much and may be unrelated, and outputs an error message.
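A minimal Python sketch of step S6 follows. It assumes the texts are held in two lists aligned by first text box, and that the threshold is a plain count of differing boxes; the function name, the threshold value, and the misread word "Dig" are all hypothetical.

```python
def compare_boxes(original_texts, scanned_texts, max_differences):
    """Compare texts box by box and flag documents that differ too much.

    original_texts[i] and scanned_texts[i] are the texts framed by the same
    first text box in the original file and in the scanned image.
    """
    differences = [(i, o, s)
                   for i, (o, s) in enumerate(zip(original_texts, scanned_texts))
                   if o != s]
    return {
        "differences": differences,
        "possibly_unrelated": len(differences) > max_differences,
    }


original_texts = ["Cat", "Dog", "Fish", "King", "Dog", "Egg", "Dog", "Egg"]
# Hypothetical recognition result: the seventh box was altered in the scan.
scanned_texts = ["Cat", "Dog", "Fish", "King", "Dog", "Egg", "Dig", "Egg"]

result = compare_boxes(original_texts, scanned_texts, max_differences=3)
print(result)
```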

Returning to step S2, when the text box matching module 322 determines that the text framed by fewer than N first text boxes 41 in the original document file 4 shown in FIG. 11 is unique, step S3' of FIG. 4 is executed. The text box matching module 322 records the first text boxes 41 in the original document file 4 that frame the same text, together with their number (for example, two first text boxes 41 frame "Cat", two frame "Dog", two frame "Fish", three frame "Egg", one frames "King", three frame "Apple", and two frame "Car"). It also records the second text boxes 51 in the to-be-checked document image file 5 shown in FIG. 11 that frame the same text, together with their number (for example, one second text box 51 frames "Cat", two frame "Dog", two frame "Fish", two frame "Egg", one frames "King", two frame "Apple", and two frame "Car").

The text box matching module 322 then pairs the first text boxes 41 in the original document file 4 that frame the same text with the second text boxes 51 in the to-be-checked document image file 5 that frame that same text and are equal in number to those first text boxes 41. For example, as shown in FIG. 11, the first text boxes 41' that frame "Dog", "Fish", "King", and "Car" in the original document file 4 can be paired with the second text boxes 51' that frame "Dog", "Fish", "King", and "Car" in the to-be-checked document image file 5. As shown in FIG. 12, the text box matching module 322 encloses the first text boxes 41' that frame the same text in the original document file 4 and are paired with second text boxes 51' in a first rectangular frame 42, so that each first rectangular frame 42 contains all occurrences of the same text. Similarly, the text box matching module 322 encloses the second text boxes 51' that frame the same text in the to-be-checked document image file 5 and are paired with first text boxes 41' in a second rectangular frame 52, so that each second rectangular frame 52 contains all occurrences of the same text.
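The count-based pairing of step S3' and the enclosing rectangles can be sketched in Python as follows. The word counts are those of the FIG. 11 example; the box coordinates produced by `make_boxes`, the `(x0, y0, x1, y1)` representation, and all function names are hypothetical.

```python
from collections import defaultdict


def group_by_text(boxes):
    """Collect the rectangles of all boxes that frame the same text."""
    groups = defaultdict(list)
    for text, rect in boxes:
        groups[text].append(rect)
    return groups


def enclosing_rect(rects):
    """Smallest axis-aligned rectangle containing every given rectangle."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pair_by_count(first_boxes, second_boxes):
    """Pair texts that occur equally often in both files.

    Returns {text: (first_rectangular_frame, second_rectangular_frame)}.
    """
    first, second = group_by_text(first_boxes), group_by_text(second_boxes)
    return {text: (enclosing_rect(first[text]), enclosing_rect(second[text]))
            for text in first
            if text in second and len(first[text]) == len(second[text])}


def make_boxes(counts):
    """Build hypothetical boxes: occurrence k of word i sits at column i, row k."""
    return [(text, (10 * i, 5 * k, 10 * i + 8, 5 * k + 3))
            for i, (text, n) in enumerate(counts.items()) for k in range(n)]


first_boxes = make_boxes({"Cat": 2, "Dog": 2, "Fish": 2, "Egg": 3,
                          "King": 1, "Apple": 3, "Car": 2})
second_boxes = make_boxes({"Cat": 1, "Dog": 2, "Fish": 2, "Egg": 2,
                           "King": 1, "Apple": 2, "Car": 2})

frames = pair_by_count(first_boxes, second_boxes)
print(sorted(frames))
```

Only "Dog", "Fish", "King", and "Car" have equal counts in both files, so four pairs of rectangular frames result, in line with the example in the text.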

Then, as shown in FIG. 13, the text box matching module 322 obtains the coordinates of the four corner points and the center point (the black dots in FIG. 13) of each of the first rectangular frames 42 (four in total) and the second rectangular frames 52 (four in total). The processing unit 32 then executes step S4' of FIG. 4, in which the conversion matrix generation module 323 calculates the coordinate transformation matrix from the correspondence between the coordinates of the first rectangular frames 42 and those of the second rectangular frames 52. Note that the text box matching module 322 does not have to use the corner and center coordinates of all the first rectangular frames 42 and all the second rectangular frames 52 to calculate the coordinate transformation matrix. It may instead use only the coordinates of the four corner points and the center point of at least two first rectangular frames 42 and of the at least two second rectangular frames 52 paired with them.

Next, step S5 described above is performed. The text box projection module 324 uses the coordinate transformation matrix calculated in step S4' to project the first text boxes 41 of the original document file 4 into the to-be-checked document image file 5, as shown in FIG. 14. As a result, the text "Egg" in the third row of the to-be-checked document image file 5, which was not originally framed by a second text box 51, is framed by a first text box 41. Therefore, when the content framed by the first text boxes 41 in the to-be-checked document image file 5 is fed into the text recognition model, the previously missed "Egg" in the third row of the to-be-checked document image file 5 can be recognized.

Finally, step S6 described above is performed: the processing unit 32 causes the comparison module 325 to compare the text information framed by each first text box 41 in the original document file 4 with the text information framed by that first text box 41 in the document image file 5 to be checked, and to output a comparison result.
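A minimal sketch of the comparison in step S6 follows. The source does not specify the form of the comparison result, so the list of mismatch records, the whitespace trimming, and the name `compare_boxes` are all assumptions made for illustration.

```python
def compare_boxes(original_texts, recognized_texts):
    """Compare texts box by box and return one record per mismatching box.

    original_texts[i] is the text framed by the i-th first text box in the
    original document; recognized_texts[i] is the text recognized inside the
    same box after projection into the scanned image.
    """
    if len(original_texts) != len(recognized_texts):
        raise ValueError("each first text box needs exactly one recognized text")
    result = []
    for index, (expected, found) in enumerate(zip(original_texts, recognized_texts)):
        # Assumption: leading and trailing whitespace is not a meaningful difference.
        if expected.strip() != found.strip():
            result.append({"box": index, "original": expected, "recognized": found})
    return result


# Hypothetical texts for three boxes; only the second one differs.
differences = compare_boxes(["Apple", "Banana", "Egg"], ["Apple", "Bananna", "Egg "])
```

Because both texts are indexed by the same first text box, a difference can be reported together with the location of that box in the document.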

In summary, the content of the original document files 1, 4 is usually clearer and more accurate than the content of the document image files 2, 5 to be checked, which are produced by scanning or photographing the original document files 1, 4. Compared with the document image files 2, 5 to be checked, the text in the original document files 1, 4 can therefore more readily be read directly (for example, when the original document files 1, 4 are PDF files) or be accurately detected by text detection software (for example, when the original document files 1, 4 are image files) and framed with text boxes. The above embodiments accordingly project the first text boxes 11, 41 of the original document files 1, 4 into the document image files 2, 5 to be checked and then perform text recognition on the content framed by the first text boxes 11, 41 in the document image files 2, 5 to be checked. This solves the problem that some text in the document image files 2, 5 to be checked cannot be accurately detected because of displacement, rotation, noise, or handwritten text, signatures, or alterations, and thus achieves the effect and purpose of the present utility model.

The foregoing, however, describes only embodiments of the present utility model and shall not limit the scope in which the present utility model may be practiced. All simple equivalent changes and modifications made in accordance with the claims and the specification of the present utility model remain within the scope covered by this utility model patent.

1, 4: Original document file

2, 5: Document image file to be checked

3: Document proofreading device

31: Storage unit

32: Processing unit

321: Text capture module

322: Text box matching module

323: Conversion matrix generation module

324: Text box projection module

325: Comparison module

Claims

A document proofreading device for proofreading an original document file against a document image file to be checked, comprising: a storage unit storing the original document file and the document image file to be checked; and a processing unit that can access the storage unit and includes a text capture module, a text box matching module, a conversion matrix generation module, a text box projection module, and a comparison module; wherein the text capture module obtains, from the original document file, a plurality of first text boxes and the text framed by each first text box, and obtains, from the document image file to be checked, a plurality of second text boxes and the text framed by each second text box; when the text box matching module determines that the text framed by each of at least N first text boxes in the original document file is unique (N ≥ 2, N being a positive integer), it pairs, based on the N first text boxes, M of those first text boxes with M second text boxes in the document image file to be checked whose framed text is unique (2 ≤ M ≤ N, M being a positive integer), and obtains the coordinates of the four corner points and the midpoint of each of the paired M first text boxes and M second text boxes; the conversion matrix generation module calculates a coordinate transformation matrix according to the correspondence between the coordinates of the paired M first text boxes and the coordinates of the M second text boxes; the text box projection module projects the first text boxes of the original document file into the document image file to be checked according to the coordinate transformation matrix, and obtains the text framed by each first text box in the document image file to be checked; and the comparison module compares the text framed by each first text box in the original document file with the text framed by that first text box in the document image file to be checked, and outputs a comparison result.

The document proofreading device according to claim 1, wherein when the text box matching module determines that fewer than N first text boxes in the original document file frame unique text, it records the first text boxes in the original document file whose framed text is the same and their number, records the second text boxes in the document image file to be checked whose framed text is the same and their number, pairs the first text boxes having the same framed text in the original document file with the second text boxes in the document image file to be checked that frame that same text and are equal in number to those first text boxes, frames the first text boxes having the same framed text in the original document file with a first rectangular frame and obtains the coordinates of the four corner points and the midpoint of each of at least two first rectangular frames, and frames the second text boxes having the same framed text in the document image file to be checked with a second rectangular frame and obtains the coordinates of the four corner points and the midpoint of each of at least two second rectangular frames paired with the at least two first rectangular frames; and the conversion matrix generation module calculates the coordinate transformation matrix according to the correspondence between the coordinates of the paired at least two first rectangular frames and the coordinates of the at least two second rectangular frames.

The document proofreading device according to claim 1 or 2, wherein the text capture module further obtains the number of pages of the original document file and the number of pages of the document image file to be checked, and only after determining that the number of pages of the original document file is the same as the number of pages of the document image file to be checked does the text capture module obtain, from the original document file, the first text boxes and the text information framed by each first text box, and obtain, from the document image file to be checked, the second text boxes and the text information framed by each second text box.

The document proofreading device according to claim 1 or 2, wherein the coordinate transformation matrix is a homography matrix.