TWI745724B - Mobile Document Recognition System

Info

Publication number: TWI745724B
Application number: TW108126342A
Authority: TW
Inventors: 王文進; 張智翔; 陳宗霆
Original assignee: 國泰人壽保險股份有限公司
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2021-11-11
Also published as: TW202105316A

Abstract

A mobile document recognition system includes a mobile device that has an image capture module, a display module, and an image recognition module. The image recognition module uses deep learning and includes multiple classifiers, each corresponding to one of multiple categories, for image recognition. The image capture module captures images to generate a preview image. A user selects the classifier corresponding to the category to which a written document belongs, and that classifier performs image recognition on the preview image. When the classifier recognizes from the preview image that the written document belongs to the category, a prompt message on the display module notifies the user of the successful recognition, so that the user issues a capture command that stores the preview image as a document image to be recognized. The classifier then performs recognition again on that document image.

Description

Mobile Document Recognition System

The present invention relates to an image recognition system, and in particular to a mobile document recognition system.

With the spread of mobile devices, sales agents at financial and insurance companies now widely use smartphones as tools to help complete transactions. For example, when an agent has agreed on a loan contract with a customer and is about to carry out the signing procedure, the agent uses a smartphone to photograph the customer's relevant written documents, such as an identity card, a land ownership certificate, or proof of financial capacity, producing multiple document images. The smartphone then transmits the document images over a network to a server system, so that the case handlers at the company can retrieve them to complete the case review. This removes the inconvenience and delay of forwarding physical documents. However, when the uploaded document images are wrong or unclear, the agent learns that the documents must be photographed again only after being notified by a case handler, which causes a different kind of delay and leaves the customer with a poor impression. How to verify the correctness of document images, so as to reduce the number of resubmissions and improve customer service, has therefore become a problem to be solved.

Accordingly, an object of the present invention is to provide a mobile document recognition system that can run on a mobile device.

Accordingly, the present invention provides a mobile document recognition system that is suitable for a user and includes a mobile device. The mobile device includes an image capture module, a display module, and an image recognition module.

The image capture module captures images to generate a preview image and, in response to a capture command issued by the user, stores the preview image containing a written document as a document image to be recognized. The display module is electrically connected to the image capture module to display the preview image, and also displays a prompt box and a prompt message. The image recognition module is electrically connected to the image capture module and the display module, performs image recognition using deep learning, and includes multiple classifiers that respectively correspond to multiple categories.

The written document belongs to one of the categories, and the user selects, as a selected classifier, the classifier that corresponds to the category to which the written document belongs. The selected classifier of the image recognition module performs image recognition on the preview image. When the selected classifier recognizes from the preview image that the written document belongs to that category, the prompt message of the display module notifies the user of the successful recognition, prompting the user to issue the capture command.

The selected classifier also performs image recognition again on the document image to be recognized and, upon successfully recognizing that the written document belongs to that category, the document image to be recognized is stored as a recognized document image.

In some embodiments, when the selected classifier determines that the preview image does not contain the written document, the prompt message of the display module notifies the user to place the image of the written document inside the prompt box. When the selected classifier cannot recognize from the preview image that the written document belongs to that category, the prompt message notifies the user that the written document is incorrect. When the selected classifier cannot recognize from the document image to be recognized that the written document belongs to that category, the prompt message likewise notifies the user that the written document is incorrect.

In some embodiments, when the selected classifier determines that the preview image does not contain the written document, the display module also displays the prompt box in red. When the selected classifier recognizes from the preview image that the written document belongs to that category, the display module also displays the prompt box in green.

In other embodiments, when each classifier performs image recognition, it first scales down the preview image or the document image to be recognized proportionally so that the length and width of the image are less than or equal to a preset size, then adds a black background to the image and positions the part of the image to be predicted and recognized at the center of the whole image, and then normalizes the color value of every pixel to obtain a preprocessed image. A first convolutional neural network (CNN) model of each classifier receives the preprocessed image and performs recognition to obtain a probability value.

In some embodiments, the selected classifier determines, according to the magnitude of the probability value, whether the preview image or the document image to be recognized that corresponds to the preprocessed image belongs to that category.

In some embodiments, the first convolutional neural network model of each classifier performs image recognition according to features of at least one of a layout, a typeface, a spacing, a stamp, a color, and a special pattern of the written document in the preprocessed image.

In other embodiments, the mobile document recognition system further includes a server system, and the mobile device establishes a connection with the server system to transmit the recognized document image to the server system for storage.

In other embodiments, the mobile document recognition system further includes a server system that includes an image recognition unit, and the image recognition unit likewise performs image recognition using deep learning. The mobile device establishes a connection with the server system to transmit the recognized document image to the server system. The image recognition unit of the server system performs image recognition on the recognized document image and, upon successfully recognizing that the written document belongs to that category, stores the recognized document image.

In some embodiments, the image recognition unit of the server system includes multiple classifiers corresponding to the categories. A second convolutional neural network model of each of these classifiers corresponds to the first convolutional neural network model of the same category, and each first convolutional neural network model is related to the corresponding second convolutional neural network model. Each classifier of the image recognition module of the mobile device occupies less storage space than the corresponding classifier of the image recognition unit of the server system.

In other embodiments, the categories include at least one of an identity category, a land category, a house category, a financial capacity category, a sale category, and an interest payment category.

The effect of the present invention is as follows. Because the multiple classifiers of the image recognition module respectively correspond to written documents of multiple categories, the user first selects the classifier that matches the category of the written document to be recognized, and the category of the written document can then be recognized effectively and accurately. More importantly, the optimized design of each classifier effectively reduces the storage space it occupies, so that the mobile device can operate independently and perform image recognition correctly even when it is not connected to a network.

1: mobile device

11: image capture module

12: display module

13: image recognition module

131: classifier

132: classifier

2: server system

21: image recognition unit

211: classifier

212: classifier

9: network

Other features and effects of the present invention will become clear from the embodiments described with reference to the drawing, in which: FIG. 1 is a block diagram illustrating an embodiment of the mobile document recognition system of the present invention.

Before the present invention is described in detail, it should be noted that similar elements are denoted by the same reference numerals in the following description.

Referring to FIG. 1, an embodiment of the mobile document recognition system of the present invention is suitable for a user and includes a mobile device 1 and a server system 2. The server system 2 includes an image recognition unit 21 that performs image recognition using deep learning. More specifically, the image recognition unit 21 includes multiple classifiers 211, 212, and a first convolutional neural network (CNN) model of each classifier 211, 212 is used to recognize written documents of a corresponding one of multiple categories.

In this embodiment, the categories include, without limitation, an identity category, a land category, a house category, a financial capacity category, a sale category, and an interest payment category. For example, written documents in the identity category are the front and back of an identity card; those in the land category are land ownership certificates; those in the house category are building ownership certificates; those in the financial capacity category are proofs of financial capacity and property ownership lists; those in the sale category are sale contracts; and those in the interest payment category are interest payment certificates for the most recent six months.

The mobile device 1 is, for example, a smartphone, a tablet computer, or another portable electronic device, and includes an image capture module 11, a display module 12, and an image recognition module 13. The image capture module 11 captures images to generate a preview image and, in response to a capture command issued by the user, stores the preview image containing the written document to be recognized as a document image to be recognized. The display module 12 is electrically connected to the image capture module 11 to display the preview image, and also displays a prompt box and a prompt message.

The image recognition module 13 is electrically connected to the image capture module 11 and the display module 12, likewise performs image recognition using deep learning, and includes multiple classifiers 131, 132 that respectively correspond to the categories. Note that in this embodiment there are six categories and therefore six classifiers; for ease of illustration, FIG. 1 shows only two classifiers 131, 132 (or 211, 212) by way of example, which does not represent the actual number of classifiers.

More specifically, an application (app) is installed on the mobile device 1 in advance. When the user wants to recognize the written document (for example, a customer's identity card), the user first launches the application and selects, as a selected classifier, the one of the classifiers 131, 132 that corresponds to the category to which the written document belongs, for example the classifier corresponding to the identity category. The user then points the camera lens of the image capture module 11 of the mobile device 1 at the written document, and once the user has selected the selected classifier, the image recognition module 13 begins performing image recognition on the preview image.
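The one-classifier-per-category arrangement and the user's selection step can be pictured with the following minimal Python sketch. The names `Category`, `build_registry`, and `select_classifier` are hypothetical and do not appear in the specification, and a classifier is modeled simply as a callable that returns a probability; the actual app structure is not disclosed.

```python
from enum import Enum
from typing import Callable, Dict

class Category(Enum):
    """The six document categories named in this embodiment."""
    IDENTITY = "identity"
    LAND = "land"
    HOUSE = "house"
    FINANCIAL_CAPACITY = "financial_capacity"
    SALE = "sale"
    INTEREST_PAYMENT = "interest_payment"

# Assumption: a classifier maps an image to a probability between 0 and 1.
Classifier = Callable[[object], float]

def build_registry(
    make_classifier: Callable[[Category], Classifier],
) -> Dict[Category, Classifier]:
    """Create exactly one classifier per category."""
    return {category: make_classifier(category) for category in Category}

def select_classifier(
    registry: Dict[Category, Classifier], category: Category
) -> Classifier:
    """Return the 'selected classifier' for the category the user picked."""
    return registry[category]
```

Keeping the registry keyed by category mirrors the statement that six categories imply six classifiers, and makes the user's choice a single lookup.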

When the selected classifier determines that the preview image does not contain the written document, the prompt message and the prompt box of the display module 12 notify the user to place the image of the written document inside the prompt box. For example, the prompt message reads "Please align the watermark text within the prompt box and take the photo", and the prompt box is displayed in red.

When the selected classifier cannot recognize from the preview image that the written document belongs to the category (for example, the identity category), the prompt message of the display module 12 notifies the user that the written document is incorrect.

When the selected classifier recognizes from the preview image that the written document belongs to the category (for example, the identity category), the prompt message and the prompt box of the display module 12 notify the user of the successful recognition, prompting the user to issue the capture command, for example by pressing a corresponding shutter button of the mobile device 1. For example, the prompt message reads "This may be the front of a correct identity card", and the prompt box is displayed in green.
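The three preview outcomes described above can be summarized as a small lookup from outcome to user feedback. The sketch below is illustrative only: `PreviewResult`, `Feedback`, and `preview_feedback` are hypothetical names, the wording of the "incorrect document" message is an assumption, and the specification does not state a prompt box color for that case, so it is left as `None`. How the classifier tells "no document" apart from "wrong category" is also not specified, so the outcome is taken as an input.

```python
from enum import Enum
from typing import NamedTuple, Optional

class PreviewResult(Enum):
    NO_DOCUMENT = 1      # preview does not contain the written document
    WRONG_CATEGORY = 2   # document not recognized as the selected category
    RECOGNIZED = 3       # document recognized as the selected category

class Feedback(NamedTuple):
    message: str
    box_color: Optional[str]   # None where the specification names no color
    allow_capture: bool        # whether the user is prompted to capture

def preview_feedback(result: PreviewResult) -> Feedback:
    """Map a preview recognition outcome to the prompt shown to the user."""
    if result is PreviewResult.NO_DOCUMENT:
        return Feedback(
            "Please align the watermark text within the prompt box "
            "and take the photo",
            "red",
            False,
        )
    if result is PreviewResult.WRONG_CATEGORY:
        # Assumed wording; the specification only says the user is told
        # that the written document is incorrect.
        return Feedback("The written document is incorrect", None, False)
    return Feedback(
        "This may be the front of a correct identity card", "green", True
    )
```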

After the user issues the capture command, the image capture module 11 generates the document image to be recognized, and the selected classifier performs image recognition again on that image. When the selected classifier successfully recognizes that the written document belongs to the category (for example, the identity category), the document image to be recognized is stored as a recognized document image. Conversely, when the selected classifier cannot recognize from the document image to be recognized that the written document belongs to the category (for example, the identity category), the prompt message of the display module 12 notifies the user that the written document is incorrect.
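The two-pass check, first on the preview image and then again on the captured image, can be sketched as follows. This is a simplified illustration under stated assumptions: `capture_flow` and its status strings are hypothetical, `is_in_category` stands in for the selected classifier's yes/no decision, and `capture` stands in for the user pressing the shutter.

```python
from typing import Callable, Optional, Tuple, TypeVar

Image = TypeVar("Image")

def capture_flow(
    preview: Image,
    is_in_category: Callable[[Image], bool],
    capture: Callable[[], Image],
) -> Tuple[str, Optional[Image]]:
    """Run the preview check, then re-check the captured image.

    Returns a status string and, on success, the recognized document image.
    """
    # First pass: the preview must be recognized before capture is prompted.
    if not is_in_category(preview):
        return ("preview_rejected", None)

    # The user issues the capture command, producing the image to be recognized.
    pending = capture()

    # Second pass: the same selected classifier checks the captured image.
    if not is_in_category(pending):
        return ("capture_rejected", None)

    # Only now is the image kept as a recognized document image.
    return ("recognized", pending)
```

The second pass matters because the captured frame can differ from the preview frame the classifier last saw, for example if the device moved before the shutter was pressed.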

More specifically, when each classifier 131, 132 performs image recognition, it first scales down the preview image or the document image to be recognized proportionally so that the length and width of the image are less than or equal to a preset size, which improves overall recognition efficiency. It then adds a black background to the image and positions the part of the image to be predicted and recognized at the center of the whole image, which improves recognition accuracy. Finally, it normalizes the color value of every pixel to obtain a preprocessed image. The preset size is, for example, 120 pixels (px), and normalization converts each color value to a number between 0 and 1 to improve recognition accuracy.
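The preprocessing steps above (proportional downscaling, black padding with centering, and normalization to the 0 to 1 range) can be sketched in plain Python. Several details are assumptions not stated in the specification: the image is represented as a list of rows of 8-bit RGB tuples, the black canvas is a square of the preset size, nearest-neighbor sampling is used for resizing, and the function name `preprocess` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]                 # 8-bit RGB
NormPixel = Tuple[float, float, float]       # each channel in [0, 1]

def preprocess(image: List[List[Pixel]], size: int = 120) -> List[List[NormPixel]]:
    """Scale down proportionally, center on a black square, normalize colors."""
    src_h, src_w = len(image), len(image[0])
    longest = max(src_w, src_h)

    # Step 1: shrink only if needed, keeping the aspect ratio, so that both
    # sides are <= size. Integer arithmetic avoids rounding surprises.
    if longest <= size:
        new_w, new_h = src_w, src_h
    else:
        new_w = max(1, src_w * size // longest)
        new_h = max(1, src_h * size // longest)

    # Step 2: start from an all-black canvas and compute the centered offset.
    canvas: List[List[NormPixel]] = [
        [(0.0, 0.0, 0.0)] * size for _ in range(size)
    ]
    top = (size - new_h) // 2
    left = (size - new_w) // 2

    # Step 3: nearest-neighbor sample each target pixel and normalize it.
    for y in range(new_h):
        src_y = y * src_h // new_h
        for x in range(new_w):
            r, g, b = image[src_y][x * src_w // new_w]
            canvas[top + y][left + x] = (r / 255.0, g / 255.0, b / 255.0)
    return canvas
```

A production implementation would more likely use an imaging library with better interpolation; the sketch only shows the order of the three steps.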

A second convolutional neural network model of each classifier 131, 132 receives the preprocessed image and performs image recognition according to features of at least one of a layout, a typeface, a spacing, a stamp, a color, and a special pattern of the written document in the preprocessed image, thereby obtaining a probability value between 0 and 1. According to the magnitude of the probability value, the selected classifier determines whether the preview image or the document image to be recognized that corresponds to the preprocessed image belongs to the category corresponding to the classifier 131, 132. For example, when the probability value is greater than a preset threshold, the image is determined to belong to the category corresponding to the classifier 131, 132. The preset threshold is, for example, 0.5 or 0.8; the higher the preset threshold, the more distinct features the image must contain before it is determined to belong to the category.
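The threshold decision can be written as a one-line comparison. In the sketch below the function name `belongs_to_category` is hypothetical; the strict "greater than" comparison and the example thresholds 0.5 and 0.8 follow the text.

```python
def belongs_to_category(probability: float, threshold: float = 0.5) -> bool:
    """Decide category membership from the model's probability value."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # The text requires the probability to be greater than the threshold.
    return probability > threshold
```

With a probability of 0.7, for instance, the image is accepted at a threshold of 0.5 but rejected at 0.8, which illustrates why a higher threshold demands more distinct features.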

After the written document has been successfully recognized as belonging to the category (for example, the identity category), the document image to be recognized is stored as the recognized document image. The mobile device 1 establishes a connection with the server system 2 via a network 9 to transmit the recognized document image to the server system 2. The image recognition unit 21 of the server system 2 performs image recognition on the recognized document image using the first convolutional neural network model of the classifier 211, 212 that corresponds to the category (for example, the identity category) and, upon successfully recognizing that the written document belongs to the category, stores the recognized document image.

It should be emphasized that the second convolutional neural network model of a given category is related to the corresponding first convolutional neural network model. For example, the classifiers 211, 212 of the image recognition unit 21 of the server system 2 use Keras and Google Xception to build the first convolutional neural network models, while the classifiers 131, 132 of the image recognition module 13 of the mobile device 1 use Keras, Google Xception, and CoreML to slim down and optimize the first convolutional neural network models, thereby building the second convolutional neural network models. Refer to the table below, which illustrates by way of example, and without limitation, how recognition accuracy compares when the original Google Xception model, with 15 layers in total, is slimmed down and optimized to different degrees. More specifically, the deep learning development tool Keras provides settings at model-building time that can be used for model optimization, for example model pruning (also called network reduction). Pruning addresses the fact that a neural network has many nodes and therefore a heavy computational load: to make the model faster in use, nodes with small weights are typically adjusted, for example by setting their weights directly to 0, which reduces the amount of computation. The reduction in the number of Google Xception layers uses ensemble learning, in which multiple models are built and combined to complete the learning task, to decide which layer (block) to remove at each step without causing too large a drop in accuracy.
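The idea of pruning by zeroing small weights can be illustrated with a minimal sketch. This is plain Python rather than any Keras API, and it is not the patentee's procedure: the function name `prune_by_magnitude`, the use of a global magnitude cutoff, and the `fraction` parameter are all assumptions chosen for illustration.

```python
from typing import List

def prune_by_magnitude(
    weights: List[List[float]], fraction: float
) -> List[List[float]]:
    """Set the smallest-magnitude weights to 0.

    `fraction` is the share of weights to zero out (0.0 to 1.0). Ties at the
    cutoff are all pruned, so slightly more than `fraction` may be removed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")

    magnitudes = sorted(abs(w) for row in weights for w in row)
    count = int(len(magnitudes) * fraction)
    if count == 0:
        return [row[:] for row in weights]   # nothing to prune

    cutoff = magnitudes[count - 1]
    # Weights at or below the cutoff contribute little and are set to 0.
    return [[0.0 if abs(w) <= cutoff else w for w in row] for row in weights]
```

Zeroed weights reduce computation only when the runtime or storage format exploits the sparsity, which is a separate engineering step not shown here.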

In this embodiment, the server system 2 uses first convolutional neural network models with 15 layers, whereas the mobile device 1 uses second convolutional neural network models in which 12 layers are removed and 3 layers are used. As a result, the classifiers 131, 132 of the mobile device 1 occupy only 0.221 MB of storage space, compared with the 239 MB required by the classifiers 211, 212 of the server system 2. The storage footprint is thus far smaller and well suited to the mobile device 1, without an unacceptable effect on recognition accuracy, so that both storage use and correctness are accommodated.
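To put the reported figures in proportion, the short calculation below derives the size ratio and the share of layers kept. It uses only the numbers given in this embodiment; the variable names are illustrative, and it assumes the two storage figures are stated on the same basis.

```python
SERVER_MODEL_MB = 239.0    # storage reported for the server-side classifiers
MOBILE_MODEL_MB = 0.221    # storage reported for the mobile-side classifiers
TOTAL_LAYERS = 15          # layers used by the server-side models
MOBILE_LAYERS = 3          # layers used by the mobile models (12 removed)

size_ratio = SERVER_MODEL_MB / MOBILE_MODEL_MB
layers_kept = MOBILE_LAYERS / TOTAL_LAYERS

print(f"server/mobile size ratio: {size_ratio:.0f}x")
print(f"share of layers kept:     {layers_kept:.0%}")
```

The mobile models keep one fifth of the layers while the reported footprint shrinks by a factor of roughly 1,081, which suggests that most of the stored parameters sit in the removed layers.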

It should also be noted that in this embodiment the server system 2 performs image recognition again on the recognized document image by means of the image recognition unit 21, whereas in other embodiments the server system 2 may skip this further recognition and store the recognized document image directly.
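The two server-side variants, re-verifying before storing or storing directly, differ only in whether a verification step is configured. The sketch below is a hypothetical illustration: `handle_upload`, the `verifiers` mapping, and the list used as storage are stand-ins and do not describe the actual server software.

```python
from typing import Callable, Dict, List, Optional, Tuple

def handle_upload(
    image: str,
    category: str,
    storage: List[Tuple[str, str]],
    verifiers: Optional[Dict[str, Callable[[str], bool]]] = None,
) -> bool:
    """Store an uploaded recognized document image.

    If `verifiers` is given, the server-side classifier for the category must
    accept the image first; if it is None, the image is stored directly.
    Returns True when the image was stored.
    """
    if verifiers is not None and not verifiers[category](image):
        return False            # server-side recognition failed, do not store
    storage.append((category, image))
    return True
```

Passing `verifiers=None` corresponds to the embodiments in which the server trusts the on-device result.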

In summary, using multiple classifiers to recognize written documents of different categories, and slimming down and optimizing the convolutional neural network models, not only effectively reduces the storage space occupied, which facilitates installation on a mobile device, but also preserves recognition correctness. Even when the mobile device is not connected to a network, its image recognition module can still independently recognize the category of a written document, so the object of the present invention is indeed achieved.

The foregoing, however, describes only embodiments of the present invention and does not limit the scope in which the present invention may be practiced. All simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.

Claims

A mobile document recognition system, suitable for a user and comprising a mobile device, the mobile device including: an image capture module for capturing images to generate a preview image and, in response to a capture command issued by the user, storing the preview image containing a written document as a document image to be recognized; a display module electrically connected to the image capture module to display the preview image, and further displaying a prompt box and a prompt message; and an image recognition module electrically connected to the image capture module and the display module, performing image recognition using deep learning, and including a plurality of classifiers that respectively correspond to a plurality of categories for image recognition; wherein the written document belongs to one of the categories, and the user selects, as a selected classifier, the one of the classifiers that corresponds to the category to which the written document belongs; the selected classifier of the image recognition module performs image recognition on the preview image, such that when the selected classifier recognizes from the preview image that the written document belongs to said one of the categories, the prompt message of the display module notifies the user of the successful recognition, whereupon the user issues the capture command; and the selected classifier further performs image recognition again on the document image to be recognized and, upon successfully recognizing that the written document belongs to said one of the categories, the document image to be recognized is stored as a recognized document image.

The mobile document recognition system according to claim 1, wherein: when the selected classifier determines that the preview image does not contain the written document, the prompt message of the display module notifies the user to place the image of the written document inside the prompt box; when the selected classifier cannot recognize from the preview image that the written document belongs to said one of the categories, the prompt message of the display module notifies the user that the written document is incorrect; and when the selected classifier cannot recognize from the document image to be recognized that the written document belongs to said one of the categories, the prompt message of the display module notifies the user that the written document is incorrect.

The mobile document recognition system according to claim 2, wherein: when the selected classifier determines that the preview image does not contain the written document, the display module further displays the prompt box in red; and when the selected classifier recognizes from the preview image that the written document belongs to said one of the categories, the display module further displays the prompt box in green.

The mobile document recognition system according to claim 2, wherein, when each of the classifiers performs image recognition, the preview image or the document image to be recognized is first scaled down proportionally so that the length and width of the image are less than or equal to a preset size, a black background is then added to the image and the part of the image to be predicted and recognized is positioned at the center of the whole image, and the color value of every pixel in the image is then normalized to obtain a preprocessed image; and a first convolutional neural network (CNN) model of each of the classifiers receives the preprocessed image and performs recognition to obtain a probability value.

The mobile document recognition system according to claim 4, wherein the selected classifier determines, according to the magnitude of the probability value, whether the preview image or the document image to be recognized that corresponds to the preprocessed image belongs to said one of the categories.

The mobile document recognition system according to claim 5, wherein the first convolutional neural network model of each of the classifiers performs image recognition according to features of at least one of a layout, a typeface, a spacing, a stamp, a color, and a special pattern of the written document in the preprocessed image.

The mobile document recognition system according to claim 5, further comprising a server system, wherein the mobile device establishes a connection with the server system to transmit the recognized document image to the server system for storage.

The mobile document recognition system according to claim 5, further comprising a server system that includes an image recognition unit, the image recognition unit likewise performing image recognition using deep learning, wherein the mobile device establishes a connection with the server system to transmit the recognized document image to the server system, and the image recognition unit of the server system performs image recognition on the recognized document image and, upon successfully recognizing that the written document belongs to said one of the categories, stores the recognized document image.

The mobile document recognition system according to claim 8, wherein the image recognition unit of the server system includes a plurality of classifiers corresponding to the categories, a second convolutional neural network model of each of these classifiers corresponds to the first convolutional neural network model of the same category, each first convolutional neural network model is related to the corresponding second convolutional neural network model, and each classifier of the image recognition module of the mobile device occupies less storage space than the corresponding classifier of the image recognition unit of the server system.

The mobile document recognition system according to claim 1, wherein the categories include at least one of an identity category, a land category, a house category, a financial capacity category, a sale category, and an interest payment category.