TWI701620B - Document information extraction and archiving system - Google Patents

Document information extraction and archiving system

Info

Publication number
TWI701620B
Authority
TW
Taiwan
Prior art keywords
neural network
network model
text
module
model
Prior art date
Application number
TW108109781A
Other languages
Chinese (zh)
Other versions
TW202036399A (en)
Inventor
趙式隆
林奕辰
沈昇勳
王彥稀
林哲賢
Original Assignee
洽吧智能股份有限公司
Priority date
Filing date
Publication date
Application filed by 洽吧智能股份有限公司
Priority to TW108109781A
Application granted
Publication of TWI701620B
Publication of TW202036399A

Abstract

A document information extraction and archiving system is electrically connected to a database and includes an input module, a text segmentation area detection module, a character recognition module, a semantic segmentation module, and a database docking module. The input module accepts a document image containing a plurality of characters, and the text segmentation area detection module frames the text in the document image with a first neural network model to form text segmentation areas. The character recognition module recognizes the text in each text segmentation area with a second neural network model to obtain an editable string. The semantic segmentation module segments the string into a plurality of word segments and assigns each word segment a part of speech. Finally, the database docking module links each word segment to the corresponding field of the database according to its part of speech.

Description

Document information extraction and archiving system

The present invention relates to a document information extraction and archiving system, and in particular to one that operates on text images.

At present, in order to evaluate a potential customer's existing insurance policies, an insurance broker must enter the data on those policies into the insurance company's evaluation system before the policies can be assessed and further recommendations made to the customer. However, an existing policy contains a large amount of data, and potential customers usually hold only paper documents with no electronic copy, so the broker has to key the data into the evaluation system by hand. This consumes considerable time and reduces the efficiency of acquiring new customers.

Therefore, how to automatically capture the data on an existing policy and feed it into the evaluation system of the broker's insurance company is a problem worth considering for those of ordinary skill in the art.

An object of the present invention is to provide a document information extraction and archiving system that automatically extracts textual information from document images. The system is electrically connected to a database and includes an input module, a text segmentation area detection module, a text recognition module, a semantic segmentation module, and a database docking module. The input module accepts a document image containing a plurality of characters. The text segmentation area detection module frames the text in the document image with a first neural network model to form at least one text segmentation area. The text recognition module recognizes the text in the text segmentation area with a second neural network model to obtain at least one editable string, each string containing at least one editable character. The semantic segmentation module segments each string into a plurality of word segments and assigns each word segment a part of speech. Finally, the database docking module links each word segment to the corresponding field of the database according to its part of speech.
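
The flow above is a chain of modules, each consuming the previous module's output. The minimal Python sketch below shows one way the stages could be wired together; it is an illustration only, and the helper callables, field names, and data shapes are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class WordSegment:
    text: str   # the recognized word segment
    pos: str    # part of speech assigned by the semantic segmentation module

def extract_and_archive(
    document_image,
    detect_regions: Callable,      # first neural network model (text segmentation area detection)
    recognize: Callable,           # second neural network model (text recognition)
    tag_segments: Callable,        # semantic segmentation module, returns WordSegment objects
    pos_to_field: Dict[str, str],  # e.g. {"person_name": "insured_name", "date": "issue_date"}
    db: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Illustrative end-to-end flow: regions -> strings -> tagged segments -> database fields."""
    for region in detect_regions(document_image):
        text = recognize(region)
        for seg in tag_segments(text):
            field = pos_to_field.get(seg.pos)
            if field is not None:
                db.setdefault(field, []).append(seg.text)
    return db
```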

In the document information extraction and archiving system described above, the first neural network model includes a first convolutional neural network model and an object detection neural network model. The first convolutional neural network model extracts features from the document image and outputs a feature vector, and the object detection neural network model frames the serial-number-type text according to that feature vector to form a serial-number segmentation area. The first convolutional neural network model is a VGG, ResNet, or DenseNet model, and the object detection neural network model is a YOLO, CTPN, or EAST model.

In the system described above, the second neural network model includes a second convolutional neural network model and a recurrent neural network model. The second convolutional neural network model processes the image in the text segmentation area and outputs a character sequence, and the recurrent neural network model outputs the editable string from that sequence. The recurrent neural network model implements the Connectionist Temporal Classification algorithm.

Alternatively, in the system described above, the second neural network model may be a Seq2Seq model.

In the system described above, the semantic segmentation module further includes a lexicon and a rule module, where the lexicon stores a plurality of domain-specific proper nouns and the rule module assigns different parts of speech to the word segments.

In the system described above, the semantic segmentation module vectorizes the word segments and then assigns a part of speech to each segment using a conditional random field or a hidden Markov model.

In the system described above, the semantic segmentation module includes a third neural network model. The semantic segmentation module converts each character of the string into a feature vector of fixed dimension and feeds the vector into the third neural network model to assign a part of speech to each word segment. The third neural network model is a recurrent neural network and includes a conditional random field layer.

In the system described above, the database docking module includes a text classifier that further classifies word segments sharing the same part of speech.

With the document information extraction and archiving system of the present application, the text in a document image can be entered into the corresponding fields of a database automatically, without any manual input, which greatly improves administrative efficiency.

To make the above features and advantages of the present invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.

10: Document image
12: Text segmentation area
12a: Image sequence
30: Database
40: Image input device
100: Document information extraction and archiving system
110: Input module
115: Image pre-processing module
120: Text segmentation area detection module
122: First neural network model
1221: First convolutional neural network model
1223: Object detection neural network model
130: Text recognition module
132: Second neural network model
1321: Second convolutional neural network model
1323: Recurrent neural network model
140: Semantic segmentation module
141: Third neural network model
1411: Embedding layer
1413: Recurrent neural network layer
1415: Activation function layer
1417: Conditional random field layer
142: Lexicon
143: Rule module
150: Database docking module
151: Text classifier

Various embodiments are described below with reference to the accompanying drawings, which are provided for illustration and are not intended to limit the scope in any way; like reference numerals denote like components. FIG. 1 illustrates an embodiment of the document information extraction and archiving system of the present invention.

FIG. 2A shows a document image of an insurance policy.

FIG. 2B shows the document image of the insurance policy after image pre-processing.

FIG. 2C shows the document image with text segmentation areas.

FIG. 3 is an architecture diagram of the first neural network model.

FIG. 4 is an architecture diagram of the second neural network model.

FIG. 5 is a schematic diagram of decomposing a text segmentation area into a plurality of image sequences.

FIG. 6 is an architecture diagram of the semantic segmentation module.

FIG. 7 is an architecture diagram of the third neural network model.

The present invention is best understood by reference to the detailed description and the accompanying drawings set forth herein. Various embodiments are discussed below with reference to the drawings. However, those skilled in the art will readily appreciate that the detailed description given with respect to the drawings is for explanatory purposes only, because the methods and systems may extend beyond the described embodiments. For example, the teachings given and the needs of a particular application may yield multiple alternative and suitable ways to implement the functionality of any detail described herein. Therefore, any approach may extend beyond the specific implementation choices in the embodiments described and illustrated below.

Certain terms are used throughout the specification and the following claims to refer to particular elements. Those of ordinary skill in the art will appreciate that different manufacturers may refer to the same element by different names. This specification and the following claims do not distinguish elements by name but by difference in function. The terms "comprise" and "include" used throughout the specification and the following claims are open-ended and should be interpreted as "including but not limited to". In addition, the terms "couple" and "connect" encompass any direct or indirect means of electrical connection. Therefore, a statement that a first device is coupled to a second device means that the first device may be electrically connected to the second device directly, or electrically connected to the second device indirectly through other devices or connection means.

Referring to FIG. 1, FIG. 1 illustrates an embodiment of the document information extraction and archiving system of the present invention. The document information extraction and archiving system 100 includes an input module 110, an image pre-processing module 115, a text segmentation area detection module 120, a text recognition module 130, a semantic segmentation module 140, and a database docking module 150, where the database docking module 150 is connected to a database 30. In this embodiment, the database 30 is, for example, an insurance company's database containing multiple fields such as name, national ID number, policy type, insured amount, and so on. The input module 110 is, for example, electrically connected to an image input device 40, such as a scanner, a digital camera, or a smartphone with a camera. Through the image input device 40, a document image (for example, the photograph of an insurance policy shown in FIG. 2A) can be imported into the image pre-processing module 115. The image pre-processing module 115 performs image pre-processing on the document image, such as orientation correction, page-curl correction, denoising, and binarization, so that the document image has high contrast (document image 10 in FIG. 2B) for subsequent processing. In this embodiment, the input module 110, the image pre-processing module 115, the text segmentation area detection module 120, the text recognition module 130, the semantic segmentation module 140, and the database docking module 150 are deployed on the server side, which consists of one or more servers. Note that, to protect personal privacy, the names of the policyholder and the insured and the policy number are masked in FIG. 2A and altered in FIG. 2B.
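
The exact operations and parameters of the image pre-processing module 115 are not specified in the patent, so the following OpenCV sketch (non-local-means denoising followed by Otsu binarization, with the input path as a placeholder) is only one assumed way to obtain the high-contrast page described above; orientation and page-curl correction are omitted.

```python
import cv2

def preprocess(path: str):
    """Denoise and binarize a scanned or photographed document page."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)          # path is a placeholder
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)    # remove sensor/compression noise
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # high-contrast black/white page
    return binary
```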

The pre-processed document image 10 is then passed to the text segmentation area detection module 120, which includes a first neural network model 122. The first neural network model 122 frames the text in the document image 10 to form at least one text segmentation area 12 (several are shown in FIG. 2C). Note that the text in a text segmentation area 12 still exists as an image; in other words, it cannot be edited at this stage. Converting this text into editable text is the task of the text recognition module 130. The operation of the text segmentation area detection module 120 and the text recognition module 130 is described in more detail below.

Referring also to FIG. 3, the first neural network model 122 includes a first convolutional neural network model 1221 and an object detection neural network model 1223. The first convolutional neural network model 1221 is a convolutional neural network comprising convolutional layers and pooling layers (neither is shown in the figure); the convolutional layers are mainly used for feature extraction, while the pooling layers reduce the number of parameters required by the first convolutional neural network model 1221 to avoid overfitting. The first convolutional neural network model 1221 produces a feature vector from the input document image 10, and the feature vector is then fed into the object detection neural network model 1223. In this embodiment, the first convolutional neural network model 1221 may be a VGG, ResNet, or DenseNet model, and the object detection neural network model 1223 may be a YOLO model, preferably a CTPN or EAST model. After the object detection neural network model 1223 has run, the text in the document image 10 is framed to form the text segmentation areas 12 described above (as shown in FIG. 2C).
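
The division of labor in FIG. 3, a convolutional backbone that produces features followed by a detection head that predicts text boxes from them, can be sketched as below. This is a generic toy detector written for illustration; it is not the VGG/ResNet/DenseNet backbone or the YOLO/CTPN/EAST head named in the patent, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for the first convolutional neural network model 1221."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):                 # x: (N, 1, H, W) binarized page
        return self.features(x)           # feature map: (N, 64, H/4, W/4)

class TinyTextDetector(nn.Module):
    """Stand-in for the object detection neural network model 1223: per-cell text score + box offsets."""
    def __init__(self):
        super().__init__()
        self.backbone = TinyBackbone()
        self.head = nn.Conv2d(64, 5, 1)   # 1 text/no-text score + 4 box coordinates per grid cell

    def forward(self, x):
        out = self.head(self.backbone(x))     # (N, 5, H/4, W/4)
        scores = torch.sigmoid(out[:, 0])     # probability that a cell lies on text
        boxes = out[:, 1:]                    # (dx, dy, w, h) regressed per cell
        return scores, boxes

# Example: a single 1-channel 256x256 page crop.
detector = TinyTextDetector()
scores, boxes = detector(torch.rand(1, 1, 256, 256))
```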

After the text in the document image 10 has been framed into text segmentation areas 12, the text recognition module 130 recognizes the text in each area with a second neural network model 132. Referring also to FIG. 4, the second neural network model 132 includes a second convolutional neural network model 1321 and a recurrent neural network model 1323. Like the first convolutional neural network model 1221, the second convolutional neural network model 1321 is a convolutional neural network, and it produces a preliminary prediction of the text in the text segmentation area 12. Although the convolutional model alone can make this preliminary prediction, it is preferable to add the recurrent neural network model 1323 on top of it to recognize the text in the text segmentation area 12 more accurately; the detailed mechanism is described below.

When recognizing the text in a text segmentation area 12, the second convolutional neural network model 1321 first decomposes the area into a plurality of image slices 12a (FIG. 5). For example, if the text segmentation area 12 contains the character "S", some slices 12a may cover only the left part of the "S" and others only the right part, so the second convolutional neural network model 1321 may recognize the single character "S" as two "S" characters. Conversely, multiple characters may be merged into one: in the string "llc.", for example, the model may treat the two letters "ll" as a single character (an "l" or the digit "1"). The recurrent neural network model 1323 is a recurrent neural network (RNN); because an RNN takes previous inputs into account, that is, it has a form of short-term memory, it can correct such output errors of the second convolutional neural network model 1321 and recognize the text in the text segmentation area 12 correctly.
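
A compact version of the CNN-plus-RNN arrangement in FIG. 4 is sketched below: the convolutional part collapses each text-region crop into a left-to-right sequence of feature columns (playing the role of the image slices 12a), and a bidirectional LSTM reads that sequence with context so that split or merged characters can be corrected. The layer counts and sizes are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """CNN feature columns -> bidirectional LSTM -> per-step character logits."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),        # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)), # H/4, W/2
        )
        self.rnn = nn.LSTM(input_size=128 * 8, hidden_size=256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)   # num_classes includes the CTC "blank"

    def forward(self, x):                # x: (N, 1, 32, W) grayscale text-line crop
        f = self.cnn(x)                  # (N, 128, 8, W/2)
        n, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(n, w, c * h)   # one feature column per horizontal step
        out, _ = self.rnn(seq)           # context over neighbouring slices ("short-term memory")
        return self.fc(out)              # (N, W/2, num_classes) per-step character scores

logits = TinyCRNN(num_classes=37)(torch.rand(2, 1, 32, 128))   # e.g. 26 letters + 10 digits + blank
```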

In this embodiment, the recurrent neural network model 1323 uses, for example, the Connectionist Temporal Classification algorithm (hereinafter the CTC algorithm). The CTC algorithm is currently used mainly in speech recognition; its detailed operating principle is described in "Sequence Modeling With CTC" (https://distill.pub/2017/ctc/).

After investigation, the inventors found that the CTC algorithm also works well for the text recognition in this application, mainly because the speech recognition setting has much in common with the present text recognition setting. Common situations in speech recognition are that some people speak quickly, some speak slowly, and some stretch certain phonemes; the CTC algorithm was developed precisely for such situations. In the text recognition of the present application, the character spacing in some documents is wide (corresponding to slow speakers in speech recognition) and in others it is tight (corresponding to fast speakers). Moreover, since the document image may be captured by photographing, the character spacing can also vary with the shooting angle or distance. The inventors therefore adopted the CTC algorithm to solve this problem and obtained good results.
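
In PyTorch, training the recurrent model 1323 with the CTC criterion and decoding its output greedily could look like the following sketch. The character set, label values, and blank index are illustrative assumptions (blank = 0 here, by convention).

```python
import torch
import torch.nn as nn

# Per-step log-probabilities from the recognizer, shaped (T, N, C) as nn.CTCLoss expects.
T, N, C = 64, 2, 37                      # time steps, batch size, classes (index 0 = CTC blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)

targets = torch.tensor([3, 8, 8, 12, 5, 9, 1])       # two label sequences concatenated
target_lengths = torch.tensor([4, 3])                # lengths of "3 8 8 12" and "5 9 1"
input_lengths = torch.full((N,), T, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)   # aligns without per-character boxes

def greedy_decode(log_probs_1: torch.Tensor) -> list:
    """Best-path decoding for one sample of shape (T, C): collapse repeats and drop blanks."""
    best = log_probs_1.argmax(dim=1).tolist()
    out, prev = [], 0
    for idx in best:
        if idx != 0 and idx != prev:     # skip blanks and repeated symbols
            out.append(idx)
        prev = idx
    return out

print(loss.item(), greedy_decode(log_probs[:, 0, :]))
```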

Alternatively, the second neural network model 132 may be a Seq2Seq model. A Seq2Seq model generally includes an encoder and a decoder. The encoder may be a convolutional neural network, which likewise first decomposes the text segmentation area 12 into a plurality of image slices 12a (FIG. 5) and converts the slice sequence into a context vector; the context vector is then fed into the decoder, which converts it into an editable string.
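
A bare-bones encoder-decoder sketch of this alternative is shown below: a convolutional encoder compresses the crop into a context vector and a GRU decoder emits one character at a time from it. This is only an illustration under assumed sizes, with no attention and greedy decoding; it is not the specific model of the patent.

```python
import torch
import torch.nn as nn

class Seq2SeqRecognizer(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.encoder_cnn = nn.Sequential(          # turns the text-line image into a context vector
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.to_context = nn.Linear(64, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, image, max_len=20, bos_id=1):
        n = image.size(0)
        context = self.to_context(self.encoder_cnn(image).flatten(1))   # (N, hidden)
        h = context.unsqueeze(0)                    # initial decoder state from the context vector
        token = torch.full((n, 1), bos_id, dtype=torch.long)
        chars = []
        for _ in range(max_len):                    # greedy character-by-character decoding
            step, h = self.decoder(self.embed(token), h)
            token = self.out(step[:, -1]).argmax(dim=1, keepdim=True)
            chars.append(token)
        return torch.cat(chars, dim=1)              # (N, max_len) predicted character ids

pred = Seq2SeqRecognizer(vocab_size=100)(torch.rand(2, 1, 32, 128))
```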

It is worth noting that, since capturing a document image (as in FIG. 2A) may involve photographing, different users will shoot at different angles. The first neural network model 122 and the second neural network model 132 can therefore be trained on document images taken at different angles and under various lighting conditions, and such images can be generated directly by computer simulation.
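
Such simulated training images can be generated, for example, with standard random perspective and photometric transforms. The torchvision-based sketch below is one assumed way to do this; the particular transforms, parameter ranges, and the input file name are not taken from the patent.

```python
import torchvision.transforms as T
from PIL import Image

# Simulate different shooting angles and lighting conditions from one clean document scan.
simulate_capture = T.Compose([
    T.RandomPerspective(distortion_scale=0.3, p=1.0),   # tilted camera viewpoint
    T.RandomRotation(degrees=5, fill=255),              # slight page rotation, white background
    T.ColorJitter(brightness=0.4, contrast=0.4),        # uneven or dim lighting
])

clean_page = Image.open("policy_scan.png").convert("RGB")   # placeholder file name
training_samples = [simulate_capture(clean_page) for _ in range(10)]
```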

After the editable string has been obtained by the text recognition module 130, the semantic segmentation module 140 segments the string into a plurality of word segments and assigns each segment a part of speech. In this embodiment, the word segmentation program jieba can be used to produce the word segments. Referring to FIG. 6, the semantic segmentation module 140 includes a lexicon 142 and a rule module 143. The lexicon 142 stores, for example, a plurality of domain-specific proper nouns, while the rule module 143 assigns different parts of speech to the word segments, for example tagging "孔乙己" as a person name, "台北市大直" as a place name, "南山人壽" as a company name, and "102/12/31" as a date. Besides tagging a word segment according to its own characteristics, the rule module 143 can also judge by the segment's position in the string, for example using the CYK algorithm (Cocke-Younger-Kasami algorithm).
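
With jieba, the lexicon 142 can be supplied as custom dictionary entries and the rule module 143 approximated with simple patterns. The sketch below shows the idea; the added dictionary words, tag names, and date regex are illustrative assumptions rather than the patented rules.

```python
import re
import jieba
import jieba.posseg as pseg

# Lexicon 142: register domain-specific proper nouns
# (a user dictionary file via jieba.load_userdict(...) would serve the same purpose).
jieba.add_word("南山人壽", tag="nt")   # company name
jieba.add_word("要保人")               # insurance term: policyholder

DATE_RE = re.compile(r"\d{2,3}/\d{1,2}/\d{1,2}")   # e.g. ROC-calendar dates such as 102/12/31

def tag_segments(text: str):
    """Approximation of rule module 143: segment the string and assign a coarse part of speech."""
    tagged = []
    for word, flag in pseg.cut(text):   # jieba flags: nr = person, ns = place, nt = organization
        if DATE_RE.fullmatch(word):
            tag = "date"
        elif flag == "nr":
            tag = "person_name"
        elif flag == "ns":
            tag = "place_name"
        elif flag == "nt":
            tag = "company_name"
        else:
            tag = "other"
        tagged.append((word, tag))
    return tagged

print(tag_segments("要保人孔乙己 102/12/31 南山人壽"))
```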

In other embodiments, the semantic segmentation module 140 vectorizes the word segments and then assigns a part of speech to each segment using a Conditional Random Field (CRF) or a Hidden Markov Model (HMM).
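
For the CRF variant, one common off-the-shelf option is the sklearn-crfsuite package, which is an assumption about tooling rather than something named in the patent. The toy example below, with made-up features, tags, and training pairs, shows the general shape of turning segments into feature vectors and training a tagger.

```python
import sklearn_crfsuite

def seg_features(segments, i):
    """Very small hand-crafted feature vector for the i-th word segment."""
    w = segments[i]
    return {
        "word": w,
        "has_digit": any(ch.isdigit() for ch in w),
        "prev": segments[i - 1] if i > 0 else "<BOS>",
        "next": segments[i + 1] if i < len(segments) - 1 else "<EOS>",
    }

# Tiny made-up training set: each sentence is a list of segments with gold tags.
train_sents = [(["要保人", "孔乙己"], ["other", "person_name"]),
               (["投保日期", "102/12/31"], ["other", "date"])]
X = [[seg_features(s, i) for i in range(len(s))] for s, _ in train_sents]
y = [tags for _, tags in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict([[seg_features(["南山人壽", "102/12/31"], i) for i in range(2)]]))
```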

In a preferred embodiment, shown in FIG. 7, the semantic segmentation module 140 includes a third neural network model 141, which comprises an embedding layer 1411, a recurrent neural network layer (RNN layer) 1413, an activation function layer 1415, and a conditional random field layer (CRF layer) 1417. In this embodiment, the semantic segmentation module 140 converts each character of the string into a feature vector of fixed dimension, and these feature vectors constitute the embedding layer 1411. The recurrent neural network layer 1413 may be a basic RNN or a bidirectional RNN (Bi-RNN, as shown in FIG. 7), and may also be a long short-term memory RNN (LSTM-RNN), a bidirectional long short-term memory RNN (BLSTM-RNN), or a gated recurrent unit RNN (GRU-RNN). The activation function of the activation function layer 1415 is, for example, the tanh function. The conditional random field layer 1417 is included in the third neural network model 141 because conditional random fields are advantageous for sequence labeling. After the third neural network model 141 has run, every character in the string is assigned a tag, as in the table below, where 1 denotes a person name, 2 a place name, 3 a company name, 4 a date, and 5 other.
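
The stack in FIG. 7 (character embeddings, a bidirectional RNN, an activation, and a CRF on top) can be sketched as follows. The CRF layer here comes from the third-party pytorch-crf package (`torchcrf`), which is an assumption about tooling rather than something named in the patent, and all dimensions and the tag inventory are illustrative.

```python
import torch
import torch.nn as nn
from torchcrf import CRF    # pip install pytorch-crf (assumed here; any CRF layer would do)

class CharTagger(nn.Module):
    """Embedding layer 1411 -> Bi-RNN layer 1413 -> tanh activation 1415 -> CRF layer 1417."""
    def __init__(self, vocab_size: int, num_tags: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.birnn = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def emissions(self, chars):                    # chars: (N, L) character ids
        h, _ = self.birnn(self.embed(chars))
        return self.proj(torch.tanh(h))            # per-character tag scores

    def loss(self, chars, tags, mask):
        return -self.crf(self.emissions(chars), tags, mask=mask)   # negative log-likelihood

    def decode(self, chars, mask):
        return self.crf.decode(self.emissions(chars), mask=mask)   # best tag sequence per string

# 5 tags as in the table below: 1 person, 2 place, 3 company, 4 date, 5 other (0 kept as padding).
model = CharTagger(vocab_size=6000, num_tags=6)
chars = torch.randint(1, 6000, (2, 10))
mask = torch.ones(2, 10, dtype=torch.uint8)
print(model.decode(chars, mask))
```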

(Table, reproduced as image 108109781-A0305-02-0010-1 in the published patent: an example tag sequence assigned to each character of a recognized string.)

Referring back to FIG. 1, after the part-of-speech tagging of the word segments in the string is complete, the database docking module 150 links each segment to the corresponding field of the database 30 according to its part of speech. For example, "孔乙己" in FIG. 2B is classified into the person-name field of the database 30. In a preferred embodiment, the database docking module 150 further includes a text classifier 151, which further classifies word segments sharing the same part of speech. For example, several person names may appear in one document, and the text classifier 151 can determine which of them is the policyholder and which is not, for instance using the rule that the name closest to the term "policyholder" (要保人) is the policyholder.
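
A plain-Python sketch of this docking step, including the "nearest name to 要保人 is the policyholder" rule attributed to the text classifier 151, might look like the following; the field names and tag set are illustrative assumptions.

```python
POS_TO_FIELD = {
    "person_name": "person_names",   # refined below into policyholder / insured
    "place_name": "address",
    "company_name": "insurer",
    "date": "policy_date",
}

def pick_policyholder(text: str, names: list) -> str:
    """Text classifier 151 rule: the person name closest to the keyword 要保人 is the policyholder."""
    anchor = text.find("要保人")
    if anchor < 0 or not names:
        return ""
    return min(names, key=lambda n: abs(text.find(n) - anchor))

def dock(segments, text: str) -> dict:
    record = {}
    names = [w for w, tag in segments if tag == "person_name"]
    for word, tag in segments:
        field = POS_TO_FIELD.get(tag)
        if field and field != "person_names":
            record[field] = word
    record["policyholder"] = pick_policyholder(text, names)
    record["insured"] = next((n for n in names if n != record["policyholder"]), "")
    return record

segments = [("孔乙己", "person_name"), ("南山人壽", "company_name"), ("102/12/31", "date")]
print(dock(segments, "要保人 孔乙己 南山人壽 102/12/31"))
```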

Once every character in the document image of FIG. 2A has been classified into the corresponding fields of the database 30 by the document information extraction and archiving system 100 of this embodiment, the insurance company's brokers can use the company's evaluation system 20 to evaluate the potential customer's existing policies and make further recommendations. Compared with the conventional practice, brokers no longer need to enter the data on existing policies into the evaluation system by hand, which saves considerable time and increases the efficiency of acquiring new customers.

It is worth noting that, although the embodiments above use an existing insurance policy as the document image, those of ordinary skill in the art will appreciate that the document information extraction and archiving system of the present application is also applicable to other kinds of documents, such as powers of attorney, contracts, and court judgments, again without any manual input, greatly improving administrative efficiency.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone of ordinary skill in the art may make modifications and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

30: Database
40: Image input device
100: Document information extraction and archiving system
110: Input module
115: Image pre-processing module
120: Text segmentation area detection module
122: First neural network model
130: Text recognition module
132: Second neural network model
140: Semantic segmentation module
150: Database docking module
151: Text classifier

Claims (6)

1. A document information extraction and archiving system, electrically connected to a database, the system comprising: an input module accepting a document image, the document image including a plurality of characters; a text segmentation area detection module framing the characters in the document image by a first neural network model to form at least one text segmentation area; a text recognition module recognizing the characters in the text segmentation area by a second neural network model to obtain at least one editable string, the string including at least one editable character; a semantic segmentation module segmenting the string into a plurality of word segments and assigning a part of speech to each word segment; and a database docking module linking each word segment to a corresponding field of the database according to the part of speech; wherein the first neural network model includes a first convolutional neural network model and an object detection neural network model, the first convolutional neural network model performing feature extraction on the document image to output a feature vector, and the object detection neural network model framing the serial-number-type characters according to the input feature vector to form the serial-number segmentation area; wherein the first convolutional neural network model is a VGG model, a ResNet model, or a DenseNet model; wherein the object detection neural network model is a YOLO model, a CTPN model, or an EAST model; wherein the second neural network model includes a second convolutional neural network model and a recurrent neural network model, the second convolutional neural network model processing the image in the text segmentation area to output a character sequence, and the recurrent neural network model outputting the editable string according to the character sequence; and wherein the recurrent neural network model implements the Connectionist Temporal Classification algorithm.

2. A document information extraction and archiving system, electrically connected to a database, the system comprising: an input module accepting a document image, the document image including a plurality of characters; a text segmentation area detection module framing the characters in the document image by a first neural network model to form at least one text segmentation area; a text recognition module recognizing the characters in the text segmentation area by a second neural network model to obtain at least one editable string, the string including at least one editable character; a semantic segmentation module segmenting the string into a plurality of word segments and assigning a part of speech to each word segment; and a database docking module linking each word segment to a corresponding field of the database according to the part of speech; wherein the first neural network model includes a first convolutional neural network model and an object detection neural network model, the first convolutional neural network model performing feature extraction on the document image to output a feature vector, and the object detection neural network model framing the serial-number-type characters according to the input feature vector to form the serial-number segmentation area; wherein the first convolutional neural network model is a VGG model, a ResNet model, or a DenseNet model; wherein the object detection neural network model is a YOLO model, a CTPN model, or an EAST model; and wherein the second neural network model is a Seq2Seq model.

3. The document information extraction and archiving system of claim 1 or claim 2, wherein the semantic segmentation module further includes a lexicon and a rule module, the lexicon storing a plurality of domain-specific proper nouns, and the rule module assigning different parts of speech to the word segments according to their characteristics.

4. The document information extraction and archiving system of claim 1 or claim 2, wherein the semantic segmentation module vectorizes the word segments and assigns a part of speech to each word segment by a conditional random field or a hidden Markov model.

5. The document information extraction and archiving system of claim 1 or claim 2, wherein the semantic segmentation module includes a third neural network model, the semantic segmentation module converting each character of the string into a feature vector of a fixed dimension and inputting the feature vector into the third neural network model to assign a part of speech to each word segment.

6. The document information extraction and archiving system of claim 5, wherein the third neural network model is a recurrent neural network.
TW108109781A 2019-03-21 2019-03-21 Document information extraction and archiving system TWI701620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108109781A TWI701620B (en) 2019-03-21 2019-03-21 Document information extraction and archiving system


Publications (2)

Publication Number Publication Date
TWI701620B (en) 2020-08-11
TW202036399A TW202036399A (en) 2020-10-01

Family

ID=73003045

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108109781A TWI701620B (en) 2019-03-21 2019-03-21 Document information extraction and archiving system

Country Status (1)

Country Link
TW (1) TWI701620B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1669029A (en) * 2002-05-17 2005-09-14 威乐提公司 System and method for automatically discovering a hierarchy of concepts from a corpus of documents
US8060513B2 (en) * 2008-07-01 2011-11-15 Dossierview Inc. Information processing with integrated semantic contexts
US8346534B2 (en) * 2008-11-06 2013-01-01 University of North Texas System Method, system and apparatus for automatic keyword extraction
TW201833793A (en) * 2017-03-02 2018-09-16 大陸商騰訊科技(深圳)有限公司 Semantic extraction method and device of natural language and computer storage medium
TWM583974U (en) * 2019-03-21 2019-09-21 洽吧智能股份有限公司 Document information retrieval and filing system


Also Published As

Publication number Publication date
TW202036399A (en) 2020-10-01


Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees