TW201911157A - A system and method for identifying a form and establishing a dynamic form automatically - Google Patents

A system and method for identifying a form and establishing a dynamic form automatically

Info

Publication number
TW201911157A
Authority
TW
Taiwan
Prior art keywords
database
feature
words
field
module
Prior art date
Application number
TW106125615A
Other languages
Chinese (zh)
Other versions
TWI648685B (en)
Inventor
傅思源
王俊欽
呂美德
Original Assignee
捷思達數位開發有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 捷思達數位開發有限公司
Priority to TW106125615A
Application granted
Publication of TWI648685B
Publication of TW201911157A

Landscapes

  • Character Input (AREA)

Abstract

A system and method for automatically identifying a form and establishing a dynamic form. The system comprises: a recognition module that identifies the text and layout of the form through optical character recognition; an edit module that corrects fields, text, or layout; a classification module that determines the attributes of the form content; and a characteristic database that stores characteristic words on the form and their associated words. The edit module compares against the characteristic words of the characteristic database and automatically corrects the fields, text, or layout accordingly. The classification module determines the attributes of the form content by comparing against the characteristic words of the characteristic database.

Description

System and method for automatically identifying a form and establishing a dynamic form

The present invention relates to a system and method for identifying and creating forms, and more particularly to a system and method for automatically identifying a form and establishing a dynamic form.

In today's industrial and commercial society, with well-developed transportation and frequent exchange of information, an office may receive many paper documents every day from companies, government agencies, or individuals. Processing this large volume of paper documents quickly and efficiently is one of the important topics of office automation.

To unify formatting or make content easy to grasp, many documents are presented as tables. Documents in tabular form are usually harder to recognize than plain-text documents, because a tabular document contains both table structure and text, and the table structure itself varies with the user's needs into many different fields and grid lines.

Traditionally, paper documents are converted into electronic files with a scanner, which turns the paper into a digital image file. Because the result is an image, the computer treats it as a picture and cannot obtain its written content, so the document cannot be analyzed directly by computer. It must be read and analyzed manually before it can be filed, and anyone who later needs to look it up has to remember both its content and its folder location, which makes filing and retrieval inconvenient.

Forms come in countless varieties, and most are sorted into categories to make later lookup easier. The sorting, however, is mostly done by people who read the information on each form and then classify it. Over time the number of forms can become very large, and if the classification is not precise, finding a document later becomes very inconvenient, or the document may never be found at all.

In addition, existing techniques for converting paper forms into dynamic forms often require manually entering the fields to be processed (that is, the item names) and the content of those fields (that is, values or recorded information). In the prior art, building a digital database manually from the paper documents that circulate through various processes costs an enterprise a great deal. Paper documents also come in many kinds, from casual notes to research reports, and all must first be classified and then examined one by one, which is complicated, time-consuming, and labor-intensive.

Most text recognition products currently available focus on correctly recognizing characters, fonts, character sizes, and text positions on the page, and on recognition speed or accuracy. When a document contains a table, they try to recognize the fields, table layout, and field content and place them in the correct table positions, and then mostly save the result in Microsoft Office Word or Excel file format. These file formats are not the formats used by electronic databases, so the results cannot be managed directly with database management systems. The data in the fields therefore cannot be retrieved directly for database operations, nor can a program generate a form with a new layout from it. In the prior art, the Word or Excel table files produced by optical character recognition (OCR) are usually processed manually, or by programs written case by case, before the table files can be converted into database form.

Compared with the prior art, the system and method of the present invention for automatically identifying a form and establishing a dynamic form do not require manual recognition of paper forms. They recognize the content of a paper form automatically and build customized dynamic forms according to the user's needs, making it easy for the user to read, analyze, classify, and use the customized dynamic forms that the program generates automatically. Compared with the prior art, the invention thus simplifies procedures and saves a great deal of working time and labor.

In view of the above shortcomings of the prior art, the main object of the present invention is to provide a system for automatically identifying a form and establishing a dynamic form, comprising: a recognition module, coupled to a processor, which recognizes the text and layout of a form through optical character recognition (OCR); an edit module, coupled to the processor, which corrects fields, text, or layout format; a classification module, coupled to the processor, which determines the attributes of the form content; a characteristic database, coupled to a storage device that is in turn coupled to the processor, which stores characteristic words on the form and their associated words; and a system database, coupled to the storage device, into whose designated attributes the classification module stores the form content. The edit module compares against the characteristic words of the characteristic database and automatically corrects fields, text, or layout format according to those words. The classification module determines the attributes of the form content by comparing against the characteristic words of the characteristic database. If the form content received by the recognition module is in digital format, the text and layout are recorded directly; if it is an analog image, image recognition is performed. If the form content obtained by the edit module is in digital format, it is output directly; if it is an analog image, the module automatically corrects the field content, selects a suitable layout, applies it, and outputs the result.

To achieve the above object, the system further comprises: a verification database module, coupled to the processor, which checks and corrects the system database and feeds the correction information back to the characteristic database; and a dynamic form module, coupled to the processor, which generates new forms with designated fields from the system database as required. Because the paper form data obtained through the system (including field positions, grid lines, and characters) has been digitized and stored in a database, a program or a user can select any desired fields, arrange the field layout according to the user's needs, and generate a dynamic form with a new layout in real time through cloud computing.
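The idea of generating a new form from selected fields can be illustrated with a minimal Python sketch. It assumes the system database is a single table of (form, field name, field value) rows; the table name `form_fields`, the function names, and the sample values are hypothetical and are not taken from the patent.

```python
import sqlite3

def create_system_db():
    """Create an in-memory stand-in for the system database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE form_fields (form_id TEXT, field_name TEXT, field_value TEXT)"
    )
    return conn

def store_form(conn, form_id, fields):
    """Store every recognized field of one form as a separate row."""
    conn.executemany(
        "INSERT INTO form_fields VALUES (?, ?, ?)",
        [(form_id, name, value) for name, value in fields.items()],
    )

def build_dynamic_form(conn, form_id, wanted_fields):
    """Return the chosen fields, in the order the user asked for them.

    Fields that were never stored for this form come back as empty strings.
    """
    rows = conn.execute(
        "SELECT field_name, field_value FROM form_fields WHERE form_id = ?",
        (form_id,),
    ).fetchall()
    stored = dict(rows)
    return [(name, stored.get(name, "")) for name in wanted_fields]
```

Because each field is stored as data rather than as part of a fixed page image, the same record can be rendered with any subset and ordering of fields.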

To achieve the above object, the recognition module marks the position and extent of a field by selecting (circling) a region or content of the form, and the characteristic database contains at least one hierarchical category, each containing at least one subclass. Selection may be made with a mouse, keyboard, or touch input, but is not limited to these. Different hierarchical categories may each be designated for storage on different hosts.

To achieve the above object, the edit module predefines characteristic words. The ways of defining them include predefining common words or table templates, selecting several characters on the form as a group of characteristic words, and predefining areas without text as no field or as a blank field. When the characteristic database updates a subclass, it updates upward one level at a time. After the update is complete, the updated content is compared with historical data to create and store associated words; if associated words have been added, the associated-word data is updated level by level for the subclasses of every hierarchical category.

To achieve the above object, the verification database module checks and corrects the correctness of the recognition module, the edit module, or the classification module and feeds the correction information back to the characteristic database, and the characteristic database supplements the system database with the field content of the associated words. The verification database module may perform these checks simultaneously, individually, in batches, or on a schedule, but is not limited to these.

Another object of the present invention is to provide a method for automatically identifying a form and establishing a dynamic form, comprising: recognizing the text and layout of a form through optical character recognition (OCR); comparing against the characteristic words of a characteristic database and automatically correcting fields, text, or layout format according to those words; determining the attributes of the form content by comparing against the characteristic words of the characteristic database; and storing the form content in designated attributes of a system database.

To achieve this other object, the method further comprises: checking and correcting the system database and feeding the correction information back to the characteristic database; and generating new forms with designated fields from the system database as required. In principle the steps of the method follow the order described above, but they are not limited to it: steps may be performed simultaneously, repeated, or reordered, and appropriate changes may be made as required without departing from the spirit of the invention.

To achieve this other object, the step of recognizing the text and layout of the form through OCR comprises: marking the position and extent of a field by selecting a region or content of the form. Besides having OCR recognize the text and layout of the form directly, the user may also select a region or content of the form directly.

To achieve this other object, the step of comparing against the characteristic words of a characteristic database and automatically correcting fields, text, or layout format comprises: predefining characteristic words, where the ways of defining them include predefining common words or table templates, selecting several characters on the form as a group of characteristic words, and predefining areas without text as no field or as a blank field. Characteristic words can therefore come not only from those already in the characteristic database but also from user predefinition, so that the new forms generated match the user's needs more precisely.

To achieve this other object, the step of checking and correcting the system database and feeding the correction information back to the characteristic database comprises: first checking and correcting the correctness of the recognition of the form's text and layout, of the automatic correction based on characteristic words, and of the determination of form content attributes, and then feeding the correction information back to the characteristic database. Feeding correction information back lets the characteristic database keep accumulating information, including characteristic words and associated words, so the system can learn automatically and build a more complete database.

Once the words and layout on a form have been recognized, edited, and classified, they can be stored in the system database, and designated form content can then be generated automatically and dynamically from the content of that database. The system can therefore greatly reduce the cost of manual document processing.

100: recognition module

200: edit module

300: classification module

400: verification database module

500: dynamic form module

600: characteristic database

620: hierarchical category

640: subclass

700: system database

760: processor

770: storage device

810, 812, 820, 822, 830, 840, 850, 852, 860: steps

FIG. 1 is a system architecture diagram of the system for automatically identifying a form and establishing a dynamic form according to an embodiment of the present invention.

FIG. 2 is a structural diagram of the characteristic database according to an embodiment of the present invention.

FIG. 3 is a flowchart of the steps according to an embodiment of the present invention.

The above aspects and the advantages of the present invention will be understood more readily by reference to the following detailed description, and the spirit of the invention will be easier to grasp from the description below and the accompanying drawings.

The invention is described in detail below with preferred embodiments and aspects. The following description provides specific implementation details so that the reader can thoroughly understand how these embodiments are practiced. Those skilled in the art will understand, however, that the invention may also be practiced without these details. In addition, well-known structures or functions are not described in detail, to avoid needlessly obscuring the description of the various embodiments. The terms used in the following description are to be interpreted in their broadest reasonable manner, even when used together with the detailed description of a particular embodiment of the invention.

Refer to FIG. 1, which shows the system architecture of an embodiment of the present invention. According to one embodiment, the system for automatically identifying a form and establishing a dynamic form comprises: a recognition module 100, coupled to a processor 760, which recognizes the text and layout of a form through optical character recognition (OCR); an edit module 200, coupled to the processor 760, which corrects fields, text, or layout format; a classification module 300, coupled to the processor 760, which determines the attributes of the form content; a characteristic database 600, coupled to a storage device 770 that is in turn coupled to the processor 760, which stores characteristic words on the form and their associated words; and a system database 700, coupled to the storage device 770, into whose designated attributes the classification module 300 stores the form content. The edit module 200 compares against the characteristic words of the characteristic database 600 and automatically corrects fields, text, or layout format according to those words. The classification module 300 determines the attributes of the form content by comparing against the characteristic words of the characteristic database 600.

The system may be operated jointly by different electronic computing devices or cloud equipment, or executed by a single electronic computing device or piece of cloud equipment. The invention may further be carried out by electronic computing devices or cloud equipment running different operating systems, including but not limited to iOS, Windows, and Android. Likewise, the processor 760 and the storage device 770 may reside in different electronic computing devices or cloud equipment, or in the same one. Because each module and each database described herein may be installed independently on different hardware and then linked together to form the system of the invention, "coupled" as used herein may mean directly or indirectly coupled. The storage device 770 includes, but is not limited to, a hard disk, a flash drive, or cloud storage. The system is described below as operating on a single electronic computing device, but those of ordinary skill in the art should understand that it can be applied broadly to many electronic computing devices or cloud equipment. Electronic computing devices as described herein include, but are not limited to, desktop computers, notebook computers, smart communication devices, and tablet computers; cloud equipment as described herein includes, but is not limited to, cloud computers and cloud databases.

Refer again to FIG. 1. According to a preferred embodiment of the present invention, the system comprises: a recognition module 100, coupled to a processor 760, which recognizes the text and layout of a form through optical character recognition (OCR); an edit module 200, coupled to the processor 760, which corrects fields, text, or layout format; a classification module 300, coupled to the processor 760, which determines the attributes of the form content; a verification database module 400, coupled to the processor 760, which checks and corrects the system database 700 and feeds the correction information back to the characteristic database 600; a dynamic form module 500, coupled to the processor 760, which generates new forms with designated fields from the system database 700 as required; a characteristic database 600, coupled to a storage device 770 that is in turn coupled to the processor 760, which stores characteristic words on the form and their associated words; and a system database 700, coupled to the storage device 770, into whose designated attributes the classification module 300 stores the form content. The recognition module 100 marks the position and extent of a field by selecting a region or content of the form. The edit module 200 compares against the characteristic words of the characteristic database 600 and automatically corrects fields, text, or layout format according to those words. The edit module 200 predefines characteristic words, where the ways of defining them include predefining common words or table templates, selecting several characters on the form as a group of characteristic words, and predefining areas without text as no field or as a blank field. The classification module 300 determines the attributes of the form content by comparing against the characteristic words of the characteristic database 600. The verification database module 400 checks and corrects the correctness of the recognition module 100, the edit module 200, or the classification module 300 and feeds the correction information back to the characteristic database 600. The characteristic database 600 contains at least one hierarchical category 620, and each hierarchical category 620 contains at least one subclass 640. When the characteristic database 600 updates a subclass 640, it updates upward one level at a time. The characteristic database 600 supplements the system database 700 with the field content of the associated words.

If the form content received by the recognition module 100 is in digital format, the text and layout are simply recorded directly. If it is an analog image, the original paper form must first be captured and the analog image converted into a digital image before image recognition is performed. The original paper form can be captured with many different devices or techniques, for example a mobile phone, camera, video recorder, or scanner. In addition, the recognition module 100 may first check whether the characteristic database 600 contains a similar existing layout that can be applied and output directly. If no suitable layout exists, the regions or fields of the form that need text recognition are selected, recognized into digital text by OCR, and output to the edit module 200. Besides having OCR recognize the text and layout of the form directly, the user may also select a region or content of the form directly; the recognition module 100 supports selection by mouse, keyboard, or touch input, among other means.
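The branching in this paragraph (digital input recorded as-is, analog input matched against known layouts and then recognized) might be sketched as follows. This is one possible reading only: the `FormInput` type and the `find_layout` and `ocr` callables are hypothetical placeholders supplied by the caller, not part of the patent or of any real OCR library.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FormInput:
    is_digital: bool               # True when text and layout are already digital
    text: str = ""                 # text of a digital form
    layout: Optional[str] = None   # layout identifier of a digital form
    image: Optional[bytes] = None  # captured image of a paper form

def recognize(
    form: FormInput,
    find_layout: Callable[[bytes], Optional[str]],  # hypothetical layout lookup
    ocr: Callable[[bytes], str],                    # hypothetical OCR engine
) -> dict:
    """Record digital forms directly; look up a layout and run OCR on analog ones."""
    if form.is_digital:
        return {"text": form.text, "layout": form.layout, "source": "digital"}
    layout = find_layout(form.image)  # reuse an existing layout when one matches
    text = ocr(form.image)            # convert the image content into digital text
    source = "template" if layout else "ocr"
    return {"text": text, "layout": layout, "source": source}
```

Passing the layout lookup and the OCR engine in as parameters keeps the sketch independent of any particular recognition product.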

The edit module 200 obtains the form content from the recognition module 100 (including the number and size of fields, the key names of fields, and the descriptive content of fields). If the content is in digital format, it is output directly. If it is an analog image, the module compares against the characteristic words of the characteristic database 600 and automatically corrects the field content according to those words, for example by merging or splitting fields or correcting text recognized by OCR, among other corrections. The edit module 200 may also automatically select a suitable layout, apply it, and output the result, and finally records the relationship between the form's layout format and its content.
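As an illustration of correcting OCR output against a list of known words, the sketch below snaps each recognized label to its closest known word. The patent does not specify a matching technique; fuzzy string matching with Python's standard `difflib` module, the function name, and the 0.8 similarity cutoff are all assumptions made for this example.

```python
import difflib

def correct_with_characteristic_words(recognized, characteristic_words, cutoff=0.8):
    """Replace each recognized label with its closest characteristic word.

    Labels with no sufficiently similar characteristic word are kept unchanged.
    """
    corrected = []
    for label in recognized:
        matches = difflib.get_close_matches(
            label, characteristic_words, n=1, cutoff=cutoff
        )
        corrected.append(matches[0] if matches else label)
    return corrected
```

For example, an OCR result such as "Schoo1" would be replaced by "School" when "School" is among the known words, while an unrelated label is left as recognized.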

Because form formats and content vary widely, the edit module 200 may first find characteristic words in the original form. A characteristic word may be defined as a field as needed and checked against the image of the original form to identify how each field is laid out. The ways of defining characteristic words include: (1) predefining common words or table templates from which the user can freely choose; (2) checking against the image of the original form and selecting, by mouse, keyboard, or touch input, several characters on the form as a group of characteristic words; and (3) predefining areas without text as no field or as a blank field, for example defining the blank space between fields as a blank field.

The classification module 300 compares the text content of the fields in the form with the characteristic words or associated words stored in the characteristic database 600 to determine the various attributes of the form content, for example whether the text in a field should be recorded in the system database 700 as a field name (such as "School") or as field content (such as "Taiwan University"). In addition, when comparing against the associated words of the characteristic database 600, the module can automatically associate information that is absent from the original form; for example, "School / Taiwan University" can be associated with the school's address, English name, and other related information. The content is then stored automatically in the designated attributes of the system database 700 according to the attributes of all fields. The system also checks against forms it has processed before, adds associated fields based on how their fields were used, and records the associated words of those associated fields in the characteristic database 600.
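A minimal sketch of this classification and association step is given below. The dictionary layout of the database and the function name are hypothetical, and the associated values are placeholders rather than real data.

```python
def classify_field(text, db):
    """Decide whether `text` is a field name or field content, and attach
    any associated information known for it."""
    if text in db["field_names"]:
        return {"kind": "field_name", "field": text}
    if text in db["field_values"]:
        return {
            "kind": "field_content",
            "field": db["field_values"][text],          # the field it belongs to
            "related": db["associations"].get(text, {}),  # information not on the form
        }
    return {"kind": "unknown"}

# Hypothetical database content; the associated values are placeholders.
EXAMPLE_DB = {
    "field_names": {"School"},
    "field_values": {"Taiwan University": "School"},
    "associations": {
        "Taiwan University": {
            "address": "<address on file>",
            "english_name": "<name on file>",
        }
    },
}
```

With this structure, looking up "Taiwan University" yields both the field it belongs to ("School") and the associated entries, which could then be stored alongside the recognized content.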

The verification database module 400 checks and corrects the results of the recognition module 100, the edit module 200, the classification module 300, or the system database 700, and feeds the correction information (e.g., layout format, corrected data, feature words) back to the feature database 600. These checks may be performed simultaneously, individually, in batches, or on a schedule, but are not limited thereto.
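One way to picture this feedback loop is a store that remembers reviewed corrections and reuses them on later forms. The sketch below is an assumption-laden illustration: `FeatureStore`, `feed_back`, and `apply` are hypothetical names, and the patent does not state how corrections are represented.

```python
from typing import Dict, Set


class FeatureStore:
    """Hypothetical in-memory stand-in for the feature database 600."""

    def __init__(self) -> None:
        self.feature_words: Set[str] = set()
        # misread text -> corrected text, learned from verification
        self.known_misreads: Dict[str, str] = {}

    def feed_back(self, recognized: str, corrected: str) -> None:
        """Store one reviewed correction so later forms can reuse it."""
        self.feature_words.add(corrected)
        if recognized != corrected:
            self.known_misreads[recognized] = corrected

    def apply(self, recognized: str) -> str:
        """Replace a previously seen misread with its stored correction."""
        return self.known_misreads.get(recognized, recognized)


store = FeatureStore()
store.feed_back("Schoo1", "School")   # verification found an OCR misread
print(store.apply("Schoo1"))          # reuses the stored correction
print(store.apply("Name"))            # unseen text passes through unchanged
```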

Refer to FIG. 2, which shows the structure of the feature database 600 according to an embodiment of the present invention. In the feature database 600, subclasses 640 of different categories (e.g., the 1st subclass 640 and the 2nd subclass 640) may be designated for storage on any electronic computing device or cloud device, while subclasses 640 of the same category (e.g., the 1st subclass 640, the 1-1 subclass 640, and the 1-2 subclass 640) may be designated for storage on the same electronic computing device or cloud device. When the feature database 600 is used, only the electronic computing device or cloud device corresponding to the feature word needs to be accessed, which improves the overall performance of the system. In addition, subclasses 640 are updated from the lowest level upward, one level at a time; for example, the 1-1 subclass 640 is updated before the 1st subclass 640, and so on until the entire feature database 600 has been updated.

After the feature database 600 has been fully updated, the updated content is compared with historical data, and associated words are created and stored. If associated words have been added, the associated-word data of the subclasses 640 of each hierarchical classification 620 is updated level by level. The feature database 600 may also store recognized and consistent related field content (e.g., an organization's Chinese and English names, representative, address, and contact telephone number). When the feature database 600 is consulted, field content missing from the original paper form (such as the items just listed) can be added automatically to the designated attributes of the system database 700. In this way, each time the system completes a process it accumulates feature words and associated words in the feature database 600, so that its recognition of forms and documents improves through automated learning. This automated learning has three characteristics.

First, standard forms and documents contain much repeated text. The system does not need to recognize every repeated character one by one; it only needs to focus on recognizing key text or fields, which saves processing time and improves performance.

Second, because form images come from different sources (e.g., scanners, digital cameras), their resolution, shooting angle, and color vary. Poor image quality lowers text recognition accuracy, but comparing against the key text or graphic marks in historical data helps determine the text and thus raises recognition accuracy.

Third, as a result of the two characteristics above, data stored in the system database 700 is placed under the correct field attribute, which achieves automated learning of field attribute determination.
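The bottom-up update order of the subclasses 640 corresponds to a post-order traversal of the hierarchy. The following sketch is illustrative only: the `Subclass` class, the `update_order` and `storage_key` functions, and the convention that a name such as "1-2" belongs to top-level category "1" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subclass:
    """Hypothetical node in the hierarchical classification."""
    name: str
    children: List["Subclass"] = field(default_factory=list)


def update_order(node: Subclass) -> List[str]:
    """List subclass names in update order: every child before its parent."""
    order: List[str] = []
    for child in node.children:
        order.extend(update_order(child))
    order.append(node.name)
    return order


def storage_key(subclass_name: str) -> str:
    """Subclasses of one category share a device, keyed by the top-level number."""
    return subclass_name.split("-")[0]


first = Subclass("1", [Subclass("1-1"), Subclass("1-2")])
print(update_order(first))   # ['1-1', '1-2', '1']
print(storage_key("1-2"))    # 1
```

Keying storage by the top-level category means a lookup for a given feature word touches only one device, which is the performance benefit the description attributes to this layout.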

After the words and layout on the form have been recognized, arranged, and classified, they are digitized and stored in the system database 700. The dynamic form module 500 then automatically and dynamically builds the designated form content from the contents of the system database 700. A program or a user can therefore freely select the required fields, arrange the field layout as needed, and generate a dynamic form with the new layout in real time through an electronic computing device or cloud device. The system can thus greatly reduce the cost of manual paperwork.
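Generating a dynamic form from stored data can be sketched as selecting and ordering fields from a record. This is a minimal illustration under stated assumptions: a system database record is modeled as a plain dictionary, `build_form` is a hypothetical helper, and the output is plain text rather than any particular form format.

```python
from typing import Dict, List


def build_form(record: Dict[str, str], selected_fields: List[str]) -> str:
    """Render the selected fields, in the requested order, as plain text lines."""
    lines = []
    for name in selected_fields:
        value = record.get(name, "")  # fields absent from the record stay blank
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Example record with made-up values.
record = {"name": "Wang", "school": "Taiwan University", "phone": "000"}
print(build_form(record, ["school", "name"]))
```

Because the field list is supplied at call time, the same stored record can produce any number of layouts without re-scanning the paper form.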

Refer to FIG. 3, which shows the process steps of an embodiment of the present invention. According to another embodiment of the present invention, a method for identifying a form and establishing a dynamic form automatically is provided. The process described herein provides examples of the various steps. Although a specific order and sequence are disclosed, the order of the steps may be changed unless otherwise specified. The process is therefore merely exemplary: it may be performed with the steps in a different order, and some steps may even run in parallel. Moreover, not every execution includes the same steps, so the embodiments described herein may omit one or more steps. The invention also encompasses other step flows. The following description mainly uses the system for identifying a form and establishing a dynamic form automatically to perform the steps below, together with other conventional steps and components where necessary. The system is not limited to implementing the steps on a single electronic computing device; different electronic computing devices may be configured as needed to implement the process.

Step 810: Circle a range or content of the form, thereby marking the position and extent of a field. The recognition module 100 can recognize, by optical character recognition, the form content that the user has circled, and thereby mark the position and extent of the field.

Step 812: Recognize the text and layout of the form by optical character recognition. As an alternative to the circling approach of step 810, the system can recognize the text and layout of the form directly by optical character recognition.

Step 820: Predefine feature words. The definition methods include predefining common words or form templates, circling several characters on the form as a group of feature words, or predefining regions without text as having no field or as blank fields. So that the edit module 200 can correct the form text or layout recognized by the recognition module 100, the user may predefine feature words to obtain a form format that better meets the user's needs.

Step 822: Compare against the feature words of a feature database 600 and automatically correct the fields, text, or layout format according to the feature words. As an alternative to predefining feature words as in step 820, the system can automatically compare against the feature words of the feature database 600 and correct the fields, text, or layout format accordingly.
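The patent does not say how recognized text is compared against feature words. As one possible approach, the sketch below substitutes approximate string matching from Python's standard `difflib` module; the function name `correct_with_feature_words` and the `cutoff` value of 0.8 are assumptions chosen for illustration.

```python
import difflib
from typing import Iterable


def correct_with_feature_words(text: str, feature_words: Iterable[str], cutoff: float = 0.8) -> str:
    """Return the closest feature word if it is similar enough, else the text unchanged.

    Similarity is difflib's ratio in [0, 1]; `cutoff` is an assumed threshold.
    """
    matches = difflib.get_close_matches(text, list(feature_words), n=1, cutoff=cutoff)
    return matches[0] if matches else text


words = ["School", "Address"]
print(correct_with_feature_words("Schoo1", words))  # close to "School", so corrected
print(correct_with_feature_words("Phone", words))   # no close match, left as-is
```

A high cutoff keeps the correction conservative: text that does not closely resemble any stored feature word is passed through rather than forced onto the nearest entry.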

Step 830: Determine the attributes of the form content by comparing against the feature words of the feature database 600. The classification module 300 compares against the feature words of the feature database 600 to determine whether a piece of form content is a field name, field content, or another attribute.

Step 840: Store the form content in the designated attributes of a system database 700. After the classification module 300 has determined the attributes of the form content, it stores the content in the designated attributes of the system database 700.

Step 850: Check and correct the recognition of the form's text and layout, the automatic correction based on feature words, and the determination of form content attributes, and feed the correction information back to the feature database 600. The verification database module 400 checks and corrects the results of the recognition module 100, the edit module 200, and the classification module 300.

Step 852: Check and correct the system database 700, and feed the correction information back to the feature database 600. In addition to checking and correcting the three modules as in step 850, the verification database module 400 can also check and correct the system database 700. All correction information is fed back to the feature database 600, which continually accumulates information such as feature words and associated words, so the system can learn automatically and build a more complete database.

Step 860: Generate, from the system database 700, a new form with designated fields as required. After the recognition, arrangement, classification, and verification steps above, the form data held in the system database 700 and the feature database 600 is continually updated and becomes more complete, so the user can create a customized form at any time through the dynamic form module 500.

The foregoing description is for purposes of explanation, and the various specific details are provided to give a thorough understanding of the invention. A person of ordinary skill in the art should be able to practice the invention without some of these specific details. In other embodiments, well-known structures and devices are not shown in the block diagrams. Intermediate structures may exist between the illustrated elements. The elements described may have additional inputs and outputs that are not depicted in detail in the drawings.

The invention includes various processes. The processes may be performed by hardware components or embodied in computer-readable instructions, which may be used to cause a general-purpose or special-purpose processor 760 or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.

The methods are described in their basic form. Any method or message may be added to or deleted from the processes without departing from the scope of the invention. Those of ordinary skill in the art may further improve or modify the invention; the specific embodiments are provided to illustrate the invention, not to limit it.

Where the text states that an element "A" is coupled to an element "B", element A may be coupled directly to B or coupled indirectly to B through an element C. Where the specification states that a component, feature, structure, process, or characteristic A causes a component, feature, structure, process, or characteristic B, this means that A is at least a partial cause of B, and other components, features, structures, processes, or characteristics may also assist in causing B. Where the specification uses the word "may", the element, feature, process, or characteristic concerned is not limited to what is described in the specification. Quantities mentioned in the specification are not limited by the words "a" or "an".

The invention is not limited to the specific details described herein. Many variations related to the foregoing description and drawings are permitted within the spirit and scope of the invention. Accordingly, the scope of the invention is defined by the following claims, including any amendments thereto, rather than by the foregoing description.

Claims (10)

1. A system for identifying a form and establishing a dynamic form automatically, comprising: a recognition module, coupled to a processor, which recognizes the text and layout of a form by optical character recognition; an edit module, coupled to the processor, which corrects fields, text, or layout format; a classification module, coupled to the processor, which determines attributes of the form content; a feature database, coupled to a storage device that is coupled to the processor, the feature database storing feature words on the form and their associated words; and a system database, coupled to the storage device, the classification module storing the form content in designated attributes of the system database; wherein the edit module compares against the feature words of the feature database and automatically corrects the fields, text, or layout format according to the feature words; and wherein the classification module determines the attributes of the form content by comparing against the feature words of the feature database.

2. The system of claim 1, further comprising: a verification database module, coupled to the processor, which checks and corrects the system database and feeds correction information back to the feature database; and a dynamic form module, coupled to the processor, which generates, from the system database, a new form with designated fields as required.

3. The system of claim 2, wherein the recognition module marks the position and extent of a field by circling a range or content of the form; and wherein the feature database comprises at least one hierarchical classification, the hierarchical classification comprising at least one subclass.

4. The system of claim 3, wherein the edit module predefines feature words, the definition methods comprising predefining common words or form templates, circling several characters on the form as a group of feature words, or predefining regions without text as having no field or as blank fields; and wherein, when the feature database updates the subclass, the update proceeds upward one level at a time.

5. The system of claim 4, wherein the verification database module checks and corrects the correctness of the recognition module, the edit module, or the classification module, and feeds correction information back to the feature database; and wherein the feature database supplements the system database with the field content of the associated words.

6. A method for identifying a form and establishing a dynamic form automatically, comprising: recognizing the text and layout of a form by optical character recognition; comparing against the feature words of a feature database and automatically correcting fields, text, or layout format according to the feature words; determining attributes of the form content by comparing against the feature words of the feature database; and storing the form content in designated attributes of a system database.

7. The method of claim 6, further comprising: checking and correcting the system database and feeding correction information back to the feature database; and generating, from the system database, a new form with designated fields as required.

8. The method of claim 7, wherein the step of recognizing the text and layout of the form by optical character recognition comprises: marking the position and extent of a field by circling a range or content of the form.

9. The method of claim 8, wherein the step of comparing against the feature words of a feature database and automatically correcting fields, text, or layout format according to the feature words comprises: predefining feature words, the definition methods comprising predefining common words or form templates, circling several characters on the form as a group of feature words, or predefining regions without text as having no field or as blank fields.

10. The method of claim 9, wherein the step of checking and correcting the system database and feeding correction information back to the feature database comprises: first checking and correcting the recognition of the form's text and layout, the automatic correction based on feature words, and the determination of form content attributes, and feeding the correction information back to the feature database.
TW106125615A 2017-07-28 2017-07-28 A system and method for identifying a form and establishing a dynamic form automatically TWI648685B (en)


Publications (2)

Publication Number Publication Date
TWI648685B TWI648685B (en) 2019-01-21
TW201911157A true TW201911157A (en) 2019-03-16

Family

ID=65803889

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106125615A TWI648685B (en) 2017-07-28 2017-07-28 A system and method for identifying a form and establishing a dynamic form automatically

Country Status (1)

Country Link
TW (1) TWI648685B (en)

