TWI659320B - Method for creating and indexing document image file having indexed content - Google Patents


Info

Publication number
TWI659320B
Authority
TW
Taiwan
Prior art keywords
roster
page
bar code
document image
image file
Prior art date
Application number
TW106135375A
Other languages
Chinese (zh)
Other versions
TW201917606A (en)
Inventor
陳伊伶
陳彥宇
黃振瑩
黃淑華
Original Assignee
臺灣銀行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 臺灣銀行股份有限公司 filed Critical 臺灣銀行股份有限公司
Priority to TW106135375A priority Critical patent/TWI659320B/en
Publication of TW201917606A publication Critical patent/TW201917606A/en
Application granted granted Critical
Publication of TWI659320B publication Critical patent/TWI659320B/en


Abstract

The invention provides a method for creating a content-indexable document image file, applicable to multiple rosters, each roster containing multiple pages, at least one of which contains multiple items of feature data. The method includes: forming a first barcode on the first page of each roster, the first barcode carrying identification information of the roster on which it appears; forming a second barcode on every page of each roster, the second barcode carrying the page number of the page on which it appears and the total count of feature data items on that page; creating a sequential file that records the feature data of each roster in the order in which the items appear in the roster; and scanning all of the rosters to form a document image file.

Description

Method for creating a content-indexable document image file and indexing method thereof

This case relates to a method for creating a content-indexable document image file and a method for indexing it.

When rosters that record personal information, together with the content corresponding to that information, are updated frequently, retrieving everything recorded about one person becomes troublesome as time passes. Someone usually has to pull up the roster images and manually comb through every roster to check whether that person's information appears.

For example, suppose an enterprise or government agency keeps creating rosters for some purpose. In the first year it creates 10 rosters, A1 to A10, and page 6 of roster A1 records data on person X. In the second year it adds 10 more rosters, A11 to A20, and page 15 of roster A16 records data on person X. In the third year it adds another 10 rosters, A21 to A30, and page 7 of roster A28 records data on person X. If, for some specific reason, all of person X's data from the past three years must then be retrieved, someone has to page through rosters A1 to A30 manually to find everything related to person X from those three years.

Continuing the example, if image files exist for rosters A1 to A30, optical image recognition can of course be used to find the parts of every page of rosters A1 to A30 in which person X appears and then extract them. However, image recognition is very time-consuming and requires substantial computing resources, and as the number of rosters accumulates over time, every search consumes a great deal of time and computing resources.

Republic of China (Taiwan) Patent No. TW490644 describes a "method for converting paper documents into electronic files". When paper documents are scanned into electronic files and the documents differ in page count, a barcode can be affixed to the first or last page of each document so that, when the images of all documents are later merged, the starting position of each document can easily be identified from the barcode. That patent, however, only explains that barcodes can identify the starting position of each document when multiple documents are scanned and merged into a single file. It does not associate "specific document content" with the "page number" on which that content appears. It therefore does not let a searcher quickly find, without optical image recognition and among possibly hundreds of rosters, the rosters and pages that contain the specific content, and thereby quickly retrieve all data containing that content.

An embodiment of the invention provides a method for creating a content-indexable document image file, including: creating a plurality of rosters, each roster containing a plurality of pages, at least one page containing a plurality of feature data items; forming a first barcode on the first page of each roster, the first barcode carrying identification information of the roster on which it appears; forming a second barcode on each page of each roster, the second barcode carrying the page number of the page on which it appears and the total count of feature data items on that page; creating a sequential file that records the feature data of each roster in the order in which the items appear in the roster; and scanning the rosters to form a document image file.

In an embodiment of the invention, the above method for creating a content-indexable document image file further includes forming a third barcode on the first page of each roster, the third barcode carrying category information of the corresponding roster.

In an embodiment of the invention, the method further includes creating an index file, the index file including an index value.

In an embodiment of the invention, the feature data in the above method for creating a content-indexable document image file is selected from a national ID number, a name, an account number, and the like.

Another embodiment of the invention is a content indexing method for a document image file, applicable to a document image file created by the foregoing method. The content indexing method includes: receiving a query keyword, the query keyword being one of the feature data items; and searching the index file for the query keyword to determine the roster pages on which the query keyword appears.

In an embodiment of the invention, the above content indexing method further includes: determining, according to the category information of the third barcode, the roster pages on which the query keyword appears.

In an embodiment of the invention, the above content indexing method further includes: determining, according to the identification information of the first barcode, the roster pages on which the query keyword appears.

101‧‧‧First barcode

102‧‧‧Second barcode

103‧‧‧Third barcode

109a-109k‧‧‧Feature data

S11-S15‧‧‧Steps

P01-P02‧‧‧Steps

[FIG. 1] is a flowchart of a first embodiment of the method for creating a content-indexable document image file according to the invention.

[FIG. 2] is a schematic diagram of the first page of a first roster to which the invention is applied.

[FIG. 3] is a schematic diagram of an inner page of the first roster to which the invention is applied.

[FIG. 4] is a schematic diagram of the first page of a second roster to which the invention is applied.

[FIG. 5] is a schematic diagram of an inner page of the second roster to which the invention is applied.

[FIG. 6] is a schematic diagram of the sequential file of the invention.

[FIG. 7] is a flowchart of an embodiment of the content indexing method for a document image file according to the invention.

Please refer to FIG. 1, a flowchart of one embodiment of the method for creating a content-indexable document image file, which illustrates how the embodiment is carried out. The rosters in this embodiment are typically generated by a computer system. Please also refer to FIGS. 2 to 5, which are schematic diagrams of the first page of the first roster, an inner page of the first roster, the first page of the second roster, and an inner page of the second roster, respectively. For ease of explanation, the content-indexable document image file created in this embodiment is assumed to consist of two rosters. In practice the file may consist of many rosters, and the more rosters it contains, the more apparent the advantages of the invention. Also for ease of explanation, the first roster is assumed to have 20 pages and the second roster 10 pages; only pages 1 and 6 of the first roster contain feature data, and only pages 1 and 8 of the second roster contain feature data. Feature data in this embodiment may be, but is not limited to, a national ID number, a name, an account number, or a combination of these.

To make the content of the document image file indexable, this embodiment forms a first barcode 101 on the first page of each roster when the roster is created. The first barcode 101 carries identification information of the roster on which it appears. The identification information supplies the key values that identify the roster, such as the applying agency and the application year and month, and marks the starting position of an individual document. As shown in FIG. 2, reading the first barcode 101 on the first page of the first roster yields the key values of the first roster. Likewise, as shown in FIG. 4, reading the first barcode 101 on the first page of the second roster yields the key values of the second roster. In other embodiments, the value represented by the first barcode 101 is recognized and used to obtain the key values, such as the applying agency and the application year and month, from a database as the identification information, and to identify the starting position of the individual document.

As shown in FIG. 2, the first barcode 101 is formed at the upper right of the first page of the first roster. As shown in FIG. 4, the first barcode 101 is likewise formed at the upper right of the first page of the second roster. The first barcode 101 need not be placed at the upper right of the first page; it may be formed anywhere on the first page as circumstances require. In principle, however, within one document image file the first barcode 101 of every roster is formed at the same position, which makes it easier for the program to print the barcodes.

In addition to the first barcode 101, a second barcode 102 must be formed on every page of each roster. The second barcode 102 carries the page number of the page on which it appears and the count of feature data items on that page. Scanning all pages of the two rosters with a scanning device produces a document image file. As shown in FIG. 2, page 1 of the first roster contains five feature data items, 109a, 109b, 109c, 109d, and 109e; during image processing, reading the second barcode 102 on page 1 of the first roster shows that the page has five items. As shown in FIG. 3, page 6 of the first roster contains feature data 109f, 109g, 109a, 109h, 109i, 109j, and 109k; reading the second barcode 102 on that page shows that it has seven items. As shown in FIGS. 4 and 5, reading the second barcodes 102 on pages 1 and 8 of the second roster shows that page 1 has two items, 109b and 109c, and page 8 has three items, 109i, 109j, and 109a. On every other page the count carried by the second barcode 102 is 0, which shows that the page contains no feature data.
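The information carried by the second barcode can be pictured with a short sketch. The patent does not say how the page number and the count are encoded, so the `P006N07` payload layout and the `PageBarcode` name below are purely hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageBarcode:
    """Hypothetical payload of the second barcode: page number plus item count."""

    page_number: int    # page number within the roster
    feature_count: int  # how many feature data items the page holds

    def encode(self) -> str:
        # Assumed fixed-width layout: 'P' + 3 digits, 'N' + 2 digits.
        if not (0 < self.page_number <= 999 and 0 <= self.feature_count <= 99):
            raise ValueError("value does not fit the assumed fixed-width layout")
        return f"P{self.page_number:03d}N{self.feature_count:02d}"

    @staticmethod
    def decode(payload: str) -> "PageBarcode":
        # Reject anything that does not match the assumed 7-character layout.
        if len(payload) != 7 or payload[0] != "P" or payload[4] != "N":
            raise ValueError(f"unexpected payload: {payload!r}")
        return PageBarcode(int(payload[1:4]), int(payload[5:7]))
```

Under this assumed layout, page 6 of the first roster would carry `P006N07`, and a page with no feature data would carry a count of `00`.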

As shown in FIGS. 2 and 3, the second barcode 102 of the first roster is formed at the lower right of each page. As shown in FIGS. 4 and 5, the second barcode 102 of the second roster is likewise formed at the lower right of each page. The second barcode 102 need not be placed at the lower right of each page; it may be formed anywhere on the page as circumstances require. In principle, however, within one document image file the second barcode 102 of every roster is formed at the same position, which makes it easier for the program to print the barcodes.

Please refer to FIG. 6, a schematic diagram of the sequential file of the invention. In the first embodiment, in addition to forming the first barcode 101 and the second barcode 102 on each roster, a sequential file is created at the same time the computer system produces the rosters. The sequential file records all of the feature data of each roster, and the items are "recorded sequentially in the order in which they appear in each roster". An index file is created that contains the roster key values, the roster category, and the index values of the feature data in the roster. The index values are formed by first obtaining the roster image file and then processing each page of the roster in order to obtain each page's first barcode 101 data. Once the first barcode 101 data is obtained, a first-page check is made: if the first barcode 101 of the page differs from that of the preceding page, the page belongs to a different document, so it is treated as the first page of another document, and a roster index value is created from the identification information in that document's first barcode and the document attribute data in its third barcode. When the document attribute is a roster type, the corresponding roster sequential file is obtained according to the first barcode, and the sequential file's read pointer is set to the first entry. The second barcode 102 is then read, and according to its page number and feature data count, that number of feature data entries is read from the sequential file. This associates the feature data with the image file and page number and forms the content index values of a roster image.
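The index-building pass described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `ScannedPage` and `build_index` names, the use of an in-memory dictionary as the index file, and the rule that any page carrying a new first barcode starts a new roster are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class ScannedPage(NamedTuple):
    roster_id: Optional[str]  # decoded first barcode; None when the page has none
    page_number: int          # from the second barcode
    feature_count: int        # from the second barcode


def build_index(
    pages: List[ScannedPage],
    sequential_files: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each feature data value to the (roster, page) pairs where it appears."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    current: Optional[str] = None
    cursor = 0  # read pointer into the current roster's sequential file
    for page in pages:
        # A new first barcode marks the first page of another roster.
        if page.roster_id is not None and page.roster_id != current:
            current = page.roster_id
            cursor = 0
        if current is None:
            raise ValueError("scan does not begin with a roster first page")
        records = sequential_files[current]
        chunk = records[cursor:cursor + page.feature_count]
        if len(chunk) != page.feature_count:
            raise ValueError("sequential file is shorter than the barcodes claim")
        for value in chunk:
            index[value].append((current, page.page_number))
        cursor += page.feature_count
    return dict(index)
```

No image recognition of the page content is involved: the only data read from each scanned page are the barcode values, and the feature data itself comes from the sequential file.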

For a document image file created by the above embodiment, reading the second barcode information in the image file together with the roster's sequential file establishes the association between each feature data item and the image file and roster page number. The index file built from these associations makes the content of the document image file indexable. The method of building the index file is detailed below.

For example, when a whole batch of documents is scanned, the first barcode 101 makes it easy to identify the starting position of each individual document and the end of the preceding one, and key values are established for the document from the roster identification information, for example the applying agency and the application year and month. The third barcode 103 supplies the document's attribute, for example that it is a class A roster. Once the identification information and attribute of a document are obtained, it can be decided whether a feature data index should be built for that document. If so, the sequential file corresponding to the roster is obtained from the information in the first barcode 101, and the page number and feature data count from the second barcode 102 are combined with the sequential file to build the index values of each page's feature data page by page. The index file is built as illustrated in FIG. 6: the query keyword 109a appears as the 1st, 8th, and 17th entries of the sequential file. During image processing, reading the second barcode 102 of each page in the document image file yields that page's page number and total feature data count. Combined with the feature data in the sequential file, this readily yields, for keyword 109a, index values giving the roster and page number of the 1st, 8th, and 17th feature data entries. The following explains in more detail how the sequence values, the first barcode, and the second barcode are used to determine the pages on which the 1st, 8th, and 17th entries appear.
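One way to picture how a sequence value maps back to a page is a running total of the per-page counts taken from the second barcodes. The sketch below assumes the entries are numbered continuously across both rosters in scan order, which is how the 1st, 8th, and 17th positions above come out; the `locate` function and the tuple layout are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple


def locate(position: int, pages: List[Tuple[str, int, int]]) -> Tuple[str, int]:
    """Return (roster, page_number) holding the 1-based sequential entry `position`.

    `pages` lists (roster, page_number, feature_count) in scan order.
    """
    # Running totals: totals[i] is the number of entries up to and including page i.
    totals = list(accumulate(count for _, _, count in pages))
    if not totals or not 1 <= position <= totals[-1]:
        raise IndexError("position is outside the sequential file")
    # The first page whose running total reaches `position` holds that entry;
    # pages with a zero count never come first for any total, so they are skipped.
    i = bisect_left(totals, position)
    roster, page_number, _ = pages[i]
    return roster, page_number
```

With the page counts of the two example rosters, positions 1, 8, and 17 resolve to page 1 of the first roster, page 6 of the first roster, and page 8 of the second roster.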

In this embodiment, the program reads the first barcode 101 and the second barcode 102 on page 1 of the first roster in the document image file, and thereby learns which sequential file corresponds to the first roster and that page 1 has five feature data items. After obtaining the sequential file, it sets the sequential file's read pointer to the first entry.

Next, the second barcode 102 on page 1 of the first roster has already shown that the page is page 1 and contains five feature data items. So after the image data of page 1 is processed, five entries are read in order from the sequential file: 109a, 109b, 109c, 109d, and 109e, namely A0000000000, B0000000000, C0000000000, D0000000000, and E0000000000. These are the five items recorded on page 1 of the first roster, so they can be associated with page 1 of the first roster. After page 1, the second barcodes 102 on pages 2 to 5 of the first roster are read; because those pages contain no feature data, the running total is still five. When the second barcode 102 on page 6 of the first roster is read, as shown in FIG. 3, it shows that the page is page 6 and contains seven feature data items. Seven entries are then read from the sequential file starting at the 6th entry, yielding the feature data F0000000000 through K0000000000 corresponding to 109f through 109k, together with the repeated 109a (A0000000000) that FIG. 3 lists for this page. These seven entries correspond to the seven feature data items on page 6 of the first roster.

The second barcodes 102 on pages 7 to 20 of the first roster are then read in order. Because none of those pages contains feature data, the total number of feature data entries accumulated after the first roster has been read is 12, and these 12 entries have been associated with the page numbers and images of the first roster.

The next page of the document is then read. Its first barcode 101 and third barcode 103 show that it is the start of another document, the second roster-type document, and its second barcode 102 shows that it is page 1 and contains two feature data items. After page 1 of the second roster is read, the corresponding sequential file is obtained from the information in the first barcode 101, the first two feature data entries are read, and these two entries are associated with page 1 of the second roster and its image.

The second barcodes 102 on pages 2 to 7 of the second roster are then read in order. Because none of those pages contains feature data, the total number of feature data entries read after page 7 of the second roster is still two.

The second barcode 102 on page 8 of the second roster is then read, which shows that page 8 contains three feature data items. The running total is now five, so the feature data on page 8 of the second roster is taken from the 3rd to 5th entries of the sequential file and associated with the image of that page.
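The arithmetic in this walkthrough is a running sum over the counts in the second barcodes. As a minimal sketch, assuming one sequential file per roster and a hypothetical `record_ranges` helper, the range of sequential entries belonging to each page can be computed like this:

```python
from typing import Dict, Tuple


def record_ranges(counts_by_page: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """For each page that holds feature data, return the 1-based (first, last)
    positions of its entries in the roster's sequential file."""
    ranges: Dict[int, Tuple[int, int]] = {}
    start = 1  # position of the next unread entry
    for page, count in sorted(counts_by_page.items()):
        if count > 0:
            ranges[page] = (start, start + count - 1)
        start += count
    return ranges


# Page counts from the example: 20 pages in the first roster, 10 in the second.
first_roster = {page: 0 for page in range(1, 21)}
first_roster.update({1: 5, 6: 7})
second_roster = {page: 0 for page in range(1, 11)}
second_roster.update({1: 2, 8: 3})

print(record_ranges(first_roster))   # {1: (1, 5), 6: (6, 12)}
print(record_ranges(second_roster))  # {1: (1, 2), 8: (3, 5)}
```

The printed ranges agree with the text: page 6 of the first roster starts at the 6th entry, and page 8 of the second roster covers the 3rd to 5th entries.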

With the above embodiment, index values can be built for a document image file composed of multiple rosters. When someone wants to retrieve the pages on which a given feature data item appears, they only need to supply a query key value, which must be one of the feature data items. The index file of feature data built during imaging then makes it easy to retrieve the pages of the target document.

Please refer again to FIGS. 2 and 4. A third barcode 103 may also be formed on the first page of the first roster and on the first page of the second roster, the third barcode 103 carrying category information of the roster on which it appears. For example, if the first roster is a main document and the second roster is an attachment to it, reading the third barcode 103 on the first page of the first roster shows that the first roster is a main document, and reading the third barcode 103 on the first page of the second roster shows that the second roster is an attachment.

Forming the third barcode 103 makes retrieval more versatile. For example, the sequential file may be built only from rosters whose category is main document, while the document image file may be created by scanning main documents and attachments together into a single file. Reading the third barcode 103 on a roster's first page shows whether the document currently being read is a main document or an attachment. If reading the first barcode 101 and third barcode 103 on a roster's first page shows that the roster is an attachment, that roster can be skipped and the first page of the next roster read directly.
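The skipping behavior can be sketched as a filter over the scanned pages. The `Page` tuple, the `select_pages` name, and the category strings `"main"` and `"attachment"` are assumptions for illustration; the patent only says that the third barcode carries category information.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Page(NamedTuple):
    roster_id: Optional[str]  # first barcode, present only on a roster's first page
    category: Optional[str]   # third barcode, present only on a roster's first page
    page_number: int          # from the second barcode


def select_pages(pages: Iterable[Page], wanted: str) -> Iterator[Page]:
    """Yield only the pages of rosters whose category matches `wanted`."""
    keep = False
    for page in pages:
        if page.roster_id is not None:
            # First page of a roster: decide once whether to keep the whole roster.
            keep = page.category == wanted
        if keep:
            yield page
```

Calling `select_pages(scan, "main")` would pass through main-document rosters and drop every page of an attachment until the next first page is reached.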

In addition, with the third barcode 103, a dedicated sequential file can be created for each roster category. For example, a first sequential file is built from rosters whose category is main document, and a second sequential file is built from rosters whose category is attachment. A user can then choose to retrieve only pages whose category is main document or only pages whose category is attachment, or can retrieve all pages of both main documents and attachments that contain the query keyword (feature data).

It should be noted that FIG. 6 is only an illustration of a sequential file. The actual content of the sequential file need not be arranged as shown in FIG. 6, nor must a sequence value be annotated beside each feature data item. In principle, it is sufficient that a program reading the sequential file can determine the order in which the feature data items appear in each roster.

Please refer to FIG. 7, a flowchart of one embodiment of the content indexing method for a document image file, comprising steps P01 to P02. To look up specific data and its image file, a query keyword is supplied, for example A0000000000 of feature data 109a, meaning that every roster page in the document image file on which A0000000000 appears is to be retrieved. On receiving the query keyword, the system reads the index file and uses the index values to obtain the key values of the associated rosters and the page numbers on which the keyword appears, and retrieves those pages and their images. Roster category information can also be specified at the same time, so that the pages on which A0000000000 appears in rosters of the specified category are retrieved by way of the third barcode. The content indexing method thus makes it possible, at retrieval time and without optical image recognition, to quickly find among possibly tens of thousands of rosters every roster containing the feature data and the pages on which it appears, and so to quickly retrieve the image files containing it, without having to recognize all of the documents' content by optical image recognition in advance.
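The query step can be sketched as a dictionary lookup with an optional category filter. The structure of the index file is not specified in the text, so the `IndexEntry` fields and the `find_pages` function below are hypothetical, as are the sample key values.

```python
from typing import Dict, List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    roster_key: str   # key values of the roster, from the first barcode
    category: str     # roster category, from the third barcode
    page_number: int  # page on which the feature data appears


def find_pages(
    index: Dict[str, List[IndexEntry]],
    keyword: str,
    category: Optional[str] = None,
) -> List[IndexEntry]:
    """Return the index entries for `keyword`, optionally limited to one category."""
    hits = index.get(keyword, [])
    if category is None:
        return list(hits)
    return [hit for hit in hits if hit.category == category]


# Illustrative index content; the key and category strings are made up.
sample_index = {
    "A0000000000": [
        IndexEntry("agency-1/2017-10", "main", 1),
        IndexEntry("agency-1/2017-10", "main", 6),
        IndexEntry("agency-1/2017-11", "attachment", 8),
    ],
}
```

A call such as `find_pages(sample_index, "A0000000000", category="main")` returns only the two main-document entries, and an unknown keyword returns an empty list.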

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. A person of ordinary skill in the art may make minor modifications and variations without departing from the spirit and scope of the invention. The invention therefore covers such modifications and variations to the extent that they fall within the scope of the appended claims and their equivalents.

Claims (10)

1. A method for creating a content-indexable document image file, comprising: creating a plurality of rosters, each roster comprising a plurality of pages, at least one of the pages containing a plurality of feature data items; forming a first barcode on the first page of each roster, the first barcode carrying identification information of the corresponding roster; forming a second barcode on each page of each roster, the second barcode carrying the page number of the corresponding page of the roster and total-count information for the feature data on that page; creating a sequential file that carries the plurality of feature data items of each roster, the feature data items being recorded in the order in which they appear in the roster; scanning the rosters to form a document image file; and reading the second barcode of each roster and the sequential file to determine the page number of the roster page on which each feature data item is located, and creating an index file that records the page numbers so determined.

2. The method for creating a content-indexable document image file of claim 1, further comprising: forming a third barcode on the first page of each roster, the third barcode carrying category information of the roster on which it is located.

3. The method for creating a content-indexable document image file of claim 1, further comprising: forming a third barcode on the first page of each roster, the third barcode carrying category information of the corresponding roster.

4. The method for creating a content-indexable document image file of claim 1, further comprising: during creation of each roster, detecting in real time or periodically whether feature data has been added, and, when added feature data is detected, synchronously recording the detected added feature data in the sequential file.

5. The method for creating a content-indexable document image file of any one of claims 1 to 4, wherein the feature data is selected from the group consisting of national identification numbers, names, account numbers, and combinations thereof.

6. A content indexing method for a document image file, applicable to a document image file created according to claim 2, 3, or 5, the method comprising: receiving a query keyword, the query keyword being selected from one of the feature data items; and searching the index file for the query keyword to determine the roster pages on which the query keyword is located.

7. The content indexing method for a document image file of claim 6, further comprising: determining, according to the category information of the third barcode, the roster pages on which the query keyword is located.

8. The content indexing method for a document image file of claim 6 or 7, further comprising: determining, according to the roster identification information carried by the first barcode, the roster pages on which the query keyword is located.

9. A content indexing method for a document image file, applicable to a document image file created according to claim 1 or 4, the method comprising: receiving a query keyword, the query keyword being selected from one of the feature data items; and searching the index file for the query keyword to determine the roster pages on which the query keyword is located.

10. The content indexing method for a document image file of claim 9, further comprising: determining, according to the roster identification information carried by the first barcode, the roster pages on which the query keyword is located.
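The index-building step of claim 1 can be illustrated with a short sketch. It shows one plausible reading, under stated assumptions: the second barcodes have already been decoded into (page number, item count) pairs in page order, and the sequential file has been parsed into an ordered list of feature data per roster. The function name `build_index`, the data layout, and the consistency check are assumptions for illustration and do not come from the patent text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(
    rosters: Dict[str, List[Tuple[int, int]]],
    sequential: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each feature data item to the (roster_id, page_number) pairs where it appears.

    rosters:    roster_id -> [(page_number, item_count), ...] from the second barcodes
    sequential: roster_id -> feature data in order of appearance (the sequential file)
    """
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for roster_id, pages in rosters.items():
        items = sequential[roster_id]
        # Assumed sanity check: the per-page counts must account for every item.
        if sum(count for _, count in pages) != len(items):
            raise ValueError(f"item count mismatch for roster {roster_id}")
        cursor = 0
        for page_number, count in pages:
            # The next `count` items of the sequential file belong to this page.
            for feature in items[cursor:cursor + count]:
                entry = (roster_id, page_number)
                if entry not in index[feature]:
                    index[feature].append(entry)
            cursor += count
    return dict(index)


# Hypothetical input: one roster, two items on page 1 and one item on page 2.
rosters = {"R001": [(1, 2), (2, 1)]}
sequential = {"R001": ["A0000000000", "B1111111111", "A0000000000"]}
index = build_index(rosters, sequential)
```

The point of the sketch is that the page of each item follows from counting alone: the per-page totals partition the ordered sequential file, so the scanned images never need to be read for their content.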
TW106135375A 2017-10-16 2017-10-16 Method for creating and indexing document image file having indexed content TWI659320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106135375A TWI659320B (en) 2017-10-16 2017-10-16 Method for creating and indexing document image file having indexed content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW106135375A TWI659320B (en) 2017-10-16 2017-10-16 Method for creating and indexing document image file having indexed content

Publications (2)

Publication Number Publication Date
TW201917606A TW201917606A (en) 2019-05-01
TWI659320B true TWI659320B (en) 2019-05-11

Family

ID=67347743

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106135375A TWI659320B (en) 2017-10-16 2017-10-16 Method for creating and indexing document image file having indexed content

Country Status (1)

Country Link
TW (1) TWI659320B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101676902A (en) * 2008-09-19 2010-03-24 众来科技股份有限公司 File control and management system with functions of identification, classification, search and storage and method
TW201504836A (en) * 2013-07-31 2015-02-01 Ubic Inc Document classification system, document classification method, and document classification program

Also Published As

Publication number Publication date
TW201917606A (en) 2019-05-01

Similar Documents

Publication Publication Date Title
JP2643094B2 (en) Document paper recognition system
US7081975B2 (en) Information input device
US8897508B2 (en) Method and apparatus to incorporate automatic face recognition in digital image collections
US8203732B2 (en) Searching for an image utilized in a print request to detect a device which sent the print request
US7949206B2 (en) Scanned image management device
US20080162603A1 (en) Document archiving system
US20060085442A1 (en) Document image information management apparatus and document image information management program
US20050160115A1 (en) Document imaging and indexing system
US7493323B2 (en) Document group analyzing apparatus, a document group analyzing method, a document group analyzing system, a program, and a recording medium
US20020059215A1 (en) Data search apparatus and method
US8290270B2 (en) Method and system for converting image text documents in bit-mapped formats to searchable text and for searching the searchable text
CN104346415B (en) Method for naming image document
KR20130018640A (en) Forensic system, method and program
US20080162602A1 (en) Document archiving system
EA003619B1 (en) System and method for searching electronic documents created with optical character recognition
JP2006285526A (en) Information retrieval according to image data
CN114201658B (en) File fast retrieval method based on face recognition
JP6786658B2 (en) Document reading system
US20070214177A1 (en) Document management system, program and method
CN109960684A (en) Image processing apparatus and storage medium
WO2012017599A1 (en) Information processing device, processing method, computer program, and integrated circuit
TWI659320B (en) Method for creating and indexing document image file having indexed content
JPH07239854A (en) Image filing system
US10789245B2 (en) Semiconductor parts search method using last alphabet deletion algorithm
JP3735313B2 (en) Image management system, image management method, and image management program