TW463099B - Automatic document editing system and method - Google Patents
- Publication number
- TW463099B TW89101060A
- Authority
- TW
- Taiwan
- Prior art keywords
- block
- file
- text
- patent application
- blk
- Prior art date
Landscapes
- Processing Or Creating Images (AREA)
Abstract
Description
The present invention relates to the editing of document blocks, and in particular to a system and method for automatic document editing by means of image processing.

BACKGROUND OF THE INVENTION

Modern printing technology is highly advanced, and document layouts have become increasingly varied. In today's print media, such as newspapers and magazines, text and pictures are heavily embellished to attract readers' attention and make the information more readable. If the structure of the articles in a color document can be understood quickly and correctly, a reading order can be offered to the reader, the text and graphics in the document can be rearranged in another format, or the text and graphics can be processed separately: the text portions can be extracted and then recognized by character recognition for use in various applications.

The main purpose of document analysis and understanding is to use computers to process automatically the wide variety of documents that people use. Although there has been considerable research on automatic document processing and analysis, work on color documents has been comparatively limited.

An important task in document analysis is to segment a color document image into different regions. In general there are two approaches: top-down and bottom-up. The "run length smoothing algorithm" disclosed in Friedrich M. Wahl, Kwan Y. Wong, and Richard G. Casey, "Block Segmentation and Text Extraction in Mixed Text/Image Documents," Computer Graphics and Image Processing 20, 375-390, 1982, and the "projection profile cut algorithm" disclosed in D. Wang and S. N. Srihari, "Classification of newspaper image blocks using texture analysis," Computer Vision, Graph., and Image Process., Vol. 47, pp. 327-352, 1989, are top-down methods.

A known bottom-up method is L. A. Fletcher and R. Kasturi, "A robust algorithm for text string separation from mixed text/graphics images," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 10, pp. 910-918, 1988, which links the pixels of an image into many connected components and then merges them into larger blocks.

For document block classification, a known reference is H. J. Lee and C. W. Chien, "Segmentation of documents with text/graphic/image," Proc. of ICCPCOL, pp. 188-194, 1991. For a document image, a smearing method is first used to merge pixels into block regions. After the boundaries of the block regions are found, each small region is given a coarse classification; before processing, the skew angle of the document is detected and corrected. The coarse classification uses the aspect ratio of each block, the block size, and some author-defined thresholds to divide the blocks into basic text, title, graphics and image, and line and noise. In the subsequent fine classification, graphics and images are distinguished using eight author-defined masks, and overlapping blocks of the same class are merged. Next, title blocks that were judged to be text regions are separated from the text regions by their relative positions. Finally, the text blocks are merged into text lines, and a method that combines merging and splitting is disclosed for cutting out the characters within a line.

In addition, K. C. Fan and L. S. Wang, "Document Segmentation and Classification," Proc. of 1997 IPPR Conf. on CVGIP, Taichung, Taiwan, 1997, discloses a document segmentation and classification method in which noise appearing at the borders during scanning is first detected and removed. Basic components are then found according to author-defined criteria and classified. They are first divided into text and non-text. Text components are merged into text strings with the proposed algorithm, the strings are merged into text paragraphs, and finally paragraphs with the same orientation are merged. For non-text components, masks are used to find straight lines, and the number of lines is used to distinguish fields from tables; the number of black points near each point in a block is then examined to tell images from graphics. Finally, each block is represented by a polygon, and overlapping blocks of the same class are merged or blocks of different classes are split.

L. F. Lee and W. H. Tsai, "Understanding of Arrangements and Extraction of Articles in Chinese Newspaper Images," Proc. Int. Conf. Computer Vision, Graphics and Image Processing, Nantou, Taiwan, ROC, pp. 479-487, 1995, targets Chinese newspapers. It discloses methods for layout understanding and article extraction, and applies different segmentation methods to body text and headline text to handle different fonts appearing in the same block.

S. C. Lin and W. H. Tsai, "Segmentation and Understanding of Color Magazine Images," Proc. International Computer Symposium, Kaohsiung, Taiwan, Republic of China, December 1996, pp. 205-212, proposes a method for automatically segmenting blocks in Chinese magazines, and uses relative positions and certain block characteristics to find article titles, abstracts, sections, headers, page numbers, and so on.

SUMMARY OF THE INVENTION

The present invention can decompose a color document and recombine and rearrange it. One object of the invention is to provide an automatic document editing system with which users can edit documents automatically according to their own intentions. The invention is an intelligent document editing system.

The automatic document editing system of the invention mainly comprises a document block extraction unit, a block classification unit, a block order determination unit, a line and character segmentation unit, and a rearrangement unit.

The system may be used together with a central processing unit, a memory unit, and an operation interface to carry out the automatic editing functions among the units and to provide the required data storage space. The operation interface may provide a dedicated toolbar panel so that users can conveniently view and select functions and easily operate the automatic editing functions.

Another object of the invention is to provide an automatic document editing method. The method mainly comprises an original image binarization step, a document block extraction step, a block classification step, a block order determination step, a line and character segmentation step, and a document image rearrangement step.
The original image binarization step of the invention is based on the moment-preserving principle. Moment-preserving thresholding is applied separately to the red, green, and blue components to obtain an appropriate threshold for each of the three primary color components, and the best binary image is then obtained according to a conversion relation for use in block extraction.

The invention can not only decompose and recombine color documents; the extracted document blocks may also be regular rectangles or any irregular shape.

In an embodiment of the invention, the functions of the main units of the automatic document editing system are used to decompose and rearrange a document, yielding a rearranged result image.

The above and other objects and advantages of the invention are described in detail below with reference to the following drawings, the detailed description of the embodiments, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the architecture of the automatic document editing system of the invention and the functions of its units.
FIG. 2 is a block diagram of the automatic document editing system of the invention used together with a central processing unit, a memory unit, and an operation interface to carry out the automatic editing functions among the units.
FIG. 3 is a flowchart of the automatic document editing method of the invention.
FIG. 4 illustrates the steps of original document image binarization in FIG. 3.
FIG. 5 illustrates the three kinds of cutting used for document block extraction in FIG. 3.
FIGS. 6(a) and 6(b) show a document block after it has been divided into two blocks in the vertical direction and in the horizontal direction, respectively.
FIG. 6(c) illustrates shrinking a rectangular block into the smallest irregular block using the degree of approximation of the image.
FIG. 7 illustrates the steps of block classification in FIG. 3.
FIG. 8 illustrates the steps of block order determination for vertical document blocks in FIG. 3.
FIG. 9 illustrates the steps of line and character segmentation in FIG. 3.
FIG. 10 illustrates the steps of automatic document image rearrangement in FIG. 3.
FIGS. 11(a) to 11(e) illustrate a preferred embodiment of the invention, in which:
FIG. 11(a) is a full-color document image to be edited.
FIG. 11(b) is a result of processing FIG. 11(a) with the document block extraction unit and the block classification unit of the system.
FIG. 11(c) is a result of further processing FIG. 11(b) with the block order determination unit of the system.
FIG. 11(d) is a horizontally typeset result image obtained by further processing FIG. 11(c) with the line and character segmentation unit and rearranging the document with the rearrangement unit according to various styles.
FIG. 11(e) is a vertically typeset result image obtained by further processing FIG. 11(c) with the line and character segmentation unit and rearranging the document with the rearrangement unit according to various styles.
FIGS. 12(a) to 12(d) show the interface of the automatic document editing system of the invention, in which:
FIG. 12(a) lists the commands of the "File (F)" menu.
FIG. 12(b) lists the commands of the "Manuscript (L)" menu.
FIG. 12(c) lists the commands of the "Clip and Rearrange (O)" menu.
FIG. 12(d) lists the commands of the "Image and Drawing (I)" menu.

DESCRIPTION OF REFERENCE NUMERALS

100 document block editing system
101 document block extraction unit
102 block classification unit
103 block order determination unit
104 line and character segmentation unit
105 rearrangement unit
106 original document image
107 rearranged document image
201 central processing unit
202 memory unit
203 operation interface
204 toolbar panel
205 input unit
206 output unit
207 storage unit
301 original image binarization step
302 document block extraction step
303 block classification step
304 block order determination step
305 line and character segmentation step
306 document image rearrangement step
401 obtain appropriate thresholds for the three primary color components
402 convert the original document image into a binary image according to a conversion formula
701 document block preprocessing step
702 document block classification step
901 line segmentation step
902 character segmentation step for content blocks
903 character segmentation step for headline blocks

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram of the architecture of the automatic document editing system of the invention and the functions of its units. As shown in FIG. 1, the automatic document editing system 100 comprises a document block extraction unit 101, a block classification unit 102, a block order determination unit 103, a line and character segmentation unit 104, and a rearrangement unit 105.

Referring to FIG. 1, the document block extraction unit 101 first segments every block in the original document image 106, separating the text blocks from the picture blocks. The block classification unit 102 then determines whether each block is a picture or text. Next, the block order determination unit 103 determines the order of the text blocks. The line and character segmentation unit 104 then segments the text down to individual characters. Finally, the rearrangement unit 105 rearranges the document according to various styles to obtain the rearranged result image 107.

FIG. 2 is a block diagram of the automatic document editing system 100 of the invention used together with a central processing unit 201, a memory unit 202, and an operation interface 203 to carry out the automatic editing functions among the units. The operation interface 203 further provides a toolbar panel 204.
The central processing unit 201 executes the functions of the units of the automatic document editing system 100. The memory unit 202 serves as the required data storage space. Through the functions displayed on the toolbar panel 204 of the operation interface 203, users can view and select functions and easily operate the automatic editing functions. The operation interface 203 presents its functions through an icon-based graphical user interface (GUI). The user interface includes windows, dialog boxes, a main menu, and toolbars, together with hot keys and macros.

According to the invention, a document to be edited can be input to the automatic document editing system 100 as a document image through an input unit 205, for example a scanner.

The results of the editing functions can be output to an output unit 206. Output can take several forms: display on a display unit, printing on an output device, or output to a server as electronic document information.

According to the invention, document data that has been organized or is to be reused can be saved in the storage unit 207 within the memory unit 202, so that the user can view or search for the edited document data through the preview function of the operation interface 203. The storage unit 207 can, for example, store summary information related to the document data or save the data as an image library.

FIG. 3 is a flowchart of the automatic document editing method of the invention. The method comprises, in order, an original image binarization step 301, a document block extraction step 302, a block classification step 303, a block order determination step 304, a line and character segmentation step 305, and a document image rearrangement step 306. The steps are described below.

FIG. 4 illustrates the steps of original document image binarization in FIG. 3. As described above, the binarization of the invention is based on the moment-preserving principle and applies moment-preserving thresholding to the red, green, and blue components separately. In other words, an appropriate threshold is obtained for each of the three primary color components of the original image, and the gray levels of the original image are divided into one group above the threshold and another group below it.

Referring to FIG. 4, in step 401 the appropriate thresholds $R_t$, $G_t$, and $B_t$ of the three primary color components are obtained for a document image according to the moment-preserving principle. Then, in step 402, these thresholds and a conversion formula are used to set each pixel of the original document image to one of the two binary values, black or white.

The conversion formula used in the embodiment of the invention is as follows. If

($R_p < (R_t + R_{z1})/2$ and $G_p < (G_t + G_{z1})/2$) or
($R_p < (R_t + R_{z1})/2$ and $B_p < (B_t + B_{z1})/2$) or
($G_p < (G_t + G_{z1})/2$ and $B_p < (B_t + B_{z1})/2$),

then the color of pixel P is set to black; otherwise, the color of pixel P is set to white. Here:

$R_p$ is the red component of pixel P; $R_t$ is the threshold of the red component;
$G_p$ is the green component of pixel P; $G_t$ is the threshold of the green component;
$B_p$ is the blue component of pixel P; $B_t$ is the threshold of the blue component;
$R_{z1}$, $G_{z1}$, and $B_{z1}$ are the offsets of the red, green, and blue components, respectively, in moment-preserving binarization.
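The conversion rule of step 402 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `is_black` and `binarize` are hypothetical, the thresholds and offsets are assumed to have been computed already by the moment-preserving step 401 (not shown), and the numeric values in the demo are made up.

```python
def is_black(pixel, thresholds, offsets):
    """Apply the step-402 rule to one pixel.

    pixel, thresholds, offsets are (R, G, B) triples: the pixel's components,
    the thresholds (Rt, Gt, Bt) and the offsets (Rz1, Gz1, Bz1).
    """
    # For each component, test  component < (threshold + offset) / 2.
    r, g, b = [p < (t + z) / 2 for p, t, z in zip(pixel, thresholds, offsets)]
    # Black when at least two of the three components pass the test.
    return (r and g) or (r and b) or (g and b)


def binarize(image, thresholds, offsets):
    """Map a list of rows of RGB pixels to rows of 0 (black) / 255 (white)."""
    return [[0 if is_black(px, thresholds, offsets) else 255 for px in row]
            for row in image]


# Hypothetical values: every cut-off is (100 + 140) / 2 = 120.
demo = binarize([[(10, 10, 200), (200, 200, 10)]], (100, 100, 100), (140, 140, 140))
print(demo)  # [[0, 255]]
```

Written this way, the three bracketed conditions of the formula reduce to "at least two of the three color components fall below their cut-off", which is why a pixel that is dark in only one channel stays white.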
FIG. 5 illustrates the three kinds of cutting used for document block extraction in FIG. 3. Document block extraction in the invention may include single-block cutting, intelligent single-block cutting, and intelligent multi-block cutting. Each cutting method is described below.

First, the representation of a document block is explained. According to the invention, a document block is recorded by the two-dimensional coordinates of its lower-left and upper-right corners, (left, down) and (right, up).

Single-block cutting directly uses the frame drawn by the user as the extracted block. The lower-left corner of the selected frame is therefore (left, down), and the upper-right corner is (right, up).

Intelligent single-block cutting comprises two steps: (a) for a document image, first select the approximate extent of the document block with a frame; (b) use the horizontal projection and the vertical projection to find the actual minimal bounding frame of the document block, that is, the lower-left coordinate (L, D) and the upper-right coordinate (R, U).

With the horizontal projection, the projection is scanned from top to bottom, and the vertical coordinate of the first position with a black projection is U; it is then scanned from bottom to top, and the vertical coordinate of the first position with a black projection is D.

With the vertical projection, the projection is scanned from left to right, and the horizontal coordinate of the first position with a black projection is L; it is then scanned from right to left, and the horizontal coordinate of the first position with a black projection is R. The degree of approximation of the image is then used to shrink the rectangular frame further, giving a minimal frame of irregular shape.

Because the document blocks of an image are separated by relatively large gaps, these gaps appear as white runs in the horizontal or vertical projection. Intelligent multi-block cutting uses these white runs to split the document blocks apart. A cut is either vertical or horizontal. This kind of block cutting comprises the following steps:

(a) For a document image, select its approximate extent with a frame.
(b) Decide the direction of each cut from the maximum white-run length of the vertical projection and the maximum white-run length of the horizontal projection of the current block.
(c) If the maximum white-run length of the vertical projection is greater than that of the horizontal projection, cut the block vertically. Conversely, if the maximum white-run length of the horizontal projection is greater than that of the vertical projection, cut the block horizontally.
(d) Repeat steps (b) and (c) on the resulting blocks to cut them into smaller blocks.
(e) Once the smallest rectangular blocks have been cut out, shrink each rectangular block to the smallest irregular block in the same way as in intelligent single-block cutting.

FIGS. 6(a) and 6(b) show a document block after it has been divided into two blocks in the vertical direction and in the horizontal direction, respectively. The hatched parts represent white runs. S1 and S2 are the start points of the two white runs, and E1 and E2 are their end points. As FIG. 6(a) shows, the document block (L, D), (R, U) is divided vertically into two blocks: (L, D), (S1, U) and (E1, D), (R, U). Similarly, as FIG. 6(b) shows, the document block (L, D), (R, U) is divided horizontally into two blocks: (L, E2), (R, U) and (L, D), (R, S2). FIG. 6(c) shows a rectangular block being shrunk to the smallest irregular block using the degree of approximation of the image.
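The projection-based bounding frame and the white-run cutting rule of steps (b) to (d) can be sketched as follows. The sketch rests on several assumptions that are not in the patent text: the binary image is a list of rows with 1 for black and 0 for white, boxes are reported as inclusive (top, bottom, left, right) row and column indices rather than (L, D), (R, U) coordinates, a hypothetical `min_gap` parameter stops the recursion, ties between the two directions are cut horizontally, and the final shrinking to an irregular outline in step (e) is omitted. All function names are illustrative.

```python
def projections(img):
    """Black-pixel counts per row (horizontal projection) and per column (vertical projection)."""
    return [sum(row) for row in img], [sum(col) for col in zip(*img)]


def bounding_box(img):
    """Smallest inclusive box (top, bottom, left, right) holding all black pixels, or None."""
    rows, cols = projections(img)
    ys = [i for i, v in enumerate(rows) if v]
    xs = [j for j, v in enumerate(cols) if v]
    return (ys[0], ys[-1], xs[0], xs[-1]) if ys else None


def longest_white_run(profile):
    """Return (length, start, end) of the longest run of zeros; end is exclusive."""
    best, start = (0, 0, 0), None
    for i, v in enumerate(profile + [1]):  # sentinel closes a trailing run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if i - start > best[0]:
                best = (i - start, start, i)
            start = None
    return best


def cut_blocks(img, min_gap=2, origin=(0, 0)):
    """Recursively cut at the longest white run; return boxes in original coordinates."""
    box = bounding_box(img)
    if box is None:
        return []
    top, bottom, left, right = box
    sub = [row[left:right + 1] for row in img[top:bottom + 1]]
    oy, ox = origin[0] + top, origin[1] + left
    rows, cols = projections(sub)
    v_len, v_start, v_end = longest_white_run(cols)  # gap in the vertical projection
    h_len, h_start, h_end = longest_white_run(rows)  # gap in the horizontal projection
    if max(v_len, h_len) < min_gap:
        return [(oy, oy + bottom - top, ox, ox + right - left)]
    if v_len > h_len:  # vertical cut: left part and right part
        left_part = [row[:v_start] for row in sub]
        right_part = [row[v_end:] for row in sub]
        return (cut_blocks(left_part, min_gap, (oy, ox))
                + cut_blocks(right_part, min_gap, (oy, ox + v_end)))
    # horizontal cut: upper part and lower part
    return (cut_blocks(sub[:h_start], min_gap, (oy, ox))
            + cut_blocks(sub[h_end:], min_gap, (oy + h_end, ox)))


page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(bounding_box(page))  # (1, 2, 1, 7)
print(cut_blocks(page))    # [(1, 2, 1, 2), (1, 2, 6, 7)]
```

In the demo the three-column white gap in the vertical projection is longer than any gap in the horizontal projection, so the page is cut vertically into a left and a right block, as in FIG. 6(a).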
第17頁 4 63 09 9 五、發明說明(15) 在文件影像中的文件區塊抽取出來之後,本發明接下 來就是要判斷文件區塊的方向和類別。關於方向,垂直方 向(vertical orientation)的文件區塊是指:區塊内的文 字讀取的方向是由上而下;而水平方向(h〇riz〇ntal orientation)的文件區塊則是指:區塊内的文字讀取方向 是由右到左或左到右。而文件區塊初步的分類如下: (a)圖片區塊(picture bl〇ck):泛指影像、圖片以 及圖片說明。 (b) (c) 垂直標題區塊(vertical headline block):文 字排列為垂直方向的標題區塊。 (d) 水平標題區塊(horizontal headline block): 文字排列為水平方向的標題區塊D 垂直内文區塊(vertical content block):文字 排列為垂直方向的内文區塊。 (e)水平内文區塊(horizontal content block):文 子排列為水平方向的内文區塊。 本發明再次利用到垂直和水平兩個方向的投影,來分 辦文件區塊的類別。同時定義一些參數來判斷文件區塊的 類別,並利用到文件區塊兩個端點座標:(Lef t,D〇wn)、 (Right,Up)。以下為這些參數的定義:Page 17 4 63 09 9 V. Explanation of the invention (15) After extracting the file blocks in the file image, the present invention next determines the direction and type of the file blocks. Regarding the orientation, the vertical orientation file block refers to: the reading direction of the text in the block is from top to bottom; and the horizontal orientation (horizontal orientation) file block refers to: The text in the block is read from right to left or left to right. The preliminary classification of file blocks is as follows: (a) Picture block: Refers to images, pictures, and picture descriptions. (b) (c) Vertical headline block: The text is arranged in a vertical headline block. (d) Horizontal headline block: The headline block in which the text is arranged in the horizontal direction. D Vertical content block: The text block in which the text is arranged in the vertical direction. (e) Horizontal content block: The text is arranged as a horizontal text block. In the present invention, projections in both vertical and horizontal directions are used again to classify the types of file blocks. At the same time, some parameters are defined to determine the type of the file block, and two endpoint coordinates of the file block are used: (Lef t, Down), (Right, Up). The following are the definitions of these parameters:
Vert_Avg_Whi te :垂直方向投影的平均白色區段的長度;Vert_Avg_Whi te: the length of the average white segment projected in the vertical direction;
第18頁 4 63 09 9Page 18 4 63 09 9
Hori—AvgJhite :水平方向投影的平均白色區段的長度; ert_Avg_Black .垂直方向投影的平均黑色區段的長度; Hori一Avg一Black .水平方向投影的平均黑色區段的長度; Black_Rat1〇 :二值化後黑色點在文件區塊内所佔的比 有了這些參數後,圖7為說明圖3中區塊分類判斷的步 驟流程。參照圖7,說明此步驟流程如下: 首先’進行步驟701,步驟701為文件區塊的前處理步 驟’依序包含下列五個步驟:(a)利用圖5的抽取文件區塊 的演算法找出文件區塊的兩端點(Left,Down)、(Right, Up)。(b)去掉區塊中的孤立的黑點(isolated black point)。(c)對此一文件區塊做垂直投影找出 Vert_Avg_White 和Vert_Avg_Black。(d)對此一文件區塊 做水平投影找出Hori一Avg_White 和 Hori—Avg_Black。(e) 計算出Black_Ratio。 接下來,步驟70 2為文件區塊的分類判斷。分有下列幾 種狀況: (a)若(Black_Ratio> Q )或(Vert_Avg_White = 〇 且Hori_Avg_Whi te = 0),則,區塊設為圖片區 塊。Hori-AvgJhite: the length of the average white segment projected in the horizontal direction; ert_Avg_Black. The length of the average black segment projected in the vertical direction; Hori-Avg-Black. The length of the average black segment projected in the horizontal direction; Black_Rat10: binary After the ratio of the black points in the file block after the conversion has been obtained, FIG. 7 is a flowchart illustrating the steps of determining the classification of the block in FIG. 3. Referring to FIG. 7, the flow of this step is described as follows: First, “step 701 is performed, and step 701 is a pre-processing step of the file block”, which includes the following five steps in order: (a) using the algorithm of extracting file blocks of FIG. 5 to find The two ends of the file block (Left, Down), (Right, Up). (B) Remove the isolated black points in the block. (C) Perform vertical projection on this file block to find Vert_Avg_White and Vert_Avg_Black. (D) Perform horizontal projection on this file block to find Hori_Avg_White and Hori_Avg_Black. (E) Calculate Black_Ratio. Next, step 702 is the classification judgment of the file block. There are the following situations: (a) If (Black_Ratio > Q) or (Vert_Avg_White = 〇 and Hori_Avg_Whi te = 0), then the block is set as a picture block.
(b) If Hori_Avg_White > Vert_Avg_White, the block is set as a horizontal body-text block; otherwise, it is set as a vertical body-text block.
(c) If (Right - Left)/Vert_Avg_Black ≤ C2 and (Top - Down)/(Right - Left) ≥ C3, the block is set as a vertical title block.
(d) If (Top - Down)/Hori_Avg_Black ≤ C2 and (Right - Left)/(Top - Down) ≥ C3, the block is set as a horizontal title block.

Here C1 is a predetermined constant in the range 0.5 to 1, C2 is a predetermined constant below 10, and C3 is a predetermined constant greater than 1.

Block classification is followed by block ordering. When determining the order of the blocks in a document, the invention treats vertical and horizontal documents separately. First, the neighboring blocks in the four directions are defined as four parameters, as follows:
Up_blk: the nearest block above a given block and, among those, the leftmost;
Down_blk: the nearest block below a given block [the rest of this definition is illegible in the source];
Right_blk: the nearest block to the right of a given block and, among those, the lowest;
Left_blk: the nearest block to the left of a given block and, among those, the topmost;
Now_blk: the block currently being visited.

When a block has no neighbor in a given direction, its neighboring block in that direction is set to the null block (Null Block).

With these parameters defined, Fig. 8 shows the flow for ordering the blocks of a vertical document in Fig. 3. Referring to Fig. 8, the steps are, in order:

Step 801: For every block, find its neighbors in the four directions: Up_blk, Down_blk, Left_blk and Right_blk.
Step 802: Find the block that comes first in the order, set it as Now_blk, and set the sequence number to 1.
Step 803: If Now_blk has a Left_blk and that Left_blk has not yet been ordered, perform the following three sub-steps in order; otherwise, go to step 804.
(803a) Set the Right_blk of this Left_blk to Now_blk.
(803b) Set Now_blk to the Left_blk of Now_blk.
(803c) Increase the sequence number by one and return to step 803.
Step 804: If Now_blk has a Right_blk, perform the following two sub-steps in order; otherwise, go to step 805.
(804a) Set the Left_blk of this Right_blk to Now_blk.
(804b) Set Now_blk to the Right_blk of Now_blk and go to step 804.
Step 805: If the Down_blk of Now_blk is the Null Block, perform the following two sub-steps in order; otherwise, set the Up_blk of this Down_blk to Now_blk, set Now_blk to the Down_blk of Now_blk, increase the sequence number by one, and return to step 803.
(805a) Set Now_blk to the Left_blk of Now_blk.
(805b) If Now_blk = Null Block, stop; otherwise, return to step 805.

The flow for ordering the blocks of a horizontal document is similar to that for a vertical document. Roughly, once the first block has been found, the search proceeds downward until it reaches the lower edge of the document image, then returns to the top, moves one level to the right, and searches downward again, stopping only when every block has been visited. The rules for assigning sequence numbers are the same as in the vertical case.

The invention handles picture blocks and non-picture blocks separately. For non-picture blocks, the text lines and characters within the block are cut out; the lines are cut out first, and only then are the characters of each line cut. Non-picture blocks comprise horizontal and vertical title blocks and body-text blocks. As Fig. 9 shows, the line and character cutting step of Fig. 3 comprises a line cutting step 901 for non-picture blocks, a character cutting step 902 for body-text blocks, and a character cutting step 903 for title blocks.
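Returning to the block classification of steps 701 and 702, the rules can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: a block is taken to be a list of rows of 0/1 pixels (1 = black), a "segment" is read as a run of consecutive empty or non-empty positions in a projection profile, the default values of C1, C2 and C3 are arbitrary picks inside the stated ranges, and the precedence of the rules (picture first, then the two title tests, then rule (b)) is an assumption, since the source lists the cases without stating an order of evaluation. The function names are hypothetical.

```python
def run_lengths(profile):
    """Return (white_runs, black_runs): lengths of consecutive zero / non-zero entries."""
    white, black = [], []
    i = 0
    while i < len(profile):
        is_black = profile[i] > 0
        j = i
        while j < len(profile) and (profile[j] > 0) == is_black:
            j += 1
        (black if is_black else white).append(j - i)
        i = j
    return white, black


def mean(values):
    """Average of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def block_features(img):
    """Compute the step 701 parameters for one binarized block (list of 0/1 rows)."""
    height, width = len(img), len(img[0])
    vert = [sum(img[y][x] for y in range(height)) for x in range(width)]  # per column
    hori = [sum(row) for row in img]                                      # per row
    vert_white, vert_black = run_lengths(vert)
    hori_white, hori_black = run_lengths(hori)
    return {
        "width": width,                      # Right - Left
        "height": height,                    # Top - Down
        "vert_avg_white": mean(vert_white),
        "vert_avg_black": mean(vert_black),
        "hori_avg_white": mean(hori_white),
        "hori_avg_black": mean(hori_black),
        "black_ratio": sum(hori) / (width * height),
    }


def classify(f, C1=0.7, C2=2.0, C3=2.0):
    """Apply rules (a) to (d) of step 702; constants and rule order are assumptions."""
    w, h = f["width"], f["height"]
    # Rule (a): mostly black, or no white gaps in either projection.
    if f["black_ratio"] > C1 or (f["vert_avg_white"] == 0 and f["hori_avg_white"] == 0):
        return "picture"
    # Rule (c): few columns of ink and a tall, narrow shape.
    if f["vert_avg_black"] and w / f["vert_avg_black"] <= C2 and h / w >= C3:
        return "vertical title"
    # Rule (d): few rows of ink and a wide, flat shape.
    if f["hori_avg_black"] and h / f["hori_avg_black"] <= C2 and w / h >= C3:
        return "horizontal title"
    # Rule (b): compare gaps between lines with gaps between columns.
    if f["hori_avg_white"] > f["vert_avg_white"]:
        return "horizontal text"
    return "vertical text"
```

A caller would binarize and crop a block first, then call `classify(block_features(block))`; tuning C1 to C3 on real scans would be needed before relying on the labels.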
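The vertical-document ordering of steps 802 to 805 can likewise be sketched in Python. The sketch assumes that step 801 has already been carried out, so every block carries its four neighbor links, with `None` standing in for the Null Block; the `Block` class and `order_vertical` function are hypothetical names, not taken from the patent. Like the flow in Fig. 8, it does not guard against a Down_blk that has already been ordered.

```python
class Block:
    """A document block with four neighbor links and an assigned sequence number."""

    def __init__(self, name):
        self.name = name
        self.up = None      # Up_blk
        self.down = None    # Down_blk
        self.left = None    # Left_blk
        self.right = None   # Right_blk
        self.order = None   # sequence number, None until ordered


def order_vertical(first):
    """Number the blocks of a vertical document, starting from the first block."""
    now = first                      # step 802
    seq = 1
    now.order = seq
    ordered = [now]
    while True:
        # Step 803: walk left along the current row, numbering as we go.
        while now.left is not None and now.left.order is None:
            now.left.right = now     # 803a
            now = now.left           # 803b
            seq += 1                 # 803c
            now.order = seq
            ordered.append(now)
        # Step 804: walk back to the right end of the row.
        while now.right is not None:
            now.right.left = now     # 804a
            now = now.right          # 804b
        # Step 805: look for a block below, moving left while there is none.
        while now is not None and now.down is None:
            now = now.left           # 805a
        if now is None:              # 805b: nothing left to visit
            return ordered
        now.down.up = now
        now = now.down
        seq += 1
        now.order = seq
        ordered.append(now)
```

For a two-by-two grid with A at the top right, B at the top left, C below A and D below B, the sketch numbers the blocks A, B, C, D, which matches the right-to-left, top-to-bottom reading order of a vertical layout.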
In the line cutting step 901, if the block being processed is a horizontal document block, line cutting proceeds in order as follows: (a) use the horizontal projection to find the width of each line; (b) over the width of each line, take a local projection in the vertical direction; (c) close in on the line from its left and right ends to obtain the actual length of each line in the horizontal document block. If the block being processed is a vertical document block, the method is much the same, and line cutting proceeds in order as follows: (a) use the vertical projection to find the width of each line; (b) over the width of each line, take a local projection in the horizontal direction; (c) close in on the line from its upper and lower ends to obtain the actual height of each line in the vertical document block.

In step 902, the character cutting of body-text blocks is divided into the cutting of vertical body text and the cutting of horizontal body text. For vertical body text, the invention uses both the projection of the whole block and partial projections to cut the characters. The cutting of horizontal body text is similar to that of vertical body text, except that the directions must be handled with care: both the overall projection and the partial projections are replaced by vertical projections.

In step 903, the character cutting of title blocks is divided into the cutting of vertical title text and the cutting of horizontal title text. For vertical title text, the invention does not use the overall projection but relies on the partial projection, that is, the width of each line, and it defines the characteristics of two kinds of quotation marks so that quotation marks can be distinguished from ordinary characters. The title characters are then cut using the width of each line. The cutting of horizontal title text is similar to that of vertical title text, except that the directions must be handled with care: the partial projection is replaced by a vertical projection.
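As an illustration of the line cutting of step 901 for a horizontal block, the three sub-steps can be sketched in Python as below. The sketch assumes a binarized block given as a list of rows of 0/1 pixels (1 = black) and treats any row containing a black pixel as part of a line; the function names are hypothetical, and the patent may apply thresholds that this sketch omits.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    out, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def cut_lines_horizontal(img):
    """Return one (top, bottom, left, right) box per text line, ends exclusive."""
    width = len(img[0])
    row_profile = [sum(row) for row in img]          # (a) horizontal projection
    boxes = []
    for top, bottom in runs(row_profile):
        # (b) local vertical projection restricted to the rows of this line
        col_profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
        # (c) close in from the left and right ends until ink is met
        left = next(x for x in range(width) if col_profile[x])
        right = width - next(x for x in range(width) if col_profile[width - 1 - x])
        boxes.append((top, bottom, left, right))
    return boxes
```

For a vertical block, the same idea applies with rows and columns exchanged: columns are found from the vertical projection, and each column is then trimmed from its upper and lower ends.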
Once the characters in the document blocks have all been cut and the order of the blocks has been determined, the invention can rearrange the document. Two rearrangement methods are provided: manual and automatic. For manual rearrangement, the system provides a spacing and typesetting dialog box in which the user can set the character and line spacing, font size, margins, and so on. This both improves the appearance of the rearranged layout and gives users the chance to adjust the typesetting to their own taste.

Automatic rearrangement spares the user from setting the spacing and typesetting. After rearrangement, the selected blocks fill the chosen page as fully as possible, so that the page does not contain too much blank area and looks better.

In automatic rearrangement, the invention first lays out the characters within the blocks one by one in the horizontal or vertical document direction described above. If neither of two predefined limits has been reached, the font is enlarged. One limit is that the document border must not be exceeded. The other is that, if picture blocks are to be displayed, a picture block may be reduced to no less than 1/L of its size, where L is a number greater than 1; any further reduction also counts as exceeding the predefined limits. Fig. 10 shows the flow of automatic document image rearrangement in Fig. 3. So that the font changes still produce an attractive layout, other settings must be adjusted as well, and the invention uses a parameter count to adjust them. Referring to Fig. 10, the steps of automatic document rearrangement are, in order:

Step 1001: Set count to 1.
Step 1002: Lay out all document blocks in the determined order, according to the spacing settings.
Step 1003: If picture blocks are to be displayed and picture blocks exist, adjust the picture blocks [the exact operation is illegible in the source] and go to step 1003a; otherwise, go to step 1004.
Step 1003a: If a picture is smaller than 1/L of the original, go to step 1005.
Step 1004: If the document border limit is not exceeded, perform steps 1004a to 1004c below; otherwise, go to step 1005.
Step 1004a: Increase the body font size by K1 pixels and the title font size by K2 pixels.
Step 1004b: If count divided by a first preset divisor leaves remainder 0, increase the character spacing by W1 pixels; if count divided by a second preset divisor leaves remainder 0, increase the line spacing by W2 pixels [the divisor symbols are illegible in the source, and the reading "character spacing" is uncertain].
Step 1004c: Set count = count + 1 and return to step 1002.
Step 1005: Reduce the title font size by K2 pixels and perform steps 1005a to 1005b below.
Step 1005a: Reduce the body font size [the amount is illegible in the source] and rearrange all blocks.
Step 1005b: If the document border limit is exceeded, or a picture is smaller than 1/L, go to step 1005a; otherwise, stop.

Here K1, K2, W1, W2 and L are default constant values. The source also lists further symbols and states inequalities among these constants (one of them beginning "K2 >"), but they are not legible.
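The grow-then-shrink loop of steps 1001 to 1005b can be sketched in Python as follows. Several parts are assumptions made only so that the sketch runs: `layout` is a hypothetical callback that performs step 1002 and reports whether the page border is exceeded together with the current picture scale (or `None` when there are no pictures); the divisors `N1` and `N2` and the shrink amount `shrink_px` stand in for values that are illegible in the source; and all default constants are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    body_px: int        # body font size in pixels
    title_px: int       # title font size in pixels
    char_gap: int = 0   # extra character spacing in pixels
    line_gap: int = 0   # extra line spacing in pixels


def auto_rearrange(s, layout, K1=1, K2=2, W1=1, W2=1, N1=2, N2=3,
                   L=2.0, shrink_px=1, show_pictures=True):
    """Enlarge fonts until a limit is hit, then shrink body text until the page fits."""

    def violates(overflows, pic_scale):
        # Limits: page border exceeded, or picture reduced below 1/L.
        too_small = show_pictures and pic_scale is not None and pic_scale < 1.0 / L
        return overflows or too_small

    count = 1                                   # step 1001
    while True:
        overflows, pic_scale = layout(s)        # steps 1002 and 1003
        if violates(overflows, pic_scale):      # steps 1003a and 1004
            break
        s.body_px += K1                         # step 1004a
        s.title_px += K2
        if count % N1 == 0:                     # step 1004b
            s.char_gap += W1
        if count % N2 == 0:
            s.line_gap += W2
        count += 1                              # step 1004c

    s.title_px -= K2                            # step 1005
    while True:
        s.body_px -= shrink_px                  # step 1005a
        overflows, pic_scale = layout(s)
        if not violates(overflows, pic_scale):  # step 1005b
            return s
```

With a toy callback such as `lambda s: (s.body_px * 20 + s.line_gap * 5 > 400, None)` and a starting body size of 10 pixels, the loop enlarges the text until the toy page overflows and then steps the body font back by one pixel.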
Step 1004 adjusts the other settings, while step 1005 performs the final fine-tuning by reducing the body text.

Figs. 11(a) to 11(e) illustrate a preferred embodiment of the invention. In this embodiment, a full-color document image is decomposed, recombined and rearranged by the main units of the automatic document editing system of the invention, yielding a rearranged result image. Fig. 11(a) is a full-color document image to be edited. It is about 4500K to 5000K bytes in size, at 150 dpi, 1500 pixels high and 1000 pixels wide. Fig. 11(b) is the result after Fig. 11(a) has passed through the document block extraction unit and the block classification unit of the system. Fig. 11(c) is Fig. 11(b) after the block ordering unit of the system has determined the order of the text blocks. Fig. 11(d) is Fig. 11(c) after the line and character cutting unit and the rearrangement unit have rearranged the document according to various styles, giving a horizontally set result image. Fig. 11(e) is Fig. 11(c) after the line and character cutting unit and the rearrangement unit have rearranged the document according to various styles, giving a vertically set result image.

Figs. 12(a) to 12(d) show the user interface of the automatic document editing system of the invention. Some important commands of its menus are summarized below.

Fig. 12(a) lists the commands of the "File (F)" menu. Among them, the "Automatic web publishing" command automatically converts the edited image into a file in HTML format, so that the user can view the edited result in a World Wide Web (WWW) browser.

Fig. 12(b) lists the commands of the "Manuscript (L)" menu. Among them, the "Automatically remove manuscript background" command removes the background noise and color of the image, and the "Color harmonization" command reduces the number of colors in the manuscript so as to reduce the file size.

Fig. 12(c) lists the commands of the "Cut and rearrange (O)" menu. These commands mainly serve to extract document blocks, and include single-block cutting, intelligent single-block cutting, and a further intelligent block cutting mode [its qualifier is illegible in the source].

Fig. 12(d) lists the commands of the "Image and drawing (I)" menu. Among them, the "Block image color reduction" command reduces the number of colors used by a block so as to reduce storage space. The small window on the right of Fig. 12(d), called the gallery area, holds image blocks that have been extracted, and the stored blocks can be managed and searched.

The automatic document editing system and method of the invention can not only decompose, recombine and rearrange color documents; the extracted document blocks may also be regular rectangles or any irregular shape.

The foregoing describes only preferred embodiments of the invention and shall not limit the scope of its practice. All equivalent changes and modifications made in accordance with the claims of the invention shall remain within the scope covered by this patent.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW89101060A TW463099B (en) | 2000-01-24 | 2000-01-24 | Automatic document editing system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
TW463099B (en) | 2001-11-11 |
Family
ID=21658573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW89101060A TW463099B (en) | 2000-01-24 | 2000-01-24 | Automatic document editing system and method |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW463099B (en) |
-
2000
- 2000-01-24 TW TW89101060A patent/TW463099B/en not_active IP Right Cessation
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI409689B (en) * | 2005-10-27 | 2013-09-21 | Ibm | Method, data processing system and computer program product for maximizing window display area using window flowing |
TWI402181B (en) * | 2007-12-27 | 2013-07-21 | Seiko Epson Corp | A recording control means, a recording control method, and a computer-readable recording medium |
US8619321B2 (en) | 2007-12-27 | 2013-12-31 | Seiko Epson Corporation | Recording control device and recording control method |
TWI638563B (en) * | 2017-05-19 | 2018-10-11 | 虹光精密工業股份有限公司 | Image capturing method and image capturing device using the same |
US10484560B2 (en) | 2017-05-19 | 2019-11-19 | Avision Inc. | Image capturing method capable of arranging a plurality of region images and image capturing device using the same |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6903751B2 (en) | System and method for editing electronic images | |
US8855413B2 (en) | Image reflow at word boundaries | |
EP1999688B1 (en) | Converting digital images containing text to token-based files for rendering | |
CN101820489B (en) | Image processing apparatus and image processing method | |
EP0522702B1 (en) | Spot color extraction | |
US7386789B2 (en) | Method for determining logical components of a document | |
US8000529B2 (en) | System and method for creating an editable template from a document image | |
US8379055B2 (en) | Electronic layout generation based on visual context | |
US20110050723A1 (en) | Image processing apparatus and method, and program | |
US20110229035A1 (en) | Image processing apparatus, image processing method, and storage medium | |
CN1241758A (en) | Image processing apparatus and method, and computer-readable memory | |
KR20100033412A (en) | Image processing apparatus, image processing method, and computer program | |
JP2009146064A (en) | Image processor, image processing method, and program and storage medium thereof | |
EP2544099A1 (en) | Method for creating an enrichment file associated with a page of an electronic document | |
US6850228B1 (en) | Universal file format for digital rich ink data | |
JPH11345339A (en) | Method, device and system for picture segmentation, and computer-readable memory | |
TW463099B (en) | Automatic document editing system and method | |
JP5182902B2 (en) | Document image output device | |
JPH0612540B2 (en) | Document creation support device | |
JP5159588B2 (en) | Image processing apparatus, image processing method, and computer program | |
Murguia | Document segmentation using texture variance and low resolution images | |
JP2003331299A (en) | Device, method and program for displaying reduced image, and recording medium recorded with program | |
JP5767549B2 (en) | Image processing apparatus, image processing method, and program | |
JP2006092127A (en) | Image processor, image processing method and program | |
JP5824309B2 (en) | Image processing apparatus, image processing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent | ||
MM4A | Annulment or lapse of patent due to non-payment of fees |