TW463099B - Automatic document editing system and method - Google Patents


Info

Publication number
TW463099B
TW463099B TW89101060A TW89101060A TW463099B TW 463099 B TW463099 B TW 463099B TW 89101060 A TW89101060 A TW 89101060A TW 89101060 A TW89101060 A TW 89101060A TW 463099 B TW463099 B TW 463099B
Authority
TW
Taiwan
Prior art keywords
block
file
text
patent application
blk
Prior art date
Application number
TW89101060A
Other languages
Chinese (zh)
Inventor
Shan-Shih Huang
Wen-Hsiang Tsai
James Ching-Yu Yang
Original Assignee
Formosoft Internat Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Formosoft Internat Inc
Priority to TW89101060A
Application granted
Publication of TW463099B

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

This invention is an automatic document editing system and method. The system includes a document block extraction unit, a block classification determination unit, a block order determination unit, a line and character segmentation unit, and a rearrangement unit. It can separate and rearrange color documents, so that users can edit documents automatically on their own. The method includes an original image binarization step, a document block extraction step, a block classification determination step, a block order determination step, a line and character segmentation step, and a text image rearrangement step. The extracted text blocks may be regular rectangles or any irregular shape.

Description

The present invention relates to the editing of document blocks, and in particular to a system and method for automatic document editing by means of image processing.

Background of the Invention

Modern printing technology is highly advanced, and document layouts have become increasingly varied. Today's print media, such as newspapers and magazines, decorate their pages heavily with text and pictures to attract readers' attention and make information more readable. If the structure of the articles in a color document can be understood quickly and correctly, the reading order can be provided to the reader, the articles can be rearranged in a different format, or the text and graphics can be processed separately: the text portions are extracted and then recognized by character recognition for various applications.

Document analysis and understanding is mainly concerned with using computers to automatically process the wide variety of documents that people use. Although there has been considerable research on automatic document processing and analysis, comparatively little of it addresses color documents.

An important task in document analysis is to segment a color document image into different regions. There are generally two approaches: top-down and bottom-up. The run length smoothing algorithm disclosed in F. M. Wahl, K. Y. Wong, and R. G. Casey, "Block Segmentation and Text Extraction in Mixed Text/Image Documents," Computer Graphics and Image Processing, vol. 20, pp. 375-390, 1982, and the projection profile cut algorithm disclosed in D. Wang and S. N. Srihari, "Classification of newspaper image blocks using texture analysis," Computer Vision, Graphics, and Image Processing, vol. 47, pp. 327-352, 1989, are top-down methods.

A bottom-up method is disclosed in L. A. Fletcher and R. Kasturi, "A robust algorithm for text string separation from mixed text/graphics images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918, 1988, which links the pixels of an image into many connected components and then merges them into larger blocks.

Document block classification is addressed in H. J. Lee and C. W. Chien, "Segmentation of documents with text/graphic/image," Proc. of ICCPCOL, pp. 188-194, 1991. For a document image, the smearing method is first used to merge pixels into block regions, and the boundaries of the block regions are found. Each small block is then coarsely classified (coarse classification); before this processing, the skew angle of the document is detected and corrected. The coarse classification uses each block's aspect ratio, its size, and several limit values defined by the authors to divide blocks into basic text, title, graphics and image, and line and noise. In the subsequent fine classification, graphics and images are distinguished by computation with eight masks defined by the authors, and overlapping blocks of the same class are merged. Next, title blocks that were judged to be text regions are separated from the text regions using their relative positions. Finally, the text region blocks are merged into text lines, and a method that combines merging and splitting is disclosed for segmenting the characters within each line.

In addition, K. C. Fan and L. S. Wang, "Document Segmentation and Classification," Proc. of 1997 IPPR Conf. on CVGIP, Taichung, Taiwan, 1997, discloses a document segmentation and classification method that first detects and removes the noise that appears at the borders during scanning, then finds the basic components according to criteria defined by the authors, and then classifies these components. The components are first divided into text and non-text. For the text part, the proposed algorithm merges components into text strings, merges the strings into text paragraphs, and finally merges paragraphs with the same orientation. For the non-text part, masks are used to find straight lines, and the number of lines is used to tell fields from tables; the number of black points near each point in a block is then examined to distinguish images from graphics. Finally, each block is represented by a polygon, and overlapping blocks of the same class are merged while blocks of different classes are split.

L. F. Lee and W. H. Tsai, "Understanding of Arrangements and Extraction of Articles in Chinese Newspaper Images," Proc. Int. Conf. Computer Vision, Graphics and Image Processing, Nantou, Taiwan, ROC, pp. 479-487, 1995, targets Chinese newspapers. It discloses methods for layout understanding and article extraction, and it segments body text and headline text differently in order to handle different fonts appearing in the same block.

S. C. Lin and W. H. Tsai, "Segmentation and Understanding of Color Magazine Images," Proc. International Computer Symposium, Kaohsiung, Taiwan, Republic of China, December 1996, pp. 205-212, proposes a method for automatically segmenting blocks in Chinese magazines, and uses relative positions and certain block characteristics to find article titles, abstracts, sections, headers, page numbers, and so on.

Summary of the Invention

The present invention decomposes color documents and recombines and rearranges them. One of its objects is to provide an automatic document editing system with which users can edit documents automatically according to their own intentions. The invention is an intelligent document editing system.

The automatic document editing system of the present invention mainly comprises a document block extraction unit, a block classification determination unit, a block order determination unit, a line and character segmentation unit, and a rearrangement unit.

The system can be used with a central processing unit, a memory unit, and an operation interface to carry out the automatic document editing functions among the units and to provide the required data storage space. The operation interface may be a special toolbar panel that makes it convenient for users to view and select functions and to operate the automatic editing functions easily.

Another object of the present invention is to provide an automatic document editing method. The method mainly comprises an original image binarization step, a document block extraction step, a block classification determination step, a block order determination step, a line and character segmentation step, and a document image rearrangement step.

The original image binarization step of the present invention is based on the moment-preserving principle. Moment-preserving thresholding is applied separately to the red, green, and blue components to obtain an appropriate threshold for each of the three primary color components, and the best binary image is then obtained according to a conversion relation for use in block extraction.

The present invention not only decomposes and recombines color documents; the extracted document blocks may be regular rectangles or any irregular shape.

In an embodiment of the present invention, the functions of the main units of the automatic document editing system are used to decompose, recombine, and rearrange a document, yielding a rearranged result image.

The above and other objects and advantages of the present invention are described in detail below with reference to the following drawings, the detailed description of the embodiments, and the claims.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the architecture of the automatic document editing system of the present invention and the functions of its units.
Fig. 2 is a block diagram of the automatic document editing system according to the present invention used with a central processing unit, a memory unit, and an operation interface to carry out the automatic document editing functions among the units.

Fig. 3 is a flowchart of the operation of the automatic document editing method of the present invention.
Fig. 4 illustrates the flow of the original document image binarization step in Fig. 3.
Fig. 5 illustrates the three kinds of cutting used for document block extraction in Fig. 3.
Figs. 6(a) and 6(b) show the results of dividing a document block into two blocks in the vertical direction and in the horizontal direction, respectively.
Fig. 6(c) illustrates shrinking a rectangular block into a minimal irregular block using the degree of approximation of the image.
Fig. 7 illustrates the flow of the block classification determination step in Fig. 3.
Fig. 8 illustrates the flow of the order determination step for vertical document blocks in Fig. 3.
Fig. 9 illustrates the flow of the line and character segmentation step in Fig. 3.
Fig. 10 illustrates the flow of the automatic document image rearrangement step in Fig. 3.
Figs. 11(a) to 11(e) illustrate a preferred embodiment of the present invention, in which:
Fig. 11(a) is a full-color document image to be edited.
Fig. 11(b) is a result of processing Fig. 11(a) with the document block extraction unit and the block classification determination unit of the automatic document editing system.
Fig. 11(c) is a result of further processing Fig. 11(b) with the block order determination unit.
Fig. 11(d) is a horizontally typeset result image obtained by further processing Fig. 11(c) with the line and character segmentation unit and rearranging the document according to various styles with the rearrangement unit.
Fig. 11(e) is a vertically typeset result image obtained from Fig. 11(c) with the line and character segmentation unit and the rearrangement unit in the same way.
Figs. 12(a) to 12(d) show the interface of the automatic document editing system, in which:
Fig. 12(a) lists the commands of the "File (F)" menu.
Fig. 12(b) lists the commands of the "Manuscript (L)" menu.
Fig. 12(c) lists the commands of the "Edit and Rearrange (O)" menu.
Fig. 12(d) lists the commands of the "Image and Drawing (I)" menu.

Reference Numerals

100 document block editing system
101 document block extraction unit
102 block classification determination unit
103 block order determination unit
104 line and character segmentation unit
105 rearrangement unit
106 original document image
107 rearranged document image
201 central processing unit
202 memory unit
203 operation interface
204 toolbar panel
205 input unit
206 output unit
207 storage unit
301 original image binarization step
302 document block extraction step
303 block classification determination step
304 block order determination step
305 line and character segmentation step
306 document image rearrangement step
401 obtain appropriate thresholds for the three primary color components
402 convert the original document image into a binary image according to a conversion formula
701 document block preprocessing step
702 document block classification determination step
901 line segmentation step
902 character segmentation step for body text blocks
903 character segmentation step for headline blocks

Detailed Description of the Invention

Fig. 1 is a schematic diagram of the architecture of the automatic document editing system of the present invention and the functions of its units. As shown in Fig. 1, the automatic document editing system 100 comprises a document block extraction unit 101, a block classification determination unit 102, a block order determination unit 103, a line and character segmentation unit 104, and a rearrangement unit 105.

Referring to Fig. 1, the document block extraction unit 101 first extracts each block in the original document image 106, separating the text blocks from the picture blocks. The block classification determination unit 102 then determines whether each block is a picture or text. Next, the block order determination unit 103 determines the order of the text blocks. The line and character segmentation unit 104 then segments the text down to individual characters. Finally, the rearrangement unit 105 rearranges the document according to various styles to obtain the rearranged result image 107.

Fig. 2 is a block diagram of the automatic document editing system 100 according to the present invention used with a central processing unit 201, a memory unit 202, and an operation interface 203 to carry out the automatic document editing functions among the units. The operation interface 203 further provides a toolbar panel 204.
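The data flow among units 101 to 105 described for Fig. 1 can be sketched as a simple pipeline. This is only an illustrative skeleton: the class name `Pipeline`, the field names, and the `"picture"` label are hypothetical, and each unit is passed in as a plain callable because the source does not specify any programming interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of units 101-105; every field is a stand-in callable."""
    extract_blocks: Callable[[Any], List[Any]]         # unit 101
    classify_block: Callable[[Any], str]               # unit 102
    order_blocks: Callable[[List[Any]], List[Any]]     # unit 103
    split_text: Callable[[Any], List[Any]]             # unit 104
    rearrange: Callable[[List[Any], List[Any]], Any]   # unit 105

    def run(self, image: Any) -> Any:
        # 101: extract every block from the original document image (106).
        blocks = self.extract_blocks(image)
        # 102: decide for each block whether it is a picture or text.
        labelled = [(b, self.classify_block(b)) for b in blocks]
        text_blocks = [b for b, kind in labelled if kind != "picture"]
        pictures = [b for b, kind in labelled if kind == "picture"]
        # 103: determine the reading order of the text blocks.
        ordered = self.order_blocks(text_blocks)
        # 104: segment each text block down to individual characters.
        chars = [c for b in ordered for c in self.split_text(b)]
        # 105: lay everything out again to produce the result image (107).
        return self.rearrange(chars, pictures)


# Toy usage with strings standing in for images and blocks.
demo = Pipeline(
    extract_blocks=lambda img: img.split("|"),
    classify_block=lambda b: "picture" if b.startswith("#") else "text",
    order_blocks=sorted,
    split_text=list,
    rearrange=lambda chars, pics: ("".join(chars), pics),
)
result = demo.run("b|#img|a")
```

Passing the units in as callables keeps the sketch independent of how each unit is actually realized; the later sections describe the individual steps.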

The central processing unit 201 executes the functions of the units of the automatic document editing system 100. The memory unit 202 serves as the required data storage space. Through the functions displayed on the toolbar panel 204 of the operation interface 203, users can view and select functions and easily operate the automatic editing functions for documents. The functions of the operation interface 203 are presented through a graphical user interface (GUI) that includes windows, dialog boxes, a main menu, and toolbars, together with hot keys and macros.

According to the present invention, the document to be edited can be input into the automatic document editing system through an input unit 205, for example by using a scanner to input document image data.

The results of the document editing functions are output to an output unit 206. Output can take several forms: display on a display unit, printing on an output device, or output as electronic document information to a server.

According to the present invention, document data that has been organized or is to be reused can be saved by a storage unit 207 in the memory unit 202, so that users can view or search the edited document data through the preview function on the operation interface 203. The storage unit can save data by storing summary information related to the document data, by storing it as a gallery, and so on.

Fig. 3 is a flowchart of the operation of the automatic document editing method of the present invention. The method comprises, in order, an original image binarization step 301, a document block extraction step 302, a block classification determination step 303, a block order determination step 304, a line and character segmentation step 305, and a document image rearrangement step 306. Each step is described below.

Fig. 4 illustrates the flow of the original document image binarization step in Fig. 3. As stated above, the binarization step applies moment-preserving thresholding to the red, green, and blue primary color components separately. In other words, an appropriate threshold is obtained for each primary color component of the original image, and the gray levels of the original image are divided into one group above the threshold and another group below it.

Referring to Fig. 4, in step 401 the appropriate thresholds $R_t$, $G_t$, and $B_t$ of the three primary color components are obtained for a document image according to the moment-preserving principle. Then, in step 402, a conversion formula uses the thresholds $R_t$, $G_t$, and $B_t$ to set each pixel of the original document image to one of the two binary values, black or white.

The conversion formula used in the embodiment of the present invention is as follows. If

\[
\left(R_p < \frac{R_t + R_{z1}}{2} \ \text{and}\ G_p < \frac{G_t + G_{z1}}{2}\right)
\ \text{or}\
\left(R_p < \frac{R_t + R_{z1}}{2} \ \text{and}\ B_p < \frac{B_t + B_{z1}}{2}\right)
\ \text{or}\
\left(G_p < \frac{G_t + G_{z1}}{2} \ \text{and}\ B_p < \frac{B_t + B_{z1}}{2}\right),
\]

then the color of pixel $P$ is set to black; otherwise, the color of pixel $P$ is set to white. Here:

$R_p$ is the red component of pixel $P$, and $R_t$ is the threshold of the red component;
$G_p$ is the green component of pixel $P$, and $G_t$ is the threshold of the green component;
$B_p$ is the blue component of pixel $P$, and $B_t$ is the threshold of the blue component;
$R_{z1}$ is the offset of the red component in moment-preserving binarization;
$G_{z1}$ is the offset of the green component in moment-preserving binarization;
$B_{z1}$ is the offset of the blue component in moment-preserving binarization.

Fig. 5 illustrates the three kinds of cutting used for document block extraction in Fig. 3. The document block extraction of the present invention may include single-block cutting, single-block intelligent cutting, and multi-block intelligent cutting. Each cutting method is described below.

First, the representation of a document block. According to the present invention, a document block is recorded by the two two-dimensional coordinates of its lower-left and upper-right corners, (left, down) and (right, up).

Single-block cutting directly uses the outer frame drawn by the user as the extracted block. The lower-left corner of the framed range is therefore (left, down), and its upper-right corner is (right, up).
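Returning to step 402, the conversion rule amounts to a two-out-of-three vote: a pixel becomes black when at least two of its color components fall below the midpoint of the corresponding threshold and offset. The following is a minimal sketch of that rule only. It assumes the thresholds (step 401) and the offsets have already been computed and are passed in; the function name `binarize` and the 1 = black, 0 = white encoding are illustrative choices, not taken from the source.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) components of one pixel


def binarize(
    image: List[List[Pixel]],
    thresholds: Tuple[float, float, float],  # (Rt, Gt, Bt), assumed precomputed
    offsets: Tuple[float, float, float],     # (Rz1, Gz1, Bz1), assumed precomputed
) -> List[List[int]]:
    """Apply the step-402 rule; returns 1 for black and 0 for white."""
    rt, gt, bt = thresholds
    rz, gz, bz = offsets
    # Per-channel limits (T + z1) / 2 from the conversion formula.
    r_lim, g_lim, b_lim = (rt + rz) / 2, (gt + gz) / 2, (bt + bz) / 2

    result = []
    for row in image:
        out_row = []
        for r, g, b in row:
            r_dark, g_dark, b_dark = r < r_lim, g < g_lim, b < b_lim
            # Black when any two of the three channels are dark.
            is_black = (r_dark and g_dark) or (r_dark and b_dark) or (g_dark and b_dark)
            out_row.append(1 if is_black else 0)
        result.append(out_row)
    return result


# Toy 2x2 image; with thresholds 128 and offsets 64 every limit is 96.
sample = [[(10, 10, 200), (200, 200, 10)],
          [(255, 255, 255), (0, 0, 0)]]
binary = binarize(sample, (128, 128, 128), (64, 64, 64))
```

In the toy image, the first pixel is dark in red and green and so becomes black, while the second is dark only in blue and stays white.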

Single-block intelligent cutting comprises two steps: (a) for a document image, the approximate range of the document block is first framed; (b) horizontal projection and vertical projection are used to find the actual minimal bounding frame of the document block, that is, its lower-left corner (L, D) and its upper-right corner (R, U).

With horizontal projection, scanning from top to bottom, the vertical coordinate of the first position that has a black projection is U; scanning from bottom to top, the vertical coordinate of the first position that has a black projection is D.

With vertical projection, scanning from left to right, the horizontal coordinate of the first position that has a black projection is L; scanning from right to left, the horizontal coordinate of the first position that has a black projection is R. The rectangular frame can then be shrunk further, using the degree of approximation of the image, to obtain the minimal bounding frame of irregular shape.

Because the document blocks of an image are separated by relatively large gaps, these gaps produce relatively long white runs in the horizontal or vertical projection. Multi-block intelligent cutting uses these white runs to split out the document blocks. A cut is either vertical or horizontal. This kind of document block cutting comprises the following steps:

(a) For a document image, frame its approximate range.
(b) Decide the direction of each cut from the length of the longest white run in the vertical projection and the length of the longest white run in the horizontal projection of the current block.
(c) If the longest white run of the vertical projection is longer than that of the horizontal projection, cut the block vertically. Conversely, if the longest white run of the horizontal projection is longer than that of the vertical projection, cut the block horizontally.
(d) Repeat steps (b) and (c) on the resulting blocks to cut them into smaller blocks.
(e) Once the smallest rectangular blocks have been cut out, shrink each rectangular block into a minimal irregular block in the same way as in single-block intelligent cutting.

Figs. 6(a) and 6(b) show the results of dividing a document block into two blocks in the vertical direction and in the horizontal direction, respectively. The hatched areas indicate the white runs. S1 and S2 are the start points of the two white runs, and E1 and E2 are their end points. As Fig. 6(a) shows, the document block (L, D), (R, U) is divided vertically into two blocks: (L, D), (S1, U) and (E1, D), (R, U). Similarly, as Fig. 6(b) shows, the document block (L, D), (R, U) is divided horizontally into two blocks: (L, E2), (R, U) and (L, D), (R, S2). Fig. 6(c) shows a rectangular block being shrunk into a minimal irregular block using the degree of approximation of the image.
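Steps (a) to (d) above, together with the projection-based bounding frame, can be sketched as a recursive cut on a binary image. This is a rough sketch under several assumptions that are not in the source: pixels are 1 for black and 0 for white, boxes use image coordinates `(top, left, bottom, right)` with exclusive bottom and right (the patent instead records lower-left and upper-right corners), recursion stops when no white run reaches a hypothetical `min_gap`, and ties between the two directions are cut horizontally. Step (e), the shrinking to an irregular shape, is left out.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # 1 = black pixel, 0 = white pixel
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def row_profile(img: Image, box: Box) -> List[int]:
    """Horizontal projection: number of black pixels in each row of the box."""
    top, left, bottom, right = box
    return [sum(img[y][left:right]) for y in range(top, bottom)]


def col_profile(img: Image, box: Box) -> List[int]:
    """Vertical projection: number of black pixels in each column of the box."""
    top, left, bottom, right = box
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(left, right)]


def tight_box(img: Image, box: Box) -> Optional[Box]:
    """Shrink the box to the first and last rows/columns that contain black."""
    top, left, _, _ = box
    rows = [i for i, c in enumerate(row_profile(img, box)) if c]
    cols = [i for i, c in enumerate(col_profile(img, box)) if c]
    if not rows:
        return None  # the box contains no black pixel at all
    return (top + rows[0], left + cols[0], top + rows[-1] + 1, left + cols[-1] + 1)


def longest_white_run(profile: List[int]) -> Tuple[int, int]:
    """Return (start, length) of the longest zero run that is closed by black."""
    best_start, best_len, start = 0, 0, None
    for i, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def xy_cut(img: Image, box: Box, min_gap: int = 2) -> List[Box]:
    """Recursively split the box along its longest white run."""
    tight = tight_box(img, box)
    if tight is None:
        return []
    top, left, bottom, right = tight
    v_start, v_len = longest_white_run(col_profile(img, tight))
    h_start, h_len = longest_white_run(row_profile(img, tight))
    if max(v_len, h_len) < min_gap:
        return [tight]  # no gap wide enough: treat as one block
    if v_len > h_len:   # vertical cut gives a left and a right block
        first = (top, left, bottom, left + v_start)
        second = (top, left + v_start + v_len, bottom, right)
    else:               # horizontal cut gives an upper and a lower block
        first = (top, left, top + h_start, right)
        second = (top + h_start + h_len, left, bottom, right)
    return xy_cut(img, first, min_gap) + xy_cut(img, second, min_gap)


# Two 2x2 black squares separated by a three-column white gutter.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
blocks = xy_cut(page, (0, 0, 4, 9))
```

On the toy page the only white run lies in the vertical projection, so the page is cut vertically into the two squares.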

After the document blocks have been extracted from the document image, the invention next determines the orientation and category of each block. As to orientation, a vertically oriented block is one whose text is read from top to bottom, and a horizontally oriented block is one whose text is read from right to left or from left to right. The preliminary classification of document blocks is as follows:

(a) Picture block: images, pictures and picture captions in general.
(b) Vertical headline block: a headline block whose text is arranged vertically.
(c) Horizontal headline block: a headline block whose text is arranged horizontally.
(d) Vertical content block: a body-text block whose text is arranged vertically.
(e) Horizontal content block: a body-text block whose text is arranged horizontally.

The invention again uses projections in the vertical and horizontal directions to classify the blocks. Several parameters are defined for this purpose, and the two corner coordinates of a block, (Left, Down) and (Right, Up), are also used. The parameters are defined as follows:

Vert_Avg_White: average length of the white segments in the vertical projection;
Hori_Avg_White: average length of the white segments in the horizontal projection;
Vert_Avg_Black: average length of the black segments in the vertical projection;
Hori_Avg_Black: average length of the black segments in the horizontal projection;
Black_Ratio: proportion of black points within the block after binarization.

With these parameters defined, Figure 7 illustrates the flow of the block classification step of Figure 3. Referring to Figure 7, the flow is as follows.

First, step 701 is performed. Step 701 preprocesses the block and consists of five sub-steps in order:
(a) Use the block extraction algorithm of Figure 5 to find the two corner points (Left, Down) and (Right, Up) of the block.
(b) Remove the isolated black points in the block.
(c) Project the block vertically to obtain Vert_Avg_White and Vert_Avg_Black.
(d) Project the block horizontally to obtain Hori_Avg_White and Hori_Avg_Black.
(e) Compute Black_Ratio.

Next, step 702 classifies the block. The cases are:
(a) If Black_Ratio > C1, or if Vert_Avg_White = 0 and Hori_Avg_White = 0, the block is set as a picture block.
(b) If Hori_Avg_White > Vert_Avg_White, the block is set as a horizontal content block; otherwise it is set as a vertical content block.
(c) If (Right − Left) / Vert_Avg_Black ≤ C2 and (Top − Down) / (Right − Left) ≥ C3, the block is set as a vertical headline block.
(d) If (Top − Down) / Hori_Avg_Black ≤ C2 and (Right − Left) / (Top − Down) ≥ C3, the block is set as a horizontal headline block.

Here Top denotes the Up coordinate of the block, and
C1 is a predetermined constant between 0.5 and 1;
C2 is a predetermined constant below 10;
C3 is a predetermined constant greater than 1.

The block classification step is followed by the block ordering step. When determining the order of the blocks, the invention handles vertical documents and horizontal documents separately. First, the neighbouring blocks in the four directions are defined as four parameters, as follows:

Up_blk: the nearest and leftmost block above a block;
Down_blk: the nearest block below a block (the rest of this definition is illegible in the source);
Right_blk: the nearest and lowest block to the right of a block;
Left_blk: the nearest and topmost block to the left of a block;
Now_blk: the block currently being visited.

When a block has no block in a given direction, its neighbour in that direction is set to the null block (Null Block).

With these parameters defined, Figure 8 illustrates the flow of the ordering step for vertical document blocks in Figure 3. Referring to Figure 8, the steps are, in order:

Step 801: For each block, find its neighbours in the four directions: Up_blk, Down_blk, Left_blk and Right_blk.
Step 802: Find the block with the first sequence number, set it as Now_blk, and set the sequence number to one.
Step 803: If Now_blk has a Left_blk and that Left_blk has not yet been ordered, perform the following three sub-steps in order; otherwise go to step 804.
  (803a) Set the Right_blk of this Left_blk to Now_blk.
  (803b) Set Now_blk to the Left_blk of Now_blk.
  (803c) Increment the sequence number by one and return to step 803.
Step 804: If Now_blk has a Right_blk, perform the following two sub-steps in order; otherwise go to step 805.
  (804a) Set the Left_blk of this Right_blk to Now_blk.
  (804b) Set Now_blk to the Right_blk of Now_blk and go to step 804.
Step 805: If the Down_blk of Now_blk is the Null Block, perform the following two sub-steps in order; otherwise set the Up_blk of this Down_blk to Now_blk, set Now_blk to the Down_blk of Now_blk, increment the sequence number by one, and return to step 803.
  (805a) Set Now_blk to the Left_blk of Now_blk.
  (805b) If Now_blk is the Null Block, stop; otherwise return to step 805.

The ordering flow for horizontal document blocks is similar to the flow for vertical document blocks. Roughly, after the block with the first sequence number has been found, the search proceeds downward until it reaches the bottom edge of the document image, backtracks to the top, moves one level to the right, and then searches downward again, stopping only when every block has been visited. The constraint on assigning sequence numbers is the same as in the vertical case.

The invention processes picture blocks and non-picture blocks separately. For non-picture blocks, the lines and the characters within each block are cut out; the lines are cut out first, and only then are the characters of each line cut. Non-picture blocks comprise the horizontal and vertical headline blocks and content blocks. Figure 9 shows that the line and character cutting step of Figure 3 comprises a line cutting step 901 for non-picture blocks, a character cutting step 902 for content blocks, and a character cutting step 903 for headline blocks.
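The classification rules of steps 701 and 702 described earlier can be sketched in code. The following is a minimal illustration only, not the patent's implementation. It assumes that the "vertical projection" counts black pixels per column and the "horizontal projection" counts black pixels per row, that the headline tests (c) and (d) are checked before the content test (b), and that Top − Down and Right − Left are simply the block's height and width. The function names and the default values chosen for C1, C2 and C3 are hypothetical; the source only gives the permitted ranges.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixels after binarization; 1 = black, 0 = white


def run_lengths(profile: List[int], black: bool) -> List[int]:
    """Lengths of consecutive black (non-zero) or white (zero) runs in a profile."""
    runs, n = [], 0
    for value in profile:
        if (value > 0) == black:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    if n:
        runs.append(n)
    return runs


def average(runs: List[int]) -> float:
    return sum(runs) / len(runs) if runs else 0.0


def block_parameters(img: Image) -> Dict[str, float]:
    """Step 701 (c)-(e): projection statistics of one block (assumed definitions)."""
    height, width = len(img), len(img[0])
    vert = [sum(row[x] for row in img) for x in range(width)]  # one bin per column
    hori = [sum(row) for row in img]                           # one bin per row
    return {
        "Vert_Avg_White": average(run_lengths(vert, False)),
        "Vert_Avg_Black": average(run_lengths(vert, True)),
        "Hori_Avg_White": average(run_lengths(hori, False)),
        "Hori_Avg_Black": average(run_lengths(hori, True)),
        "Black_Ratio": sum(hori) / (width * height),
    }


def classify(img: Image, c1: float = 0.7, c2: float = 5.0, c3: float = 2.0) -> str:
    """Step 702; c1, c2, c3 are illustrative values inside the stated ranges."""
    p = block_parameters(img)
    height, width = len(img), len(img[0])  # Top - Down and Right - Left
    # Rule (a): picture block.
    if p["Black_Ratio"] > c1 or (p["Vert_Avg_White"] == 0 and p["Hori_Avg_White"] == 0):
        return "picture"
    # Rule (c): vertical headline (checked before rule (b) by assumption).
    if p["Vert_Avg_Black"] > 0 and width / p["Vert_Avg_Black"] <= c2 and height / width >= c3:
        return "vertical headline"
    # Rule (d): horizontal headline.
    if p["Hori_Avg_Black"] > 0 and height / p["Hori_Avg_Black"] <= c2 and width / height >= c3:
        return "horizontal headline"
    # Rule (b): remaining blocks are content blocks.
    if p["Hori_Avg_White"] > p["Vert_Avg_White"]:
        return "horizontal content"
    return "vertical content"
```

Checking the headline rules before the content rule is a design choice made here because rule (b), read literally, assigns every non-picture block to a content class; the source does not state the order in which the cases are evaluated.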

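The vertical-document ordering of steps 802 to 805 described earlier can be sketched as a small traversal over linked blocks. This is a hedged illustration rather than the patent's implementation: the `Block` class and the function name are hypothetical, the first block of step 802 is assumed to be supplied by the caller, the Null Block is represented by `None`, and a Down_blk that already has a sequence number is treated like a Null Block, which is an added guard (not in the source) so that the loop always terminates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Block:
    """Hypothetical block record; None plays the role of the Null Block."""
    name: str
    up: Optional["Block"] = None
    down: Optional["Block"] = None
    left: Optional["Block"] = None
    right: Optional["Block"] = None
    order: Optional[int] = None  # sequence number, None while unordered


def order_vertical_blocks(first: Block) -> List[Block]:
    """Assign sequence numbers following steps 802-805; returns blocks in order."""
    now = first                      # step 802
    seq = 1
    now.order = seq
    ordered = [now]
    while True:
        # Step 803: walk left while the left neighbour is still unordered.
        while now.left is not None and now.left.order is None:
            now.left.right = now     # 803a
            now = now.left           # 803b
            seq += 1                 # 803c
            now.order = seq
            ordered.append(now)
        # Step 804: walk back to the right without numbering.
        while now.right is not None:
            now.right.left = now     # 804a
            now = now.right          # 804b
        # Step 805: look for a block below, moving left when there is none.
        # Assumption: an already ordered Down_blk counts as missing.
        while now is not None and (now.down is None or now.down.order is not None):
            now = now.left           # 805a
        if now is None:              # 805b
            return ordered
        now.down.up = now
        now = now.down
        seq += 1
        now.order = seq
        ordered.append(now)
```

For a two-by-two layout in which A is top right, B top left, C bottom right and D bottom left, starting from A numbers the blocks A, B, C, D, that is, each row from right to left and the rows from top to bottom.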


In the line cutting step 901, if the block being processed is a horizontal document block, the step proceeds in the following order: (a) use the horizontal projection to find the width of each line; (b) over the width of each line, take a local projection in the vertical direction; (c) approach the line from its left and right ends to obtain the actual length of each line in the horizontal block. If the block being processed is a vertical document block, the method is essentially the same, and the step proceeds in the following order: (a) use the vertical projection to find the width of each line; (b) over the width of each line, take a local projection in the horizontal direction; (c) approach the line from its top and bottom ends to obtain the actual height of each line in the vertical block.

The character cutting step 902 for content blocks is divided into the segmentation of vertical body text and the segmentation of horizontal body text. For vertical body text, the invention cuts the characters using the projection of the whole block together with the partial projection of each line. The segmentation of horizontal body text is similar; only the directions need attention, in that both the overall projection and the partial projection are replaced by the vertical projection.

The character cutting step 903 for headline blocks is divided into the segmentation of vertical headline text and the segmentation of horizontal headline text. For vertical headline text, the invention does not use the overall projection. Instead it uses a property of the partial projection, namely the line width, and it defines the characteristics of two kinds of quotation marks so that quotation marks can be distinguished from ordinary characters. The headline characters are then cut using the line width of each cell. The segmentation of horizontal headline text is similar; only the directions need attention, in that the partial projection is replaced by the vertical projection.
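The line cutting of step 901 for a horizontal block can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's code: the horizontal projection is taken to be the count of black pixels per row, the local vertical projection is taken per column within one line band, and the function name and the returned tuple layout are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels after binarization; 1 = black, 0 = white


def cut_horizontal_lines(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) for each text line, all inclusive."""
    height, width = len(img), len(img[0])
    row_profile = [sum(row) for row in img]          # (a) horizontal projection
    lines = []
    y = 0
    while y < height:
        if row_profile[y] == 0:
            y += 1
            continue
        top = y
        while y < height and row_profile[y] > 0:     # extent of this line band
            y += 1
        bottom = y - 1
        band = img[top:bottom + 1]
        # (b) local projection in the vertical direction, restricted to the band
        col_profile = [sum(row[x] for row in band) for x in range(width)]
        # (c) close in from the left and right ends; the band contains at
        # least one black pixel, so both loops stop inside the image
        left = 0
        while col_profile[left] == 0:
            left += 1
        right = width - 1
        while col_profile[right] == 0:
            right -= 1
        lines.append((top, bottom, left, right))
    return lines
```

A vertical block would be handled the same way with rows and columns exchanged, for example by transposing the image before the call.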

Once all the characters in the document blocks have been segmented and the order of the blocks has been determined, the invention can rearrange the document. The invention provides two rearrangement methods: manual rearrangement and automatic rearrangement. For manual rearrangement, the system provides a spacing and layout settings dialog box in which the user can set the character and line spacing, the font size, the margins and so on. This both improves the appearance of the rearranged layout and gives users the opportunity to adjust the typesetting to their own taste.

Automatic rearrangement spares the user from setting the spacing and layout. After rearrangement, the selected blocks fill the chosen page as far as possible, so that the page does not contain too much blank space and looks better.

In automatic rearrangement, the invention first lays out the characters of the blocks one by one in the horizontal-document or vertical-document direction described above. If neither of two predefined limits has been reached, the font is enlarged. The first limit is that the layout must not exceed the document border. The second is that, if picture blocks are to be displayed, a picture block may be shrunk to at most 1/L of its original size, where L is a number greater than 1; anything smaller also counts as exceeding the predefined limits. Figure 10 illustrates the flow of the automatic rearrangement of the document image in Figure 3. While the font size is being changed, other settings also have to be adjusted in order to produce a pleasing format, and the invention uses a parameter, count, to adjust them. Referring to Figure 10, the steps of automatic document rearrangement are, in order:

Step 1001: Set count to 1.
Step 1002: Lay out all text blocks in the order found, according to the character and line spacing settings.
Step 1003: If picture blocks are to be displayed and there are picture blocks, lay out the picture blocks and go to step 1003a; otherwise go to step 1004.
  Step 1003a: If a picture is smaller than 1/L of the original picture, go to step 1005.
Step 1004: If the document border limit has not been exceeded, perform the following steps 1004a to 1004c; otherwise go to step 1005.
  Step 1004a: Increase the body-text font size by K1 pixels and the headline font size by K2 pixels.
  Step 1004b: If count is exactly divisible by a first constant (its symbol is illegible in the source), increase a spacing setting (garbled in the source, presumably the character spacing) by W1 pixels; if count is exactly divisible by a second constant (its symbol is also illegible), increase the line spacing by W2 pixels.
  Step 1004c: Set count = count + 1 and return to step 1002.
Step 1005: Decrease the headline font size by K2 pixels, then perform the following steps 1005a to 1005b.
  Step 1005a: Decrease the body-text font size by a constant number of pixels (the symbol is illegible in the source) and lay out all the blocks again.
  Step 1005b: If the document border limit is exceeded, or a picture is smaller than 1/L, go to step 1005a; otherwise stop.

Here K1, K2, W1, W2 and L are default constants, with K2 > K1; a further inequality involving the W constants is illegible in the source.
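The grow-then-shrink loop of steps 1001 to 1005 can be sketched as below. This is an illustration under assumptions, not the patent's implementation. The layout engine is abstracted into a caller-supplied predicate `exceeds_limits`, which is assumed to return True when the layout crosses the document border or a picture falls below 1/L of its original size. The names `LayoutSettings`, `auto_rearrange`, `n1`, `n2` and `k3` are hypothetical: `n1` and `n2` stand in for the two divisors and `k3` for the shrink amount whose symbols are illegible in the source, and the first spacing setting is assumed to be the character spacing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LayoutSettings:
    """Hypothetical container for the settings adjusted in Figure 10."""
    body_font: int
    headline_font: int
    char_spacing: int = 0
    line_spacing: int = 0


def auto_rearrange(
    s: LayoutSettings,
    exceeds_limits: Callable[[LayoutSettings], bool],
    k1: int = 1, k2: int = 2, k3: int = 1,   # font steps in pixels (k3 assumed)
    n1: int = 2, n2: int = 3,                # divisors for count (assumed names)
    w1: int = 1, w2: int = 1,                # spacing steps in pixels
) -> LayoutSettings:
    count = 1                                   # step 1001
    # Steps 1002-1004: lay out, and while both limits hold, enlarge.
    while not exceeds_limits(s):
        s.body_font += k1                       # 1004a
        s.headline_font += k2
        if count % n1 == 0:                     # 1004b
            s.char_spacing += w1
        if count % n2 == 0:
            s.line_spacing += w2
        count += 1                              # 1004c
    s.headline_font -= k2                       # step 1005
    while True:
        s.body_font -= k3                       # 1005a: shrink body text, re-lay out
        if not exceeds_limits(s):               # 1005b
            return s
        if s.body_font <= 1:                    # added guard, not in the source
            raise ValueError("layout cannot satisfy the limits")
```

In this sketch the enlarging loop overshoots by one round on purpose, mirroring the source, where step 1005 then undoes the last headline enlargement and fine-tunes with the body-text size only.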

Step 1004 adjusts the other settings, while step 1005 performs the final fine-tuning by shrinking the body text.

Figures 11(a) to 11(e) illustrate a preferred embodiment of the invention. In this embodiment, a full-colour document image is decomposed and rearranged by the main units of the automatic document editing system of the invention, yielding a rearranged result image. Figure 11(a) is a full-colour document image to be edited. Its size is about 4500K to 5000K bytes at 150 dpi, 1500 pixels high and 1000 pixels wide. Figure 11(b) is a result obtained after processing by the document block extraction unit and the block classification unit of the system. Figure 11(c) shows Figure 11(b) after the block order judging unit has determined the order of the text blocks. Figure 11(d) shows Figure 11(c) after the line and character cutting unit and the rearrangement unit have rearranged the document according to the chosen style, giving a horizontally typeset result image. Figure 11(e) shows Figure 11(c) after the same units have rearranged the document according to the chosen style, giving a vertically typeset result image.

Figures 12(a) to 12(d) show the interface of the automatic document editing system of the invention. Some important commands of its menus are summarized below.

Figure 12(a) lists the commands of the "File (F)" menu. The "Automatic web page publishing" command automatically converts the edited image into a file in HTML format, so that the user can view the edited result with a World Wide Web browser.

Figure 12(b) lists the commands of the "Manuscript (L)" menu. The "Automatically remove manuscript background" command removes the background noise and colour from the image. The "Colour blending" command reduces the colours of the manuscript in order to reduce the file size.

Figure 12(c) lists the commands of the "Clip and Rearrange (O)" menu. These commands extract the document blocks and include single-block cutting, single-block smart cutting and multi-block smart cutting.

Figure 12(d) lists the commands of the "Image and Drawing (I)" menu. The "Block image colour reduction" command reduces the number of colours used by a block in order to save storage space. The small window on the right of Figure 12(d), called the gallery area, allows image blocks to be dragged in and taken out, and supports managing and searching the blocks stored in the gallery.

In summary, the automatic document editing system and method of the invention can decompose and rearrange colour documents, and the extracted document blocks can be regular rectangles or any irregular shape.

The foregoing describes only a preferred embodiment of the invention and does not limit the scope in which the invention may be practised. All equivalent changes and modifications made in accordance with the claims of the invention remain within the scope covered by this patent.

Claims (1)

1. An automatic document editing system, comprising:
a document block extraction unit for cutting each block in an original document image to be edited, so as to separate the picture and text blocks;
a block classification unit for judging whether a block is a picture or text;
a block order judging unit for determining the order of the text blocks;
a line and character cutting unit for cutting down to each individual character; and
a rearrangement unit for rearranging the document according to various styles and obtaining a rearranged result image.

2. The automatic document editing system of claim 1, wherein the system is used together with a central processing unit, a memory unit and an operation interface to carry out the automatic document editing functions among the units.

3. The automatic document editing system of claim 2, wherein the automatic document editing functions among the units mainly comprise file functions, manuscript functions, clip and rearrange functions, and image and drawing functions.

4. The automatic document editing system of claim 2, wherein the operation interface further comprises a gallery area for dragging in and taking out image blocks of the edited image, with management and search functions for the blocks stored in the gallery.

5. The automatic document editing system of claim 2, wherein the central processing unit executes the functions of the units of the system, the memory unit serves as the required data storage space, and the operation interface lets the user view, select and easily operate the automatic document editing functions.

6. The automatic document editing system of claim 5, wherein the functions of the operation interface are presented through a graphical user interface comprising windows, dialog boxes, a main menu and toolbars, together with hot keys and macros.

7. The automatic document editing system of claim 5, wherein the image of the document to be edited is input into the system through an input unit.

8. The automatic document editing system of claim 5, wherein the results of executing the document editing functions are displayed directly on a display unit.

9. The automatic document editing system of claim 5, wherein the results of executing the document editing functions are output to a server as shared document information.

10. The automatic document editing system of claim 5, wherein the results of executing the document editing functions are printed directly on an output device.

11. An automatic document editing method, comprising the following steps:
(a) an original image binarization step, converting the original document image to be edited into a binarized image;
(b) a document block extraction step, extracting the document blocks from the binarized image;
(c) a block classification step, judging whether each block is a picture or text, and determining the orientation and category of each extracted document block;
(d) a block ordering step, determining the order of the text blocks;
(e) a line and character cutting step, cutting down to each individual character; and
(f) a document image rearrangement step, rearranging the document according to various styles and obtaining a rearranged result image.

12. The automatic document editing method of claim 11, wherein step (a) further comprises the following steps:
(a1) obtaining, according to the moment-preserving principle, appropriate thresholds Rt, Gt and Bt for the red, green and blue primary colour components respectively; and
(a2) using a conversion formula based on the thresholds Rt, Gt and Bt to divide the grey-level values of the original document image into one group above the threshold and another group below the threshold.

13. The automatic document editing method of claim 11, wherein the document block extraction of step (b) further comprises single-block cutting, single-block smart cutting and multi-block smart cutting.

14. The automatic document editing method of claim 11, wherein the preliminary classification of document blocks in step (c) comprises the following five categories:
(a) picture block: images, pictures and picture captions in general;
(b) vertical headline block: a headline block whose text is arranged vertically;
(c) horizontal headline block: a headline block whose text is arranged horizontally;
(d) vertical content block: a body-text block whose text is arranged vertically; and
(e) horizontal content block: a body-text block whose text is arranged horizontally.

15. The automatic document editing method of claim 11, wherein the block classification step (c) comprises preprocessing of the document block and classification of the document block.

16. The automatic document editing method of claim 11, wherein, in the block ordering of step (d), documents are handled separately as vertical documents and horizontal documents.

17. The automatic document editing method of claim 11, wherein the line and character cutting step (e) comprises a line cutting step for non-picture blocks, a character cutting step for content blocks, and a character cutting step for headline blocks.

18. The automatic document editing method of claim 11, wherein the document image rearrangement of step (f) provides two rearrangement methods, manual rearrangement and automatic rearrangement.

19. The automatic document editing method of claim 13, wherein the single-block cutting extracts the block directly using the outline frame drawn by the user.

21. The automatic document editing method of claim 13, wherein the multi-block smart cutting further comprises the following steps:
(a) for a document image, framing its approximate extent;
(b) deciding the direction of each cut from the maximum white segment length of the vertical projection and the maximum white segment length of the horizontal projection of the current block;
(c) if the maximum white segment length of the vertical projection is greater than that of the horizontal projection, cutting the block vertically; conversely, if the maximum white segment length of the horizontal projection is greater than that of the vertical projection, cutting the block horizontally;
(d) repeating steps (b) and (c) on the resulting blocks to cut them into smaller rectangular blocks; and
(e) using image similarity to shrink the outline of each rectangular block to the smallest irregular outline.

22. The automatic document editing method of claim 15, wherein the preprocessing of the document block further comprises the following steps:
(a) finding the two corner points (left, down) and (right, up) of the document block;
(b) removing the isolated black points in the block;
(c) projecting the block vertically to find the average length of the white segments and the average length of the black segments in the vertical projection;
(d) projecting the block horizontally to find the average length of the white segments and the average length of the black segments in the horizontal projection; and
(e) computing the proportion of black points within the block after binarization.

23. The automatic document editing method of claim 22, wherein the classification of the document block distinguishes the following cases:
(a) if (the proportion of black points within the block after binarization > C1), or (the average length of the black segments in the vertical projection = 0 and the average length of the white segments in the horizontal projection = 0), the block is set as a picture block;
(b) if (the average length of the white segments in the horizontal projection > the average length of the white segments in the vertical projection), the block is set as a horizontal content block; otherwise the block is set as a vertical content block;
(c) if (the width of the block / the average length of the black segments in the vertical projection) ≤ C2 and (the height of the block / the width of the block) ≥ C3, the block is set as a vertical headline block;
(d) if (the height of the block / the average length of the black segments in the horizontal projection) ≤ C2 and (the width of the block / the height of the block) ≥ C3, the block is set as a horizontal headline block;
wherein
C1 is a predetermined constant between 0.5 and 1,
C2 is a predetermined constant below 10, and
C3 is a predetermined constant greater than 1.

24. The automatic document editing method of claim 16, wherein the ordering of vertical document blocks comprises the following steps:
(a) for each block, finding its neighbours in the four directions: Up_blk, Down_blk, Left_blk and Right_blk;
(b) finding the block with the first sequence number, setting it as Now_blk, and setting the sequence number to one;
(c) if Now_blk has a Left_blk and that Left_blk has not yet been ordered, performing the following three steps (c1) to (c3) in order; otherwise going to step (d):
(c1) setting the Right_blk of the Left_blk to Now_blk,
(c2) setting Now_blk to the Left_blk of Now_blk,
(c3) incrementing the sequence number by one and returning to step (c);
(d) if Now_blk has a Right_blk, performing the following two steps (d1) to (d2) in order; otherwise going to step (e):
(d1) setting the Left_blk of the Right_blk to Now_blk,
(d2) setting Now_blk to the Right_blk of Now_blk and returning to step (d);
(e) if the Down_blk of Now_blk is the Null Block, performing the following steps (e1) to (e2) in order; otherwise setting the Up_blk of the Down_blk to Now_blk, setting Now_blk to the Down_blk of Now_blk, incrementing the sequence number by one, and returning to step (c):
(e1) setting Now_blk to the Left_blk of Now_blk,
(e2) if Now_blk is the null block, stopping; otherwise returning to step (e).

25. The automatic document editing method of claim 17, wherein, in the line cutting step, if the block being processed is a horizontal document block, the line cutting comprises the following three steps (e1) to (e3) in order:
(e1) using the horizontal projection to find the width of each line;
(e2) over the width of each line, taking a local projection in the vertical direction; and
(e3) approaching the line from its left and right ends to obtain the actual length of each line in the horizontal document block;
and if the block being processed is a vertical document block, the line cutting comprises the following three steps (e11) to (e31) in order:
(e11) using the vertical projection to find the width of each line;
(e21) over the width of each line, taking a local projection in the horizontal direction; and
(e31) approaching the line from its top and bottom ends to obtain the actual height of each line in the vertical document block.

26. The automatic document editing method of claim 17, wherein the character cutting for content blocks is divided into the segmentation of vertical body text and the segmentation of horizontal body text.

27. The automatic document editing method of claim 25, wherein the segmentation of vertical body text cuts the characters using the projection of the whole block together with the partial projection of each line, and the segmentation of horizontal body text cuts the characters using the vertical projection.

28. The automatic document editing method of claim 23, wherein the character cutting for headline blocks is divided into the segmentation of vertical headline text and the segmentation of horizontal headline text.

29. The automatic document editing method of claim 28, wherein the segmentation of vertical headline text uses a property of the partial projection, namely the line width, and defines the characteristics of two kinds of quotation marks so as to distinguish quotation marks from ordinary characters, the headline characters thus being cut using the line width of each cell, and wherein the segmentation of horizontal headline text uses the vertical projection to cut the headline characters.

30. The automatic document editing method of claim 18, wherein the automatic rearrangement comprises the following steps:
(f1) setting the initial value of the parameter count to 1;
(f2) laying out all text blocks in the order found, according to the character and line spacing settings;
(f3) if picture blocks are to be displayed and there are picture blocks, laying out the picture blocks and going to step (f3a); otherwise going to step (f4);
(f3a) if a picture is smaller than 1/L of the original picture, going to step (f5);
(f4) if the document border limit has not been exceeded, performing the following steps (f4a) to (f4c); otherwise going to step (f5);
(f4a) increasing the body-text font size by K1 pixels, and the headline font
Scope of patent application Characters are cut by vertical projection. 28. The automatic file editing method according to item 23 of the scope of patent application, wherein 'the cutting of the title block text' is divided into a vertical title text dichotomy and a horizontal title text dichotomy. 29. The automatic file clipping method described in item 28 of the scope of patent application 'wherein' the segmentation of the vertical title text uses the characteristics of partial projection, ie the line width 'and defines the characteristics of two types of quotation marks, distinguishing quotation marks from In addition to the general text, the text of the headline is cut in such a way that the width of each line is used, and the division of the text of the horizontal headline is used in the projection. The vertical projection is used to cut the text of the headline. 3 0. Such as the scope of patent applications The automatic file editing method described in item 18, the method of δ 'Hai automatic rearrangement in 1 includes the following steps: (fl) set the initial value of the parameter count to 1; (f 2) set according to the space between the sentences, according to Arrange the blocks in the order found; if (f 3) displays a picture block and there is a picture block, arrange the picture block and proceed to step 〇f3a) 'Otherwise, go to step (f4). (F3a ) If the picture is less than 1 / L · of the original picture, go to step 5) ;, (f 4) If the edge limit of the file is not exceeded, perform the following steps ((48) to (^ 4 (〇, otherwise 'to step (€ 5); (f4a) Increase the size of the inner font by K! Pixels The title character 第38頁 4S3 099 六、申請專利範圍 體的大小加K2個像素,至步驟(f4b), (f 4b) 若count的值除以1餘〇,則字間距加 W!, 若count的值除以d2餘〇,則行間距加 W2, 、 (f4c) 將參數count的值加1,回至步驟(f2); (f5)將標題字體的大小減K2,進行下列步驟(f5a)至 (f5b): (f 5a)將内文字體的大小減&amp;,並重新排列所 有區塊, 1/L,則至步驟(f 5a (f5b)若超過文件邊線的限制,或圖片小於 ’否則,結束; L &gt;Page 38 4S3 099 6. 
The size of the range of patent application plus K2 pixels, to step (f4b), (f 4b) If the value of count is divided by more than 1, the word spacing is increased by W !, if the value of count is divided If d2 is more than 0, the line spacing is increased by W2, (f4c) Increase the value of the parameter count by 1 and return to step (f2); (f5) Reduce the size of the title font by K2, and perform the following steps (f5a) to (f5b) : (f 5a) Decrease the size of the inner font and rearrange all blocks, 1 / L, then go to step (f 5a (f5b) if it exceeds the limit of the file border, or the picture is smaller than 'otherwise, end; L &gt; 第39頁Page 39
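The projection features of claim 22 and the classification rules of claim 23 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. Several points are assumptions: "vertical projection" is taken to mean column sums and "horizontal projection" row sums; the connective inside rule (a) is read as "and"; the comparisons against C2 and C3, which are illegible in the source, are read as ≤ C2 and ≥ C3; the title rules are tested before the body-text rule; and the default values of `c1`, `c2`, `c3` are arbitrary picks inside the claimed ranges. The function names are hypothetical, and the isolated-point removal of step (b) is omitted.

```python
from itertools import groupby


def avg_runs(profile):
    """Return (average white-run length, average black-run length).

    A profile position counts as black when its projected value is > 0.
    """
    white, black = [], []
    for is_black, run in groupby(profile, key=lambda v: v > 0):
        (black if is_black else white).append(len(list(run)))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(white), mean(black)


def classify(block, c1=0.8, c2=10.0, c3=2.0):
    """Classify a binarized block (list of rows of 0/1) per claim 23.

    Assumed operators: ratio <= c2 and aspect >= c3 (illegible in the source).
    """
    height, width = len(block), len(block[0])
    rows = [sum(r) for r in block]                           # horizontal projection
    cols = [sum(r[x] for r in block) for x in range(width)]  # vertical projection
    h_white, h_black = avg_runs(rows)
    v_white, v_black = avg_runs(cols)
    black_ratio = sum(rows) / (height * width)

    # Rule (a): mostly black, or degenerate projections -> picture block.
    if black_ratio > c1 or (v_black == 0 and h_white == 0):
        return "picture"
    # Rule (c): few columns and a tall block -> vertical title (assumed order).
    if v_black and width / v_black <= c2 and height / width >= c3:
        return "vertical title"
    # Rule (d): few lines and a wide block -> horizontal title.
    if h_black and height / h_black <= c2 and width / height >= c3:
        return "horizontal title"
    # Rule (b): compare the white gaps of the two projections.
    return "horizontal text" if h_white > v_white else "vertical text"
```

For example, a block made of many short text rows separated by blank rows has longer white runs in its row profile than in its column profile, so rule (b) labels it as horizontal body text.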
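The block-ordering traversal of claim 24 may be easier to follow as code. The sketch below assumes that step (a) has already linked each block to its four neighbors and that the caller supplies the first block; the `Block` class and the function name are hypothetical. Assigning the incremented sequence number to the block that has just become Now_blk is an interpretation of "incrementing the sequence number".

```python
class Block:
    """A document block with links to its four neighbors (None = null block)."""

    def __init__(self, name):
        self.name = name
        self.up = self.down = self.left = self.right = None
        self.seq = None  # reading-order number; None until assigned


def order_vertical_blocks(first):
    """Number the blocks reachable from `first` following steps (b) to (e)."""
    seq = 1
    now = first
    now.seq = seq                         # step (b)
    while True:
        # Step (c): move left across unordered neighbors, numbering each.
        while now.left is not None and now.left.seq is None:
            now.left.right = now          # (c1)
            now = now.left                # (c2)
            seq += 1                      # (c3)
            now.seq = seq
        # Step (d): walk back to the right-most block of the row.
        while now.right is not None:
            now.right.left = now          # (d1)
            now = now.right               # (d2)
        # Step (e): find a block below, scanning leftwards when there is none.
        while now is not None and now.down is None:
            now = now.left                # (e1)
        if now is None:                   # (e2): nothing left to visit
            return
        now.down.up = now
        now = now.down
        seq += 1
        now.seq = seq
```

On a 2 x 2 arrangement whose first block is the top-right one, this numbers the top row from right to left and then the bottom row from right to left.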
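The three line-segmentation steps of claim 25 for a horizontal block can be illustrated with a short Python sketch. It assumes the block is a binarized list of rows of 0/1, that "horizontal projection" means row sums, and that the local "vertical projection" means column sums restricted to one line's rows. The function name and the half-open coordinate convention are choices made for this example only.

```python
def split_horizontal_lines(block):
    """Return (top, bottom, left, right) for each text line of a block.

    Coordinates are half-open: rows top..bottom-1, columns left..right-1.
    """
    height, width = len(block), len(block[0])
    row_profile = [sum(row) for row in block]     # (e1) horizontal projection
    lines = []
    y = 0
    while y < height:
        if row_profile[y] == 0:                   # blank row between lines
            y += 1
            continue
        top = y
        while y < height and row_profile[y] > 0:  # extend to the end of the line
            y += 1
        bottom = y
        band = block[top:bottom]
        # (e2) local vertical projection, restricted to this line's rows
        col_profile = [sum(r[x] for r in band) for x in range(width)]
        # (e3) close in from both ends; the band has at least one black pixel
        left = 0
        while col_profile[left] == 0:
            left += 1
        right = width
        while col_profile[right - 1] == 0:
            right -= 1
        lines.append((top, bottom, left, right))
    return lines
```

A vertical block would be handled the same way with the roles of rows and columns exchanged, which matches steps (e11) to (e31).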
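The grow-then-shrink loop of claim 30 can be sketched as below. The claim does not say how blocks are actually laid out, so the sketch takes a hypothetical `layout` callback that arranges everything for the given settings and reports whether the margins are exceeded and how large the picture is relative to the original (`None` when no picture is shown). The parameter names, the default values, and the lower bound on the body font size (a safeguard against endless shrinking) are all assumptions that do not come from the patent.

```python
def auto_rearrange(layout, body=10, title=16, k1=1, k2=2,
                   w1=1, w2=1, d1=2, d2=3, big_l=4):
    """Grow fonts and spacing until the layout fails, then back off.

    layout(settings) -> (overflows, picture_ratio) is a hypothetical callback.
    """
    s = {"body": body, "title": title, "char_gap": 0, "line_gap": 0}

    def fails():
        overflows, ratio = layout(s)                 # (f2), (f3)
        too_small = ratio is not None and ratio < 1 / big_l  # (f3a)
        return overflows or too_small

    count = 1                                        # (f1)
    while not fails():                               # (f4)
        s["body"] += k1                              # (f4a)
        s["title"] += k2
        if count % d1 == 0:                          # (f4b)
            s["char_gap"] += w1
        if count % d2 == 0:
            s["line_gap"] += w2
        count += 1                                   # (f4c)

    s["title"] -= k2                                 # (f5)
    while True:
        s["body"] -= k1                              # (f5a)
        # Safeguard (not in the claim): never shrink below one pixel.
        if s["body"] <= 1 or not fails():            # (f5b)
            return s
```

With a fake layout that overflows once the body font exceeds 20 pixels, the loop grows the body font from 10 to 21, detects the overflow, and steps back to 20.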
TW89101060A 2000-01-24 2000-01-24 Automatic document editing system and method TW463099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW89101060A TW463099B (en) 2000-01-24 2000-01-24 Automatic document editing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW89101060A TW463099B (en) 2000-01-24 2000-01-24 Automatic document editing system and method

Publications (1)

Publication Number Publication Date
TW463099B true TW463099B (en) 2001-11-11

Family

ID=21658573

Family Applications (1)

Application Number Title Priority Date Filing Date
TW89101060A TW463099B (en) 2000-01-24 2000-01-24 Automatic document editing system and method

Country Status (1)

Country Link
TW (1) TW463099B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI409689B (en) * 2005-10-27 2013-09-21 Ibm Method, data processing system and computer program product for maximizing window display area using window flowing
TWI402181B (en) * 2007-12-27 2013-07-21 Seiko Epson Corp A recording control means, a recording control method, and a computer-readable recording medium
US8619321B2 (en) 2007-12-27 2013-12-31 Seiko Epson Corporation Recording control device and recording control method
TWI638563B (en) * 2017-05-19 2018-10-11 虹光精密工業股份有限公司 Image capturing method and image capturing device using the same
US10484560B2 (en) 2017-05-19 2019-11-19 Avision Inc. Image capturing method capable of arranging a plurality of region images and image capturing device using the same

Similar Documents

Publication Publication Date Title
US6903751B2 (en) System and method for editing electronic images
US8855413B2 (en) Image reflow at word boundaries
EP1999688B1 (en) Converting digital images containing text to token-based files for rendering
CN101820489B (en) Image processing apparatus and image processing method
EP0522702B1 (en) Spot color extraction
US7386789B2 (en) Method for determining logical components of a document
US8000529B2 (en) System and method for creating an editable template from a document image
US8379055B2 (en) Electronic layout generation based on visual context
US20110050723A1 (en) Image processing apparatus and method, and program
US20110229035A1 (en) Image processing apparatus, image processing method, and storage medium
CN1241758A (en) Image processing apparatus and method, and computer-readable memory
KR20100033412A (en) Image processing apparatus, image processing method, and computer program
JP2009146064A (en) Image processor, image processing method, and program and storage medium thereof
EP2544099A1 (en) Method for creating an enrichment file associated with a page of an electronic document
US6850228B1 (en) Universal file format for digital rich ink data
JPH11345339A (en) Method, device and system for picture segmentation, and computer-readable memory
TW463099B (en) Automatic document editing system and method
JP5182902B2 (en) Document image output device
JPH0612540B2 (en) Document creation support device
JP5159588B2 (en) Image processing apparatus, image processing method, and computer program
Murguia Document segmentation using texture variance and low resolution images
JP2003331299A (en) Device, method and program for displaying reduced image, and recording medium recorded with program
JP5767549B2 (en) Image processing apparatus, image processing method, and program
JP2006092127A (en) Image processor, image processing method and program
JP5824309B2 (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees