TW463099B

TW463099B - Automatic document editing system and method

Info

Publication number: TW463099B
Application number: TW89101060A
Authority: TW
Inventors: Shan-Shih Huang; Wen-Hsiang Tsai; James Ching-Yu Yang
Original assignee: Formosoft Internat Inc
Priority date: 2000-01-24
Filing date: 2000-01-24
Publication date: 2001-11-11

Abstract

This invention is an automatic document editing system and method comprising a document block extraction unit, a block classification unit, a block order determination unit, a text-line and character segmentation unit, and a rearrangement unit. It can decompose color documents and rearrange their contents, so that users can edit documents automatically on their own. The method comprises an original-image binarization step, a document block extraction step, a block classification step, a block order determination step, a text-line and character segmentation step, and a text image rearrangement step. The extracted text blocks may be regular rectangles or arbitrary irregular shapes.

Description


The present invention relates to an automatic document editing system and method, and in particular to one that processes document blocks by means of image processing.

BACKGROUND OF THE INVENTION

Modern printing technology is highly advanced. In today's print media, such as newspapers and magazines, layouts have become increasingly varied. Text and pictures are embellished to attract readers' attention and make the information more appealing. If the structure of the articles in a color document can be understood quickly and correctly, the document can serve several purposes:

- The reading order can be provided to the reader.
- The text in the document can be rearranged.
- Graphics and text can be processed separately. For example, the text portion can be extracted and then recognized by character recognition for various applications.

Document analysis and understanding mainly concerns using computers to automatically process the wide variety of documents people use. There has been considerable past research on the automatic processing and analysis of documents, but color documents have received less attention. An important task in document analysis is segmenting a color document image into distinct regions. Generally speaking, there are two approaches: top-down and bottom-up.


Top-down methods include the following:

- The run-length smoothing algorithm disclosed in Friedrich M. Wahl, Kwan Y. Wong, and Richard G. Casey, "Block Segmentation and Text Extraction in Mixed Text/Image Documents," Computer Graphics and Image Processing, vol. 20, pp. 375-390, 1982.
- The projection profile cut algorithm disclosed in D. Wang and S. N. Srihari, "Classification of Newspaper Image Blocks Using Texture Analysis," Computer Vision, Graphics, and Image Processing, vol. 47, pp. 327-352, 1989.

A bottom-up method is disclosed in L. A. Fletcher and R. Kasturi, "A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918, 1988. It links the pixels of the image into many connected components and then merges them into larger blocks.

For document block classification, H. J. Lee and C. W. Chien, "Segmentation of Documents with Text/Graphic/Image," Proc. of ICCPCOL, pp. 188-194, 1991, proceeds as follows:

1. The pixels of a document image are merged into block regions using a smearing method, and the boundary of each region is found.
2. Before further processing, the skew angle of the document is measured and corrected.
3. Each small region is coarsely classified using the block's aspect ratio, the block size, and some author-defined thresholds. The coarse classes are basic text, title, graphics and image, and line and noise.
4. In the fine classification, graphics and images are distinguished using eight author-defined masks, and overlapping blocks of the same class are merged.
5. Some blocks judged to be text are separated out as titles based on their relative positions.
6. Text blocks are merged into text lines, and a method combining merging and splitting segments the characters within each line.

K. C. Fan and L. S. Wang, "Document Segmentation and Classification," Proc. of 1997 IPPR Conf. on CVGIP, Taichung, Taiwan, 1997, also discloses a document segmentation and classification method:

1. Noise that appears along the border during scanning is detected and removed.
2. Basic components are found according to author-defined criteria and classified into text and non-text parts.
3. Text parts are merged into text strings using the proposed algorithm. The strings are merged into text paragraphs, and paragraphs with the same orientation are then merged.
4. For non-text parts, masks are used to find straight lines, and the number of lines distinguishes fields from tables. The number of black pixels near each point in a block is then examined to distinguish images from graphics.
5. Each block is represented by a polygon. Overlapping blocks of the same class are merged, and blocks of different classes are split.

L. F. Lee and W. H. Tsai, "Understanding of Arrangements and Extraction of Articles in Chinese Newspaper Images," Proc. Int. Conf. Computer Vision, Graphics and Image Processing, Nantou, Taiwan, ROC, pp. 479-487, 1995, targets Chinese newspapers. It discloses methods for layout understanding and article extraction, and it segments body text and headline text differently to handle different fonts that appear in the same block.

S. C. Lin and W. H. Tsai, "Segmentation and Understanding of Color Magazine Images," Proc. International Computer Symposium, Kaohsiung, Taiwan, Republic of China, December 1996, pp. 205-212, proposes automatic block segmentation for Chinese magazines. It uses relative positions and certain block properties to find the titles, abstracts, sections, headers, page numbers, and so on of articles.

SUMMARY OF THE INVENTION

One object of the present invention is to provide an automatic document editing system that can decompose and recombine color documents, so that users can edit documents automatically according to their own intentions. The present invention is an intelligent document editing system.

The automatic document editing system of the present invention mainly comprises:

- a document block extraction unit;
- a block classification unit;
- a block order determination unit;
- a text-line and character segmentation unit;
- a rearrangement unit.

The system may be used together with a central processing unit, a memory unit, and an operation interface. Together these carry out the automatic document editing functions among the units and provide the required data storage space. The operation interface may provide a special toolbar panel that makes viewing and selecting functions convenient and lets users easily operate the automatic editing functions for various documents.

Another object of the present invention is to provide an automatic document editing method, which mainly comprises:

1. an original-image binarization step;
2. a document block extraction step;
3. a block classification step;
4. a block order determination step;
5. a text-line and character segmentation step;
6. a document image rearrangement step.


The original-image binarization step of the present invention is based on the moment-preserving principle. Moment-preserving binarization is applied separately to the red, green, and blue components to obtain an appropriate threshold for each primary color component. The best binary image is then obtained through a conversion relationship and used for block extraction.

Besides decomposing and recombining color documents, the present invention can extract document blocks that are regular rectangles or arbitrary irregular shapes. In an embodiment of the present invention, the main units of the automatic document editing system decompose and rearrange a document to produce a rearranged result image.

The above and other objects and advantages of the present invention are described in detail below with reference to the following drawings, the detailed description of the embodiments, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic diagram of the architecture of the automatic document editing system of the present invention and of the function of each unit.
Fig. 2 is a block diagram showing how the automatic document editing system according to the present invention uses a central processing unit, a memory unit, and an operation interface to implement the automatic document editing functions among the units.
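A minimal sketch of the binarization step follows. It assumes that "moment-preserving binarization" means Tsai's standard moment-preserving thresholding, since the text does not give the formulas. The code computes a threshold for each 8-bit channel (step 401) and then applies the pixel conversion rule given in the detailed description (step 402). Treating the per-channel "offset" (Rz1, Gz1, Bz1) as the upper representative level z1 of Tsai's method is also an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def moment_preserving_threshold(values: Sequence[int]) -> Tuple[int, float, float]:
    """Tsai-style moment-preserving threshold for one 8-bit channel.

    Returns (t, z0, z1): the threshold t and the two representative levels
    z0 <= z1 of a two-level image that keeps the first three moments.
    """
    n = len(values)
    m1 = sum(values) / n
    m2 = sum(v * v for v in values) / n
    m3 = sum(v ** 3 for v in values) / n
    cd = m2 - m1 * m1                       # variance; 0 for a constant channel
    if cd == 0:
        return int(m1), m1, m1
    # z0 and z1 are the roots of z^2 + c1*z + c0 = 0.
    c1 = (m1 * m2 - m3) / cd
    c0 = (m1 * m3 - m2 * m2) / cd
    root = max(c1 * c1 - 4 * c0, 0.0) ** 0.5
    z0 = (-c1 - root) / 2
    z1 = (-c1 + root) / 2
    p0 = (z1 - m1) / (z1 - z0)              # fraction of pixels in the dark class
    # p-tile rule: pick the gray level whose cumulative fraction is closest to p0.
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    cum, best_t, best_err = 0, 0, float("inf")
    for g in range(256):
        cum += hist[g]
        err = abs(cum / n - p0)
        if err < best_err:
            best_t, best_err = g, err
    return best_t, z0, z1


def binarize(pixels: Sequence[Pixel],
             t: Tuple[float, float, float],
             z1: Tuple[float, float, float]) -> List[int]:
    """Return 1 (black) or 0 (white) for each pixel."""
    # Cut values (Rt + Rz1)/2, (Gt + Gz1)/2, (Bt + Bz1)/2.
    cuts = [(t[c] + z1[c]) / 2 for c in range(3)]
    out = []
    for p in pixels:
        below = [p[c] < cuts[c] for c in range(3)]
        # (R and G) or (R and B) or (G and B) is the same as "at least two below".
        out.append(1 if sum(below) >= 2 else 0)
    return out


def binarize_image(pixels: Sequence[Pixel]) -> List[int]:
    """Step 401 (per-channel thresholds) followed by step 402 (pixel rule)."""
    ts, z1s = [], []
    for c in range(3):
        t_c, _, z1_c = moment_preserving_threshold([p[c] for p in pixels])
        ts.append(t_c)
        z1s.append(z1_c)
    return binarize(pixels, (ts[0], ts[1], ts[2]), (z1s[0], z1s[1], z1s[2]))
```

Consider a channel that holds only the levels 10 and 200 in equal numbers. For that channel the sketch recovers t = 10, z0 = 10, and z1 = 200, so the cut value is 105.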


Fig. 3 is an operation flowchart of the automatic document editing method of the present invention.
Fig. 4 illustrates the steps for binarizing the original document image in Fig. 3.
Fig. 5 illustrates the three kinds of cutting used for document block extraction in Fig. 3.
Figs. 6(a) and 6(b) show the results of splitting a document block into two blocks vertically and horizontally, respectively.
Fig. 6(c) illustrates shrinking a rectangular block into a minimal irregular block using the degree of approximation of the image.
Fig. 7 illustrates the steps for block classification in Fig. 3.
Fig. 8 illustrates the steps for determining the order of vertical document blocks in Fig. 3.
Fig. 9 illustrates the steps for text-line and character segmentation in Fig. 3.
Fig. 10 illustrates the steps for automatic rearrangement of the document image in Fig. 3.
Figs. 11(a) to 11(e) illustrate a preferred embodiment of the present invention, in which:
Fig. 11(a) is a full-color document image to be edited;
Fig. 11(b) is the result of processing Fig. 11(a) with the document block extraction unit and the block classification unit of the automatic document editing system of the present invention;
Fig. 11(c) is the result of further processing Fig. 11(b) with the block order determination unit of the system;
Fig. 11(d) is a horizontally laid-out result image obtained from Fig. 11(c) by applying the text-line and character segmentation unit and then rearranging the document in one of various styles with the rearrangement unit;
Fig. 11(e) is a vertically laid-out result image obtained from Fig. 11(c) by the same text-line and character segmentation unit and rearrangement unit.
Figs. 12(a) to 12(d) show the interface of the automatic document editing system of the present invention, in which:
Fig. 12(a) lists the commands of the "File (F)" menu;
Fig. 12(b) lists the commands of the "Draft (L)" menu;
Fig. 12(c) lists the commands of the "Clip and Rearrange (O)" menu;
Fig. 12(d) lists the commands of the "Image and Drawing (I)" menu.

DESCRIPTION OF REFERENCE NUMERALS
100 document block editing system
101 document block extraction unit
102 block classification unit
103 block order determination unit
104 text-line and character segmentation unit
105 rearrangement unit
106 original document image
107 rearranged document image
201 central processing unit
202 memory unit
203 operation interface
204 toolbar panel

205 input unit
206 output unit
207 storage unit

301 original-image binarization step
302 document block extraction step
303 block classification step
304 block order determination step
305 text-line and character segmentation step
306 document image rearrangement step
401 obtaining appropriate thresholds for the three primary color components
402 converting the original document image into a binary image according to a conversion formula
701 document block pre-processing step
702 document block classification step
901 text-line segmentation step


902 character segmentation step for content blocks
903 character segmentation step for headline blocks

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 is a schematic diagram of the architecture of the automatic document editing system of the present invention and of the function of each unit. As shown in Fig. 1, the automatic document editing system 100 of the present invention comprises:

- a document block extraction unit 101;
- a block classification unit 102;
- a block order determination unit 103;
- a text-line and character segmentation unit 104;
- a rearrangement unit 105.

Referring to Fig. 1, the units process a document as follows:

1. The document block extraction unit 101 processes each block of the original document image 106 and separates the text and graphics blocks.
2. The block classification unit 102 determines whether each block is graphics or text.
3. The block order determination unit 103 determines the order of the text blocks.
4. The text-line and character segmentation unit 104 segments the text down to individual characters.
5. The rearrangement unit 105 rearranges the document in various styles to obtain the rearranged result image 107.

Fig. 2 is a block diagram showing how the automatic document editing system 100 according to the present invention uses a central processing unit 201, a memory unit 202, and an operation interface 203 to implement the automatic document editing functions among the units. The operation interface 203 further includes a toolbar panel 204.

The central processing unit 201 executes the functions of the automatic document editing units. The memory unit 202 provides the data storage space that the units of the system 100 need. Through the toolbar panel 204 on the operation interface 203, users can view and select functions and easily operate the automatic editing of documents. The operation interface 203 is a graphical user interface (GUI) with windows, dialog boxes, a main menu, and toolbars, supplemented by hot keys and macros.

According to the present invention, the document to be edited is input into the automatic document editing system through an input unit 205. For example, the input unit 205 may be a scanner that inputs document image data. The results of the document editing functions are output through an output unit 206 in one of several forms:

- displayed on a display;
- printed on an output device;
- sent as document information to a server.

Document data that has been edited, or that is to be reused, can be saved in the memory unit 202 or the storage unit 207. Users can view or search the edited document data through the preview function on the operation interface 203. The storage unit can save the document data's related summary information, or save the data as a gallery.

Fig. 3 is an operation flowchart of the automatic document editing method of the present invention. The method comprises, in order:

1. an original-image binarization step 301;
2. a document block extraction step 302;
3. a block classification step 303;
4. a block order determination step 304;
5. a text-line and character segmentation step 305;
6. a document image rearrangement step 306.

Each step is described below.

Fig. 4 illustrates the steps for binarizing the original document image in Fig. 3. As described above, the binarization applies moment-preserving binarization to each of the red, green, and blue primaries. In other words, it obtains an appropriate threshold for each primary color component of the original image and divides the gray levels of that component into a group above the threshold and a group below it.

Referring to Fig. 4, step 401 obtains appropriate thresholds Rt, Gt, and Bt for the three primary color components of a document image according to the moment-preserving principle. Step 402 then uses these thresholds and a conversion formula to set each pixel of the original document image to one of the two binary values, such as black or white. In an embodiment of the present invention, pixel P is set to black if

$$\left(R_p < \frac{R_t + R_{z1}}{2} \text{ and } G_p < \frac{G_t + G_{z1}}{2}\right) \text{ or } \left(R_p < \frac{R_t + R_{z1}}{2} \text{ and } B_p < \frac{B_t + B_{z1}}{2}\right) \text{ or } \left(G_p < \frac{G_t + G_{z1}}{2} \text{ and } B_p < \frac{B_t + B_{z1}}{2}\right);$$

otherwise, pixel P is set to white. In other words, P is black when at least two of its three color components fall below their respective cut values. The symbols are:

$R_p$: the red component of pixel P;
$R_t$: the threshold of the red component;
$B_p$: the blue component of pixel P;
$B_t$: the threshold of the blue component;

$G_p$: the green component of pixel P;
$G_t$: the threshold of the green component;

$R_{z1}$: the offset of the red component obtained by moment-preserving binarization;
$B_{z1}$: the offset of the blue component obtained by moment-preserving binarization;

$G_{z1}$: the offset of the green component obtained by moment-preserving binarization.

Fig. 5 illustrates the three kinds of cutting used for document block extraction in Fig. 3: single-block cutting, single-block intelligent cutting, and multi-block intelligent cutting. Each is described below.

Block representation. According to the present invention, a document block is recorded by the two-dimensional coordinates of its lower-left and upper-right corners, (left, down) and (right, up).

Single-block cutting. The frame the user draws is used directly as the extracted block. The lower-left corner of the framed area is therefore (left, down), and its upper-right corner is (right, up).

Single-block intelligent cutting comprises two steps:

(a) For a document image, frame the approximate extent of the document block.
(b) Use the horizontal and vertical projections to find the block's actual minimal bounding box, that is, its lower-left corner (L, D) and upper-right corner (R, U).

The four coordinates are found as follows:

- On the horizontal projection, scanning from top to bottom, the first vertical coordinate with a black projection is U.
- On the horizontal projection, scanning from bottom to top, the first vertical coordinate with a black projection is D.
- On the vertical projection, scanning from left to right, the first horizontal coordinate with a black projection is L.
- On the vertical projection, scanning from right to left, the first horizontal coordinate with a black projection is R.

The rectangular frame is then shrunk further using the degree of approximation of the image, which yields the minimal irregular bounding shape.

Multi-block intelligent cutting. Because the document blocks of an image are separated by gaps, the horizontal or vertical projection contains relatively long gaps, called white runs. Multi-block intelligent cutting uses these white runs to split the document into blocks, cutting either vertically or horizontally. It comprises the following steps:


(a) For a document image, frame its approximate extent.
(b) Determine the direction of each cut from two values for the current block: the maximum white-run length of its vertical projection and the maximum white-run length of its horizontal projection.
(c) If the maximum white-run length of the vertical projection is greater than that of the horizontal projection, cut the block vertically. Conversely, if the maximum white-run length of the horizontal projection is greater than that of the vertical projection, cut it horizontally.
(d) Repeat steps (b) and (c) on each block produced by the cut, subdividing it into smaller blocks.
(e) Once the smallest rectangular blocks have been cut out, shrink each one into a minimal irregular block in the same way as in single-block intelligent cutting.

Figs. 6(a) and 6(b) show the results of splitting a document block into two blocks vertically and horizontally, respectively. The hatched areas indicate white runs. S1 and S2 are the start points of the two white runs, and E1 and E2 are their end points.

- Fig. 6(a): the document block (L, D)-(R, U) is split vertically into (L, D)-(S1, U) and (E1, D)-(R, U).
- Fig. 6(b): the document block (L, D)-(R, U) is split horizontally into (L, E2)-(R, U) and (L, D)-(R, S2).
- Fig. 6(c): a rectangular block is shrunk into a minimal irregular block using the degree of approximation of the image.
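The projection-based bounding box and the recursive white-run cutting of steps (a) to (d) can be sketched in Python as below. This is an illustrative sketch under several assumptions, not the patent's implementation:

- The input is a 0/1 image (1 = black) stored as a list of rows with row 0 at the top. Boxes are therefore (top, bottom, left, right) instead of (Left, Down)/(Right, Up).
- The `min_gap` stopping parameter is hypothetical.
- Ties go to a horizontal cut.
- Step (e), the shrink to an irregular outline, is omitted because its approximation criterion is not specified.

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]   # 1 = black, 0 = white; row 0 is the top row
Box = Tuple[int, int, int, int]   # (top, bottom, left, right), inclusive


def profiles(img: Image, box: Box) -> Tuple[List[int], List[int]]:
    """Black counts per row (horizontal projection) and per column (vertical projection)."""
    top, bottom, left, right = box
    rows = [sum(img[y][left:right + 1]) for y in range(top, bottom + 1)]
    cols = [sum(img[y][x] for y in range(top, bottom + 1)) for x in range(left, right + 1)]
    return rows, cols


def tight_box(img: Image, box: Box) -> Optional[Box]:
    """Smallest rectangle inside `box` that still contains every black pixel."""
    rows, cols = profiles(img, box)
    ys = [i for i, v in enumerate(rows) if v]
    xs = [i for i, v in enumerate(cols) if v]
    if not ys:
        return None                     # no black pixels: nothing to extract
    top, _, left, _ = box
    return (top + ys[0], top + ys[-1], left + xs[0], left + xs[-1])


def longest_white_run(profile: List[int]) -> Tuple[int, int, int]:
    """(length, start, end) of the longest run of zeros, or (0, -1, -1) if none."""
    best, start = (0, -1, -1), None
    for i, v in enumerate(profile + [1]):   # sentinel closes a trailing run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if i - start > best[0]:
                best = (i - start, start, i - 1)
            start = None
    return best


def xy_cut(img: Image, box: Optional[Box] = None, min_gap: int = 1) -> List[Box]:
    """Recursively split at the longest white run, as in steps (a) to (d)."""
    if box is None:
        box = (0, len(img) - 1, 0, len(img[0]) - 1)
    tight = tight_box(img, box)
    if tight is None:
        return []
    top, bottom, left, right = tight
    rows, cols = profiles(img, tight)
    v_len, v_s, v_e = longest_white_run(cols)   # gap in vertical projection -> vertical cut
    h_len, h_s, h_e = longest_white_run(rows)   # gap in horizontal projection -> horizontal cut
    if max(v_len, h_len) < max(min_gap, 1):
        return [tight]                  # no usable white run: this is a final block
    if v_len > h_len:                   # ties fall through to a horizontal cut
        parts = [(top, bottom, left, left + v_s - 1), (top, bottom, left + v_e + 1, right)]
    else:
        parts = [(top, top + h_s - 1, left, right), (top + h_e + 1, bottom, left, right)]
    return [b for part in parts for b in xy_cut(img, part, min_gap)]
```

Consider a 4 x 6 page with two 2 x 2 blocks above a full-width line. `xy_cut` first cuts along the one-row horizontal gap and then along the two-column gap. It returns `[(0, 1, 0, 1), (0, 1, 4, 5), (3, 3, 0, 5)]`.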

After the document blocks have been extracted from the document image, the present invention determines the orientation and class of each block.

Regarding orientation:

- In a vertically oriented document block, the text reads from top to bottom.
- In a horizontally oriented document block, the text reads from right to left or from left to right.

The preliminary classes of document blocks are:
(a) Picture block: images, pictures, and picture captions in general.
(b) Vertical headline block: a headline block whose text is arranged vertically.
(c) Horizontal headline block: a headline block whose text is arranged horizontally.
(d) Vertical content block: a body-text block whose text is arranged vertically.
(e) Horizontal content block: a body-text block whose text is arranged horizontally.

The present invention again uses projections in the vertical and horizontal directions to distinguish the block classes. Several parameters are defined for this purpose, using the block's two corner coordinates, (Left, Down) and (Right, Up). The parameters are defined as follows:

Vert_Avg_White: the average length of the white runs in the vertical projection;


Hori_Avg_White: the average length of the white runs in the horizontal projection;
Vert_Avg_Black: the average length of the black runs in the vertical projection;
Hori_Avg_Black: the average length of the black runs in the horizontal projection;
Black_Ratio: the proportion of the document block occupied by black pixels after binarization.

Fig. 7 illustrates the steps for block classification in Fig. 3, which use these parameters. Referring to Fig. 7, step 701, the document block pre-processing step, comprises the following five steps in order:
(a) Find the two corner points (Left, Down) and (Right, Up) of the document block using the block extraction algorithm of Fig. 5.
(b) Remove isolated black points from the block.
(c) Compute the vertical projection of the block to find Vert_Avg_White and Vert_Avg_Black.
(d) Compute the horizontal projection of the block to find Hori_Avg_White and Hori_Avg_Black.
(e) Compute Black_Ratio.

Next, step 702 classifies the document block according to the following cases:
(a) If (Black_Ratio > C1) or (Vert_Avg_White = 0 and Hori_Avg_White = 0), the block is a picture block.

(b) If Hori_Avg_White > Vert_Avg_White, the block is set as a horizontal body-text block; otherwise, it is set as a vertical body-text block.
(c) If (Right − Left)/Vert_Avg_Black ≤ C2 and (Top − Down)/(Right − Left) ≥ C3, the block is set as a vertical title block.
(d) If (Top − Down)/Hori_Avg_Black ≤ C2 and (Right − Left)/(Top − Down) ≥ C3, the block is set as a horizontal title block.

Here C1, C2 and C3 are predetermined constants. C1 lies between 0.5 and 1, C2 is below 10, and C3 is greater than 1.

The block classification step is followed by the block order determination step. When determining the block order, the present invention processes the document either as a vertical document or as a horizontal document. First, four directional neighboring blocks are defined for every block, as listed below.
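The pre-processing of step 701 and the classification rules of step 702 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions:

- The block is a binary image given as a list of rows of 0/1 values (1 = black) and contains at least one black point.
- "Vertical projection" is read as the per-column black count and "horizontal projection" as the per-row count.
- C1, C2 and C3 are hypothetical values chosen inside the stated ranges.
- When rules (c) or (d) apply, they override the text label from rule (b).

```python
# Hypothetical constants inside the stated ranges (0.5 <= C1 <= 1, C2 < 10, C3 > 1).
C1, C2, C3 = 0.8, 4.0, 2.0


def crop(img):
    """701(a): trim the block to the bounding box of its black points (needs >= 1)."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def remove_isolated(img):
    """701(b): clear black points that have no black 8-neighbour."""
    h, w = len(img), len(img[0])

    def has_neighbour(y, x):
        return any(img[j][i]
                   for j in range(max(0, y - 1), min(h, y + 2))
                   for i in range(max(0, x - 1), min(w, x + 2))
                   if (j, i) != (y, x))

    return [[1 if img[y][x] and has_neighbour(y, x) else 0 for x in range(w)]
            for y in range(h)]


def run_averages(profile):
    """Average lengths of the white (zero) and black (non-zero) runs; 0.0 if none."""
    runs = {False: [], True: []}
    start = 0
    for i in range(1, len(profile) + 1):
        if i == len(profile) or (profile[i] > 0) != (profile[start] > 0):
            runs[profile[start] > 0].append(i - start)
            start = i

    def avg(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return avg(runs[False]), avg(runs[True])


def classify(img):
    block = remove_isolated(crop(img))
    h, w = len(block), len(block[0])                          # block height and width
    vert = [sum(row[x] for row in block) for x in range(w)]   # 701(c): per column
    hori = [sum(row) for row in block]                        # 701(d): per row
    vert_white, vert_black = run_averages(vert)
    hori_white, hori_black = run_averages(hori)
    black_ratio = sum(hori) / (w * h)                         # 701(e)

    if black_ratio > C1 or (vert_white == 0 and hori_white == 0):   # 702(a)
        return "picture"
    label = "horizontal text" if hori_white > vert_white else "vertical text"  # (b)
    if vert_black and w / vert_black <= C2 and h / w >= C3:         # (c)
        label = "vertical title"
    if hori_black and h / hori_black <= C2 and w / h >= C3:         # (d)
        label = "horizontal title"
    return label


print(classify([[1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]] * 2))  # horizontal title
```

Rules (c) and (d) cannot both hold, because C3 > 1 requires the block to be taller than wide in (c) and wider than tall in (d). The order in which they are tested therefore does not matter.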

Up_blk: the nearest, leftmost block above a block;
Down_blk: the nearest block below a block;

Right_blk: the nearest, lowest block to the right of a block;


Left_blk: the nearest, topmost block to the left of a block;

Now_blk: the block currently being processed.

When a block has no neighboring block in a given direction, its neighbor in that direction is set to a null block (Null Block). With these parameters defined, FIG. 8 is a flowchart of the step of determining the order of vertical document blocks in FIG. 3. Referring to FIG. 8, the steps are as follows:

Step 801: find the four directional neighbors Up_blk, Down_blk, Left_blk and Right_blk of every block;
Step 802: find the block with the first sequence number, set it as Now_blk, and set the sequence number to one;
Step 803: if Now_blk has a Left_blk that has not yet been ordered, perform the following three sub-steps in order; otherwise, go to step 804:
(803a) set the Right_blk of this Left_blk to Now_blk;
(803b) set Now_blk to the Left_blk of Now_blk;
(803c) increase the sequence number by one, and return to step 803;
Step 804: if Now_blk has a Right_blk, perform the following two sub-steps in order; otherwise, go to step 805:
(804a) set the Left_blk of this Right_blk to Now_blk;
(804b) set Now_blk to the Right_blk of Now_blk, and return to step 804;
Step 805: if the Down_blk of Now_blk is a Null Block, perform the following two sub-steps in order. Otherwise, set the Up_blk of this Down_blk to Now_blk, set Now_blk to the Down_blk of Now_blk, increase the sequence number by one, and return to step 803:
(805a) set Now_blk to the Left_blk of Now_blk;
(805b) if Now_blk is a Null Block, end; otherwise, return to step 805.

The steps for determining the order of horizontal document blocks are similar to those for vertical document blocks. After the block with the first sequence number is found, the search proceeds downward until it reaches the bottom edge of the document image. It then returns to the top, moves one column to the right, and searches downward again. The search stops only when every block has been visited. Sequence numbers are assigned under the same constraints as for vertical document blocks.

The present invention processes picture blocks and non-picture blocks separately. For non-picture blocks, the lines and characters within each block are segmented: the lines are cut out first, and then the characters of each line are segmented. Non-picture blocks include horizontal and vertical title blocks and body-text blocks. As shown in FIG. 9, the line and character segmentation step of FIG. 3 comprises three steps:
- a line segmentation step 901 for non-picture blocks;
- a character segmentation step 902 for body-text blocks;
- a character segmentation step 903 for title blocks.
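The vertical block-ordering walk of FIG. 8 (steps 802 to 805) can be sketched in Python as below. The sketch makes three assumptions:

- Step 801 has already filled in each block's four neighbor pointers. The `Block` class and its field names are hypothetical.
- A block receives the current sequence number whenever it becomes Now_blk through step 803 or step 805.
- A Down_blk that is already ordered is treated like a Null Block, so the walk cannot revisit it. This guard is not stated in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Block:
    """A document block with the four neighbour pointers of step 801 (hypothetical)."""
    name: str
    up: Optional[Block] = field(default=None, repr=False)
    down: Optional[Block] = field(default=None, repr=False)
    left: Optional[Block] = field(default=None, repr=False)
    right: Optional[Block] = field(default=None, repr=False)
    order: Optional[int] = None          # sequence number; None = not yet ordered


def order_vertical(first: Block) -> list[Block]:
    """Walk of steps 802-805; None plays the role of the Null Block."""
    seq, now = 1, first                  # step 802
    now.order = seq
    ordered = [now]
    while True:
        # Step 803: move left while the left neighbour is still unordered.
        while now.left is not None and now.left.order is None:
            now.left.right = now         # 803a: remember the way back
            now = now.left               # 803b
            seq += 1                     # 803c
            now.order = seq
            ordered.append(now)
        # Step 804: retrace to the right-hand end of the row.
        while now.right is not None:
            now.right.left = now         # 804a
            now = now.right              # 804b
        # Step 805: look for a block below, moving left when there is none.
        while now is not None:
            down = now.down
            if down is not None and down.order is None:   # added guard
                down.up = now
                now = down
                seq += 1
                now.order = seq
                ordered.append(now)
                break                    # back to step 803
            now = now.left               # 805a
        if now is None:                  # 805b: walked past the left edge
            return ordered


# Usage: a 2 x 2 layout whose first block is the top-right one.
tr, tl, br, bl = (Block(n) for n in ("TR", "TL", "BR", "BL"))
tr.left, tl.right, br.left, bl.right = tl, tr, bl, br
tr.down, br.up, tl.down, bl.up = br, tr, bl, tl
print([b.name for b in order_vertical(tr)])  # ['TR', 'TL', 'BR', 'BL']
```

Step 804 always returns to the right-hand end of the current row before step 805 looks downward. As a result, this sketch numbers each row from right to left and then drops to the next row from its right end.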


The line segmentation of step 901 proceeds as follows.

If a horizontal document block is being processed, the line segmentation steps are, in order:
(a) use a horizontal projection to find the width of each line;
(b) within the width of each line, perform a local projection in the vertical direction;
(c) approach each line from its left and right ends to obtain the actual length of each line in the horizontal block.

If a vertical document block is being processed, the method is largely the same, and the steps are, in order:
(a) use a vertical projection to find the width of each line;
(b) within the width of each line, perform a local projection in the horizontal direction;
(c) approach each line from its top and bottom ends to obtain the actual height of each line in the vertical block.

The character segmentation of body-text blocks in step 902 is divided into the segmentation of vertical body text and the segmentation of horizontal body text. For vertical body text, the present invention cuts characters using both a projection of the entire block and partial projections of each line. The segmentation of horizontal body text is similar, except that the projection directions change: horizontal body text is cut using vertical projections.

The character segmentation of title blocks in step 903 is divided into the segmentation of vertical title text and the segmentation of horizontal title text. For vertical title text, the present invention does not use a projection of the entire block. Instead, it uses a property of the partial projections, namely the line width. It also defines the characteristics of two kinds of quotation marks, so that quotation marks can be distinguished from ordinary characters. The title characters are then cut according to the width of each line. The segmentation of horizontal title text is similar to that of vertical title text, except that the directions change: the partial projections are replaced by vertical projections.
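For a horizontal block, step 901 and the horizontal body-text case of step 902 can be sketched with projection profiles as below. This is an illustrative sketch under several assumptions:

- The block is a binary image given as rows of 0/1 values.
- Any white gap of one pixel or more separates lines or characters.
- Ranges are returned as end-exclusive pixel indices.

The vertical-text variants and the quotation-mark handling of step 903 are not shown.

```python
def runs(profile):
    """(start, end) pairs, end exclusive, of the non-zero runs in a projection."""
    spans, start = [], None
    for i, v in enumerate(list(profile) + [0]):   # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    return spans


def segment_lines(block):
    """Step 901, horizontal block: (top, bottom, left, right) of each text line."""
    width = len(block[0])
    lines = []
    for top, bottom in runs([sum(row) for row in block]):           # (a) horizontal projection
        band = block[top:bottom]
        cols = [sum(row[x] for row in band) for x in range(width)]  # (b) local vertical projection
        xs = [x for x, v in enumerate(cols) if v]
        lines.append((top, bottom, xs[0], xs[-1] + 1))             # (c) close in from both ends
    return lines


def segment_characters(block, line):
    """Step 902, horizontal body text: cut one line at the gaps of its vertical projection."""
    top, bottom, left, right = line
    cols = [sum(block[y][x] for y in range(top, bottom)) for x in range(left, right)]
    return [(left + a, left + b) for a, b in runs(cols)]


block = [
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1],
]
lines = segment_lines(block)
print(lines)                                        # [(0, 2, 1, 5), (3, 4, 2, 7)]
print([segment_characters(block, ln) for ln in lines])  # [[(1, 3), (4, 5)], [(2, 3), (5, 7)]]
```

A plain projection cut like this would split a character whose components are separated by white columns. The text above does not say how such characters are handled, so the sketch does not attempt it.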

When all characters in the document blocks have been segmented and the order of the blocks has been determined, the present invention can rearrange the document. Two rearrangement methods are provided: manual rearrangement and automatic rearrangement.

For manual rearrangement, the system provides a spacing and layout dialog box in which the user can set the character spacing, line spacing, font size, margins, and so on. This improves the appearance of the rearranged layout and also lets users adjust the typesetting to their own preferences.

Automatic rearrangement spares the user from setting the spacing and layout. The selected blocks are rearranged so that they fill the chosen page as fully as possible, leaving few blank areas and producing a more attractive page. In automatic rearrangement, the present invention first lays out the characters of each block one by one, in the horizontal or vertical document direction described above. If neither of two predefined limits has been reached, the font is enlarged. The first limit is that the text must not exceed the document border. The second applies when picture blocks are to be displayed: a picture block may be reduced to at most 1/L of its original size, where L is a number greater than 1, and any further reduction also counts as exceeding a predefined limit.

FIG. 10 illustrates the steps of automatic document image rearrangement in FIG. 3. When the font is changed, other settings must also be adjusted to produce a well-balanced format. The present invention uses a parameter, count, to adjust these other settings.

Referring to FIG. 10, the steps of automatic document rearrangement are, in order:
Step 1001: set count to 1;
Step 1002: according to the character and line spacing settings, arrange all document blocks in the order found;
Step 1003: if picture blocks are to be displayed and picture blocks exist, arrange the picture blocks and proceed to step 1003a; otherwise, proceed to step 1004;

Step 1003a: if a picture is smaller than 1/L of the original picture, proceed to step 1005;
Step 1004: if the document border limit is not exceeded, perform the following steps 1004a to 1004c; otherwise, proceed to step 1005:
(1004a) increase the body-text font size by K1 pixels and the title font size by K2 pixels;
(1004b) if count divided by 2 leaves a remainder of 1, increase the character spacing by W1 pixels; if count divided by 2 leaves a remainder of 0, increase the line spacing by W2 pixels;
(1004c) set count = count + 1, and return to step 1002;
Step 1005: reduce the title font size by K2 pixels, and perform the following steps 1005a to 1005b:

(1005a) reduce the body-text font size by δ pixels, and rearrange all blocks;
(1005b) if the document border limit is exceeded, or a picture is smaller than 1/L of its original size, return to step 1005a; otherwise, end.

Here K1, K2, δ, W1, W2 and L are default constant values, with K2 > δ and L > 1.
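The automatic rearrangement loop of FIG. 10 can be sketched as follows. The actual typesetting is left to two caller-supplied checks:

- `overflows`: the document border is exceeded;
- `picture_too_small`: a picture falls below 1/L of its original size.

These names, the `Settings` fields, the toy page model and the constant values are all hypothetical. The sketch also adds a guard the text does not state: it stops shrinking the body font once the font is no larger than δ pixels.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Settings:
    body: int       # body-text font size, pixels
    title: int      # title font size, pixels
    char_gap: int   # character spacing, pixels
    line_gap: int   # line spacing, pixels


# Hypothetical default constants, with K2 > DELTA.
K1, K2, W1, W2, DELTA = 2, 3, 1, 2, 1


def auto_rearrange(s: Settings,
                   overflows: Callable[[Settings], bool],
                   picture_too_small: Callable[[Settings], bool] = lambda _: False) -> Settings:
    count = 1                                                    # step 1001
    # Steps 1002-1004: the callbacks stand in for laying out the blocks again.
    while not picture_too_small(s) and not overflows(s):         # 1003a, 1004
        s = replace(s, body=s.body + K1, title=s.title + K2)     # 1004a
        if count % 2 == 1:                                       # 1004b
            s = replace(s, char_gap=s.char_gap + W1)
        else:
            s = replace(s, line_gap=s.line_gap + W2)
        count += 1                                               # 1004c
    s = replace(s, title=s.title - K2)                           # step 1005
    while s.body > DELTA:                                        # added guard
        s = replace(s, body=s.body - DELTA)                      # 1005a
        if not (overflows(s) or picture_too_small(s)):           # 1005b
            return s
    return s


def toy_overflows(s: Settings, n_chars=120, page_w=100, page_h=60) -> bool:
    """Hypothetical check: one title line plus n_chars of horizontal body text."""
    per_line = max(1, page_w // (s.body + s.char_gap))
    n_lines = -(-n_chars // per_line)                            # ceiling division
    return s.title + n_lines * (s.body + s.line_gap) > page_h


print(auto_rearrange(Settings(body=4, title=6, char_gap=0, line_gap=0), toy_overflows))
# Settings(body=5, title=6, char_gap=1, line_gap=0)
```

On this toy page, one round of enlargement (body 6, title 9, character spacing 1) already overflows. Step 1005 then brings the title back to 6, and a single δ step shrinks the body to 5 pixels, which fits.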

Step 1004 adjusts the other settings, while step 1005 performs the final fine-tuning by shrinking the body text.

FIGS. 11(a) to 11(e) illustrate a preferred embodiment of the present invention. In this embodiment, the main units of the automatic document editing system decompose and recombine a full-color document image to obtain a rearranged result image:
- FIG. 11(a) is the full-color document image to be edited. It is about 4,500 KB to 5,000 KB in size, at 150 dpi, 1500 pixels high and 1000 pixels wide.
- FIG. 11(b) shows the result after processing by the document block extraction unit and the block classification unit.
- FIG. 11(c) shows the result of further processing FIG. 11(b) with the block order determination unit, which determines the reading order of the text blocks.
- FIG. 11(d) shows a horizontally typeset result image. It is obtained by further processing FIG. 11(c) with the line and character segmentation unit and the rearrangement unit, which rearrange the document according to various styles.
- FIG. 11(e) shows a vertically typeset result image obtained in the same way.

FIGS. 12(a) to 12(d) are interface diagrams of the automatic document editing system of the present invention. Some important menu commands are summarized below.

FIG. 12(a) lists the commands of the "File (F)" menu. Among them, the "Automatic Web Publishing" command automatically converts the edited image into an HTML file, so that the user can view the edited result with a World Wide Web browser.

FIG. 12(b) lists the commands of the "Manuscript (L)" menu. Among them, the "Automatically Remove Manuscript Background" command removes background noise and color. The "Color Harmonization" command reduces the colors of the image's manuscript layer in order to compress the file size.

FIG. 12(c) lists the commands of the "Clip and Rearrange (O)" menu. These commands mainly extract document blocks, by single-block cutting, single-block intelligent cutting, or multi-block intelligent cutting.

FIG. 12(d) lists the commands of the "Image and Drawing (I)" menu.

Among them, the "Block Image Color Reduction" command reduces the number of colors in a block in order to save storage space. The small window on the right of FIG. 12(d), called the gallery area, allows image blocks to be dragged in and taken out, and the blocks stored there can be managed and searched.

The automatic document editing system and method of the present invention can decompose and recombine color documents. In addition, the extracted text blocks may be regular rectangles or arbitrary irregular shapes. The above describes only preferred embodiments of the present invention and shall not be used to limit its scope. All equivalent changes and modifications made in accordance with the claims of the present invention remain within the scope covered by this patent.

Claims

1. An automatic document editing system, comprising: a document block extraction unit for cutting out each block in an original document image to be edited, so as to separate picture blocks from text blocks; a block classification unit for determining whether each block is a picture or text; a block order determination unit for determining the order of the text blocks; a line and character segmentation unit for finely segmenting each character; and a rearrangement unit for rearranging the document according to various styles to obtain a rearranged result image.
2. The automatic document editing system according to claim 1, wherein the system uses a central processing unit, a memory unit, and an operation interface to implement the automatic document editing functions of the units.
3. The automatic document editing system according to claim 2, wherein the automatic document editing functions of the units mainly include a file function, a manuscript function, a clip-and-rearrange function, and an image-and-drawing function.
4. The automatic document editing system according to claim 2, wherein the operation interface further includes a gallery area for dragging in and taking out edited image blocks, and for managing and searching the blocks stored in the gallery.
5. The automatic document editing system according to claim 2, wherein the central processing unit executes the functions of the units of the automatic document editing system, the memory unit serves as the required data storage space, and the functions of the operation interface allow the user to view, select, and easily operate the various automatic document editing functions.
6. The automatic document editing system according to claim 5, wherein the functions of the operation interface are presented through a graphical user interface comprising windows, dialog boxes, a main menu, and toolbars, together with hotkeys and macros.
7. The automatic document editing system according to claim 5, wherein the document to be edited is input into the automatic document editing system through an input unit.
8. The automatic document editing system according to claim 5, wherein the result of each document editing function is displayed directly on a display unit.
9. The automatic document editing system according to claim 5, wherein the result of each document editing function is output to a server that shares document information.


10. The automatic document editing system according to claim 5, wherein the result of each document editing function is printed directly on an output device.
11. An automatic document editing method, comprising the following steps: (a) an original image binarization step, which converts the original document image to be edited into a binary image; (b) a document block extraction step, which extracts document blocks from the binary image; (c) a block classification step, which determines whether each block is a picture or text and determines the direction and category of the extracted document blocks; (d) a block order determination step, which determines the order of the text blocks; (e) a line and character segmentation step, which finely segments each character; and (f) a document image rearrangement step, which rearranges the document according to various styles to obtain a rearranged result image.
12. The automatic document editing method according to claim 11, wherein step (a) further comprises the following steps: (a1) obtaining, according to the moment-preserving principle, suitable threshold values Rt, Gt and Bt for the red, green and blue primary color components, respectively; and (a2) using a conversion formula, according to the threshold values Rt, Gt and Bt, to divide the gray values of the original document image into a group above the threshold and another group below the threshold.
13. The automatic document editing method according to claim 11, wherein the document block extraction of step (b) further includes single-block cutting, single-block intelligent cutting, and multi-block intelligent cutting.
14. The automatic document editing method according to claim 11, wherein the preliminary classification of document blocks in step (c) includes the following five categories: (a) picture blocks, meaning images, pictures, and picture captions; (b) vertical title blocks, in which the text is arranged as a vertical title; (c) horizontal title blocks, in which the text is arranged as a horizontal title; (d) vertical body-text blocks, in which the text is arranged as vertical body text; and (e) horizontal body-text blocks, in which the text is arranged as horizontal body text.
15. The automatic document editing method according to claim 11, wherein the block classification of step (c) includes pre-processing of the document blocks and classification of the document blocks.
16. The automatic document editing method according to claim 11, wherein, when the block order of step (d) is determined, the document is processed either as a vertical document or as a horizontal document.
17. The automatic document editing method according to claim 11, wherein the line and character segmentation of step (e) includes a line segmentation step for non-picture blocks, a character segmentation step for body-text blocks, and a character segmentation step for title blocks.
18. The automatic document editing method according to claim 11, wherein the document image rearrangement of step (f) includes manual rearrangement and automatic rearrangement.
19. The automatic document editing method according to claim 13, wherein the single-block cutting extracts the block using an outer frame drawn directly by the user.

21. The automatic document editing method according to claim 13, wherein the multi-block intelligent cutting further comprises the following steps: (a) for a document image, taking its approximate outer frame; (b) determining the direction of each cut according to the maximum white segment length of the vertical projection and the maximum white segment length of the horizontal projection of the current block; (c) if the maximum white segment length of the vertical projection is greater than that of the horizontal projection, cutting the block vertically; otherwise, if the maximum white segment length of the horizontal projection is greater than that of the vertical projection, cutting the block horizontally; (d) after the block is cut, repeating steps (b) and (c) to finely cut it into smaller rectangular blocks; and (e) using image similarity to shrink the outer frame of each rectangular block to the smallest irregular frame.
22. The automatic document editing method according to claim 15, wherein the pre-processing of the document blocks further comprises the following steps: (a) finding the two corner points of the document block, (left, bottom) and (right, top); (b) removing isolated black points from the block; (c) projecting the document block vertically to find the average white segment length and the average black segment length of the vertical projection; (d) projecting the document block horizontally to find the average white segment length and the average black segment length of the horizontal projection; and (e) calculating the proportion of black points in the document block after binarization.
23. The automatic document editing method according to claim 22, wherein the classification of the document blocks covers the following cases: (a) if (the proportion of black points in the document block > C1) or (the average white segment length of the vertical projection = 0 and the average white segment length of the horizontal projection = 0), the block is set as a picture block; (b) if the average white segment length of the horizontal projection > the average white segment length of the vertical projection, the block is set as a horizontal body-text block; otherwise, it is set as a vertical body-text block; (c) if (the width of the block / the average black segment length of the vertical projection) ≤ C2 and (the height of the block / the width of the block) ≥ C3, the block is set as a vertical title block; (d) if (the height of the block / the average black segment length of the horizontal projection) ≤ C2 and (the width of the block / the height of the block) ≥ C3, the block is set as a horizontal title block; where C1 is a predetermined constant in the range 0.5 to 1, C2 is a predetermined constant below 10, and C3 is a predetermined constant greater than 1.

24. The automatic document editing method according to claim 16, wherein determining the order of vertical document blocks comprises the following steps: (a) finding the four neighboring blocks Up_blk, Down_blk, Left_blk and Right_blk of each block; (b) finding the block with the first sequence number, setting it as Now_blk, and setting the sequence number to one; (c) if Now_blk has a Left_blk that has not yet been ordered, performing the following three steps (c1) to (c3) in order; otherwise, going to step (d): (c1) setting the Right_blk of the Left_blk to Now_blk, (c2) setting Now_blk to the Left_blk of Now_blk, and (c3) increasing the sequence number by one and returning to step (c); (d) if Now_blk has a Right_blk, performing the following two steps (d1) and (d2) in order; otherwise, going to step (e): (d1) setting the Left_blk of the Right_blk to Now_blk, and (d2) setting Now_blk to the Right_blk of Now_blk and returning to step (d); (e) if the Down_blk of Now_blk is a Null Block, performing the following steps (e1) and (e2) in order; otherwise, setting the Up_blk of the Down_blk to Now_blk, setting Now_blk to the Down_blk of Now_blk, increasing the sequence number by one, and returning to step (c): (e1) setting Now_blk to the Left_blk of Now_blk, and (e2) if Now_blk is a Null Block, ending; otherwise, returning to step (e).
25. The automatic document editing method according to claim 17, wherein, in the line segmentation step, if a horizontal document block is processed, the line segmentation comprises the following three steps (e1) to (e3) in order: (e1) using a horizontal projection to find the width of each line; (e2) performing a local projection in the vertical direction within the width of each line; and (e3) approaching each line from its left and right ends to obtain the actual length of each line in the horizontal document block; and if a vertical document block is processed, the line segmentation comprises the following three steps (e11) to (e31) in order: (e11) using a vertical projection to find the width of each line; (e21) performing a local projection in the horizontal direction within the width of each line; and (e31) approaching each line from its top and bottom ends to obtain the actual height of each line in the vertical document block.
26. The automatic document editing method according to claim 17, wherein the character segmentation of body-text blocks is divided into the segmentation of vertical body text and the segmentation of horizontal body text.
27. The automatic document editing method according to claim 26, wherein the segmentation of vertical body text cuts characters using a projection of the entire block and a partial projection of each line, and the segmentation of horizontal body text cuts characters using vertical projections.
28. The automatic document editing method according to claim 17, wherein the character segmentation of title blocks is divided into the segmentation of vertical title text and the segmentation of horizontal title text.
29. The automatic document editing method according to claim 28, wherein the segmentation of vertical title text uses a property of partial projections, namely the line width, and defines the characteristics of two kinds of quotation marks to distinguish quotation marks from ordinary characters, so that the title characters are cut according to the width of each line; and the segmentation of horizontal title text cuts the title characters using vertical projections.
30. The automatic document editing method according to claim 18, wherein the automatic rearrangement comprises the following steps: (f1) setting the initial value of a parameter count to 1; (f2) arranging the blocks in the order found, according to the character and line spacing settings; (f3) if picture blocks are to be displayed and picture blocks exist, arranging the picture blocks and proceeding to step (f3a); otherwise, going to step (f4); (f3a) if a picture is smaller than 1/L of the original picture, going to step (f5); (f4) if the document border limit is not exceeded, performing the following steps (f4a) to (f4c); otherwise, going to step (f5): (f4a) increasing the body-text font size by K1 pixels and the title font size by K2 pixels, (f4b) if count divided by 2 leaves a remainder of 1, increasing the character spacing by W1, and if count divided by 2 leaves a remainder of 0, increasing the line spacing by W2, and (f4c) increasing count by 1 and returning to step (f2); (f5) reducing the title font size by K2 and performing the following steps (f5a) to (f5b): (f5a) reducing the body-text font size and rearranging all blocks, and (f5b) if the document border limit is exceeded or a picture is smaller than 1/L, going to step (f5a); otherwise, ending; where L > 1.
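The recursive cutting recited in claim 21, steps (a) to (d), can be sketched in Python as below. The sketch makes several assumptions:

- The page is a binary image given as rows of 0/1 values.
- Each cut is made at the start of the longest white gap.
- A tie between the two directions gives a horizontal cut.
- Recursion stops when neither projection contains an interior white gap.

The cut position, tie rule and stopping rule are not stated in the claim. Step (e), shrinking each frame to an irregular outline, is not shown.

```python
def bbox(img, top, bottom, left, right):
    """Step (a): tight frame (end-exclusive) of the black points in a region, or None."""
    rows = [y for y in range(top, bottom) if any(img[y][left:right])]
    cols = [x for x in range(left, right) if any(img[y][x] for y in range(top, bottom))]
    return (rows[0], rows[-1] + 1, cols[0], cols[-1] + 1) if rows else None


def longest_gap(profile):
    """(length, offset) of the first longest interior run of zeros in a projection."""
    best, start = (0, -1), None
    for i, v in enumerate(profile):
        if v == 0 and start is None:
            start = i
        elif v and start is not None:
            if i - start > best[0]:
                best = (i - start, start)
            start = None
    return best


def xy_cut(img, region=None):
    """Steps (b)-(d): split along the direction whose projection has the longer white gap."""
    if region is None:
        region = (0, len(img), 0, len(img[0]))
    box = bbox(img, *region)
    if box is None:
        return []
    t, b, l, r = box
    v_gap, v_at = longest_gap([sum(img[y][x] for y in range(t, b)) for x in range(l, r)])
    h_gap, h_at = longest_gap([sum(img[y][l:r]) for y in range(t, b)])
    if v_gap == 0 and h_gap == 0:
        return [box]                                  # no white gap left: a final block
    if v_gap > h_gap:                                 # (c) vertical cut: left | right
        cut = l + v_at
        return xy_cut(img, (t, b, l, cut)) + xy_cut(img, (t, b, cut, r))
    cut = t + h_at                                    # (c) horizontal cut: top / bottom
    return xy_cut(img, (t, cut, l, r)) + xy_cut(img, (cut, b, l, r))


page = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
print(xy_cut(page))  # [(0, 2, 0, 2), (0, 2, 5, 7), (3, 4, 0, 7)]
```

Each tuple is (top, bottom, left, right). In the example, the first cut is horizontal because the vertical projection of the whole page has no white gap. The upper band is then cut vertically at its three-column gap.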
