AU2007249098B2 - Method of multi-level decomposition for colour document layout analysis - Google Patents

Method of multi-level decomposition for colour document layout analysis

Info

Publication number
AU2007249098B2
AU2007249098B2 AU2007249098A AU2007249098A AU2007249098B2 AU 2007249098 B2 AU2007249098 B2 AU 2007249098B2 AU 2007249098 A AU2007249098 A AU 2007249098A AU 2007249098 A AU2007249098 A AU 2007249098A AU 2007249098 B2 AU2007249098 B2 AU 2007249098B2
Authority
AU
Australia
Prior art keywords
tile
colour
document
macroregion
dominant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2007249098A
Other versions
AU2007249098A1 (en
Inventor
Yu-Ling Chen
Steven Richard Irrgang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to AU2007249098A
Priority to US12/327,247 (US8532374B2)
Publication of AU2007249098A1
Application granted
Publication of AU2007249098B2
Legal status: Ceased
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/162 Segmentation; Edge detection involving graph-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)
  • Image Processing (AREA)

Description

S&F Ref: 831910

AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Yu-Ling Chen, Steven Richard Irrgang
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Method of multi-level decomposition for colour document layout analysis

The following statement is a full description of this invention, including the best method of performing it known to me/us:

METHOD OF MULTI-LEVEL DECOMPOSITION FOR COLOUR DOCUMENT LAYOUT ANALYSIS

TECHNICAL FIELD

The current invention relates to document layout analysis and segmentation and, in particular, to document object representation.

BACKGROUND ART

The proliferation of scanning technology combined with ever increasing computational processing power has led to many advances in the area of document analysis. Document analysis systems may be used to extract semantic information from a scanned document, often by means of optical character recognition (OCR) technology. The scanning of a document results in the formation of a single layered image. Document analysis is used in a growing number of applications such as automated form reading. Such systems can also be used to improve compression of a document by selectively using a compression method appropriate to the particular content of each part of a page of the document. Improved document compression lends itself to applications such as archiving and electronic distribution.

Document layout analysis applications such as OCR, automatic form processing and other scan-to-high-level document processes require a segmentation step to decompose a document image from its raw pixel representation into a more structured format prior to the actual page layout analysis.
This segmentation step dominates the overall speed and accuracy of these applications. Many existing applications employ a black and white segmentation, in which the document image is binarised into a binary image consisting of black and white pixels. Regions are then formed by connected groups of black or white pixels. While binarisation is an efficient technique, it suffers from an inability to distinguish and isolate adjoining colours of similar luminance. Furthermore, it throws away much useful colour information during the process. For complex documents with multi-colour foreground and background objects, binarisation is clearly inadequate. Thus, a colour segmentation method is required.

Effective colour segmentation is a challenging problem. This is especially so for scanned images, because scanning and printing artefacts can pose serious problems for identifying perceptually uniform colour regions. A perceptually uniform colour region in a digital image is a group of connected pixels (or pixels in close proximity) that a human observer interprets as semantically related. For example, the pixels that make up an alphabetic character in a document image appear the same colour to the reader. However, on closer inspection, the number of colours is usually far higher because of printing and scanning artefacts such as halftone, bleeding, and noise. The challenge is to satisfy the competing requirements of remaining stable to local colour fluctuations due to noise, in what would otherwise be a unitary colour structure in the source image, whilst remaining sensitive to genuine changes, such as a change from a white background to another light coloured background, or smaller text on a non-constant coloured background.

Page decomposition is a form of colour segmentation that is specifically targeted at document image analysis.
In addition to colour information, it uses knowledge of document layout structure, and text and non-text characteristics extensively to aid the segmentation process.

There are two main approaches to full colour page decomposition: bottom-up and top-down. The bottom-up approach examines each pixel in turn and groups adjoining pixels of similar colour values to form connected components. This method has the advantage of being efficient; however it is highly sensitive to noise and colour fluctuations because of its lack of context information. Thus, it tends to produce a large number of erroneous connected components, resulting in fragmentation. In contrast, the top-down approach partitions a page into non-overlapping blocks. Each block is analysed and given a label using local features and statistics extracted at its full or a reduced resolution. This approach only provides a coarse segmentation into regions of interest, e.g., Block A is likely to be text; and Block B is likely to be line-art. Pixel level segmentation can be achieved by further processing these regions of interest, but an additional pass is both slow in software implementations and expensive in a hardware implementation. With a single label per block, this approach is unsuitable for complex images where regions may consist of a number of different document content types, e.g., text over line-art.

There is a need for a colour segmentation technique that can decompose a document image into document object representations that can represent complex document contents with pixel level accuracy, and at the same time takes into account local context information that can be used to distinguish genuine changes from noise.

SUMMARY

Specifically disclosed is a method of generating a multi-layered document representation for classifying content components of a document from a single layered image.
In accordance with one aspect of the present disclosure there is provided a method of generating a multi-layered document representation of a document from an image of the document, the method comprising the steps of:

(a) converting each of a plurality of tiles of predetermined size of said image into a representation having a plurality of layers, the representation corresponding to at least one of said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and

(b) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating a multi-layered document representation.

Other aspects are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

At least one embodiment of the present invention will now be described with reference to the drawings, in which:

Fig. 1 is a flowchart illustrating a method for decomposing a colour document into a multi-layered representation;
Fig. 2 is a flowchart illustrating a method for macroregion generation;
Figs. 3(a) and 3(b) illustrate macroregions of a typical document;
Figs. 4(a) and 4(b) depict macroregion generation at tile level;
Fig. 5 is a flowchart illustrating a method for finding a merging macroregion candidate;
Fig. 6 is a flowchart of a method for comparing merging macroregion candidates;
Fig. 7 is a schematic block diagram of a general purpose computer upon which arrangements described can be practised; and
Fig. 8 illustrates the dominant colour comparison between adjacent tiles.

DETAILED DESCRIPTION INCLUDING BEST MODE

Digitising a paper document to an electronic form suitable for efficient storage, retrieval, and interpretation is a challenging problem. An efficient document object representation is needed to solve this problem. Page decomposition is an important stage in representing document images obtained by scanning.
The performance of a document analysis system depends greatly on the correctness of the page decomposition stage. Presently disclosed is a technique of colour page decomposition for generating a multi-layered document representation from a single layered document image. This representation is capable of representing complex document images such as posters and brochures, where text on image and multi-coloured background is common. Specifically, the original document may be considered a superposition of each of the layers of the multi-layered document.

The colour page decomposition methods presently disclosed may be implemented using a computer system 700, such as that shown in Fig. 7, wherein the processes of Figs. 1 to 6 may be implemented as software, such as one or more application programs executable within the computer system 700. In particular, the steps of the colour page decomposition methods are effected by instructions in the software that are carried out within the computer system 700. The instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the colour page decomposition methods and a second part and the corresponding code modules manage a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 700 from the computer readable medium, and then executed by the computer system 700. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 700 preferably effects an advantageous apparatus for colour page decomposition.

As seen in Fig.
7, the computer system 700 is formed by a computer module 701, input devices such as a keyboard 702, a mouse pointer device 703 and scanner 718, and output devices including a printer 715, a display device 714 and loudspeakers 717. An external Modulator-Demodulator (Modem) transceiver device 716 may be used by the computer module 701 for communicating to and from a communications network 720 via a connection 721. The network 720 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 721 is a telephone line, the modem 716 may be a traditional "dial-up" modem. Alternatively, where the connection 721 is a high capacity (e.g. cable) connection, the modem 716 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 720.

The computer module 701 typically includes at least one processor unit 705, and a memory unit 706, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 701 also includes a number of input/output (I/O) interfaces including an audio-video interface 707 that couples to the video display 714 and loudspeakers 717, an I/O interface 713 for the keyboard 702 and mouse 703 and optionally a joystick (not illustrated), and an interface 708 for the external modem 716, scanner 718 and printer 715. In some implementations, the modem 716 may be incorporated within the computer module 701, for example within the interface 708. The computer module 701 also has a local network interface 711 which, via a connection 723, permits coupling of the computer system 700 to a local computer network 722, known as a Local Area Network (LAN). As also illustrated, the local network 722 may also couple to the wide-area network 720 via a connection 724, which would typically include a so-called "firewall" device or similar functionality. The interface 711
The interface 711 1064869_1 831910_spec_02 -7 may be formed by an EthernetTM circuit card, a wireless Bluetoothi" or an IEEE 802.11 wireless arrangement. The interfaces 708 and 713 may afford both serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards 5 and having corresponding USB connectors (not illustrated). Storage devices 709 are provided and typically include a hard disk drive (HDD) 710. Other devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 712 is typically provided to act as a non-volatile source of data. Portable memory devices, such optical disks (eg: CD-ROM, DVD), USB-RAM, and floppy disks 10 for example may then be used as appropriate sources of data to the system 700. The components 705 to 713 of the computer module 701 typically communicate via an interconnected bus 704 and in a manner which results in a conventional mode of operation of the computer system 700 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PC's and 15 compatibles, Sun Sparcstations, Apple MacTM or alike computer systems evolved therefrom. The scanner 718 may be used to scan pages of documents to provide a scanned image to the computer module 701, for storage in the HDD 710, for example. That scanned image may then be subject to image analysis and other processing to perform the 20 colour page decomposition tasks. Scanned images may also be sourced from the networks 720 and 722, for example. Typically, the application programs discussed above are resident on the hard disk drive 710 and read and controlled in execution by the processor 705. Intermediate storage of such programs and any data fetched from the networks 720 and 722 may be 1064869_1 831910_spec_02 -8 accomplished using the semiconductor memory 706, possibly in concert with the hard disk drive 710. 
In some instances, the application programs may be supplied to the user encoded on one or more CD-ROMs and read via the corresponding drive 712, or alternatively may be read by the user from the networks 720 or 722. Still further, the software can also be loaded into the computer system 700 from other computer readable media. Computer readable media refers to any storage medium that participates in providing instructions and/or data to the computer system 700 for execution and/or processing. Examples of such media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 701. Examples of computer readable transmission media that may also participate in the provision of instructions and/or data include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

The second part of the application programs and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 714. Through manipulation of the keyboard 702 and the mouse 703, a user of the computer system 700 and the application may manipulate the interface to provide controlling commands and/or input to the applications associated with the GUI(s).

One or more of the methods of colour page decomposition may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of colour page decomposition. Such dedicated hardware may also include one or more microprocessors and associated memories.

Fig.
1 is a flowchart of a method 100 for decomposing a coloured image into a multi-layered representation. The process in Fig. 1 employs a loop structure beginning in step 120, where an input image 110, preferably an RGB image at a resolution of 300dpi, is partitioned into non-overlapping, uniform sized tiles, preferably of size 32 x 32 pixels. The tiles are preferably processed in raster order, that is, from left to right and top to bottom. The first tile to be processed is the top-left tile of the input image, and the last tile to be processed is the bottom-right tile of the input image. This form of tiling is used for efficiency purposes. Alternatively, overlapping and non-fixed size tiles may be used. The tiles may alternatively be referred to as blocks.

Steps 130 to 150 form a loop which processes each of the tiles. Step 130 receives tile data, preferably in tile raster order, and every tile undergoes a detailed pixel colour analysis to generate dominant colours and tile statistics. A tile of size 32 x 32 has 1024 pixels, giving it a possible maximum of 1024 distinct colours. However, at 300dpi, a tile is a very small area, and can typically be represented by a limited number of colours. These colours are representative of the colours of the pixels in the tile and are referred to as the dominant colours of the tile.

Pixels with the same dominant colour may or may not be connected. This can be seen in Fig. 4(a), where two tiles 401 and 402 are shown side-by-side and which have dominant colours 403, 404 and 405. It is seen that the colour 405 represents one "connected" region, whereas each of the colours 403 and 404 has two connected regions, 406, 407 and 408, 409 respectively. Useful tile statistics and other information such as pixel count, edge ratios, bitmap, and shared boundary pixel counts for each dominant colour may be extracted in step 130. Other statistics such as colour variance may also be calculated.
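
The tiling of step 120 and a much simplified form of the dominant colour detection of step 130 can be sketched as follows. This is a minimal illustration only: the function names, the uniform per-channel quantisation and the `step` parameter are assumptions introduced here, and stand in for the fuller pipeline (colour conversion, noise filtering, enhancement, quantisation and neighbourhood analysis) that the description leaves to techniques known in the art.

```python
from collections import Counter

TILE = 32  # preferred tile size from the description (32 x 32 pixels)


def tiles_in_raster_order(width, height, tile=TILE):
    """Yield (x0, y0, x1, y1) tile boxes left to right, then top to bottom."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            # Clip the last column/row of tiles to the image boundary.
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def dominant_colours(pixels, max_colours=4, step=32):
    """Assumed stand-in for step 130.

    Quantise each RGB channel down to a multiple of `step`, count the
    pixels falling in each quantised colour, and keep the most frequent
    ones. Returns a list of (colour, pixel_count) pairs.
    """
    bins = Counter(tuple((c // step) * step for c in p) for p in pixels)
    return bins.most_common(max_colours)


if __name__ == "__main__":
    for box in tiles_in_raster_order(64, 40):
        print(box)
    sample = [(255, 255, 255)] * 5 + [(250, 251, 252)] * 3 + [(0, 0, 0)] * 2
    print(dominant_colours(sample))
```

The default of four colours per tile follows the observation in the description that most tiles can be represented by four colours or fewer; the pixel count returned with each colour is one of the tile statistics mentioned above.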
Thus, a reference to a dominant colour refers to the representative colour of a set of pixels, and its associated statistics. The pixel colour analysis step 130 may be implemented in a number of different ways. It may involve a number of image processing operations. A typical implementation may include the following processes: colour conversion, noise filtering, image enhancement, colour quantisation, dominant colour detection, and neighbourhood analysis, as known in the art.

It has been determined through numerical experiments that a majority of the tiles in a document image with foreground information can be reliably represented by four colours or fewer. Tiles may also be represented by more than four colours.

Step 140 operates to generate macroregions. A macroregion in the context of the present description is a document structural object that encompasses a group of dominant colours with similar characteristics in close proximity. The grouping represents a region of semantically related coloured segments, such as a group of smaller regions, typically spread across a number of tiles. Figs. 3(a) and 3(b) provide illustrations of macroregions on a typical document. In Fig. 3(a), a scanned document forming a single layered image 300 is shown composed of four types of objects, being an overall (white) background 310, a local (coloured) background 320, lines of (black) text 330 formed as two paragraphs 332 and 334, and an image/graphical object 340. Fig. 3(b) shows how these four types of objects may be grouped into multiple semantically coherent regions or macroregions that collectively form a multi-layered representation 399. It can be seen that each object type forms at least one macroregion. The background 310 forms a macroregion 370, the image 340 forms a macroregion 380, and the local background 320 forms a macroregion 390. The text lines 330 produce two macroregions 350 and 360 due to the
The text lines 330 produce two macroregions 350 and 360 due to the 1064869_1 831910_spec_02 -11 significant gap between the two paragraphs 332 and 334. Although not accurately depicted in Fig. 3(b) the layer 370 has cut-outs sized and shaped to accommodate the overlying macroregions 380 and 390, and further, the macroregion 390 has cut-out corresponding to the outlines of the particular text characters present in the layers 350 and 360. As a 5 consequence, when the various macroregions 350-390 are superimposed as their layers they collectively represent the image 300. Note that the layering described here is not the same as layered objects in a graphical object rendering system where each object may have its own "z-level". In this description, the layers are for representative purposes to illustrate how the various macroregions superimpose. 10 Fig. 4(b) is an illustration of macroregions at tile level. In Fig. 4(a), the left tile 401 contains two dominant colours 403 and 405, and the right tile has three dominant colours 403, 404 and 405. Each dominant colour may be associated with a number of segments. For the purpose of macroregion generation, segments of the same dominant colour are treated as a single entity, whose combined statistics are used for determining 15 merging decisions. Thus any reference to a dominant colour of a tile refers to all the tile segments with the same dominant colour and its associated statistics. In this example, dominant colour segments with similar tile statistics are filled with the same pattern. These segments merge across the tile border, based on their statistics, to form three macroregions as shown in Fig. 4(b), corresponding to the dominant colours 403, 404 and 405. It can be 20 seen that the left tile 401 belongs to two macroregions, while the right tile 402 is part of three macroregions. 
A macroregion or a memory record of a macroregion may include the following data features: an average colour, bounding box coordinates, a binary mask, the number of tiles, the number of pixels, the number of neighbouring macroregions, pointers to those neighbours, and various statistics derived from edge, colour and contrast information within the macroregion.

Processing in step 140 begins by receiving tile dominant colours and statistics in tile raster order, whereupon each dominant colour is either merged into an existing macroregion or converted to a new macroregion. Details of this macroregion generation process 140 will be explained further with reference to Fig. 2 below. Step 150 tests if any more tiles remain to be processed. If so, the method 100 returns to step 130 to perform pixel analysis on the next tile. Where there are no more tiles, the method 100 ends and the resulting macroregions form a multi-layered representation 160 of the input image 110.

By decomposing the document image 110 into the multi-level overlapping document object representation 160, it is possible to satisfy the conflicting requirements of remaining stable to local colour fluctuations due to various kinds of undesirable noise, and remaining sensitive to genuine changes in the document. This multi-layered representation 160 may be further processed by a classification stage 170 to provide semantic labels to the macroregions. These labels can be used in a variety of applications 180 to generate a final output 190. Some typical examples of the applications 180 include document compression, OCR, and vectorisation.

The macroregion generation step 140 is further expanded upon in Fig. 2. A macroregion is formed by merging dominant colours with similar tile statistics in adjacent tiles. The purpose of this step is to find suitable neighbouring macroregions for the current dominant colour to merge with.
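
A macroregion record of the kind listed above might be held in memory as sketched below. The class name, field names and the pixel-weighted update of the average colour are assumptions made for illustration; the binary mask and the edge and contrast statistics are omitted for brevity, and the specification does not prescribe how the average colour is maintained.

```python
from dataclasses import dataclass, field


@dataclass
class Macroregion:
    """Hypothetical in-memory record for one macroregion."""
    average_colour: tuple   # running mean colour, e.g. (R, G, B)
    bbox: tuple             # bounding box (x0, y0, x1, y1) in pixels
    num_pixels: int
    num_tiles: int = 1
    neighbours: set = field(default_factory=set)  # ids of neighbouring macroregions

    def merge_dominant_colour(self, colour, pixel_count, tile_box):
        """Fold one dominant colour of an adjacent tile into this record."""
        total = self.num_pixels + pixel_count
        # Assumed update rule: pixel-weighted mean of the two colours.
        self.average_colour = tuple(
            (a * self.num_pixels + c * pixel_count) / total
            for a, c in zip(self.average_colour, colour)
        )
        # Grow the bounding box to cover the newly merged tile.
        x0, y0, x1, y1 = self.bbox
        tx0, ty0, tx1, ty1 = tile_box
        self.bbox = (min(x0, tx0), min(y0, ty0), max(x1, tx1), max(y1, ty1))
        self.num_pixels = total
        self.num_tiles += 1


if __name__ == "__main__":
    region = Macroregion((0.0, 0.0, 0.0), (0, 0, 32, 32), num_pixels=100)
    region.merge_dominant_colour((100, 100, 100), 100, (32, 0, 64, 32))
    print(region)
```

Because a tile may belong to several macroregions, a record like this is kept per macroregion rather than per tile.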
If there are no suitable neighbouring macroregions, a new macroregion is formed using the current dominant colour. In tile raster order processing, with the exception of the first raster row of tiles, at any instance there are two adjacent previously considered tiles to the current tile location: one from above and one from the left. It is important to note that a tile may belong to more than one macroregion. In the preferred implementation, the first tile in raster order does not merge, and in the first row the current tile can merge with the left adjacent tile.

The process 140 in Fig. 2 begins with the current tile data and statistics as an input 210. Step 220 commences a loop that operates for each dominant colour in the current tile. With a dominant colour, step 225 then obtains an adjacent tile. This adjacent tile can be either from above or from the left. In step 230, the most suitable macroregion for merging from the adjacent tile is chosen as a "best match macroregion". The process of step 230 of determining the best match macroregion is further described below with reference to Fig. 5. A test is then performed in step 240 on the best match macroregion to further determine its suitability for merging. This best match macroregion is stored in step 250 if it passes the test, otherwise that macroregion is ignored. The storage in step 250 creates a list of candidate macroregions. Processing continues at decision step 255, which checks whether all adjacent tiles have been processed. If not, control returns to step 225 and the remaining adjacent tiles are processed.

The processes described provide for the merging of macroregions of adjacent (above and left) tiles. Depending upon the type of implementation, merging may be performed with other connected tiles such as diagonal tiles.
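
The candidate-gathering loop of steps 225 to 255 can be sketched as follows, assuming tiles are addressed by (row, column) indices. The function names are hypothetical, and the best-match search (step 230) and suitability test (step 240) are passed in as callables because their details are given elsewhere in the description.

```python
def previously_processed_neighbours(row, col):
    """Return (row, col) of the above and left tiles that exist.

    In raster order these are the only adjacent tiles already processed:
    the first tile has none, and tiles in the first row have only a left
    neighbour.
    """
    neighbours = []
    if row > 0:
        neighbours.append((row - 1, col))  # tile above
    if col > 0:
        neighbours.append((row, col - 1))  # tile to the left
    return neighbours


def gather_candidates(adjacent_tiles, best_match, is_suitable):
    """Build the candidate list of step 250 for one dominant colour."""
    candidates = []
    for tile in adjacent_tiles:                       # steps 225 and 255
        match = best_match(tile)                      # step 230
        if match is not None and is_suitable(match):  # step 240
            candidates.append(match)                  # step 250
    return candidates


if __name__ == "__main__":
    print(previously_processed_neighbours(0, 0))
    print(previously_processed_neighbours(0, 3))
    print(previously_processed_neighbours(2, 3))
```

Extending `previously_processed_neighbours` to return diagonal tiles as well would correspond to the alternative mentioned above.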
Once all the adjacent tiles have been processed, the potential merging candidate macroregions are compared in step 260 in order to select or consolidate the merging candidates into a final merging candidate. Step 260 is described in detail below with reference to Fig. 6. A check at decision 265 is performed to determine if the candidate list, formed from step 250, is empty. If the list is empty, a new macroregion is formed using the current dominant colour and statistics in step 280. Otherwise the current dominant colour is merged with the final candidate in step 270. Step 290 tests for a further dominant colour within the tile and, if such exists, returns to step 220 to re-commence the loop. This process is repeated for each of the dominant colours within the current tile.

Fig. 5 illustrates step 230 in greater detail. The purpose of this step is to find the best matching candidate from the neighbouring macroregions. The process 230 takes the dominant colour statistics and the macroregions 510 associated with the adjacent tile as an input. An adaptive threshold for minimum colour distance is determined in step 520. The adaptive threshold is made looser for dominant colours with few pixels. It is also adjusted according to the edge ratio and tile border pixel count of the dominant colour segments. The threshold is determined from the colour and the geometry of the distance between the candidates from adjacent tiles. In step 530 each adjacent macroregion is selected in turn. The colour distance between the current dominant colour and this selected macroregion is then calculated in step 540. This distance may be calculated using the Euclidean (direct) distance or the city block (Manhattan) distance.

A check in step 550 is performed to determine if the determined colour distance is closer than the current closest distance.
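
The two colour distance measures named for step 540 can be written as below. The sketch assumes colours are equal-length tuples of channel values (for example RGB); the description does not fix the colour space, and the function names are chosen here for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two colours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block_distance(a, b):
    """Manhattan distance: sum of absolute per-channel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(euclidean_distance((0, 0, 0), (3, 4, 0)))   # 5.0
    print(city_block_distance((0, 0, 0), (3, 4, 0)))  # 7
```

The city block distance avoids the squaring and square root, which may matter in the dedicated hardware implementations contemplated earlier, at the cost of weighting diagonal colour differences more heavily than the Euclidean measure.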
The closest distance in step 550 starts with the threshold decided in step 520 and may be replaced with a smaller distance stored in step 560. If the check of step 550 succeeds (Yes), this macroregion and its colour distance are stored in step 560 and the process 230 moves to step 570. Otherwise processing continues directly at decision step 570 to test for more macroregions. If such exist, control returns to step 530 until there are no more macroregions to process.

Fig. 6 illustrates step 260 in greater detail, which involves a number of merging criteria. Processing begins with the dominant colour and a list of macroregion candidates 610 being received as input. A check is performed in decision step 620, as a first criterion, to determine if the number of candidates is greater than one. If the answer is No, the processing of step 260 terminates. Otherwise, processing moves to step 630, where the candidate from the left adjacent tile is checked against the candidate from the above adjacent tile. If the two candidates are the same, forming a second criterion, one of the candidates is removed from the candidate list in step 635, and the process 260 again terminates. Otherwise the colour distance between the two candidates is determined in step 640. This distance is preferably calculated using the Euclidean distance. Other distance metrics such as city block distance may also be used. If the colour distance is below a predetermined threshold, as determined in step 650 and representing a third criterion, the two candidates are merged in step 655, and processing again ends. Otherwise, the candidate with the closer colour distance to the dominant colour is chosen as a final merging candidate, while the other candidate is ignored.

The process performed by Fig. 6 is depicted in Fig. 8 for a current tile and its two adjacent (left and above) tiles. Each tile may have up to four dominant colours.
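
The best-match search of Fig. 5 and the candidate consolidation of Fig. 6 can be sketched together as follows. This is an illustrative reading of the flowcharts, not a definitive implementation: the function names, the representation of a candidate as a (macroregion id, average colour) pair, the fixed threshold arguments and the convention of returning the surviving macroregion ids are all assumptions made here. In particular, the adaptive threshold of step 520 is simply passed in as a number.

```python
import math


def colour_distance(a, b):
    """Euclidean colour distance (the preferred metric in the description)."""
    return math.dist(a, b)  # available from Python 3.8


def best_match(colour, macroregion_colours, threshold):
    """Fig. 5: index and distance of the closest macroregion colour.

    The running closest distance starts at the threshold (step 520), so a
    macroregion is only accepted if it is strictly closer than that.
    Returns None when no macroregion qualifies.
    """
    best, closest = None, threshold
    for i, candidate in enumerate(macroregion_colours):  # steps 530 and 570
        d = colour_distance(colour, candidate)           # step 540
        if d < closest:                                  # step 550
            best, closest = i, d                         # step 560
    return None if best is None else (best, closest)


def consolidate(colour, candidates, merge_threshold):
    """Fig. 6: reduce at most two (id, average_colour) candidates.

    Returns the ids the dominant colour should merge into. Two ids mean
    the two candidate macroregions are themselves merged (step 655).
    """
    if len(candidates) <= 1:                               # step 620
        return [c[0] for c in candidates]
    (id_a, col_a), (id_b, col_b) = candidates
    if id_a == id_b:                                       # steps 630 and 635
        return [id_a]
    if colour_distance(col_a, col_b) < merge_threshold:    # steps 640 to 655
        return [id_a, id_b]
    # Otherwise keep only the candidate nearer to the dominant colour.
    nearer = min(candidates, key=lambda c: colour_distance(colour, c[1]))
    return [nearer[0]]


if __name__ == "__main__":
    print(best_match((10, 10, 10), [(200, 200, 200), (12, 10, 10), (10, 10, 11)], 20.0))
    print(consolidate((0, 0, 0), [(1, (100, 0, 0)), (2, (0, 5, 0))], 10))
```

An empty list returned by `consolidate` corresponds to the empty candidate list of decision 265, in which case a new macroregion would be created from the dominant colour (step 280).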
For each dominant colour in the current tile, a distance measure is made with each dominant colour in each of the adjacent tiles. This is illustrated for one dominant colour in the current tile, Cc1. From that the two smallest (shortest) distances are identified, in this case Cl1 and Ca2. The geometry of macroregions Cl1 and Ca2 decides the threshold, and a comparison between the two shortest distances determines whether colour Cc1 is merged with Cl1, with Ca2, or with neither.

The net result of the process 100 described above is a multi-layered representation of a single layered image in which each layer represents a segment of the original image. By distinguishing between different segments of the source image, further processing may be performed on the layers or their respective content.

Industrial Applicability

It is apparent from the above that the arrangements described are applicable to the computer and data processing industries and particularly for decomposition of colour documents for layout analysis.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

(Australia Only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises" have correspondingly varied meanings.

Claims (20)

1. A method of generating a multi-layered document representation of a document from an image of the document, the method comprising the steps of: (a) converting each of a plurality of tiles of predetermined size of said image into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and (b) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating a multi-layered document representation.
2. A method according to claim 1 wherein said image of the document is a single layered image.
3. A method according to claim 1, wherein step (a) comprises: (aa) quantising colours of each said tile to a limited plurality of dominant colours associated with that said tile; and (ab) extracting tile information associated with each said dominant colour within each said tile to identify macroregions within each said tile associated with a corresponding one of the tile dominant colours.
4. A method according to claim 1, 2 or 3 wherein at least step (a) is performed in raster tile order.
5. A method according to claim 4, wherein steps (a) and (b) are performed in raster tile order.
6. A method according to claim 3 wherein step (b) comprises, for a current one of said tiles, the steps of: (ba) selecting a dominant colour of said current tile; (bb) examining each dominant colour of at least one adjacent tile to identify a corresponding best matching macroregion for that adjacent tile; (bc) comparing each said best matching macroregion with merging criteria; (bd) where the merging criteria of one said best matching macroregion is satisfied, merging said dominant colour of said current tile with said one best matching macroregion; and (be) where the merging criteria is not satisfied, creating a new macroregion from said dominant colour of said current tile.
7. A method according to claim 6 wherein step (b) is performed in raster tile order and the adjacent tile comprises at least one of a tile to the left of the current tile and a tile above the current tile.
8. A method according to claim 7 wherein step (bb) comprises: (bba) determining a (first) threshold; (bbb) determining a colour distance value between the dominant colour of said current tile and each dominant colour of the adjacent tile; and (bbc) using the first threshold to assess a best match between the dominant colours to thereby identify the corresponding best matching macroregion in the adjacent tile.
9. A method according to claim 7 wherein step (bc) comprises: (bca) determining a colour distance between best matching macroregions; (bcb) comparing the colour distances with a (second) threshold; and (bcc) retaining the macroregion closest to the dominant colour of the current tile as the one best matching macroregion.
10. A computer readable medium having a computer program recorded thereon, the program being executable in a computer to generate a multi-layered document representation from an image of a document, the program comprising: code for converting each of a plurality of tiles of predetermined size of said image into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and code for merging, for each of said coloured layers, adjacent ones of said tiles thereby generating a multi-layered document representation.
11. A computer readable medium according to claim 10, wherein said image of the document is a single layered image and said code for converting comprises: code for quantising colours of each said tile to a limited plurality of dominant colours associated with that said tile; and code for extracting tile information associated with each said dominant colour within each said tile to identify macroregions within each said tile associated with a corresponding one of the tile dominant colours.
12. A computer readable medium according to claim 10 or 11 wherein the image is processed in raster tile order.
13. A computer readable medium according to claim 12, wherein said code for merging comprises code, operative upon a current one of said tiles, to: select a dominant colour of said current tile; examine each dominant colour of at least one adjacent tile to identify a corresponding best matching macroregion for that adjacent tile; compare each said best matching macroregion with merging criteria; merging, where the merging criteria of one said best matching macroregion is satisfied, said dominant colour of said current tile with said one best matching macroregion; and creating, where the merging criteria is not satisfied, a new macroregion from said dominant colour of said current tile.
14. A computer readable medium according to claim 13 wherein the adjacent tile comprises at least one of a tile to the left of the current tile and a tile above the current tile.
15. A computer readable medium according to claim 13 or 14 wherein the code operative to examine comprises: code for determining a (first) threshold; code for determining a colour distance value between the dominant colour of said current tile and each dominant colour of the adjacent tile; and code for using the first threshold to assess a best match between the dominant colours to thereby identify the corresponding best matching macroregion in the adjacent tile.
16. A computer readable medium according to claim 14 or 15 wherein the code operative to compare comprises: code for determining a colour distance between best matching macroregions; code for comparing the colour distances with a (second) threshold; and code for retaining the macroregion closest to the dominant colour of the current tile as the one best matching macroregion.
17. A system for generating a multi-layered document representation of a document from an image of the document, said system comprising: a scanning device arranged to scan the document to form a raw pixel image of the document; a computer apparatus configured to receive the image of the document and to partition the image into a plurality of tiles of predetermined size and to process the tiles in raster tile order, the processing comprising: (a) converting each of the tiles into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and (b) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating a multi-layered document representation.
18. A method of generating a multi-layered representation of an image of a document substantially as described herein with reference to any one of the embodiments as that embodiment is illustrated in the drawings.
19. A computer readable medium having a program recorded thereon, the program being adapted to make a computer execute a method according to any one of claims 1 to 15 or 18.
20. Computer apparatus adapted for performing a method according to any one of claims 1 to 15 or 18.
Dated this 17th day of December 2007
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2007249098A 2007-12-05 2007-12-18 Method of multi-level decomposition for colour document layout analysis Ceased AU2007249098B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2007249098A AU2007249098B2 (en) 2007-12-05 2007-12-18 Method of multi-level decomposition for colour document layout analysis
US12/327,247 US8532374B2 (en) 2007-12-05 2008-12-03 Colour document layout analysis with multi-level decomposition

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AU2007237365A AU2007237365B2 (en) 2007-12-05 2007-12-05 Colour reproduction in a colour document image
AU2007237365 2007-12-05
AU2007249098A AU2007249098B2 (en) 2007-12-05 2007-12-18 Method of multi-level decomposition for colour document layout analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU2007237365A Division AU2007237365B2 (en) 2007-12-05 2007-12-05 Colour reproduction in a colour document image

Publications (2)

Publication Number Publication Date
AU2007249098A1 AU2007249098A1 (en) 2009-06-25
AU2007249098B2 AU2007249098B2 (en) 2010-03-04

Family

ID=40822813

Family Applications (4)

Application Number Title Priority Date Filing Date
AU2007237365A Ceased AU2007237365B2 (en) 2007-12-05 2007-12-05 Colour reproduction in a colour document image
AU2007249103A Ceased AU2007249103B2 (en) 2007-12-05 2007-12-18 Document analysis method
AU2007249098A Ceased AU2007249098B2 (en) 2007-12-05 2007-12-18 Method of multi-level decomposition for colour document layout analysis
AU2007249099A Ceased AU2007249099B2 (en) 2007-12-05 2007-12-18 Block-based noise detection and reduction method with pixel level classification granularity

Family Applications Before (2)

Application Number Title Priority Date Filing Date
AU2007237365A Ceased AU2007237365B2 (en) 2007-12-05 2007-12-05 Colour reproduction in a colour document image
AU2007249103A Ceased AU2007249103B2 (en) 2007-12-05 2007-12-18 Document analysis method

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2007249099A Ceased AU2007249099B2 (en) 2007-12-05 2007-12-18 Block-based noise detection and reduction method with pixel level classification granularity

Country Status (1)

Country Link
AU (4) AU2007237365B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2009201252B2 (en) * 2009-03-31 2011-06-02 Canon Kabushiki Kaisha Colour correcting foreground colours for visual quality improvement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6369830B1 (en) * 1999-05-10 2002-04-09 Apple Computer, Inc. Rendering translucent layers in a display system
US20030202697A1 (en) * 2002-04-25 2003-10-30 Simard Patrice Y. Segmented layered image system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115496A (en) * 1995-03-02 2000-09-05 Apple Computer, Inc. Method and apparatus for accelerating image data compression
AU6251496A (en) * 1995-06-05 1996-12-24 Apple Computer, Inc. Block classification for accelerating image data compression
US5883973A (en) * 1996-02-20 1999-03-16 Seiko Epson Corporation Method and apparatus for processing a document by segmentation into text and image areas
ATE360863T1 (en) * 1999-02-05 2007-05-15 Samsung Electronics Co Ltd METHOD AND DEVICE FOR PROCESSING COLOR IMAGES
US6778697B1 (en) * 1999-02-05 2004-08-17 Samsung Electronics Co., Ltd. Color image processing method and apparatus thereof
NL1015943C2 (en) * 2000-08-16 2002-02-19 Ocu Technologies B V Interpretation of colored documents.
US7260259B2 (en) * 2002-01-08 2007-08-21 Siemens Medical Solutions Usa, Inc. Image segmentation using statistical clustering with saddle point detection
JP5008572B2 (en) * 2004-12-21 2012-08-22 キヤノン株式会社 Image processing method, image processing apparatus, and computer-readable medium

Also Published As

Publication number Publication date
AU2007249099A1 (en) 2009-06-25
AU2007249103A1 (en) 2009-07-02
AU2007237365B2 (en) 2011-05-12
AU2007249103B2 (en) 2011-05-12
AU2007249098A1 (en) 2009-06-25
AU2007237365A1 (en) 2009-06-25
AU2007249099B2 (en) 2011-12-01

Similar Documents

Publication Publication Date Title
US8532374B2 (en) Colour document layout analysis with multi-level decomposition
JP6151763B2 (en) Word segmentation for document images using recursive segmentation
US8503781B2 (en) Finding text regions from coloured image independent of colours
US7623712B2 (en) Image processing method and apparatus
JP4859025B2 (en) Similar image search device, similar image search processing method, program, and information recording medium
US20040218838A1 (en) Image processing apparatus and method therefor
US7386171B2 (en) Activity detector
US8126270B2 (en) Image processing apparatus and image processing method for performing region segmentation processing
US7596271B2 (en) Image processing system and image processing method
JPH02105978A (en) System and method for automatic document segmentation
JP2005509223A (en) Apparatus and method for code recognition
JP2006246435A (en) Image processing apparatus, control method thereof, and program
US20070133031A1 (en) Image processing apparatus and image processing method
CN1719865A (en) Image processing system and image processing method
US20110075932A1 (en) Image processing method and image processing apparatus for extracting heading region from image of document
US6360006B1 (en) Color block selection
US20140086473A1 (en) Image processing device, an image processing method and a program to be used to implement the image processing
AU2007249098B2 (en) Method of multi-level decomposition for colour document layout analysis
JP2004178562A (en) Image segmentation by graph
US8126193B2 (en) Image forming apparatus and method of image forming
Simske Low-resolution photo/drawing classification: metrics, method and archiving optimization
JP2011018311A (en) Device and program for retrieving image, and recording medium
US7805000B2 (en) Image processing for binarization of image data
JP2019153230A (en) Information processor and information processing program
US11948342B2 (en) Image processing apparatus, image processing method, and non-transitory storage medium for determining extraction target pixel

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired