S&F Ref: 834324

AUSTRALIA
PATENTS ACT 1990

COMPLETE SPECIFICATION
FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Iain Bruce Templeton
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Colour reproduction in a colour document image

The following statement is a full description of this invention, including the best method of performing it known to me/us:

COLOUR REPRODUCTION IN A COLOUR DOCUMENT IMAGE

FIELD OF THE INVENTION

The current invention relates to document layout analysis and segmentation and, in particular, relates to document object recognition.

DESCRIPTION OF BACKGROUND ART

The proliferation of scanning technology combined with ever increasing computational processing power has led to many advances in the area of document analysis systems. These systems may be used to extract semantic information from a scanned document, often by means of optical character recognition (OCR) technology. This technology is used in a growing number of applications such as automated form reading. These systems can also be used to improve compression of a document by selectively using an appropriate compression method depending on the content of each part of the page. Improved document compression lends itself to applications such as archiving and electronic distribution.

Document layout analysis applications such as OCR, automatic form processing and scan to high level document require a segmentation step to decompose a document image from its raw pixel representation into a more structured format prior to the actual page layout analysis. This segmentation step dominates the overall speed and accuracy of these applications.

Many existing applications employ a black and white segmentation, in which the document image is binarised into a binary image consisting of black and white pixels. Regions are then formed by connected groups of black or white pixels. While binarisation is an efficient technique, it suffers from its inability to distinguish and isolate adjoining colours of similar luminance. Furthermore, it throws away much useful colour information during the process. For complex documents with multi-colour foreground and background objects, binarisation is clearly inadequate. Thus, a colour segmentation method is required.

Effective colour segmentation is a challenging problem. This is especially so for scanned images, because scanning and printing artefacts can pose serious problems to identifying perceptually uniform colour regions. A perceptually uniform colour region in a digital image is a group of connected pixels (or pixels in close spatial proximity, i.e. closely spatially located) that a human observer interprets as semantically related. For example, the pixels that make up an alphabetic character in a document image appear the same colour to the reader, even though the pixels may have subtly different colours.

There are two main approaches to full colour page layout analysis: bottom-up and top-down. The bottom-up approach examines each pixel in turn and groups adjoining pixels of similar colour values to form connected components. This method has the advantage of being efficient; however, it is highly sensitive to noise and colour fluctuations because of its lack of context information. Thus, it tends to produce a large number of erroneous connected components, resulting in fragmentation. In contrast, the top-down approach partitions a page into non-overlapping blocks. Each block is analysed and given a label using local features and statistics extracted at its full or a reduced resolution.
This approach only provides a coarse segmentation into regions of interest, e.g., Block A is likely to be text and Block B is likely to be line-art. Pixel level segmentation can be achieved by further processing these regions of interest.

Common pixel-level segmentation methods are based on binary connected component (CC) generation. These have the disadvantage that the resulting page layout description no longer contains information about the colour of the original pixels. One method of determining the colour of the pixels is to re-examine the original input image; however, this consumes extra time and requires that the original image is still extant. A second disadvantage of binary pixel analysis is that the binarisation decision is often based on colour, which may not reflect the actual structure of the page. A pixel may be determined to be a background pixel when it is in fact a foreground pixel. Finally, binary pixel analysis can merge pixels from visually and logically distinct regions into a single object.

SUMMARY OF THE INVENTION

It is an object of the present invention to substantially overcome or at least ameliorate one or more deficiencies of known methods.

The present invention provides a method of colour connected component (CC) generation that maintains the colour information in the original document throughout the pixel analysis processing, and that is not affected by the noisiness of colour in the original image.

According to one aspect of the present invention there is provided a method of extracting text from a page, said page comprising text and images of various colours being represented as a bitmap image, the method comprising the steps of:

dividing the bitmap image into a plurality of connected components, the connected components consisting of similar pixels that are closely spatially located;

selecting a plurality of the connected components that represent text from said portion of a page;

grouping said plurality of connected components based on a logical structure to form a logical structure grouping of said portion of a page;

grouping the plurality of connected components that are included in the logical structure based on colour of said connected components to form a colour grouping; and

storing said logical structure grouping and said colour grouping to represent said extracted text of a page.

According to another aspect of the present invention there is provided a method of segmenting a portion of a page comprising text, said portion of a page being represented as a bitmap image, the method comprising the steps of:

dividing the bitmap image into a plurality of connected components, the connected components consisting of similar pixels that are closely spatially located;

selecting a plurality of the connected components that represent text from said portion of a page;

grouping said plurality of connected components based on a logical structure to form a logical structure grouping of said portion of a page;

grouping the plurality of connected components that are included in the logical structure based on colour of said connected components to form a colour grouping, wherein the colour grouping is formed independently of the logical structure; and

storing said logical structure grouping and said colour grouping to represent a segmented portion of a page.

BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the invention will now be described with reference to the following drawings, in which:

Figure 1 is a schematic block diagram of a general purpose computer on which the embodiments of the invention may be practised;

Figure 2, comprising Figs. 2(a) and 2(b), is an illustration of segmented pixel sets in a typical document;

Figure 3 is a schematic flow diagram illustrating a method of reproducing the colour of a document image according to one embodiment of the invention;

Figure 4 is a schematic flow diagram illustrating the method of dividing an image into connected components as used in the method of Figure 3;

Figure 5 is a schematic flow diagram illustrating the method of quantising and segmenting an image as used in the method of Figure 4;

Figure 6 is a schematic flow diagram illustrating the method of selecting text components as used in the method of Figure 3;

Figure 7 is a schematic flow diagram illustrating the method of grouping text components based on a logical structure as used in the method of Figure 3;

Figure 8 is a schematic flow diagram illustrating the method of grouping text components based on colour as used in the method of Figure 3;

Figure 9 is a schematic flow diagram illustrating one embodiment of determining whether a text component should be included into a colour group as used in the method of Figure 8;

Figure 10 is a schematic flow diagram illustrating a second embodiment of determining whether a text component should be included into a colour group as used in the method of Figure 8;

Figure 11 is a schematic flow diagram illustrating a method of adding a CC to a colour group as used in Figure 8;

Figure 12 is a schematic flow diagram illustrating an alternate method of adding a CC to a colour group as used in Figure 8 that aims to minimise colour group fragmentation;

Figure 13 is an example of a region of an input image
containing two paragraphs of text;

Figure 14 is the input region of Figure 13 where some of the text has been grouped into line groups;

Figure 15 is the input region of Figure 13 where the text has been grouped into paragraphs;

Figure 16 is a representation of a first colour group generated from the input image of Figure 13;

Figure 17 is a representation of a second colour group generated from the input image of Figure 13; and

Figure 18 is an example of the enclosure of connected components.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The method of image segmentation, and particularly text extraction, described herein may be implemented using a computer system 900, such as that shown in Fig. 1, wherein the processes of Figs. 2 to 18 may be implemented as software, such as one or more application programs executable within the computer system 900. In particular, the steps of the method of image segmentation are effected by instructions in the software that are carried out within the computer system 900. The instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the segmentation methods and a second part and the corresponding code modules manage a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 900 from the computer readable medium, and then executed by the computer system 900. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 900 preferably effects an advantageous apparatus for image segmentation.

As seen in Fig.
1, the computer system 900 is formed by a computer module 901, input devices such as a keyboard 902 and a mouse pointer device 903, and output devices including a printer 915, a display device 914 and loudspeakers 917. A scanner (not illustrated but well known) may be connected as a source of a bitmap pixel-based image. Bitmap images may also be derived from the networks. An external Modulator-Demodulator (Modem) transceiver device 916 may be used by the computer module 901 for communicating to and from a communications network 920 via a connection 921. The network 920 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 921 is a telephone line, the modem 916 may be a traditional "dial up" modem. Alternatively, where the connection 921 is a high capacity (e.g. cable) connection, the modem 916 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 920.

The computer module 901 typically includes at least one processor unit 905, and a memory unit 906, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 901 also includes a number of input/output (I/O) interfaces including an audio-video interface 907 that couples to the video display 914 and loudspeakers 917, an I/O interface 913 for the keyboard 902 and mouse 903 and optionally a joystick (not illustrated), and an interface 908 for the external modem 916 and printer 915. In some implementations, the modem 916 may be incorporated within the computer module 901, for example within the interface 908. The computer module 901 also has a local network interface 911 which, via a connection 923, permits coupling of the computer system 900 to a local computer network 922, known as a Local Area Network (LAN).
As also illustrated, the local network 922 may also couple to the wide network 920 via a connection 924, which would typically include a so-called "firewall" device or similar functionality. The interface 911 may be formed by an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement.

The interfaces 908 and 913 may afford both serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 909 are provided and typically include a hard disk drive (HDD) 910. Other devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 912 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD), USB-RAM, and floppy disks, for example, may then be used as appropriate sources of data to the system 900.

The components 905 to 913 of the computer module 901 typically communicate via an interconnected bus 904 and in a manner which results in a conventional mode of operation of the computer system 900 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems evolved therefrom.

Typically, the application programs discussed above are resident on the hard disk drive 910 and read and controlled in execution by the processor 905. Intermediate storage of such programs and any data fetched from the networks 920 and 922 may be accomplished using the semiconductor memory 906, possibly in concert with the hard disk drive 910.
In some instances, the application programs may be supplied to the user encoded on one or more CD-ROM and read via the corresponding drive 912, or alternatively may be read by the user from the networks 920 or 922. Still further, the software can also be loaded into the computer system 900 from other computer readable media. Computer readable media refers to any storage medium that participates in providing instructions and/or data to the computer system 900 for execution and/or processing. Examples of such media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the computer module 901. Examples of computer readable transmission media that may also participate in the provision of instructions and/or data include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

The second part of the application programs and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 914. Through manipulation of the keyboard 902 and the mouse 903, a user of the computer system 900 and the application may manipulate the interface to provide controlling commands and/or input to the applications associated with the GUI(s).

Figure 2 provides an illustration of segmented pixel sets or portions in a typical document or page. In Figure 2(a), the page document is shown to be composed of four types of objects, with 610 representing an overall background portion, 620 a local background portion, 630 a text lines portion, and 640 an image/graphical object portion.
Figure 2(b) shows how these four types of objects may be grouped into semantically coherent sets of segmented pixels representing segmented portions of the page. It can be seen that each object type forms a single pixel set, except 630, which produces two pixel sets (650 and 660) due to the significant gap between the two paragraphs.

Figure 3 shows a flow diagram of the present exemplary embodiment, which is desirably implemented in software according to the application described above. An input image 600 is input to a process 100 which converts the input image 600 into a collection of connected components. At step 200 the connected components created at step 100 are examined and the text components are selected. Step 300 groups the text components selected at step 200 into a logical structure grouping. Step 400 groups the text components selected at step 200 into a colour grouping. Finally, step 500 stores the logical grouping from step 300 and the colour grouping from step 400 into memory.

In reference to Figure 4, the process 100 of converting the input image into connected components will now be further explained. The input image is first quantised and segmented in step 110; however, any suitable method of segmenting pixels may be substituted. The segmented pixels created by process 110 are stored into a collection of sets of segmented pixels. The sets of segmented pixels in the collection are examined by a process 120 to form connected components. Each connected component produced by process 120 contains a description of which pixels in the input image 600 contributed to the connected component, and a reference to the segmented pixel set, as produced by process 110, from which the connected component was generated. The connected components consist of the adjacent pixels found within the segments described above. Step 130 creates a graph of the connected components that were formed at step 120.
In the preferred embodiment, this graph is a tree that is based on the containment of the connected components.

With reference to Figure 5, the process 110 will now be further described. The input bitmap image 600 is divided into tiles at step 111 and then each tile is processed in turn. The size of the tiles is preferably 32 pixels by 32 pixels at 300 dots per inch resolution and 64 pixels by 64 pixels at 600 dots per inch resolution. Process 112 will examine the pixels in a single tile to determine a set of quantised colours that best represent the tile. In the preferred embodiment the maximum number of quantised colours in a single tile is 4; however, any suitable number of quantised colours may be used.

Step 113 uses the quantised colours found by step 112 to segment the pixels in the input image into a collection of segmented pixel sets. Each segmented pixel set consists of one or more pixel segments contained within the tile and statistics that describe properties of the segmented pixel set. The pixel segments are made up of one or more adjacent pixels contained within the tile. Examples of statistics include colours, number of pixels, pixel density and the ratio of perimeter to area of the pixels.

Step 114 takes the segmented pixel sets created at step 113 for the current tile and looks for sets of segmented pixels from previous tiles which have similar values of their statistics. The previous tiles at step 114 may be the tile directly to the left and directly above the current tile, or any other suitable tiles. If a segmented pixel set from a previous tile and a segmented pixel set from the current tile have similar values, then the two segmented pixel sets may be merged together.

Decision 115 checks if there are any more tiles in the image to process, and if so returns to step 112; otherwise process 110 is complete.
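
By way of illustration only, the tiling, quantising and merging of steps 111 to 114 might be sketched in Python as follows. The function names, the linear scaling of tile size for resolutions other than 300 and 600 dots per inch, the bucket-based quantiser and the colour-distance threshold are all assumptions of this sketch; the specification does not prescribe any particular quantisation method or similarity measure.

```python
from collections import Counter
from typing import Iterator

Colour = tuple[int, int, int]


def tile_size_for_dpi(dpi: int) -> int:
    """Tile edge length: 32 px at 300 dpi and 64 px at 600 dpi, as in the text.

    Linear scaling for other resolutions is an assumption of this sketch.
    """
    return max(1, round(32 * dpi / 300))


def iter_tiles(width: int, height: int, tile: int) -> Iterator[tuple[int, int, int, int]]:
    """Yield (x0, y0, x1, y1) tile rectangles in raster order (step 111)."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def quantise_tile(pixels: list[Colour], max_colours: int = 4, step: int = 64) -> list[Colour]:
    """Choose up to max_colours representative colours for a tile (step 112).

    Placeholder quantiser (hypothetical): each channel is bucketed coarsely
    and the most populated buckets are kept.
    """
    buckets = Counter(
        tuple(channel // step * step + step // 2 for channel in pixel)
        for pixel in pixels
    )
    return [colour for colour, _ in buckets.most_common(max_colours)]


def similar(a: Colour, b: Colour, threshold: int = 48) -> bool:
    """Hypothetical similarity test used when merging pixel sets across tiles (step 114)."""
    return sum(abs(x - y) for x, y in zip(a, b)) <= threshold
```

In such a sketch, a segmented pixel set from the current tile would be merged with a set from the tile to the left or above whenever `similar` holds for their colours; a fuller implementation would compare the other statistics as well.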

Figure 6 further describes step 200 from Figure 3, where each connected component (CC) from the connected component graph produced by process 100 is examined individually to select the text CCs. Step 210 extracts statistics from the CC. Such statistics may include the information stored in the set of segmented pixels from which the CC was generated. Examples of the statistics used include: size, density, aspect ratio, straightness, and the relationship between the current CC and any nearby CCs. The CC relationship information may be the distance between the edges of the CCs, differences in the coordinates between the bounding boxes of the CCs or any other suitable information. The information may have been generated during step 100 or may be generated as part of step 220.

The information from step 210 is used in a classification step 220, which uses the information to calculate the likelihood that the CC represents pixels from the original image that make up the text in the image. Decision step 230 examines the classification from step 220, and if it is text, step 240 stores the CC to a text CC set. If there are no more CCs to process at 250, then process 200 finishes; otherwise process 200 continues from step 210, processing the next CC in the CC graph.

With reference to Figure 7, process 300 will now be explained further. Although this figure describes a method of finding a logical grouping of text CCs, there are other suitable methods that could be applied to the problem. The present exemplary embodiment does not rely on any particular method. The text CCs that were extracted from the CC graph at step 200 are considered in more detail. Step 310 will examine the adjacency of the text CCs. In the preferred embodiment this involves calculating a nearest neighbour graph of the CCs. The text CCs are analysed to find the CC which is closest in the left, up, right and down directions.
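
The four-direction nearest neighbour search of step 310 could, for instance, be sketched in Python as below. This is a minimal sketch under stated assumptions: each text CC is reduced to a bounding box (the hypothetical `Box` class), image coordinates are assumed with y increasing downwards, and a neighbour is only considered in a direction if its box overlaps the current box on the perpendicular axis. None of these choices is prescribed by the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Bounding box of a text CC; x1 and y1 are exclusive (hypothetical type)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def _overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def nearest_neighbours(ccs: list[Box]) -> dict[str, dict[str, Optional[str]]]:
    """For each CC, find the closest CC to the left, up, right and down."""
    graph: dict[str, dict[str, Optional[str]]] = {}
    for cc in ccs:
        best: dict[str, Optional[str]] = {"left": None, "up": None, "right": None, "down": None}
        dist = dict.fromkeys(best, float("inf"))
        for other in ccs:
            if other is cc:
                continue
            same_row = _overlap(cc.y0, cc.y1, other.y0, other.y1)
            same_col = _overlap(cc.x0, cc.x1, other.x0, other.x1)
            # Gap between facing edges; None when the boxes do not line up.
            gaps = {
                "left": cc.x0 - other.x1 if same_row else None,
                "right": other.x0 - cc.x1 if same_row else None,
                "up": cc.y0 - other.y1 if same_col else None,
                "down": other.y0 - cc.y1 if same_col else None,
            }
            for direction, gap in gaps.items():
                if gap is not None and 0 <= gap < dist[direction]:
                    dist[direction] = gap
                    best[direction] = other.name
        graph[cc.name] = best
    return graph
```

The quadratic search is adequate for a sketch; a practical implementation would likely use a spatial index to limit the candidates examined for each CC.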

Step 320 will form groups of CCs that correspond to text lines using the information generated by the adjacency analysis of step 310. The text lines consist of one or more CCs that make up what a human would consider a line of text, and thus represent a text characteristic or would be interpreted by a human as "text-like".

After the text lines are formed, a second adjacency analysis 330 will be performed. This adjacency analysis operates on the text lines created at step 320. The adjacency analysis examines the text lines using an adjacency analysis algorithm such as nearest neighbours to produce text line adjacency information. The text line adjacency information is used in step 340 to find a grouping of text lines that correspond to paragraphs on the input image. Step 350 will examine the CCs in the graph that were not classified as text to refine the paragraph grouping. In the preferred embodiment, said refinement may include ensuring that the paragraphs added to the logical structure do not overlap any significant non-text regions of the input image. The paragraphs after refinement are added to a logical structure group at step 360. The paragraphs therefore form another text characteristic and may also be considered text-like.

Figure 8 is a flow diagram showing the process of grouping CCs into a colour grouping. The CCs in the connected component graph created at step 130 are examined one at a time. A decision step 420 examines a single CC and determines whether that CC should be included into a colour group. If so, step 430 adds the CC to the appropriate colour group. If at 440 there are more CCs to be processed, control returns to step 410; otherwise process 400 is finished. The methods used by steps 420 and 430 are further described in later figures.

Figure 9 shows a flow diagram of one embodiment of the decision step 420 of determining whether a CC should be included into a colour group.
In the present embodiment, step 421 checks if a CC was classified as a text CC during process 200. If the CC was classified as a text CC, the decision exits at terminator 422 with a result "YES"; otherwise the decision process exits at terminator 423 with a result "NO".

Figure 10 shows a flow diagram of the preferred embodiment of the decision step 420 of determining whether a CC should be included into a colour group. In the preferred embodiment, step 421 checks if a CC was classified as a text CC during process 200. If the decision was no, then the decision process terminates at 423 with a result "NO". If the CC was classified as a text CC, then decision 424 checks if the CC was included in the logical structure grouping. If the CC was not included in the logical structure grouping, then the decision step terminates at 423 with result "NO". Otherwise, the decision process terminates at 422 with a result "YES".

Figure 11 shows one method of adding a CC to a colour group as used in the method of colour grouping described in Figure 8. After it is determined that a CC should be a member of a colour group, step 431 will find the segmented pixel set created in step 113 of Figure 5, and the CC will be added to a collection of CCs that belong to the segmented pixel set in step 432.

Figure 12 shows an alternative method of adding a CC to a colour group, as used in the method of colour grouping described in Figure 8. In some cases the segmented pixel sets are fragmented, which can lead to the colour groups being fragmented. This method attempts to overcome the problem of colour group fragmentation by considering other colour groups that were created for a different segmented pixel set. Step 431 searches for the segmented pixel set that is the source of the current CC. When found, step 433 searches for a nearby segmented pixel set.
The nearby segmented pixel set is preferably close in position on the input image and close in colour value or any other suitable pixel attribute.

Step 434 considers which of the segmented pixel sets is more appropriate. An example of a more appropriate set could be as follows. If the source segmented pixel set has only some small number of CCs, and the nearby pixel set has a large number of CCs spanning a large area portion of the page, then the nearby pixel set would be the more appropriate pixel set.

If the nearby segmented pixel set is more appropriate, step 435 adds the CC to the collection in the nearby segmented pixel set; otherwise at step 436 the CC is added to the source segmented pixel set found at step 431.

Figure 13 is a figure showing a region 700 from an input image. The region consists of connected components, such as in the case of word 701, which is made up of 5 connected components, one for each letter; and word 702, made up of 9 connected components. In the region 700, the words 701 and 703 have similar statistical attributes as used in step 114. The words 702 and 704 also have similar statistical attributes as used in step 114. The attributes for 701 and 702, and for 703 and 704, are not similar. For this input image region 700, the words 701 and 703 may be in the same segmented pixel set, and the words 702 and 704 may be in the same segmented pixel set, depending upon any distance thresholds used for merging during step 114.

Figure 14 is an example of the result of line grouping performed in step 320 of Figure 7. The connected components in the word 701 have been grouped into a text line group 711. Similarly, the connected components in the word 702 have been grouped into a text line group 712.

Figure 15 is an example of the result of paragraph grouping performed at step 340 of Figure 7. The CCs in the input image region 700 have been grouped into two paragraphs 721 and 722.
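
Returning to the alternative method of Figure 12, steps 431 to 436 might be sketched in Python as follows. The sketch is illustrative only: the `PixelSet` class, the distance thresholds and the `small` and `large` CC-count limits are hypothetical, and the test of which set is "more appropriate" is reduced here to a comparison of CC counts, whereas the description also mentions the page area spanned by the nearby set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PixelSet:
    """A segmented pixel set with its colour, position and member CCs (hypothetical type)."""
    colour: tuple[int, int, int]
    centre: tuple[float, float]
    ccs: list[str] = field(default_factory=list)


def find_nearby(source: PixelSet, all_sets: list[PixelSet],
                max_colour_dist: int = 48, max_pos_dist: float = 200.0) -> Optional[PixelSet]:
    """Step 433: find a set close to the source in both colour and position."""
    best, best_dist = None, float("inf")
    for candidate in all_sets:
        if candidate is source:
            continue
        colour_dist = sum(abs(a - b) for a, b in zip(candidate.colour, source.colour))
        dx = candidate.centre[0] - source.centre[0]
        dy = candidate.centre[1] - source.centre[1]
        pos_dist = (dx * dx + dy * dy) ** 0.5
        if colour_dist <= max_colour_dist and pos_dist <= max_pos_dist and colour_dist < best_dist:
            best, best_dist = candidate, colour_dist
    return best


def add_cc(cc_name: str, source: PixelSet, all_sets: list[PixelSet],
           small: int = 3, large: int = 10) -> PixelSet:
    """Steps 434 to 436: add the CC to the more appropriate set and return that set."""
    nearby = find_nearby(source, all_sets)
    # Assumed rule: prefer the nearby set when the source set holds few CCs
    # and the nearby set already holds many.
    if nearby is not None and len(source.ccs) < small and len(nearby.ccs) >= large:
        target = nearby   # step 435
    else:
        target = source   # step 436
    target.ccs.append(cc_name)
    return target
```

With this rule, a CC originating from a small fragment is absorbed into an established colour group of similar colour nearby, which is the fragmentation-reducing behaviour the method of Figure 12 aims for.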

Figure 16 shows the output of process 400 of producing a colour grouping having been applied to the input image region 700. The first colour group 730 contains the words labelled 702 and 704 from the input image 700 as shown in Figure 13.

Figure 17 shows a second output of process 400 of producing a colour grouping. The colour group 740 shows the words from the input image 700 that are contained in the segmented pixel set associated with words 701 and 703; that is, the words that are not contained in the same segmented pixel set as the words 702 and 704.

Figure 18 shows 5 connected components and the enclosure tree that is generated by those components. A first connected component is enclosed by a second connected component if none of the first component's contributing pixels exist outside of the second connected component. The ring 810 is the top level CC and is represented in the graph by node 811. The left arc 820 and the right arc 830 are both enclosed by the ring 810; therefore they are children of the ring node 811 at 821 and 831. The circle 840 is not enclosed by either the left arc 820 or the right arc 830; however, it is enclosed by the ring 810. Therefore circle 840 is represented by node 841, which is a child of the ring node 811. Finally, rectangle 850 is enclosed by the left arc 820, so rectangle node 851 is a child of left arc node 821.

INDUSTRIAL APPLICABILITY

The arrangements described are applicable to the computer and data processing industries and particularly for scanning of images and image segmentation.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

(Australia Only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.