GB2275844A - Image Zone Discrimination Using a Neural Network - Google Patents
- Publication number
- GB2275844A (application GB9403151A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- image
- zone
- region
- neurons
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/40—Picture signal circuits
- H04N1/40062—Discrimination between different image types, e.g. two-tone, continuous tone
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
- G06V30/18019—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
- G06V30/18038—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
- G06V30/18048—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
- G06V30/18057—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
An image zone discrimination system 10 and method automatically identify whether a region of a raster image contains text, pictures, or tables by using a neural network 13. The neural network 13 includes neurons 14 which subsample the region and a multilayer perceptron 16 which determines whether the region is text, a picture, or a table.
Description
IMAGE ZONE DISCRIMINATION SYSTEM AND METHOD
Technical Field
This invention relates generally to document conversion systems and methods and, in particular, to an image zone discrimination system and method for identifying automatically text, pictures, and tables from a page of a document.
Background of the Invention
A page of a document may contain different types of image zones such as text, pictures, and tables. Pictures include drawings, figures, images, photographs, or any other type of illustrative markings. In conventional document conversion systems, each of these types of image zones is processed differently and separately. For example, an image zone containing text needs to be processed by an optical character recognition system to recognize characters of the text. Similarly, an image zone which contains pictures needs to be processed by a raster-to-vector system, but does not require any processing by an optical character recognition engine.
Conventional document conversion systems can separate pieces of a raster image into image zones or blocks of information.
For example, if a raster image contains a picture and text describing the picture, conventional document conversion systems can separate the picture and the text each into an unlabeled image zone. Some conventional document conversion systems can highlight the separated blocks of information or image zones.
However, before the image zones can be processed, a document conversion operator must identify what type of image zone the highlighted part of the raster image is and manually label the identified image zone with its zone type (text, image, table, handwriting). Therefore, there exists a significant need to identify automatically an image zone type from a portion of a raster image and to label the portion of the raster image with the correct image zone type.
Summary of the Invention
The present invention has utility in identifying automatically by using a neural network whether an image zone contains text, a table, or a picture.
Thus, it is an advantage of the present invention to determine, identify, or recognize automatically by using a neural network whether an image zone type is text, a table, or a picture.
According to one aspect of the invention, an image zone discrimination system is provided which is connectable to receive a plurality of image zones. Each of the image zones has one of a plurality of zone types including text, a picture, or a table. The image zone discrimination system comprises (a) a region identifier which selects a region from each of the image zones and (b) a neural network connected to the region identifier, the neural network subsampling the region and identifying a zone type of the subsampled region of the image zone.
According to another aspect of the invention, an image zone discrimination method is provided which is executed on a computer as part of a computer program. The method identifies a plurality of zone types including text, pictures, images, and tables from a page of a document. The computer is connectable to receive a plurality of image zones. The method comprises the steps of (a) selecting a region from each of the image zones; (b) subsampling the region; and (c) identifying automatically a zone type of the subsampled region of the image zone.
Brief Description of the Drawings
The invention is pointed out with particularity in the appended claims. However, other features of the invention will become more apparent and the invention will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
FIG. 1 shows an image zone discrimination system in accordance with a preferred embodiment of the invention;
FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention; and
FIG. 3 shows one of a plurality of fixed-weight, binary threshold subsampling neurons in accordance with a preferred embodiment of the invention.
Description of the Preferred Embodiments
FIG. 1 shows an image zone discrimination system 10 in accordance with a preferred embodiment of the invention. The image zone discrimination system 10 comprises a region identifier 12 and a neural network 13. Neural network 13 comprises a layer of subsampling neurons 14 and a multilayer perceptron 16.
As shown in FIG. 1, a raster image is received by zone segmenter 5 which separates portions of the raster image into blocks of similar information or image zones. Zone segmenter 5 also extracts each of the image zones from the raster image which may contain multiple image zones. Zone segmenter 5 is a commercially available intelligent character recognition system such as ScanWorX from Xerox.
Once the image zones are separated by zone segmenter 5, each of the image zones is sent to image zone discrimination system 10 which recognizes, determines, or identifies a zone type of each of the image zones. Image zone discrimination system 10 outputs the zone type and notifies the document conversion operator of the zone type. Image zone discrimination system 10 also labels the image zone with the identified zone type. If image zone discrimination system 10 incorrectly identifies the proper zone type, the operator can correct the misidentified zone type with the correct zone type.
As shown in FIG. 1, image zone discrimination system 10 comprises a region identifier 12 and multiple layer neural network 13 which includes a layer of subsampling neurons 14 and multilayer perceptron 16. Region identifier 12 receives each image zone from zone segmenter 5 and selects a region from each of the image zones. The layer of subsampling neurons 14 receives the region selected by region identifier 12 and subsamples this region to a size which is compatible with the amount of information which can be processed by multilayer perceptron 16. The layer of subsampling neurons 14 is discussed in more detail below.
The subsampled region is provided to multilayer perceptron 16 of neural network 13 which determines whether the selected region is text, a picture, or a table. Based on the result of the determination by multilayer perceptron 16, image zone discrimination system 10 notifies an operator of said identified zone type by labelling the image zone with the identified zone type.
The operator can then change the zone type of the image zone if image zone discrimination system 10 misidentifies said zone type.
FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention. Steps 200-202, which are executed by zone segmenter 5, prepare a raster image for processing by image zone discrimination system 10. First, zone segmenter 5 in step 200 receives a raster image, identifies manually or automatically the different image zones from the raster image in step 201, and extracts in step 202 each image zone from the raster image. The segmentation performed in step 201 is typically performed by a human operator, but can also be accomplished by auto-segmentation software which is commercially available.
Once the image zones are extracted from a raster image, segmenter 5 in step 203 stores the image zones. Each of the image zones is then sent to image zone discrimination system 10 in the TIFF (tagged image file format) format. Although the image zones are sent in TIFF format, other formats can be used as well.
In step 204, image zone discrimination system 10 selects a region of the image zone. The first stage of the image zone discrimination system 10 is a region identification process. This process takes as input pixels from the extracted image zone and creates as output a small region, R, of this input image zone. The dimensions of R are hR x wR where hR is the height of the region and wR is the width of the region. Both the height and width are measured in pixels. Currently, image zone discrimination system 10 creates region R by extracting pixels from the middle of the image zone. However, other sophisticated techniques can be used to determine whether region R contains enough information for multiple layer neural network 13 to make a zone discrimination.
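The region-identification step described above can be sketched as a simple center crop. The function name and the representation of an image zone as a two-dimensional NumPy array of 0/1 pixels are assumptions for illustration, not part of the patent text:

```python
import numpy as np

def select_region(zone, h_r, w_r):
    """Select an h_r x w_r region R from the middle of an image zone.

    A minimal sketch of the region-identification process: pixels are
    extracted from the middle of the extracted image zone.
    """
    h, w = zone.shape
    # Clamp the requested region dimensions to the zone's actual size.
    h_r, w_r = min(h_r, h), min(w_r, w)
    top = (h - h_r) // 2
    left = (w - w_r) // 2
    return zone[top:top + h_r, left:left + w_r]

zone = np.zeros((300, 600), dtype=np.uint8)   # hypothetical image zone
region = select_region(zone, 100, 400)
print(region.shape)  # (100, 400)
```

More sophisticated region identifiers, as the text notes, could instead verify that the chosen region contains enough information for the discrimination.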
After the region R is selected, the layer of subsampling neurons 14 of neural network 13 in step 205 subsamples the region R.
Subsampling neurons 14 subsample the selected region R in x (width) and y (height) dimensions to a size that can be easily processed by multilayer perceptron 16 of neural network 13. The input to neuron layer 14 is region R. The number of inputs to the layer of subsampling neurons 14 is the size of region R, or hR x wR.
The output of the first layer of subsampling neurons 14 is Rss which represents a subsampled region R. The dimensions of Rss are hRss x wRss, where hRss is the height of the subsampled region and wRss is the width of the subsampled region. These dimensions are determined by the subsampled ratios for the x and y dimensions.
The subsampling ratio in the x dimension, rx, is given as:
rx = wR / wRss EQ. 1
The subsampling ratio in the y dimension, ry, is given as:
ry = hR / hRss EQ. 2
The size of region R, the horizontal and vertical subsampling ratios rx and ry, and the size of region Rss may be chosen independently, but consistent with equations 1 and 2.
The number of neurons N in subsampling layer 14 is given by the following equation:
N = hRss x wRss EQ. 3
Each of the N neurons is a fixed-weight, binary threshold neuron 31 as shown in the example of FIG. 3. In FIG. 3, four pixels 301-304 from a part of region R 30 are sampled and fed as inputs to neuron 31. Each of the four pixels 301-304 has a value of one ("on") or zero ("off"). Each of the pixel values is multiplied by a connection weight by multipliers 32. A connection weight is one divided by the number of pixels being sampled; in this example it is equal to one quarter (0.25). The results of the multiplications are added together in adder 34. If the result of the addition is greater than or equal to 0.5, threshold 36 assigns the neuron a value of one. If the result of the addition is less than 0.5, threshold 36 assigns it a value of zero.
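A single fixed-weight binary threshold neuron of FIG. 3 can be sketched as follows; the function name is an assumption for illustration:

```python
def subsampling_neuron(pixels):
    """Fixed-weight binary threshold neuron, as in FIG. 3.

    Each input pixel (0 or 1) is multiplied by a fixed connection
    weight of 1/len(pixels); the products are summed, and the neuron
    outputs 1 if the sum is >= 0.5, else 0.
    """
    weight = 1.0 / len(pixels)               # e.g. 0.25 for four inputs
    total = sum(p * weight for p in pixels)  # multipliers 32 and adder 34
    return 1 if total >= 0.5 else 0          # threshold 36

print(subsampling_neuron([1, 1, 0, 0]))  # 1 (sum = 0.5, at threshold)
print(subsampling_neuron([1, 0, 0, 0]))  # 0 (sum = 0.25)
```

In effect the neuron reports whether at least half of its sampled pixels are "on".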
Once the number of neurons in subsampling layer 14 has been determined from the above equations, the connection weights must be determined. Each neuron in subsampling layer 14 is connected to every pixel in region R through a connection weight. A connection weight is the value to assign the pixel if the pixel is "on" (or a one) rather than "off" (or a zero). The input to the neuron in subsampling layer 14 on that connection line is the product of the connection weights and pixels in region R. The connection weights to neuron Ni,j (i = 0 to hRss-1; j = 0 to wRss-1) from all pixels (r,c) in region R which satisfy the following constraint:
{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}
are set to the following:
connection weightr,c = 1 / ((rx)(ry))
The constraint formula identifies which of the pixels in region R are multiplied by the connection weightr,c. In the constraint formula, r represents rows and c represents columns of region R.
The combination of (r,c) will specify pixels by the row and column designation.
An example of how the above equations are used in processing a region is given as follows. Suppose that region R has width wR equal to 400 pixels and height hR equal to 100 pixels. Further suppose that the desired subsampled region Rss has a width wRss equal to 80 pixels and a height hRss equal to 20 pixels. The subsampling ratios are determined as follows:
rx = wR / wRss = 400 / 80 = 5
ry = hR / hRss = 100 / 20 = 5
Therefore, the number of neurons in subsampling layer 14 is the following:
N = hRss x wRss = 20 x 80 = 1600.
Each of the 1600 neurons is a fixed-weight binary threshold neuron as shown in FIG. 3. The following constraint formula determines the set of pixels (r,c) which are assigned a connection weight connecting to neuron N1,2:
{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}
{(r,c) | r = (1)(5) + m; c = (2)(5) + n; m ∈ (0, 4), n ∈ (0, 4)}
For neuron N1,2, this is the following set of pixels:
(r,c) = { (5,10), (5,11), (5,12), (5,13), (5,14),
(6,10), (6,11), (6,12), (6,13), (6,14),
...
(9,10), (9,11), (9,12), (9,13), (9,14) }
The connection weightr,c for each of the pixels in the set is set to the following:
connection weightr,c = 1 / ((rx)(ry)) = 1 / ((5)(5)) = 0.040
Therefore, if any of the pixels in the set is "on", it is multiplied by the connection weight of 0.040. After the multiplication is performed for each pixel, the products are summed. If the result of the summation is greater than or equal to 0.5, neuron N1,2 is given the value of one, or is "on".
Otherwise N1,2 is given a value of zero.
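The whole subsampling layer can be sketched as follows, reproducing the worked example above (a 400 x 100 region subsampled to 80 x 20 with rx = ry = 5, so 1600 neurons with connection weight 0.040). The function name and the NumPy array representation are assumptions:

```python
import numpy as np

def subsample_region(R, h_rss, w_rss):
    """Layer of fixed-weight binary threshold neurons (EQ. 1-3).

    One neuron per output pixel: neuron Ni,j sums the pixels
    (r, c) with r = i*ry + m, c = j*rx + n, each weighted by
    1/((rx)(ry)), and fires (1) if the sum is >= 0.5.
    """
    h_r, w_r = R.shape
    ry, rx = h_r // h_rss, w_r // w_rss   # subsampling ratios, EQ. 1-2
    weight = 1.0 / (rx * ry)              # fixed connection weight
    Rss = np.zeros((h_rss, w_rss), dtype=np.uint8)
    for i in range(h_rss):
        for j in range(w_rss):
            window = R[i * ry:(i + 1) * ry, j * rx:(j + 1) * rx]
            Rss[i, j] = 1 if window.sum() * weight >= 0.5 else 0
    return Rss

R = np.zeros((100, 400), dtype=np.uint8)
R[5:10, 10:15] = 1              # turn on the pixel set feeding neuron N1,2
Rss = subsample_region(R, 20, 80)
print(Rss.shape, Rss[1, 2])     # (20, 80) 1
```

With all 25 pixels of its window "on", neuron N1,2 sees a weighted sum of 25 x 0.040 = 1.0 and fires, while every other neuron stays at zero.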
According to FIG. 2, once region R has been subsampled by the layer of subsampling neurons 14 in step 205, multilayer perceptron 16 of neural network 13 processes the subsampled region Rss in step 206.
Multilayer perceptron 16 is a conventional multilayer perceptron such as is available in the public domain packages Genesis or PlaNet, for example. Multilayer perceptron 16 is trained with the well-known learning rule called backward error propagation. The topology of the multilayer perceptron neural network 16 is a three-layer fully connected network with N input-layer neurons, 100 hidden-layer neurons, and 3 output-layer neurons. The input layer is a layer of fanout, distribution nodes that do not have adaptable weights or squashing functions. Hyperbolic tangent functions are used for the nonlinear squashing functions in the hidden and output layers.
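The forward pass of this topology can be sketched as follows. The weights here are random placeholders, not the trained network (the patent trains them with backward error propagation), and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1600          # input-layer fanout nodes (flattened Rss, 20 x 80)
H, OUT = 100, 3   # hidden-layer and output-layer sizes

# Untrained placeholder weights; in the patent these are learned
# from labeled zone samples via backward error propagation.
W1 = rng.normal(0.0, 0.1, (H, N))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (OUT, H))
b2 = np.zeros(OUT)

def mlp_forward(rss_flat):
    """Three-layer fully connected forward pass.

    The input layer only fans out the subsampled pixels; the hidden
    and output layers apply hyperbolic-tangent squashing functions.
    """
    hidden = np.tanh(W1 @ rss_flat + b1)
    return np.tanh(W2 @ hidden + b2)   # three values in (-1, +1)

out = mlp_forward(np.zeros(N))
print(out.shape)  # (3,)
```

Because tanh saturates in (-1, +1), each of the three outputs stays in the range the patent describes for its output vector.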
The multilayer perceptron neural network 16 is trained on a sample with a text/table/raster frequency distribution consistent with the document class that is being converted, such as electronic data manuals, for example. For this document class, the frequency of occurrence of each image zone type is determined using standard statistical techniques.
The output of multilayer perceptron 16 shown in FIG. 1 is a vector of three values. Each element of this output vector codes for one of the zone types (text, table, and picture). The vector elements range in value from -1 to +1. The vector element with the largest value corresponds to the most likely zone type. The result is output to an operator who can change the zone type if it is incorrectly identified by neural network 13 of image zone discrimination system 10.
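Decoding the output vector as described above amounts to taking the element with the largest value. The ordering of the three labels is an assumption for illustration:

```python
def decode_zone_type(output_vector):
    """Map the three-element output vector (values in -1..+1) to the
    most likely zone type: the largest element wins. The label order
    is hypothetical; the patent only names the three zone types."""
    labels = ("text", "table", "picture")
    return labels[max(range(3), key=lambda k: output_vector[k])]

print(decode_zone_type([0.9, -0.2, -0.7]))  # text
print(decode_zone_type([-0.5, 0.1, 0.8]))   # picture
```

The operator then sees this label and can override it if the network misidentified the zone.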
Returning to FIG. 2, once the result is output, image zone discrimination system 10 in step 207 determines whether segmenter 5 has any more image zones to be processed for a particular raster image. If there are remaining image zones to be processed in step 207, image zone discrimination system 10 receives the next image zone and repeats steps 204-207 until all image zones have been processed by system 10.
It will be appreciated by those skilled in the art that the present invention automatically identifies an image zone type from a portion of a raster image and labels the portion of the raster image with the correct image zone type. This is a feature which the prior art systems are incapable of performing.
It is intended by the appended claims to cover all modifications of the invention which fall within the true spirit and scope of the invention. For example, although the document zone discrimination system recognizes text, tables, and pictures, the neural network may also be trained to recognize other zone types such as handwriting, for example.
What is claimed is:
Claims (12)
1. An image zone discrimination system connectable to receive a plurality of image zones, each of said image zones being one of a plurality of zone types including text, a picture, or a table, said system comprising:
a region identifier which selects a region from each of said image zones; and
a neural network connected to said region identifier, said neural network subsampling said region and identifying a zone type of said subsampled region of said image zone.
2. An image zone discrimination system as recited in claim 1, wherein said neural network comprises:
a layer of subsampling neurons; and
a multilayer perceptron connected to said layer of subsampling neurons.
3. An image zone discrimination system as recited in claim 2, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
4. An image zone discrimination system as recited in claim 2, wherein said multilayer perceptron comprises:
a plurality of input-layer neurons;
a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and
a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
5. A system executed on a computer as part of a computer program for identifying a plurality of zone types including text, pictures, and tables from a page of a document, said system being connectable to receive a raster image, said system comprising:
zone segmentation means for separating said raster image into image zones; and
image zone discrimination means for identifying automatically a zone type of each of said image zones.
6. A system as recited in claim 5, wherein said image zone discrimination means comprises:
region identification means for selecting a region from each of said image zones; and
neural network means for subsampling said region and for identifying a zone type of said subsampled region of said image zone.
7. A system as recited in claim 6, wherein said neural network comprises:
a layer of subsampling neurons; and
a multilayer perceptron connected to said layer of subsampling neurons.
8. A system as recited in claim 7, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
9. A system as recited in claim 7, wherein said multilayer perceptron comprises:
a plurality of input-layer neurons;
a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and
a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
10. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of image zones, said method comprising the steps of:
(a) selecting a region from each of said image zones;
(b) subsampling said region; and
(c) identifying automatically a zone type of said subsampled region of said image zone.
11. An image zone discrimination method as recited in claim 10, further comprising the step of:
(d) labelling said region of said raster image with said identified zone type.
12. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of zone images from said page of said document, said computer including a multiple layered neural network, said method comprising the steps of:
a) selecting a region from each of said zone images;
b) subsampling said region using said multiple layered neural network;
c) identifying a zone type of said region using said multiple layered neural network; and
d) labelling said region with said identified zone type.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US2530993A | 1993-03-02 | 1993-03-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
GB9403151D0 GB9403151D0 (en) | 1994-04-06 |
GB2275844A true GB2275844A (en) | 1994-09-07 |
Family
ID=21825284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB9403151A Withdrawn GB2275844A (en) | 1993-03-02 | 1994-02-18 | Image Zone Discrimination Using a Neural Network |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPH0773154A (en) |
GB (1) | GB2275844A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287998A (en) * | 2019-05-28 | 2019-09-27 | 浙江工业大学 | A kind of scientific and technical literature picture extracting method based on Faster-RCNN |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10140573B2 (en) * | 2014-03-03 | 2018-11-27 | Qualcomm Incorporated | Neural network adaptation to current computational resources |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2167264A (en) * | 1984-10-19 | 1986-05-21 | Canon Kk | Discriminating between different types of image data |
WO1989003150A1 (en) * | 1987-10-05 | 1989-04-06 | Eastman Kodak Company | Image discrimination |
EP0404236A1 (en) * | 1989-06-21 | 1990-12-27 | Océ-Nederland B.V. | Image segmentation method and device |
EP0469315A2 (en) * | 1990-07-31 | 1992-02-05 | Siemens Aktiengesellschaft | Method for visual inspection of two- or three-dimensional images |
EP0494026A2 (en) * | 1990-12-31 | 1992-07-08 | Goldstar Co. Ltd. | Method for automatically distinguishing between graphic information and text information of image data |
-
1994
- 1994-02-18 GB GB9403151A patent/GB2275844A/en not_active Withdrawn
- 1994-02-25 JP JP6051024A patent/JPH0773154A/en active Pending
Non-Patent Citations (1)
Title |
---|
IBM Technical Disclosure Bulletin,Vol 29,No 5,October 1986 pages 2130 to 2133 * |
Also Published As
Publication number | Publication date |
---|---|
JPH0773154A (en) | 1995-03-17 |
GB9403151D0 (en) | 1994-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5373566A (en) | Neural network-based diacritical marker recognition system and method | |
US6563959B1 (en) | Perceptual similarity image retrieval method | |
US9910829B2 (en) | Automatic document separation | |
US7031555B2 (en) | Perceptual similarity image retrieval | |
US20040218838A1 (en) | Image processing apparatus and method therefor | |
US5852676A (en) | Method and apparatus for locating and identifying fields within a document | |
Yeung et al. | Video browsing using clustering and scene transitions on compressed sequences | |
EP0654746B1 (en) | Form identification and processing system | |
US5818952A (en) | Apparatus for assigning categories to words in a documents for databases | |
CN109964250A (en) | For analyzing the method and system of the image in convolutional neural networks | |
CN1343339A (en) | Video stream classifiable symbol isolation method and system | |
US20050008263A1 (en) | Image retrieving system, image classifying system, image retrieving program, image classifying program, image retrieving method and image classifying method | |
CN110321894A (en) | A kind of library book method for rapidly positioning based on deep learning OCR | |
JP3634266B2 (en) | Color video processing method and apparatus | |
CN109213886B (en) | Image retrieval method and system based on image segmentation and fuzzy pattern recognition | |
EP0388725B1 (en) | Texture discrimination method | |
Bouillon et al. | Grayification: a meaningful grayscale conversion to improve handwritten historical documents analysis | |
US7286722B2 (en) | Memo image managing apparatus, memo image managing system and memo image managing method | |
CN111553361B (en) | Pathological section label identification method | |
CN101802844B (en) | Applying a segmentation engine to different mappings of a digital image | |
GB2275844A (en) | Image Zone Discrimination Using a Neural Network | |
CN112365451A (en) | Method, device and equipment for determining image quality grade and computer readable medium | |
CN110659585A (en) | Pedestrian detection method based on interactive attribute supervision | |
CN114694133B (en) | Text recognition method based on combination of image processing and deep learning | |
JP4031189B2 (en) | Document recognition apparatus and document recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |