GB2275844A - Image Zone Discrimination Using a Neural Network - Google Patents


Info

Publication number
GB2275844A
GB2275844A (Application GB9403151A)
Authority
GB
United Kingdom
Prior art keywords
image
zone
region
neurons
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9403151A
Other versions
GB9403151D0 (en)
Inventor
Michael C Murdock
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INFO ENTERPRISES Inc
Original Assignee
INFO ENTERPRISES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INFO ENTERPRISES Inc filed Critical INFO ENTERPRISES Inc
Publication of GB9403151D0 publication Critical patent/GB9403151D0/en
Publication of GB2275844A publication Critical patent/GB2275844A/en
Withdrawn legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/40Picture signal circuits
    • H04N1/40062Discrimination between different image types, e.g. two-tone, continuous tone
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18019Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
    • G06V30/18038Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
    • G06V30/18048Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
    • G06V30/18057Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

An image zone discrimination system 10 and method automatically identify whether a region of a raster image contains text, pictures, or tables by using a neural network 13. The neural network 13 includes neurons 14 which subsample the region and a multilayer perceptron 16 which determines whether the region is text, a picture, or a table.

Description

IMAGE ZONE DISCRIMINATION SYSTEM AND METHOD Technical Field This invention relates generally to document conversion systems and methods and, in particular, to an image zone discrimination system and method for identifying automatically text, pictures, and tables from a page of a document.
Background of the Invention A page of a document may contain different types of image zones such as text, pictures, and tables. Pictures include drawings, figures, images, photographs, or any other type of illustrative markings. In conventional document conversion systems, each of these types of image zones is processed differently and separately. For example, an image zone containing text needs to be processed by an optical character recognition system to recognize characters of the text. Similarly, an image zone which contains pictures needs to be processed by a raster-to-vector system. However, an image zone which contains pictures does not require any processing by an optical character recognition engine.
Conventional document conversion systems can separate pieces of a raster image into image zones or blocks of information.
For example, if a raster image contains a picture and text describing the picture, conventional document conversion systems can separate the picture and the text each into an unlabeled image zone. Some conventional document conversion systems can highlight the separated blocks of information or image zones.
However, before the image zones can be processed, a document conversion operator must identify what type of image zone the highlighted part of the raster image is and manually label the identified image zone with its zone type (text, image, table, handwriting). Therefore, there exists a significant need to identify automatically an image zone type from a portion of a raster image and to label the portion of the raster image with the correct image zone type.
Summary of the Invention The present invention has utility in identifying automatically by using a neural network whether an image zone contains text, a table, or a picture.
Thus, it is an advantage of the present invention to determine, identify, or recognize automatically, by using a neural network, whether an image zone type is text, a table, or a picture.
According to one aspect of the invention, an image zone discrimination system is provided which is connectable to receive a plurality of image zones. Each of the image zones has one of a plurality of zone types including text, a picture, or a table. The image zone discrimination system comprises (a) a region identifier which selects a region from each of the image zones and (b) a neural network connected to the region identifier, the neural network subsampling the region and identifying a zone type of the subsampled region of the image zone.
According to another aspect of the invention, an image zone discrimination method is provided which is executed on a computer as part of a computer program. The method identifies a plurality of zone types including text, pictures, images, and tables from a page of a document. The computer is connectable to receive a plurality of image zones. The method comprises the steps of (a) selecting a region from each of the image zones; (b) subsampling the region; and (c) identifying automatically a zone type of the subsampled region of the image zone.
Brief Description of the Drawings The invention is pointed out with particularity in the appended claims. However, other features of the invention will become more apparent and the invention will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which: FIG. 1 shows an image zone discrimination system in accordance with a preferred embodiment of the invention; FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention; and FIG. 3 shows one of a plurality of fixed-weight, binary threshold subsampling neurons in accordance with a preferred embodiment of the invention.
Description of the Preferred Embodiments FIG. 1 shows an image zone discrimination system 10 in accordance with a preferred embodiment of the invention. The image zone discrimination system 10 comprises a region identifier 12 and a neural network 13. Neural network 13 comprises a layer of subsampling neurons 14 and a multilayer perceptron 16.
As shown in FIG. 1, a raster image is received by zone segmenter 5 which separates portions of the raster image into blocks of similar information or image zones. Zone segmenter 5 also extracts each of the image zones from the raster image which may contain multiple image zones. Zone segmenter 5 is a commercially available intelligent character recognition system such as ScanWorX from Xerox.
Once the image zones are separated by zone segmenter 5, each of the image zones is sent to image zone discrimination system 10 which recognizes, determines, or identifies a zone type of each of the image zones. Image zone discrimination system 10 outputs the zone type and notifies the document conversion operator of the zone type. Image zone discrimination system 10 also labels the image zone with the identified zone type. If image zone discrimination system 10 incorrectly identifies the proper zone type, the operator can correct the misidentified zone type with the correct zone type.
As shown in FIG. 1, image zone discrimination system 10 comprises a region identifier 12 and multiple layer neural network 13 which includes a layer of subsampling neurons 14 and multilayer perceptron 16. Region identifier 12 receives each image zone from zone segmenter 5 and selects a region from each of the image zones. The layer of subsampling neurons 14 receives the region selected by region identifier 12 and subsamples this region to a size which is compatible with the amount of information which can be processed by multilayer perceptron 16. The layer of subsampling neurons 14 is discussed in more detail below.
The subsampled region is provided to multilayer perceptron 16 of neural network 13 which determines whether the selected region is text, a picture, or a table. Based on the result of the determination by multilayer perceptron 16, image zone discrimination system 10 notifies an operator of said identified zone type by labelling the image zone with the identified zone type.
The operator can then change the zone type of the image zone if image zone discrimination system 10 misidentifies said zone type.
FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention. Steps 200-202, which are executed by zone segmenter 5, prepare a raster image for processing by image zone discrimination system 10. First, zone segmenter 5 receives a raster image in step 200, identifies manually or automatically the different image zones of the raster image in step 201, and extracts each image zone from the raster image in step 202. The segmentation performed in step 201 is typically performed by a human operator, but can also be accomplished by commercially available auto-segmentation software.
Once the image zones are extracted from the raster image, segmenter 5 stores them in step 203. Each of the image zones is then sent to image zone discrimination system 10 in TIFF (tagged image file format). Although the image zones are sent in TIFF format, other formats can be used as well.
In step 204, image zone discrimination system 10 selects a region of the image zone. The first stage of the image zone discrimination system 10 is a region identification process. This process takes as input pixels from the extracted image zone and creates as output a small region, R, of this input image zone. The dimensions of R are hR x wR, where hR is the height of the region and wR is the width of the region. Both the height and width are measured in pixels. Currently, image zone discrimination system 10 creates region R by extracting pixels from the middle of the image zone. However, other, more sophisticated techniques can be used to determine whether region R contains enough information for multiple layer neural network 13 to make a zone discrimination.
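The centre-region extraction described above can be sketched as follows. This is an illustrative sketch only; the function name and the NumPy array interface are assumptions, not from the patent, and the zero-padding for undersized zones is one plausible handling the patent does not specify.

```python
import numpy as np

def select_center_region(zone: np.ndarray, h_r: int, w_r: int) -> np.ndarray:
    """Extract an h_r x w_r region R from the middle of a binary image zone.

    Mirrors the region-identification stage: pixels are taken from the
    centre of the zone. If the zone is smaller than the requested region,
    the result is padded with background zeros (an assumption).
    """
    h, w = zone.shape
    top = max((h - h_r) // 2, 0)
    left = max((w - w_r) // 2, 0)
    region = zone[top:top + h_r, left:left + w_r]
    padded = np.zeros((h_r, w_r), dtype=zone.dtype)
    padded[:region.shape[0], :region.shape[1]] = region
    return padded
```

For a 200 x 600 pixel zone and a desired 100 x 400 region, the slice starts at row 50 and column 100, so the centre of the zone maps to the centre of R.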
After the region R is selected, the layer of subsampling neurons 14 of neural network 13 subsamples the region R in step 205.
Subsampling neurons 14 subsample the selected region R in x (width) and y (height) dimensions to a size that can be easily processed by multilayer perceptron 16 of neural network 13. The input to neuron layer 14 is region R. The number of inputs to the layer of subsampling neurons 14 is the size of region R, or hR x wR.
The output of the first layer of subsampling neurons 14 is Rss, which represents a subsampled region R. The dimensions of Rss are hRss x wRss, where hRss is the height of the subsampled region and wRss is the width of the subsampled region. These dimensions are determined by the subsampling ratios for the x and y dimensions.
The subsampling ratio in the x dimension, rx, is given as:

rx = wR / wRss EQ. 1

The subsampling ratio in the y dimension, ry, is given as:

ry = hR / hRss EQ. 2

The size of region R, the horizontal and vertical subsampling ratios rx and ry, and the size of region Rss may be chosen independently, but must be consistent with equations 1 and 2.
The number of neurons N in subsampling layer 14 is given by the following equation:

N = hRss x wRss EQ. 3

Each of the N neurons is a fixed-weight, binary threshold neuron 31 as shown in the example of FIG. 3. In FIG. 3, four pixels 301-304 from a part 30 of region R are sampled and fed as inputs to neuron 31. Each of the four pixels 301-304 has a value of one ("on") or zero ("off"). Each pixel value is multiplied by a connection weight by multipliers 32. A connection weight is one divided by the number of pixels being sampled; in this example the connection weight is therefore one quarter (.25). The results of the multiplications are added together in adder 34. If the result of the addition is greater than or equal to .5, threshold 36 assigns the neuron a value of one. If the result of the addition is less than .5, threshold 36 assigns the neuron a value of zero.
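The behaviour of one such neuron can be sketched as follows (a minimal illustrative sketch; the function name is an assumption):

```python
def subsampling_neuron(pixels):
    """Fixed-weight, binary threshold neuron as described for FIG. 3.

    Each input pixel (0 or 1) is multiplied by a connection weight of
    1/len(pixels); the neuron outputs 1 when the weighted sum reaches
    the 0.5 threshold, and 0 otherwise.
    """
    weight = 1.0 / len(pixels)  # e.g. .25 when four pixels are sampled
    activation = sum(p * weight for p in pixels)
    return 1 if activation >= 0.5 else 0
```

With the four-pixel example of FIG. 3, the neuron fires when at least two pixels are "on": `subsampling_neuron([1, 1, 0, 0])` returns 1, while `subsampling_neuron([1, 0, 0, 0])` returns 0.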
Once the number of neurons in subsampling layer 14 has been determined from the above equations, the connection weights must be determined. Each neuron in subsampling layer 14 is connected to every pixel in region R through a connection weight. A connection weight is the value assigned to the pixel if the pixel is "on" (a one) rather than "off" (a zero). The input to a neuron in subsampling layer 14 on a connection line is the product of the connection weight and the pixel in region R. The connection weights to neuron Ni,j (i = 0 to hRss - 1; j = 0 to wRss - 1) from all pixels (r,c) in region R which satisfy the following constraint:

{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}

are set to the following:

connection weightr,c = 1 / (rx)(ry)

The constraint formula identifies which of the pixels in region R are multiplied by connection weightr,c. In the constraint formula, r represents rows and c represents columns of region R.
The combination of (r,c) will specify pixels by the row and column designation.
An example of how the above equations are used in processing a region is given as follows. Suppose that region R has width wR equal to 400 pixels and height hR equal to 100 pixels. Further suppose that the desired subsampled region Rss has width wRss equal to 80 pixels and height hRss equal to 20 pixels. The subsampling ratios are determined as follows:

rx = wR / wRss = 400 / 80 = 5

ry = hR / hRss = 100 / 20 = 5

Therefore, the number of neurons in subsampling layer 14 is the following:

N = hRss x wRss = 20 x 80 = 1600.
Each of the 1600 neurons is a fixed-weight, binary threshold neuron as shown in FIG. 3. The following constraint formula determines the set of pixels (r,c) which are assigned a connection weight connecting to neuron N1,2:

{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}

{(r,c) | r = (1)(5) + m; c = (2)(5) + n; m ∈ (0, 4), n ∈ (0, 4)}

For neuron N1,2, this is the following set of pixels:

(r,c) = { (5,10), (5,11), (5,12), (5,13), (5,14), (6,10), (6,11), (6,12), (6,13), (6,14), ..., (9,10), (9,11), (9,12), (9,13), (9,14) }

The connection weightr,c for each of the pixels in the set is set to the following:

connection weightr,c = 1 / (rx)(ry) = 1 / (5)(5) = .040

Therefore, if any of the pixels in the set is "on", it is multiplied by the connection weight of .040. After the multiplication is performed for each of the pixels, the products are summed. If the result of the summation is greater than or equal to .5, neuron N1,2 is given the value of one and is "on". Otherwise, N1,2 is given a value of zero.
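Because every neuron weights its r_y x r_x block of pixels by 1/((rx)(ry)) and thresholds at .5, the whole subsampling layer amounts to block-wise averaging followed by thresholding. A minimal NumPy sketch (function name assumed, not from the patent):

```python
import numpy as np

def subsample_region(region: np.ndarray, r_y: int, r_x: int) -> np.ndarray:
    """Apply the layer of fixed-weight, binary threshold neurons to region R.

    Each output neuron covers one r_y x r_x block of R; the block mean
    equals the weighted sum with connection weight 1/((r_x)(r_y)), and
    the neuron fires when that sum reaches the 0.5 threshold.
    """
    h_r, w_r = region.shape
    h_ss, w_ss = h_r // r_y, w_r // r_x
    blocks = region[:h_ss * r_y, :w_ss * r_x].reshape(h_ss, r_y, w_ss, r_x)
    weighted_sums = blocks.mean(axis=(1, 3))
    return (weighted_sums >= 0.5).astype(int)
```

For the 100 x 400 pixel region with rx = ry = 5, the output Rss has shape 20 x 80, i.e. the 1600 neurons of the example.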
According to FIG. 2, once region R has been subsampled by the layer of subsampling neurons 14 in step 205, multilayer perceptron 16 of neural network 13 processes the subsampled region Rss in step 206.
Multilayer perceptron 16 is a conventional multilayer perceptron such as is available in the public domain packages Genesis or PlaNet, for example. Multilayer perceptron 16 is trained with the well-known learning rule called Backward Error Propagation. The topology of the multilayer perceptron neural network 16 is a three-layer, fully connected network with N input-layer neurons, 100 hidden-layer neurons, and 3 output-layer neurons. The input layer is a layer of fan-out distribution nodes that do not have adaptable weights or squashing functions. Hyperbolic tangent functions are used for the nonlinear squashing functions in the hidden and output layers.
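The forward pass of this topology can be sketched as follows. This is an illustrative sketch only: in the patent the weights are learned by Backward Error Propagation, so the small random values below merely stand in for trained weights, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HIDDEN, N_OUT = 1600, 100, 3  # topology described above

# Stand-in weights; real values would come from backprop training.
w_hidden = rng.normal(scale=0.05, size=(N_IN, N_HIDDEN))
w_out = rng.normal(scale=0.05, size=(N_HIDDEN, N_OUT))

def mlp_forward(r_ss_flat: np.ndarray) -> np.ndarray:
    """Forward pass: the input layer only fans out the subsampled region
    (no weights or squashing); the hidden and output layers apply
    weighted sums squashed by tanh, giving three values in (-1, 1)."""
    hidden = np.tanh(r_ss_flat @ w_hidden)
    return np.tanh(hidden @ w_out)
```

The flattened 20 x 80 subsampled region supplies the 1600 inputs; the three outputs code for the three zone types.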
The multilayer perceptron neural network 16 is trained on a sample with a text/table/raster frequency distribution consistent with the document class that is being converted, such as electronic data manuals, for example. For this document class, the frequency of occurrence of each image zone type is determined using standard statistical techniques.
The output of multilayer perceptron 16 shown in FIG. 1 is a vector of three values. Each element of this output vector codes for one of the zone types (text, table, and picture). The vector elements range in value from -1 to +1. The vector element with the largest value corresponds to the most likely zone type. The result is output to an operator who can change the zone type if it is incorrectly identified by neural network 13 of image zone discrimination system 10.
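Decoding the output vector is a simple arg-max over its three elements, sketched below. The element-to-type ordering follows the list in the text (text, table, picture), though the patent does not fix the index assignment; the function name is an assumption.

```python
ZONE_TYPES = ("text", "table", "picture")  # index order assumed

def decode_zone_type(output_vector):
    """Map the three-element output vector (values in [-1, +1]) to the
    zone type coded by its largest element."""
    best = max(range(len(output_vector)), key=lambda i: output_vector[i])
    return ZONE_TYPES[best]
```

For example, `decode_zone_type([-0.2, 0.9, -0.7])` returns "table"; an operator could then override this label if the network misidentified the zone.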
Returning to FIG. 2, once the result is output, image zone discrimination system 10 in step 207 determines whether segmenter 5 has any more image zones to be processed for a particular raster image. If there are remaining image zones to be processed in step 207, image zone discrimination system 10 receives the next image zone and repeats steps 204-207 until all image zones have been processed by system 10.
It will be appreciated by those skilled in the art that the present invention automatically identifies an image zone type from a portion of a raster image and labels the portion of the raster image with the correct image zone type. This is a feature which the prior art systems are incapable of performing.
It is intended by the appended claims to cover all modifications of the invention which fall within the true spirit and scope of the invention. For example, although the document zone discrimination system recognizes text, tables, and pictures, the neural network may also be trained to recognize other zone types such as handwriting, for example.
What is claimed is:

Claims (12)

1. An image zone discrimination system connectable to receive a plurality of image zones, each of said image zones being one of a plurality of zone types including text, a picture, or a table, said system comprising: a region identifier which selects a region from each of said image zones; and a neural network connected to said region identifier, said neural network subsampling said region and identifying a zone type of said subsampled region of said image zone.
2. An image zone discrimination system as recited in claim 1, wherein said neural network comprises: a layer of subsampling neurons; and a multilayer perceptron connected to said layer of subsampling neurons.
3. An image zone discrimination system as recited in claim 2, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
4. An image zone discrimination system as recited in claim 2, wherein said multilayer perceptron comprises: a plurality of input-layer neurons; a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
5. A system executed on a computer as part of a computer program for identifying a plurality of zone types including text, pictures, and tables from a page of a document, said system being connectable to receive a raster image, said system comprising: zone segmentation means for separating said raster image into image zones; and image zone discrimination means for identifying automatically a zone type of each of said image zones.
6. A system as recited in claim 5, wherein said image zone discrimination means comprises: region identification means for selecting a region from each of said image zones; and neural network means for subsampling said region and for identifying a zone type of said subsampled region of said image zone.
7. A system as recited in claim 6, wherein said neural network comprises: a layer of subsampling neurons; and a multilayer perceptron connected to said layer of subsampling neurons.
8. A system as recited in claim 7, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
9. A system as recited in claim 7, wherein said multilayer perceptron comprises: a plurality of input-layer neurons; a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
10. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of image zones, said method comprising the steps of: (a) selecting a region from each of said image zones; (b) subsampling said region; and (c) identifying automatically a zone type of said subsampled region of said image zone.
11. An image zone discrimination method as recited in claim 10, further comprising the step of: (d) labelling said region of said raster image with said identified zone type.
12. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of zone images from said page of said document, said computer including a multiple layered neural network, said method comprising the steps of: a) selecting a region from each of said zone images; b) subsampling said region using said multiple layered neural network; c) identifying a zone type of said region using said multiple layered neural network; and d) labelling said region with said identified zone type.
GB9403151A 1993-03-02 1994-02-18 Image Zone Discrimination Using a Neural Network Withdrawn GB2275844A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US2530993A 1993-03-02 1993-03-02

Publications (2)

Publication Number Publication Date
GB9403151D0 (en) 1994-04-06
GB2275844A true GB2275844A (en) 1994-09-07

Family

ID=21825284

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9403151A Withdrawn GB2275844A (en) 1993-03-02 1994-02-18 Image Zone Discrimination Using a Neural Network

Country Status (2)

Country Link
JP (1) JPH0773154A (en)
GB (1) GB2275844A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287998A (en) * 2019-05-28 2019-09-27 浙江工业大学 A kind of scientific and technical literature picture extracting method based on Faster-RCNN

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140573B2 (en) * 2014-03-03 2018-11-27 Qualcomm Incorporated Neural network adaptation to current computational resources

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2167264A (en) * 1984-10-19 1986-05-21 Canon Kk Discriminating between different types of image data
WO1989003150A1 (en) * 1987-10-05 1989-04-06 Eastman Kodak Company Image discrimination
EP0404236A1 (en) * 1989-06-21 1990-12-27 Océ-Nederland B.V. Image segmentation method and device
EP0469315A2 (en) * 1990-07-31 1992-02-05 Siemens Aktiengesellschaft Method for visual inspection of two- or three-dimensional images
EP0494026A2 (en) * 1990-12-31 1992-07-08 Goldstar Co. Ltd. Method for automatically distinguishing between graphic information and text information of image data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IBM Technical Disclosure Bulletin,Vol 29,No 5,October 1986 pages 2130 to 2133 *


Also Published As

Publication number Publication date
JPH0773154A (en) 1995-03-17
GB9403151D0 (en) 1994-04-06

Similar Documents

Publication Publication Date Title
US5373566A (en) Neural network-based diacritical marker recognition system and method
US6563959B1 (en) Perceptual similarity image retrieval method
US9910829B2 (en) Automatic document separation
US7031555B2 (en) Perceptual similarity image retrieval
US20040218838A1 (en) Image processing apparatus and method therefor
US5852676A (en) Method and apparatus for locating and identifying fields within a document
Yeung et al. Video browsing using clustering and scene transitions on compressed sequences
EP0654746B1 (en) Form identification and processing system
US5818952A (en) Apparatus for assigning categories to words in a documents for databases
CN109964250A (en) For analyzing the method and system of the image in convolutional neural networks
CN1343339A (en) Video stream classifiable symbol isolation method and system
US20050008263A1 (en) Image retrieving system, image classifying system, image retrieving program, image classifying program, image retrieving method and image classifying method
CN110321894A (en) A kind of library book method for rapidly positioning based on deep learning OCR
JP3634266B2 (en) Color video processing method and apparatus
CN109213886B (en) Image retrieval method and system based on image segmentation and fuzzy pattern recognition
EP0388725B1 (en) Texture discrimination method
Bouillon et al. Grayification: a meaningful grayscale conversion to improve handwritten historical documents analysis
US7286722B2 (en) Memo image managing apparatus, memo image managing system and memo image managing method
CN111553361B (en) Pathological section label identification method
CN101802844B (en) Applying a segmentation engine to different mappings of a digital image
GB2275844A (en) Image Zone Discrimination Using a Neural Network
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN110659585A (en) Pedestrian detection method based on interactive attribute supervision
CN114694133B (en) Text recognition method based on combination of image processing and deep learning
JP4031189B2 (en) Document recognition apparatus and document recognition method

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)