GB2275844A - Image Zone Discrimination Using a Neural Network - Google Patents
- Publication number
- GB2275844A (application GB9403151A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- image
- zone
- region
- neurons
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/40—Picture signal circuits
- H04N1/40062—Discrimination between different image types, e.g. two-tone, continuous tone
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
- G06V30/18019—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
- G06V30/18038—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
- G06V30/18048—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
- G06V30/18057—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
An image zone discrimination system 10 and method automatically identify whether a region of a raster image contains text, pictures, or tables by using a neural network 13. The neural network 13 includes neurons 14 which subsample the region and a multilayer perceptron 16 which determines whether the region is text, a picture, or a table.
Description
IMAGE ZONE DISCRIMINATION SYSTEM AND METHOD
Technical Field
This invention relates generally to document conversion systems and methods and, in particular, to an image zone discrimination system and method for identifying automatically text, pictures, and tables from a page of a document.
Background of the Invention
A page of a document may contain different types of image zones such as text, pictures, and tables. Pictures include drawings, figures, images, photographs, or any other type of illustrative markings. In conventional document conversion systems, each of these types of image zones is processed differently and separately. For example, an image zone containing text needs to be processed by an optical character recognition system to recognize characters of the text. Similarly, an image zone which contains pictures needs to be processed by a raster-to-vector system, but does not require any processing by an optical character recognition engine.
Conventional document conversion systems can separate pieces of a raster image into image zones or blocks of information.
For example, if a raster image contains a picture and text describing the picture, conventional document conversion systems can separate the picture and the text each into an unlabeled image zone. Some conventional document conversion systems can highlight the separated blocks of information or image zones.
However, before the image zones can be processed, a document conversion operator must identify what type of image zone the highlighted part of the raster image is and manually label the identified image zone with its zone type (text, image, table, handwriting). Therefore, there exists a significant need to identify automatically an image zone type from a portion of a raster image and to label the portion of the raster image with the correct image zone type.
Summary of the Invention
The present invention has utility in identifying automatically by using a neural network whether an image zone contains text, a table, or a picture.
Thus, it is an advantage of the present invention to determine, identify, or recognize automatically by using a neural network whether an image zone type is text, a table, or a picture.
According to one aspect of the invention, an image zone discrimination system is provided which is connectable to receive a plurality of image zones. Each of the image zones has one of a plurality of zone types including text, a picture, or a table. The image zone discrimination system comprises (a) a region identifier which selects a region from each of the image zones and (b) a neural network connected to the region identifier, the neural network subsampling the region and identifying a zone type of the subsampled region of the image zone.
According to another aspect of the invention, an image zone discrimination method is provided which is executed on a computer as part of a computer program. The method identifies a plurality of zone types including text, pictures, images, and tables from a page of a document. The computer is connectable to receive a plurality of image zones. The method comprises the steps of (a) selecting a region from each of the image zones; (b) subsampling the region; and (c) identifying automatically a zone type of the subsampled region of the image zone.
Brief Description of the Drawings
The invention is pointed out with particularity in the appended claims. However, other features of the invention will become more apparent and the invention will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
FIG. 1 shows an image zone discrimination system in accordance with a preferred embodiment of the invention;
FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention; and
FIG. 3 shows one of a plurality of fixed-weight, binary threshold subsampling neurons in accordance with a preferred embodiment of the invention.
Description of the Preferred Embodiments
FIG. 1 shows an image zone discrimination system 10 in accordance with a preferred embodiment of the invention. The image zone discrimination system 10 comprises a region identifier 12 and a neural network 13. Neural network 13 comprises a layer of subsampling neurons 14 and a multilayer perceptron 16.
As shown in FIG. 1, a raster image is received by zone segmenter 5 which separates portions of the raster image into blocks of similar information or image zones. Zone segmenter 5 also extracts each of the image zones from the raster image which may contain multiple image zones. Zone segmenter 5 is a commercially available intelligent character recognition system such as ScanWorX from Xerox.
Once the image zones are separated by zone segmenter 5, each of the image zones is sent to image zone discrimination system 10 which recognizes, determines, or identifies a zone type of each of the image zones. Image zone discrimination system 10 outputs the zone type and notifies the document conversion operator of the zone type. Image zone discrimination system 10 also labels the image zone with the identified zone type. If image zone discrimination system 10 incorrectly identifies the proper zone type, the operator can correct the misidentified zone type with the correct zone type.
As shown in FIG. 1, image zone discrimination system 10 comprises a region identifier 12 and multiple layer neural network 13 which includes a layer of subsampling neurons 14 and multilayer perceptron 16. Region identifier 12 receives each image zone from zone segmenter 5 and selects a region from each of the image zones. The layer of subsampling neurons 14 receives the region selected by region identifier 12 and subsamples this region to a size which is compatible with the amount of information which can be processed by multilayer perceptron 16. The layer of subsampling neurons 14 is discussed in more detail below.
The subsampled region is provided to multilayer perceptron 16 of neural network 13 which determines whether the selected region is text, a picture, or a table. Based on the result of the determination by multilayer perceptron 16, image zone discrimination system 10 notifies an operator of said identified zone type by labelling the image zone with the identified zone type.
The operator can then change the zone type of the image zone if image zone discrimination system 10 misidentifies said zone type.
FIG. 2 shows a flowchart of an image zone discrimination method in accordance with a preferred embodiment of the invention. Steps 200-202, which are executed by zone segmenter 5, prepare a raster image for processing by image zone discrimination system 10. First, zone segmenter 5 in step 200 receives a raster image, identifies manually or automatically the different image zones from the raster image in step 201, and extracts in step 202 each image zone from the raster image. The segmentation performed in step 201 is typically performed by a human operator, but can also be accomplished by auto-segmentation software which is commercially available.
Once the image zones are extracted from a raster image, segmenter 5 in step 203 stores the image zones. Each of the image zones is then sent to image zone discrimination system 10 in the TIFF (tagged image file format) format. Although the image zones are sent in TIFF format, other formats can be used as well.
In step 204, image zone discrimination system 10 selects a region of the image zone. The first stage of the image zone discrimination system 10 is a region identification process. This process takes as input pixels from the extracted image zone and creates as output a small region, R, of this input image zone. The dimensions of R are hR x wR where hR is the height of the region and wR is the width of the region. Both the height and width are measured in pixels. Currently, image zone discrimination system 10 creates region R by extracting pixels from the middle of the image zone. However, other sophisticated techniques can be used to determine whether region R contains enough information for multiple layer neural network 13 to make a zone discrimination.
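The region-identification step described above can be sketched as a simple center crop. The function name and the representation of an image zone as a two-dimensional NumPy array of 0/1 pixels are assumptions for illustration, not part of the patent text:

```python
import numpy as np

def select_region(zone, h_r, w_r):
    """Select an h_r x w_r region R from the middle of an image zone.

    A minimal sketch of the region-identification process: pixels are
    extracted from the middle of the extracted image zone.
    """
    h, w = zone.shape
    # Clamp the requested region dimensions to the zone's actual size.
    h_r, w_r = min(h_r, h), min(w_r, w)
    top = (h - h_r) // 2
    left = (w - w_r) // 2
    return zone[top:top + h_r, left:left + w_r]

zone = np.zeros((300, 600), dtype=np.uint8)   # hypothetical image zone
region = select_region(zone, 100, 400)
print(region.shape)  # (100, 400)
```

More sophisticated region identifiers, as the text notes, could instead verify that the chosen region contains enough information for the discrimination.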
After the region R is selected, the layer of subsampling neurons 14 of neural network 13 in step 205 subsamples the region R.
Subsampling neurons 14 subsample the selected region R in x (width) and y (height) dimensions to a size that can be easily processed by multilayer perceptron 16 of neural network 13. The input to neuron layer 14 is region R. The number of inputs to the layer of subsampling neurons 14 is the size of region R, or hR x wR.
The output of the first layer of subsampling neurons 14 is Rss which represents a subsampled region R. The dimensions of Rss are hRss x wRss, where hRss is the height of the subsampled region and wRss is the width of the subsampled region. These dimensions are determined by the subsampled ratios for the x and y dimensions.
The subsampling ratio in the x dimension, rx, is given as:
rx = wR / wRss EQ. 1
The subsampling ratio in the y dimension, ry, is given as:
ry = hR / hRss EQ. 2
The size of region R, the horizontal and vertical subsampling ratios rx and ry, and the size of region Rss may be chosen independently, but consistent with equations 1 and 2.
The number of neurons N in subsampling layer 14 is given by the following equation:
N = hRss x wRss EQ. 3
Each of the N neurons is a fixed-weight, binary threshold neuron 31 as shown in the example of FIG. 3. In FIG. 3, four pixels 301-304 from a part of region R 30 are sampled and fed as inputs to neuron 31. Each of the four pixels 301-304 has a value of one ("on") or zero ("off"). Each of the pixel values is multiplied by a connection weight by multipliers 32. A connection weight is one divided by the number of pixels being sampled; in this example it is equal to one quarter (0.25). The results of the multiplications are added together in adder 34. If the result of the addition is greater than or equal to 0.5, threshold 36 assigns the neuron a value of one. If the result of the addition is less than 0.5, threshold 36 assigns it a value of zero.
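A single fixed-weight binary threshold neuron of FIG. 3 can be sketched as follows; the function name is an assumption for illustration:

```python
def subsampling_neuron(pixels):
    """Fixed-weight binary threshold neuron, as in FIG. 3.

    Each input pixel (0 or 1) is multiplied by a fixed connection
    weight of 1/len(pixels); the products are summed, and the neuron
    outputs 1 if the sum is >= 0.5, else 0.
    """
    weight = 1.0 / len(pixels)               # e.g. 0.25 for four inputs
    total = sum(p * weight for p in pixels)  # multipliers 32 and adder 34
    return 1 if total >= 0.5 else 0          # threshold 36

print(subsampling_neuron([1, 1, 0, 0]))  # 1 (sum = 0.5, at threshold)
print(subsampling_neuron([1, 0, 0, 0]))  # 0 (sum = 0.25)
```

In effect the neuron reports whether at least half of its sampled pixels are "on".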
Once the number of neurons in subsampling layer 14 has been determined from the above equations, the connection weights must be determined. Each neuron in subsampling layer 14 is connected to every pixel in region R through a connection weight. A connection weight is the value to assign the pixel if the pixel is "on" (or a one) rather than "off" (or a zero). The input to the neuron in subsampling layer 14 on that connection line is the product of the connection weights and pixels in region R. The connection weights to neuron Ni,j (i = 0 to hRss-1; j = 0 to wRss-1) from all pixels (r,c) in region R which satisfy the following constraint:
{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}
are set to the following:
connection weightr,c = 1 / ((rx)(ry))
The constraint formula identifies which of the pixels in region R are multiplied by the connection weightr,c. In the constraint formula, r represents rows and c represents columns of region R.
The combination of (r,c) will specify pixels by the row and column designation.
An example of how the above equations are used in processing a region is given as follows. Suppose that region R has width wR equal to 400 pixels and height hR equal to 100 pixels. Further suppose that the desired subsampled region Rss has a width wRss equal to 80 pixels and a height hRss equal to 20 pixels. The subsampling ratios are determined as follows:
rx = wR / wRss = 400 / 80 = 5
ry = hR / hRss = 100 / 20 = 5
Therefore, the number of neurons in subsampling layer 14 is the following:
N = hRss x wRss = 20 x 80 = 1600.
Each of the 1600 neurons is a fixed-weight binary threshold neuron as shown in FIG. 3. The following constraint formula determines the set of pixels (r,c) which are assigned a connection weight connecting to neuron N1,2:
{(r,c) | r = (i)(ry) + m; c = (j)(rx) + n; m ∈ (0, ry-1), n ∈ (0, rx-1)}
{(r,c) | r = (1)(5) + m; c = (2)(5) + n; m ∈ (0, 4), n ∈ (0, 4)}
For neuron N1,2, this is the following set of pixels:
(r,c) = { (5,10), (5,11), (5,12), (5,13), (5,14),
(6,10), (6,11), (6,12), (6,13), (6,14),
...
(9,10), (9,11), (9,12), (9,13), (9,14) }
The connection weightr,c for each of the pixels in the set is set to the following:
connection weightr,c = 1 / ((rx)(ry)) = 1 / ((5)(5)) = 0.040
Therefore, if any of the pixels in the set is "on", it is multiplied by the connection weight of 0.040. After the multiplication is performed for each pixel, the products are summed. If the result of the summation is greater than or equal to 0.5, neuron N1,2 is given the value of one, or is "on".
Otherwise N1,2 is given a value of zero.
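The whole subsampling layer can be sketched as follows, reproducing the worked example above (a 400 x 100 region subsampled to 80 x 20 with rx = ry = 5, so 1600 neurons with connection weight 0.040). The function name and the NumPy array representation are assumptions:

```python
import numpy as np

def subsample_region(R, h_rss, w_rss):
    """Layer of fixed-weight binary threshold neurons (EQ. 1-3).

    One neuron per output pixel: neuron Ni,j sums the pixels
    (r, c) with r = i*ry + m, c = j*rx + n, each weighted by
    1/((rx)(ry)), and fires (1) if the sum is >= 0.5.
    """
    h_r, w_r = R.shape
    ry, rx = h_r // h_rss, w_r // w_rss   # subsampling ratios, EQ. 1-2
    weight = 1.0 / (rx * ry)              # fixed connection weight
    Rss = np.zeros((h_rss, w_rss), dtype=np.uint8)
    for i in range(h_rss):
        for j in range(w_rss):
            window = R[i * ry:(i + 1) * ry, j * rx:(j + 1) * rx]
            Rss[i, j] = 1 if window.sum() * weight >= 0.5 else 0
    return Rss

R = np.zeros((100, 400), dtype=np.uint8)
R[5:10, 10:15] = 1              # turn on the pixel set feeding neuron N1,2
Rss = subsample_region(R, 20, 80)
print(Rss.shape, Rss[1, 2])     # (20, 80) 1
```

With all 25 pixels of its window "on", neuron N1,2 sees a weighted sum of 25 x 0.040 = 1.0 and fires, while every other neuron stays at zero.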
According to FIG. 2, once region R has been subsampled by the layer of subsampling neurons 14 in step 205, multilayer perceptron 16 of neural network 13 processes the subsampled region Rss in step 206.
Multilayer perceptron 16 is a conventional multilayer perceptron such as is available in the public domain packages Genesis or PlaNet, for example. Multilayer perceptron 16 is trained with the well-known learning rule called backward error propagation. The topology of the multilayer perceptron neural network 16 is a three-layer fully connected network with N input-layer neurons, 100 hidden-layer neurons, and 3 output-layer neurons. The input layer is a layer of fanout, distribution nodes that do not have adaptable weights or squashing functions. Hyperbolic tangent functions are used for the nonlinear squashing functions in the hidden and output layers.
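The forward pass of this topology can be sketched as follows. The weights here are random placeholders, not the trained network (the patent trains them with backward error propagation), and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1600          # input-layer fanout nodes (flattened Rss, 20 x 80)
H, OUT = 100, 3   # hidden-layer and output-layer sizes

# Untrained placeholder weights; in the patent these are learned
# from labeled zone samples via backward error propagation.
W1 = rng.normal(0.0, 0.1, (H, N))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (OUT, H))
b2 = np.zeros(OUT)

def mlp_forward(rss_flat):
    """Three-layer fully connected forward pass.

    The input layer only fans out the subsampled pixels; the hidden
    and output layers apply hyperbolic-tangent squashing functions.
    """
    hidden = np.tanh(W1 @ rss_flat + b1)
    return np.tanh(W2 @ hidden + b2)   # three values in (-1, +1)

out = mlp_forward(np.zeros(N))
print(out.shape)  # (3,)
```

Because tanh saturates in (-1, +1), each of the three outputs stays in the range the patent describes for its output vector.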
The multilayer perceptron neural network 16 is trained on a sample with a text/table/raster frequency distribution consistent with the document class that is being converted, such as electronic data manuals, for example. For this document class, the frequency of occurrence of each image zone type is determined using standard statistical techniques.
The output of multilayer perceptron 16 shown in FIG. 1 is a vector of three values. Each element of this output vector codes for one of the zone types (text, table, and picture). The vector elements range in value from -1 to +1. The vector element with the largest value corresponds to the most likely zone type. The result is output to an operator who can change the zone type if it is incorrectly identified by neural network 13 of image zone discrimination system 10.
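Decoding the output vector as described above amounts to taking the element with the largest value. The ordering of the three labels is an assumption for illustration:

```python
def decode_zone_type(output_vector):
    """Map the three-element output vector (values in -1..+1) to the
    most likely zone type: the largest element wins. The label order
    is hypothetical; the patent only names the three zone types."""
    labels = ("text", "table", "picture")
    return labels[max(range(3), key=lambda k: output_vector[k])]

print(decode_zone_type([0.9, -0.2, -0.7]))  # text
print(decode_zone_type([-0.5, 0.1, 0.8]))   # picture
```

The operator then sees this label and can override it if the network misidentified the zone.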
Returning to FIG. 2, once the result is output, image zone discrimination system 10 in step 207 determines whether segmenter 5 has any more image zones to be processed for a particular raster image. If there are remaining image zones to be processed in step 207, image zone discrimination system 10 receives the next image zone and repeats steps 204-207 until all image zones have been processed by system 10.
It will be appreciated by those skilled in the art that the present invention automatically identifies an image zone type from a portion of a raster image and labels the portion of the raster image with the correct image zone type. This is a feature which the prior art systems are incapable of performing.
It is intended by the appended claims to cover all modifications of the invention which fall within the true spirit and scope of the invention. For example, although the document zone discrimination system recognizes text, tables, and pictures, the neural network may also be trained to recognize other zone types such as handwriting, for example.
What is claimed is:
Claims (12)
1. An image zone discrimination system connectable to receive a plurality of image zones, each of said image zones being one of a plurality of zone types including text, a picture, or a table, said system comprising:
a region identifier which selects a region from each of said image zones; and
a neural network connected to said region identifier, said neural network subsampling said region and identifying a zone type of said subsampled region of said image zone.
2. An image zone discrimination system as recited in claim 1, wherein said neural network comprises:
a layer of subsampling neurons; and
a multilayer perceptron connected to said layer of subsampling neurons.
3. An image zone discrimination system as recited in claim 2, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
4. An image zone discrimination system as recited in claim 2, wherein said multilayer perceptron comprises:
a plurality of input-layer neurons;
a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and
a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
5. A system executed on a computer as part of a computer program for identifying a plurality of zone types including text, pictures, and tables from a page of a document, said system being connectable to receive a raster image, said system comprising:
zone segmentation means for separating said raster image into image zones; and
image zone discrimination means for identifying automatically a zone type of each of said image zones.
6. A system as recited in claim 5, wherein said image zone discrimination means comprises:
region identification means for selecting a region from each of said image zones; and
neural network means for subsampling said region and for identifying a zone type of said subsampled region of said image zone.
7. A system as recited in claim 6, wherein said neural network comprises:
a layer of subsampling neurons; and
a multilayer perceptron connected to said layer of subsampling neurons.
8. A system as recited in claim 7, wherein said layer of subsampling neurons comprises a plurality of fixed-weight, binary threshold neurons.
9. A system as recited in claim 7, wherein said multilayer perceptron comprises:
a plurality of input-layer neurons;
a plurality of hidden-layer neurons each connected to each of said input-layer neurons; and
a plurality of output-layer neurons each connected to each of said hidden-layer neurons.
10. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of image zones, said method comprising the steps of:
(a) selecting a region from each of said image zones;
(b) subsampling said region; and
(c) identifying automatically a zone type of said subsampled region of said image zone.
11. An image zone discrimination method as recited in claim 10, further comprising the step of:
(d) labelling said region of said raster image with said identified zone type.
12. An image zone discrimination method executed on a computer as part of a computer program for identifying a plurality of zone types including text, images, and tables from a page of a document, said computer being connectable to receive a plurality of zone images from said page of said document, said computer including a multiple layered neural network, said method comprising the steps of:
a) selecting a region from each of said zone images;
b) subsampling said region using said multiple layered neural network;
c) identifying a zone type of said region using said multiple layered neural network; and
d) labelling said region with said identified zone type.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US2530993A | 1993-03-02 | 1993-03-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
GB9403151D0 GB9403151D0 (en) | 1994-04-06 |
GB2275844A true GB2275844A (en) | 1994-09-07 |
Family
ID=21825284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB9403151A Withdrawn GB2275844A (en) | 1993-03-02 | 1994-02-18 | Image Zone Discrimination Using a Neural Network |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPH0773154A (en) |
GB (1) | GB2275844A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287998A (en) * | 2019-05-28 | 2019-09-27 | 浙江工业大学 | A kind of scientific and technical literature picture extracting method based on Faster-RCNN |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10140573B2 (en) * | 2014-03-03 | 2018-11-27 | Qualcomm Incorporated | Neural network adaptation to current computational resources |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2167264A (en) * | 1984-10-19 | 1986-05-21 | Canon Kk | Discriminating between different types of image data |
WO1989003150A1 (en) * | 1987-10-05 | 1989-04-06 | Eastman Kodak Company | Image discrimination |
EP0404236A1 (en) * | 1989-06-21 | 1990-12-27 | Océ-Nederland B.V. | Image segmentation method and device |
EP0469315A2 (en) * | 1990-07-31 | 1992-02-05 | Siemens Aktiengesellschaft | Method for visual inspection of two- or three-dimensional images |
EP0494026A2 (en) * | 1990-12-31 | 1992-07-08 | Goldstar Co. Ltd. | Method for automatically distinguishing between graphic information and text information of image data |
-
1994
- 1994-02-18 GB GB9403151A patent/GB2275844A/en not_active Withdrawn
- 1994-02-25 JP JP6051024A patent/JPH0773154A/en active Pending
Non-Patent Citations (1)
Title |
---|
IBM Technical Disclosure Bulletin,Vol 29,No 5,October 1986 pages 2130 to 2133 * |
Also Published As
Publication number | Publication date |
---|---|
JPH0773154A (en) | 1995-03-17 |
GB9403151D0 (en) | 1994-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5373566A (en) | Neural network-based diacritical marker recognition system and method | |
US6563959B1 (en) | Perceptual similarity image retrieval method | |
US9910829B2 (en) | Automatic document separation | |
US7031555B2 (en) | Perceptual similarity image retrieval | |
US20040218838A1 (en) | Image processing apparatus and method therefor | |
US5852676A (en) | Method and apparatus for locating and identifying fields within a document | |
Yeung et al. | Video browsing using clustering and scene transitions on compressed sequences | |
EP0654746B1 (en) | Form identification and processing system | |
US5818952A (en) | Apparatus for assigning categories to words in a documents for databases | |
CN109964250A (en) | For analyzing the method and system of the image in convolutional neural networks | |
CN1343339A (en) | Video stream classifiable symbol isolation method and system | |
US20050008263A1 (en) | Image retrieving system, image classifying system, image retrieving program, image classifying program, image retrieving method and image classifying method | |
CN110321894A (en) | A kind of library book method for rapidly positioning based on deep learning OCR | |
JP3634266B2 (en) | Color video processing method and apparatus | |
CN109213886B (en) | Image retrieval method and system based on image segmentation and fuzzy pattern recognition | |
EP0388725B1 (en) | Texture discrimination method | |
Bouillon et al. | Grayification: a meaningful grayscale conversion to improve handwritten historical documents analysis | |
US7286722B2 (en) | Memo image managing apparatus, memo image managing system and memo image managing method | |
CN111553361B (en) | Pathological section label identification method | |
CN101802844B (en) | Applying a segmentation engine to different mappings of a digital image | |
GB2275844A (en) | Image Zone Discrimination Using a Neural Network | |
CN112365451A (en) | Method, device and equipment for determining image quality grade and computer readable medium | |
CN110659585A (en) | Pedestrian detection method based on interactive attribute supervision | |
CN114694133B (en) | Text recognition method based on combination of image processing and deep learning | |
JP4031189B2 (en) | Document recognition apparatus and document recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |