WO2000060532A9

WO2000060532A9 - Method and apparatus for restoration of low resolution images

Info

Publication number: WO2000060532A9
Application number: PCT/US2000/008494
Authority: WO
Inventors: Chein-I Chang; Paul Thouin
Original assignee: Univ Maryland; Chein-I Chang; Paul Thouin
Priority date: 1999-04-01
Filing date: 2000-03-30
Publication date: 2002-06-20
Also published as: AU4052300A; WO2000060532A1

Abstract

The definition of a low-resolution image (10) may be improved by taking advantage of characteristics of its bimodal distribution (202) so as to produce an expanded image (206) with improved definition. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution so as to significantly improve image quality. An image is improved by iteratively solving a nonlinear optimization problem using a preferred Bimodal-Smooth-Average scoring method. Among the applications for the technique are improved optical character recognition, restoration of binary images, and restoration of video frames.

Description

TITLE: METHOD AND APPARATUS FOR RESTORATION OF LOW RESOLUTION IMAGES

FIELD AND BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates in general to image processing, and more particularly to creating a higher resolution image from a lower resolution image.

Background Information

The invention described and claimed herein comprises a novel method and apparatus for improving the definition of a low-resolution image by taking advantage of characteristics of its bimodal distribution so as to produce an expanded image with improved definition. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution images so as to significantly improve image quality. An image is improved by iteratively solving a nonlinear optimization problem using a preferred Bimodal-Smooth-Average scoring method. Among the applications for the technique are improved optical character recognition, restoration of binary images and video frames.

As used herein, the terms "low" and "high" are comparative only: they refer to the comparison between two images and are not meant to imply any absolute degree of resolution. As described in greater detail below, image acquisition consists of converting a continuous image into discrete values obtained from a group of sensor elements. While generally cheaper to produce and transmit, a low-resolution imaging system generally produces a less accurate representation of the continuous image than would a high-resolution imaging system. Common methods of expanding low-resolution images to high-resolution images by interpolation typically smooth over important details in the process.

Many applications (including OCR, video surveillance and video compression algorithms) would benefit from improved expansion techniques.

Processing of document images has traditionally been performed using the binary format due to savings in disk storage and computer processing. As computer performance and storage capacity have dramatically increased recently, grayscale scanning and processing of document images have become more prevalent. Text images traditionally have been obtained from documents. However due to the current increase in digital video a significant amount of text is found in this type of imagery as well.

Document images are typically acquired by a scanner, whereas video frames are most often captured by a digital camera. The image acquisition process in both cases consists of converting a continuous image into discrete values obtained from a group of sensor elements. Each sensor element produces a value which is a function of the amount of light incident on the device. For 8-bit grayscale quantization, the allowable values for each sensor are integers from 0 (black) to 255 (white). The sensors are typically arranged in a non-overlapping grid of square elements; smaller elements result in higher resolution imagery.

Text image resolution expansion has become increasingly important in a number of areas of image processing. Optical Character Recognition (OCR) of document images continues to be of great importance as we attempt to become a paperless society. Restoring text from video surveillance imagery is often crucial to law enforcement agencies. Digital video compression algorithms can also benefit from successful text resolution expansion techniques. Common methods of interpolation, which were not designed specifically for text images, typically smooth over the important details and produce inadequate expansion. This application describes a new nonlinear restoration technique for text images, which creates smooth foreground and background regions while preserving sharp edge transitions.

There has been significant research in both text enhancement and resolution expansion fields [7]. Some text enhancement efforts focus on fixing broken or touching characters [19, 22]. Other methods [5, 23] use degraded samples to model the image and then solve for the restored image using a recursive technique. To remove the effects of noise, [12] uses a morphological filter and [1] uses pixel patterns to improve OCR results. A variety of methods have been proposed in order to improve contrast within text images including quadratic filters [14], soft morphological filters [9], non-linear mapping [17], and using a multi-resolution pyramid approach and fuzzy edge detectors [15]. Numerous resolution expansion methods have been published in the literature as well [2, 4, 6, 8, 10, 13, 16, 21, 3]. Of particular interest are linear interpolation and cubic spline expansion, which will be used to compare with the proposed technique. Linear interpolation tends to smooth the image data at transition regions and results in a high-resolution image that appears blurry. Cubic spline expansion allows for sharp transitions, but tends to produce a ringing effect at these discontinuities.

A number of research efforts investigated combinations of text enhancement with resolution expansion in order to improve low-resolution text images. Shannon interpolation is performed with text separation from the image background in [11] to improve the OCR accuracy of digital video. In [20], deblurring is combined with linear interpolation to enhance document images obtained from digital cameras. Both of these methods essentially perform the text enhancement and resolution expansion as two separate steps. The proposed method computes resolution expansion using an algorithm specifically designed to enhance text images to perform both of these steps simultaneously.

The goal of resolution expansion is to create an expanded image with improved definition from observed low-resolution imagery. Acquisition of this low-resolution imagery can be modeled by averaging a block of pixels within a high-resolution image. Resolution expansion is an ill-posed inverse problem. For a given low-resolution image, a virtually infinite set of expanded images can be generated by the observed data.

SUMMARY OF THE INVENTION

The foregoing problems are overcome, and other advantages are provided by a method and apparatus for enhancing an image by iteratively solving a nonlinear optimization problem using a preferred Bimodal-Smooth-Average (BSA) scoring method, in accordance with the invention, which minimizes a BSA score, defined as the weighted sum of separate bimodal, smoothness and average measures. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution so as to significantly improve image quality.

It is an object of the invention to restore a high-resolution image given a low-resolution image.

It is an object of the invention to provide a method and apparatus for improving the definition of a low-resolution image so as to produce an image with improved definition.

It is a further object of the invention to provide a method and apparatus for reconstructing image information deteriorated by low-resolution devices.

It is a further object of the invention to provide a method and apparatus for refining original low spatial resolution so as to significantly improve image quality.

It is a further object of the invention to automatically enhance text within document or video images so as to improve the accuracy of subsequent optical character recognition.

It is a further object of the invention to provide a method and apparatus for improving restoration of video images.

A principal feature of the invention is the iterative solution of a nonlinear optimization problem using a preferred Bimodal-Smooth-Average scoring method.

Among the advantages of the invention are automated enhancement of images.

These and other objects, features and advantages which will be apparent from the discussion which follows are achieved, in accordance with the invention, by providing a novel method and apparatus for improving image quality by iteratively solving a nonlinear optimization problem, using a Bimodal-Smooth-Average scoring method.

The various features of novelty which characterize the invention are pointed out with particularity in the claims annexed to and forming a part of this disclosure. For a better understanding of the invention, its advantages and objects, reference is made to the accompanying drawings and descriptive matter in which a preferred embodiment of the invention is illustrated.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and still other objects of this invention will become apparent, along with various advantages and features of novelty residing in the present embodiments, from study of the following drawings, in which:

Figure 1 is a schematic of a system suitable for carrying out the invention.

Figure 2 is a block diagram of the text resolution expansion system of the invention.

Figure 3 is a flow chart of the BSA process of the invention.

Figure 4 is an illustration of the results of a BSA restoration experiment, showing the progressive improvement of resolution with iterations.

Figure 5 illustrates the performance of the BSA process as compared to prior art techniques of linear interpolation and cubic spline in grayscale document restoration and grayscale video frame restoration, respectively.

Figures 6 and 7 illustrate the performance of optical character recognition following cubic spline (Figure 6) versus BSA (Figure 7) image enhancement.

Figure 8 compares OCR accuracy of spline vs. BSA for gray-level images and binary document images.

Figure 9 illustrates experimental results of binary document restoration.

Figures 10 and 11 illustrate experimental results of restoration of video frames.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The invention is a novel method and apparatus which improves the definition of a low-resolution image by taking advantage of characteristics of its bimodal distribution so as to produce an expanded image with improved definition. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution so as to significantly improve image quality.

Referring to Figure 1, the invention may be illustrated in the context of an image acquisition system (1) providing image data (10) which serves as an input to a general purpose computer (100) programmed to carry out the following processes.

Image acquisition system (1) comprises a plurality of image sensors (2) which generate image data (10) concerning the image. This image data (10) serves as input to computer (100) via input means (101).

Computer (100) includes processing means (102), programmed to execute the steps set forth herein so as to generate a high-resolution image, which may be displayed or otherwise used via output means (103).

Referring to Figure 2, a text resolution expansion system receives low-resolution image (10), preprocesses the image in preprocessing module (201) and outputs the results to bimodal estimation module (202); the bimodal estimation module (202) processes the data and outputs the result to resolution expansion module (203) for pixel replication and output to BSA restoration module (204), where the image is partitioned into overlapping blocks prior to BSA restoration; the BSA restoration module then outputs the result to a post-processing module (205), which provides an expanded resolution image (206) for display or other use.

The operation of the BSA restoration module (204) is illustrated in greater detail in Figure 3. An initial image (301) is provided and an initial BSA score is computed (302); an image update is then computed (303) so as to produce an updated image (304). A new BSA score is computed for the updated image (305) and compared to the prior BSA score (306); if the new score does not represent an improvement over the prior score by a predetermined amount, the restoration process is complete; otherwise, another image update is computed (303) and the process continues iteratively until the improvement is below the predetermined amount.
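The control flow of Figure 3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `iterate_until_converged`, `score_fn`, `update_fn`, `min_improvement` and `max_iterations` are hypothetical, the iteration cap is an added safeguard, and the choice of which image to return on the final step is an assumption, since the text does not specify it.

```python
from typing import Callable, Tuple

import numpy as np


def iterate_until_converged(
    image: np.ndarray,
    score_fn: Callable[[np.ndarray], float],
    update_fn: Callable[[np.ndarray], np.ndarray],
    min_improvement: float = 1e-6,
    max_iterations: int = 100,
) -> Tuple[np.ndarray, int]:
    """Apply updates until the score improves by less than min_improvement."""
    current = np.asarray(image, dtype=float)
    prior_score = score_fn(current)                # initial score (302)
    for iteration in range(1, max_iterations + 1):
        candidate = current + update_fn(current)   # image update (303, 304)
        new_score = score_fn(candidate)            # new score (305)
        improvement = prior_score - new_score      # comparison (306)
        if improvement < min_improvement:
            # Assumption: keep the last candidate only if it did not
            # raise the score; otherwise fall back to the prior image.
            return (candidate if improvement >= 0 else current), iteration
        current, prior_score = candidate, new_score
    return current, max_iterations


# Toy usage with a stand-in score (not the BSA score): pull every
# value toward 3.0 by halving the remaining distance at each step.
def toy_score(x: np.ndarray) -> float:
    return float(np.sum((x - 3.0) ** 2))


def toy_update(x: np.ndarray) -> np.ndarray:
    return -0.5 * (x - 3.0)


restored, iterations = iterate_until_converged(np.zeros(4), toy_score, toy_update)
```

In the system described here, `score_fn` would be the BSA score of a block and `update_fn` the derivative-based update described later in the text.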

The image acquisition process consists of converting a continuous image into discrete values obtained from a group of sensor elements. Each sensor element produces a value which is a function of the amount of light incident on the device. For 8-bit grayscale quantization, the allowable values for each sensor are integers from 0 (black) to 255 (white). The sensors are typically arranged in a non-overlapping grid of square elements; smaller elements result in higher resolution imagery. In a high-resolution imaging system the number of sensors is adequate to represent the desired text image. The majority of pixels within the image are either white or black, with a small number of gray pixels occurring at the edges. In a low-resolution imaging system the number of sensors has been reduced. This low-resolution acquisition results in significant blockiness and is insufficient to accurately represent this image. Each sensor element effectively averages the image within its section of the grid, resulting in an increased number of gray pixels. Low-resolution imaging can therefore be thought of as block-averaging high-resolution images.
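The block-averaging view of low-resolution acquisition can be illustrated with a short NumPy sketch. The function name `block_average` is hypothetical, and the sketch assumes the image dimensions are exact multiples of the factor q.

```python
import numpy as np


def block_average(high_res: np.ndarray, q: int) -> np.ndarray:
    """Average each non-overlapping q x q block of a high-resolution image."""
    rows, cols = high_res.shape
    if rows % q or cols % q:
        raise ValueError("image dimensions must be multiples of q")
    # Split each axis into (block index, offset within block) and
    # average over the two within-block axes.
    blocks = high_res.astype(float).reshape(rows // q, q, cols // q, q)
    return blocks.mean(axis=(1, 3))


# A black/white edge that does not line up with the sensor grid
# produces a gray low-resolution pixel.
edge = np.array([[0, 255, 255, 255],
                 [0, 255, 255, 255]])
low = block_average(edge, 2)
```

Here the left 2 x 2 block mixes black and white pixels, so its low-resolution value is the gray level 127.5, while the right block stays white.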

The problem is to restore the high-resolution image $HI_{qr,qc}$ given only the low-resolution image $LI_{r,c}$, where r and c are the number of rows and columns in the low-resolution image and q is the resolution expansion factor. The image acquisition process of obtaining $LI_{r,c}$ from $HI_{qr,qc}$ is given by

$$LI_{r,c} = \frac{1}{q^2} \sum_{i=1}^{q} \sum_{j=1}^{q} HI_{q(r-1)+i,\; q(c-1)+j} \qquad (1)$$

The value of $LI_{r,c}$ is the average of the high-resolution pixels within the q x q neighborhood. Eq. (1) represents a typical image restoration problem where we are required to restore $HI_{qr,qc}$ based on the observed $LI_{r,c}$ via the relationship described by this equation. Since there are a great number of high-resolution images which may satisfy the constraint of the observed low-resolution image given by Eq. (1), image restoration is generally an ill-posed inverse problem.

The system presented in block diagram in Fig. 2 comprises five modules: the Preprocessing Module, the Bimodal Estimation Module, the Resolution Expansion Module, the BSA Restoration Module, and the Postprocessing Module. A low-resolution image is initially input to the Preprocessing Module where binary and color images are converted to grayscale images. The image histogram is computed by the Bimodal Estimation Module to estimate the means of the black and white pixel distributions. The initial image expansion is performed by the Resolution Expansion Module, which uses pixel replication to create a high-resolution image from the low-resolution original. The image is divided into overlapping blocks of pixels which are restored independently by the BSA algorithm in the BSA Restoration Module. When a binary expanded image is desired, the Postprocessing Module performs a grayscale-to-binary conversion. The next section details the image expansion system for grayscale images, followed by a section on binary image expansion.

Grayscale Image Resolution Expansion

The image resolution expansion process for grayscale images, which includes both video frames and grayscale document images, is described in this section. It should be noted that our system depicted in Fig. 2 processes grayscale and color images in a very similar fashion. The only difference is in the Preprocessing Module where color images are initially converted to grayscale images using standard techniques. These formerly color images are processed throughout the system as grayscale and the resulting high-resolution expanded images are grayscale as well. Since the goal of our system is to improve automated OCR results, this loss of color information is not a concern. Once the system has a grayscale image, its histogram is then computed by the Bimodal Estimation Module. The distribution of a text image is typically bimodal with a large white peak corresponding to the background and a smaller dark peak corresponding to the text pixels. The peak of the white distribution $\mu_{white}$ and the peak of the black distribution $\mu_{black}$ are estimated from the calculated histogram. These values are required later by the system. Pixel replication, where the value of each high-resolution pixel is equal to its corresponding low-resolution pixel, is performed in the Resolution Expansion Module to form the initial expanded image. The BSA Restoration Module partitions the image into overlapping blocks and restores the blocks independently. Finally, the Postprocessing Module converts the high-resolution grayscale image to a binary image if a binary expanded image is desirable. In what follows, the BSA Restoration Module will be described in detail.
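The bimodal estimation and pixel replication steps might be sketched as below. The text does not say how the two peaks are located in the histogram, so the fixed split at gray level 128 is purely an assumption made for illustration, and the names `estimate_bimodal_peaks` and `pixel_replicate` are hypothetical.

```python
from typing import Tuple

import numpy as np


def estimate_bimodal_peaks(gray: np.ndarray, split: int = 128) -> Tuple[int, int]:
    """Return (mu_black, mu_white) as the histogram peaks on each side of split.

    Assumption: the dark and light modes fall on opposite sides of `split`.
    `gray` must hold 8-bit values (dtype uint8).
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    mu_black = int(np.argmax(hist[:split]))
    mu_white = split + int(np.argmax(hist[split:]))
    return mu_black, mu_white


def pixel_replicate(low_res: np.ndarray, q: int) -> np.ndarray:
    """Expand by q: every high-resolution pixel copies its low-resolution source."""
    return np.repeat(np.repeat(low_res, q, axis=0), q, axis=1)


page = np.array([[250, 250, 250, 20],
                 [250, 20, 128, 250]], dtype=np.uint8)
mu_black, mu_white = estimate_bimodal_peaks(page)
expanded = pixel_replicate(page, 4)
```

For this small example the dark peak is found at 20 and the light peak at 250, and the expanded image is 8 x 16 pixels.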

A block diagram of the BSA Restoration process is shown in Fig. 3 where $I_i$ represents the image at iteration i, $BSA_i$ is the score at iteration i, and $\delta_i$ is the change to the image at iteration i. The restoration process can be summarized as follows. The initial score $BSA_0$ is computed from the original image $I_0$. At each iteration, the image update $\delta_i$ and new score $BSA_i$ are computed. The iterations continue to minimize the BSA score until convergence is reached. At this stage, a restored image is produced.

The scoring function used by the BSA Restoration process is designed to measure how well a group of pixels within an image represent the desired properties of a text image. This function, referred to as the BSA scoring function, is expressed as the weighted sum of a Bimodal score B, a Smoothness score S, and an Average score A. More precisely, the BSA score is defined by

$$BSA(x) = \lambda_1 B(x) + \lambda_2 S(x) + \lambda_3 A(x) \qquad (2)$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are Lagrange multipliers and x is a block of pixels to be restored. Throughout this paper, a block x is defined both as a group of 4 x 4 low-resolution pixels and as the 4q x 4q high-resolution pixels that are derived from them. The 4 x 4 size was specifically chosen because it contains enough pixels to adequately measure text characteristics but is not too large to be computationally burdensome. The goal of the restoration process is to iteratively solve for the block of pixels x that minimizes the BSA(x) score given by Eq. (2). Each of the three scores will be discussed in the following subsections, followed by a description of an iterative minimization procedure.

The Bimodal Score

The typical distribution of a text image contains two peaks, a large one at $\mu_{white}$, which normally represents the page's background, and a secondary peak at $\mu_{black}$ representing the foreground text. From the histogram of the given low-resolution text image, estimates of the means for the black and white distributions are calculated. These means are used to compute the bimodal score B(x), which measures how far an image block x is from bimodal. The bimodal score used in this paper is defined by

$$B(x) = \sum_{r,c} (x_{r,c} - \mu_{black})^2 \, (x_{r,c} - \mu_{white})^2 \qquad (3)$$

where r and c are the row and column indices within the block being evaluated.

When a pixel value within x is close to either $\mu_{black}$ or $\mu_{white}$, its contribution to B(x) is minimal. The minimum bimodal score of B(x) = 0 means that the image is perfectly bimodal: the value of every pixel is equal to either $\mu_{white}$ or $\mu_{black}$. Solving for the block of pixels x that minimizes B(x) produces a strongly bimodal image, which is one of the desired properties of this proposed text restoration technique.

To minimize B(x), both the first and second derivatives of the bimodal score will be computed. The score given by Eq. (3) is differentiable; the partial derivative of B(x) with respect to pixel $x_{r,c}$ is

$$\frac{\partial B(x)}{\partial x_{r,c}} = 4x_{r,c}^3 - 6(\mu_{white} + \mu_{black})\,x_{r,c}^2 + 2(\mu_{white}^2 + 4\mu_{white}\mu_{black} + \mu_{black}^2)\,x_{r,c} - 2\mu_{white}\mu_{black}(\mu_{white} + \mu_{black}) \qquad (4)$$

and the second partial derivative for the bimodal score is

$$\frac{\partial^2 B(x)}{\partial x_{r,c}^2} = 12x_{r,c}^2 - 12(\mu_{white} + \mu_{black})\,x_{r,c} + 2(\mu_{white}^2 + 4\mu_{white}\mu_{black} + \mu_{black}^2) \qquad (5)$$

Partial derivatives of the bimodal score B(x) are independent of neighboring pixels. The estimated means of the bimodal distribution, $\mu_{black}$ and $\mu_{white}$, are known a priori and their appropriate constants can be pre-computed.
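As a check on the bimodal score and its derivatives, the following sketch evaluates the product of squared distances to the two peaks and the expanded polynomial forms of its first and second derivatives. The function names and the example peak values (20 and 240) are hypothetical.

```python
import numpy as np


def bimodal_score(x: np.ndarray, mu_black: float, mu_white: float) -> float:
    """Sum over the block of (x - mu_black)^2 (x - mu_white)^2."""
    return float(np.sum((x - mu_black) ** 2 * (x - mu_white) ** 2))


def bimodal_first_derivative(x, mu_black, mu_white):
    """Per-pixel first derivative, written as an expanded cubic in x."""
    s = mu_white + mu_black
    p = mu_white * mu_black
    k = mu_white ** 2 + 4 * p + mu_black ** 2
    return 4 * x ** 3 - 6 * s * x ** 2 + 2 * k * x - 2 * p * s


def bimodal_second_derivative(x, mu_black, mu_white):
    """Per-pixel second derivative, a quadratic in x."""
    s = mu_white + mu_black
    k = mu_white ** 2 + 4 * mu_white * mu_black + mu_black ** 2
    return 12 * x ** 2 - 12 * s * x + 2 * k


mu_b, mu_w = 20.0, 240.0
block = np.array([[20.0, 100.0], [240.0, 130.0]])
grad = bimodal_first_derivative(block, mu_b, mu_w)

# Central finite difference on the pixel with value 100 as a sanity check.
eps = 1e-3
plus, minus = block.copy(), block.copy()
plus[0, 1] += eps
minus[0, 1] -= eps
numeric = (bimodal_score(plus, mu_b, mu_w) - bimodal_score(minus, mu_b, mu_w)) / (2 * eps)
```

The first derivative vanishes at both peaks and at their midpoint. Only the peaks are minima: the second derivative is negative at the midpoint, so the per-pixel bimodal term is not convex everywhere.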

The Smoothness Score

With the exception of edges, text images tend to be very smooth in both the foreground and background regions which results in neighbors with similar values. A smoothness score, which is computed for each block of pixels, is introduced to measure this feature. For this proposed algorithm, a simple statistic using only the four nearest neighbors of each pixel is used. Other more sophisticated smoothness measures could be implemented as well. The smoothness score S{x) used by this technique is given by

$$S(x) = \sum_{r,c} \left[ (x_{r-1,c} - x_{r,c})^2 + (x_{r,c-1} - x_{r,c})^2 + (x_{r,c+1} - x_{r,c})^2 + (x_{r+1,c} - x_{r,c})^2 \right] \qquad (6)$$

where r and c are the row and column indices within the block being evaluated. The minimum value of S(x) = 0 occurs when all pixels have identical values.
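A four-neighbor smoothness score of this kind might be computed as in the sketch below. The text does not say how neighbors outside the block are treated, so replicating the border pixels (which makes those terms zero) is an assumption, and the name `smoothness_score` is hypothetical.

```python
import numpy as np


def smoothness_score(x: np.ndarray) -> float:
    """Sum of squared differences between each pixel and its four neighbors."""
    # Assumption: neighbors that fall outside the block copy the border pixel.
    padded = np.pad(x.astype(float), 1, mode="edge")
    center = padded[1:-1, 1:-1]
    up = padded[:-2, 1:-1]
    down = padded[2:, 1:-1]
    left = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    return float(np.sum(
        (up - center) ** 2 + (left - center) ** 2
        + (right - center) ** 2 + (down - center) ** 2
    ))


flat = np.full((4, 4), 128)
corner = np.array([[0, 0],
                   [0, 1]])
```

A constant block scores zero, and the single differing pixel in `corner` contributes through each of the neighbor pairs it takes part in.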

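The first and second partial derivatives are calculated in a straightforward manner. The first partial derivative of this smoothing score with respect to pixel $x_{r,c}$ is

$$\frac{\partial S(x)}{\partial x_{r,c}} = -2(x_{r-1,c} + x_{r,c-1} + x_{r,c+1} + x_{r+1,c}) + 8x_{r,c} \qquad (7)$$

The only nonzero second partial derivatives are

$$\frac{\partial^2 S(x)}{\partial x_{r,c}^2} = 8, \qquad \frac{\partial^2 S(x)}{\partial x_{r,c}\,\partial x_{r\pm1,c}} = -2, \qquad \frac{\partial^2 S(x)}{\partial x_{r,c}\,\partial x_{r,c\pm1}} = -2 \qquad (8)$$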

and are equal to zero for all other combinations. The first partial derivative of the smoothness score S(x) is a function of the four neighboring pixels. The second partial derivatives are non-zero only with respect to their corresponding neighbors.

The Average Constraint Score

It is reasonable to require that the average of a group of high-resolution pixels is close to the original value of the low-resolution pixel from which they were derived. For each block of low-resolution pixels, an average score A(x) is used to measure how well the restored high-resolution pixels meet the average constraint imposed by their corresponding low-resolution pixels. Let the values of the four low-resolution pixels within a 2 x 2 block be $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$. The q x q group of high-resolution pixels that are being restored from pixel $\mu_1$ are represented by $\{x^{(1)}_{r,c},\ 1 \le r, c \le q\}$. The average score for this 2 x 2 block is expressed by

where i is the index for the low-resolution pixels, $\mu_i$ is the value of each low-resolution pixel, and $x^{(i)}_{r,c}$ are the restored high-resolution pixels corresponding to pixel $\mu_i$. The initial high-resolution image formed by using pixel replication always has an average score of zero because it satisfies the constraint.
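One way to realize an average score with these properties is sketched below. It assumes a squared-deviation penalty between each low-resolution value and the mean of its q x q group; that specific form is an assumption made for illustration, and the name `average_score` is hypothetical.

```python
import numpy as np


def average_score(high_res: np.ndarray, low_res: np.ndarray, q: int) -> float:
    """Sum of squared gaps between each low-res pixel and its q x q group mean.

    Assumption: the penalty is quadratic in the gap.
    """
    rows, cols = low_res.shape
    group_means = high_res.astype(float).reshape(rows, q, cols, q).mean(axis=(1, 3))
    return float(np.sum((low_res - group_means) ** 2))


q = 2
low = np.array([[10.0, 200.0],
                [50.0, 120.0]])
replicated = np.kron(low, np.ones((q, q)))   # pixel replication
perturbed = replicated.copy()
perturbed[0, 0] += q * q                     # shifts one group mean by exactly 1
```

Under this assumed form, the pixel-replicated image scores zero, and raising one group's mean by one gray level costs exactly 1.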

The first partial derivative for the group of high-resolution pixels corresponding to pixel μi is equal to

The second partial derivatives are simply,

_{{i) )} = -, V(r, c, r₁, c₁) G _:W (11) r, O γ-_ ,c_ "

Both the first and second partial derivatives of the average score A(x) are non-zero only with respect to pixels within the q x q group of high-resolution pixels corresponding to the low-resolution pixel from which they were expanded.

Solving for the Restored Image

The goal of the restoration algorithm is to solve for the image block that minimizes the scoring function BSA(x) introduced in Eq. (2). Throughout this paper, a block x is defined both as a group of 4 x 4 low-resolution pixels and as the 4q x 4q high-resolution pixels that are derived from them. The 4 x 4 size was specifically chosen because it contains enough pixels to adequately measure text characteristics but is not too large to be computationally burdensome. The goal of resolution enhancement is to create a restored image with improved resolution.

Pixel replication, where every value within a q x q neighborhood is identical to the corresponding low-resolution pixel, is used for the initial expansion. Each 4q x 4q block of high-resolution pixels is restored independently using iterative optimization techniques described in this section to solve for the block which minimizes the BSA score. At each iteration, the first and second partial derivatives of the BSA scoring function are used to determine the image update. To avoid block boundary discontinuities, only the center 3q x 3q pixels are updated. The entire image is therefore divided into blocks that overlap by one quarter, or q x 4q pixels, and can be restored independently. This iterative minimization of the BSA score continues until convergence is reached, resulting in the restored image.
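The one-quarter overlap between neighboring blocks corresponds to placing 4q-wide blocks at a stride of 3q along each axis. A minimal sketch of that placement follows; the name `block_starts` is hypothetical, and the handling of a leftover strip at the end of an axis is an assumption, since the text does not address image sizes that do not fit the stride.

```python
from typing import List


def block_starts(length: int, q: int) -> List[int]:
    """Start offsets of 4q-wide blocks placed every 3q pixels along one axis."""
    size, stride = 4 * q, 3 * q
    if length < size:
        raise ValueError("axis is shorter than one block")
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] + size < length:
        # Assumption: cover any leftover pixels with one extra block
        # aligned to the end of the axis.
        starts.append(length - size)
    return starts


starts = block_starts(28, 4)   # 16-pixel blocks, stride 12
overlap = starts[0] + 16 - starts[1]
```

With q = 4 and a 28-pixel axis, the blocks start at 0 and 12 and share q = 4 pixels, which is one quarter of the block width.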

Initially, each 4q x 4q block of pixels x is converted to a $(4q)^2$-long vector x using raster scanning,

$$x[4q(r - 1) + c] = x(r, c) \quad \text{for } 1 \le r, c \le 4q \qquad (12)$$

A small distance away from x the BSA function can be represented by its second order Taylor series approximation [18],

$$BSA(x + \delta) \approx BSA(x) + [\nabla BSA(x)]\,\delta + \tfrac{1}{2}\,\delta^T H\,\delta \qquad (13)$$

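and the change in BSA is given by

$$\Delta BSA = [\nabla BSA(x)]\,\delta + \tfrac{1}{2}\,\delta^T H\,\delta \qquad (14)$$

where δ is the small change to the image vector x, $\nabla BSA(x)$ is the gradient, and H is the Hessian matrix. The $(4q)^2 \times (4q)^2$ Hessian given below by Eq. (15) is the symmetric matrix of mixed partial second derivatives, which shows how a change in two variables affects the BSA function.

$$H_{ij} = \frac{\partial^2 BSA}{\partial x_i\,\partial x_j}, \qquad 1 \le i, j \le (4q)^2 \qquad (15)$$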

Since the Hessian matrix is symmetric,

$$\frac{\partial^2 BSA}{\partial x_i\,\partial x_j} = \frac{\partial^2 BSA}{\partial x_j\,\partial x_i} \qquad (16)$$

only half of the matrix needs to be computed. To minimize the function BSA(x), the variables in the Hessian matrix are first made independent. To do this, the Hessian is diagonalized using a similarity transform. Each eigenvector of the Hessian matrix is placed in a separate column to form a unitary eigenmatrix E. That is, the product of the eigenmatrix with its transpose is equal to the identity matrix, $EE^T = I$. When the Hessian matrix is pre-multiplied by the transposed eigenmatrix and post-multiplied by the eigenmatrix, the resulting matrix $E^T H E$ is diagonal. Because the Hessian is real and symmetric, it is always diagonalizable. The similarity transform results in the diagonalized Hessian, $E^T H E$, whose diagonal entries are the eigenvalues of H.

The Taylor series approximation to the change in the scoring function ΔBSA can now be expressed in terms of the $(4q)^2 \times (4q)^2$ Hessian matrix H, its $(4q)^2 \times (4q)^2$ eigenmatrix E, the $1 \times (4q)^2$ gradient of the scoring function BSA(x), and the $(4q)^2 \times 1$ small change in the image vector δ,

$$\Delta BSA = ([\nabla BSA(x)]E)(E^T\delta) + \tfrac{1}{2}\,(E^T\delta)^T (E^T H E)(E^T\delta) \qquad (17)$$

With the following substitutions,

$$\nabla BSA'(x) = [\nabla BSA(x)]E \qquad (18)$$

$$\delta' = E^T\delta \qquad (19)$$

$$H' = E^T H E \qquad (20)$$

Eq. (17) can be simplified to

$$\Delta BSA = [\nabla BSA'(x)]\,\delta' + \tfrac{1}{2}\,\delta'^T H'\,\delta' \qquad (21)$$

The functional minimum is achieved by stepping in the direction

$$\delta' = -H'^{-1}\,[\nabla BSA'(x)]^T \qquad (22)$$

in the transformed domain, which is simply $\delta = E\,\delta'$ in the pixel domain. For each iteration, the image update δ' is determined. The iterations continue until convergence is reached, resulting in a desired restored image.
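The step in the transformed domain can be sketched with NumPy's symmetric eigendecomposition. This is an illustration under stated assumptions rather than the patented procedure: the step is taken with a minus sign, as a Newton step toward a minimum requires, and all eigenvalues are assumed nonzero. The text does not say how zero or negative eigenvalues are handled, so a practical implementation would need its own safeguard. The function name `newton_step_via_eigenbasis` is hypothetical.

```python
import numpy as np


def newton_step_via_eigenbasis(gradient: np.ndarray, hessian: np.ndarray) -> np.ndarray:
    """Newton step computed in the eigenbasis of a real symmetric Hessian."""
    # eigh returns eigenvalues and an orthonormal eigenvector matrix E,
    # so that E.T @ hessian @ E is diagonal.
    eigenvalues, E = np.linalg.eigh(hessian)
    grad_t = gradient @ E                 # gradient in the transformed domain
    # The transformed Hessian is diagonal, so its inverse acts elementwise.
    # Assumption: no eigenvalue is zero.
    delta_t = -grad_t / eigenvalues
    return E @ delta_t                    # back to the pixel domain


# Toy check on a quadratic f(x) = 0.5 x^T Q x - b^T x, whose gradient is
# Q x - b and whose Hessian is Q: one step should land on the minimum.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([10.0, -7.0])
x1 = x0 + newton_step_via_eigenbasis(Q @ x0 - b, Q)
```

Because the BSA score is not quadratic, a single step does not reach its minimum, which is why the procedure is iterated.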

An example of this iterative image restoration process is shown in Fig. 4. The original 4 x 4 block of pixels is expanded by a factor of q = 4 using pixel replication to produce a 16 x 16 high-resolution image shown in Fig. 4(a). As the iterative restoration process proceeds in Figs. 4(b-f), the image becomes more bimodal and smooth, resulting in a greatly improved image. The majority of gray pixels that occur between characters are replaced with either black or white values, resulting in a strongly bimodal distribution. The resulting image is also smooth in both the foreground and background regions while maintaining the constraint that the average of each 4 x 4 block of high-resolution pixels is close to the original value of each corresponding low-resolution pixel. The minimization procedure is completed in 30 iterations for this image. As the iterations proceed, the image becomes more bimodal and smooth and these two scores are reduced. The average score increases as the restoration proceeds, but this score is still significantly smaller than the bimodal and smoothness scores. Minimization of the BSA score produces a restored image that is the optimal combination of these bimodal, smoothness, and average measures.

Experimental Results for Grayscale Images

The proposed BSA restoration algorithm was compared to several common expansion methods, including pixel replication, linear interpolation, and cubic spline expansion. In linear interpolation, a linear fit is calculated between all pixels within each column, and then repeated for all pixels within each row. These images naturally tend to be smooth, without sharp discontinuities, producing blurry results. Cubic spline expansion [4] approximates the given discrete low-resolution pixels as a smooth continuous curve obtained from the weighted sum of cubic spline basis functions and resamples the curve to obtain the high-resolution image. This method allows for sharp edges but often overshoots at these discontinuities, producing a ringing effect. The BSA text restoration technique creates smooth foreground and background regions and permits sharp edges at transition regions, while maintaining the low-resolution average constraint. Images restored with this technique are shown to be both qualitatively and quantitatively superior to other common resolution expansion methods.

Two experiments were conducted to numerically compare the restoration methods. The first experiment involved creating low-resolution images from high-resolution originals, expanding the low-resolution imagery, and then measuring the distance to the originals. The second experiment involved scanning low-resolution document images, expanding the images with the various techniques, and using OCR accuracy to measure the success of the restoration.

To qualitatively illustrate the differences between these resolution expansion techniques, Fig. 5 shows resulting images obtained from linear interpolation, cubic spline expansion, and the proposed BSA-based restoration technique. The word "applications" from an image scanned at 100 dpi using 8-bit grayscale quantization is shown in Fig. 5(a), where significant blockiness is apparent. Linear interpolation by a factor of four was used to create the image in Fig. 5(b), which is very blurry and lacks good contrast. Fig. 5(c) depicts the resulting image from cubic spline expansion, which has better contrast but is still not sharp at the edges. The image obtained using BSA restoration in Fig. 5(d) has excellent contrast and sharp edges and is superior to the images obtained using other interpolation methods for this example.

The first experiment to measure image restoration success involved creating low-resolution images by block-averaging images as described by Eq. (1). Restored images are then compared with the original to determine the success of restoration numerically. The mean squared error (MSE) was used to compare the various methods of image resolution expansion. The definition of mean squared error is

$$MSE = \frac{1}{RC} \sum_{r=1}^{R} \sum_{c=1}^{C} (original_{r,c} - restored_{r,c})^2 \qquad (23)$$

where R and C are the number of rows and columns in the images.
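The mean squared error between an original and a restored image can be computed directly with NumPy, as in this short sketch. The function name `mean_squared_error` and the sample pixel values are hypothetical.

```python
import numpy as np


def mean_squared_error(original: np.ndarray, restored: np.ndarray) -> float:
    """Average of squared pixel differences over all rows and columns."""
    original = np.asarray(original, dtype=float)
    restored = np.asarray(restored, dtype=float)
    if original.shape != restored.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((original - restored) ** 2))


original = np.array([[0, 255], [255, 0]])
restored = np.array([[0, 250], [255, 10]])
mse = mean_squared_error(original, restored)
```

Converting to floating point first avoids wraparound when the inputs are stored as 8-bit unsigned integers.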

A comparative study of the reduction in mean squared error for the various image expansion techniques was conducted. The first comparison involved a group of five full-page images scanned at 300 dpi and degraded with block-averaging factor q = 2. Linear interpolation reduced the MSE by an average of 34.4% for these images. Cubic spline expansion performed much better by reducing the MSE by an average of 70.3%. The proposed BSA restoration technique was the most accurate for these images and resulted in an average 80.9% reduction in mean squared error. A second group of five images, again scanned at 300 dpi and severely degraded by q = 4 block-averaging, was also used to compare the MSE of the various expansion techniques. Expansion using linear interpolation reduced the MSE by an average of 12.4% and cubic spline expansion resulted in a 40.2% average reduction. The best results were obtained using the BSA algorithm, which reduced the mean squared error by an average of 60.7%.

As a final experiment, we used OCR accuracy to numerically compare our algorithm against existing resolution expansion methods. A set of 122 full-page journal documents from the University of Washington English Document Image Database I CD-ROM was used. These binary documents were initially printed by a laser printer at a resolution of 300 dots per inch (dpi). Each page was then scanned using 8-bit grayscale quantization at 75 dpi to create low-resolution original images. These 75 dpi pages were then expanded by a factor of four using various resolution expansion methods to create 300 dpi images, which were processed by Caere's OmniPage Pro 7.0 commercial OCR package, the world's best-selling desktop OCR software. For all cases within these experiments, both spline interpolation and BSA restoration produced superior results to linear interpolation; therefore the linear interpolation results will not be discussed further.

The resulting text files were compared to the ground truth provided by the University of Washington CD-ROM using the OCR Accuracy Report Version 5.3 software developed at UNLV-ISRI. There were a total of 339,575 characters in these 122 images. Cubic spline interpolation resulted in 36,959 character errors and the BSA-based method had 30,668 character errors, for an overall improvement of 17.0% for this set of images.
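The 17.0% figure is the relative reduction in character errors, which can be checked directly from the counts given above. The short calculation below also derives the corresponding overall character accuracies, roughly 89.1% for cubic spline and 91.0% for BSA; these two percentages are computed here from the stated counts and are not quoted from the experiments.

```python
total_chars = 339_575     # characters in the 122 test pages
spline_errors = 36_959    # character errors after cubic spline expansion
bsa_errors = 30_668       # character errors after BSA restoration

# Overall character accuracy for each method, in percent.
spline_accuracy = 100.0 * (1 - spline_errors / total_chars)
bsa_accuracy = 100.0 * (1 - bsa_errors / total_chars)

# Relative reduction in errors achieved by BSA over cubic spline, in percent.
error_reduction = 100.0 * (spline_errors - bsa_errors) / spline_errors

print(f"cubic spline accuracy: {spline_accuracy:.2f}%")
print(f"BSA accuracy:          {bsa_accuracy:.2f}%")
print(f"error reduction:       {error_reduction:.1f}%")
```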

OCR character accuracy for all 122 document images is plotted in Fig. 8 for cubic spline interpolation and BSA expansion. The results are sorted based on the spline OCR results, which are shown as a solid line. For each image, the BSA result is plotted as an "X" if the OCR accuracy is worse than spline and as an "O" where the accuracy has been improved. The BSA restoration resulted in improved OCR accuracy for 72% of the images in this test set. Even in the cases where cubic spline resolution expansion improved OCR accuracy more than the BSA algorithm, the images produced by the BSA were typically more visually appealing.

For the images in this experiment, the expansions produced by the BSA algorithm yielded superior OCR accuracy compared to other existing methods.

Binary Image Resolution Expansion

Document images have traditionally been scanned using binary thresholding due to the inherent binary nature of text images. Binary image processing therefore still retains great significance in the document research community. The BSA restoration method described in the previous section is well suited to process grayscale images but is not capable of restoring binary images. In order for our system to restore binary text images, each original binary image is first convolved with a spatial mask in the Preprocessing Module shown in Fig. 2 (201) to create a grayscale image. The Bimodal Estimation, Resolution Expansion, and BSA Restoration Modules used for binary images are identical to those used for grayscale images in our system. Once the high-resolution grayscale image has been created, a global threshold $T_{binary}$ is used in the Postprocessing Module to convert the grayscale image back to a binary image. This threshold is computed to be halfway between the white and black bimodal peaks

$$T_{binary} = \frac{\mu_{white} + \mu_{black}}{2} \qquad (24)$$

This section will detail the conversion from a binary image to a grayscale image using a spatial convolution mask in the Preprocessing Module.
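The midpoint threshold used in the Postprocessing Module can be sketched as follows. The names `binarize`, `mu_black`, and `mu_white` are hypothetical labels for the two estimated bimodal peaks, and mapping a pixel that lies exactly on the threshold to white is an assumption, since the tie-breaking rule is not stated.

```python
import numpy as np

def binarize(gray, mu_black, mu_white):
    """Convert a grayscale image to binary using a global midpoint threshold."""
    # Threshold halfway between the black and white bimodal peaks.
    t_binary = (mu_white + mu_black) / 2.0
    # Assumption: pixels equal to the threshold become white (255).
    return np.where(np.asarray(gray) >= t_binary, 255, 0).astype(np.uint8)

# Example with hypothetical peaks at 20 (black) and 240 (white): threshold is 130.
restored_binary = binarize([[10, 120], [130, 250]], mu_black=20, mu_white=240)
```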

Binary to Grayscale Conversion

The first step of the binary restoration process is to convert the original low-resolution binary image into a low-resolution grayscale image. In our system, 8-bit grayscale quantization is used, so the allowable values for each grayscale pixel are the integers from 0 (black) to 255 (white). Convolution is performed to create gray values from a group of neighboring binary pixels. The discrete convolution of two images $f_{x,y}$ and $g_{x,y}$, denoted by $f_{x,y} * g_{x,y}$, is defined by

$$f_{x,y} * g_{x,y} = \sum_{r=0}^{R-1}\sum_{c=0}^{C-1} f_{r,c}\, g_{x-r,\,y-c} \qquad (25)$$

where $R$ is the number of rows and $C$ is the number of columns within each image. The expression in Eq. (25) is a mathematical representation of a spatial convolution mask. Because convolution involves flipping $g_{x,y}$, the mask must be symmetric about its center. Throughout this paper the 3 x 3 mask $g_{x,y}$ defined by

$$g_{x,y} = \frac{1}{w_M + 4w_S + 4w_C}\begin{bmatrix} w_C & w_S & w_C \\ w_S & w_M & w_S \\ w_C & w_S & w_C \end{bmatrix} \qquad (26)$$

is used to perform this convolution, where $w_M$ is the relative weight for the middle pixel, $w_S$ is the weight for each of the four side pixels, and $w_C$ is the weight for each of the four corner pixels. The values of the mask are equal to zero for all other possible locations. Spatial convolution can be thought of as a weighted averaging over a neighborhood of pixels. A constraint is enforced on these weights, based on the distance from the center of the mask,

$$w_M > w_S > w_C > 0 \qquad (27)$$

so pixels closer to the center are more heavily weighted. The value of each grayscale pixel $x^{gray}_{r,c}$ is easily computed from the original binary pixel $x_{r,c}$ along with its eight neighbors

$$x^{gray}_{r,c} = \frac{1}{w_M + 4w_S + 4w_C}\Big[\, w_M\, x_{r,c} + w_S\left(x_{r-1,c} + x_{r+1,c} + x_{r,c-1} + x_{r,c+1}\right) + w_C\left(x_{r-1,c-1} + x_{r-1,c+1} + x_{r+1,c-1} + x_{r+1,c+1}\right) \Big] \qquad (28)$$
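The weighted neighborhood average of Eq. (28) can be sketched with array slicing. This is a minimal illustration under stated assumptions: the function name is hypothetical, the default middle weight `w_m = 2.0` is an arbitrary value chosen only to satisfy the ordering constraint of Eq. (27), image borders are handled by edge replication (border handling is not specified here), and the result is left as floating point rather than quantized to 8 bits.

```python
import numpy as np

def binary_to_gray(binary, w_m=2.0, w_s=1.0, w_c=0.707):
    """Weighted 3 x 3 average of each pixel and its eight neighbors, as in Eq. (28).

    binary: 2-D array holding 0 (black) or 255 (white).
    """
    if not (w_m > w_s > w_c > 0):
        raise ValueError("weights must satisfy w_m > w_s > w_c > 0")
    # Assumption: replicate edge pixels so border pixels have eight neighbors.
    x = np.pad(np.asarray(binary, dtype=float), 1, mode="edge")
    center = x[1:-1, 1:-1]
    sides = x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]
    corners = x[:-2, :-2] + x[:-2, 2:] + x[2:, :-2] + x[2:, 2:]
    total = w_m + 4 * w_s + 4 * w_c
    return (w_m * center + w_s * sides + w_c * corners) / total

# A single black pixel surrounded by white.
spot = np.full((3, 3), 255)
spot[1, 1] = 0
gray_spot = binary_to_gray(spot)
```

With these weights the isolated black pixel comes out well above mid-gray, because its white neighbors outweigh it. That is the situation the constrained conversion in the next section is meant to prevent.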

Constrained Binary to Grayscale Conversion

In our system, we impose a further constraint on the computed grayscale pixels. If the binary pixel is black, then the grayscale pixel should be dark as well. Specifically, if $x = 0$ then $0 \le x^{gray} \le 127$, and similarly if $x = 255$ then $128 \le x^{gray} \le 255$. However, the straightforward 3 x 3 convolution does not always satisfy this constraint.

In order to compute each grayscale pixel while enforcing this constraint, a parameter $\Delta$ is introduced to measure the effect of the surrounding eight pixel neighbors on pixel $x_{r,c}$

$$\Delta = \frac{1}{4w_S + 4w_C}\Big[\, w_S\left(x_{r-1,c} + x_{r+1,c} + x_{r,c-1} + x_{r,c+1}\right) + w_C\left(x_{r-1,c-1} + x_{r-1,c+1} + x_{r+1,c-1} + x_{r+1,c+1}\right) \Big] \qquad (29)$$

The range of this measure is $0 \le \Delta \le 255$. The computed grayscale pixel $x^{gray}$ is then determined based on the original value of the binary pixel $x$ in the following manner

$$\text{if } x = 0 \text{ then } x^{gray}(r,c) = \left\lfloor \frac{\Delta}{2} \right\rfloor, \qquad \text{if } x = 255 \text{ then } x^{gray}(r,c) = 128 + \left\lfloor \frac{\Delta}{2} \right\rfloor \qquad (30)$$

where $\lfloor x \rfloor$ is the integer floor rounding function, defined as the largest integer that is less than or equal to $x$.
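The constrained conversion of Eqs. (29) and (30) can be sketched as below. The sketch reads Eq. (30) as a floor of half the neighbor measure, with 128 added for white pixels, which is the reading consistent with the stated output ranges of 0 to 127 for black pixels and 128 to 255 for white pixels. The function name is hypothetical and edge replication at the image border is an assumption.

```python
import numpy as np

def constrained_binary_to_gray(binary, w_s=1.0, w_c=0.707):
    """Binary-to-grayscale conversion that keeps black pixels dark and white pixels light."""
    # Assumption: replicate edge pixels so border pixels have eight neighbors.
    x = np.pad(np.asarray(binary, dtype=float), 1, mode="edge")
    center = x[1:-1, 1:-1]
    sides = x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]
    corners = x[:-2, :-2] + x[:-2, 2:] + x[2:, :-2] + x[2:, 2:]
    # Eq. (29): weighted average of the eight neighbors, clipped to guard against round-off.
    delta = np.clip((w_s * sides + w_c * corners) / (4 * w_s + 4 * w_c), 0.0, 255.0)
    half = np.floor(delta / 2.0)  # always in 0..127
    # Eq. (30): black pixels map to 0..127, white pixels to 128..255.
    return np.where(center == 0, half, 128 + half).astype(np.uint8)

# A single black pixel surrounded by white stays in the dark half of the range.
spot = np.full((3, 3), 255)
spot[1, 1] = 0
gray_spot = constrained_binary_to_gray(spot)
```

Because the center pixel only selects the half of the range and the neighbors set the position within it, a black pixel can never cross into the light half, whatever its surroundings.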

To quantitatively compare our system with other expansion methods, the resulting images were compared with those from cubic spline expansion in terms of OCR performance. Shown in Fig. 6(a) is the resulting 300 dpi image obtained using cubic spline expansion on a paragraph from one of the University of Washington document images. The text file created by OCR is shown in Fig. 6(b), where mistakes have been highlighted. For this example, the OCR results for the cubic spline image had 7 areas where mistakes were made. The same image paragraph was processed by our system, resulting in the image in Fig. 7(a), which has better contrast than its corresponding cubic spline image in Fig. 6(a). OCR results are improved as well: only 4 mistake areas are highlighted in Fig. 7(b).

Experimental Results for Binary Documents

To measure the success of the system in restoring binary document images, a second experiment was conducted using a total of 48 document pages from the University of Washington database. These images were scanned at 100 dpi using binary quantization and then expanded by a factor of three using the binary BSA restoration technique described in Section 4 to create 300 dpi images. The binary enhancement process for the sample word "representative" from this experiment is shown in Fig. 9 to visually illustrate the improvements. The original binary image in Fig. 9(a) was convolved with a spatial mask of weights $w_S = 1$ and $w_C = 0.707$ to produce the grayscale image in Fig. 9(b). This low-resolution grayscale image was enhanced by the system to produce the high-resolution grayscale image in Fig. 9(c), which was converted to a binary image to produce the restored image in Fig. 9(d). The resulting high-resolution image was noticeably less blocky than the original.

To quantitatively measure the success of the system in enhancing binary document images, OCR accuracy was compared before and after resolution expansion. The expanded images produced by our system were compared with 300 dpi images created by pixel replication. There were a total of over 140,000 characters in this dataset. The overall character accuracy for the original images was 82.0% and the overall character accuracy for the restored images was 89.1% which was a 39.5% reduction in errors, a significant improvement over pixel replication.

Experimental Results for Video Frames

This section demonstrates the capability of this system to improve low-resolution video frames. Video frames are typically acquired as color imagery. As discussed earlier, color images can be processed by the system in much the same way as grayscale images. The only difference lies in the Preprocessing Module, where color images are initially converted to grayscale images using standard techniques. These images are then processed throughout the system as grayscale images, and the resulting high-resolution expanded images are grayscale as well. All of the images described in this section were originally color and have been converted to grayscale by the system.

Six low-resolution video frames were processed by the system to demonstrate this capability. These video images were obtained from broadcast television and captured at a resolution of 320 x 240 pixels for each frame. Image restoration results for a sample video frame are shown in Figs. 10 and 11 for two different scales. The original grayscale image is displayed in both Figs. 10(a) and 11(a) where blockiness and touching characters were abundant. The image obtained from the BSA restoration in Figs. 10(b) and 11(b) is visually superior with a high contrast and uniform background.

System Performance

The previous experimental results demonstrate the advantages of our text restoration system compared to standard methods of interpolation such as linear interpolation and cubic spline expansion. The system was tested using images from the widely available University of Washington document database and evaluated using the commercial Caere OmniPage Pro OCR software package to measure character accuracy. For the document and video images in these experiments, the images produced by our restoration system resulted in a higher OCR accuracy compared to other standard methods.

As stated earlier, the goal of this resolution expansion system is to improve the OCR accuracy. Real-time image processing is not a requirement. Our system takes approximately 10 minutes to expand a single full-page 100 dpi document image by a factor of 3 on a 250 MHz workstation. Cubic spline interpolation is much faster than our system and takes about 10 seconds for a single document page. Video frames are typically smaller than low-resolution document images and are therefore processed more quickly by the system. A 320 x 240 video frame takes approximately 2 minutes for our system to expand by a factor of three. Cubic spline expansion of a video frame only takes about 2 seconds. If speed becomes an issue in the future, our approach has the potential for massive parallelization because the images can be divided into blocks of pixels which can be restored independently.

Summary

Thus, our system was shown to be capable of enhancing grayscale documents and video images as well as binary document images. The success of the system was demonstrated by experiments using images from a standard document image database and a commercial OCR package. Restoration of grayscale images was performed by optimizing bimodal, smoothness, and average (BSA) scores that measure desired properties of text images. These scores were combined to form a single scoring function which produced images that were strongly bimodal and smooth, while satisfying the average constraint score. When the original image was binary, it was initially converted to a grayscale image using a spatial convolution mask and then processed as grayscale images. These resultant images restored using our system were shown to be superior to images expanded using existing linear interpolation and cubic spline expansion techniques.

Optical character recognition accuracy was also used to quantitatively measure the success of various resolution expansion methods. Both binary and grayscale text images restored with our system were shown experimentally to improve OCR accuracy compared to linear interpolation and cubic spline expansion. Although our system was designed to enhance text images, it can also be used to accurately expand checks, tables, charts, graphs, line drawings, and other types of documents because of their similarity to text images.

Bibliography

[1] T. Akiyama, N. Miyamoto, M. Oguro, and K. Ogura. Faxed Document Image Restoration Method based on Local Pixel Patterns. Proceedings of the SPIE, 3305:253-262, 1998.

[2] Ting Chung Chen and Rui J. P. de Figueiredo. Image Decimation and Interpolation Techniques based on Frequency Domain Analysis. IEEE Transactions on Communications, 32(4):479-483, 1984.

[3] Ting Chung Chen and Rui J. P. de Figueiredo. Two-Dimensional Interpolation by Generalized Spline Filters based on Partial Differential Equation Image Models. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(3):631-642, June 1985.

[4] Hsieh S. Hou and Harry C. Andrews. Cubic Splines for Image Interpolation and Digital Filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(6):508-517, December 1978.

[5] M. Y. Jaisimha, E. A. Riskin, R. Ladner, and S. Werner. Model-based Restoration of Document Images for OCR. Proceedings of the SPIE, 2660:297-308, 1996.

[6] N. B. Karayiannis and A. N. Venetsanopoulos. Image Interpolation based on Variational Principles. Signal Processing, 25:259-288, 1991.

[7] Howard Kaufman and A. Murat Tekalp. Survey of Estimation Techniques in Image Restoration. IEEE Control Systems, 11(1):16-24, January 1991.

[8] R. G. Keys. Cubic Convolution Interpolation for Digital Image Processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6):1153-1160, 1981.

[9] L. Koskinen, H. Huttunen, and J. T. Astola. Text Enhancement Method based on Soft Morphological Filters. Proceedings of the SPIE, 2181:243-253, 1994.

[10] A. D. Kulkarni and K. Sivaraman. Interpolation of Digital Imagery using Hyperspace Approximation. Signal Processing, 7:65-73, 1987.

[11] H. Li, O. E. Kia, and D. S. Doermann. Text Enhancement in Digital Video. Proceedings of the SPIE, 3651:2-9, 1999.

[12] J. Liang, R. M. Haralick, and I. T. Phillips. Document Image Restoration using Binary Morphological Filters. Proceedings of the SPIE, 2660:274-285, 1996.

[13] V. S. Nalwa. Edge-Detector Resolution Improvement by Image Interpolation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(3):446-451, 1987.

[14] G. Ramponi and P. Fontanot. Enhancing Document Images with a Quadratic Filter. Signal Processing, 33:23-34, 1993.

[15] F. Sattar and D. B. H. Tay. On the Multiresolution Enhancement of Document Images using Fuzzy Logic Approach. ICPR98 Proceedings International Conference on Pattern Recognition, pages 939-942, 1998.

[16] Richard R. Schultz and Robert L. Stevenson. A Bayesian Approach to Image Expansion for Improved Definition. IEEE Transactions on Image Processing, 3(3):233-242, 1994.

[17] Y.-C. Shin, R. Sridhar, V. Demjanenko, P. W. Palumbo, and J. J. Hull. Contrast Enhancement of Mail Piece Images. Proceedings of the SPIE, 1661:27-37, 1992.

[18] J. Skilling and R. K. Bryan. Maximum Entropy Image Reconstruction: General Algorithm. Mon. Not. Royal Astronomical Society, 211:111-124, 1984.

[19] P. Stubberud, J. Kanai, and V. Kalluri. Adaptive Image Restoration of Text Images that Contain Touching or Broken Characters. ICDAR95 Proceedings International Conference on Document Analysis and Recognition, pages 778-781, 1995.

[20] M. J. Taylor and C. R. Dance. Enhancement of Document Images from Cameras. Proceedings of the SPIE, 3305:230-241, 1998.

[21] Michael Unser, Akram Aldroubi, and Murray Eden. Fast B-Spline Transforms for Continuous Image Representation and Interpolation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3):277-285, March 1991.

[22] A. P. Whichello and H. Yan. Linking Broken Character Borders with Variable Sized Masks to Improve Recognition. Pattern Recognition, 29(8):1429-1435, 1996.

[23] M.-Y. Yoon, S.-W. Lee, and J.-S. Kim. Faxed Image Restoration using Kalman Filtering. ICDAR95 Proceedings International Conference on Document Analysis and Recognition, pages 677-681, 1995.

US Provisional Application Serial No. 60/127,316, filed April 1, 1999 (including its attachments) and the following attached documents are incorporated herein by reference as if fully set forth herein:

1. "A Method for Restoration of Low-Resolution Bimodal Images" (authored by PAUL THOUIN and CHEIN-I CHANG, and published in the conference proceedings for the 1999 Symposium on Document Image Understanding Technology at Annapolis, MD April 14-16, 1999).

2. "Constrained nonlinear restoration of JPEG compressed low-resolution text from grayscale images using a Gibbs-Markov random field prior", authored by PAUL THOUIN and CHEIN-I CHANG, and published in IST/SPIE 10th Annual Symposium, Electronic Imaging '98: Science and Technology (presented at San Jose, California January 24-30, 1998); and

3. "A Gibbs-Markov Random Field Approach to Restoration of JPEG- Compressed Text Images" (unpublished, authored by PAUL THOUIN and CHEIN-I CHANG).

While a specific embodiment of the invention has been shown and described in detail to illustrate the application of the principles of the invention, it will be understood that the invention may be embodied otherwise without departing from such principles and that various modifications, alternate constructions, and equivalents will occur to those skilled in the art given the benefit of this disclosure. Thus, the invention is not limited to the specific embodiment described herein, but is defined by the appended claims.

Claims

1. A process for restoration of a low-resolution image comprising the steps of:

subdividing said low-resolution image into groups of pixels;

generating initial conditions as an initial expansion of a restored (high-resolution) image;

iteratively revising each group of the restored (high-resolution) image so as to minimize the BSA score for said group of pixels.

2. The process of claim 1 wherein the step of generating initial conditions comprises the step of either replicating or linearly interpolating or cubic-spline interpolating each group of pixels.

3. The process of claim 1 wherein said low-resolution image is a binary image.

4. The process of claim 1 wherein said low-resolution image is a gray-scale image.

5. The process of claim 1 wherein said low-resolution image is a color image.

6. The process of claim 1 wherein said low-resolution image is a video image.

7. The process of claim 1 wherein an image is provided, an initial BSA score associated with said input is computed, an updated image output is produced from said image and its associated BSA score, an updated BSA score associated with the updated image output is computed, the prior computed BSA score is compared with the updated BSA score, and updated image outputs and calculated updated BSA scores are produced iteratively until an updated BSA score does not differ from its prior computed BSA score by more than a predetermined amount.

8. A machine for restoration of a low-resolution image comprising image acquisition means for providing said low resolution image to a computer processor, said processor programmed to carry out the steps of subdividing said low resolution image into groups of pixels; replicating each group of pixels as an initial expansion of a restored (high-resolution) image; iteratively revising each group of the restored (high-resolution) image so as to minimize the BSA score for said group of pixels; and providing as an output a restored image.

9. A machine as in claim 8 wherein said computer processor further includes a preprocessing module which provides an output to a bimodal estimation module which outputs an estimation to a resolution expansion module for pixel replication and output to a BSA restoration module which produces an output representative of an enhanced image.

10. A machine as in claim 8 wherein said BSA restoration module comprises means for accepting an image as an input, computing an initial BSA score associated with said input and producing an updated image output therefrom, computing an updated BSA score associated with the updated image, comparing the prior computed BSA score with the updated BSA score and continuing to produce updated image outputs and calculate updated BSA scores until an updated BSA score does not differ from its prior computed BSA score by more than a predetermined amount.