WO2006103629A1 - Method and device for enhancing a digital image - Google Patents

Method and device for enhancing a digital image

Info

Publication number
WO2006103629A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
contrast
range
regions
text
Prior art date
Application number
PCT/IB2006/050945
Other languages
French (fr)
Inventor
Ahmet Ekin
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2006103629A1 publication Critical patent/WO2006103629A1/en

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06T5/94: Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20036: Morphological image processing

Definitions

  • the module 21 for the detection of a text in the static regions may further comprise thresholding to extract a binary text mask. This step involves automatically computing a threshold value to find a binary and pixel-wise more accurate text mask. The pixels occurring just outside the text line boundaries are defined as background. The threshold value is set such that no pixel outside the detected text lines, i.e. no background pixel, is assigned as a text pixel.
  • the module 21 for the detection of a text in the static regions may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary.
  • a morphological closing operation and a connected-component labelling algorithm are applied to the resulting text mask to segment individual words. The closing operation joins separate characters into words, while the connected-component labelling algorithm extracts connected regions (words, in this case).
  • the module 21 is followed by the creation of a mask image in a module 22, this mask image holding the label of the object type for each pixel.
  • the value 0 is attributed to pixels that are not part of an object, the value 1 to a pixel being part of a text object and the value 2 to a pixel being part of a face object.
  • detecting the objects in the circuit 20 also comprises an estimation of the colours in the static regions of the image in order to detect the objects more accurately.
  • this is achieved, in a module 23, by comparing the values of the colour parameters of each pixel to the parameters of each neighbour in order to detect the edges of the objects more accurately, the mask image being corrected accordingly.
  • the circuit 20 also comprises a module 24 for the determination of the parameters of the text objects detected.
  • this module 24 makes it possible to determine whether a text is horizontal or slanted and also to classify the text according to its size. For example, the size of the text line is computed by taking the absolute difference between the lowest and the highest y-coordinate of the text line.
  • Alternatively, the size is determined by finding the upper and lower baseline coordinates of the text line, so that the effect of elongated upper and lower characters (ascenders and descenders) can be prevented. The height in this case can be assigned as the absolute difference between the lower and the upper baseline y-coordinates.
  • the circuit 20 allows detection of logo objects or of both text and logo objects by the use of any existing algorithm.
  • the method continues by processing the digital image in a circuit 30.
  • This processing is achieved by modifying image parameters defining the contrast of the image through local low-level operations applied to the selected regions, processing the image data pixel by pixel or by elementary groups of pixels of varying sizes.
  • Each region to process comprises an object, which is in the example a text object, and also a background that corresponds to everything else in the region, i.e. every pixel of the region that does not belong to an object.
  • the maximum possible contrast corresponds to the difference between the highest and the lowest pixel values that can be attained.
  • the circuit 30 first comprises a module 31 for the determination of an original contrast range.
  • This original contrast range is identified by the Greek letter Δ and its length corresponds, in the example, to the overall contrast of the region, i.e. the interval between the highest and the lowest pixel values in the region, for one image parameter defining the contrast within the region, such as luminance or colour.
  • the length of the original contrast range Δ is thus equal to the value of the overall contrast of the region to process.
  • Alternatively, the original contrast range Δ is defined as the interval between the highest and the lowest pixel values of the noise-filtered image, in which case its length is equal to the value of the overall contrast of the region after noise filtering.
  • the original contrast range can also be set as the interval between the highest and lowest possible values, its length being equal to the maximum possible contrast, or as the interval between the highest and the lowest values in the image, its length then being equal to the image overall contrast.
  • For each image parameter, pixels in an 8-bit channel can assume values in the range 0 to 255; accordingly, the maximum length of the original contrast range Δ is 255.
  • the module 31 is followed by a module 32 for the determination of a target contrast range to which the region will be mapped in order to protect the display from the burn-in effect.
  • the target contrast range, referenced as δ, is selected such that its length is significantly less than the length of the original contrast range Δ.
  • For example, the length of the target contrast range δ is equal to 1/5 of the length of the original contrast range Δ.
  • In this example, the lowest possible pixel value in the target contrast range is set to 0 and its highest pixel value is equal to one fifth of the length of the original contrast range Δ.
  • Thus the target contrast range is set from 0 to 50 if the original contrast range is defined as the maximum possible contrast for an 8-bit image. Accordingly, the length of the target contrast range δ is less than the length of the original contrast range Δ, and the highest value of the target contrast range δ is less than the highest value of the original contrast range Δ.
  • the circuit 30 comprises a module 33 for the determination of a lower segment, identified by δ1, and a higher segment, identified by δ2, within the target contrast range δ.
  • These segments δ1 and δ2 are two separate range segments, which means that the highest value of segment δ1 is lower than the lowest value of segment δ2.
  • Each of these segments is a target range for the modification of either the object or the background of the region to process.
  • the maximum and minimum values of each or one of these segments are defined as functions of values of the target range δ, such as its maximum value.
  • coefficients C4 and C3, as well as coefficients C2 and C1, may be equal to one another, in which case each of the target segments is restricted to one single value. For example, if coefficients C3 and C4 are both equal to 0, segment δ1 is reduced to the value 0, and if coefficients C2 and C1 are both equal to 1, segment δ2 is reduced to the maximum value of the target range.
  • the circuit 30 also comprises a module 34 for the modification of the image parameters of the object and of the background within the processed region. More precisely the image parameters of the object and of the background are mapped from the original contrast range respectively to one or the other of the segments. Thus a predetermined minimum relative contrast between the background and the object is ensured.
  • the relative contrast is the contrast existing between the object and the background and is computed by the difference of image parameter values of pixels from the background and from the object.
  • the relative contrast is the smallest difference possible over the region between the image parameter value of any pixel of the object and the image parameter value of any pixel of the background. Accordingly, the minimum relative contrast corresponds to the length of the interval between the two segments δ1 and δ2.
  • the relative contrast is the difference between the median image parameter value of all the pixels of the background and the median image parameter value of all the pixels of the object. The value of the minimum relative contrast is determined by setting the coefficients C 1 , C 2 , C 3 and C 4 .
  • the overall contrast of the entire region after processing is limited and is at maximum equal to the difference between the highest value of the higher segment δ2 and the lowest value of the lower segment δ1.
  • In consequence, the overall contrast is limited to the length of the target contrast range δ.
  • the length of the target contrast range δ is smaller than the length of the original contrast range Δ, and the maximum value of the range δ is smaller than the maximum value of the range Δ; thus the overall contrast of the region is decreased after processing. If the text is brighter than the background, which is called normal text, the background is mapped to the lower segment δ1 while the text is mapped to the higher segment δ2.
  • If the text is darker than the background, which is called inverse text, then the text is mapped to the lower segment δ1 and the background is mapped to the higher segment δ2.
  • FIG. 3A represents the original region
  • the overall contrast is decreased by a reduction of the intensity maximum and a reduction of the length of the range, in order to protect the display from the burn-in effect.
  • the objects, and especially text objects or the like, are preserved by specific modifications that maintain a predetermined relative contrast between the background and the object, by mapping each of them to a different separate segment of the target contrast range.
  • the method described above can be applied to one image parameter, such as the luminance or the colour value of one channel, or to several image parameters. For example, for colour content, each colour channel is processed independently.
  • target range segments for the background and for the object are computed for each colour channel and at least a predetermined difference is maintained between the target range segments defined for each colour channel so as to keep a relative contrast between them.
  • the target contrast range is predetermined and is not determined as a function of the original contrast range. In this case the overall contrast is limited by setting a proper target contrast range whose maximum value is adapted to avoid high luminance and whose length is adapted to restrict the maximum overall contrast.
  • the segments toward which the image parameters of the background and of the object are mapped are predetermined and are not defined as functions of the target contrast range values.
  • the boundaries of the target range segments are functions of the maximum and/or minimum values of the target contrast range or of the original contrast range or of any other value such as the median value.
  • Such a device has, according to the invention, a first unit adapted to detect regions of the image containing predetermined types of objects, such as text object, and a second unit adapted to process said regions of the image.
  • This second unit is adapted to map image parameters of the background and image parameters of the object to the two separate segments, the lower segment δ1 and the higher segment δ2, to maintain at least a predetermined relative contrast between said object and said background.
  • the method implemented in such an enhancing device according to the invention can be carried out by a computer program for a processing unit, comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out said method.
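As a rough sketch of the processing described above (circuit 30), the following assumes the illustrative choices mentioned in the text: a target range of one fifth of the 8-bit original range and the two segments δ1, δ2 collapsed to its end values; the specific numbers and the median-based normal/inverse text decision are assumptions for illustration, not a definitive implementation of the claims.

```python
import numpy as np

def protect_region(region, text_mask):
    """Remap a static region into a reduced target contrast range (sketch).

    region:    2-D array of 8-bit luminance values.
    text_mask: boolean array, True on text pixels (background elsewhere).

    The target contrast range is one fifth of the original one (0..51 for
    8-bit data), and the lower and higher segments are collapsed to its two
    end values, so the text/background relative contrast is preserved while
    the overall contrast never exceeds the target range length.
    """
    lo, hi = 0, 255 // 5                     # illustrative target range [0, 51]
    seg_low, seg_high = lo, hi               # degenerate segments delta1, delta2
    out = np.empty_like(region)
    # "normal text" if the text is brighter than the background (medians).
    text_brighter = np.median(region[text_mask]) > np.median(region[~text_mask])
    if text_brighter:                        # normal text: text -> higher segment
        out[text_mask], out[~text_mask] = seg_high, seg_low
    else:                                    # inverse text: text -> lower segment
        out[text_mask], out[~text_mask] = seg_low, seg_high
    return out

# Bright text (200) on a dark background (30): a "normal text" region.
region = np.full((10, 20), 30, dtype=np.uint8)
mask = np.zeros((10, 20), dtype=bool)
mask[3:7, 5:15] = True
region[mask] = 200
out = protect_region(region, mask)
```

After processing, the region's overall contrast is bounded by the target range length (51 here) instead of the original 170, while text and background remain fully separated.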

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The present invention concerns a device for enhancing a digital image comprising: - a circuit (20) for detecting regions of the image containing predetermined types of objects and a background; and - a circuit (30) for processing said regions by modifying image parameters defining the contrast of the image, wherein said circuit (30) for processing a region comprises the determination, in a module (33), of two separate target range segments for said image parameters, a lower segment and a higher segment, image parameters of the background being mapped, in a module (34), to one of said segments and image parameters of the objects being mapped, in said module (34), to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.

Description

METHOD AND DEVICE FOR ENHANCING A DIGITAL IMAGE
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a device for enhancing a digital image, and especially text regions in a digital image, in order to protect displays from the burn-in effect, and to a corresponding enhancement method.
BACKGROUND OF THE INVENTION
Electronic displays activate red, green and blue light phosphors for each pixel to form images. Although there are variations among different technologies in the realization of this concept, all types of displays are susceptible to the same damaging effect, known as "burn-in". The burn-in effect results from the display of a static scene for a long duration, such as TV logos, score boxes etc. It mainly consists in the deterioration and fast aging of the phosphor elements corresponding to pixels at the static image location. The end result is the appearance of a ghost image on the screen at all times, including when the display has been turned off. The burn-in problem is not fixable and requires the display to be changed at a significant cost.
A common solution to the burn-in problem, for example for computer displays, is to activate a screen saver that replaces the original screen image with a moving image. However, in many applications this is not an acceptable solution as it causes the disappearance of the original content. Other solutions also exist, based, for example, on reducing the contrast of the image when a static region is detected, as in document US 6,313,878. Still another method consists in adjusting parameters such as luminance and colour channels, or adding transparent overlays to static regions, as described in document US 2003/0071769. However, none of these solutions preserves the quality of the content, that is to say, the quality of the objects within the image, and especially the readability of the textual content.
Document WO 02/075705 describes a solution, corresponding to the preamble of claim 1, in which digital images are enhanced by detecting regions of the image which contain a predetermined type of object and processing said regions by modifying image parameters which define the contrast of the image, in order to adapt the processing to the regions depending upon the type of objects which they contain. Still, this document does not provide a solution to the burn-in problem, and even a possible combination with the other documents mentioned would still have the drawbacks described above, as the entire regions containing the object would be processed and thus the visibility of the objects would be affected in order to prevent a burn-in effect. Especially in the case of text objects or the like, such as logos, the quality and the contrast are particularly important, and a reduction of the contrast or of the quality, or even a deterioration of one of the colour or luminance channels, may make the text completely unreadable, which is not acceptable.
In another field, other solutions exist to prevent the burn-in effect, as for example in document WO 98/09428, which deals with the superimposition of text information on a video background. In that document, two different images provided by two different sources are superimposed while the brightness of the superimposed image is controlled to maintain at least a predetermined minimum contrast between the solely enhanced image and the other image used as a reference. This predetermined minimum contrast is obtained by leaving the entire background image unchanged and leaving the entire superimposed image unchanged or increasing its overall brightness. Accordingly, the overall contrast of the image after superimposition is unchanged or increased, which in consequence aggravates the burn-in effect.
As indicated above, the known methods of enhancing a digital image while preserving the content comprise detecting regions of the image containing predetermined types of objects and processing said regions by modifying image parameters defining the contrast, but they do not deal efficiently with the burn-in effect and with the quality of the objects and their readability.
SUMMARY OF THE INVENTION
Accordingly it is an object of the invention to provide a device performing a new method to enhance a digital image while preserving the quality of the objects and preventing the burn-in effect.
To this end, the invention relates to a device for enhancing a digital image comprising:
- a first circuit for detecting regions of the image containing a predetermined type of object and a background; and
- a second circuit for processing said regions by modifying image parameters defining the contrast of the image, wherein said second circuit for processing a region comprises a module for the determination of two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), image parameters of the background being mapped to one of said segments and image parameters of the objects being mapped to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.
With said method, objects in digital images are detected and special processing operations are applied in the corresponding regions. These operations comprise the modification of image parameters of both the object and the background. These modifications are adapted to maintain at least a predetermined relative contrast between the object and the background by mapping image parameters of the object and of the background to two separate contrast range segments to preserve the quality of the object.
In consequence, the overall contrast is limited to the maximum contrast existing between the two segments, which can be set in order to prevent the burn-in effect.
Besides, the quality of the object is preserved and its readability is ensured by this minimum predetermined relative contrast.
Other features of the device of the invention are further recited in the dependent claims.
It is also an object of the invention to provide corresponding method and program as recited in claims 11 and 13.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described, by way of example, with reference to the accompanying drawings in which :
Figure 1 illustrates an image enhancing device according to the invention.
Figures 2A and 2B and figures 3A, 3B and 3C are symbolical representations of texts regions during the implementation of the device shown in Figure 1.
DETAILED DESCRIPTION
Referring to figure 1, a device for enhancing a digital image is illustrated. In this embodiment, the digital image comprises a text object or the like such as a logo, and is a video frame that is part of a sequence of frames and thus has a previous frame and a following frame. Each digital image is an array of pixels.
Advantageously, the device comprises a circuit 10 for identifying static regions within the image over a predetermined period of time. This identification of static regions is achieved in a conventional way by comparing successive frames with one another. The accuracy of this identification depends upon the time period over which comparisons are made and the level of similarity accepted between the same regions of different frames.
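The frame-comparison idea can be sketched as follows; the difference threshold, the required fraction of stable comparisons and the use of plain frame differencing are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def static_region_mask(frames, diff_threshold=8, static_fraction=0.95):
    """Mark pixels that stay (nearly) unchanged across a window of frames.

    frames: list of 2-D uint8 luminance arrays of identical shape.
    A pixel is 'static' if its value changes by at most diff_threshold
    between consecutive frames in at least static_fraction of the
    comparisons.
    """
    stack = np.stack([f.astype(np.int16) for f in frames])
    diffs = np.abs(np.diff(stack, axis=0))           # frame-to-frame changes
    stable = (diffs <= diff_threshold).mean(axis=0)  # fraction of stable comparisons
    return stable >= static_fraction

# A tiny synthetic sequence: a constant "logo" patch on a changing background.
frames = []
for i in range(10):
    f = np.full((32, 32), (i * 60) % 256, dtype=np.uint8)  # background changes
    f[4:12, 4:20] = 200                                    # static overlay region
    frames.append(f)

mask = static_region_mask(frames)   # True only inside the static patch
```

The returned boolean mask plays the role of the output of circuit 10: only the regions it marks are passed on to the object-detection stage.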
The device of figure 1 also comprises a circuit 20 for detecting objects within the image, and more precisely within the static regions of the image. For example, this detection circuit 20 performs a detection of objects using the algorithm described in the document entitled "Robust real-time object detection" by P. VIOLA and M. JONES, published in Proc. IEEE CVPR 2001. In particular, this detection circuit 20 comprises a module 21 for detecting text objects within the static regions of the processed image, and specific algorithms are involved.
Some existing text detection algorithms exploit the high contrast properties of overlay text regions. In a favourable text detection algorithm, the horizontal and vertical derivatives of the frame where text will be detected are computed first in order to enhance the high contrast regions. It is well known in the image and video processing literature that simple masks approximate the derivative of an image.
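As a sketch of such simple derivative masks, the 3 x 3 Sobel operators are one common choice; the patent does not name a specific mask, so this is an illustrative assumption.

```python
import numpy as np

# Sobel masks: one common pair of "simple masks" approximating the
# horizontal and vertical derivatives of an image.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def correlate2d_valid(image, kernel):
    """Minimal 'valid' 2-D correlation, enough for a 3x3 derivative mask."""
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for dy in range(h):
        for dx in range(w):
            out += kernel[dy, dx] * image[dy:dy + out.shape[0],
                                          dx:dx + out.shape[1]]
    return out

# A vertical step edge: strong horizontal derivative, no vertical one.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
gx = correlate2d_valid(img, SOBEL_X)   # responds to the vertical edge
gy = correlate2d_valid(img, SOBEL_Y)   # zero: rows are identical
```

High-contrast overlay text produces large responses in both gx and gy, which is what the subsequent edge-orientation feature builds on.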
After the derivatives are computed for each of the colour channels (or intensity and chrominance channels depending on the selected colour space), the edge orientation feature is computed. The edge orientation feature has first been proposed by Rainer Lienhart and Axel Wernicke in "Localizing and Segmenting Text in Images, Videos and Web Pages"; IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No.4, pp. 256-268, April 2002.
A statistical learning tool can be used to find an optimal text/non-text classifier. Support Vector Machines (SVMs) result in binary classifiers and have good generalization capabilities. An SVM-based classifier trained with 1,000 text blocks and, at most, 3,000 non-text blocks for which edge orientation features are computed, has provided good results in our experiments. Because it is difficult to find the representative hard-to-classify non-text examples, the popular bootstrapping approach that was introduced by K.K. Sung and T. Poggio in "Example-based learning for view-based human face detection", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998, can be followed. Bootstrap-based training is completed in several iterations and, in each iteration, the resulting classifier is tested on some images that do not contain text. False alarms over this data set represent difficult non-text examples that the current classifier cannot correctly classify. These non-text samples are added to the training set; hence, the non-text training dataset grows and the classifier is retrained with this enlarged dataset.
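The bootstrap loop described above can be sketched as follows. The feature vectors are synthetic stand-ins for edge-orientation features, and the SVM is replaced by a toy nearest-centroid classifier to keep the sketch self-contained; only the bootstrapping procedure itself mirrors the text.

```python
import numpy as np

class CentroidClassifier:
    """Toy stand-in for the SVM: label a sample by the nearer class centroid."""
    def fit(self, X, y):
        self.c_text = X[y == 1].mean(axis=0)
        self.c_non = X[y == 0].mean(axis=0)
        return self
    def predict(self, X):
        d_text = np.linalg.norm(X - self.c_text, axis=1)
        d_non = np.linalg.norm(X - self.c_non, axis=1)
        return (d_text < d_non).astype(int)

rng = np.random.default_rng(1)
# Synthetic stand-ins for edge-orientation feature vectors.
text_feats = rng.normal(loc=1.0, scale=0.3, size=(200, 8))
nontext_pool = rng.normal(loc=0.0, scale=0.6, size=(2000, 8))

train_neg = nontext_pool[:200]                # initial non-text training set
for _ in range(3):                            # a few bootstrap iterations
    X = np.vstack([text_feats, train_neg])
    y = np.array([1] * len(text_feats) + [0] * len(train_neg))
    clf = CentroidClassifier().fit(X, y)
    preds = clf.predict(nontext_pool)         # test on text-free data
    hard = nontext_pool[preds == 1]           # false alarms = hard negatives
    if len(hard) == 0:
        break
    train_neg = np.vstack([train_neg, hard])  # grow the non-text set, retrain
```

Each iteration mines exactly the non-text samples the current classifier gets wrong, so the negative set concentrates on the hard cases.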
When a classifier is being trained, an important issue to decide upon is the size of the image blocks that are fed to the classifier, because the height of the block determines the smallest detectable font size whereas the width of the block determines the smallest detectable text width. A block size of 12 x 12 pixels is chosen for the training of the classifier because, in a typical frame with a height of 400 pixels, it is rare to find a font size smaller than 12.
Having computed the SVM-based classifier parameters in the training stage, text detection in a new frame is performed in two stages: 1) detection of text-candidate blocks by using the SVM-based classifier, and 2) binarization of the text to extract a pixel-accurate text mask for pixel-accurate enhancement. In the first stage, edge orientation features are extracted for every 12x12 window in the image and all pixels in the current window are classified as text or not by the SVM-based classifier. Because text can be larger than 12 pixels, font size independence is achieved by running the classifier with a 12 x 12 window size over multiple resolutions, and location independence is achieved by moving the window in horizontal and vertical directions to evaluate the classifier over the whole image. This first stage extracts text-candidate blocks, also called regions of interest (ROI), that need to be further processed to extract a binary text mask.
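The multi-resolution sliding-window stage can be sketched as follows. This is a simplified illustration: `classify` stands in for the trained SVM, and the stride-based downsampling is a crude placeholder for proper image rescaling:

```python
def detect_candidates(img, classify, win=12, step=4, scales=(1, 2)):
    """Slide a win x win window over the image at several scales and
    collect windows the classifier flags as text-candidate blocks,
    reported as (x, y, size) in original-image coordinates."""
    rois = []
    for s in scales:
        small = [row[::s] for row in img[::s]]  # crude downsample by pixel skip
        h = len(small)
        w = len(small[0]) if h else 0
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                block = [r[x:x + win] for r in small[y:y + win]]
                if classify(block):
                    rois.append((x * s, y * s, win * s))  # map back to full resolution
    return rois
```

Running the same 12 x 12 classifier at scale 2 effectively detects text twice as large, which is how font size independence is obtained.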
A module 21 for the detection of a text in the image is responsible for extraction of text line boundaries, binarization of the text, and extraction of individual words. Initially, the coordinates of the horizontal text lines are computed. For that purpose, edge detection is performed in the region of interest (ROI) to find the high-frequency pixels, most of which are expected to be text. Because the ROI is mainly dominated by text, it is expected that the top of a text line will demonstrate an increase in the number of edges whereas the bottom of a text line will show a corresponding fall in the number of edges. Projections along horizontal and/or vertical dimensions are effective descriptors to easily determine such locations. In contrast to intensity projections that are used in many text segmentation algorithms, edge projections are robust to variations in the colour of the text.
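A horizontal edge projection can be sketched in a few lines. The helper name and the `min_edges` threshold are assumptions for illustration; a text line starts where the per-row edge count rises above the threshold and ends where it falls back below:

```python
def text_line_bounds(edge_mask, min_edges=2):
    """Find (top, bottom) row pairs of text lines from a binary edge
    mask (list of rows of 0/1) via its horizontal projection."""
    counts = [sum(row) for row in edge_mask]  # edges per row
    lines, top = [], None
    for y, c in enumerate(counts):
        if c >= min_edges and top is None:
            top = y                      # rise: line starts
        elif c < min_edges and top is not None:
            lines.append((top, y - 1))   # fall: line ends
            top = None
    if top is not None:                  # line touching the bottom edge
        lines.append((top, len(counts) - 1))
    return lines
```

Because only edge counts are projected, the result is unchanged whether the text is bright on dark or dark on bright.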
This module 21 for the detection of a text in the static regions may further comprise thresholding to extract a binary text mask. This step involves automatically computing a threshold value to find a binary, pixel-wise more accurate text mask. The pixels occurring just outside the text line boundaries are defined as background. The threshold value is set such that no pixel outside the detected text lines, i.e. no background pixel, is assigned as a text pixel.
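One way to realise this rule is to place the threshold just beyond the extreme value of the pixels bordering the text line. The helper and its one-row background margin are assumptions, not the patent's exact procedure:

```python
def text_threshold(region, top, bottom, bright_text=True):
    """Pick a binarization threshold from the rows just outside the
    detected text line [top, bottom] so that no background pixel can
    be labelled text.  `region` is a list of rows of pixel values."""
    background = []
    if top > 0:
        background.extend(region[top - 1])       # row above the line
    if bottom + 1 < len(region):
        background.extend(region[bottom + 1])    # row below the line
    if bright_text:
        return max(background) + 1   # text pixels: value >= threshold
    return min(background) - 1       # inverse text: value <= threshold
```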
The module 21 for the detection of a text in the static regions may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary. A morphological closing operation and a connected-component labelling algorithm are applied to the resulting text mask to segment individual words. The closing operation joins separate characters into words, while the connected-component labelling algorithm extracts connected regions (words, in this case).
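The two operations can be sketched in pure Python. The 1-D closing along rows (filling small gaps between characters) and the BFS-based labelling are simplified stand-ins for general morphological closing and connected-component labelling; the `gap` size is an assumption:

```python
from collections import deque

def close_rows(mask, gap=2):
    """1-D morphological closing along each row of a binary mask:
    fill runs of at most `gap` zeros between ones, joining characters."""
    out = [row[:] for row in mask]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 0:
                j = x
                while j < len(row) and row[j] == 0:
                    j += 1
                # fill only interior gaps that are short enough
                if 0 < x and j < len(row) and j - x <= gap:
                    for k in range(x, j):
                        row[k] = 1
                x = j
            else:
                x += 1
    return out

def label_components(mask):
    """4-connected component labelling by BFS; each component is one
    word, returned as a list of (y, x) pixel coordinates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q, comp = deque([(y, x)]), []
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(comp)
    return comps
```

Characters closer than `gap` pixels merge into one component; wider gaps separate words.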
In the embodiment described, the module 21 is followed by a mask image creation in a module 22, this mask image holding the label of the object type for each pixel. For example, the value 0 is assigned to pixels that are not part of an object, the value 1 to a pixel that is part of a text object and the value 2 to a pixel that is part of a face object.
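A label mask of this kind can be built directly from detected object regions. The box-based input is an assumption for illustration; the detector may instead supply pixel-accurate masks:

```python
def make_mask(h, w, objects):
    """Build an h x w label mask: 0 = no object, 1 = text, 2 = face.
    `objects` is a list of (label, y0, y1, x0, x1) inclusive boxes."""
    mask = [[0] * w for _ in range(h)]
    for label, y0, y1, x0, x1 in objects:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                mask[y][x] = label
    return mask
```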
Advantageously, detecting the objects in the circuit 20 also comprises an estimation of the colours in the static regions of the image in order to detect the objects more accurately. In the embodiment described this is achieved in a module 23, by comparing the values of the colour parameters of each pixel to the parameters of each neighbour in order to detect the edges of the objects more accurately, the mask image being corrected consequently.
Advantageously, the circuit 20 also comprises a module 24 for the determination of the parameters of the detected text objects. For example, this module 24 makes it possible to determine whether a text is horizontal or slanted and to classify the text according to its size. For example, the size of the text line is computed as the absolute difference between the lowest and the highest y-coordinate of the text line. In an alternative embodiment, the size is determined by finding the upper and lower baseline coordinates of the text line, so that the outlier effect due to elongated ascenders and descenders can be prevented. The height in this case can be assigned as the absolute difference between the lower and the upper baseline y-coordinates.
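Both height measures can be sketched from the set of text-pixel y-coordinates. The percentile-based baseline estimate is an assumption for illustration, one simple way to discard ascenders and descenders:

```python
def text_height(ys, use_baselines=False):
    """Height of a text line from its text-pixel y-coordinates.
    Plain mode: max - min.  Baseline mode: take the 10th and 90th
    percentile y-values as upper/lower baselines so that elongated
    ascenders and descenders do not inflate the height."""
    ys = sorted(ys)
    if not use_baselines:
        return ys[-1] - ys[0]
    lo = ys[len(ys) // 10]           # ~10th percentile
    hi = ys[(len(ys) * 9) // 10]     # ~90th percentile
    return hi - lo
```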
Furthermore, maximum and minimum luminance values in the text objects are computed by the module 24; if the difference is high and occurs for every character, evenly distributed across the whole text line, it is considered text shading. Alternatively, the circuit 20 allows detection of logo objects, or of both text and logo objects, by the use of any existing algorithm.
After detection of the objects within the image, creation of the mask image, and determination of object parameters, enhancement of the digital image is achieved. Accordingly, the method continues by processing the digital image in a circuit 30. This processing is achieved by modifying image parameters defining the contrast of the image through local low-level operations applied to the selected regions, processing the image data pixel by pixel or by elementary groups of pixels of varying sizes. Each region to process comprises an object, which is in the example a text object, and also a background that corresponds to everything else in the region, i.e. every pixel of the region that does not belong to an object.
The maximum possible contrast corresponds to the difference between the highest and the lowest pixel values that can be attained. This maximum contrast is fixed by the type of image; for example, for 8-bit images it is equal to 255 - 0 = 255. Then, for one specific image, the overall contrast is set as the difference between the highest and the lowest pixel values in the image. Thus the overall contrast is at most 255 for 8-bit images, but it can be much less. The same applies to a region of an image, in which the overall contrast of the region is set as the difference between the highest and lowest pixel values in the region.
In the embodiment described, for each region to process, the circuit 30 first comprises a module 31 for the determination of an original contrast range. This original contrast range is identified by the Greek letter Δ and its length corresponds, in the example, to the overall contrast of the region, i.e. the interval between the highest and the lowest pixel values in the region, for one image parameter defining the contrast within the region, such as luminance or colour. Accordingly, the length of the original contrast range Δ is equal to the value of the overall contrast of the region to process. Optionally, Δ is defined as the interval between the highest and the lowest pixel values of the noise-filtered image, in which case its length is equal to the value of the overall contrast of the region after noise filtering.
Of course, the original contrast range can also be set as the interval between the lowest and highest possible values, its length then being equal to the maximum possible contrast, or as the interval between the lowest and the highest values in the image, its length then being equal to the image overall contrast. In the example, each image parameter of a pixel in an 8-bit channel can assume values in the range 0 to 255; accordingly, the maximum length of the original contrast range Δ is 255. The Greek letters α and Ω are used to designate respectively the lowest and highest pixel luminance or colour values used to determine Δ, such that Δ = [α,Ω].
The module 31 is followed by a module 32 for the determination of a target contrast range to which the region will be mapped in order to protect the display from burn-in. The target contrast range, referenced as δ, is selected such that its length is significantly less than the length of the original contrast range Δ. For example, the length of the target contrast range δ is equal to 1/5 of the length of the original contrast range Δ. Advantageously, the lowest possible pixel value in the target contrast range, identified by the Greek letter β, is set to 0 and its highest pixel value, referenced by the Greek letter ω, is equal to (Ω - α)/5. For example, the target contrast range is set from 0 to 50 if the original contrast range is defined as the maximum possible contrast for an 8-bit image. Accordingly, the length of the target contrast range δ is less than the length of the original contrast range Δ and the highest value of the target contrast range δ is less than the highest value of the original contrast range Δ. These two constraints ensure that the overall contrast is decreased after processing.
Furthermore, the circuit 30 comprises a module 33 for the determination of a lower segment, identified by δ1, and a higher segment, identified by δ2, within the target contrast range δ. These segments δ1 and δ2 are two separate range segments, which means that the highest value of segment δ1 is lower than the lowest value of segment δ2. Each of these segments is a target range for the modification of either the object or the background of the region to process. Advantageously, the maximum and minimum values of each or one of these segments are defined as functions of values of the target range δ, such as its maximum value ω. For example, these segments are defined by the following equations: δ = [β,ω], δ1 = [C4·ω, C3·ω] and δ2 = [C2·ω, C1·ω], with 0 ≤ C4 ≤ C3 ≤ C2 ≤ C1 ≤ 1.
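The segment equations above can be sketched directly; the helper name is an assumption, and the defaults correspond to the example coefficients C3 = C4 = 0 and C1 = C2 = 1:

```python
def range_segments(omega, c1=1.0, c2=1.0, c3=0.0, c4=0.0):
    """Lower segment delta1 = [C4*omega, C3*omega] and higher segment
    delta2 = [C2*omega, C1*omega] inside a target range [0, omega],
    with 0 <= C4 <= C3 <= C2 <= C1 <= 1."""
    assert 0 <= c4 <= c3 <= c2 <= c1 <= 1
    return (c4 * omega, c3 * omega), (c2 * omega, c1 * omega)
```

With omega = 50 and coefficients 1, 0.5, 0.25, 0, the segments are [0, 12.5] and [25, 50], leaving a guaranteed gap of 12.5 between background and object values.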
Optionally, coefficients C4 and C3, as well as coefficients C2 and C1, are equal to one another, in which case the corresponding target segment is restricted to one single value. For example, if coefficients C3 and C4 are both equal to 0, segment δ1 is reduced to the value 0, and if coefficients C2 and C1 are both equal to 1, segment δ2 is reduced to the value ω.
Once these two target segments have been defined, the circuit 30 also comprises a module 34 for the modification of the image parameters of the object and of the background within the processed region. More precisely the image parameters of the object and of the background are mapped from the original contrast range respectively to one or the other of the segments. Thus a predetermined minimum relative contrast between the background and the object is ensured. The relative contrast is the contrast existing between the object and the background and is computed by the difference of image parameter values of pixels from the background and from the object.
In the embodiment described, the relative contrast is the smallest difference possible over the region between the image parameter value of any pixel of the object and the image parameter value of any pixel of the background. Accordingly, the minimum relative contrast corresponds to the length of the interval between the two segments δ1 and δ2, which is different from zero as the two segments are separate. In another embodiment, the relative contrast is the difference between the median image parameter value of all the pixels of the background and the median image parameter value of all the pixels of the object. The value of the minimum relative contrast is determined by setting the coefficients C1, C2, C3 and C4.
Furthermore, the overall contrast of the entire region after processing is limited: it is at most equal to the difference between the highest value of the higher segment δ2 and the lowest value of the lower segment δ1. These two segments being set within the target contrast range δ, the overall contrast is limited to the length of range δ. In the example, the length of the target contrast range δ is smaller than the length of the original contrast range Δ and the maximum value of range δ is smaller than the maximum value of range Δ; thus the overall contrast of the region is decreased after processing. If the text is brighter than the background, which is called normal text, the background is mapped to the lower segment δ1 while the text is mapped to the higher segment δ2.
This situation is represented in Figures 2A and 2B. Figure 2A represents the original region with normal text, while Figure 2B represents the processed region with a target contrast range of 0 to 50 and the coefficients C1=C2=1 and C3=C4=0. In the opposite case, if the text is darker than the background, which is called inverse text, the text is mapped to the lower segment δ1 and the background is mapped to the higher segment δ2.
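The normal/inverse mapping can be sketched as follows. This is a deliberate simplification, not the patent's exact mapping: each group of pixels is collapsed to the midpoint of its target segment, and the normal/inverse decision is made by comparing mean values:

```python
def enhance_region(region, mask, d1, d2):
    """Map background pixels to one segment and text pixels to the
    other.  Bright-on-dark ('normal') text goes to the higher segment
    d2 and the background to the lower segment d1; inverse text swaps
    them.  Segments are (lo, hi) pairs; each group is collapsed to
    its segment midpoint."""
    h, w = len(region), len(region[0])
    text_vals = [region[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    bg_vals = [region[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    # normal text: text is at least as bright as the background on average
    normal = sum(text_vals) / len(text_vals) >= sum(bg_vals) / len(bg_vals)
    text_seg, bg_seg = (d2, d1) if normal else (d1, d2)
    mid = lambda seg: (seg[0] + seg[1]) / 2
    return [[mid(text_seg) if mask[y][x] else mid(bg_seg) for x in range(w)]
            for y in range(h)]
```

After the mapping, the region's overall contrast is bounded by the target range while the text keeps at least the inter-segment gap of contrast against its background.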
An example of such processing, applied simultaneously to a text region and a logo region, is represented with reference to Figures 3A to 3C. Figure 3A represents the original region; Figure 3B represents the region after processing in a first case, in which the target contrast range is 0 to 50 and the coefficients are C1=1, C2=0.5, C3=0.25 and C4=0. Figure 3C represents the same region after processing with the coefficients set as follows: C1=1, C2=0.75, C3=C4=0.
Accordingly, the overall contrast is decreased by a reduction of the intensity maximum and a reduction of the length of the range, to protect the display from burn-in. Furthermore, the objects, and especially text objects or the like such as logos, are preserved by specific modifications that maintain a predetermined relative contrast between the background and the object, by mapping each of them to a different, separate segment of the target contrast range.
The method described above can be achieved by mapping one of several different image parameters, such as the luminance or the colour value of one channel. It can also be achieved by mapping several image parameters; for example, for colour content, each colour channel is processed independently.
Alternatively, target range segments for the background and for the object are computed for each colour channel and at least a predetermined difference is maintained between the target range segments defined for each colour channel so as to keep a relative contrast between them. In another embodiment, the target contrast range is predetermined and is not determined as a function of the original contrast range. In this case the overall contrast is limited by setting a proper target contrast range whose maximum value is adapted to avoid high luminance and whose length is adapted to restrict the maximum overall contrast.
Alternatively, the segments toward which the image parameters of the background and of the object are mapped are predetermined and are not defined as functions of the target contrast range values.
In another embodiment, the boundaries of the target range segments are functions of the maximum and/or minimum values of the target contrast range or of the original contrast range or of any other value such as the median value.
The method implemented in the device illustrated in figure 1 of the invention can be achieved either by computers and the like or by dedicated devices. Generally speaking, such a device has, according to the invention, a first unit adapted to detect regions of the image containing predetermined types of objects, such as text objects, and a second unit adapted to process said regions of the image. This second unit is adapted to map image parameters of the background and image parameters of the object to the two separate segments, the lower segment δ1 and the higher segment δ2, to maintain at least a predetermined relative contrast between said object and said background.
The method implemented in such an enhancing device according to the invention can be carried out by a computer program for a processing unit, comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out said method.
It may be observed that there are numerous ways of implementing functions by means of items of hardware or software, or both. In this respect, the drawings are very diagrammatic and represent only one possible embodiment of the device according to the invention. Thus, although Fig.1 shows different functions as different blocks, this by no means excludes that a single item of hardware or software carries out several functions. Nor does it exclude that an assembly of items of hardware or software or both carry out a function.
The remarks made herein before demonstrate that the detailed description with reference to the drawings, illustrates rather than limits the invention, the numerous alternatives falling within the scope of the appended claims. The word "comprising" does not exclude the presence of other elements or steps than those listed in a claim. The word "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps.

Claims

1. A device for enhancing a digital image comprising:
- a circuit (20) for detecting regions of the image containing a predetermined type of object and a background; and
- a circuit (30) for processing said regions by modifying image parameters defining the contrast of the image, wherein said circuit (30) for processing said regions comprises a module (33) for the determination of two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), image parameters of the background being mapped in a module (34) to one of said segments and image parameters of the objects being mapped in said module (34) to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.
2. A device according to claim 1, wherein it also comprises means for identifying within the image regions which are static over a predetermined period of time, further processing being applied only to said static regions.
3. A device according to any of claims 1 and 2, wherein said circuit (30) also comprises:
- a module (31) for determining the original contrast range (Δ) of the region to process;
- a module (32) for determining a target contrast range (δ) for said region to process; and
- a module (33) for determining said two separate range segments (δ1, δ2) within the target contrast range (δ).
4. A device according to claim 3, wherein the maximum value of the target contrast range (δ) is less than the maximum value of the original contrast range (Δ) and the length of the target contrast range (δ) is less than the length of the original contrast range (Δ), to decrease the overall contrast.
5. A device according to claim 3, wherein the maximum and/or minimum values of said segments (δ1, δ2) are functions of the values of the target contrast range (δ) and/or of the values of the original contrast range (Δ).
6. A device according to any of claims 1 to 5, wherein at least one of said segments (δ1, δ2) is restricted to a single value.
7. A device according to any of claims 1 to 6, wherein the image processed comprises several channels with respective image parameters, each channel of the image being processed separately.
8. A device according to claim 7, wherein processing is adapted to maintain at least a predetermined difference between the values of the image parameter of said different image channels.
9. A device according to any of claims 1 to 8, wherein said image parameter is selected in the group consisting of luminance and colour brightness of each image channel.
10. A device according to any of claims 1 to 9, wherein said predetermined type of object is selected in the group consisting of text, logos and textual graphics content.
11. A method of enhancing a digital image comprising a first step adapted to detect regions of the image containing a predetermined type of object and a background, and a second step adapted to process said regions of the image by modifying image parameters defining the contrast of the image, characterized in that the second step is adapted to map (28) image parameters of the background and image parameters of the object to two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), to maintain at least a predetermined relative contrast between said object and said background.
12. Computer program for a processing unit comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out the steps of the method as claimed in claim 11.
PCT/IB2006/050945 2005-04-01 2006-03-28 Method and device for enhancing a digital image WO2006103629A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05300242 2005-04-01
EP05300242.4 2005-04-01

Publications (1)

Publication Number Publication Date
WO2006103629A1 true WO2006103629A1 (en) 2006-10-05

Family

ID=36695038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050945 WO2006103629A1 (en) 2005-04-01 2006-03-28 Method and device for enhancing a digital image

Country Status (1)

Country Link
WO (1) WO2006103629A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4549217A (en) * 1983-09-30 1985-10-22 Rca Corporation Automatic contrast reduction circuit for a teletext or monitor operation
US6313878B1 (en) * 1998-11-20 2001-11-06 Sony Corporation Method and structure for providing an automatic hardware-implemented screen-saver function to a display product
US20030071769A1 (en) * 2001-10-16 2003-04-17 Dan Sullivan Method and apparatus for preventing plasma display screen burn-in
US20040251842A1 (en) * 2003-06-10 2004-12-16 Hitachi, Ltd Image display device and method of displaying images with static image detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONGQING ZHANG ET AL: "General and domain-specific techniques for detecting and recognizing superimposed text in video", PROCEEDINGS 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP 2002. ROCHESTER, NY, SEPT. 22 - 25, 2002, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, NEW YORK, NY : IEEE, US, vol. VOL. 2 OF 3, 22 September 2002 (2002-09-22), pages 593 - 596, XP010607393, ISBN: 0-7803-7622-6 *
HENG W.J. AND TIAN Q.: "Content enhancement for e-learning lecture video using foreground/background separation", 2002 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 9 December 2002 (2002-12-09) - 11 December 2002 (2002-12-11), pages 436 - 439, XP002393559 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010056272A1 (en) 2008-11-14 2010-05-20 Global Oled Technology Llc Tonescale compression for electroluminescent display
US9601047B2 (en) 2008-11-14 2017-03-21 Global Oled Technology Llc Method for dimming electroluminescent display
US8576145B2 (en) 2008-11-14 2013-11-05 Global Oled Technology Llc Tonescale compression for electroluminescent display
US9218762B2 (en) 2010-09-01 2015-12-22 Qualcomm Incorporated Dimming techniques for emissive displays
CN103155023B (en) * 2010-09-01 2016-04-06 高通股份有限公司 Technology is dimmed for emissive display
CN103155023A (en) * 2010-09-01 2013-06-12 高通股份有限公司 Dimming techniques for emissive displays
US9311716B2 (en) 2014-05-14 2016-04-12 International Business Machines Corporation Static image segmentation
US10049459B2 (en) 2014-05-14 2018-08-14 International Business Machines Corporation Static image segmentation
US11328495B2 (en) * 2018-08-06 2022-05-10 Wrapmate Llc Systems and methods for generating vehicle wraps
US11830019B2 (en) 2018-08-06 2023-11-28 Wrapmate Inc. Systems and methods for generating vehicle wraps
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111275034B (en) * 2020-01-19 2023-09-12 天翼数字生活科技有限公司 Method, device, equipment and storage medium for extracting text region from image
CN112967208A (en) * 2021-04-23 2021-06-15 北京恒安嘉新安全技术有限公司 Image processing method and device, electronic equipment and storage medium
CN112967208B (en) * 2021-04-23 2024-05-14 北京恒安嘉新安全技术有限公司 Image processing method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
WWW Wipo information: withdrawn in national office (Country of ref document: DE)
NENP Non-entry into the national phase (Ref country code: RU)
WWW Wipo information: withdrawn in national office (Country of ref document: RU)
122 Ep: pct application non-entry in european phase (Ref document number: 06727762; Country of ref document: EP; Kind code of ref document: A1)
WWW Wipo information: withdrawn in national office (Ref document number: 6727762; Country of ref document: EP)