WO2006103629A1 - Method and device for enhancing a digital image - Google Patents

Method and device for enhancing a digital image

Info

Publication number
WO2006103629A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
contrast
range
regions
text
Prior art date
Application number
PCT/IB2006/050945
Other languages
French (fr)
Inventor
Ahmet Ekin
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2006103629A1 publication Critical patent/WO2006103629A1/en

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06T5/94: Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20036: Morphological image processing

Definitions

  • the module 21 for the detection of a text in the static regions may further comprise thresholding to extract a binary text mask. This step involves automatically computing a threshold value to find a binary and pixel-wise more accurate text mask. The pixels occurring just outside the text line boundaries are defined as background. The threshold value is set such that no pixel outside the detected text lines, i.e. no background pixel, is assigned as a text pixel.
  • the module 21 for the detection of a text in the static regions may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary.
  • a morphological closing operation and a connected-component labelling algorithm are applied to the resulting text mask to segment individual words. The closing operation joins separate characters into words, while the connected-component labelling algorithm extracts connected regions (words, in this case).
  • the module 21 is followed by the creation of a mask image in a module 22, this mask image holding the label of the object type for each pixel.
  • the value 0 is attributed to pixels that are not part of an object, the value 1 to a pixel being part of a text object and the value 2 to a pixel being part of a face object.
  • detecting the objects in the circuit 20 also comprises an estimation of the colours in the static regions of the image in order to detect the objects more accurately.
  • this is achieved, in a module 23, by comparing the values of the colour parameters of each pixel to the parameters of each neighbour in order to detect the edges of the objects more accurately, the mask image being corrected accordingly.
  • the circuit 20 also comprises a module 24 for the determination of the parameters of the text objects detected.
  • this module 24 makes it possible to determine whether a text is horizontal or slanted and also to classify the text according to its size. For example, the size of the text line is computed by taking the absolute difference between the lowest and the highest y-coordinate of the text line.
  • Alternatively, the size is determined by finding the upper and lower baseline coordinates of the text line, so that the effect of elongated upper and lower characters (ascenders and descenders) can be prevented. The height in this case can be assigned as the absolute difference between the lower and the upper baseline y-coordinates.
  • the circuit 20 allows detection of logo objects or of both text and logo objects by the use of any existing algorithm.
  • the method continues by processing the digital image in a circuit 30.
  • This processing is achieved by modifying image parameters defining the contrast of the image through local low-level operations applied to the selected regions, processing the image data pixel by pixel or by elementary groups of pixels of varying sizes.
  • Each region to process comprises an object, which is in the example a text object, and also a background that corresponds to everything else in the region, i.e. every pixel of the region that does not belong to an object.
  • the maximum possible contrast corresponds to the difference between the highest and the lowest pixel values that can be attained.
  • the circuit 30 first comprises a module 31 for the determination of an original contrast range.
  • This original contrast range is identified by the Greek letter Δ and its length corresponds, in the example, to the overall contrast of the region, i.e. the interval between the highest and the lowest pixel values in the region, for one image parameter defining the contrast within the region, such as luminance or colour.
  • the length of the original contrast range Δ is thus equal to the value of the overall contrast of the region to process.
  • Alternatively, the original contrast range Δ is defined as the interval between the highest and the lowest pixel values of the noise-filtered image, in which case its length is equal to the value of the overall contrast of the region after noise filtering.
  • the original contrast range can also be set as the interval between the highest and lowest possible values, its length being equal to the maximum possible contrast, or as the interval between the highest and the lowest values in the image, its length then being equal to the image overall contrast.
  • For each image parameter, pixels in an 8-bit channel can assume values in the range 0 to 255; accordingly, the maximum length of the original contrast range Δ is 255.
  • the module 31 is followed by a module 32 for the determination of a target contrast range to which the region will be mapped in order to protect the display from the burn-in effect.
  • the target contrast range, referenced as δ, is selected such that its length is significantly less than the length of the original contrast range Δ.
  • For example, the length of the target contrast range δ is equal to 1/5 of the length of the original contrast range Δ.
  • In this example, the lowest possible pixel value in the target contrast range is set to 0 and its highest pixel value is equal to one fifth of the length of the original contrast range Δ.
  • Thus the target contrast range is set from 0 to 50 if the original contrast range is defined as the maximum possible contrast for an 8-bit image. Accordingly, the length of the target contrast range δ is less than the length of the original contrast range Δ, and the highest value of the target contrast range δ is less than the highest value of the original contrast range Δ.
  • the circuit 30 comprises a module 33 for the determination of a lower segment, identified by δ1, and a higher segment, identified by δ2, within the target contrast range δ.
  • These segments δ1 and δ2 are two separate range segments, which means that the highest value of segment δ1 is lower than the lowest value of segment δ2.
  • Each of these segments is a target range for the modification of either the object or the background of the region to process.
  • the maximum and minimum values of each or one of these segments are defined as functions of values of the target range δ, such as its maximum value.
  • coefficients C4 and C3, as well as coefficients C2 and C1, may be equal to one another, in which case each of the target segments is restricted to one single value. For example, if coefficients C3 and C4 are both equal to 0, segment δ1 is reduced to the value 0, and if coefficients C2 and C1 are both equal to 1, segment δ2 is reduced to the maximum value of the target range.
  • the circuit 30 also comprises a module 34 for the modification of the image parameters of the object and of the background within the processed region. More precisely the image parameters of the object and of the background are mapped from the original contrast range respectively to one or the other of the segments. Thus a predetermined minimum relative contrast between the background and the object is ensured.
  • the relative contrast is the contrast existing between the object and the background and is computed by the difference of image parameter values of pixels from the background and from the object.
  • the relative contrast is the smallest difference possible over the region between the image parameter value of any pixel of the object and the image parameter value of any pixel of the background. Accordingly, the minimum relative contrast corresponds to the length of the interval between the two segments δ1 and δ2.
  • the relative contrast is the difference between the median image parameter value of all the pixels of the background and the median image parameter value of all the pixels of the object. The value of the minimum relative contrast is determined by setting the coefficients C 1 , C 2 , C 3 and C 4 .
  • the overall contrast of the entire region after processing is limited and is at maximum equal to the difference between the highest value of the higher segment δ2 and the lowest value of the lower segment δ1.
  • In consequence, the overall contrast is limited to the length of the target contrast range δ.
  • the length of the target contrast range δ is smaller than the length of the original contrast range Δ, and the maximum value of the range δ is smaller than the maximum value of the range Δ; thus the overall contrast of the region is decreased after processing. If the text is brighter than the background, which is called normal text, the background is mapped to the lower segment δ1 while the text is mapped to the higher segment δ2.
  • If the text is darker than the background, which is called inverse text, then the text is mapped to the lower segment δ1 and the background is mapped to the higher segment δ2.
  • FIG. 3A represents the original region
  • the overall contrast is decreased by a reduction of the intensity maximum and a reduction of the length of the range, in order to protect the display from the burn-in effect.
  • the objects, and especially text objects or the like, are preserved by specific modifications that maintain a predetermined relative contrast between the background and the object, by mapping each of them to a different separate segment of the target contrast range.
  • the method described above can be applied to one image parameter, such as the luminance or the colour value of one channel, or to several image parameters. For example, for colour content, each colour channel is processed independently.
  • target range segments for the background and for the object are computed for each colour channel and at least a predetermined difference is maintained between the target range segments defined for each colour channel so as to keep a relative contrast between them.
  • the target contrast range is predetermined and is not determined as a function of the original contrast range. In this case the overall contrast is limited by setting a proper target contrast range whose maximum value is adapted to avoid high luminance and whose length is adapted to restrict the maximum overall contrast.
  • the segments toward which the image parameters of the background and of the object are mapped are predetermined and are not defined as functions of the target contrast range values.
  • the boundaries of the target range segments are functions of the maximum and/or minimum values of the target contrast range or of the original contrast range or of any other value such as the median value.
  • Such a device has, according to the invention, a first unit adapted to detect regions of the image containing predetermined types of objects, such as text object, and a second unit adapted to process said regions of the image.
  • This second unit is adapted to map image parameters of the background and image parameters of the object to the two separate segments, the lower segment δ1 and the higher segment δ2, to maintain at least a predetermined relative contrast between said object and said background.
  • the method implemented in such an enhancing device according to the invention can be carried out by a computer program for a processing unit, comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out said method.
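As a rough sketch of the processing described above (circuit 30), the following assumes the illustrative choices mentioned in the text: a target range of one fifth of the 8-bit original range and the two segments δ1, δ2 collapsed to its end values; the specific numbers and the median-based normal/inverse text decision are assumptions for illustration, not a definitive implementation of the claims.

```python
import numpy as np

def protect_region(region, text_mask):
    """Remap a static region into a reduced target contrast range (sketch).

    region:    2-D array of 8-bit luminance values.
    text_mask: boolean array, True on text pixels (background elsewhere).

    The target contrast range is one fifth of the original one (0..51 for
    8-bit data), and the lower and higher segments are collapsed to its two
    end values, so the text/background relative contrast is preserved while
    the overall contrast never exceeds the target range length.
    """
    lo, hi = 0, 255 // 5                     # illustrative target range [0, 51]
    seg_low, seg_high = lo, hi               # degenerate segments delta1, delta2
    out = np.empty_like(region)
    # "normal text" if the text is brighter than the background (medians).
    text_brighter = np.median(region[text_mask]) > np.median(region[~text_mask])
    if text_brighter:                        # normal text: text -> higher segment
        out[text_mask], out[~text_mask] = seg_high, seg_low
    else:                                    # inverse text: text -> lower segment
        out[text_mask], out[~text_mask] = seg_low, seg_high
    return out

# Bright text (200) on a dark background (30): a "normal text" region.
region = np.full((10, 20), 30, dtype=np.uint8)
mask = np.zeros((10, 20), dtype=bool)
mask[3:7, 5:15] = True
region[mask] = 200
out = protect_region(region, mask)
```

After processing, the region's overall contrast is bounded by the target range length (51 here) instead of the original 170, while text and background remain fully separated.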

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The present invention concerns a device for enhancing a digital image comprising: - a circuit (20) for detecting regions of the image containing predetermined types of objects and a background; and - a circuit (30) for processing said regions by modifying image parameters defining the contrast of the image, wherein said circuit (30) for processing a region comprises the determination, in a module (33), of two separate target range segments for said image parameters, a lower segment and a higher segment, image parameters of the background being mapped, in a module (34), to one of said segments and image parameters of the objects being mapped, in said module (34), to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.

Description

METHOD AND DEVICE FOR ENHANCING A DIGITAL IMAGE
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a device for enhancing a digital image, and especially text regions in a digital image, in order to protect displays from the burn-in effect, and to a corresponding enhancement method.
BACKGROUND OF THE INVENTION
Electronic displays activate red, green and blue light phosphors for each pixel to form images. Although there are variations among different technologies in the realization of this concept, all types of displays are susceptible to the same damaging effect, known as "burn-in". The burn-in effect results from the display of a static scene for a long duration, such as TV logos, score boxes etc. It mainly consists in the deterioration and fast aging of the phosphor elements corresponding to pixels at the static image location. The end result is the appearance of a ghost image on the screen at all times, including when the display has been turned off. The burn-in problem is not fixable and requires the display to be changed at a significant cost.
A common solution to the burn-in problem, for example for computer displays, is to activate a screen saver that replaces the original screen image with a moving image. However, in many applications this is not an acceptable solution as it causes the disappearance of the original content. Other solutions also exist, based, for example, on reducing the contrast of the image when a static region is detected, as in document US 6,313,878. Still another method consists in adjusting parameters such as luminance and colour channels, or adding transparent overlays to static regions, as described in document US 2003/0071769. However, none of these solutions preserves the quality of the content, that is to say, the quality of the objects within the image, and especially the readability of the textual content.
Document WO 02/075705 describes a solution, corresponding to the preamble of claim 1, in which digital images are enhanced by detecting regions of the image which contain a predetermined type of object and processing said regions by modifying image parameters which define the contrast of the image, in order to adapt the processing to the regions depending upon the type of objects which they contain. Still, this document does not provide a solution to the burn-in problem, and even a possible combination with the other documents mentioned would still have the drawbacks described above, as the entire regions containing the object would be processed and thus the visibility of the objects would be affected in order to prevent a burn-in effect. Especially in the case of text objects or the like, such as logos, the quality and the contrast are particularly important, and a reduction of the contrast or of the quality, or even a deterioration of one of the colour or luminance channels, may make the text completely unreadable, which is not acceptable.
In another field, other solutions exist to prevent the burn-in effect, as for example in document WO 98/09428, which deals with the superimposition of text information on a video background. In that document, two different images provided by two different sources are superimposed while the brightness of the superimposed image is controlled to maintain at least a predetermined minimum contrast between the solely enhanced image and the other image used as a reference. This predetermined minimum contrast is obtained by leaving the entire background image unchanged and leaving the entire superimposed image unchanged or increasing its overall brightness. Accordingly, the overall contrast of the image after superimposition is unchanged or increased, which in consequence aggravates the burn-in effect.
As indicated above, the known methods of enhancing a digital image while preserving the content comprise detecting regions of the image containing predetermined types of objects and processing said regions by modifying image parameters defining the contrast, but they do not deal efficiently with the burn-in effect and with the quality of the objects and their readability.
SUMMARY OF THE INVENTION
Accordingly it is an object of the invention to provide a device performing a new method to enhance a digital image while preserving the quality of the objects and preventing the burn-in effect.
To this end, the invention relates to a device for enhancing a digital image comprising:
- a first circuit for detecting regions of the image containing a predetermined type of object and a background; and
- a second circuit for processing said regions by modifying image parameters defining the contrast of the image, wherein said second circuit for processing a region comprises a module for the determination of two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), image parameters of the background being mapped to one of said segments and image parameters of the objects being mapped to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.
With said method, objects in digital images are detected and special processing operations are applied in the corresponding regions. These operations comprise the modification of image parameters of both the object and the background. These modifications are adapted to maintain at least a predetermined relative contrast between the object and the background by mapping image parameters of the object and of the background to two separate contrast range segments to preserve the quality of the object.
In consequence, the overall contrast is limited to the maximum contrast existing between the two segments, which can be set in order to prevent the burn-in effect.
Besides, the quality of the object is preserved and its readability is ensured by this minimum predetermined relative contrast.
Other features of the device of the invention are further recited in the dependent claims.
It is also an object of the invention to provide corresponding method and program as recited in claims 11 and 13.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described, by way of example, with reference to the accompanying drawings in which :
Figure 1 illustrates an image enhancing device according to the invention.
Figures 2A and 2B and figures 3A, 3B and 3C are symbolical representations of texts regions during the implementation of the device shown in Figure 1.
DETAILED DESCRIPTION
Referring to figure 1, a device for enhancing a digital image is illustrated. In this embodiment, the digital image comprises a text object or the like such as a logo, and is a video frame that is part of a sequence of frames and thus has a previous frame and a following frame. Each digital image is an array of pixels.
Advantageously, the device comprises a circuit 10 for identifying static regions within the image over a predetermined period of time. This identification of static regions is achieved in a conventional way by comparing successive frames with one another. The accuracy of this identification depends upon the time period over which comparisons are made and the level of similarity accepted between the same regions of different frames.
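The frame-comparison idea can be sketched as follows; the difference threshold, the required fraction of stable comparisons and the use of plain frame differencing are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def static_region_mask(frames, diff_threshold=8, static_fraction=0.95):
    """Mark pixels that stay (nearly) unchanged across a window of frames.

    frames: list of 2-D uint8 luminance arrays of identical shape.
    A pixel is 'static' if its value changes by at most diff_threshold
    between consecutive frames in at least static_fraction of the
    comparisons.
    """
    stack = np.stack([f.astype(np.int16) for f in frames])
    diffs = np.abs(np.diff(stack, axis=0))           # frame-to-frame changes
    stable = (diffs <= diff_threshold).mean(axis=0)  # fraction of stable comparisons
    return stable >= static_fraction

# A tiny synthetic sequence: a constant "logo" patch on a changing background.
frames = []
for i in range(10):
    f = np.full((32, 32), (i * 60) % 256, dtype=np.uint8)  # background changes
    f[4:12, 4:20] = 200                                    # static overlay region
    frames.append(f)

mask = static_region_mask(frames)   # True only inside the static patch
```

The returned boolean mask plays the role of the output of circuit 10: only the regions it marks are passed on to the object-detection stage.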
The device of figure 1 also comprises a circuit 20 for detecting objects within the image, and more precisely within the static regions of the image. For example, this detection circuit 20 performs a detection of objects using the algorithm described in the document entitled "Robust real-time object detection" by P. VIOLA and M. JONES, published in Proc. IEEE CVPR 2001. In particular, this detection circuit 20 comprises a module 21 for detecting text objects within the static regions of the processed image, and specific algorithms are involved.
Some existing text detection algorithms exploit the high contrast properties of overlay text regions. In a favourable text detection algorithm, the horizontal and vertical derivatives of the frame where text will be detected are computed first in order to enhance the high contrast regions. It is well known in the image and video processing literature that simple masks approximate the derivative of an image.
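As a sketch of such simple derivative masks, the 3 x 3 Sobel operators are one common choice; the patent does not name a specific mask, so this is an illustrative assumption.

```python
import numpy as np

# Sobel masks: one common pair of "simple masks" approximating the
# horizontal and vertical derivatives of an image.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def correlate2d_valid(image, kernel):
    """Minimal 'valid' 2-D correlation, enough for a 3x3 derivative mask."""
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for dy in range(h):
        for dx in range(w):
            out += kernel[dy, dx] * image[dy:dy + out.shape[0],
                                          dx:dx + out.shape[1]]
    return out

# A vertical step edge: strong horizontal derivative, no vertical one.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
gx = correlate2d_valid(img, SOBEL_X)   # responds to the vertical edge
gy = correlate2d_valid(img, SOBEL_Y)   # zero: rows are identical
```

High-contrast overlay text produces large responses in both gx and gy, which is what the subsequent edge-orientation feature builds on.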
After the derivatives are computed for each of the colour channels (or intensity and chrominance channels depending on the selected colour space), the edge orientation feature is computed. The edge orientation feature has first been proposed by Rainer Lienhart and Axel Wernicke in "Localizing and Segmenting Text in Images, Videos and Web Pages"; IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No.4, pp. 256-268, April 2002.
A statistical learning tool can be used to find an optimal text/non-text classifier. Support Vector Machines (SVMs) result in binary classifiers and have good generalization capabilities. An SVM-based classifier trained with 1,000 text blocks and, at most, 3,000 non-text blocks for which edge orientation features are computed, has provided good results in our experiments. Because it is difficult to find the representative hard-to-classify non-text examples, the popular bootstrapping approach that was introduced by K.K. Sung and T. Poggio in "Example-based learning for view-based human face detection", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998, can be followed. Bootstrap-based training is completed in several iterations and, in each iteration, the resulting classifier is tested on some images that do not contain text. False alarms over this data set represent difficult non-text examples that the current classifier cannot correctly classify. These non-text samples are added to the training set; hence, the non-text training dataset grows and the classifier is retrained with this enlarged dataset.
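The bootstrap loop described above can be sketched as follows. The feature vectors are synthetic stand-ins for edge-orientation features, and the SVM is replaced by a toy nearest-centroid classifier to keep the sketch self-contained; only the bootstrapping procedure itself mirrors the text.

```python
import numpy as np

class CentroidClassifier:
    """Toy stand-in for the SVM: label a sample by the nearer class centroid."""
    def fit(self, X, y):
        self.c_text = X[y == 1].mean(axis=0)
        self.c_non = X[y == 0].mean(axis=0)
        return self
    def predict(self, X):
        d_text = np.linalg.norm(X - self.c_text, axis=1)
        d_non = np.linalg.norm(X - self.c_non, axis=1)
        return (d_text < d_non).astype(int)

rng = np.random.default_rng(1)
# Synthetic stand-ins for edge-orientation feature vectors.
text_feats = rng.normal(loc=1.0, scale=0.3, size=(200, 8))
nontext_pool = rng.normal(loc=0.0, scale=0.6, size=(2000, 8))

train_neg = nontext_pool[:200]                # initial non-text training set
for _ in range(3):                            # a few bootstrap iterations
    X = np.vstack([text_feats, train_neg])
    y = np.array([1] * len(text_feats) + [0] * len(train_neg))
    clf = CentroidClassifier().fit(X, y)
    preds = clf.predict(nontext_pool)         # test on text-free data
    hard = nontext_pool[preds == 1]           # false alarms = hard negatives
    if len(hard) == 0:
        break
    train_neg = np.vstack([train_neg, hard])  # grow the non-text set, retrain
```

Each iteration mines exactly the non-text samples the current classifier gets wrong, so the negative set concentrates on the hard cases.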
When a classifier is being trained, an important issue to decide upon is the size of the image blocks that are fed to the classifier, because the height of the block determines the smallest detectable font size whereas the width of the block determines the smallest detectable text width. A block size of 12 x 12 pixels is chosen for the training of the classifier because, in a typical frame with a height of 400 pixels, it is rare to find a font size smaller than 12.
Having computed the SVM-based classifier parameters in the training stage, text detection in a new frame is performed in two stages: 1) detection of text-candidate blocks by using the SVM-based classifier, and 2) binarization of the text to extract a pixel-accurate text mask for pixel-accurate enhancement. In the first stage, edge orientation features are extracted for every 12x12 window in the image and all pixels in the current window are classified as text or not by the SVM-based classifier. Because text can be larger than 12 pixels, font size independence is achieved by running the classifier with a 12 x 12 window size over multiple resolutions, and location independence is achieved by moving the window in horizontal and vertical directions to evaluate the classifier over the whole image. This first stage extracts text-candidate blocks, also called regions of interest (ROI), that need to be further processed to extract a binary text mask.
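The multi-resolution sliding-window stage can be sketched as follows. This is a simplified illustration: `classify` stands in for the trained SVM, and the stride-based downsampling is a crude placeholder for proper image rescaling:

```python
def detect_candidates(img, classify, win=12, step=4, scales=(1, 2)):
    """Slide a win x win window over the image at several scales and
    collect windows the classifier flags as text-candidate blocks,
    reported as (x, y, size) in original-image coordinates."""
    rois = []
    for s in scales:
        small = [row[::s] for row in img[::s]]  # crude downsample by pixel skip
        h = len(small)
        w = len(small[0]) if h else 0
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                block = [r[x:x + win] for r in small[y:y + win]]
                if classify(block):
                    rois.append((x * s, y * s, win * s))  # map back to full resolution
    return rois
```

Running the same 12 x 12 classifier at scale 2 effectively detects text twice as large, which is how font size independence is obtained.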
A module 21 for the detection of a text in the image is responsible for extraction of text line boundaries, binarization of the text, and extraction of individual words. Initially, the coordinates of the horizontal text lines are computed. For that purpose, edge detection is performed in the region of interest (ROI) to find the high-frequency pixels, most of which are expected to be text. Because the ROI is mainly dominated by text, it is expected that the top of a text line will demonstrate an increase in the number of edges whereas the bottom of a text line will show a corresponding fall in the number of edges. Projections along horizontal and/or vertical dimensions are effective descriptors to easily determine such locations. In contrast to intensity projections that are used in many text segmentation algorithms, edge projections are robust to variations in the colour of the text.
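A horizontal edge projection can be sketched in a few lines. The helper name and the `min_edges` threshold are assumptions for illustration; a text line starts where the per-row edge count rises above the threshold and ends where it falls back below:

```python
def text_line_bounds(edge_mask, min_edges=2):
    """Find (top, bottom) row pairs of text lines from a binary edge
    mask (list of rows of 0/1) via its horizontal projection."""
    counts = [sum(row) for row in edge_mask]  # edges per row
    lines, top = [], None
    for y, c in enumerate(counts):
        if c >= min_edges and top is None:
            top = y                      # rise: line starts
        elif c < min_edges and top is not None:
            lines.append((top, y - 1))   # fall: line ends
            top = None
    if top is not None:                  # line touching the bottom edge
        lines.append((top, len(counts) - 1))
    return lines
```

Because only edge counts are projected, the result is unchanged whether the text is bright on dark or dark on bright.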
This module 21 for the detection of a text in the static regions may further comprise thresholding to extract a binary text mask. This step involves automatically computing a threshold value to find a binary, pixel-wise more accurate text mask. The pixels occurring just outside the text line boundaries are defined as background. The threshold value is set such that no pixel outside the detected text lines, i.e. no background pixel, is assigned as a text pixel.
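One way to realise this rule is to place the threshold just beyond the extreme value of the pixels bordering the text line. The helper and its one-row background margin are assumptions, not the patent's exact procedure:

```python
def text_threshold(region, top, bottom, bright_text=True):
    """Pick a binarization threshold from the rows just outside the
    detected text line [top, bottom] so that no background pixel can
    be labelled text.  `region` is a list of rows of pixel values."""
    background = []
    if top > 0:
        background.extend(region[top - 1])       # row above the line
    if bottom + 1 < len(region):
        background.extend(region[bottom + 1])    # row below the line
    if bright_text:
        return max(background) + 1   # text pixels: value >= threshold
    return min(background) - 1       # inverse text: value <= threshold
```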
The module 21 for the detection of a text in the static regions may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary. A morphological closing operation and a connected-component labelling algorithm are applied to the resulting text mask to segment individual words. The closing operation joins separate characters into words, while the connected-component labelling algorithm extracts connected regions (words, in this case).
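The two operations can be sketched in pure Python. The 1-D closing along rows (filling small gaps between characters) and the BFS-based labelling are simplified stand-ins for general morphological closing and connected-component labelling; the `gap` size is an assumption:

```python
from collections import deque

def close_rows(mask, gap=2):
    """1-D morphological closing along each row of a binary mask:
    fill runs of at most `gap` zeros between ones, joining characters."""
    out = [row[:] for row in mask]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 0:
                j = x
                while j < len(row) and row[j] == 0:
                    j += 1
                # fill only interior gaps that are short enough
                if 0 < x and j < len(row) and j - x <= gap:
                    for k in range(x, j):
                        row[k] = 1
                x = j
            else:
                x += 1
    return out

def label_components(mask):
    """4-connected component labelling by BFS; each component is one
    word, returned as a list of (y, x) pixel coordinates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q, comp = deque([(y, x)]), []
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(comp)
    return comps
```

Characters closer than `gap` pixels merge into one component; wider gaps separate words.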
In the embodiment described, the module 21 is followed by a mask image creation in a module 22, this mask image holding the label of the object type for each pixel. For example, the value 0 is assigned to pixels that are not part of an object, the value 1 to a pixel that is part of a text object and the value 2 to a pixel that is part of a face object.
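A label mask of this kind can be built directly from detected object regions. The box-based input is an assumption for illustration; the detector may instead supply pixel-accurate masks:

```python
def make_mask(h, w, objects):
    """Build an h x w label mask: 0 = no object, 1 = text, 2 = face.
    `objects` is a list of (label, y0, y1, x0, x1) inclusive boxes."""
    mask = [[0] * w for _ in range(h)]
    for label, y0, y1, x0, x1 in objects:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                mask[y][x] = label
    return mask
```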
Advantageously, detecting the objects in the circuit 20 also comprises an estimation of the colours in the static regions of the image in order to detect the objects more accurately. In the embodiment described this is achieved in a module 23, by comparing the values of the colour parameters of each pixel to the parameters of each neighbour in order to detect the edges of the objects more accurately, the mask image being corrected consequently.
Advantageously, the circuit 20 also comprises a module 24 for the determination of the parameters of the detected text objects. For example, this module 24 makes it possible to determine whether a text is horizontal or slanted and to classify the text according to its size. For example, the size of the text line is computed as the absolute difference between the lowest and the highest y-coordinate of the text line. In an alternative embodiment, the size is determined by finding the upper and lower baseline coordinates of the text line, so that the outlier effect due to elongated ascenders and descenders can be prevented. The height in this case can be assigned as the absolute difference between the lower and the upper baseline y-coordinates.
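Both height measures can be sketched from the set of text-pixel y-coordinates. The percentile-based baseline estimate is an assumption for illustration, one simple way to discard ascenders and descenders:

```python
def text_height(ys, use_baselines=False):
    """Height of a text line from its text-pixel y-coordinates.
    Plain mode: max - min.  Baseline mode: take the 10th and 90th
    percentile y-values as upper/lower baselines so that elongated
    ascenders and descenders do not inflate the height."""
    ys = sorted(ys)
    if not use_baselines:
        return ys[-1] - ys[0]
    lo = ys[len(ys) // 10]           # ~10th percentile
    hi = ys[(len(ys) * 9) // 10]     # ~90th percentile
    return hi - lo
```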
Furthermore, maximum and minimum luminance values in the text objects are computed by the module 24; if the difference is high and occurs for every character, evenly distributed across the whole text line, it is considered text shading. Alternatively, the circuit 20 allows detection of logo objects, or of both text and logo objects, by the use of any existing algorithm.
After detection of the objects within the image, creation of the mask image, and determination of object parameters, enhancement of the digital image is achieved. Accordingly, the method continues by processing the digital image in a circuit 30. This processing is achieved by modifying image parameters defining the contrast of the image through local low-level operations applied to the selected regions, processing the image data pixel by pixel or by elementary groups of pixels of varying sizes. Each region to process comprises an object, which is in the example a text object, and also a background that corresponds to everything else in the region, i.e. every pixel of the region that does not belong to an object.
The maximum possible contrast corresponds to the difference between the highest and the lowest pixel values that can be attained. This maximum contrast is fixed by the type of image; for example, for 8-bit images it is equal to 255 - 0 = 255. Then, for one specific image, the overall contrast is set as the difference between the highest and the lowest pixel values in the image. Thus the overall contrast is at most 255 for 8-bit images, but it can be much less. The same applies to a region of an image, in which the overall contrast of the region is set as the difference between the highest and lowest pixel values in the region.
In the embodiment described, for each region to process, the circuit 30 first comprises a module 31 for the determination of an original contrast range. This original contrast range is identified by the Greek letter Δ and its length corresponds, in the example, to the overall contrast of the region, i.e. the interval between the highest and the lowest pixel values in the region, for one image parameter defining the contrast within the region, such as luminance or colour. Accordingly, the length of the original contrast range Δ is equal to the value of the overall contrast of the region to process. Optionally, Δ is defined as the interval between the highest and the lowest pixel values of the noise-filtered image, in which case its length is equal to the value of the overall contrast of the region after noise filtering.
Of course, the original contrast range can also be set as the interval between the lowest and highest possible values, its length then being equal to the maximum possible contrast, or as the interval between the lowest and the highest values in the image, its length then being equal to the image overall contrast. In the example, each image parameter of a pixel in an 8-bit channel can assume values in the range 0 to 255; accordingly, the maximum length of the original contrast range Δ is 255. The Greek letters α and Ω are used to designate respectively the lowest and highest pixel luminance or colour values used to determine Δ, such that Δ = [α,Ω].
The module 31 is followed by a module 32 for the determination of a target contrast range to which the region will be mapped in order to protect the display from burn-in. The target contrast range, referenced as δ, is selected such that its length is significantly less than the length of the original contrast range Δ. For example, the length of the target contrast range δ is equal to 1/5 of the length of the original contrast range Δ. Advantageously, the lowest possible pixel value in the target contrast range, identified by the Greek letter β, is set to 0 and its highest pixel value, referenced by the Greek letter ω, is equal to (Ω - α)/5. For example, the target contrast range is set from 0 to 50 if the original contrast range is defined as the maximum possible contrast for an 8-bit image. Accordingly, the length of the target contrast range δ is less than the length of the original contrast range Δ and the highest value of the target contrast range δ is less than the highest value of the original contrast range Δ. These two constraints ensure that the overall contrast is decreased after processing.
Furthermore, the circuit 30 comprises a module 33 for the determination of a lower segment, identified by δ1, and a higher segment, identified by δ2, within the target contrast range δ. These segments δ1 and δ2 are two separate range segments, which means that the highest value of segment δ1 is lower than the lowest value of segment δ2. Each of these segments is a target range for the modification of either the object or the background of the region to process. Advantageously, the maximum and minimum values of each or one of these segments are defined as functions of values of the target range δ, such as its maximum value ω. For example, these segments are defined by the following equations: δ = [β,ω], δ1 = [C4·ω, C3·ω] and δ2 = [C2·ω, C1·ω], with 0 ≤ C4 ≤ C3 ≤ C2 ≤ C1 ≤ 1.
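The segment equations above can be sketched directly; the helper name is an assumption, and the defaults correspond to the example coefficients C3 = C4 = 0 and C1 = C2 = 1:

```python
def range_segments(omega, c1=1.0, c2=1.0, c3=0.0, c4=0.0):
    """Lower segment delta1 = [C4*omega, C3*omega] and higher segment
    delta2 = [C2*omega, C1*omega] inside a target range [0, omega],
    with 0 <= C4 <= C3 <= C2 <= C1 <= 1."""
    assert 0 <= c4 <= c3 <= c2 <= c1 <= 1
    return (c4 * omega, c3 * omega), (c2 * omega, c1 * omega)
```

With omega = 50 and coefficients 1, 0.5, 0.25, 0, the segments are [0, 12.5] and [25, 50], leaving a guaranteed gap of 12.5 between background and object values.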
Optionally, coefficients C4 and C3, as well as coefficients C2 and C1, are equal to one another, in which case the corresponding target segment is restricted to one single value. For example, if coefficients C3 and C4 are both equal to 0, segment δ1 is reduced to the value 0, and if coefficients C2 and C1 are both equal to 1, segment δ2 is reduced to the value ω.
Once these two target segments have been defined, the circuit 30 also comprises a module 34 for the modification of the image parameters of the object and of the background within the processed region. More precisely the image parameters of the object and of the background are mapped from the original contrast range respectively to one or the other of the segments. Thus a predetermined minimum relative contrast between the background and the object is ensured. The relative contrast is the contrast existing between the object and the background and is computed by the difference of image parameter values of pixels from the background and from the object.
In the embodiment described, the relative contrast is the smallest difference possible over the region between the image parameter value of any pixel of the object and the image parameter value of any pixel of the background. Accordingly, the minimum relative contrast corresponds to the length of the interval between the two segments δ1 and δ2, which is different from zero as the two segments are separate. In another embodiment, the relative contrast is the difference between the median image parameter value of all the pixels of the background and the median image parameter value of all the pixels of the object. The value of the minimum relative contrast is determined by setting the coefficients C1, C2, C3 and C4.
Furthermore, the overall contrast of the entire region after processing is limited: it is at most equal to the difference between the highest value of the higher segment δ2 and the lowest value of the lower segment δ1. These two segments being set within the target contrast range δ, the overall contrast is limited to the length of range δ. In the example, the length of the target contrast range δ is smaller than the length of the original contrast range Δ and the maximum value of range δ is smaller than the maximum value of range Δ; thus the overall contrast of the region is decreased after processing. If the text is brighter than the background, which is called normal text, the background is mapped to the lower segment δ1 while the text is mapped to the higher segment δ2.
This situation is represented in Figures 2A and 2B. Figure 2A represents the original region with normal text, while Figure 2B represents the processed region with a target contrast range of 0 to 50 and the coefficients C1=C2=1 and C3=C4=0. In the opposite case, if the text is darker than the background, which is called inverse text, the text is mapped to the lower segment δ1 and the background is mapped to the higher segment δ2.
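The normal/inverse mapping can be sketched as follows. This is a deliberate simplification, not the patent's exact mapping: each group of pixels is collapsed to the midpoint of its target segment, and the normal/inverse decision is made by comparing mean values:

```python
def enhance_region(region, mask, d1, d2):
    """Map background pixels to one segment and text pixels to the
    other.  Bright-on-dark ('normal') text goes to the higher segment
    d2 and the background to the lower segment d1; inverse text swaps
    them.  Segments are (lo, hi) pairs; each group is collapsed to
    its segment midpoint."""
    h, w = len(region), len(region[0])
    text_vals = [region[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    bg_vals = [region[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    # normal text: text is at least as bright as the background on average
    normal = sum(text_vals) / len(text_vals) >= sum(bg_vals) / len(bg_vals)
    text_seg, bg_seg = (d2, d1) if normal else (d1, d2)
    mid = lambda seg: (seg[0] + seg[1]) / 2
    return [[mid(text_seg) if mask[y][x] else mid(bg_seg) for x in range(w)]
            for y in range(h)]
```

After the mapping, the region's overall contrast is bounded by the target range while the text keeps at least the inter-segment gap of contrast against its background.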
An example of such processing, applied simultaneously to a text region and a logo region, is represented with reference to Figures 3A to 3C. Figure 3A represents the original region; Figure 3B represents the region after processing in a first case, in which the target contrast range is 0 to 50 and the coefficients are C1=1, C2=0.5, C3=0.25 and C4=0. Figure 3C represents the same region after processing with the coefficients set as follows: C1=1, C2=0.75, C3=C4=0.
Accordingly, the overall contrast is decreased by a reduction of the intensity maximum and a reduction of the length of the range, to protect the display from burn-in. Furthermore, the objects, and especially text objects or the like such as logos, are preserved by specific modifications that maintain a predetermined relative contrast between the background and the object, by mapping each of them to a different, separate segment of the target contrast range.
The method described above can be achieved by mapping one of several different image parameters, such as the luminance or the colour value of one channel. It can also be achieved by mapping several image parameters; for example, for colour content, each colour channel is processed independently.
Alternatively, target range segments for the background and for the object are computed for each colour channel and at least a predetermined difference is maintained between the target range segments defined for each colour channel so as to keep a relative contrast between them. In another embodiment, the target contrast range is predetermined and is not determined as a function of the original contrast range. In this case the overall contrast is limited by setting a proper target contrast range whose maximum value is adapted to avoid high luminance and whose length is adapted to restrict the maximum overall contrast.
Alternatively, the segments toward which the image parameters of the background and of the object are mapped are predetermined and are not defined as functions of the target contrast range values.
In another embodiment, the boundaries of the target range segments are functions of the maximum and/or minimum values of the target contrast range or of the original contrast range or of any other value such as the median value.
The method implemented in the device illustrated in figure 1 of the invention can be achieved either by computers and the like or by dedicated devices. Generally speaking, such a device has, according to the invention, a first unit adapted to detect regions of the image containing predetermined types of objects, such as text objects, and a second unit adapted to process said regions of the image. This second unit is adapted to map image parameters of the background and image parameters of the object to the two separate segments, the lower segment δ1 and the higher segment δ2, to maintain at least a predetermined relative contrast between said object and said background.
The method implemented in such an enhancing device according to the invention can be carried out by a computer program for a processing unit, comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out said method.
It may be observed that there are numerous ways of implementing functions by means of items of hardware or software, or both. In this respect, the drawings are very diagrammatic and represent only one possible embodiment of the device according to the invention. Thus, although Fig.1 shows different functions as different blocks, this by no means excludes that a single item of hardware or software carries out several functions. Nor does it exclude that an assembly of items of hardware or software or both carry out a function.
The remarks made herein before demonstrate that the detailed description with reference to the drawings, illustrates rather than limits the invention, the numerous alternatives falling within the scope of the appended claims. The word "comprising" does not exclude the presence of other elements or steps than those listed in a claim. The word "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps.

Claims

1. A device for enhancing a digital image comprising:
- a circuit (20) for detecting regions of the image containing a predetermined type of object and a background; and
- a circuit (30) for processing said regions by modifying image parameters defining the contrast of the image, wherein said circuit (30) for processing said regions comprises a module (33) for the determination of two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), image parameters of the background being mapped in a module (34) to one of said segments and image parameters of the objects being mapped in said module (34) to the other of said segments to maintain at least a predetermined relative contrast between said object and said background.
2. A device according to claim 1, wherein it also comprises means for identifying within the image regions which are static over a predetermined period of time, further processing being applied only to said static regions.
3. A device according to any of claims 1 and 2, wherein said circuit (30) also comprises:
- a module (31) for determining the original contrast range (Δ) of the region to process;
- a module (32) for determining a target contrast range (δ) for said region to process; and
- a module (33) for determining said two separate range segments (δ1, δ2) within the target contrast range (δ).
4. A device according to claim 3, wherein the maximum value of the target contrast range (δ) is less than the maximum value of the original contrast range (Δ) and the length of the target contrast range (δ) is less than the length of the original contrast range (Δ), to decrease the overall contrast.
5. A device according to claim 3, wherein the maximum and/or minimum values of said segments (δ1, δ2) are functions of the values of the target contrast range (δ) and/or of the values of the original contrast range (Δ).
6. A device according to any of claims 1 to 5, wherein at least one of said segments (δ1, δ2) is restricted to a single value.
7. A device according to any of claims 1 to 6, wherein the image processed comprises several channels with respective image parameters, each channel of the image being processed separately.
8. A device according to claim 7, wherein processing is adapted to maintain at least a predetermined difference between the values of the image parameter of said different image channels.
9. A device according to any of claims 1 to 8, wherein said image parameter is selected in the group consisting of luminance and colour brightness of each image channel.
10. A device according to any of claims 1 to 9, wherein said predetermined type of object is selected in the group consisting of text, logos and textual graphics content.
11. A method of enhancing a digital image comprising a first step adapted to detect regions of the image containing a predetermined type of object and a background, and a second step adapted to process said regions of the image by modifying image parameters defining the contrast of the image, characterized in that the second step is adapted to map (28) image parameters of the background and image parameters of the object to two separate target range segments (δ1, δ2) for said image parameters, a lower segment (δ1) and a higher segment (δ2), to maintain at least a predetermined relative contrast between said object and said background.
12. Computer program for a processing unit comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out the steps of the method as claimed in claim 11.
PCT/IB2006/050945 2005-04-01 2006-03-28 Method and device for enhancing a digital image WO2006103629A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05300242 2005-04-01
EP05300242.4 2005-04-01

Publications (1)

Publication Number Publication Date
WO2006103629A1 true WO2006103629A1 (en) 2006-10-05

Family

ID=36695038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050945 WO2006103629A1 (en) 2005-04-01 2006-03-28 Method and device for enhancing a digital image

Country Status (1)

Country Link
WO (1) WO2006103629A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4549217A (en) * 1983-09-30 1985-10-22 Rca Corporation Automatic contrast reduction circuit for a teletext or monitor operation
US6313878B1 (en) * 1998-11-20 2001-11-06 Sony Corporation Method and structure for providing an automatic hardware-implemented screen-saver function to a display product
US20030071769A1 (en) * 2001-10-16 2003-04-17 Dan Sullivan Method and apparatus for preventing plasma display screen burn-in
US20040251842A1 (en) * 2003-06-10 2004-12-16 Hitachi, Ltd Image display device and method of displaying images with static image detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONGQING ZHANG ET AL: "General and domain-specific techniques for detecting and recognizing superimposed text in video", PROCEEDINGS 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP 2002. ROCHESTER, NY, SEPT. 22 - 25, 2002, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, NEW YORK, NY : IEEE, US, vol. VOL. 2 OF 3, 22 September 2002 (2002-09-22), pages 593 - 596, XP010607393, ISBN: 0-7803-7622-6 *
HENG W.J. AND TIAN Q.: "Content enhancement for e-learning lecture video using foreground/background separation", 2002 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 9 December 2002 (2002-12-09) - 11 December 2002 (2002-12-11), pages 436 - 439, XP002393559 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010056272A1 (en) 2008-11-14 2010-05-20 Global Oled Technology Llc Tonescale compression for electroluminescent display
US9601047B2 (en) 2008-11-14 2017-03-21 Global Oled Technology Llc Method for dimming electroluminescent display
US8576145B2 (en) 2008-11-14 2013-11-05 Global Oled Technology Llc Tonescale compression for electroluminescent display
US9218762B2 (en) 2010-09-01 2015-12-22 Qualcomm Incorporated Dimming techniques for emissive displays
CN103155023B (en) * 2010-09-01 2016-04-06 高通股份有限公司 Technology is dimmed for emissive display
CN103155023A (en) * 2010-09-01 2013-06-12 高通股份有限公司 Dimming techniques for emissive displays
US9311716B2 (en) 2014-05-14 2016-04-12 International Business Machines Corporation Static image segmentation
US10049459B2 (en) 2014-05-14 2018-08-14 International Business Machines Corporation Static image segmentation
US11328495B2 (en) * 2018-08-06 2022-05-10 Wrapmate Llc Systems and methods for generating vehicle wraps
US11830019B2 (en) 2018-08-06 2023-11-28 Wrapmate Inc. Systems and methods for generating vehicle wraps
CN111275034A (en) * 2020-01-19 2020-06-12 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for extracting text region from image
CN111275034B (en) * 2020-01-19 2023-09-12 天翼数字生活科技有限公司 Method, device, equipment and storage medium for extracting text region from image
CN112967208A (en) * 2021-04-23 2021-06-15 北京恒安嘉新安全技术有限公司 Image processing method and device, electronic equipment and storage medium
CN112967208B (en) * 2021-04-23 2024-05-14 北京恒安嘉新安全技术有限公司 Image processing method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
WWW Wipo information: withdrawn in national office (Country of ref document: DE)
NENP Non-entry into the national phase (Ref country code: RU)
WWW Wipo information: withdrawn in national office (Country of ref document: RU)
122 Ep: pct application non-entry in european phase (Ref document number: 06727762; Country of ref document: EP; Kind code of ref document: A1)
WWW Wipo information: withdrawn in national office (Ref document number: 6727762; Country of ref document: EP)