Classification unit for and method of discriminating between synthetic and natural image regions
The invention relates to a method of discriminating between synthetic image regions and natural image regions in an image.
The invention further relates to a classification unit for discriminating between synthetic image regions and natural image regions in an image. The invention further relates to an image display apparatus comprising: means to receive an image; a classification unit for discriminating between synthetic image regions and natural image regions in the image; an image processing unit to process the image based on output of the classification unit; and a display device to display the processed image.
Many aspects of signal processing applications, such as feature extraction and content driven processing, compression and retrieval operations, are heavily dependent upon the ability to accurately segment the image into regions that are considered likely to represent a natural image, such as a photo or video image, and regions likely to represent so-called synthetic images such as computer generated text and/or graphics.
By discriminating between the data representing regions of the image that are classified as either natural or synthetic, natural- or synthetic-content-dedicated algorithms can then be employed so as to provide for further, particularly appropriate and accurate, signal processing. Without such segmentation, an algorithm is applied to the whole image and disadvantages can arise. For example, the same image-enhancement algorithms applied to both natural and synthetic regions of an image may produce significant improvements in the perceived quality of the natural image regions but may disadvantageously lead to artifacts in the synthetic parts of the image.
Thus, it can prove inappropriate to attempt to enhance the complete image without first seeking to discriminate, and separate, the natural regions of the image from
synthetic regions of the image. Once such different regions have been identified, respective appropriate processing algorithms can then be applied.
Of course, further advantages can arise in handling the image data in this manner. For example, the automatic optimization of the bandwidth utilization in coding applications such as arranging a fax machine to adopt separate encoding schemes for video images and for pure text/graphics content can be achieved.
US-A-6195459 discloses an algorithm arranged for discriminating between natural and synthetic regions of an image which provides for a block-analysis of the image with subsequent clustering of blocks found likely to fall either in the synthetic or natural category. The, generally rectangular, area formed by such clustered blocks is then refined and either accepted as a synthetic or accepted as a natural region responsive to further analysis steps, or discarded. However, such a known arrangement is disadvantageously limited in the range of graphics patterns that can be accurately identified and also with regard to its general accuracy and efficiency and its sensitivity to noise.
Also, this known algorithm is arranged to operate in accordance with a method that is considered unnecessarily complex and which exhibits a relatively high computational load which can disadvantageously restrict the usability of the algorithm in some circumstances.
It is an object of the invention to provide a method of the kind described in the opening paragraph which can be applied in real-time image processing applications relatively easily.
This object of the invention is achieved in that the method comprises: a number of probability estimation steps, each step estimating for a particular pixel of the image an elementary probability value representing a probability of the particular pixel being located in one of the natural image regions based on values of pixels of a group of pixels in a neighborhood of the particular pixel; and a combination step of calculating for the particular pixel a final probability value representing the probability of the particular pixel being located in one of the natural image regions by combining the respective elementary probability values estimated in the probability estimation steps.
The mix of estimation steps results in a reliable final probability value indicating the probability of being located in one of the natural image regions, even when using values of very few pixels. By values of pixels is meant e.g. luminance or color levels of the pixels. Only a small portion of the image is used to calculate the final value of probability. Preferably the group of pixels corresponds to a block of pixels. The block of pixels typically comprises 2x2, 3x3, 3x5, 3x7 or 5x5 pixels. The method according to the invention can be applied in low-cost, small-delay streaming data processing. E.g. in the case of a block of 3x3 pixels the method can be used to label pixels of an image to be displayed with a measure of "naturality", adding a negligible delay of one to two times the scan time of one line of the display device. Most classification methods work off-line and use all pixels of the entire image.
Another advantage is that the output is not a binary classification, but comprises a range of values. This is especially useful if content dependent processing downstream is designed to pass softly from one type of processing to another as the content changes, hence reducing artifacts due to content misclassification.
Estimation steps, requiring a relatively low computational effort, are carried out on a group of pixels in the neighborhood of the particular pixel. Preferably the particular pixel is included in the group of pixels and is the central pixel of the group of pixels.
Preferably the same group of pixels is used in the various probability estimation steps related to the particular pixel. However it is possible that for some probability estimation steps the group of pixels is extended with additional pixels.
Below, three basically different estimation steps are disclosed. It should be noted that other probability estimation steps can be applied too and that the number of probability estimation steps can also differ from three. The probability estimation steps are called first, second and third respectively. However, these names are used for identification purposes only. These names are not related to any order or combination of probability estimation steps. An embodiment of the method according to the invention is characterized in that in a first one of the probability estimation steps, the associated elementary probability value is estimated by dividing the number of different pixel values that is present in the group of pixels by a value that is related to the number of pixels in the group of pixels. This estimation step is simple and fast. This probability estimation step results in a probability value NOV (Number Of Values) which equals 1 if the number of distinct pixel values that is present in the group of pixels is equal to the number of pixels in the group of pixels, whereas the probability value NOV equals 0 in a flat portion of the image.
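This first estimation step can be sketched as follows (a Python illustration; the names are chosen for clarity and the block is given as a list of pixel rows):

```python
def nov(block):
    """Number Of Values: ratio of distinct pixel values to block size.

    Returns 1.0 when all pixel values in the block are distinct and
    0.0 for a flat (single-valued) block.
    """
    values = [v for row in block for v in row]  # flatten the block
    return (len(set(values)) - 1) / (len(values) - 1)

print(nov([[5, 5, 5], [5, 5, 5], [5, 5, 5]]))  # 0.0 for a flat block
print(nov([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 1.0, all values distinct
```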
An embodiment of the method according to the invention is characterized in that in a second one of the probability estimation steps, the associated elementary probability value is estimated by means of weighted summation of differences between pixel values of pixels of the group, with the differences corresponding to distances between non-zero bins in a histogram of pixel values of the group of pixels. In this probability estimation step a simplified histogram analysis is carried out. Since natural images often comprise colors or luminance values very close to each other, small differences are scored better than large differences. In this embodiment the histogram or array of present pixel values is scanned and the differences between the present pixel values are calculated. Each difference is then weighted and added to compute the second probability value. This second probability value is further called SOV (Separation of Values). An embodiment of the method according to the invention is characterized in that in a third one of the probability estimation steps, the associated elementary probability value is estimated by calculating absolute values of two-directional gradients of pixel values of pixels of the group of pixels. Use is made of the knowledge that artificial graphic elements usually spread horizontally or vertically. That is why two-directional gradients are calculated to emphasize diagonal gradients.
An embodiment of the method according to the invention is characterized in that in the third one of the probability estimation steps, the associated elementary probability value is calculated by means of weighting a sum of the absolute values of the two-directional gradients. In order to calculate a scalar value, which can later on be used for succeeding calculations for the particular pixel, multiple gradient values which have been calculated for the group of pixels have to be combined. To achieve this the non-zero values are averaged and weighted.
An embodiment of the method according to the invention is characterized in that in the third one of the probability estimation steps a LUT is applied for weighting the sum of the absolute values of the two-directional gradients. A Look-Up Table (LUT) is a very easy approach to implementing functions. Preferably also a weighting function for the calculation of the second probability value is implemented by means of a LUT.
An embodiment of the method according to the invention is characterized in that in the combination step the final probability value is calculated by summation of the
elementary probability values estimated in the probability estimation steps, divided by the number of elementary probability values. This is a fast and easy approach.
An embodiment of the method according to the invention is characterized in that in the combination step the final probability value is calculated by means of a thresholded power sum of the elementary probability values estimated in the probability estimation steps. It is preferable that the final probability value is high if one of the probability values is relatively high, regardless of the other probability values being calculated during the estimation steps.
Modifications of the method, and variations thereof, may correspond to modifications and variations of the classification unit and of the image display apparatus described.
These and other aspects of the method, of the classification unit and of the image display apparatus according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:
Fig. 1 schematically shows an embodiment of the classification unit according to the invention; Fig. 2 shows an example of a weighting function for the gradient estimator; and
Fig. 3 schematically shows an embodiment of the image display apparatus.
Corresponding reference numerals have the same meaning in all of the Figs.
Fig. 1 schematically shows an embodiment of the classification unit 100 for discriminating between synthetic image regions 116 and natural image regions 118 in an image 114. The classification unit 100 comprises: three probability estimators 102-106, each estimator arranged to estimate for a particular pixel 120 of the image 114 a respective probability value related to a probability of being located in one of the natural image regions 118 based on values of pixels of a block 122 of pixels in a neighborhood of the particular pixel 120; and a combination unit 108 designed to calculate for the particular pixel 120 one final probability value related to the probability of being located in one of the natural image
regions 118 by combining the respective probability values estimated by the probability estimators 102-106.
On the input connector 110 of the classification unit 100 an image 114 is provided. This image 114 has one large synthetic region 116 comprising text and other types of graphics, and has one natural image region 118 representing photographic data which has been captured by a camera and then digitized. The classification unit 100 is arranged to classify the pixels of the image 114 by using a sliding window approach. The output of the classification unit 100 is provided at the output connector 112. This output is a two-dimensional matrix 124 in which each element corresponds to a respective pixel of the image 114. The values of the elements represent the probability of being located in a natural image region 118. In Fig. 1 it is depicted that a portion 126 of the elements of the two-dimensional matrix 124 is labeled as "natural". This embodiment is arranged to process images with a maximum of 256 distinct luminance values. The output is not a binary classification, but the range of values of the elements of the two-dimensional matrix 124 also comprises 256 different values. This is especially useful if content dependent processing downstream is designed to pass softly from one type of processing to another as the content changes, hence reducing artifacts due to content misclassification. The scale ranges from 0 for synthetic to 255 for natural.
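A minimal sketch of such a sliding-window classification loop, assuming per-block estimator functions and a combination function (all names and the border handling below are illustrative assumptions, not taken from the text):

```python
def classify_image(image, estimators, combine, radius=1):
    """Label each pixel with a 0-255 'naturality' value by running the
    elementary estimators on the block of pixels around it.

    estimators: functions mapping a block (list of rows) to a value in [0, 1]
    combine:    function merging the elementary values into one 0-255 value
    Border pixels whose window falls outside the image are left at 0 here.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            block = [row[c - radius:c + radius + 1]
                     for row in image[r - radius:r + radius + 1]]
            out[r][c] = combine([est(block) for est in estimators])
    return out

# Toy usage with a constant estimator standing in for NOV, SOV and GRD:
image = [[0] * 4 for _ in range(4)]
labels = classify_image(image, [lambda b: 1.0],
                        lambda ps: int(255 * sum(ps) / len(ps)))
print(labels[1][1])  # 255 for interior pixels
```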
The first probability estimator 102 is designed to calculate the probability value NOV (Number Of Values). This corresponds with the calculation as specified by Equation 1:
NOV = (number of different values in the block − 1) / (number of pixels in the block − 1)    (1)
Hence, NOV equals 1 if the number of distinct pixel values that is present in the group of pixels is equal to the number of pixels in the group of pixels, whereas NOV equals 0 in a flat portion of the image. The second probability estimator 104 is designed to calculate the probability value SOV (Separation of Values). This probability value is calculated by means of weighted summation of differences between pixel values of pixels of the block 122, with the differences corresponding to distances between non-zero bins in a histogram of pixel values of the group of pixels. The second estimator 104 is designed to perform a simplified histogram analysis. The histogram or array of present pixel values is scanned and the differences between the present values are calculated. Each difference is then weighted and added to compute the probability value SOV. Once the list of values that is present in the block 122 is ordered, the distance or separation between each value of this list and the next value in the list is calculated. This is illustrated by means of an example. Suppose that the block 122 of pixels comprises 9 pixels with the following values: {1,1,3,4,7,7,250,255,255}. Then the following values can be distinguished: {1,3,4,7,250,255} and the differences between the distinguishable values are {2,1,3,243,5}. See Table 1. These differences are called separations, S_i.
Table 1
Distinct values:  1   3   4   7   250   255
Separations S_i:    2   1   3   243   5
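The worked example can be sketched in Python. The body of the weighting function Ws of Equation 3 is not reproduced in this text, so the linear fall-off used below (full weight at zero separation, dropping to zero at the constant k) is an assumption for illustration only:

```python
def sov(block, k=35):
    """Separation Of Values: weighted sum of the gaps between consecutive
    distinct pixel values in the block (a simplified histogram analysis)."""
    values = sorted({v for row in block for v in row})
    npix = sum(len(row) for row in block)

    def ws(separation):
        # Hypothetical weighting, NOT the patent's Equation 3: small
        # separations (typical of natural content) score high, and the
        # score falls linearly to zero at k, the maximum separation
        # expected in a natural region (typically 30-40).
        return max(0.0, 1.0 - separation / k) / (npix - 1)

    return sum(ws(b - a) for a, b in zip(values, values[1:]))

# The 3x3 example block with values {1,1,3,4,7,7,250,255,255}:
block = [[1, 1, 3], [4, 7, 7], [250, 255, 255]]
print(sov(block))  # about 0.46: the separation of 243 contributes nothing
```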
Each separation S_i is then weighted and added to compute the probability value SOV. This corresponds with the calculation as specified by Equation 2:

SOV = Σ_{i=1}^{nc−1} Ws(S_i),    (2)

where nc is the number of different values in the block and Ws( ) a weighting function of the separation. An example of a weighting function is given in Equation 3:
where npix is equal to the total number of pixels in the block of pixels, and k a constant that is related to the maximum separation which is expected in a natural image region: typically 30-40. Since natural images often have values that do not differ much in small portions of the image, small differences are scored higher than large differences. Hence, SOV equals 1 in the case of a natural image region, whereas SOV equals 0 in a synthetic image region.
The third probability estimator 106 is designed to calculate the probability value GRD (Gradient). This probability value is calculated by means of calculating absolute values of two-directional gradients of pixel values of pixels of the block. Using the knowledge that artificial graphic elements usually spread horizontally or vertically, the two-directional derivative ∂²Lum/∂x∂y is calculated in order to emphasize diagonal gradients. Since ∂²Lum/∂x∂y is a local property in a particular location of the picture, GRD is only calculated using very small blocks of pixels, e.g. 2x2 or 3x3, around the particular pixel, thus reducing the computational effort.
In terms of pixel values, the horizontal differences of the original n x m block of pixels are differentiated again vertically, obtaining a block of pixels with dimensions (n−1) x (m−1). For example, given a block of pixels with dimension of 3 x 3 pixels:

Lum =
| p1 p4 p7 |
| p2 p5 p8 |
| p3 p6 p9 |

The horizontal gradient is computed giving the new matrix:

∂Lum/∂x =
| (p4 − p1) (p7 − p4) |
| (p5 − p2) (p8 − p5) |
| (p6 − p3) (p9 − p6) |

which is then differentiated again vertically. After that the absolute values are determined:

2ndGrad = abs(∂²Lum/∂x∂y) =
| abs((p5 − p2) − (p4 − p1))  abs((p8 − p5) − (p7 − p4)) |
| abs((p6 − p3) − (p5 − p2))  abs((p9 − p6) − (p8 − p5)) |
Then the non-zero values are averaged:

AGRD = Σ nonzerovalues(2ndGrad) / (number of non-zero values)    (4)

Finally the gradient GRD is determined by applying an appropriate weighting function Wg.
(See Fig. 2)
GRD = Wg(AGRD) (5)
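The 2ndGrad and AGRD computations above can be sketched as follows (a Python illustration with row-major pixel blocks; the final weighting Wg of Equation 5 is omitted since it is an implementation-dependent LUT):

```python
def second_grad(block):
    """Absolute two-directional (mixed) gradients of an n x m block:
    differentiate horizontally, then vertically, and take absolute values,
    yielding an (n-1) x (m-1) result that emphasizes diagonal structure."""
    n, m = len(block), len(block[0])
    dx = [[block[r][c + 1] - block[r][c] for c in range(m - 1)]
          for r in range(n)]
    return [[abs(dx[r + 1][c] - dx[r][c]) for c in range(m - 1)]
            for r in range(n - 1)]

def agrd(block):
    """Average of the non-zero 2ndGrad values (Equation 4)."""
    nonzero = [v for row in second_grad(block) for v in row if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

# A purely vertical edge cancels out; a diagonal edge does not.
print(agrd([[0, 0, 9], [0, 0, 9], [0, 0, 9]]))  # 0.0
print(agrd([[9, 0, 0], [0, 9, 0], [0, 0, 9]]))  # 13.5
```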
The combination unit 108 is arranged to calculate for each pixel the final probability value FPV based on the probability values which have been determined for that pixel. Hence, when the probability values NOV, SOV and GRD have been calculated for a pixel, the final probability value FPV can be calculated by means of a summation or by means of a thresholded power sum. The summation is given by Equation 6 and the power sum by Equation 7.
FPV = 255 · (NOV + SOV + GRD) / 3    (6)

FPV = min(255, (16 · NOV)² + (16 · SOV)² + (16 · GRD)²)    (7)
The effect of applying Equation 7 is that the value of FPV is low in the case that the probability values NOV, SOV and GRD are all relatively low, but the value of FPV is near to the maximum if one of the probability values is relatively high, regardless of the values of the other probability values.
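Both combination rules can be sketched as below. Equations 6 and 7 are partly reconstructed from the surrounding description (the division by 3 and the factor 16 in particular), so treat the constants as assumptions:

```python
def fpv_sum(nov, sov, grd):
    """Average combination (Equation 6), scaled to the 0-255 output range."""
    return round(255 * (nov + sov + grd) / 3)

def fpv_power_sum(nov, sov, grd):
    """Thresholded power sum (Equation 7): saturates at 255 as soon as any
    single elementary probability is high, regardless of the others."""
    return min(255, int((16 * nov) ** 2 + (16 * sov) ** 2 + (16 * grd) ** 2))

# One high elementary value: the average stays low, the power sum saturates.
print(fpv_sum(1.0, 0.0, 0.0))        # 85
print(fpv_power_sum(1.0, 0.0, 0.0))  # 255
```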
Fig. 2 shows an example of a weighting function Wg(AGRD) for the gradient estimator 106. The x-axis 204 corresponds to the parameter AGRD, and the y-axis corresponds with the probability value GRD. Hence the weighting function 200 according to Equation 5 is depicted. The weighting function gives the maximum score to the smallest non-zero value and gives a lower score to higher values. Above a predetermined threshold 208 all scores are equal to zero. The weighting function can be implemented by means of a LUT.
Fig. 3 schematically shows an embodiment of the image display apparatus 300 according to the invention. The image display apparatus comprises: means to receive an image 302; a classification unit 100 for discriminating between synthetic image regions and natural image regions in the image, this classification unit 100 being as described in connection with Fig. 1; an image processing unit 306 to process the image based on output of the classification unit 100; and a display device 308 to display the processed image. Typically image data will be provided to the display apparatus 300 via the input connector 310 as a video signal. The image data might e.g. be rendered by a computer system and converted to a video signal by the video controller of the computer system. It can be either an analog or a digital signal. Before being displayed on the display device 308 the image is processed by the image processing unit 306. As a control signal the output of the classification unit 100 is provided to the image processing unit 306. Appropriate processing is performed depending on the type of data: natural image regions are processed differently from synthetic image regions.
The image classification and image processing could also be performed by the computer system before the image data is sent to a display device.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware.