GB2488218A - Detecting the presence or absence of an object in an image - Google Patents

Detecting the presence or absence of an object in an image

Info

Publication number
GB2488218A
Authority
GB
United Kingdom
Prior art keywords
image
frequency spectrum
frequency
comparing
pass filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1202394.1A
Other versions
GB2488218B (en)
GB201202394D0 (en)
Inventor
Denise Bland
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corp filed Critical British Broadcasting Corp
Publication of GB201202394D0
Publication of GB2488218A
Application granted
Publication of GB2488218B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/41Analysis of texture based on statistical description of texture
    • G06T7/42Analysis of texture based on statistical description of texture using transform domain methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/164Detection; Localisation; Normalisation using holistic features
    • G06T7/0085
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • G06T7/402
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06K9/00221
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions

Abstract

A system and method for processing an image to assert the presence or absence of an object involves receiving an image and filtering the image to emphasise luminance variation. The luminance variation enhances edges and this luminance edge image is then transformed using a two-dimensional frequency transform such as a Fourier transform to produce an image frequency spectrum such as a power spectral density. The image frequency spectrum of the image is compared to image frequency spectrums of other images or to a set of data representing a generic image which may involve calculating the difference in amplitude between the spectrum and a reference at each of a number of frequency bins of a region of frequency bins. Where a match is found, the presence or absence of an object or type of object may be asserted. The filter used may be an orientation filter such as a Gabor filter, a vertical, horizontal, forward or backward diagonal pass filter.

Description

METHOD AND APPARATUS FOR IMAGE PROCESSING
BACKGROUND OF THE INVENTION
This invention relates to image processing for asserting the presence or absence of an object within an image.
A variety of image processing systems and methods are known for determining the presence or absence of an object within an image, in particular for face detection. One example is the openCV software which can be used to detect the position and presence of a face within an image.
In addition, there are various techniques for identifying the position of a face within an image, such as within digital cameras which use the detection of regions of colour matching the facial skin tone as a mechanism for identifying the presence and location of a face within an image. More complex systems and methods are also known for attempting to identify a given person within an image.
SUMMARY OF THE INVENTION
We have appreciated the need for a computationally simpler approach to improving the accuracy of detection of the presence or absence of an object within an image. In particular, we have appreciated the need to provide a better approach to avoiding "false positives" in face detection systems, namely reducing the occurrence of identifying non-facial features as a face.
The invention is defined in the claims to which reference is now directed.
An embodiment of the invention comprises an input for receiving an image from an image source. The image source may be a single image or a stream of video from which the input selects one or more image frames.
The image source may itself be the output of an object detection system which provides an isolated portion of an image frame showing only what is believed to be a detected object.
A filtering means is arranged to receive the image and emphasise luminance variation to produce a luminance edge image. The means for filtering may be provided by vertical pass, horizontal pass, backward diagonal pass and forward diagonal pass filters, the output of which is combined by taking the maximum pixel value from each filter to produce a single resulting luminance edge image. Such a luminance edge image will render changes in luminance of the original image as visible regions in the luminance image, but areas in the original image that are smoothly varying will not be visible within the luminance edge image. The luminance edge image therefore increases or enhances the visibility of edges or large changes in luminance, whilst decreasing the visibility of smooth tones.
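By way of illustration, a minimal sketch of such a four-orientation filter bank follows; the kernel size, sigma and wavelength values are assumptions chosen for illustration rather than parameters given in this description.

```python
import cv2
import numpy as np

def luminance_edge_image(gray: np.ndarray) -> np.ndarray:
    """Filter a luminance image at 0, 45, 90 and 135 degrees and
    combine the responses by taking the per-pixel maximum."""
    responses = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel(
            ksize=(31, 31), sigma=4.0, theta=theta,
            lambd=10.0, gamma=0.5, psi=0, ktype=cv2.CV_32F)
        responses.append(
            cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel))
    # Per-pixel maximum over the four orientation responses
    return np.max(np.stack(responses), axis=0)
```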
Means for transforming the luminance edge image using a two dimensional frequency transform is provided to produce an image frequency spectrum. Preferably, the transform is a Fast Fourier Transform operating in two dimensions so that the resultant frequency spectrum represents the frequency components in each of the directions of filtering of the image.
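A sketch of this transform stage under the same assumptions; expressing the power spectral density in dB and the 5-bin smoothing window are illustrative choices, not values specified here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smoothed_psd(edge_image: np.ndarray, smooth: int = 5) -> np.ndarray:
    """Two dimensional FFT of the luminance edge image, returned as a
    smoothed power spectral density in dB with DC shifted to the centre."""
    spectrum = np.fft.fftshift(np.fft.fft2(edge_image))
    psd_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)  # avoid log(0)
    return uniform_filter(psd_db, size=smooth)  # simple low pass smoothing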
A means for comparing the amplitude and/or frequency of parts of the image frequency spectrum is provided to compare these against one or more reference spectrums. Preferably, the amplitude of corresponding peaks or the amplitude difference between peak and trough or frequency position of a peak is compared against a reference spectrum for the object to be detected. The comparison determines if the difference between the test and reference spectrum in any of these aspects is greater than a threshold and, if so, asserts that the object or object type is not present.
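A minimal sketch of this comparison step; the list of spectral positions and the 6 dB threshold are hypothetical values for illustration only.

```python
import numpy as np

def matches_reference(test_psd: np.ndarray, ref_psd: np.ndarray,
                      positions, threshold_db: float = 6.0) -> bool:
    """Compare the test spectrum against a reference spectrum at selected
    (row, col) frequency-bin positions, e.g. peak and trough locations.

    Returns False (object asserted not present) as soon as the amplitude
    difference at any selected position exceeds the threshold."""
    for (u, v) in positions:
        if abs(test_psd[u, v] - ref_psd[u, v]) > threshold_db:
            return False
    return True
```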
The invention may be embodied in either a method or apparatus and can particularly be used for receiving an input of a portion of an image in which a face is purportedly detected, and can then assert a signal to indicate whether the detection of a face is correct or false.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail by way of example with reference to the drawings, in which:
Figure 1 is a schematic diagram of an apparatus embodying the invention;
Figure 2 shows an image frame;
Figure 3 shows a portion of the image selected from the image frame of Figure 2, four filtered versions and a resultant luminance edge image;
Figure 4 shows the frequency spectrum of the luminance edge image of Figure 3;
Figure 5 shows the smoothed frequency spectrum of Figure 4;
Figure 6 shows a reference frequency spectrum with the x, y, 45° forward and 45° backward axes highlighted;
Figure 7 shows the frequency spectrum along each of the x, y, 45° forward and 45° backward axes of Figure 6 for the reference frequency spectrum and for the test frequency spectrum;
Figure 8 shows a second image frame;
Figure 9 shows the image selected from the image frame of Figure 8, the four orientation pass filters and the resulting luminance edge image.
Figure 10 shows the smoothed frequency spectrum of the luminance edge image of Figure 9;
Figure 11 shows the frequency spectrum along each of the x, y and 45° axes;
Figure 12 shows a third image frame;
Figure 13 shows an image selected from the image frame, the four orientation pass filtered images and the resultant luminance image;
Figure 14 shows the frequency spectrum of Figure 13;
Figure 15 shows the frequencies along each of the x, y and 45° axes of Figure 14;
Figure 16 shows the results of face detection without operation of the embodiment of the invention;
Figure 17 shows face detection with verification using the embodiment of the invention;
Figure 18 shows soft edge filtered test faces;
Figure 19 shows soft edge filtered known training faces;
Figure 20 shows the PSD of soft edge filtered test faces;
Figure 21 shows the PSD of soft edge filtered known training faces;
Figure 22 shows test results for a number of correctly identified faces against a threshold dB used in analysis using an FLD algorithm;
Figure 23 shows test results for a number of correctly identified faces against a threshold dB used for analysis using a direct comparison; and
Figure 24 shows test results for a number of correctly identified faces against a number of central frequency bins used for analysis using a direct comparison.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention may be embodied in a method or apparatus for processing an image to detect the presence or absence of an object such as a face.
The embodiment is particularly useful for receiving the output of a face recognition system and an image of a purported face from within an image frame and for asserting a signal to indicate whether or not, in fact, a face has correctly been detected.
The face detection verification reduces the number of false positive face detections from a face detector. The particular application of the embodiment is in detecting full frontal faces as an extracted feature. The face detector operates on any archive content, therefore only black and white images are assumed. The method uses the output from a face detector, applies a soft edge filter to the black and white detected face and takes the frequency spectrum of the soft edge filtered detected face. The detected face is verified by comparing significant points in the 2D frequency domain with those from a reference spectrum of a number of averaged true positive full frontal faces. If significant points of the detected face are within specified limits of the reference spectra, the detected face is verified; otherwise the detected face is rejected.
A system embodying the invention will first be described with reference to Figure 1 and then example input and output images with reference to the remaining figures.
A system embodying the invention in Figure 1 has an input 10 receiving an image or series of images on line 8. The images received on line 8 may themselves be an output from a previous image detector comprising just the portion of image frames believed to contain one or more given objects.
Alternatively, the images on line 8 may comprise a video stream of whole image frames and the input 10 may itself comprise a mechanism for selecting images from image frames for further analysis. If the input video stream 8 is a colour image, the input 10 preferably converts this to a luminance image. A filter arrangement 12 comprises a mechanism for filtering the luminance image in each of four orientations, namely horizontal, vertical, and two diagonal orientations. Each such filtering process produces a luminance filtered version of the image. The filter arrangement 12 then combines these four filtered versions together by selecting, pixel by pixel, the maximum value from each of the four representations to provide a composite or luminance edge image. The filter may also adjust the sizing of the image as appropriate, such as adjusting vertical or horizontal position to concentrate on desired features in the image, such as the eyes, nose and mouth in the case of a face recognition system. The filter may be considered as a soft edge filter and is preferably a Gabor filter.
The output of the Gabor filter 12 is applied to a two dimensional Fast Fourier Transform 14 which is applied to the filtered image to produce a power spectral density or image frequency spectrum of the image. This power spectral density is then smoothed within the low pass filter 14 and provided to a comparator 16. The comparator contains one or more reference spectrums from prior sampled objects, also smoothed in the same manner as the power spectral density from the Fast Fourier Transform 14, and comprises a comparison function. The comparison function takes sections of the power spectral density and compares them against corresponding sections of power spectral density from the previously sampled objects so as to provide a comparison and to assert a signal indicating the presence or absence of an object on output line 18.
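Putting the stages of Figure 1 together, a processing chain along these lines might look as follows; `frame`, `reference_psd` and `peak_positions` are assumed inputs, and the function names come from the sketches above.

```python
import cv2

# Usage example chaining the stages of Figure 1 (assumed inputs and
# reference data; builds on the sketch functions defined earlier).
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # input 10: colour to luminance
edges = luminance_edge_image(gray)               # filter arrangement 12
psd = smoothed_psd(edges)                        # FFT 14 plus smoothing
present = matches_reference(psd, reference_psd,  # comparator 16
                            peak_positions)
print("object present" if present else "object absent")  # output line 18
```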
The comparison function may also compare parts of the frequency spectrum to other parts, for example comparing the peaks produced by one diagonal filter to those of the other diagonal filter.
First Comparison Arrangement
The nature of the processes undertaken by the apparatus embodying the invention for a first comparison arrangement will now be described in more detail with reference to example images and Figures 2-17.
A first example image and the resulting comparison process is shown in Figures 2-7. An entire image frame shown in Figure 2 has a circle to show a section of the image selected from the whole image frame in which an object has been identified (here a news presenter's face). The image is first converted to the luminance image as shown in the top left of Figure 3.
This luminance image is then put through four Gabor filters, one for each of the horizontal, vertical and two diagonal orientations, as shown in the four right-hand images of Figure 3. Lastly, the maximum pixel values for each of the four orientations are taken on a per pixel basis and the extreme vertical edges of the filtered image are zeroed for a more oval face shape concentrating on eyes, nose and mouth, as shown in the resultant luminance edge image at the bottom left of Figure 3. Each of the filters may be considered as pass filters in the sense that they allow image structures in each of the respective orientations to pass. The vertical pass filter, for example, can be seen to allow the vertical structures of the news presenter's hair to pass. Similarly, the horizontal pass filter can be seen to allow the horizontal features of eyebrows and mouth to pass. Similarly, the diagonal pass filters allow corresponding diagonal structures to pass.
On taking the maximum value from each of the four orientation pass filtered images, the resultant image at the bottom left clearly shows the features within the original image having different luminance from adjacent pixels. It is interesting to note that the resulting luminance edge image at the bottom left of Figure 3 has characteristics defined by the horizontal, vertical and diagonal pass filters, as will be seen later.
A two dimensional Fast Fourier Transform is applied to the filtered image and its power spectral density calculated as shown in Figure 4. A smoothed version of this power spectral density is then calculated as shown in Figure 5. Features of the power spectral density can be understood by considering the previous filtration steps and the nature of the Fast Fourier Transform. A central peak and side lobes can be seen, being the classic sinc function expected on producing a Fast Fourier Transform of the square image frame. The width of the central peak relates to the width of the image frame itself. Characteristic side lobes can also be seen in the x and y frequency bin directions as well as in the 45° forward and reverse diagonal directions as a result of the previous vertical, horizontal and diagonal filtration step. The position and amplitude of the side lobes will depend upon the dimension and quantity of structures in the x, y and diagonal directions of the luminance image. A strong structural feature in the y direction, such as the eyes and mouth shown in Figure 3, will result in a high amplitude side lobe in the x frequency bins shown in Figure 5.
A corresponding reference power spectral density is shown in Figure 6 with thick black lines showing cross sections in each of the x, y, 45° forward and 45° backward directions corresponding to the previous filtration stage.
The power spectrum along each of these dimensions is compared in a comparator as shown by the curves in Figure 7.
The features in Figure 7 will now be described in more detail. The plots marked with one symbol represent the reference spectra, and the plots marked with the "o" symbol represent the test spectra. In Figure 7, the plots with the highest amplitude side lobe represent the x dimension. The plots with the second highest side lobe represent the y dimension. The plots with the third highest side lobe represent the forward diagonal dimension and the plots with the lowest side lobe represent the backward diagonal dimension. The diamond marks the maximum difference between the reference plot and the test plot for a specific orientation. The darker marks show the peaks for the main peak and side band. The lighter marks show the troughs. Comparing the luminance image of Figure 3 and the side bands labelled in Figure 7, we can see that the strong structural feature of the mouth and eyes in the x axis in Figure 3 produces a high amplitude side lobe labelled x in Figure 7. The strong x structural features of the eyes and mouth, which are shorter and even more prominent than the y structural features such as the hair, produce a higher frequency peak (because the structural features are shorter) and also a higher amplitude of the peak (because there are more structural features). As can be seen, the test and reference plots compare well to one another.
A second test image will now be described with reference to Figures 8-11.
Figure 8 shows an image frame comprising a scene with many structural features. Figure 9 shows at the top left an image selected from the image frame by a face detection process that is clearly a "false positive" as it does not actually represent a face. This area is shown with a circle in Figure 8. As already described, vertical, horizontal, backward and forward diagonal orientation filters are used on the detected image to produce a luminance image at the bottom left of Figure 9 highlighting the structural features in each of the horizontal, vertical and diagonal directions. A power spectral density plot is produced by a Fast Fourier Transform and smoothing, as shown in Figure 10. A comparison function then compares the power spectral density in each of the x, y and diagonal orientations as graphically represented in Figure 11. As shown by the horizontal bars and the double ended arrow, the peak of the lowermost side lobe, represented by "o" symbols and being the forward diagonal, has a very low side band in comparison to the peak of the backward diagonal side lobe. It is expected that the two peaks for the diagonal orientations are of similar height and have little difference in height between them, as shown in the reference signal. The difference between the side lobe peaks shown in Figure 11 is above a predefined threshold and so the image is determined to be a false positive and a corresponding output asserted on an output line. The comparison function in this example is therefore comparing features resulting from one of the orientation filters against features from another orientation filter. This may be done instead of, or in addition to, comparison with reference data.
Figures 12-15 show a third example image frame and corresponding image processing. Figure 12 shows a whole image frame with a portion circled showing a sign identified as a face by a facial recognition algorithm.
Figure 13 shows this image, the vertical, horizontal and diagonal filters and the resulting luminance edge image at the bottom left as previously described. It is noted that this has a strong x component in a single horizontal line, but an even stronger y component in multiple shorter lines.
We therefore expect the comparison frequency spectrum to show a high amplitude, high frequency side lobe for the y component and a slightly lower amplitude, lower frequency x component. The corresponding power spectrum is shown in Figure 14 and the sections of the frequency power spectrum in the x, y and diagonal directions in Figure 15. As expected, the y frequency plot has a significantly higher side lobe at a higher frequency than the reference plot, indicating that the detected face is a false positive.
The difference in peak levels of the y frequency plot between the reference and test images is emphasised with horizontal bars and a double ended arrow in Figure 15. In this example, the comparison is between test data and reference data, but an additional comparison could be made between features within the test data due to different orientation filters, as with the second example image.
The precision/recall results from a 30 minute 1992 news programme with Anna Ford (19920326_044506_archive_news) are compared for the OpenCV face detector with and without face verification. The ground truth is positive for a full frontal face covering more than 2% of the frame and negative otherwise. The requirement is for detected faces to be reliable (higher precision) at the expense of missing some correct faces (lower recall).
Precision is calculated as the number of true positive faces divided by the number of true positive faces plus the false positive faces.
Recall is calculated as the number of true positive faces divided by the number of true positive faces plus the false negative faces.
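Written out, with TP, FP and FN denoting the counts of true positive, false positive and false negative detections:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}
```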
In the graph of Figure 16, the ground truth full frontal face data is shown in red and the OpenCV (without verification) number of detected faces is shown in blue. The calculated precision is 78.2% and recall is 92.9%.
In the graph of Figure 17, the ground truth full frontal face data is shown in red and the verified OpenCV number of detected faces is shown in blue.
The calculated precision is 92.4% and recall is 86.3%.
The assumption is that a face is oval and has strong horizontal frequency components from the eyes and mouth, some strong vertical frequency components from the nose and some diagonal frequency components from the curvature of the eyes, nose and mouth.
The criterion for false positive face detection is too large a deviation from our generic face's frequency components. Face verification improves precision from 78.2% to 92.4% (correctly rejecting false positive faces) at the expense of a reduction in recall from 92.9% to 86.3% (incorrectly rejecting some true positive faces).
In the first comparison arrangement as described above, the power spectral density (PSD) of the Fast Fourier Transform of a filtered image may be compared along selected axes to that of one or more reference images. If the difference in amplitude between the data for the image in question and the reference image at one or more selected frequency positions is above a predefined threshold, then the object being detected is asserted as not present. The selected frequency positions may be minima or maxima within the curves or at other chosen positions. In addition, the reference image(s) may be different examples of a given object (such as different faces) so as to determine which of a number of faces is likely to be present. The reference data could just be one "generic" example of a type of object (such as a generic face) so as to determine if the category of object is present or not (a face is present or not). The amplitude difference threshold at the selected positions may be arranged so that an object is deemed not to be present if any of the selected frequency positions has an amplitude difference above the threshold. Alternatively, a certain number of frequency positions could be required to have an amplitude difference above the threshold. Such parameters may be varied depending upon the data analysed.
We have further appreciated that the type of comparison performed may be extended so as to compare frequency bins within selected areas of the power spectral density (PSD) of the Fast Fourier Transform of a filtered image against reference data, rather than just at selected frequency positions as described above.
Second Comparison Arrangement
The nature of the processes undertaken by the apparatus embodying the invention for a second comparison arrangement will now be described in more detail with reference to example images and Figures 18 to 24.
The second comparison arrangement uses the same initial steps as the first comparison arrangement, but with a different matching technique. The method takes the frequency spectrum of the soft edge filtered black and white image of the detected face and compares the Power Spectral Density (PSD) with that of a known set of faces. If the detected face is a colour image it is first converted to a luminance image. The luminance image of the detected face is resized and put through four Gabor filters for horizontal, vertical and two diagonal orientations. The maximum pixel value from the four orientations is taken on a per pixel basis. The soft edge filtered test images are in Figure 18 and the soft edge filtered known training images are in Figure 19. A 2D FFT is applied to the soft edge filtered image and its Power Spectral Density (PSD) calculated. The PSD of the soft edge filtered test images is shown in Figure 20 and the PSD of the soft edge filtered known training images in Figure 21.
The way the comparison with the known set of faces is performed differs from the first comparison technique. Instead of comparing at selected frequencies along axes of the PSD frequency space, an area of the PSD space is analysed. The unknown face is then matched to the known face with the closest Euclidean distance, either using a Fisher Linear Discriminant (FLD) algorithm or by direct frequency comparison. The Euclidean distance is determined by calculating the difference in power between the test image and one of the training images at each frequency bin within a selected area, as will now be described. The determining of the distance will first be described in relation to Figures 18 to 21.
Figure 18 shows a set of soft edge filtered images of faces to be tested.
The corresponding power spectral density of the frequency spectrum of each such filtered image is shown in Figure 20. Figure 19 shows a set of training soft edge filtered images of faces. The corresponding power spectral density of the frequency spectrum of each such filtered image is shown in Figure 21. For ease of representation, these are shown in two dimensions (X-Y) with the power dimension (Z) represented by the brightness of the image (the PSD could equally be shown in a perspective view as in earlier figures, such as Figure 14). Conceptually, the approach in the present comparison arrangement is to compare areas of the PSDs of the images to be tested (Figure 20) against corresponding areas of the PSDs of the training images (Figure 21).
Consider first comparing PSD test image A (Figure 20) against PSD training image A (Figure 21). At each frequency bin, the difference in power "d" between the two images may be calculated for a selected area (here shown as a white square). The selected area may be defined as, for example, a square of n x n frequency bins. In this example, there will be n² power difference values calculated. The selected area(s) may alternatively be defined by a threshold dB value expressing which frequency bins to include in the calculation based upon their amplitude in relation to the maximum amplitude of the PSD. Those below a certain threshold will be excluded from the calculation.
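As a sketch of this per-bin calculation, assuming the two PSDs are in dB on the same grid and that n is odd so the area can be centred on a bin:

```python
import numpy as np

def bin_differences(test_psd: np.ndarray, train_psd: np.ndarray,
                    centre: tuple, n: int) -> np.ndarray:
    """Power difference d at each bin of an n x n area of the two PSDs
    (n odd), centred on `centre`; returns the n*n values of d."""
    r, c = centre
    h = n // 2
    a = test_psd[r - h:r + h + 1, c - h:c + h + 1]
    b = train_psd[r - h:r + h + 1, c - h:c + h + 1]
    return (a - b).ravel()  # n**2 difference values
```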
Three alternative methods of matching the PSDs are proposed: 1) FLD of the threshold PSD, 2) direct comparison of the threshold PSD components and 3) direct comparison of the low frequency PSD components.
The first method takes the difference values "d" of the PSD of the soft edge filtered images for each of the possible matches and applies the Fisher Linear Discriminant (FLD) method only to those frequency bins whose power is no more than 26dB or 27dB below the maximum power. The first method results in 10 out of 10 faces correctly recognised in the test data set shown, as can be seen in Figure 22. The choice of 26dB or 27dB is based on empirical analysis of the data. It is interesting to note that this level includes frequency bins for the central lobe of the PSD, but not any frequency bins for side lobes. This can most easily be seen by referring back to Figure 15. The side lobes are greater than 27dB down from the maximum. This suggests that the greatest accuracy is found by comparing the difference values for the central lobe only, and that the threshold value should be set accordingly.
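In code, the bin selection by threshold might be expressed as below; a sketch assuming a PSD already in dB, with the 27dB default taken from the empirical analysis above.

```python
import numpy as np

def threshold_bins(psd_db: np.ndarray, threshold_db: float = 27.0) -> np.ndarray:
    """Boolean mask of bins within `threshold_db` of the PSD maximum;
    in the data above this keeps the central lobe and drops the side lobes."""
    return psd_db >= psd_db.max() - threshold_db
```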
The second method takes the difference value "d", directly compares the frequency bins of the PSD and selects the known face with the shortest summation of the Euclidean distance squared (d²) from the unknown face's PSD component amplitudes. When there is more than one image of a known face, the average distance of all the images of that known face to the test face is compared. As before, only values of PSD that are no more than a certain threshold lower than the maximum are included. This results in 10 out of 10 faces correctly recognised when the PSD threshold is from 20dB to 33dB (14 correct settings), and when it is 36dB, 37dB, 59dB and above 60dB below the maximum, as shown in Figure 23.
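A sketch of this direct comparison, assuming all PSDs share a grid and, for simplicity, that the threshold mask is derived from the test PSD; `known_faces` is a hypothetical mapping from a name to that person's list of training PSDs, and `threshold_bins` comes from the earlier sketch.

```python
import numpy as np

def match_face(test_psd: np.ndarray, known_faces: dict,
               threshold_db: float = 27.0) -> str:
    """Return the known face with the smallest summed squared Euclidean
    distance d^2 to the test PSD over the above-threshold bins, averaging
    the distance when a face has several training images."""
    mask = threshold_bins(test_psd, threshold_db)
    best_name, best_dist = None, float("inf")
    for name, train_psds in known_faces.items():
        # Average the distance over all training images of this face
        dist = np.mean([np.sum((test_psd[mask] - t[mask]) ** 2)
                        for t in train_psds])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```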
In the third method, instead of selecting the frequency components above a threshold, a central 2D area of PSD frequency bins around the dc value is used. Instead of varying the PSD threshold, all PSD components within this square low frequency region are evaluated. As with the second method, face matching is by directly comparing the frequency bins of the low frequency PSD and selecting the known face with the shortest summation of the Euclidean distance squared from the unknown face's low frequency PSD component amplitudes.
Thus, for a square of N bins (n x n bins), the summation performed is the sum of d² over all N bins, or (equation 1): D = \sum_{i=1}^{N} d_i^2. This results in 10 out of 10 faces correctly recognised when the number of frequency components evaluated is (2(x−1)+1)² with x = 5 or more, totalling a minimum of N = 81 frequency components (9 x 9 bins), as shown in Figure 24. The number of bins assessed will always be an odd number squared, as a central DC bin and equal numbers of bins either side will be assessed, namely a choice of 1, 3, 5, 7 squared and so on, expressed generally as (2(x−1)+1)².
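The low frequency variant of the distance might be sketched as follows, assuming an fftshift-ed PSD so that the DC bin sits at the array centre:

```python
import numpy as np

def central_region_distance(test_psd: np.ndarray, train_psd: np.ndarray,
                            x: int = 5) -> float:
    """Equation 1 over the central (2(x-1)+1)^2 low frequency bins
    around the DC bin; x = 5 gives the 9 x 9 = 81 bin case above."""
    n = 2 * (x - 1) + 1                       # 1, 3, 5, ... bins per side
    r, c = test_psd.shape[0] // 2, test_psd.shape[1] // 2  # DC bin
    h = n // 2
    d = (test_psd[r - h:r + h + 1, c - h:c + h + 1]
         - train_psd[r - h:r + h + 1, c - h:c + h + 1])
    return float(np.sum(d ** 2))              # sum of d^2 over all N bins
```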
It is interesting to note that just evaluating the dc PSD component (when x=1) gives 6 out of 10 correctly recognised faces. Evaluating 9 PSD low frequency components (when x=2) gives 9 out of 10 correctly recognised faces, the same level of performance as the baseline. For 10 out of 10 correctly recognised faces a minimum of 81 PSD low frequency components (when x=5) are required.
The proposed first and second comparison methods of taking the frequency spectrum of the soft edge filtered black and white image of the detected face and then comparing with the Power Spectral Density (PSD) provide improvements over known techniques. In either case, the technique may be used to assert the presence or absence of a type of object (face/no face) or which of a number of objects of a particular type is present (which face, or even which facial expression of a given face).
The second comparison method improves upon the first by considering regions of the PSD in comparisons. The unknown face is matched to the known face with the closest Euclidean distance, either using the Fisher Linear Discriminant (FLD), direct frequency comparison of the threshold PSD or direct frequency comparison of the low frequency PSD components. For direct frequency comparison, when there is more than one image of a known face, the average distance of all the images of that known face to the test face is compared.
The FLD of the PSD face matching method results in 10 out of 10 faces correctly recognised when the PSD threshold is set to 26dB or 27dB below the maximum. Face matching by directly comparing the frequency bins of the PSD results in 10 out of 10 faces correctly recognised when the PSD threshold is from 20dB to 33dB (14 correct settings), and when it is 36dB, 37dB, 59dB and above 60dB below the maximum. Face matching by selecting a central 2D area of PSD frequency bins around the dc value results in 10 out of 10 faces correctly recognised when the number of frequency components evaluated is (2(x−1)+1)² with x = 5 or more, totalling a minimum of 81 frequency components.
All matching techniques can give a better result of 10 out of 10 correctly recognised faces with an appropriate PSD threshold or minimum number of frequency components, whereas the baseline spatial FLD technique gives 9 out of 10 correctly recognised faces. The Fisherface implementation in the spatial domain is taken as the baseline for comparison. The baseline is an open source Fisher Linear Discriminant (FLD) or Fisherface implementation of face recognition in the spatial domain from Mathworks by Amir Hossein Omidvarnia. The baseline FLD algorithm on the pixel images results in 9 out of 10 faces correctly recognised.
The invention may be embodied in a process, dedicated hardware, generally programmable hardware or in a computer program arranged to execute the method described.
The Fisher Linear Discriminant (FLD) is known to the skilled person, but a brief explanation of applying the FLD in the improvements herein follows for completeness. FLD performs dimensionality reduction (as does PCA) while maintaining the class discriminatory information. Assume we have m-dimensional samples {x1, x2, ..., xM}. For the face recognition example embodiment, we have 118 by 118, a total of 13924, power spectral density bins, and so this number of dimensions, as each power spectral density bin is treated as a dimension. In the embodiment, we have a total of 10 classes, one class for each person. What we wish to achieve is to reduce the number of dimensions and provide a large distance between the means and good class separability.
The solution proposed by Fisher, as used herein, is to maximise a function that represents the difference between the class means, normalised by a measure of the within-class variability, or the so-called scatter, an equivalent of the variance. In doing so, samples from the same class are projected very close to each other and, at the same time, the projected means of the sample classes are as far apart as possible.
The projected values for the samples are used as fewer dimensions are required to represent the data and the projected samples still contain the class discriminatory information. For the face recognition embodiment, the Euclidean distance squared d² between the projected test image PSD and the projections of all training image PSDs is calculated. The test image PSD is expected to have the minimum distance to its corresponding image PSD in the training database.
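For orientation only, this projection-and-match step could be prototyped with scikit-learn's LinearDiscriminantAnalysis standing in for the FLD; `train_psds`, `labels` and `test_psd_flat` are assumed data, with one row per flattened 13924-bin PSD.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# train_psds: (num_images, 13924) flattened PSD bins; labels: person per image
fld = LinearDiscriminantAnalysis()
projected_train = fld.fit_transform(train_psds, labels)

# Project the test PSD and match to the training sample at minimum d^2
projected_test = fld.transform(test_psd_flat.reshape(1, -1))
d2 = np.sum((projected_train - projected_test) ** 2, axis=1)
print("matched person:", labels[np.argmin(d2)])
```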

Claims (32)

  1. A method of processing an image to assert the presence or absence of an object within the image, comprising: receiving an image from an image source; filtering the image using a filter that emphasises luminance variation to produce a luminance edge image; transforming the luminance edge image using a two dimensional frequency transform to produce an image frequency spectrum; comparing amplitude and/or frequency of parts of the image frequency spectrum to those of one or more reference frequency spectrums or to other parts of the image frequency spectrum; and asserting a signal indicating the presence or absence of an object within the image as a result of the comparison.
  2. A method according to claim 1, wherein the filtering comprises using an orientation pass filter to allow a high rate of luminance change to pass.
  3. A method according to claim 2, wherein the filtering comprises using one or more of a vertical pass filter, horizontal pass filter and a diagonal pass filter.
  4. A method according to claim 2 or 3, wherein the filtering comprises using each of a vertical pass filter, horizontal pass filter, forward diagonal pass filter and backward diagonal pass filter to produce respective intermediate images and selecting the maximum pixel values from the intermediate images to produce the luminance edge image.
  5. A method according to any of claims 2 to 4, wherein the orientation pass filter is a Gabor filter.
  6. A method according to any preceding claim, wherein the transforming comprises using a Fourier Transform and the image frequency spectrum comprises a power spectral density.
  7. A method according to claim 6, wherein the transforming comprises smoothing the results of the Fourier Transform and the image frequency spectrum comprises a smoothed power spectral density.
  8. A method according to any preceding claim, wherein the comparing comprises comparing peaks and/or troughs of the image frequency spectrum to those of one or more reference frequency spectrums.
  9. A method according to claim 8, wherein the comparing comprises comparing the amplitude of a peak in the image frequency spectrum to a corresponding peak in one or more reference frequency spectrums.
  10. A method according to claim 8 or 9, wherein the comparing comprises comparing the amplitude difference between a trough and a peak in the frequency spectrum to the amplitude difference between a trough and peak in the one or more reference spectrums.
  11. A method according to any of claims 8 to 10, wherein the comparing comprises comparing the position of a side peak of the frequency spectrum to that of one or more reference frequency spectrums.
  12. A method according to any of claims 1 to 7, wherein the comparing comprises calculating the difference in amplitude of the image frequency spectrum and a reference frequency spectrum at each frequency bin of a region of frequency bins.
  13. A method according to claim 12, wherein the region of frequency bins comprises those frequency bins in the image frequency spectrum having an amplitude greater than a threshold.
  14. A method according to claim 12, wherein the region of frequency bins comprises those frequency bins in the image frequency spectrum within a region of n x n bins centred on the DC frequency bin.
  15. A method according to any of claims 12, 13 or 14, wherein the comparison determines the reference frequency spectrum that has the closest Euclidean distance.
  16. A method according to any of claims 12, 13 or 14, wherein the comparison determines the best matching reference frequency spectrum using the Fisher Linear Discriminant algorithm.
  17. Apparatus for processing an image to assert the presence or absence of an object within the image, comprising: an input for receiving an image from an image source; means for filtering the image using a filter that emphasises luminance variation to produce a luminance edge image; means for transforming the luminance edge image using a two dimensional frequency transform to produce an image frequency spectrum; means for comparing amplitude and/or frequency of parts of the image frequency spectrum to those of one or more reference frequency spectrums or to other parts of the image frequency spectrum; and means for asserting a signal indicating the presence or absence of an object within the image as a result of the comparison.
  18. Apparatus according to claim 17, wherein the means for filtering comprises an orientation pass filter to allow a high rate of luminance change to pass.
  19. Apparatus according to claim 18, wherein the means for filtering comprises one or more of a vertical pass filter, horizontal pass filter and a diagonal pass filter.
  20. Apparatus according to claim 18 or 19, wherein the means for filtering comprises each of a vertical pass filter, horizontal pass filter, forward diagonal pass filter and backward diagonal pass filter arranged to produce respective intermediate images and means for selecting the maximum pixel values from the intermediate images to produce the luminance edge image.
  21. Apparatus according to any of claims 18 to 20, wherein the orientation pass filter is a Gabor filter.
  22. Apparatus according to any of claims 17 to 21, wherein the means for transforming comprises a Fourier Transform and the image frequency spectrum comprises a power spectral density.
  23. Apparatus according to claim 22, wherein the means for transforming comprises means for smoothing the results of the Fourier Transform and the image frequency spectrum comprises a smoothed power spectral density.
  24. Apparatus according to any of claims 17 to 23, wherein the means for comparing comprises means for comparing peaks and/or troughs of the image frequency spectrum to those of one or more reference frequency spectrums.
  25. Apparatus according to claim 24, wherein the means for comparing comprises means for comparing the amplitude of a peak in the image frequency spectrum to a corresponding peak in one or more reference frequency spectrums.
  26. Apparatus according to claim 24 or 25, wherein the means for comparing comprises means for comparing the amplitude difference between a trough and a peak in the frequency spectrum to the amplitude difference between a trough and peak in the one or more reference spectrums.
  27. Apparatus according to any of claims 24 to 26, wherein the means for comparing comprises means for comparing the position of a side peak of the frequency spectrum to that of one or more reference frequency spectrums.
  28. Apparatus according to any of claims 17 to 23, wherein the comparing comprises calculating the difference in amplitude of the image frequency spectrum and a reference frequency spectrum at each frequency bin of a region of frequency bins.
  29. Apparatus according to claim 28, wherein the region of frequency bins comprises those frequency bins in the image frequency spectrum having an amplitude greater than a threshold.
  30. Apparatus according to claim 28, wherein the region of frequency bins comprises those frequency bins in the image frequency spectrum within a region of n x n bins centred on the DC frequency bin.
  31. Apparatus according to any of claims 28, 29 or 30, wherein the comparison determines the reference frequency spectrum that has the closest Euclidean distance.
  32. Apparatus according to any of claims 28, 29 or 30, wherein the comparison determines the best matching reference frequency spectrum using the Fisher Linear Discriminant algorithm.
GB1202394.1A 2011-02-18 2012-02-09 Method and apparatus for image processing Active GB2488218B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB1102920.4A GB201102920D0 (en) 2011-02-18 2011-02-18 Method and apparatus for image processing

Publications (3)

Publication Number Publication Date
GB201202394D0 GB201202394D0 (en) 2012-03-28
GB2488218A (en) 2012-08-22
GB2488218B GB2488218B (en) 2017-03-01

Family

ID=43881383

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB1102920.4A Ceased GB201102920D0 (en) 2011-02-18 2011-02-18 Method and apparatus for image processing
GB1202394.1A Active GB2488218B (en) 2011-02-18 2012-02-09 Method and apparatus for image processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB1102920.4A Ceased GB201102920D0 (en) 2011-02-18 2011-02-18 Method and apparatus for image processing

Country Status (1)

Country Link
GB (2) GB201102920D0 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065297A (en) * 2012-12-20 2013-04-24 清华大学 Image edge detecting method based on Fourier transformation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000163583A (en) * 1998-11-27 2000-06-16 Fujitsu Ltd Method and device for detecting object

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000163583A (en) * 1998-11-27 2000-06-16 Fujitsu Ltd Method and device for detecting object

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065297A (en) * 2012-12-20 2013-04-24 清华大学 Image edge detecting method based on Fourier transformation
CN103065297B (en) * 2012-12-20 2015-08-05 清华大学 A kind of method for detecting image edge based on Fourier transform

Also Published As

Publication number Publication date
GB2488218B (en) 2017-03-01
GB201202394D0 (en) 2012-03-28
GB201102920D0 (en) 2011-04-06

Similar Documents

Publication Publication Date Title
JP5107045B2 (en) Method for identifying a pixel representing an iris in an image acquired for the eye
US9990563B2 (en) Image processing device, image processing method, program, and recording medium for detection of epidermis pattern
CN101339607B (en) Human face recognition method and system, human face recognition model training method and system
Ng et al. A review of iris recognition algorithms
JP2009523265A (en) Method for extracting iris features in an image
Frucci et al. WIRE: Watershed based iris recognition
JP2007188504A (en) Method for filtering pixel intensity in image
Heusch et al. Lighting normalization algorithms for face verification
Türkan et al. Human eye localization using edge projections.
CN107239729B (en) Illumination face recognition method based on illumination estimation
CN103745237A (en) Face identification algorithm under different illumination conditions
CN111523344B (en) Human body living body detection system and method
Asmuni et al. An improved multiscale retinex algorithm for motion-blurred iris images to minimize the intra-individual variations
Ng et al. An effective segmentation method for iris recognition system
GB2488218A (en) Detecting the presence or absence of an object in an image
Makwana et al. Evaluation and analysis of illumination normalization methods for face recognition
Khalifa et al. Adaptive score normalization: a novel approach for multimodal biometric systems
WO2009144330A1 (en) Method for detection of objectionable contents in still or animated digital images
Bartunek et al. Improved adaptive fingerprint binarization
CN106952241A (en) A kind of electromagnetic image method of partition based on morphological method and Meanshift algorithms
TW201324375A (en) Rebuilding method for blur fingerprint images
Derman et al. Integrating facial makeup detection into multimodal biometric user verification system
Srisuk et al. A gabor quotient image for face recognition under varying illumination
Ng et al. Iris recognition algorithms based on texture analysis
Rossant et al. A robust iris identification system based on wavelet packet decomposition and local comparisons of the extracted signatures