US20170161549A1 - Processing Device and Method for Face Detection - Google Patents

Processing Device and Method for Face Detection

Info

Publication number
US20170161549A1
Authority
US
United States
Prior art keywords
face
patch
image
patches
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/416,533
Other versions
US10296782B2 (en)
Inventor
Vijayachandran Mariappan
Rahul Arvind JADHAV
Puneet Balmukund Sharma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Assigned to HUAWEI TECHNOLOGIES CO., LTD. reassignment HUAWEI TECHNOLOGIES CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JADHAV, Rahul Arvind, MARIAPPAN, Vijayachandran, SHARMA, Puneet Balmukund
Publication of US20170161549A1 publication Critical patent/US20170161549A1/en
Application granted granted Critical
Publication of US10296782B2 publication Critical patent/US10296782B2/en
Legal status: Active

Classifications

    • G06K9/00234
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G06K9/00288
    • G06K9/4604
    • G06K9/66
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06K2009/4666
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467 Encoded features or binary features, e.g. local binary patterns [LBP]

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to a face-detection processing method and a processing device for face detection.
  • Face detection is an important research direction in the field of computer vision because of its wide range of potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. Face detection determines whether there are any faces within a given image, and returns the location and extent of each face if one or more faces are present.
  • High definition (HD) cameras are an affordable commodity and are widely used in all types of applications, video surveillance for instance.
  • Video analytics in the form of face detection has to keep up with the high-resolution output of these cameras, so the performance of the detection algorithms is critical to the overall performance of the analytics.
  • Face detection algorithms are commonly employed in smartphones and biometric devices to detect faces and later recognize them. Most smartphones today are equipped with a feature that unlocks the phone by matching faces, and this application requires a fast face detection algorithm at its core.
  • the exemplary output of a face detection engine is shown in FIG. 1 .
  • AdaBoost Adaptive Boosting
  • AdaBoost is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. This performance though is directly proportional to the resolution of the image/video frame.
  • FIG. 2 The general overall process of face detection algorithm is shown in FIG. 2 and the modules of any face detection algorithm includes but not limited to:
  • Feature representation module: any face detection system uses some form of feature representation that can identify facial features and correlate them in a way such that the overall output can be judged as a face or a non-face.
  • Examples of feature representations are Local Binary Patterns (LBP) and Modified Census Transform (MCT). These are alternative representations (in place of raw pixel intensity) that usually have better invariance to illumination and to slight changes in pose/expression.
  • LBP Local Binary Patterns
  • MCT Modified Census Transform
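The basic LBP representation named above can be sketched in a few lines; the helper name and the bit ordering (clockwise from the top-left) are illustrative choices, not taken from the disclosure:

```python
def lbp_code(patch3x3):
    """Compute the basic 8-bit Local Binary Pattern code for a 3x3
    neighbourhood, given as a list of 3 rows of 3 grey values.
    Each neighbour is compared against the centre pixel; a neighbour
    >= centre contributes a 1 bit, clockwise from the top-left."""
    c = patch3x3[1][1]
    # clockwise order: TL, T, TR, R, BR, B, BL, L
    neighbours = [patch3x3[0][0], patch3x3[0][1], patch3x3[0][2],
                  patch3x3[1][2], patch3x3[2][2], patch3x3[2][1],
                  patch3x3[2][0], patch3x3[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code
```

Because only the sign of each comparison matters, the code is unchanged when the whole neighbourhood is brightened or darkened uniformly, which is the illumination invariance mentioned above.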
  • Classifier module: a classifier provides a way to correlate multiple features. Examples are the cascaded AdaBoost classifier and Support Vector Machines (SVM).
  • SVM Support Vector Machines
  • Search space generator module: given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the face detection logic has to search (using a sliding window approach) for the possibility of a face “at all locations” and “at all scales”. This usually results in the scanning of hundreds of thousands of windows even in a low-resolution image.
  • the estimated bounding box might not be centered on the face.
  • the sliding window approach is the most common technique for generating the search space used for object detection.
  • a classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold.
  • Cascades speed up the detection by rejecting the background quickly and spending more time on object-like regions. Even though cascades were introduced, scanning with fine grid spacing is still computationally expensive.
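The window counts behind this cost can be made concrete with a small sketch; the 640×480 frame size, 24×24 window, and 1.25 pyramid factor below are illustrative assumptions, not figures from the disclosure:

```python
def count_windows(img_w, img_h, win=24, step=1, scale=1.25):
    """Count the sliding-window evaluations needed to search an image
    at all locations and scales.  The image is repeatedly shrunk by
    `scale` (an image pyramid), and a win x win window is slid with
    the given pixel step at every pyramid level."""
    total = 0
    w, h = img_w, img_h
    while w >= win and h >= win:
        nx = (w - win) // step + 1   # window positions along x
        ny = (h - win) // step + 1   # window positions along y
        total += nx * ny
        w, h = int(w / scale), int(h / scale)
    return total
```

At 1-pixel shifts a 640×480 frame already needs several hundred thousand window evaluations, while 6-pixel shifts cut the count by roughly a factor of 36, which is the saving the disclosure pursues.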
  • the disclosed technique trains a classifier (Cpatch) using a decision tree, and this Cpatch classifier is evaluated on a regular grid, while the main classifier (Cobject) is placed at locations predicted by Cpatch.
  • the left hand side (LHS) of FIG. 5 shows a sample face with different patch locations shown in different dashed rectangles.
  • a patch is of size w_p × h_p, and all the patches are given as input to the decision tree, where w_p is the width of the patch and h_p is the height of the patch.
  • the leaf nodes of the decision tree correspond to patches that have been identified.
  • the right hand side (RHS) of FIG. 5 shows patches identified on leaf nodes and the corresponding offsets for the full face.
  • the core idea of this technique is to use a decision-tree-based approach with very lightweight and simple features, such as pixel intensity values, and then to use this Cpatch classifier as a pre-processing step.
  • the actual Cobject classifier works only on the output from the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively less work to do, resulting in improved performance.
  • the face bounding box for faster face detection technique is shown in FIG. 5 .
  • the technique discussed above, however, results in a loss of accuracy.
  • the lines show the data of an available technique for face detection using the sliding window approach for object detection. It improves the accuracy, but the result is still lower than desirable; for example, at a 6×6 grid spacing the accuracy is shown to be about 80 percent (%), which is down by almost 15-18% from the peak.
  • the existing image processing or face detection algorithms require high-end processing, and accordingly require advanced high-end hardware, which involves higher cost.
  • CPU central processing unit
  • the present disclosure, in various embodiments, provides a face-detection processing method and processing devices for faster face detection.
  • the objective of this disclosure is to provide an image processing method which is able to detect the presence of at least one face in at least one image, which does not require a large memory capacity, which is capable of high-speed processing in real time or offline, which can be produced at a low cost, and which can detect specified patterns with certainty and with a very small probability of false positives.
  • a face detection method that may be used even on lower-end hardware is disclosed. This technique ensures that very low CPU usage is needed for face detection, and thus the method can be employed on low-cost hardware.
  • the technique disclosed in the present disclosure involves sliding the search window at higher pixel shifts such that the total number of windows scanned is greatly reduced.
  • a mechanism to locate face patches when the sliding pixel shifts are increased, thus not impacting the output of the overall face classifier, is disclosed.
  • conventionally, the grid spacing used is 1×1, i.e. a 1-pixel shift in each (x, y) direction.
  • the present disclosure achieves a grid spacing of 6×6, i.e. the sliding window is shifted by 6 pixels in both the x and y directions. This achieves an overall reduction/window compression of 36:1, i.e. in ideal scenarios the performance increase can be ×36 (6×6).
  • a technique to identify face patches, rather than the full face, to estimate the bounding box at higher pixel shifts, and then to use this bounding box to search for the presence of a face, is disclosed.
  • a method for detecting a presence of at least one face in at least one image comprises creating an image patch map based on a plurality of face patches identified for at least one window in said at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of said at least one face in said at least one image.
  • a processing device comprising memory storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of at least one face in said at least one image, is disclosed.
  • the said non-transitory computer readable storage medium storing the instructions and said one or more processors are part of a processing device.
  • a processing device comprises one or more storages capable of storing one or more images and other data, and a face detector.
  • the processing device is configured to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the one or more images.
  • creating the image patch map comprises identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on said windows using a pre-defined grid spacing, thereby creating the image patch map.
  • the present disclosure provides certain advantages that may include, but are not limited to, the following:
  • the present disclosure improves face detection time multi-fold without impacting the accuracy.
  • the present disclosure may be used in real-time systems even with HD videos/images.
  • the present disclosure is suitable for generic object detection and not constrained to face domain.
  • FIG. 1 illustrating an output of a face detection engine, is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 2 illustrating a flow of face detection algorithm is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 3 illustrating an estimated Bounding box and the face box is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 4 illustrating a graph showing impact of grid spacing on detection accuracy is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 5 illustrating a face bounding box for faster face detection is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 6 illustrating a flow chart for face detection in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 7 illustrating a face patch classification in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 8 illustrating face patch examples in present disclosure are shown, in accordance with an embodiment of the present subject matter.
  • FIG. 9 illustrating a face patch masking operation in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 10 illustrating a subsequent localized search within bounding box in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 11 illustrating a detection flow chart for bounding-box-based detection is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrating an operation to detect presence of at least one face in the at least one image executed by one or more processors is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrating a method for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrating a method for creating the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrating a special purpose processing device for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • the technique disclosed here uses a patch-based approach to identify the face patches and then applies a full-face classifier in the bounding box.
  • the present technique is characterized by the way the patches are formed, the features that are used to train on the patches, and the way the bounding box is defined.
  • the present technique may be categorized into three major steps as shown in FIG. 6 :
  • Applying patch classifier step: the patch classifier is applied on windows derived using a grid spacing of 6×6.
  • Estimate bounding box step: an image map is available from the “applying patch classifier” step as noted above, and a mask is then applied which checks how many of the patches of the window actually mapped to a face patch.
  • Localized search step: the method searches within that bounding box using an aggressive grid spacing of 1×1, i.e. a 1-pixel shift in each (x, y) direction.
  • a face template size is 24×24, and the patches are formed using a 36×36 area centered on the 24×24 face area. This area is chosen considering the worst-case scenarios for 6×6 grid spacing.
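The patch formation step can be sketched as follows. The exact geometry of the 9 patch types is defined by FIG. 8 of the disclosure; the non-overlapping 3×3 grid of 12×12 patches below is only an illustrative assumption:

```python
def make_patches(window36, patch=12, grid=3):
    """Cut a 36x36 window (a list of 36 rows of 36 grey values) into a
    3x3 grid of 12x12 patches, numbered 1..9 in row-major order.  This
    layout is an assumption for illustration; the disclosure's FIG. 8
    defines the actual patch types and their overlap."""
    patches = {}
    for gy in range(grid):
        for gx in range(grid):
            ptype = gy * grid + gx + 1   # patch type 1..9
            patches[ptype] = [row[gx * patch:(gx + 1) * patch]
                              for row in window36[gy * patch:(gy + 1) * patch]]
    return patches
```

Numbering the patches 1 through 9 row-major is what later allows a simple positional mask to check whether the classified patch types sit where a face would put them.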
  • the face box is the actual area occupied by a face/object to be identified in the image.
  • the face box may be obtained by any known face detector or detection technique available in the art.
  • training the patch classifier Cpatch is achieved by training a decision tree using the 9 different types of patch samples shown in FIG. 8.
  • the leaf nodes of the tree will identify the patch type.
  • An MCT technique may be used for feature representation rather than simple binary tests as mentioned in the earlier approach.
  • the nodes are split based on a one-versus (vs.)-all approach, i.e. one patch vs. the rest of the patches. Further, non-face samples may not be used in the training. It is understood that the goal of the Cpatch classifier is to identify the face patch accurately, not to distinguish between face patches and non-faces.
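The one-vs.-all split criterion can be illustrated with a brute-force search for a single binary pixel-comparison test. Note this is a simplification: the disclosure favours MCT features over simple binary tests, and the helper name and scoring are assumptions for illustration:

```python
def best_one_vs_all_split(samples, labels, target):
    """Pick the pixel-pair binary test  I[a] < I[b]  that best separates
    patches of the `target` type from all the other types (one-vs.-all),
    scored by plain classification accuracy.  `samples` is a list of
    flattened patches (equal-length lists of grey values); `labels`
    holds the patch type of each sample."""
    n = len(samples[0])
    best = (0, 1, 0.0)                     # (a, b, accuracy)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            correct = sum(
                (s[a] < s[b]) == (lab == target)
                for s, lab in zip(samples, labels))
            acc = correct / len(samples)
            if acc > best[2]:
                best = (a, b, acc)
    return best
```

A decision tree grown this way repeats the search at every node on the samples that reach it, one target patch type at a time, until each leaf is dominated by a single patch type.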
  • evaluating the Cpatch classifier is achieved by applying the trained classifier to the input image.
  • the Cpatch classifier is applied on all the windows with a grid spacing of 6×6.
  • an image patch map based on the patches identified is created for every window at a grid spacing of 6×6.
  • the image patch types and their formation are shown in FIG. 8.
  • the image patch map may include different patch location information arranged in different rectangles.
  • the patch may be of size 4×4, 6×6, 8×8, and so on, and all the patches are given as input to the decision tree.
  • the leaf nodes of the decision tree correspond to patches that have been identified.
  • the image patch map may include an arrangement of pixel locations of different patches. Further, the image patch map may be obtained by any of the existing techniques.
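Building the image patch map over the coarse grid might look like the following sketch; `classify_patch` stands in for the trained Cpatch classifier and is a hypothetical callable, as is the assumed return convention:

```python
def build_patch_map(img_w, img_h, classify_patch, step=6, win=36):
    """Evaluate the patch classifier at every grid position (6-pixel
    shifts in x and y) and record the reported patch type, producing
    the image patch map.  `classify_patch(x, y)` is assumed to return
    a patch type 1-9, or 0 when no face patch is recognised there."""
    patch_map = {}
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            patch_map[(x, y)] = classify_patch(x, y)
    return patch_map
```

Keying the map by pixel position keeps the later masking step simple: neighbouring grid positions are exactly `step` pixels apart, matching the 3-pixel patch overlap noted further below for the 6×6 spacing.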
  • a matrix mask of [1, 2, 3, 4, 5, 6, 7, 8, 9] is applied to check how many patches around the face have matched.
  • a tolerance of 4 may be considered, i.e. if 4 or more types in the mask match, then the 36×36 area is chosen as a possible face bounding box.
  • the face patch masking operation is shown in FIG. 9 .
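The masking operation of FIG. 9 can be sketched as below, under the assumption that the mask positions correspond to a 3×3 arrangement of patch types 1-9 in the patch map:

```python
def is_face_bounding_box(patch_map_3x3, tolerance=4):
    """Apply the mask [1..9] to a 3x3 neighbourhood of the image patch
    map: position (i, j) is a match when the classifier reported patch
    type 3*i + j + 1 there.  With `tolerance` (4) or more matches, the
    surrounding 36x36 area is kept as a candidate face bounding box."""
    matches = sum(
        1
        for i in range(3)
        for j in range(3)
        if patch_map_3x3[i][j] == 3 * i + j + 1)
    return matches >= tolerance
```

The tolerance is what buys robustness: a face partially misclassified by Cpatch still produces a bounding box as long as at least four patch types land in their expected positions.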
  • a local search within that bounding box is performed; in the worst case, the number of 24×24 windows searched in the bounding box can be 36.
  • on the LHS of FIG. 10 the estimated bounding box is shown, and on the RHS a localized search performed within the bounding box to identify the face is shown.
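A sketch of this localized search: the coarse 6×6 grid leaves at most a 6-pixel uncertainty in each direction, which is where the worst case of 36 windows comes from. `classify_window` stands for the full-face classifier Cobject and is a hypothetical callable:

```python
def localized_search(box_x, box_y, classify_window, shifts=6):
    """Slide a 24x24 window with 1-pixel shifts over the shifts x shifts
    uncertainty region left inside the bounding box by the coarse grid,
    i.e. at most 36 windows.  `classify_window(x, y)` returns True when
    the full-face classifier accepts the window at (x, y); the first
    accepted position is returned, or None if none is accepted."""
    for dy in range(shifts):
        for dx in range(shifts):
            if classify_window(box_x + dx, box_y + dy):
                return (box_x + dx, box_y + dy)
    return None
```

So the expensive full-face classifier runs at most 36 times per candidate box, instead of at every position of the original dense scan.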
  • the present disclosure provides for the usage of a 6×6 grid spacing.
  • the technical advantages of using this grid spacing are as follows.
  • Number of classified patches: in the case of 6×6, the number of possible patches is 9, with an overlap area of 3 pixels. In the case of 4×4 the number of possible patches will be more, and in the case of 8×8 it will be less, all depending upon the overlapped area. Other sizes would result in too many or too few leaf nodes for the classification and regression trees (CART) or random forest classifier which is used for patch classification.
  • CART classification and regression trees
  • Background area covered: the chosen grid size may result in some of the background area being covered in the test/train images. Usually some pixels around the eyes and below the lips of a 24×24 face image are used. For the bounding box, the area is extended without actually zooming into the face image, which means that some background area such as the ears, hair and chin comes into the picture. If an 8×8 size is used, more of the background area may come into the picture, which will have an adverse effect on the patch classifier output.
  • FIG. 11 illustrates a detection flow chart for bounding-box-based detection in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates operations to detect presence of at least one face in the at least one image executed by one or more processors in accordance with an embodiment of the present subject matter.
  • a processing device comprising a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of at least one face in the at least one image, is disclosed.
  • the non-transitory computer readable storage medium storing instructions and the one or more processors are a part of a processing device.
  • the image patch map is created using the steps of: identifying said plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the patch classifier trained, and applying the patch classifier on the windows using a pre-defined grid spacing, thereby creating the image patch map.
  • the plurality of face patches are identified using the at least one window surrounding a face box, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • the pre-defined grid spacing is preferably of size 6×6.
  • the bounding box is estimated by applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • searching within the bounding box is a localized search and is characterized by the use of an aggressive grid spacing of size 1×1.
  • the instructions stored on the non-transitory computer readable storage medium perform a patch-based approach on the plurality of face patches identified for at least one window, thereby applying a full-face classifier in the bounding box.
  • training the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch vs. the rest of the face patches.
  • the evaluation of the patch classifier is the evaluation of the trained classifier on the target image, or on an input image received by the device, in which the face is detected.
  • FIG. 13 illustrating method 200 of detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • the method may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • the method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network.
  • computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • the order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in described processing device 102 as shown in FIG. 7 .
  • an image patch map is created.
  • the image map is created based on a plurality of face patches identified for at least one window in said at least one image.
  • a bounding box is estimated.
  • the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face, as shown in FIG. 9.
  • the bounding box is searched to detect presence of the at least one face in the at least one image.
  • searching within the bounding box is a localized search and is characterized by the use of an aggressive grid spacing of size 1×1.
  • FIG. 14 illustrating a method for creating 202 the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • the plurality of face patches are identified for the at least one window.
  • the at least one window is a detected face region of the at least one face.
  • the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • patch classifiers are trained using the plurality of face patches identified.
  • the trained patch classifier is evaluated.
  • the patch classifiers are applied on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • the pre-defined grid spacing is preferably of size 6×6.
  • FIG. 15 illustrating a special purpose processing device 102 for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • a processing device 102 comprises one or more storages 402 capable of storing one or more images and other data, and a face detector 404.
  • the processing device 102 is configured to perform operations comprising creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • the image patch map is created using the steps of identifying 302 the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training 304 a patch classifier using the plurality of face patches identified, evaluating 306 the trained patch classifier, and applying 308 the patch classifier on the windows using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • the pre-defined grid spacing is preferably of size 6×6.
  • the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • the bounding box is estimated based on a threshold, wherein the bounding box is preferably of size 36×36, and the threshold is based on the at least one face patch mapped to the at least one face.
  • the bounding box may be estimated based on a threshold value keeping a tolerance of 4, i.e., if 4 or more types in the mask match, then a bounding box of 36×36 is chosen.
  • searching 206 within the bounding box is a localized search 206 and is characterized by the use of an aggressive grid spacing of size 1×1.
  • the instructions stored on the non-transitory computer readable storage medium 108 perform a patch-based approach on the plurality of face patches identified for at least one window, thereby applying 308 a full-face classifier in the bounding box.
  • training 304 the patch classifier is characterized by use of a decision tree using at least one face patch from the plurality of face patches to identify at least one patch type, and one-vs.-all approach, wherein one-vs.-all approach considers one face patch vs. the rest of face patches.
  • processing device 102 comprises a processor(s) 104 and a non-transitory computer readable storage medium 108 coupled to the processor(s) 104 .
  • the non-transitory computer readable storage medium 108 may have a plurality of instructions stored in it. The instructions are executed using the processor 104 coupled to the non-transitory computer readable storage medium 108 .
  • the computer system 102 may include at least one processor 104, an interface(s) 106, which may be an input/output (I/O) interface, and a non-transitory computer readable storage medium(s) 108.
  • the at least one processor 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the at least one processor 104 is configured to fetch and execute computer-readable instructions stored in the non-transitory computer readable storage medium 108 .
  • the I/O interface 106 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like.
  • the I/O interface 106 may allow the computer system 102 to interact with a user directly or through the client devices (not shown). Further, the I/O interface 106 may enable the computer system 102 to communicate with other computing devices, such as web servers and external data servers (not shown).
  • the I/O interface 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as wireless LAN (WLAN), cellular, or satellite.
  • the I/O interface 106 may include one or more ports for connecting a number of devices to one another or to another server.
  • the non-transitory computer readable storage medium 108 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the memory 108 may include, but is not limited to, the plurality of instructions.
  • the memory may include the face detector 404, which further comprises a plurality of instructions configured to perform operations to detect the presence of said at least one face in the one or more images.
  • the operation may include but is not limited to creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect presence of the at least one face in the one or more images.
  • the processing device 102 may include storage(s) 402 configured to store at least one image received from the external devices or captured by the processing device 102 .
  • the processing device 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, as software on a server, and the like. It will be understood that the processing device 102 may be accessed by multiple users through one or more user devices, collectively referred to as users hereinafter, or through applications residing on the user devices. Examples of the processing device 102 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
  • FIG. 11 illustrates a face patch classification in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates face patch examples in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrates a face patch masking operation in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrates a subsequent localized search within the bounding box in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrates a flow chart for face detection in the present disclosure, in accordance with an embodiment of the present subject matter.
  • the patch classifier discussed in the sections above is derived using a decision tree or a random forest based classifier. However, it is well understood by a person skilled in the art that any other classifier may be used in place of these.
  • the feature representation in the present disclosure uses MCT. However, it is well understood by a person skilled in the art that any other feature type may be chosen, with some accuracy versus central processing unit (CPU) performance tradeoff.
  • a simple mask as shown in FIG. 13 is disclosed, but it is well understood by a person skilled in the art that several other variations of this mask can be employed.
  • One technique is to assign different weightings to different patches. It is possible to use a weight-based approach for patch classification, wherein every detected face patch has a corresponding weighting assigned to it. The final output is the summation of these weightings, thresholded by an empirical value that can be derived during the training phase.
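The weight-based patch classification described above can be sketched as follows. The per-patch weights and the score threshold here are hypothetical placeholders for values that, as the text notes, would be derived during the training phase.

```python
# Hypothetical per-patch-type weights for the 9 patch types; the real
# values would come out of training, not from this sketch.
PATCH_WEIGHTS = {1: 0.8, 2: 1.0, 3: 0.8,
                 4: 1.0, 5: 1.5, 6: 1.0,
                 7: 0.8, 8: 1.0, 9: 0.8}
FACE_SCORE_THRESHOLD = 4.0  # empirical value from the training phase

def weighted_patch_vote(detected_patch_types):
    """Sum the weights of the detected face patches and compare the
    total against an empirically derived threshold."""
    score = sum(PATCH_WEIGHTS[t] for t in detected_patch_types)
    return score >= FACE_SCORE_THRESHOLD

# A window where the centre patch (type 5) and its neighbours matched:
print(weighted_patch_vote([2, 4, 5, 6]))  # strong evidence of a face
print(weighted_patch_vote([1, 9]))        # too little evidence
```

With this scheme, a highly discriminative patch (such as the centre of the face) can carry more weight than a peripheral one, instead of every patch counting equally toward the tolerance.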
  • the present disclosure improves face detection time multi-fold without impacting the accuracy.
  • the present disclosure may be used in real-time systems even with HD videos/images.
  • the present disclosure is suitable for generic object detection and not constrained to face domain.
  • The face detection step is a precursor to any face recognition system. The technique described in this document ensures that even low-cost, low-power handheld terminals can have face detection logic built in.
  • HD video surveillance: With HD cameras becoming commodity hardware, it becomes all the more important to process the HD input video frames at higher speed within constrained hardware. The technique described here improves the speed of detection many fold.
  • Camera auto-focus based on face detection: Cameras with face detection notice when a face is in the frame and then set the autofocus and exposure settings to give priority to the face. Cameras usually have low CPU capability, and thus it is all the more important to achieve HD-frame face detection with lower CPU utilization.


Abstract

A method for detecting a presence of at least one face in at least one image comprises creating an image patch map based on a plurality of face patches identified for at least one window in the at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the at least one image. The present disclosure discloses the use of any classifier that works on top of any feature representation to identify face patches, and then using a masking system to identify bounding boxes.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of International Application No. PCT/CN2015/071466, filed on Jan. 23, 2015, which claims priority to Indian Patent Application No. IN3891/CHE/2014, filed on Aug. 7, 2014, both of which are hereby incorporated by reference in their entireties.
  • TECHNICAL FIELD
  • The present disclosure relates to the field of image processing, and in particular, to a face-detection processing method and a processing device for face detection.
  • BACKGROUND
  • Face detection is an important research direction in the field of computer vision because of its wide potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. Face detection determines whether there are any faces within a given image, and returns the location and extent of each face in the image if one or more faces are present.
  • Today, high definition (HD) cameras are an affordable commodity and are widely used in all types of applications, video surveillance for instance. Video analytics in the form of face detection has to match the high-resolution output from the cameras, and thus the performance of these algorithms is critical for the overall performance of the analytics.
  • Face detection algorithms are usually employed in smart phones and biometric devices to detect faces and later recognize them. Many smart phones today are equipped with a feature that can unlock the phone by matching faces. This application requires a fast face detection algorithm at its core. An exemplary output of a face detection engine is shown in FIG. 1.
  • One face detection framework, essentially an Adaptive Boosting (AdaBoost) based cascaded classifier subsystem, has produced excellent accuracy with real-time performance. AdaBoost is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. The processing cost, however, is directly proportional to the resolution of the image/video frame.
  • The general overall process of a face detection algorithm is shown in FIG. 2, and the modules of any face detection algorithm include, but are not limited to, the following:
  • Feature representation module: Any face detection system uses some sort of feature representation which can identify facial features and correlate them in a way such that the overall output can be judged as a face or a non-face. Examples of feature representations are Local Binary Patterns (LBP) and the Modified Census Transform (MCT). These are alternative representations (in place of raw pixel intensity) which usually have better invariance to illumination and to slight changes in pose/expressions.
  • Classifier module: Classifier provides a way to correlate multiple features. Examples are Cascaded Adaboost Classifier and Support Vector Machines (SVM).
  • Search space generator module: Given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the face detection logic has to search (using a sliding window approach) for the possibility of a face “at all locations” and “at all scales”. This usually results in the scanning of hundreds of thousands of windows even in a low-resolution image.
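As a rough illustration of the MCT feature representation mentioned in the feature representation module above, the following sketch computes a 9-bit census index per interior pixel by comparing each 3×3 neighbourhood against its mean. The bit ordering and border handling are implementation choices for this sketch, not details taken from the disclosure.

```python
import numpy as np

def mct(image):
    """Modified Census Transform (sketch): each interior pixel gets a
    9-bit index recording which pixels of its 3x3 neighbourhood are
    brighter than the neighbourhood mean."""
    img = image.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint16)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            block = img[y - 1:y + 2, x - 1:x + 2]
            bits = (block > block.mean()).flatten()
            out[y - 1, x - 1] = sum(int(b) << i for i, b in enumerate(bits))
    return out

patch = np.array([[10, 10, 10],
                  [10, 200, 10],
                  [10, 10, 10]], dtype=np.uint8)
print(mct(patch))  # [[16]]: only the centre bit (index 4) is set
```

Because the comparison is against the local mean rather than raw intensities, the resulting index is largely unaffected by uniform illumination changes, which is the invariance property the text attributes to such representations.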
  • There are also various algorithms, such as bounding box based algorithms, that try to identify the bounding box within which there is a possibility of a face being detected. The face detection classifier then has to search only within this bounding box, which improves the speed of detection dramatically. The estimated bounding box and the face box are shown in FIG. 3.
  • However, it may be understood that a face will not necessarily always be found within the estimated bounding box. Secondly, the estimated bounding box might not be centered on the face.
  • The sliding window approach is the most common technique for generating the search space used for object detection. A classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold. Cascades speed up the detection by rejecting the background quickly and spending more time on object-like regions. Even with cascades, however, scanning with fine grid spacing is still computationally expensive.
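To make the size of this search space concrete, the sketch below counts sliding-window positions over an image pyramid. The window size, step, and scale factor are illustrative assumptions, not parameters from the disclosure.

```python
def count_windows(frame_w, frame_h, win=24, step=1, scale=1.25):
    """Count the sliding windows a detector must evaluate across all
    locations and an image pyramid of scales (a rough sketch)."""
    total = 0
    w, h = frame_w, frame_h
    while w >= win and h >= win:
        total += ((w - win) // step + 1) * ((h - win) // step + 1)
        w, h = int(w / scale), int(h / scale)
    return total

# Even a low-resolution 320x240 frame yields over 150,000 windows
# at 1-pixel shifts; a coarser step shrinks this dramatically.
print(count_windows(320, 240))
print(count_windows(320, 240, step=6))
```

This is why rejecting most windows cheaply, or thinning the grid without losing detections, dominates the overall cost of detection.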
  • To increase the scanning speed, one approach is to train a classifier with perturbed training data to handle small shifts in the object location. But this significantly increases the number of weak classifiers required in the overall model, since the training data will be noisy (unaligned/perturbed).
  • Another simple approach is to increase the grid spacing (decreasing the number of windows being evaluated). Unfortunately, as the grid spacing is increased, the number of detections decreases rapidly.
  • As shown in the graph of FIG. 4 (bottom line), as the grid spacing increases there is an exponential drop in the accuracy of the regular full-face classifier.
  • A technique also exists to reduce the number of missed detections while increasing the grid spacing when using the sliding window approach for object detection.
  • The disclosed technique trains a classifier (Cpatch) using a decision tree, and this Cpatch classifier is evaluated on a regular grid, while the main classifier (Cobject) is placed on locations predicted by Cpatch. The left-hand side (LHS) of FIG. 5 shows a sample face with different patch locations shown in different dashed rectangles. A patch is of size wp×hp, where wp is the width of the patch and hp is the height of the patch, and all the patches are given as an input to the decision tree. The leaf nodes of the decision tree correspond to patches that have been identified. The right-hand side (RHS) of FIG. 5 shows the patches identified at the leaf nodes and the corresponding offsets for the full face.
  • The core idea of this technique is to use a decision tree based approach with very lightweight and simple features, such as pixel intensity values, and then use this Cpatch classifier as a pre-processing step. The actual Cobject classifier works only on the output from the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively little work left to do, resulting in improved performance. The face bounding box for this faster face detection technique is shown in FIG. 5.
  • There are other approaches which are based on skin color segmentation to speed up face detection algorithms. These techniques check the portions of the image where skin color is found and then apply face detection only on those pockets/sub-windows.
  • However, the technique discussed above results in a loss of accuracy. As shown in FIG. 4, the lines show the data of available techniques for face detection using the sliding window approach for object detection. The technique improves the accuracy, but it is still lower than desirable. For example, at 6×6 grid spacing the accuracy is shown to be about 80 percent (%), which is down by almost 15-18% from the peak. Even though all the disclosed and available techniques are used for accurate face detection, they still have a massive drawback in the amount of time spent in the detection process and in reducing the processing time while keeping a high accuracy rate. Further, the existing image processing or face detection algorithms require high-end processing, and accordingly require advanced high-end hardware, which involves higher cost. Furthermore, as the image processing or face detection algorithm requires high-end processing, the central processing unit (CPU) usage for this purpose is also increased.
  • In view of the drawbacks and limitations discussed above, there exists a need for an efficient face detection technique with higher detection accuracy and less processing time, and the technique must work on low-cost hardware with low CPU usage.
  • SUMMARY
  • This summary is provided to introduce concepts related to a processing device and method for faster face detection, and the concepts are further described below in the detailed description. The above-described problems are addressed and a technical solution is achieved in the present disclosure by providing face-detection processing methods and processing devices for faster face detection.
  • The present disclosure, in various embodiments, provides a face-detection processing method and processing devices for faster face detection.
  • In one implementation, in view of the difficulties discussed above, the objective of this disclosure is to provide an image processing method which is able to detect the presence of at least one face in at least one image, which will not require a large memory capacity, which will be capable of performing high-speed processing in real time or offline, which can be produced at a low cost, and which can detect specified patterns with certainty and a very small probability of false positives.
  • In one implementation, a face detection method that may be used even on lower-end hardware is disclosed. This technique ensures very low CPU usage for the face detection method, and thus it can be employed on low-cost hardware.
  • In one implementation, an efficient technique is disclosed to estimate the bounding box for the faces in the image, such that the subsequent full-face classifier can be applied within the bounding box only.
  • In one implementation, the technique disclosed in the present disclosure involves sliding the search window at higher pixel-shifts such that the total number of windows scanned is greatly reduced.
  • In one implementation, a mechanism is disclosed to locate face patches when the sliding pixel-shifts are increased, thus not impacting the output of the overall face classifier. In a regular face detection system, the grid spacing used is 1×1, i.e. a 1-pixel shift in each (x, y) direction. The present disclosure achieves a grid spacing of 6×6, i.e. the sliding window is shifted by 6 pixels in both the x and y directions. This achieves an overall reduction/window compression of 36:1, i.e. in ideal scenarios the performance increase can be ~36 (6×6) times.
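The 36:1 window compression claimed above can be checked with a quick count of window positions at each grid spacing; the 1920×1080 frame and 24×24 window used here are illustrative.

```python
def num_windows(frame_w, frame_h, win=24, step=1):
    """Window positions at a single scale for a sliding-window search
    with the given pixel shift (grid spacing) in x and y."""
    return ((frame_w - win) // step + 1) * ((frame_h - win) // step + 1)

dense = num_windows(1920, 1080, step=1)   # 1x1 grid: 1-pixel shifts
coarse = num_windows(1920, 1080, step=6)  # 6x6 grid: 6-pixel shifts
print(dense, coarse, round(dense / coarse, 1))  # the ratio approaches 36:1
```

The ratio is slightly below 36 only because of edge effects at the frame boundary; for large frames it converges to the ideal 6×6 = 36 compression.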
  • In one implementation, specific consideration is given to maintaining the accuracy of the present technique even at higher pixel shifts.
  • In one implementation, the technique to identify face patches rather than full face to estimate the bounding box at higher pixel shifts and then using this bounding box to search for the presence of a face is disclosed.
  • Accordingly, in one implementation, a method for detecting a presence of at least one face in at least one image is disclosed. The method comprises creating an image patch map based on a plurality of face patches identified for at least one window in said at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of said at least one face in said at least one image.
  • In one implementation, a processing device is disclosed. The processing device comprises a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of at least one face in said at least one image. The non-transitory computer readable storage medium storing the instructions and the one or more processors are part of the processing device.
  • In one implementation, a processing device is disclosed. The processing device comprises one or more storages capable of storing one or more images and other data, and a face detector. The processing device is configured to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, creating the image patch map comprises identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on said windows using a pre-defined grid spacing, thereby creating the image patch map.
  • In one implementation, the present disclosure provides certain advantages that may include, but are not limited to, the following:
  • The present disclosure improves face detection time multi-fold without impacting the accuracy.
  • The present disclosure may be used in real-time systems even with HD videos/images.
  • The present disclosure is suitable for generic object detection and not constrained to face domain.
  • These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
  • Corresponding reference characters indicate corresponding parts throughout the several views. The exemplification set out herein illustrates preferred embodiments of the disclosure, in one form, and such exemplification is not to be construed as limiting the scope of the disclosure in any manner.
  • FIG. 1 illustrates an output of a face detection engine, in accordance with an embodiment of the present subject matter.
  • FIG. 2 illustrates a flow of a face detection algorithm, in accordance with an embodiment of the present subject matter.
  • FIG. 3 illustrates an estimated bounding box and the face box, in accordance with an embodiment of the present subject matter.
  • FIG. 4 illustrates a graph showing the impact of grid spacing on detection accuracy, in accordance with an embodiment of the present subject matter.
  • FIG. 5 illustrates a face bounding box for faster face detection, in accordance with an embodiment of the present subject matter.
  • FIG. 6 illustrates a flow chart for face detection in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 7 illustrates a face patch classification in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 8 illustrates face patch examples in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 9 illustrates a face patch masking operation in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 10 illustrates a subsequent localized search within the bounding box in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 11 illustrates a detection flow chart for the bounding-box-based approach, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates an operation to detect the presence of at least one face in the at least one image, executed by one or more processors, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrates a method for detecting a presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrates a method for creating the image patch map, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrates a special purpose processing device for detecting a presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.
  • It is to be understood that the attached drawings are for purposes of illustrating the concepts of the disclosure and may not be to scale.
  • DETAILED DESCRIPTION
  • In order to make the aforementioned objectives, technical solutions and advantages of the present disclosure more comprehensible, embodiments are described below with accompanying figures.
  • The objects, advantages and other novel features of the present disclosure will be apparent to those skilled in the art from the following detailed description when read in conjunction with the appended claims and accompanying drawings.
  • Processing devices and methods for faster face detection are described. The disclosed technique uses a patch-based approach to identify face patches and then applies a full-face classifier in the bounding box.
  • The present technique is characterized by the way the patches are formed, the features that are used to train on the patches, and the way the bounding box is defined.
  • In one implementation, the present technique may be categorized into three major steps as shown in FIG. 6:
  • Applying the patch classifier step: The patch classifier is applied on windows derived using a grid spacing of 6×6.
  • Estimating the bounding box step: An image map is obtained from the “applying the patch classifier” step above, and then a mask is applied which checks how many of the patches of the window actually mapped to a face patch.
  • Searching within the bounding box step: Once the 36×36 bounding box is found, the method searches within that bounding box using an aggressive grid spacing of 1×1, i.e. a 1-pixel shift in each (x, y) direction.
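The three steps above can be sketched end-to-end as follows. This is a simplified sketch under stated assumptions: the two classifiers are toy stand-ins for the trained Cpatch and full-face classifiers, the patch classifier is assumed to return the set of patch types it detects in a window, and the mask check is reduced to the tolerance count.

```python
import numpy as np

GRID, BOX, FACE = 6, 36, 24  # grid spacing, bounding-box side, face template side

def detect(image, patch_classifier, face_classifier):
    """Step 1: apply the patch classifier on a coarse 6x6 grid to build
    an image patch map. Step 2: keep 36x36 areas where at least 4 patch
    types matched (the mask tolerance). Step 3: search densely (1x1)
    inside each estimated bounding box with the full-face classifier."""
    h, w = image.shape
    patch_map = {(x, y): patch_classifier(image[y:y + BOX, x:x + BOX])
                 for y in range(0, h - BOX + 1, GRID)
                 for x in range(0, w - BOX + 1, GRID)}
    boxes = [pos for pos, types in patch_map.items() if len(types) >= 4]
    faces = set()
    for bx, by in boxes:
        for dy in range(BOX - FACE + 1):
            for dx in range(BOX - FACE + 1):
                win = image[by + dy:by + dy + FACE, bx + dx:bx + dx + FACE]
                if face_classifier(win):
                    faces.add((bx + dx, by + dy))
    return faces

# Toy stand-ins: a "face" is a bright 24x24 square at (12, 12).
img = np.zeros((60, 60))
img[12:36, 12:36] = 1.0
toy_patch = lambda win: {1, 2, 3, 4, 5} if win.mean() > 0.2 else set()
toy_face = lambda win: win.min() > 0.5  # fires only when fully on the square
print(detect(img, toy_patch, toy_face))  # {(12, 12)}
```

Note how the expensive full-face classifier runs only inside the few surviving 36×36 boxes, which is the source of the speed-up described in the text.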
  • In one implementation, the face template size is 24×24 and the patches are formed using a 36×36 area centered on the 24×24 face area. This area is chosen considering the worst-case scenarios for 6×6 grid spacing. In one example, the face box is the actual area occupied by a face/object to be identified in the image. In one implementation, the face box may be obtained by any of the known face detectors or detection techniques available in the art.
  • In one implementation, training the patch classifier Cpatch is achieved by training a decision tree using the 9 different types of patch samples shown in FIG. 8. The leaf node of the tree identifies the patch type. An MCT technique may be used for feature representation rather than the simple binary tests mentioned in the earlier approach. For the decision tree, the nodes are split based on a one-versus (vs.)-all approach, i.e. one patch vs. the rest of the patches. Further, non-face samples may not be used in the training. It is understood that the goal of the Cpatch classifier is to identify the face patch accurately, not to distinguish between face patches and non-faces.
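The one-vs.-all node-splitting criterion can be illustrated with a minimal sketch. Scalar toy features and an exhaustive threshold search stand in here for the MCT features and full decision-tree training, so nothing below is the disclosure's actual training procedure, only the splitting idea.

```python
import numpy as np

def best_one_vs_all_split(features, labels, target_type):
    """Pick the (feature index, threshold) that best separates patches
    of `target_type` from all other patch types -- the one-vs.-all
    criterion used when splitting a tree node (a simplified sketch)."""
    is_target = labels == target_type
    best = (None, None, -1.0)
    for f in range(features.shape[1]):
        for thr in np.unique(features[:, f]):
            pred = features[:, f] > thr
            acc = max((pred == is_target).mean(),
                      (pred != is_target).mean())
            if acc > best[2]:
                best = (f, thr, acc)
    return best

# Toy data: 3 "patch types", 2 features per patch sample.
feats = np.array([[0.1, 0.9], [0.2, 0.8],   # type 1
                  [0.9, 0.1], [0.8, 0.2],   # type 2
                  [0.5, 0.5], [0.6, 0.4]])  # type 3
labs = np.array([1, 1, 2, 2, 3, 3])
f, thr, acc = best_one_vs_all_split(feats, labs, target_type=2)
print(f, thr, acc)  # feature 0 separates type 2 from the rest perfectly
```

A real implementation would repeat this search recursively per node over MCT feature indices, growing the tree until each leaf corresponds to one of the nine patch types.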
  • In one implementation, the Cpatch classifier is evaluated by applying it on all the windows with a grid spacing of 6×6. As every window yields some patch type, an image patch map based on the identified patches is created for every window at a grid spacing of 6×6. The image patch types and their formation are shown in FIG. 8. In one example, the image patch map may include different patch location information arranged in different rectangles. A patch may be of size 4×4, 6×6, 8×8, and so on, and all the patches are given as an input to the decision tree. The leaf nodes of the decision tree correspond to patches that have been identified. In one example, the image patch map may include an arrangement of pixel locations of different patches. Further, the image patch map may be obtained by any of the existing techniques.
  • In one implementation, once the image patch map is obtained, a matrix mask of [1, 2, 3, 4, 5, 6, 7, 8, 9] is applied to check how many patches around the face have matched. In one implementation, a tolerance of 4 may be considered, i.e. if 4 or more types in the mask match, the 36×36 area is chosen as a possible face bounding box. The face patch masking operation is shown in FIG. 9.
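A minimal sketch of this masking step follows, assuming the 3×3 mask encodes the expected spatial arrangement of the nine patch types and using the tolerance of 4 mentioned above; the exact mask layout is an illustrative assumption.

```python
import numpy as np

# Assumed 3x3 matrix mask of expected patch types around a face centre.
MASK = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
TOLERANCE = 4  # 4 or more matching types => possible face bounding box

def is_face_bounding_box(patch_map_3x3):
    """Apply the matrix mask to a 3x3 neighbourhood of the image patch
    map and count how many patch types match their expected position."""
    return int((patch_map_3x3 == MASK).sum()) >= TOLERANCE

good = np.array([[1, 2, 0], [4, 5, 0], [0, 0, 0]])  # 4 types match
bad  = np.array([[0, 0, 0], [0, 5, 0], [0, 0, 9]])  # only 2 match
print(is_face_bounding_box(good), is_face_bounding_box(bad))
```

The tolerance makes the box estimate robust to a partly occluded or poorly lit face, where only some of the nine patches are classified correctly.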
  • In one implementation, after the face bounding box is estimated, a local search within that bounding box is performed; in the worst case, the number of 24×24 windows searched in the bounding box can be 36. As shown in FIG. 10, the LHS shows the bounding box that is estimated, and the RHS shows a localized search done within the bounding box to identify the face.
  • In one implementation, the present disclosure provides the usage of 6×6 grid spacing. The technical advantages of using this grid spacing are as follows.
  • The number of classified patches: In the case of 6×6, the number of possible patches is 9, with an overlap area of 3 pixels. In the case of 4×4 the number of possible patches will be more, and in the case of 8×8 it will be less, all depending upon the overlapped area. This would result in too many or too few leaf nodes for the classification and regression trees (CART) or random forest classifier used for patch classification.
  • Background area covered: The chosen grid size may result in some of the background area being covered in the test/train images. Usually some pixels around the eyes and below the lips are used for a 24×24 face image. For the bounding box, the area is extended without actually zooming in on the face image; this means that some of the background area, such as the ears, hair, and chin, will come into the picture. If an 8×8 size is used, more of the background area may come into the picture, which will have an adverse effect on the patch classifier output.
  • While aspects of described processing devices and methods for faster face detection may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
  • While illustrative embodiments of the present disclosure are described below, it will be appreciated that the present disclosure may be practiced without the specified details, and that numerous implementation-specific decisions may be made to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, which will vary from one system to another. While such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. For example, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring or unduly limiting the present disclosure. Such descriptions and representations are used by those skilled in the art to describe and convey the substance of their work to others skilled in the art. The present disclosure will now be described with reference to the drawings described below.
  • FIG. 11 illustrates a detection flow chart for the bounding-box-based approach, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates operations to detect the presence of at least one face in the at least one image, executed by one or more processors, in accordance with an embodiment of the present subject matter.
  • In one implementation, a processing device is disclosed. The processing device comprises a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of at least one face in the at least one image. The non-transitory computer readable storage medium storing the instructions and the one or more processors are part of the processing device.
  • In one implementation, the image patch map is created using the steps of: identifying said plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on the windows using a pre-defined grid spacing, thereby creating the image patch map.
  • In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box, wherein the at least one window is sized to hold the plurality of face patches, centered on the face template.
  • In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • In one implementation, the pre-defined grid spacing is preferably of size 6×6.
  • In one implementation, the bounding box is estimated by applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • In one implementation, searching within the bounding box is a localized search and is characterized by using an aggressive grid spacing of size 1×1.
  • In one implementation, the instructions stored in the non-transitory computer readable storage medium perform a patch-based approach for the plurality of face patches identified for the at least one window, thereby applying a full-face classifier in the bounding box.
  • In one implementation, training the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and by a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch against the rest of the face patches.
  • In one implementation, evaluating the patch classifier means applying the trained classifier to the target image, i.e., the input image received by the device in which a face is to be detected. There are two sets of images: one from which the classifier learns that a particular structure is a face, and another to which the trained classifier is applied. Evaluation thus refers to the application of the trained classifier to the target image in which a face is to be detected.
  • FIG. 13 illustrating method 200 of detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in described processing device 102 as shown in FIG. 7.
  • At step 202, an image patch map is created. In one implementation, the image patch map is created based on a plurality of face patches identified for at least one window in the at least one image.
  • At step 204, a bounding box is estimated. In one implementation, the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face, as shown in FIG. 9.
  • At step 206, the bounding box is searched to detect the presence of the at least one face in the at least one image. In one implementation, searching within the bounding box is a localized search characterized by an aggressive grid spacing of size 1×1.
  • FIG. 14 illustrating a method for creating 202 the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • At step 302, the plurality of face patches are identified for the at least one window. In one implementation, the at least one window is a detected face region of the at least one face. In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches of a face template centered within the window. In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
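As a concrete sketch of this geometry: the 36×36 window and 24×24 face template sizes come from the disclosure, but the 12×12 patch size and the non-overlapping patch grid are illustrative assumptions, since the disclosure does not fix them.

```python
WINDOW = 36    # window size from the disclosure
TEMPLATE = 24  # face template size from the disclosure
PATCH = 12     # patch size: an assumption for illustration

def extract_patches(image, top, left):
    """Cut a WINDOW x WINDOW region out of the image (a list of pixel
    rows) and split it into non-overlapping PATCH x PATCH face patches;
    the TEMPLATE x TEMPLATE face template sits centered in the window."""
    patches = []
    for y in range(0, WINDOW - PATCH + 1, PATCH):
        for x in range(0, WINDOW - PATCH + 1, PATCH):
            patch = [row[left + x:left + x + PATCH]
                     for row in image[top + y:top + y + PATCH]]
            patches.append(patch)
    return patches

# A 36x36 window split into 12x12 patches yields a 3x3 grid of 9 patches.
img = [[0] * 100 for _ in range(100)]
patches = extract_patches(img, 10, 10)
```

Each of these patches would then be labeled with a patch type (eye, nose, mouth region, and so on) for training the patch classifier.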
  • At step 304, a patch classifier is trained using the plurality of face patches identified.
  • At step 306, the trained patch classifier is evaluated.
  • At step 308, the patch classifier is applied on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map. In one implementation, the pre-defined grid spacing is preferably of size 6×6.
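The grid-spaced application of the patch classifier can be sketched as follows. The 6×6 grid spacing is from the disclosure; the 12×12 patch size and the simple thresholding stand-in for the trained classifier are assumptions for illustration.

```python
GRID = 6    # pre-defined grid spacing from the disclosure
PATCH = 12  # patch size: an assumption for illustration

def classify_patch(patch):
    """Stand-in for the trained patch classifier: returns a patch-type
    id (0 = non-face). A real system would run the decision tree here."""
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return 1 if mean > 127 else 0

def image_patch_map(image):
    """Apply the patch classifier across the image on a GRID-spaced
    lattice, recording one patch-type id per grid position."""
    h, w = len(image), len(image[0])
    pmap = []
    for y in range(0, h - PATCH + 1, GRID):
        row = []
        for x in range(0, w - PATCH + 1, GRID):
            patch = [r[x:x + PATCH] for r in image[y:y + PATCH]]
            row.append(classify_patch(patch))
        pmap.append(row)
    return pmap

dark = [[0] * 48 for _ in range(48)]
pmap = image_patch_map(dark)
```

Because the classifier is evaluated only every 6 pixels rather than at every position, building the patch map is far cheaper than an exhaustive full-face scan.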
  • FIG. 15 illustrating a special purpose processing device 102 for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • In one implementation, a processing device 102 is disclosed. The processing device 102 comprises one or more storages 402 capable of storing one or more images and other data, and a face detector 404. The processing device 102 is configured to perform operations comprising creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, the image patch map is created by identifying 302 the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face; training 304 a patch classifier using the identified face patches; evaluating 306 the trained patch classifier; and applying 308 the patch classifier on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches of a face template centered within the window.
  • In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • In one implementation, the pre-defined grid spacing is preferably of size 6×6.
  • In one implementation, the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face. The bounding box is estimated based on a threshold, wherein the bounding box is preferably of size 36×36, and the threshold is based on the at least one face patch mapped to the at least one face. In one example, the bounding box may be estimated based on a threshold value keeping a tolerance of 4, i.e., if 4 or more patch types in the mask match, then a bounding box of 36×36 is chosen.
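A minimal sketch of this masking step follows; the 36×36 box size and the tolerance of 4 come from the example above, while the mask layout itself is hypothetical (the actual patch-type arrangement is shown in the figures and not reproduced here).

```python
GRID = 6       # one patch-map cell corresponds to a 6-pixel step
BOX = 36       # bounding box size from the disclosure
TOLERANCE = 4  # "4 or more types in the mask matching", per the example

# Hypothetical 6x6 mask of expected patch-type ids (0 = don't care).
MASK = [[0, 1, 1, 1, 1, 0],
        [0, 2, 0, 0, 2, 0],
        [0, 0, 3, 3, 0, 0],
        [0, 0, 4, 4, 0, 0],
        [0, 5, 5, 5, 5, 0],
        [0, 0, 0, 0, 0, 0]]

def estimate_boxes(pmap):
    """Slide the mask over the image patch map; wherever at least
    TOLERANCE non-zero mask entries match the map, emit a BOX x BOX
    bounding box anchored at the corresponding image position."""
    boxes = []
    rows, cols = len(pmap), len(pmap[0])
    for i in range(rows - len(MASK) + 1):
        for j in range(cols - len(MASK[0]) + 1):
            matches = sum(1
                          for di, mrow in enumerate(MASK)
                          for dj, want in enumerate(mrow)
                          if want and pmap[i + di][j + dj] == want)
            if matches >= TOLERANCE:
                boxes.append((i * GRID, j * GRID, BOX, BOX))
    return boxes
```

The tolerance makes the estimate robust: a face is proposed even when only a subset of its expected patch types is detected.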
  • In one implementation, searching 206 within the bounding box is a localized search and is characterized by an aggressive grid spacing of size 1×1.
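The localized search can be sketched as an exhaustive per-pixel scan restricted to the estimated box; the `face_classifier` callable below is a placeholder for whatever full face classifier the system uses.

```python
TEMPLATE = 24  # full-face classifier window, from the disclosure

def localized_search(image, box, face_classifier):
    """Scan a full-face classifier at 1x1 (per-pixel) grid spacing,
    but only inside the estimated bounding box instead of over the
    whole image."""
    top, left, h, w = box
    hits = []
    for y in range(top, top + h - TEMPLATE + 1):
        for x in range(left, left + w - TEMPLATE + 1):
            crop = [row[x:x + TEMPLATE] for row in image[y:y + TEMPLATE]]
            if face_classifier(crop):
                hits.append((y, x))
    return hits

img = [[0] * 50 for _ in range(50)]
# With a classifier that always fires, a 36x36 box yields 13x13 = 169
# candidate positions; the 1x1 grid spacing checks every one of them.
hits = localized_search(img, (0, 0, 36, 36), lambda crop: True)
```

The aggressive 1×1 spacing is affordable precisely because it runs only inside the small estimated box, not across the full frame.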
  • In one implementation, the instructions stored in the non-transitory computer readable storage medium 108 implement a patch-based approach in which the plurality of face patches are identified for the at least one window, and a full face classifier is then applied 308 only within the bounding box.
  • In one implementation, training 304 the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and by a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch against the rest of the face patches.
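The one-vs.-all relabelling underlying this training scheme can be sketched as follows; the patch-type ids and toy data are illustrative, and per the disclosure a decision tree would then be trained on each resulting binary problem.

```python
def one_vs_all_labels(patch_types, positive_type):
    """One-vs.-all relabelling: the chosen patch type becomes the
    positive class (1) and every other patch type the negative
    class (0). One such binary problem is built per patch type."""
    return [1 if t == positive_type else 0 for t in patch_types]

# Toy patch-type ids for eight training patches (values illustrative,
# e.g. 1 = eye, 2 = nose, 3 = mouth region, ...).
types = [0, 1, 2, 1, 0, 2, 3, 4]
labels_for_type_1 = one_vs_all_labels(types, 1)

# One binary training set per patch type; a decision tree (or random
# forest) would be fit on each.
problems = {t: one_vs_all_labels(types, t) for t in set(types)}
```

At detection time, the per-type classifiers together assign each grid position a patch-type id, which is what populates the image patch map.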
  • In one implementation, processing device 102 comprises a processor(s) 104 and a non-transitory computer readable storage medium 108 coupled to the processor(s) 104. The non-transitory computer readable storage medium 108 may have a plurality of instructions stored in it. The instructions are executed using the processor 104 coupled to the non-transitory computer readable storage medium 108.
  • In one embodiment, the computer system 102 may include at least one processor 104, an interface(s) 106, which may be an input/output (I/O) interface, and a non-transitory computer readable storage medium(s) 108. The at least one processor 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 104 is configured to fetch and execute computer-readable instructions stored in the non-transitory computer readable storage medium 108.
  • The I/O interface 106 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 106 may allow the computer system 102 to interact with a user directly or through the client devices (not shown). Further, the I/O interface 106 may enable the computer system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as wireless LAN (WLAN), cellular, or satellite. The I/O interface 106 may include one or more ports for connecting a number of devices to one another or to another server.
  • The non-transitory computer readable storage medium 108 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 108 may include, but is not limited to, the plurality of instruction(s). In one implementation, the memory may include the face detector 404, which comprises a plurality of instruction(s) configured to perform operations to detect the presence of the at least one face in the one or more images. The operations may include, but are not limited to, creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, the processing device 102 may include storage(s) 402 configured to store at least one image received from the external devices or captured by the processing device 102.
  • Although the present subject matter is explained considering that the present system 102 is implemented as a processing device 102, it may be understood that the processing device 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, as software on a server, and the like. It will be understood that the processing device 102 may be accessed by multiple users through one or more user devices, collectively referred to as users hereinafter, or through applications residing on the user devices. Examples of the processing device 102 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
  • FIG. 11 illustrating a face patch classification in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrating face patch examples in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrating a face patch masking operation in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrating a subsequent localized search within bounding box in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrating a flow chart for face detection in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • In one implementation, the patch classifier discussed in the above sections is derived using a decision tree or a random forest based classifier. However, it is well understood by the person skilled in the art that any other classifier may be used in place of these.
  • Secondly, the present disclosure uses MCT for feature representation. However, it is well understood by the person skilled in the art that any other feature type may be chosen, with an accompanying accuracy versus central processing unit (CPU) performance tradeoff.
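Assuming MCT here refers to the Modified Census Transform commonly used in face detection, the feature computation can be sketched as below; the exact bit ordering within the 9-bit code is an assumption for illustration.

```python
def mct(image):
    """Modified Census Transform (one common formulation): encode each
    3x3 neighbourhood as a 9-bit index, setting a bit for every pixel
    that exceeds the neighbourhood mean. The result is illumination-
    invariant, which is why MCT suits low-cost detection pipelines."""
    h, w = len(image), len(image[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            block = [image[y + dy][x + dx]
                     for dy in range(3) for dx in range(3)]
            mean = sum(block) / 9.0
            code = 0
            for value in block:  # most-significant bit first (assumed)
                code = (code << 1) | (1 if value > mean else 0)
            out[y][x] = code
    return out

flat = [[7] * 5 for _ in range(5)]         # uniform region -> code 0
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # only centre exceeds mean
```

A classifier then works on these 9-bit codes (512 possible values per position) instead of raw intensities.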
  • Next, a simple mask as shown in FIG. 13 is disclosed, but it is well understood by the person skilled in the art that several other variations of this mask can be employed. One technique is to assign a different weight to each patch. It is possible to use a weight-based approach for patch classification wherein every detected face patch has a corresponding weight assigned to it. The final output is the summation of those weights, thresholded by an empirical value that can be derived during the training phase.
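The weight-based variation can be sketched as follows; the weight values and the threshold are placeholders, since the disclosure states both would be derived empirically during training.

```python
# Hypothetical per-patch-type weights (e.g. eyes count more than chin);
# real values and the threshold would be learned during training.
WEIGHTS = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}
THRESHOLD = 0.50  # empirical value: an assumption here

def weighted_face_score(detected_types):
    """Sum the weight of every detected face patch type; unknown
    types contribute nothing."""
    return sum(WEIGHTS.get(t, 0.0) for t in detected_types)

def is_face(detected_types):
    """Final decision: the weighted sum thresholded by the empirical
    value, replacing the fixed match-count tolerance of the simple mask."""
    return weighted_face_score(detected_types) >= THRESHOLD
```

Compared with the fixed tolerance, weighting lets strongly discriminative patch types (e.g. eyes) dominate the decision.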
  • Thus, it is well understood by the person skilled in the art that the present disclosure encompasses an idea of using any classifier which works on top of any feature representation to identify face patches and then using a masking system to identify bounding boxes.
  • Although implementations for a processing device and method for faster face detection have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the processing device and method for faster face detection.
  • Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include the following.
  • The present disclosure reduces face detection time multi-fold without impacting accuracy.
  • The present disclosure may be used in real-time systems even with HD videos/images.
  • The present disclosure is suitable for generic object detection and not constrained to face domain.
  • Exemplary embodiments discussed above may provide certain applicable areas of the present disclosure. Though not required to practice aspects of the disclosure, these applications of the disclosure may include:
  • Handheld Terminals/devices: The face detection step is a precursor to any face recognition system. The technique mentioned in this document ensures that even low-cost, low-power handheld terminals can have face detection logic built in.
  • HD Video Surveillance: With HD cameras becoming commodity hardware, it becomes all the more important to process the HD input video frames at higher speed within the constrained hardware. The technique mentioned here improves the speed of detection many-fold.
  • Camera auto-focus based on face detection: Cameras with face detection notice when a face is in the frame and then set the autofocus and exposure settings to give priority to the face. Cameras usually have low CPU capability and thus it is more important to achieve HD frame face detection with lower CPU utilization.
  • Finally, it should be understood that the above embodiments are only used to explain, and not to limit, the technical solution of the present application. Despite the detailed description of the present application with reference to the above preferred embodiments, it should be understood that various modifications, changes or equivalent replacements can be made by those skilled in the art without departing from the scope of the present application as covered in the claims of the present application.

Claims (17)

1. A method for detecting a presence of at least one face in at least one image, comprising:
creating an image patch map based on a plurality of face patches identified for at least one window in the at least one image;
estimating a bounding box based on the image patch map; and
searching within the bounding box to detect the presence of the at least one face in the at least one image.
2. The method of claim 1, wherein creating the image patch map comprises:
identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face;
training a patch classifier using the plurality of face patches identified; and
applying the patch classifier on the at least one window using a pre-defined grid spacing to create the image patch map.
3. The method of claim 1, wherein the plurality of face patches are identified using a window of a window size surrounding a face box, and wherein the window holds the plurality of face patches in a face template centered on a face template size.
4. The method of claim 1, wherein the window size is of 36 pixels by 36 pixels, and wherein the face template size is of 24 pixels by 24 pixels.
5. The method of claim 1, wherein a size of the pre-defined grid spacing is 6 pixels by 6 pixels.
6. The method of claim 1, wherein estimating the bounding box comprises applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face to estimate the bounding box based on a threshold, wherein a size of the bounding box is 36 pixels by 36 pixels, and wherein the threshold is based on the at least one face patch mapped with the at least one face.
7. The method of claim 1, wherein searching within the bounding box comprises a localized searching and is performed using a grid spacing of size 1 pixel by 1 pixel.
8. The method of claim 1, wherein training the patch classifier comprises using at least one face patch in a decision tree from the plurality of face patches to identify at least one patch type, wherein training the patch classifier comprises a one-vs-all approach, and wherein the one-vs-all approach considers one face patch against a remainder of the face patches.
9. A processing device for detecting a presence of at least one face in at least one image, the processing device comprising:
a processor;
a memory coupled to the processor for executing a plurality of instructions present in the memory, wherein the execution of the instructions cause the processor to perform operations comprising:
creating an image patch map based on a plurality of face patches identified for at least one window in at least one image;
estimating a bounding box based on the image patch map; and
searching within the bounding box to detect the presence of the at least one face in the at least one image.
10. The processing device of claim 9, wherein the processor is configured to perform operations comprising:
identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face;
training a patch classifier using the plurality of face patches identified; and
applying the patch classifier on the at least one window using a pre-defined grid spacing to create the image patch map.
11. The processing device of claim 10, wherein the plurality of face patches are identified using a window of a window size surrounding a face box to identify a plurality of face patches, and wherein the window holds the plurality of face patches in a face template centered on the face template size.
12. The processing device of claim 9, wherein the at least one window size is of 36 pixels by 36 pixels, and wherein the face template size is of 24 pixels by 24 pixels.
13. The processing device of claim 9, wherein a size of the pre-defined grid spacing is 6 pixels by 6 pixels.
14. The processing device of claim 9, wherein the processor is configured to perform operations comprising applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face to estimate the bounding box based on a threshold, wherein a size of the bounding box is 36 pixels by 36 pixels, and wherein the threshold is based on the at least one face patch mapped with the at least one face.
15. The processing device of claim 9, wherein searching within the bounding box comprises a localized searching and is performed using a grid spacing of size of 1 pixel by 1 pixel.
16. The processing device of claim 9, wherein training the patch classifier comprises using at least one face patch in a decision tree from the plurality of face patches to identify at least one patch type, wherein training the patch classifier comprises a one-vs-all approach, and wherein the one-vs-all approach considers one face patch against a remainder of face patches.
17. The processing device of claim 9, wherein the memory is further configured to store at least one image which is used for detecting the presence of the at least one face.
US15/416,533 2014-08-07 2017-01-26 Processing device and method for face detection Active 2035-02-05 US10296782B2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN3891CH2014 2014-08-07
IN3891/CHE/2014 2014-08-07
PCT/CN2015/071466 WO2016019709A1 (en) 2014-08-07 2015-01-23 A processing device and method for face detection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/071466 Continuation WO2016019709A1 (en) 2014-08-07 2015-01-23 A processing device and method for face detection

Publications (2)

Publication Number Publication Date
US20170161549A1 true US20170161549A1 (en) 2017-06-08
US10296782B2 US10296782B2 (en) 2019-05-21

Family

ID=55263095

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/416,533 Active 2035-02-05 US10296782B2 (en) 2014-08-07 2017-01-26 Processing device and method for face detection

Country Status (4)

Country Link
US (1) US10296782B2 (en)
EP (1) EP3167407A4 (en)
CN (1) CN106462736B (en)
WO (1) WO2016019709A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019651B1 (en) * 2016-12-25 2018-07-10 Facebook, Inc. Robust shape prediction for face alignment

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107851192B (en) * 2015-05-13 2023-04-14 北京市商汤科技开发有限公司 Apparatus and method for detecting face part and face
CN110660067A (en) * 2018-06-28 2020-01-07 杭州海康威视数字技术股份有限公司 Target detection method and device
CN109376717A (en) * 2018-12-14 2019-02-22 中科软科技股份有限公司 Personal identification method, device, electronic equipment and the storage medium of face comparison
CN112215154B (en) * 2020-10-13 2021-05-25 北京中电兴发科技有限公司 Mask-based model evaluation method applied to face detection system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070154096A1 (en) * 2005-12-31 2007-07-05 Jiangen Cao Facial feature detection on mobile devices
US20110001850A1 (en) * 2008-02-01 2011-01-06 Gaubatz Matthew D Automatic Redeye Detection
US20140241623A1 (en) * 2013-02-22 2014-08-28 Nec Laboratories America, Inc. Window Dependent Feature Regions and Strict Spatial Layout for Object Detection
US20150110352A1 (en) * 2013-10-23 2015-04-23 Imagination Technologies Limited Skin Colour Probability Map
US20150186748A1 (en) * 2012-09-06 2015-07-02 The University Of Manchester Image processing apparatus and method for fitting a deformable shape model to an image using random forest regression voting
US20150243031A1 (en) * 2014-02-21 2015-08-27 Metaio Gmbh Method and device for determining at least one object feature of an object comprised in an image
US20150347822A1 (en) * 2014-05-29 2015-12-03 Beijing Kuangshi Technology Co., Ltd. Facial Landmark Localization Using Coarse-to-Fine Cascaded Neural Networks

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715325A (en) 1995-08-30 1998-02-03 Siemens Corporate Research, Inc. Apparatus and method for detecting a face in a video image
US6263113B1 (en) 1998-12-11 2001-07-17 Philips Electronics North America Corp. Method for detecting a face in a digital image
JP3639452B2 (en) * 1999-02-12 2005-04-20 シャープ株式会社 Image processing device
US7155058B2 (en) * 2002-04-24 2006-12-26 Hewlett-Packard Development Company, L.P. System and method for automatically detecting and correcting red eye
KR100474848B1 (en) * 2002-07-19 2005-03-10 삼성전자주식회사 System and method for detecting and tracking a plurality of faces in real-time by integrating the visual ques
US7916126B2 (en) * 2007-06-13 2011-03-29 Apple Inc. Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US8712109B2 (en) * 2009-05-08 2014-04-29 Microsoft Corporation Pose-variant face recognition using multiscale local descriptors
CN101923637B (en) * 2010-07-21 2016-03-16 康佳集团股份有限公司 A kind of mobile terminal and method for detecting human face thereof and device
US8774519B2 (en) * 2012-08-07 2014-07-08 Apple Inc. Landmark detection in digital images
CN103489174B (en) * 2013-10-08 2016-06-29 武汉大学 A kind of face super-resolution method kept based on residual error
CN103824049A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded neural network-based face key point detection method
CN103870824B (en) * 2014-03-28 2017-10-20 海信集团有限公司 A kind of face method for catching and device during Face datection tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Köstinger, Martin, et al. "Robust face detection by simple means." DAGM 2012 CVAW workshop. 2012. *

Also Published As

Publication number Publication date
EP3167407A1 (en) 2017-05-17
CN106462736B (en) 2020-11-06
WO2016019709A1 (en) 2016-02-11
CN106462736A (en) 2017-02-22
US10296782B2 (en) 2019-05-21
EP3167407A4 (en) 2017-11-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: HUAWEI TECHNOLOGIES CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARIAPPAN, VIJAYACHANDRAN;JADHAV, RAHUL ARVIND;SHARMA, PUNEET BALMUKUND;REEL/FRAME:041386/0451

Effective date: 20150728

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4