US20170161549A1 - Processing Device and Method for Face Detection - Google Patents

Processing Device and Method for Face Detection

Info

Publication number
US20170161549A1
Authority
US
United States
Prior art keywords
face
patch
image
patches
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/416,533
Other versions
US10296782B2 (en)
Inventor
Vijayachandran Mariappan
Rahul Arvind JADHAV
Puneet Balmukund Sharma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Assigned to HUAWEI TECHNOLOGIES CO., LTD. reassignment HUAWEI TECHNOLOGIES CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JADHAV, Rahul Arvind, MARIAPPAN, Vijayachandran, SHARMA, Puneet Balmukund
Publication of US20170161549A1 publication Critical patent/US20170161549A1/en
Application granted granted Critical
Publication of US10296782B2 publication Critical patent/US10296782B2/en
Legal status: Active

Classifications

    • G06K9/00234
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G06K9/00288
    • G06K9/4604
    • G06K9/66
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06K2009/4666
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467 Encoded features or binary features, e.g. local binary patterns [LBP]

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to a face-detection processing method and a processing device for face detection.
  • Face detection is an important research direction in the field of computer vision because of its wide range of potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. Face detection determines whether there are any faces within a given image, and returns the location and extent of each face if one or more faces are present.
  • High definition (HD) cameras are an affordable commodity and are widely used in all types of applications, video surveillance for instance.
  • Video analytics in the form of face detection has to keep up with the high-resolution output of these cameras, so the performance of the detection algorithms is critical to the overall performance of the analytics.
  • Face detection algorithms are commonly employed in smartphones and biometric devices to detect faces and later recognize them. Most smartphones today are equipped with a feature that unlocks the phone by matching faces, and this application requires a fast face detection algorithm at its core.
  • the exemplary output of a face detection engine is shown in FIG. 1 .
  • AdaBoost Adaptive Boosting
  • AdaBoost is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. This performance though is directly proportional to the resolution of the image/video frame.
  • FIG. 2 The general overall process of face detection algorithm is shown in FIG. 2 and the modules of any face detection algorithm includes but not limited to:
  • Feature representation module: any face detection system uses some form of feature representation that can identify facial features and correlate them in a way such that the overall output can be judged as a face or a non-face.
  • Examples of feature representations are Local Binary Patterns (LBP) and Modified Census Transform (MCT). These are alternative representations (in place of raw pixel intensity) that usually have better invariance to illumination and to slight changes in pose/expression.
  • LBP Local Binary Patterns
  • MCT Modified Census Transform
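The basic LBP representation named above can be sketched in a few lines; the helper name and the bit ordering (clockwise from the top-left) are illustrative choices, not taken from the disclosure:

```python
def lbp_code(patch3x3):
    """Compute the basic 8-bit Local Binary Pattern code for a 3x3
    neighbourhood, given as a list of 3 rows of 3 grey values.
    Each neighbour is compared against the centre pixel; a neighbour
    >= centre contributes a 1 bit, clockwise from the top-left."""
    c = patch3x3[1][1]
    # clockwise order: TL, T, TR, R, BR, B, BL, L
    neighbours = [patch3x3[0][0], patch3x3[0][1], patch3x3[0][2],
                  patch3x3[1][2], patch3x3[2][2], patch3x3[2][1],
                  patch3x3[2][0], patch3x3[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code
```

Because only the sign of each comparison matters, the code is unchanged when the whole neighbourhood is brightened or darkened uniformly, which is the illumination invariance mentioned above.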
  • Classifier module: a classifier provides a way to correlate multiple features. Examples are the cascaded AdaBoost classifier and Support Vector Machines (SVM).
  • SVM Support Vector Machines
  • Search space generator module: given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the face detection logic has to search (using a sliding window approach) for the possibility of a face “at all locations” and “at all scales”. This usually results in the scanning of hundreds of thousands of windows even in a low-resolution image.
  • the estimated bounding box might not be centered on the face.
  • the sliding window approach is the most common technique for generating the search space used for object detection.
  • a classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold.
  • Cascades speed up the detection by rejecting the background quickly and spending more time on object-like regions. Even though cascades were introduced, scanning with fine grid spacing is still computationally expensive.
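The window counts behind this cost can be made concrete with a small sketch; the 640×480 frame size, 24×24 window, and 1.25 pyramid factor below are illustrative assumptions, not figures from the disclosure:

```python
def count_windows(img_w, img_h, win=24, step=1, scale=1.25):
    """Count the sliding-window evaluations needed to search an image
    at all locations and scales.  The image is repeatedly shrunk by
    `scale` (an image pyramid), and a win x win window is slid with
    the given pixel step at every pyramid level."""
    total = 0
    w, h = img_w, img_h
    while w >= win and h >= win:
        nx = (w - win) // step + 1   # window positions along x
        ny = (h - win) // step + 1   # window positions along y
        total += nx * ny
        w, h = int(w / scale), int(h / scale)
    return total
```

At 1-pixel shifts a 640×480 frame already needs several hundred thousand window evaluations, while 6-pixel shifts cut the count by roughly a factor of 36, which is the saving the disclosure pursues.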
  • the disclosed technique trains a classifier (Cpatch) using a decision tree, and this Cpatch classifier is evaluated on a regular grid, while the main classifier (Cobject) is placed at locations predicted by Cpatch.
  • the left hand side (LHS) of FIG. 5 shows a sample face with different patch locations shown in different dashed rectangles.
  • a patch is of size w_p × h_p, and all the patches are given as input to the decision tree, where w_p is the width of the patch and h_p is the height of the patch.
  • the leaf nodes of the decision tree correspond to patches that have been identified.
  • the right hand side (RHS) of FIG. 5 shows patches identified on leaf nodes and the corresponding offsets for the full face.
  • the core idea of this technique is to use a decision-tree-based approach with very lightweight and simple features, such as pixel intensity values, and then to use this Cpatch classifier as a pre-processing step.
  • the actual Cobject classifier works only on the output from the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively less work to do, resulting in improved performance.
  • the face bounding box for faster face detection technique is shown in FIG. 5 .
  • the technique discussed above, however, results in a loss of accuracy.
  • the lines show the data of an available technique for face detection using the sliding window approach for object detection. It improves the accuracy, but the result is still lower than desirable; for example, at a 6×6 grid spacing the accuracy is shown to be about 80 percent (%), which is down by almost 15-18% from the peak.
  • the existing image processing or face detection algorithms require high-end processing, and accordingly require advanced high-end hardware, which involves higher cost.
  • CPU central processing unit
  • the present disclosure, in various embodiments, provides a face-detection processing method and processing devices for faster face detection.
  • the objective of this disclosure is to provide an image processing method which is able to detect the presence of at least one face in at least one image, which does not require a large memory capacity, which is capable of high-speed processing in real time or offline, which can be produced at a low cost, and which can detect specified patterns with certainty and with a very small probability of false positives.
  • a face detection method that may be used even on lower-end hardware is disclosed. This technique ensures that very low CPU usage is needed for face detection, and thus the method can be employed on low-cost hardware.
  • the technique disclosed in the present disclosure involves sliding the search window at higher pixel shifts such that the total number of windows scanned is greatly reduced.
  • a mechanism to locate face patches when the sliding pixel shifts are increased, thus not impacting the output of the overall face classifier, is disclosed.
  • conventionally, the grid spacing used is 1×1, i.e. a 1-pixel shift in each (x, y) direction.
  • the present disclosure achieves a grid spacing of 6×6, i.e. the sliding window is shifted by 6 pixels in both the x and y directions. This achieves an overall reduction/window compression of 36:1, i.e. in ideal scenarios the performance increase can be ×36 (6×6).
  • a technique to identify face patches, rather than the full face, to estimate the bounding box at higher pixel shifts, and then to use this bounding box to search for the presence of a face, is disclosed.
  • a method for detecting a presence of at least one face in at least one image comprises creating an image patch map based on a plurality of face patches identified for at least one window in said at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of said at least one face in said at least one image.
  • a processing device comprising memory storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of at least one face in said at least one image, is disclosed.
  • the said non-transitory computer readable storage medium storing the instructions and said one or more processors are part of a processing device.
  • a processing device comprises one or more storages capable of storing one or more images and other data, and a face detector.
  • the processing device is configured to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the one or more images.
  • creating the image patch map comprises identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on said windows using a pre-defined grid spacing, thereby creating the image patch map.
  • the present disclosure provides certain advantages that may include, but are not limited to, the following:
  • the present disclosure improves face detection time multi-fold without impacting the accuracy.
  • the present disclosure may be used in real-time systems even with HD videos/images.
  • the present disclosure is suitable for generic object detection and not constrained to face domain.
  • FIG. 1 illustrating an output of a face detection engine, is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 2 illustrating a flow of face detection algorithm is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 3 illustrating an estimated Bounding box and the face box is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 4 illustrating a graph showing impact of grid spacing on detection accuracy is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 5 illustrating a face bounding box for faster face detection is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 6 illustrating a flow chart for face detection in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 7 illustrating a face patch classification in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 8 illustrating face patch examples in present disclosure are shown, in accordance with an embodiment of the present subject matter.
  • FIG. 9 illustrating a face patch masking operation in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 10 illustrating a subsequent localized search within bounding box in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 11 illustrating a detection flow chart for bounding-box-based detection is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrating an operation to detect presence of at least one face in the at least one image executed by one or more processors is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrating a method for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrating a method for creating the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrating a special purpose processing device for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • the technique disclosed here uses a patch-based approach to identify the face patches and then applies a full-face classifier in the bounding box.
  • the present technique is characterized by the way the patches are formed, the features that are used to train on the patches, and the way the bounding box is defined.
  • the present technique may be categorized into three major steps as shown in FIG. 6 :
  • Applying patch classifier step: the patch classifier is applied on windows derived using a grid spacing of 6×6.
  • Estimate bounding box step: an image map is available from the “applying patch classifier” step as noted above, and a mask is then applied which checks how many of the patches of the window actually mapped to a face patch.
  • Localized search step: the method searches within that bounding box using an aggressive grid spacing of 1×1, i.e. a 1-pixel shift in each (x, y) direction.
  • a face template size is 24×24, and the patches are formed using a 36×36 area centered on the 24×24 face area. This area is chosen considering the worst-case scenarios for 6×6 grid spacing.
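The patch formation step can be sketched as follows. The exact geometry of the 9 patch types is defined by FIG. 8 of the disclosure; the non-overlapping 3×3 grid of 12×12 patches below is only an illustrative assumption:

```python
def make_patches(window36, patch=12, grid=3):
    """Cut a 36x36 window (a list of 36 rows of 36 grey values) into a
    3x3 grid of 12x12 patches, numbered 1..9 in row-major order.  This
    layout is an assumption for illustration; the disclosure's FIG. 8
    defines the actual patch types and their overlap."""
    patches = {}
    for gy in range(grid):
        for gx in range(grid):
            ptype = gy * grid + gx + 1   # patch type 1..9
            patches[ptype] = [row[gx * patch:(gx + 1) * patch]
                              for row in window36[gy * patch:(gy + 1) * patch]]
    return patches
```

Numbering the patches 1 through 9 row-major is what later allows a simple positional mask to check whether the classified patch types sit where a face would put them.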
  • the face box is the actual area occupied by a face/object to be identified in the image.
  • the face box may be obtained by any known face detector or detection technique available in the art.
  • training the patch classifier Cpatch is achieved by training a decision tree using the 9 different types of patch samples shown in FIG. 8.
  • the leaf nodes of the tree will identify the patch type.
  • An MCT technique may be used for feature representation rather than simple binary tests as mentioned in the earlier approach.
  • the nodes are split based on a one-versus (vs.)-all approach, i.e. one patch vs. the rest of the patches. Further, non-face samples may not be used in the training. It is understood that the goal of the Cpatch classifier is to identify the face patch accurately, not to distinguish between face patches and non-faces.
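The one-vs.-all split criterion can be illustrated with a brute-force search for a single binary pixel-comparison test. Note this is a simplification: the disclosure favours MCT features over simple binary tests, and the helper name and scoring are assumptions for illustration:

```python
def best_one_vs_all_split(samples, labels, target):
    """Pick the pixel-pair binary test  I[a] < I[b]  that best separates
    patches of the `target` type from all the other types (one-vs.-all),
    scored by plain classification accuracy.  `samples` is a list of
    flattened patches (equal-length lists of grey values); `labels`
    holds the patch type of each sample."""
    n = len(samples[0])
    best = (0, 1, 0.0)                     # (a, b, accuracy)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            correct = sum(
                (s[a] < s[b]) == (lab == target)
                for s, lab in zip(samples, labels))
            acc = correct / len(samples)
            if acc > best[2]:
                best = (a, b, acc)
    return best
```

A decision tree grown this way repeats the search at every node on the samples that reach it, one target patch type at a time, until each leaf is dominated by a single patch type.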
  • evaluating the Cpatch classifier is achieved by applying the trained classifier to the input image.
  • the Cpatch classifier is applied on all the windows with a grid spacing of 6×6.
  • an image patch map based on the patches identified is created for every window at a grid spacing of 6×6.
  • the image patch types and their formation are shown in FIG. 8.
  • the image patch map may include different patch location information arranged in different rectangles.
  • the patch may be of size 4×4, 6×6, 8×8, and so on, and all the patches are given as input to the decision tree.
  • the leaf nodes of the decision tree correspond to patches that have been identified.
  • the image patch map may include an arrangement of pixel locations of different patches. Further, the image patch map may be obtained by any of the existing techniques.
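Building the image patch map over the coarse grid might look like the following sketch; `classify_patch` stands in for the trained Cpatch classifier and is a hypothetical callable, as is the assumed return convention:

```python
def build_patch_map(img_w, img_h, classify_patch, step=6, win=36):
    """Evaluate the patch classifier at every grid position (6-pixel
    shifts in x and y) and record the reported patch type, producing
    the image patch map.  `classify_patch(x, y)` is assumed to return
    a patch type 1-9, or 0 when no face patch is recognised there."""
    patch_map = {}
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            patch_map[(x, y)] = classify_patch(x, y)
    return patch_map
```

Keying the map by pixel position keeps the later masking step simple: neighbouring grid positions are exactly `step` pixels apart, matching the 3-pixel patch overlap noted further below for the 6×6 spacing.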
  • a matrix mask of [1, 2, 3, 4, 5, 6, 7, 8, 9] is applied to check how many patches around the face have matched.
  • a tolerance of 4 may be considered, i.e. if 4 or more types in the mask match, then the 36×36 area is chosen as a possible face bounding box.
  • the face patch masking operation is shown in FIG. 9 .
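The masking operation of FIG. 9 can be sketched as below, under the assumption that the mask positions correspond to a 3×3 arrangement of patch types 1-9 in the patch map:

```python
def is_face_bounding_box(patch_map_3x3, tolerance=4):
    """Apply the mask [1..9] to a 3x3 neighbourhood of the image patch
    map: position (i, j) is a match when the classifier reported patch
    type 3*i + j + 1 there.  With `tolerance` (4) or more matches, the
    surrounding 36x36 area is kept as a candidate face bounding box."""
    matches = sum(
        1
        for i in range(3)
        for j in range(3)
        if patch_map_3x3[i][j] == 3 * i + j + 1)
    return matches >= tolerance
```

The tolerance is what buys robustness: a face partially misclassified by Cpatch still produces a bounding box as long as at least four patch types land in their expected positions.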
  • a local search within that bounding box is performed; in the worst case, the number of 24×24 windows searched in the bounding box can be 36.
  • on the LHS of FIG. 10 the estimated bounding box is shown, and on the RHS a localized search performed within the bounding box to identify the face is shown.
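A sketch of this localized search: the coarse 6×6 grid leaves at most a 6-pixel uncertainty in each direction, which is where the worst case of 36 windows comes from. `classify_window` stands for the full-face classifier Cobject and is a hypothetical callable:

```python
def localized_search(box_x, box_y, classify_window, shifts=6):
    """Slide a 24x24 window with 1-pixel shifts over the shifts x shifts
    uncertainty region left inside the bounding box by the coarse grid,
    i.e. at most 36 windows.  `classify_window(x, y)` returns True when
    the full-face classifier accepts the window at (x, y); the first
    accepted position is returned, or None if none is accepted."""
    for dy in range(shifts):
        for dx in range(shifts):
            if classify_window(box_x + dx, box_y + dy):
                return (box_x + dx, box_y + dy)
    return None
```

So the expensive full-face classifier runs at most 36 times per candidate box, instead of at every position of the original dense scan.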
  • the present disclosure provides for the usage of a 6×6 grid spacing.
  • the technical advantages of using this grid spacing are as follows.
  • Number of classified patches: in the case of 6×6, the number of possible patches is 9, with an overlap area of 3 pixels. In the case of 4×4 the number of possible patches will be more, and in the case of 8×8 it will be less, all depending upon the overlapped area. Other sizes would result in too many or too few leaf nodes for the classification and regression trees (CART) or random forest classifier which is used for patch classification.
  • CART classification and regression trees
  • Background area covered: the chosen grid size may result in some of the background area being covered in the test/train images. Usually some pixels around the eyes and below the lips of a 24×24 face image are used. For the bounding box, the area is extended without actually zooming into the face image, which means that some background area such as the ears, hair and chin comes into the picture. If an 8×8 size is used, more of the background area may come into the picture, which will have an adverse effect on the patch classifier output.
  • FIG. 11 illustrates a detection flow chart for bounding-box-based detection in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates operations to detect presence of at least one face in the at least one image executed by one or more processors in accordance with an embodiment of the present subject matter.
  • a processing device comprising a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of at least one face in the at least one image, is disclosed.
  • the non-transitory computer readable storage medium storing instructions and the one or more processors are a part of a processing device.
  • the image patch map is created using the steps of: identifying said plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the patch classifier trained, and applying the patch classifier on the windows using a pre-defined grid spacing, thereby creating the image patch map.
  • the plurality of face patches are identified using the at least one window surrounding a face box, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • the pre-defined grid spacing is preferably of size 6×6.
  • the bounding box is estimated by applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • searching within the bounding box is a localized search and is characterized by the use of an aggressive grid spacing of size 1×1.
  • the instructions stored on the non-transitory computer readable storage medium perform a patch-based approach on the plurality of face patches identified for at least one window, thereby applying a full-face classifier in the bounding box.
  • training the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch vs. the rest of the face patches.
  • the evaluation of the patch classifier is the evaluation of the trained classifier on the target image, or on an input image received by the device, in which the face is detected.
  • FIG. 13 illustrating method 200 of detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • the method may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • the method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network.
  • computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • the order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in described processing device 102 as shown in FIG. 7 .
  • an image patch map is created.
  • the image map is created based on a plurality of face patches identified for at least one window in said at least one image.
  • a bounding box is estimated.
  • the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face, as shown in FIG. 9.
  • the bounding box is searched to detect presence of the at least one face in the at least one image.
  • searching within the bounding box is a localized search and is characterized by the use of an aggressive grid spacing of size 1×1.
  • FIG. 14 illustrating a method for creating 202 the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • the plurality of face patches are identified for the at least one window.
  • the at least one window is a detected face region of the at least one face.
  • the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • patch classifiers are trained using the plurality of face patches identified.
  • the trained patch classifier is evaluated.
  • the patch classifiers are applied on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • the pre-defined grid spacing is preferably of size 6×6.
  • FIG. 15 illustrating a special purpose processing device 102 for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • a processing device 102 comprises one or more storages 402 capable of storing one or more images and other data, and a face detector 404.
  • the processing device 102 is configured to perform operations comprising creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • the image patch map is created using the steps of identifying 302 the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training 304 a patch classifier using the plurality of face patches identified, evaluating 306 the trained patch classifier, and applying 308 the patch classifier on the windows using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches in a face template centered on the face template size.
  • the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • the pre-defined grid spacing is preferably of size 6×6.
  • the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • the bounding box is estimated based on a threshold, wherein the bounding box is preferably of size 36×36, and the threshold is based on the at least one face patch mapped to the at least one face.
  • the bounding box may be estimated based on a threshold value keeping a tolerance of 4, i.e., if 4 or more types in the mask match, then a bounding box of 36×36 is chosen.
  • searching 206 within the bounding box is a localized search 206 and is characterized by the use of an aggressive grid spacing of size 1×1.
  • the instructions stored on the non-transitory computer readable storage medium 108 perform a patch-based approach on the plurality of face patches identified for at least one window, thereby applying 308 a full-face classifier in the bounding box.
  • training 304 the patch classifier is characterized by use of a decision tree using at least one face patch from the plurality of face patches to identify at least one patch type, and one-vs.-all approach, wherein one-vs.-all approach considers one face patch vs. the rest of face patches.
  • processing device 102 comprises a processor(s) 104 and a non-transitory computer readable storage medium 108 coupled to the processor(s) 104 .
  • the non-transitory computer readable storage medium 108 may have a plurality of instructions stored in it. The instructions are executed using the processor 104 coupled to the non-transitory computer readable storage medium 108 .
  • the computer system 102 may include at least one processor 104, an interface(s) 106, which may be an input/output (I/O) interface, and a non-transitory computer readable storage medium(s) 108.
  • the at least one processor 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the at least one processor 104 is configured to fetch and execute computer-readable instructions stored in the non-transitory computer readable storage medium 108 .
  • the I/O interface 106 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like.
  • the I/O interface 106 may allow the computer system 102 to interact with a user directly or through the client devices (not shown). Further, the I/O interface 106 may enable the computer system 102 to communicate with other computing devices, such as web servers and external data servers (not shown).
  • the I/O interface 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as wireless LAN (WLAN), cellular, or satellite.
  • the I/O interface 106 may include one or more ports for connecting a number of devices to one another or to another server.
  • the non-transitory computer readable storage medium 108 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the memory 108 may include, but is not limited to, the plurality of instructions.
  • the memory may include the face detector 404, which further comprises a plurality of instructions configured to perform operations to detect the presence of said at least one face in the one or more images.
  • the operation may include but is not limited to creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect presence of the at least one face in the one or more images.
  • the processing device 102 may include storage(s) 402 configured to store at least one image received from the external devices or captured by the processing device 102 .
  • the processing device 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, as software on a server, and the like. It will be understood that the processing device 102 may be accessed by multiple users through one or more user devices, collectively referred to as users hereinafter, or through applications residing on the user devices. Examples of the processing device 102 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
  • FIG. 11 illustrates a face patch classification in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates face patch examples in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrates a face patch masking operation in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrates a subsequent localized search within the bounding box in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrates a flow chart for face detection in the present disclosure, in accordance with an embodiment of the present subject matter.
  • the patch classifier discussed in the sections above is derived using a decision tree or a random forest based classifier. However, it is well understood by a person skilled in the art that any other classifier may be used in place of these.
  • the feature representation in the present disclosure uses MCT. However, it is well understood by a person skilled in the art that any other feature type may be chosen, with some accuracy versus central processing unit (CPU) performance tradeoff.
  • a simple mask as shown in FIG. 13 is disclosed, but it is well understood by a person skilled in the art that several other variations of this mask can be employed.
  • One technique is to assign different weightings to different patches. It is possible to use a weight-based approach for patch classification, wherein every detected face patch has a corresponding weighting assigned to it. The final output is the summation of these weightings, thresholded by an empirical value that can be derived during the training phase.
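The weight-based patch classification described above can be sketched as follows. The per-patch weights and the score threshold here are hypothetical placeholders for values that, as the text notes, would be derived during the training phase.

```python
# Hypothetical per-patch-type weights for the 9 patch types; the real
# values would come out of training, not from this sketch.
PATCH_WEIGHTS = {1: 0.8, 2: 1.0, 3: 0.8,
                 4: 1.0, 5: 1.5, 6: 1.0,
                 7: 0.8, 8: 1.0, 9: 0.8}
FACE_SCORE_THRESHOLD = 4.0  # empirical value from the training phase

def weighted_patch_vote(detected_patch_types):
    """Sum the weights of the detected face patches and compare the
    total against an empirically derived threshold."""
    score = sum(PATCH_WEIGHTS[t] for t in detected_patch_types)
    return score >= FACE_SCORE_THRESHOLD

# A window where the centre patch (type 5) and its neighbours matched:
print(weighted_patch_vote([2, 4, 5, 6]))  # strong evidence of a face
print(weighted_patch_vote([1, 9]))        # too little evidence
```

With this scheme, a highly discriminative patch (such as the centre of the face) can carry more weight than a peripheral one, instead of every patch counting equally toward the tolerance.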
  • the present disclosure improves face detection time multi-fold without impacting the accuracy.
  • the present disclosure may be used in real-time systems even with HD videos/images.
  • the present disclosure is suitable for generic object detection and not constrained to face domain.
  • The face detection step is a precursor to any face recognition system. The technique described in this document ensures that even low-cost, low-power handheld terminals can have face detection logic built in.
  • HD video surveillance: With HD cameras becoming commodity hardware, it becomes all the more important to process the HD input video frames at higher speed within constrained hardware. The technique described here improves the speed of detection many fold.
  • Camera auto-focus based on face detection: Cameras with face detection notice when a face is in the frame and then set the autofocus and exposure settings to give priority to the face. Cameras usually have low CPU capability, and thus it is all the more important to achieve HD-frame face detection with lower CPU utilization.


Abstract

A method for detecting a presence of at least one face in at least one image comprises creating an image patch map based on a plurality of face patches identified for at least one window in the at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the at least one image. The present disclosure discloses the use of any classifier that works on top of any feature representation to identify face patches, and then using a masking system to identify bounding boxes.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of International Application No. PCT/CN2015/071466, filed on Jan. 23, 2015, which claims priority to Indian Patent Application No. IN3891/CHE/2014, filed on Aug. 7, 2014, both of which are hereby incorporated by reference in their entireties.
  • TECHNICAL FIELD
  • The present disclosure relates to the field of image processing, and in particular, to a face-detection processing method and a processing device for face detection.
  • BACKGROUND
  • Face detection is an important research direction in the field of computer vision because of its wide potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. Face detection determines whether there are any faces within a given image, and returns the location and extent of each face in the image if one or more faces are present.
  • Today, high definition (HD) cameras are an affordable commodity and are widely used in all types of applications, video surveillance for instance. Video analytics in the form of face detection has to match the high-resolution output from the cameras, and thus the performance of these algorithms is critical for the overall performance of the analytics.
  • Face detection algorithms are usually employed in smart phones and biometric devices to detect faces and later recognize them. Many smart phones today are equipped with a feature that can unlock the phone by matching faces. This application requires a fast face detection algorithm at its core. An exemplary output of a face detection engine is shown in FIG. 1.
  • One face detection framework, essentially an Adaptive Boosting (AdaBoost) based cascaded classifier subsystem, has produced excellent accuracy with real-time performance. AdaBoost is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. The processing cost, however, is directly proportional to the resolution of the image/video frame.
  • The general overall process of a face detection algorithm is shown in FIG. 2, and the modules of any face detection algorithm include, but are not limited to, the following:
  • Feature representation module: Any face detection system uses some sort of feature representation which can identify facial features and correlate them in a way such that the overall output can be judged as a face or a non-face. Examples of feature representations are Local Binary Patterns (LBP) and the Modified Census Transform (MCT). These are alternative representations (in place of raw pixel intensity) which usually have better invariance to illumination and to slight changes in pose/expressions.
  • Classifier module: Classifier provides a way to correlate multiple features. Examples are Cascaded Adaboost Classifier and Support Vector Machines (SVM).
  • Search space generator module: Given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the face detection logic has to search (using a sliding window approach) for the possibility of a face “at all locations” and “at all scales”. This usually results in the scanning of hundreds of thousands of windows even in a low-resolution image.
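As a rough illustration of the MCT feature representation mentioned in the feature representation module above, the following sketch computes a 9-bit census index per interior pixel by comparing each 3×3 neighbourhood against its mean. The bit ordering and border handling are implementation choices for this sketch, not details taken from the disclosure.

```python
import numpy as np

def mct(image):
    """Modified Census Transform (sketch): each interior pixel gets a
    9-bit index recording which pixels of its 3x3 neighbourhood are
    brighter than the neighbourhood mean."""
    img = image.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint16)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            block = img[y - 1:y + 2, x - 1:x + 2]
            bits = (block > block.mean()).flatten()
            out[y - 1, x - 1] = sum(int(b) << i for i, b in enumerate(bits))
    return out

patch = np.array([[10, 10, 10],
                  [10, 200, 10],
                  [10, 10, 10]], dtype=np.uint8)
print(mct(patch))  # [[16]]: only the centre bit (index 4) is set
```

Because the comparison is against the local mean rather than raw intensities, the resulting index is largely unaffected by uniform illumination changes, which is the invariance property the text attributes to such representations.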
  • There are also various algorithms, such as bounding box based algorithms, that try to identify the bounding box within which there is a possibility of a face being detected. The face detection classifier then has to search only within this bounding box, which improves the speed of detection dramatically. The estimated bounding box and the face box are shown in FIG. 3.
  • However, it may be understood that a face will not necessarily always be found within the estimated bounding box. Secondly, the estimated bounding box might not be centered on the face.
  • The sliding window approach is the most common technique for generating the search space used for object detection. A classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold. Cascades speed up the detection by rejecting the background quickly and spending more time on object-like regions. Even with cascades, however, scanning with fine grid spacing is still computationally expensive.
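To make the size of this search space concrete, the sketch below counts sliding-window positions over an image pyramid. The window size, step, and scale factor are illustrative assumptions, not parameters from the disclosure.

```python
def count_windows(frame_w, frame_h, win=24, step=1, scale=1.25):
    """Count the sliding windows a detector must evaluate across all
    locations and an image pyramid of scales (a rough sketch)."""
    total = 0
    w, h = frame_w, frame_h
    while w >= win and h >= win:
        total += ((w - win) // step + 1) * ((h - win) // step + 1)
        w, h = int(w / scale), int(h / scale)
    return total

# Even a low-resolution 320x240 frame yields over 150,000 windows
# at 1-pixel shifts; a coarser step shrinks this dramatically.
print(count_windows(320, 240))
print(count_windows(320, 240, step=6))
```

This is why rejecting most windows cheaply, or thinning the grid without losing detections, dominates the overall cost of detection.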
  • To increase the scanning speed, one approach is to train a classifier with perturbed training data to handle small shifts in the object location. But this significantly increases the number of weak classifiers required in the overall model, since the training data will be noisy (unaligned/perturbed).
  • Another simple approach is to increase the grid spacing (decreasing the number of windows being evaluated). Unfortunately, as the grid spacing is increased, the number of detections decreases rapidly.
  • As shown in the graph of FIG. 4 (bottom line), as the grid spacing increases there is an exponential drop in the accuracy of the regular full-face classifier.
  • A technique also exists to reduce the number of missed detections while increasing the grid spacing when using the sliding window approach for object detection.
  • The disclosed technique trains a classifier (Cpatch) using a decision tree, and this Cpatch classifier is evaluated on a regular grid, while the main classifier (Cobject) is placed on locations predicted by Cpatch. The left-hand side (LHS) of FIG. 5 shows a sample face with different patch locations shown in different dashed rectangles. A patch is of size wp×hp, where wp is the width of the patch and hp is the height of the patch, and all the patches are given as an input to the decision tree. The leaf nodes of the decision tree correspond to patches that have been identified. The right-hand side (RHS) of FIG. 5 shows the patches identified at the leaf nodes and the corresponding offsets for the full face.
  • The core idea of this technique is to use a decision tree based approach with very lightweight and simple features, such as pixel intensity values, and then use this Cpatch classifier as a pre-processing step. The actual Cobject classifier works only on the output from the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively little work left to do, resulting in improved performance. The face bounding box for this faster face detection technique is shown in FIG. 5.
  • There are other approaches which are based on skin color segmentation to speed up face detection algorithms. These techniques check the portions of the image where skin color is found and then apply face detection only on those pockets/sub-windows.
  • However, the technique discussed above results in a loss of accuracy. As shown in FIG. 4, the lines show the data of available techniques for face detection using the sliding window approach for object detection. The technique improves the accuracy, but it is still lower than desirable. For example, at 6×6 grid spacing the accuracy is shown to be about 80 percent (%), which is down by almost 15-18% from the peak. Even though all the disclosed and available techniques are used for accurate face detection, they still have a massive drawback in the amount of time spent in the detection process and in reducing the processing time while keeping a high accuracy rate. Further, the existing image processing or face detection algorithms require high-end processing, and accordingly require advanced high-end hardware, which involves higher cost. Furthermore, as the image processing or face detection algorithm requires high-end processing, the central processing unit (CPU) usage for this purpose is also increased.
  • In view of the drawbacks and limitations discussed above, there exists a need for an efficient face detection technique with higher detection accuracy and less processing time, and the technique must work on low-cost hardware with low CPU usage.
  • SUMMARY
  • This summary is provided to introduce concepts related to a processing device and method for faster face detection, and the concepts are further described below in the detailed description. The above-described problems are addressed and a technical solution is achieved in the present disclosure by providing face-detection processing methods and processing devices for faster face detection.
  • The present disclosure, in various embodiments, provides a face-detection processing method and processing devices for faster face detection.
  • In one implementation, in view of the difficulties discussed above, the objective of this disclosure is to provide an image processing method which is able to detect the presence of at least one face in at least one image, which will not require a large memory capacity, which will be capable of performing high-speed processing in real time or offline, which can be produced at a low cost, and which can detect specified patterns with certainty and a very small probability of false positives.
  • In one implementation, a face detection method that may be used even on lower-end hardware is disclosed. This technique ensures very low CPU usage for the face detection method, and thus it can be employed on low-cost hardware.
  • In one implementation, an efficient technique is disclosed to estimate the bounding box for the faces in the image, such that the subsequent full-face classifier can be applied within the bounding box only.
  • In one implementation, the technique disclosed in the present disclosure involves sliding the search window at higher pixel-shifts such that the total number of windows scanned is greatly reduced.
  • In one implementation, a mechanism is disclosed to locate face patches when the sliding pixel-shifts are increased, thus not impacting the output of the overall face classifier. In a regular face detection system, the grid spacing used is 1×1, i.e. a 1-pixel shift in each (x, y) direction. The present disclosure achieves a grid spacing of 6×6, i.e. the sliding window is shifted by 6 pixels in both the x and y directions. This achieves an overall reduction/window compression of 36:1, i.e. in ideal scenarios the performance increase can be ~36 (6×6) times.
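The 36:1 window compression claimed above can be checked with a quick count of window positions at each grid spacing; the 1920×1080 frame and 24×24 window used here are illustrative.

```python
def num_windows(frame_w, frame_h, win=24, step=1):
    """Window positions at a single scale for a sliding-window search
    with the given pixel shift (grid spacing) in x and y."""
    return ((frame_w - win) // step + 1) * ((frame_h - win) // step + 1)

dense = num_windows(1920, 1080, step=1)   # 1x1 grid: 1-pixel shifts
coarse = num_windows(1920, 1080, step=6)  # 6x6 grid: 6-pixel shifts
print(dense, coarse, round(dense / coarse, 1))  # the ratio approaches 36:1
```

The ratio is slightly below 36 only because of edge effects at the frame boundary; for large frames it converges to the ideal 6×6 = 36 compression.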
  • In one implementation, specific consideration is given to maintaining the accuracy of the present technique even at higher pixel shifts.
  • In one implementation, the technique to identify face patches rather than full face to estimate the bounding box at higher pixel shifts and then using this bounding box to search for the presence of a face is disclosed.
  • Accordingly, in one implementation, a method for detecting a presence of at least one face in at least one image is disclosed. The method comprises creating an image patch map based on a plurality of face patches identified for at least one window in said at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of said at least one face in said at least one image.
  • In one implementation, a processing device is disclosed. The processing device comprises a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within said bounding box to detect the presence of at least one face in said at least one image. The non-transitory computer readable storage medium storing the instructions and the one or more processors are part of the processing device.
  • In one implementation, a processing device is disclosed. The processing device comprises one or more storages capable of storing one or more images and other data, and a face detector. The processing device is configured to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating a bounding box, and searching within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, creating the image patch map comprises identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on said windows using a pre-defined grid spacing, thereby creating the image patch map.
  • In one implementation, the present disclosure provides certain advantages that may include, but are not limited to, the following:
  • The present disclosure improves face detection time multi-fold without impacting the accuracy.
  • The present disclosure may be used in real-time systems even with HD videos/images.
  • The present disclosure is suitable for generic object detection and not constrained to face domain.
  • These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
  • Corresponding reference characters indicate corresponding parts throughout the several views. The exemplification set out herein illustrates preferred embodiments of the disclosure, in one form, and such exemplification is not to be construed as limiting the scope of the disclosure in any manner.
  • FIG. 1 illustrates an output of a face detection engine, in accordance with an embodiment of the present subject matter.
  • FIG. 2 illustrates a flow of a face detection algorithm, in accordance with an embodiment of the present subject matter.
  • FIG. 3 illustrates an estimated bounding box and the face box, in accordance with an embodiment of the present subject matter.
  • FIG. 4 illustrates a graph showing the impact of grid spacing on detection accuracy, in accordance with an embodiment of the present subject matter.
  • FIG. 5 illustrates a face bounding box for faster face detection, in accordance with an embodiment of the present subject matter.
  • FIG. 6 illustrates a flow chart for face detection in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 7 illustrates a face patch classification in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 8 illustrates face patch examples in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 9 illustrates a face patch masking operation in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 10 illustrates a subsequent localized search within the bounding box in the present disclosure, in accordance with an embodiment of the present subject matter.
  • FIG. 11 illustrates a detection flow chart for the bounding-box-based approach, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates an operation to detect the presence of at least one face in the at least one image, executed by one or more processors, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrates a method for detecting a presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrates a method for creating the image patch map, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrates a special purpose processing device for detecting a presence of at least one face in at least one image, in accordance with an embodiment of the present subject matter.
  • It is to be understood that the attached drawings are for purposes of illustrating the concepts of the disclosure and may not be to scale.
  • DETAILED DESCRIPTION
  • In order to make the aforementioned objectives, technical solutions and advantages of the present disclosure more comprehensible, embodiments are described below with accompanying figures.
  • The objects, advantages and other novel features of the present disclosure will be apparent to those skilled in the art from the following detailed description when read in conjunction with the appended claims and accompanying drawings.
  • Processing devices and methods for faster face detection are described. The disclosed technique uses a patch-based approach to identify face patches and then applies a full-face classifier in the bounding box.
  • The present technique is characterized by the way the patches are formed, the features that are used to train on the patches, and the way the bounding box is defined.
  • In one implementation, the present technique may be categorized into three major steps as shown in FIG. 6:
  • Applying the patch classifier step: The patch classifier is applied on windows derived using a grid spacing of 6×6.
  • Estimating the bounding box step: An image map is obtained from the “applying the patch classifier” step above, and then a mask is applied which checks how many of the patches of the window actually mapped to a face patch.
  • Searching within the bounding box step: Once the 36×36 bounding box is found, the method searches within that bounding box using an aggressive grid spacing of 1×1, i.e. a 1-pixel shift in each (x, y) direction.
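The three steps above can be sketched end-to-end as follows. This is a simplified sketch under stated assumptions: the two classifiers are toy stand-ins for the trained Cpatch and full-face classifiers, the patch classifier is assumed to return the set of patch types it detects in a window, and the mask check is reduced to the tolerance count.

```python
import numpy as np

GRID, BOX, FACE = 6, 36, 24  # grid spacing, bounding-box side, face template side

def detect(image, patch_classifier, face_classifier):
    """Step 1: apply the patch classifier on a coarse 6x6 grid to build
    an image patch map. Step 2: keep 36x36 areas where at least 4 patch
    types matched (the mask tolerance). Step 3: search densely (1x1)
    inside each estimated bounding box with the full-face classifier."""
    h, w = image.shape
    patch_map = {(x, y): patch_classifier(image[y:y + BOX, x:x + BOX])
                 for y in range(0, h - BOX + 1, GRID)
                 for x in range(0, w - BOX + 1, GRID)}
    boxes = [pos for pos, types in patch_map.items() if len(types) >= 4]
    faces = set()
    for bx, by in boxes:
        for dy in range(BOX - FACE + 1):
            for dx in range(BOX - FACE + 1):
                win = image[by + dy:by + dy + FACE, bx + dx:bx + dx + FACE]
                if face_classifier(win):
                    faces.add((bx + dx, by + dy))
    return faces

# Toy stand-ins: a "face" is a bright 24x24 square at (12, 12).
img = np.zeros((60, 60))
img[12:36, 12:36] = 1.0
toy_patch = lambda win: {1, 2, 3, 4, 5} if win.mean() > 0.2 else set()
toy_face = lambda win: win.min() > 0.5  # fires only when fully on the square
print(detect(img, toy_patch, toy_face))  # {(12, 12)}
```

Note how the expensive full-face classifier runs only inside the few surviving 36×36 boxes, which is the source of the speed-up described in the text.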
  • In one implementation, the face template size is 24×24 and the patches are formed using a 36×36 area centered on the 24×24 face area. This area is chosen considering the worst-case scenarios for 6×6 grid spacing. In one example, the face box is the actual area occupied by a face/object to be identified in the image. In one implementation, the face box may be obtained by any of the known face detectors or detection techniques available in the art.
  • In one implementation, training the patch classifier Cpatch is achieved by training a decision tree using the 9 different types of patch samples shown in FIG. 8. The leaf node of the tree identifies the patch type. An MCT technique may be used for feature representation rather than the simple binary tests mentioned in the earlier approach. For the decision tree, the nodes are split based on a one-versus (vs.)-all approach, i.e. one patch vs. the rest of the patches. Further, non-face samples may not be used in the training. It is understood that the goal of the Cpatch classifier is to identify the face patch accurately, not to distinguish between face patches and non-faces.
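The one-vs.-all node-splitting criterion can be illustrated with a minimal sketch. Scalar toy features and an exhaustive threshold search stand in here for the MCT features and full decision-tree training, so nothing below is the disclosure's actual training procedure, only the splitting idea.

```python
import numpy as np

def best_one_vs_all_split(features, labels, target_type):
    """Pick the (feature index, threshold) that best separates patches
    of `target_type` from all other patch types -- the one-vs.-all
    criterion used when splitting a tree node (a simplified sketch)."""
    is_target = labels == target_type
    best = (None, None, -1.0)
    for f in range(features.shape[1]):
        for thr in np.unique(features[:, f]):
            pred = features[:, f] > thr
            acc = max((pred == is_target).mean(),
                      (pred != is_target).mean())
            if acc > best[2]:
                best = (f, thr, acc)
    return best

# Toy data: 3 "patch types", 2 features per patch sample.
feats = np.array([[0.1, 0.9], [0.2, 0.8],   # type 1
                  [0.9, 0.1], [0.8, 0.2],   # type 2
                  [0.5, 0.5], [0.6, 0.4]])  # type 3
labs = np.array([1, 1, 2, 2, 3, 3])
f, thr, acc = best_one_vs_all_split(feats, labs, target_type=2)
print(f, thr, acc)  # feature 0 separates type 2 from the rest perfectly
```

A real implementation would repeat this search recursively per node over MCT feature indices, growing the tree until each leaf corresponds to one of the nine patch types.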
  • In one implementation, the Cpatch classifier is evaluated by applying it on all the windows with a grid spacing of 6×6. As every window yields some patch type, an image patch map based on the identified patches is created for every window at a grid spacing of 6×6. The image patch types and their formation are shown in FIG. 8. In one example, the image patch map may include different patch location information arranged in different rectangles. A patch may be of size 4×4, 6×6, 8×8, and so on, and all the patches are given as an input to the decision tree. The leaf nodes of the decision tree correspond to patches that have been identified. In one example, the image patch map may include an arrangement of pixel locations of different patches. Further, the image patch map may be obtained by any of the existing techniques.
  • In one implementation, once the image patch map is obtained, a matrix mask of [1, 2, 3, 4, 5, 6, 7, 8, 9] is applied to check how many patches around the face have matched. In one implementation, a tolerance of 4 may be considered, i.e. if 4 or more types in the mask match, the 36×36 area is chosen as a possible face bounding box. The face patch masking operation is shown in FIG. 9.
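A minimal sketch of this masking step follows, assuming the 3×3 mask encodes the expected spatial arrangement of the nine patch types and using the tolerance of 4 mentioned above; the exact mask layout is an illustrative assumption.

```python
import numpy as np

# Assumed 3x3 matrix mask of expected patch types around a face centre.
MASK = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
TOLERANCE = 4  # 4 or more matching types => possible face bounding box

def is_face_bounding_box(patch_map_3x3):
    """Apply the matrix mask to a 3x3 neighbourhood of the image patch
    map and count how many patch types match their expected position."""
    return int((patch_map_3x3 == MASK).sum()) >= TOLERANCE

good = np.array([[1, 2, 0], [4, 5, 0], [0, 0, 0]])  # 4 types match
bad  = np.array([[0, 0, 0], [0, 5, 0], [0, 0, 9]])  # only 2 match
print(is_face_bounding_box(good), is_face_bounding_box(bad))
```

The tolerance makes the box estimate robust to a partly occluded or poorly lit face, where only some of the nine patches are classified correctly.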
  • In one implementation, after the face bounding box is estimated, a local search within that bounding box is performed; in the worst case, the number of 24×24 windows searched in the bounding box can be 36. As shown in FIG. 10, the LHS shows the bounding box that is estimated, and the RHS shows a localized search done within the bounding box to identify the face.
  • In one implementation, the present disclosure provides the usage of 6×6 grid spacing. The technical advantages of using this grid spacing are as follows.
  • The number of classified patches: In the case of 6×6, the number of possible patches is 9, with an overlap area of 3 pixels. In the case of 4×4 the number of possible patches will be more, and in the case of 8×8 it will be less, all depending upon the overlapped area. This would result in too many or too few leaf nodes for the classification and regression trees (CART) or random forest classifier used for patch classification.
  • Background area covered: The chosen grid size may result in some of the background area being covered in the test/train images. Usually some pixels around the eyes and below the lips are used for a 24×24 face image. For the bounding box, the area is extended without actually zooming in on the face image; this means that some of the background area, such as the ears, hair, and chin, will come into the picture. If an 8×8 size is used, more of the background area may come into the picture, which will have an adverse effect on the patch classifier output.
  • While aspects of described processing devices and methods for faster face detection may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
  • While illustrative embodiments of the present disclosure are described below, it will be appreciated that the present disclosure may be practiced without the specified details, and that numerous implementation-specific decisions may be made to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, which will vary from one system to another. While such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. For example, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring or unduly limiting the present disclosure. Such descriptions and representations are used by those skilled in the art to describe and convey the substance of their work to others skilled in the art. The present disclosure will now be described with reference to the drawings described below.
  • FIG. 11 illustrates a detection flow chart for the bounding-box-based approach, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrates operations to detect the presence of at least one face in the at least one image, executed by one or more processors, in accordance with an embodiment of the present subject matter.
  • In one implementation, a processing device is disclosed. The processing device comprises a non-transitory computer readable storage medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising creating an image patch map based on a plurality of face patches identified for at least one window in at least one image, estimating a bounding box, and searching within the bounding box to detect the presence of at least one face in the at least one image. The non-transitory computer readable storage medium storing the instructions and the one or more processors are part of the processing device.
  • In one implementation, the image patch map is created using the steps of: identifying said plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face, training a patch classifier using the plurality of face patches identified, evaluating the trained patch classifier, and applying the patch classifier on the windows using a pre-defined grid spacing, thereby creating the image patch map.
  • In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box, wherein the at least one window is sized to hold the plurality of face patches, centered on the face template.
  • In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • In one implementation, the pre-defined grid spacing is preferably of size 6×6.
  • In one implementation, the bounding box is estimated by applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face.
  • In one implementation, searching within the bounding box is a localized search and is characterized by using an aggressive grid spacing of size 1×1.
  • In one implementation, the instructions stored in the non-transitory computer readable storage medium perform a patch-based approach for the plurality of face patches identified for the at least one window, thereby applying a full-face classifier in the bounding box.
  • In one implementation, training the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and by a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch against the rest of the face patches.
  • In one implementation, evaluating the patch classifier means applying the trained classifier to the target image, i.e., the input image received by the device in which a face is to be detected. There are two sets of images: one from which the classifier learns that a particular structure is a face, and another to which the trained classifier is applied. Evaluation thus refers to the application of the trained classifier to the target image in which a face is to be detected.
  • FIG. 13 illustrating method 200 of detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter. The method may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
  • The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in described processing device 102 as shown in FIG. 7.
  • At step 202, an image patch map is created. In one implementation, the image patch map is created based on a plurality of face patches identified for at least one window in the at least one image.
  • At step 204, a bounding box is estimated. In one implementation, the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face, as shown in FIG. 9.
  • At step 206, the bounding box is searched to detect the presence of the at least one face in the at least one image. In one implementation, searching within the bounding box is a localized search characterized by an aggressive grid spacing of size 1×1.
  • FIG. 14 illustrating a method for creating 202 the image patch map is shown, in accordance with an embodiment of the present subject matter.
  • At step 302, the plurality of face patches are identified for the at least one window. In one implementation, the at least one window is a detected face region of the at least one face. In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches of a face template centered within the window. In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
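As a concrete sketch of this geometry: the 36×36 window and 24×24 face template sizes come from the disclosure, but the 12×12 patch size and the non-overlapping patch grid are illustrative assumptions, since the disclosure does not fix them.

```python
WINDOW = 36    # window size from the disclosure
TEMPLATE = 24  # face template size from the disclosure
PATCH = 12     # patch size: an assumption for illustration

def extract_patches(image, top, left):
    """Cut a WINDOW x WINDOW region out of the image (a list of pixel
    rows) and split it into non-overlapping PATCH x PATCH face patches;
    the TEMPLATE x TEMPLATE face template sits centered in the window."""
    patches = []
    for y in range(0, WINDOW - PATCH + 1, PATCH):
        for x in range(0, WINDOW - PATCH + 1, PATCH):
            patch = [row[left + x:left + x + PATCH]
                     for row in image[top + y:top + y + PATCH]]
            patches.append(patch)
    return patches

# A 36x36 window split into 12x12 patches yields a 3x3 grid of 9 patches.
img = [[0] * 100 for _ in range(100)]
patches = extract_patches(img, 10, 10)
```

Each of these patches would then be labeled with a patch type (eye, nose, mouth region, and so on) for training the patch classifier.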
  • At step 304, a patch classifier is trained using the plurality of face patches identified.
  • At step 306, the trained patch classifier is evaluated.
  • At step 308, the patch classifier is applied on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map. In one implementation, the pre-defined grid spacing is preferably of size 6×6.
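The grid-spaced application of the patch classifier can be sketched as follows. The 6×6 grid spacing is from the disclosure; the 12×12 patch size and the simple thresholding stand-in for the trained classifier are assumptions for illustration.

```python
GRID = 6    # pre-defined grid spacing from the disclosure
PATCH = 12  # patch size: an assumption for illustration

def classify_patch(patch):
    """Stand-in for the trained patch classifier: returns a patch-type
    id (0 = non-face). A real system would run the decision tree here."""
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return 1 if mean > 127 else 0

def image_patch_map(image):
    """Apply the patch classifier across the image on a GRID-spaced
    lattice, recording one patch-type id per grid position."""
    h, w = len(image), len(image[0])
    pmap = []
    for y in range(0, h - PATCH + 1, GRID):
        row = []
        for x in range(0, w - PATCH + 1, GRID):
            patch = [r[x:x + PATCH] for r in image[y:y + PATCH]]
            row.append(classify_patch(patch))
        pmap.append(row)
    return pmap

dark = [[0] * 48 for _ in range(48)]
pmap = image_patch_map(dark)
```

Because the classifier is evaluated only every 6 pixels rather than at every position, building the patch map is far cheaper than an exhaustive full-face scan.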
  • FIG. 15 illustrating a special purpose processing device 102 for detecting a presence of at least one face in at least one image is shown, in accordance with an embodiment of the present subject matter.
  • In one implementation, a processing device 102 is disclosed. The processing device 102 comprises one or more storages 402 capable of storing one or more images and other data, and a face detector 404. The processing device 102 is configured to perform operations comprising creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, the image patch map is created by identifying 302 the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face; training 304 a patch classifier using the identified face patches; evaluating 306 the trained patch classifier; and applying 308 the patch classifier on the at least one window using a pre-defined grid spacing, thereby creating 202 the image patch map.
  • In one implementation, the plurality of face patches are identified using the at least one window surrounding a face box for identifying 302 the plurality of face patches, wherein the at least one window is sized to hold the plurality of face patches of a face template centered within the window.
  • In one implementation, the at least one window size is preferably 36×36, and the face template size is preferably 24×24.
  • In one implementation, the pre-defined grid spacing is preferably of size 6×6.
  • In one implementation, the bounding box is estimated by applying 308 a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face. The bounding box is estimated based on a threshold, wherein the bounding box is preferably of size 36×36, and the threshold is based on the at least one face patch mapped to the at least one face. In one example, the bounding box may be estimated based on a threshold value keeping a tolerance of 4, i.e., if 4 or more patch types in the mask match, then a bounding box of 36×36 is chosen.
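A minimal sketch of this masking step follows; the 36×36 box size and the tolerance of 4 come from the example above, while the mask layout itself is hypothetical (the actual patch-type arrangement is shown in the figures and not reproduced here).

```python
GRID = 6       # one patch-map cell corresponds to a 6-pixel step
BOX = 36       # bounding box size from the disclosure
TOLERANCE = 4  # "4 or more types in the mask matching", per the example

# Hypothetical 6x6 mask of expected patch-type ids (0 = don't care).
MASK = [[0, 1, 1, 1, 1, 0],
        [0, 2, 0, 0, 2, 0],
        [0, 0, 3, 3, 0, 0],
        [0, 0, 4, 4, 0, 0],
        [0, 5, 5, 5, 5, 0],
        [0, 0, 0, 0, 0, 0]]

def estimate_boxes(pmap):
    """Slide the mask over the image patch map; wherever at least
    TOLERANCE non-zero mask entries match the map, emit a BOX x BOX
    bounding box anchored at the corresponding image position."""
    boxes = []
    rows, cols = len(pmap), len(pmap[0])
    for i in range(rows - len(MASK) + 1):
        for j in range(cols - len(MASK[0]) + 1):
            matches = sum(1
                          for di, mrow in enumerate(MASK)
                          for dj, want in enumerate(mrow)
                          if want and pmap[i + di][j + dj] == want)
            if matches >= TOLERANCE:
                boxes.append((i * GRID, j * GRID, BOX, BOX))
    return boxes
```

The tolerance makes the estimate robust: a face is proposed even when only a subset of its expected patch types is detected.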
  • In one implementation, searching 206 within the bounding box is a localized search and is characterized by an aggressive grid spacing of size 1×1.
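The localized search can be sketched as an exhaustive per-pixel scan restricted to the estimated box; the `face_classifier` callable below is a placeholder for whatever full face classifier the system uses.

```python
TEMPLATE = 24  # full-face classifier window, from the disclosure

def localized_search(image, box, face_classifier):
    """Scan a full-face classifier at 1x1 (per-pixel) grid spacing,
    but only inside the estimated bounding box instead of over the
    whole image."""
    top, left, h, w = box
    hits = []
    for y in range(top, top + h - TEMPLATE + 1):
        for x in range(left, left + w - TEMPLATE + 1):
            crop = [row[x:x + TEMPLATE] for row in image[y:y + TEMPLATE]]
            if face_classifier(crop):
                hits.append((y, x))
    return hits

img = [[0] * 50 for _ in range(50)]
# With a classifier that always fires, a 36x36 box yields 13x13 = 169
# candidate positions; the 1x1 grid spacing checks every one of them.
hits = localized_search(img, (0, 0, 36, 36), lambda crop: True)
```

The aggressive 1×1 spacing is affordable precisely because it runs only inside the small estimated box, not across the full frame.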
  • In one implementation, the instructions stored in the non-transitory computer readable storage medium 108 implement a patch-based approach in which the plurality of face patches are identified for the at least one window, and a full face classifier is then applied 308 only within the bounding box.
  • In one implementation, training 304 the patch classifier is characterized by the use of a decision tree, which uses at least one face patch from the plurality of face patches to identify at least one patch type, and by a one-vs.-all approach, wherein the one-vs.-all approach considers one face patch against the rest of the face patches.
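The one-vs.-all relabelling underlying this training scheme can be sketched as follows; the patch-type ids and toy data are illustrative, and per the disclosure a decision tree would then be trained on each resulting binary problem.

```python
def one_vs_all_labels(patch_types, positive_type):
    """One-vs.-all relabelling: the chosen patch type becomes the
    positive class (1) and every other patch type the negative
    class (0). One such binary problem is built per patch type."""
    return [1 if t == positive_type else 0 for t in patch_types]

# Toy patch-type ids for eight training patches (values illustrative,
# e.g. 1 = eye, 2 = nose, 3 = mouth region, ...).
types = [0, 1, 2, 1, 0, 2, 3, 4]
labels_for_type_1 = one_vs_all_labels(types, 1)

# One binary training set per patch type; a decision tree (or random
# forest) would be fit on each.
problems = {t: one_vs_all_labels(types, t) for t in set(types)}
```

At detection time, the per-type classifiers together assign each grid position a patch-type id, which is what populates the image patch map.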
  • In one implementation, processing device 102 comprises a processor(s) 104 and a non-transitory computer readable storage medium 108 coupled to the processor(s) 104. The non-transitory computer readable storage medium 108 may have a plurality of instructions stored in it. The instructions are executed using the processor 104 coupled to the non-transitory computer readable storage medium 108.
  • In one embodiment, the computer system 102 may include at least one processor 104, an interface(s) 106, which may be an input/output (I/O) interface, and a non-transitory computer readable storage medium(s) 108. The at least one processor 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 104 is configured to fetch and execute computer-readable instructions stored in the non-transitory computer readable storage medium 108.
  • The I/O interface 106 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 106 may allow the computer system 102 to interact with a user directly or through the client devices (not shown). Further, the I/O interface 106 may enable the computer system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 106 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as wireless LAN (WLAN), cellular, or satellite. The I/O interface 106 may include one or more ports for connecting a number of devices to one another or to another server.
  • The non-transitory computer readable storage medium 108 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 108 may include, but is not limited to, the plurality of instruction(s). In one implementation, the memory may include the face detector 404, which comprises a plurality of instruction(s) configured to perform operations to detect the presence of the at least one face in the one or more images. The operations may include, but are not limited to, creating 202 an image patch map based on a plurality of face patches identified for at least one window in the one or more images, estimating 204 a bounding box, and searching 206 within the bounding box to detect the presence of the at least one face in the one or more images.
  • In one implementation, the processing device 102 may include storage(s) 402 configured to store at least one image received from the external devices or captured by the processing device 102.
  • Although the present subject matter is explained considering that the present system 102 is implemented as a processing device 102, it may be understood that the processing device 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, as software on a server, and the like. It will be understood that the processing device 102 may be accessed by multiple users through one or more user devices, collectively referred to as users hereinafter, or through applications residing on the user devices. Examples of the processing device 102 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
  • FIG. 11 illustrating a face patch classification in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 12 illustrating face patch examples in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 13 illustrating a face patch masking operation in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 14 illustrating a subsequent localized search within bounding box in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • FIG. 15 illustrating a flow chart for face detection in present disclosure is shown, in accordance with an embodiment of the present subject matter.
  • In one implementation, the patch classifier discussed in the above sections is derived using a decision tree or a random forest based classifier. However, it is well understood by the person skilled in the art that any other classifier may be used in place of these.
  • Secondly, the present disclosure uses MCT for feature representation. However, it is well understood by the person skilled in the art that any other feature type may be chosen, with an accompanying accuracy versus central processing unit (CPU) performance tradeoff.
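Assuming MCT here refers to the Modified Census Transform commonly used in face detection, the feature computation can be sketched as below; the exact bit ordering within the 9-bit code is an assumption for illustration.

```python
def mct(image):
    """Modified Census Transform (one common formulation): encode each
    3x3 neighbourhood as a 9-bit index, setting a bit for every pixel
    that exceeds the neighbourhood mean. The result is illumination-
    invariant, which is why MCT suits low-cost detection pipelines."""
    h, w = len(image), len(image[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            block = [image[y + dy][x + dx]
                     for dy in range(3) for dx in range(3)]
            mean = sum(block) / 9.0
            code = 0
            for value in block:  # most-significant bit first (assumed)
                code = (code << 1) | (1 if value > mean else 0)
            out[y][x] = code
    return out

flat = [[7] * 5 for _ in range(5)]         # uniform region -> code 0
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # only centre exceeds mean
```

A classifier then works on these 9-bit codes (512 possible values per position) instead of raw intensities.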
  • Next, a simple mask as shown in FIG. 13 is disclosed, but it is well understood by the person skilled in the art that several other variations of this mask can be employed. One technique is to assign a different weight to each patch. It is possible to use a weight-based approach for patch classification wherein every detected face patch has a corresponding weight assigned to it. The final output is the summation of those weights, thresholded by an empirical value that can be derived during the training phase.
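The weight-based variation can be sketched as follows; the weight values and the threshold are placeholders, since the disclosure states both would be derived empirically during training.

```python
# Hypothetical per-patch-type weights (e.g. eyes count more than chin);
# real values and the threshold would be learned during training.
WEIGHTS = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}
THRESHOLD = 0.50  # empirical value: an assumption here

def weighted_face_score(detected_types):
    """Sum the weight of every detected face patch type; unknown
    types contribute nothing."""
    return sum(WEIGHTS.get(t, 0.0) for t in detected_types)

def is_face(detected_types):
    """Final decision: the weighted sum thresholded by the empirical
    value, replacing the fixed match-count tolerance of the simple mask."""
    return weighted_face_score(detected_types) >= THRESHOLD
```

Compared with the fixed tolerance, weighting lets strongly discriminative patch types (e.g. eyes) dominate the decision.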
  • Thus, it is well understood by the person skilled in the art that the present disclosure encompasses an idea of using any classifier which works on top of any feature representation to identify face patches and then using a masking system to identify bounding boxes.
  • Although implementations for a processing device and method for faster face detection have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the processing device and method for faster face detection.
  • Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include the following.
  • The present disclosure reduces face detection time multi-fold without impacting accuracy.
  • The present disclosure may be used in real-time systems even with HD videos/images.
  • The present disclosure is suitable for generic object detection and not constrained to face domain.
  • Exemplary embodiments discussed above may provide certain applicable areas of the present disclosure. Though not required to practice aspects of the disclosure, these applications of the disclosure may include:
  • Handheld Terminals/devices: The face detection step is a precursor to any face recognition system. The technique mentioned in this document ensures that even low-cost, low-power handheld terminals can have face detection logic built in.
  • HD Video Surveillance: With HD cameras becoming commodity hardware, it becomes all the more important to process the HD input video frames at higher speed within the constrained hardware. The technique mentioned here improves the speed of detection many-fold.
  • Camera auto-focus based on face detection: Cameras with face detection notice when a face is in the frame and then set the autofocus and exposure settings to give priority to the face. Cameras usually have low CPU capability and thus it is more important to achieve HD frame face detection with lower CPU utilization.
  • Finally, it should be understood that the above embodiments are only used to explain, and not to limit, the technical solution of the present application. Despite the detailed description of the present application with reference to the above preferred embodiments, it should be understood that various modifications, changes or equivalent replacements can be made by those skilled in the art without departing from the scope of the present application as covered in the claims of the present application.

Claims (17)

1. A method for detecting a presence of at least one face in at least one image, comprising:
creating an image patch map based on a plurality of face patches identified for at least one window in the at least one image;
estimating a bounding box based on the image patch map; and
searching within the bounding box to detect the presence of the at least one face in the at least one image.
2. The method of claim 1, wherein creating the image patch map comprises:
identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face;
training a patch classifier using the plurality of face patches identified; and
applying the patch classifier on the at least one window using a pre-defined grid spacing to create the image patch map.
3. The method of claim 1, wherein the plurality of face patches are identified using a window of a window size surrounding a face box, and wherein the window holds the plurality of face patches in a face template centered on a face template size.
4. The method of claim 1, wherein the window size is of 36 pixels by 36 pixels, and wherein the face template size is of 24 pixels by 24 pixels.
5. The method of claim 1, wherein a size of the pre-defined grid spacing is 6 pixels by 6 pixels.
6. The method of claim 1, wherein estimating the bounding box comprises applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face to estimate the bounding box based on a threshold, wherein a size of the bounding box is 36 pixels by 36 pixels, and wherein the threshold is based on the at least one face patch mapped with the at least one face.
7. The method of claim 1, wherein searching within the bounding box comprises a localized searching and is performed using a grid spacing of size 1 pixel by 1 pixel.
8. The method of claim 1, wherein training the patch classifier comprises using at least one face patch in a decision tree from the plurality of face patches to identify at least one patch type, wherein training the patch classifier comprises a one-vs-all approach, and wherein the one-vs-all approach considers one face patch against a remainder of the face patches.
9. A processing device for detecting a presence of at least one face in at least one image, the processing device comprising:
a processor;
a memory coupled to the processor for executing a plurality of instructions present in the memory, wherein the execution of the instructions cause the processor to perform operations comprising:
creating an image patch map based on a plurality of face patches identified for at least one window in at least one image;
estimating a bounding box based on the image patch map; and
searching within the bounding box to detect the presence of the at least one face in the at least one image.
10. The processing device of claim 9, wherein the processor is configured to perform operations comprising:
identifying the plurality of face patches for the at least one window, wherein the at least one window is a detected face region of the at least one face;
training a patch classifier using the plurality of face patches identified; and
applying the patch classifier on the at least one window using a pre-defined grid spacing to create the image patch map.
11. The processing device of claim 10, wherein the plurality of face patches are identified using a window of a window size surrounding a face box to identify a plurality of face patches, and wherein the window holds the plurality of face patches in a face template centered on the face template size.
12. The processing device of claim 9, wherein the at least one window size is of 36 pixels by 36 pixels, and wherein the face template size is of 24 pixels by 24 pixels.
13. The processing device of claim 9, wherein a size of the pre-defined grid spacing is 6 pixels by 6 pixels.
14. The processing device of claim 9, wherein the processor is configured to perform operations comprising applying a matrix mask on the image patch map to check whether at least one face patch from the plurality of face patches is mapped to the at least one face to estimate the bounding box based on a threshold, wherein a size of the bounding box is 36 pixels by 36 pixels, and wherein the threshold is based on the at least one face patch mapped with the at least one face.
15. The processing device of claim 9, wherein searching within the bounding box comprises a localized searching and is performed using a grid spacing of size of 1 pixel by 1 pixel.
16. The processing device of claim 9, wherein training the patch classifier comprises using at least one face patch in a decision tree from the plurality of face patches to identify at least one patch type, wherein training the patch classifier comprises a one-vs-all approach, and wherein the one-vs-all approach considers one face patch against a remainder of face patches.
17. The processing device of claim 9, wherein the memory is further configured to store at least one image which is used for detecting the presence of the at least one face.
US15/416,533 2014-08-07 2017-01-26 Processing device and method for face detection Active 2035-02-05 US10296782B2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN3891CH2014 2014-08-07
IN3891/CHE/2014 2014-08-07
PCT/CN2015/071466 WO2016019709A1 (en) 2014-08-07 2015-01-23 A processing device and method for face detection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/071466 Continuation WO2016019709A1 (en) 2014-08-07 2015-01-23 A processing device and method for face detection

Publications (2)

Publication Number Publication Date
US20170161549A1 true US20170161549A1 (en) 2017-06-08
US10296782B2 US10296782B2 (en) 2019-05-21

Family

ID=55263095

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/416,533 Active 2035-02-05 US10296782B2 (en) 2014-08-07 2017-01-26 Processing device and method for face detection

Country Status (4)

Country Link
US (1) US10296782B2 (en)
EP (1) EP3167407A4 (en)
CN (1) CN106462736B (en)
WO (1) WO2016019709A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019651B1 (en) * 2016-12-25 2018-07-10 Facebook, Inc. Robust shape prediction for face alignment

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107851192B (en) * 2015-05-13 2023-04-14 北京市商汤科技开发有限公司 Apparatus and method for detecting face part and face
CN110660067A (en) * 2018-06-28 2020-01-07 杭州海康威视数字技术股份有限公司 Target detection method and device
CN109376717A (en) * 2018-12-14 2019-02-22 中科软科技股份有限公司 Personal identification method, device, electronic equipment and the storage medium of face comparison
CN112215154B (en) * 2020-10-13 2021-05-25 北京中电兴发科技有限公司 Mask-based model evaluation method applied to face detection system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070154096A1 (en) * 2005-12-31 2007-07-05 Jiangen Cao Facial feature detection on mobile devices
US20110001850A1 (en) * 2008-02-01 2011-01-06 Gaubatz Matthew D Automatic Redeye Detection
US20140241623A1 (en) * 2013-02-22 2014-08-28 Nec Laboratories America, Inc. Window Dependent Feature Regions and Strict Spatial Layout for Object Detection
US20150110352A1 (en) * 2013-10-23 2015-04-23 Imagination Technologies Limited Skin Colour Probability Map
US20150186748A1 (en) * 2012-09-06 2015-07-02 The University Of Manchester Image processing apparatus and method for fitting a deformable shape model to an image using random forest regression voting
US20150243031A1 (en) * 2014-02-21 2015-08-27 Metaio Gmbh Method and device for determining at least one object feature of an object comprised in an image
US20150347822A1 (en) * 2014-05-29 2015-12-03 Beijing Kuangshi Technology Co., Ltd. Facial Landmark Localization Using Coarse-to-Fine Cascaded Neural Networks

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715325A (en) 1995-08-30 1998-02-03 Siemens Corporate Research, Inc. Apparatus and method for detecting a face in a video image
US6263113B1 (en) 1998-12-11 2001-07-17 Philips Electronics North America Corp. Method for detecting a face in a digital image
JP3639452B2 (en) * 1999-02-12 2005-04-20 シャープ株式会社 Image processing device
US7155058B2 (en) * 2002-04-24 2006-12-26 Hewlett-Packard Development Company, L.P. System and method for automatically detecting and correcting red eye
KR100474848B1 (en) * 2002-07-19 2005-03-10 삼성전자주식회사 System and method for detecting and tracking a plurality of faces in real-time by integrating the visual ques
US7916126B2 (en) * 2007-06-13 2011-03-29 Apple Inc. Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US8712109B2 (en) * 2009-05-08 2014-04-29 Microsoft Corporation Pose-variant face recognition using multiscale local descriptors
CN101923637B (en) * 2010-07-21 2016-03-16 康佳集团股份有限公司 A kind of mobile terminal and method for detecting human face thereof and device
US8774519B2 (en) * 2012-08-07 2014-07-08 Apple Inc. Landmark detection in digital images
CN103489174B (en) * 2013-10-08 2016-06-29 武汉大学 A kind of face super-resolution method kept based on residual error
CN103824049A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded neural network-based face key point detection method
CN103870824B (en) * 2014-03-28 2017-10-20 海信集团有限公司 A kind of face method for catching and device during Face datection tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Köstinger, Martin, et al. "Robust face detection by simple means." DAGM 2012 CVAW workshop. 2012. *

Also Published As

Publication number Publication date
EP3167407A1 (en) 2017-05-17
CN106462736B (en) 2020-11-06
WO2016019709A1 (en) 2016-02-11
CN106462736A (en) 2017-02-22
US10296782B2 (en) 2019-05-21
EP3167407A4 (en) 2017-11-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: HUAWEI TECHNOLOGIES CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARIAPPAN, VIJAYACHANDRAN;JADHAV, RAHUL ARVIND;SHARMA, PUNEET BALMUKUND;REEL/FRAME:041386/0451

Effective date: 20150728

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4