WO2019114036A1 - Face detection method and apparatus, computer apparatus and computer-readable storage medium - Google Patents


Info

Publication number
WO2019114036A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
face
frame
shoulder
head
Prior art date
Application number
PCT/CN2017/119043
Other languages
English (en)
Chinese (zh)
Inventor
张兆丰
牟永强
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司 filed Critical 深圳云天励飞技术有限公司
Publication of WO2019114036A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition

Definitions

  • the present invention relates to the field of computer vision technology, and in particular, to a face detection method and apparatus, a computer apparatus, and a computer readable storage medium.
  • Commonly used methods for capturing pedestrians include face detection, head-shoulder detection, and pedestrian detection. Because face features are distinctive and stable, face detection has the highest detection rate and the lowest false detection rate among the three methods. However, actual application scenes are more complicated: changes in face angle (raised head, lowered head, side face), changes in illumination (backlight, shadow), occlusion (sunglasses, masks, hats), and the like all reduce the face detection rate.
  • Head-shoulder detection detects the head and shoulders as a whole. Because the head and shoulders are not as distinctive and unique as face features, its detection effect is slightly worse than that of face detection.
  • Head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are more complex and computationally time consuming.
  • Pedestrian detection generally requires detecting the whole body: the pedestrian must appear entirely in the picture to be detected, a condition that actual scenes often do not satisfy.
  • a first aspect of the present application provides a face detection method, the method comprising:
  • the detection frame is classified to obtain multiple candidate face frames
  • the number of layers of the image pyramid is determined by the following formula:
  • n represents the number of layers of the image pyramid of the image to be detected;
  • k_up represents the upscaling multiple of the image to be detected;
  • w_img and h_img respectively represent the width and height of the image to be detected;
  • w_m and h_m respectively represent the width and height of the face detection model;
  • n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • extracting the aggregate channel features of each layer image of the image pyramid includes:
  • the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  • the method further includes: acquiring a training sample of the head-shoulder detection model, and the specific method is as follows:
  • several decision trees are removed from the trained face detection model to obtain a new face detection model;
  • the trained face detection model and the new face detection model each perform detection on a preset image, and the faces detected by the new face detection model beyond those detected by the trained face detection model are obtained;
  • the position of the face frame in the preset image is marked, the face frame is extended to obtain a head-shoulder frame, and the position of the head-shoulder frame in the preset image is marked;
  • the head-shoulder frame image is cropped and scaled to a predetermined size as a positive sample for training the head-shoulder detection model, and a non-head-shoulder frame image is cropped and scaled to the predetermined size as a negative sample for training the head-shoulder detection model.
  • a second aspect of the present application provides a face detecting device, the device comprising:
  • a construction unit for constructing an image pyramid for the image to be detected
  • An extracting unit configured to extract an aggregate channel feature of each layer image of the image pyramid, to obtain a feature pyramid of the image to be detected
  • a first detecting unit configured to slide on each layer image of the image pyramid by using a first sliding window according to a first preset step, to obtain a plurality of first detecting frames, and using the trained face detecting model according to the The feature pyramid classifies the first detection frame to obtain a plurality of candidate face frames;
  • a first merging unit configured to merge the candidate face frames to obtain a merged candidate face frame
  • a second detecting unit configured to slide on each layer of the image pyramid according to a second preset step by using a second sliding window, to obtain a plurality of second detecting frames, and using the trained head-shoulder detecting model according to the The feature pyramid classifies the second detection frame to obtain a plurality of candidate head-shoulder frames;
  • a second merging unit configured to merge the candidate head-shoulder frames to obtain a combined candidate head-shoulder frame
  • a prediction unit configured to predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame
  • a third merging unit configured to merge the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the constructing unit determines the number of layers of the image pyramid according to the following formula:
  • n represents the number of layers of the image pyramid of the image to be detected;
  • k_up represents the upscaling multiple of the image to be detected;
  • w_img and h_img respectively represent the width and height of the image to be detected;
  • w_m and h_m respectively represent the width and height of the face detection model;
  • n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  • a third aspect of the present application provides a computer apparatus including a processor, the processor implementing the face detection method when executing a computer program stored in a memory.
  • a fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program that implements the face detection method when executed by a processor.
  • The invention constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window on each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window on each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain the target face frame.
  • the normal face detection (that is, face detection by the face detection model) has a high detection rate and a low false detection rate.
  • the present invention uses the usual face detection as a main detection scheme.
  • the usual face detection is sensitive to changes in angle (upward head, low head, side face), changes in illumination (backlighting, shadowing), occlusion (sunglasses, masks, caps), etc., and is prone to missed detection.
  • the present invention employs head-shoulder detection as an auxiliary detection scheme, and after the head-shoulder area is detected, the face frame is extracted from it. Finally, the face frames obtained by the usual face detection and by head-shoulder detection are merged to form the final face frame output.
  • the present invention combines face detection and head-shoulder detection to improve the face detection rate.
  • the present invention adopts the same features (i.e., the aggregate channel features) in face detection and head-shoulder detection, which reduces the time for feature extraction and speeds up the detection process. Therefore, the present invention can realize fast face detection with a high detection rate.
  • FIG. 1 is a flowchart of a face detection method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic diagram of a face frame prediction model as a convolutional neural network.
  • FIG. 3 is a structural diagram of a face detecting apparatus according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of a computer device according to Embodiment 3 of the present invention.
  • the face detection method of the present invention is applied in one or more computer devices.
  • the computer device is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and its hardware includes but is not limited to a microprocessor, an application specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and the like.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device.
  • FIG. 1 is a flowchart of a face detection method according to Embodiment 1 of the present invention.
  • the face detection method is applied to a computer device.
  • the face detection method can be applied to various video surveillance scenarios, such as intelligent transportation, access control systems, and urban safety and security.
  • In intelligent transportation, the present invention can be used to perform face detection on a pedestrian or a driver.
  • the present invention detects a face region from an image to be detected for face-based processing such as face recognition, expression analysis, and the like.
  • the surveillance image captured by the camera near the zebra crossing on the road is the image to be detected, and the present invention detects the face region from the monitoring image for pedestrian recognition.
  • the face detection method specifically includes the following steps:
  • the image to be detected is an image containing a human face, usually a surveillance image.
  • the image to be detected may include one face or multiple faces.
  • the image to be detected may be an image received from the outside, an image taken by the computer device, an image read from a memory of the computer device, or the like.
  • the image to be detected may be a grayscale image or a color image such as an RGB image, an LUV image, or an HSV image.
  • The image to be detected is scaled to different scales (it can be enlarged or reduced) to obtain scaled images of different sizes; the image to be detected and its scaled images constitute the image pyramid of the image to be detected.
  • For example, the image to be detected is scaled by 75% to obtain a first scaled image, by 50% to obtain a second scaled image, and by 25% to obtain a third scaled image; the image to be detected together with the first, second, and third scaled images constitutes an image pyramid.
  • the number of layers of the image pyramid of the image to be detected may be determined according to the size of the image to be detected and the size of the face detection model (see step 103) used in the present invention (ie, the size of the input image received by the face detection model).
  • the number of layers of the image pyramid of the image to be detected can be determined by the following formula:
  • n represents the number of layers of the image pyramid of the image to be detected;
  • k_up represents the upscaling multiple of the image to be detected (i.e., the factor by which the image to be detected is enlarged);
  • w_img and h_img respectively represent the width and height of the image to be detected;
  • w_m and h_m respectively represent the width and height of the face detection model (i.e., the width and height of the input image received by the face detection model);
  • n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the width and height of the image to be detected are known, and the width and height of the face detection model are also known.
  • k_up can be set by the user as needed, or take the system default (for example, the default is 2).
  • n_octave can be set by the user as needed, or take the system default (for example, the default is 8).
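  • The formula itself is not reproduced in the text above. As a non-authoritative illustration, the following Python sketch computes a layer count from the listed variables using a common ACF-style relation, n = floor(n_octave · log2(k_up · min(w_img / w_m, h_img / h_m))) + 1; the flooring and the trailing "+1" are assumptions, not taken from the patent.

```python
import math

def pyramid_layer_count(w_img, h_img, w_m, h_m, k_up=2, n_octave=8):
    """Assumed ACF-style count of image-pyramid layers.

    w_img, h_img : width and height of the image to be detected
    w_m, h_m     : width and height of the face detection model input
    k_up         : upscaling multiple of the image to be detected (default 2)
    n_octave     : image layers per doubling of size in the pyramid (default 8)
    """
    # Largest relative scale at which the model window still fits the image.
    max_ratio = k_up * min(w_img / w_m, h_img / h_m)
    return int(math.floor(n_octave * math.log2(max_ratio))) + 1

# Example: a 1920x1080 surveillance frame with a 32x32 face detection model.
print(pyramid_layer_count(1920, 1080, 32, 32))  # 49 layers with the assumed formula
```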
  • the aggregate channel features can include color features, gradient magnitude features, and gradient direction histogram features.
  • the color features may include RGB color features, LUV color features, HSV color features, grayscale features, and the like.
  • the color feature can be obtained directly from the image to be detected. For example, if the image to be detected is an RGB image, the RGB color feature can be directly obtained; if the image to be detected is an LUV image, the LUV color feature can be directly obtained; if the image to be detected is an HSV image, the HSV color feature can be directly obtained; The image is a grayscale image, and the grayscale feature can be directly obtained.
  • the image to be detected may be converted to obtain the color feature.
  • the image to be detected is an RGB image
  • the RGB image may be converted into a grayscale image (i.e., a corresponding gray value is calculated from the RGB values of each pixel) to obtain the grayscale feature of the image to be detected.
  • Gradient has a variety of calculation methods, such as using Sobel, Prewitt or Roberts operators to calculate the gradient of each pixel (including horizontal gradient values and vertical gradient values).
  • the gradient magnitude and gradient direction of each pixel point are determined according to the gradient of each pixel point.
  • the gradient magnitude of each pixel of the image is the gradient magnitude feature of the image.
  • the gradient direction histogram of the image is the gradient direction histogram feature of the image.
  • the image may be divided into a plurality of equal-sized blocks (for example, 4 ⁇ 4 blocks), and the gradient direction histograms of the respective blocks are respectively obtained, and the image is obtained according to the gradient direction histogram of each block. Gradient direction histogram.
  • the gradient direction histogram of each block can be calculated as follows: according to the gradient direction of each pixel in the block, the pixels in the block are divided into a plurality of different angular ranges (for example, 6 angular ranges); the gradient amplitudes of the pixels in each angular range of the block are accumulated to obtain the gradient magnitude of each angular range in the block; and the gradient direction histogram of the block is obtained from the gradient magnitudes of the angular ranges in the block.
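  • As an illustrative sketch (not the patent's reference implementation), the channels described above (color, gradient magnitude, and a gradient direction histogram over 6 angular ranges, aggregated in 4 × 4 blocks) might be computed as follows; the use of OpenCV, the LUV color space, and sum-pooling of every channel are assumptions.

```python
import cv2
import numpy as np

def aggregate_channel_features(bgr, n_bins=6, block=4):
    """Color + gradient magnitude + gradient-orientation histogram channels."""
    luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2LUV).astype(np.float32)   # 3 color channels
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)                          # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)                          # vertical gradient
    mag = np.sqrt(gx * gx + gy * gy)                                # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)                         # orientation in [0, pi)

    # One histogram channel per angular range, holding the gradient
    # magnitudes of the pixels whose orientation falls in that range.
    bin_idx = np.minimum((ang / np.pi * n_bins).astype(np.int32), n_bins - 1)
    hist = np.zeros(gray.shape + (n_bins,), np.float32)
    for b in range(n_bins):
        hist[..., b] = mag * (bin_idx == b)

    chans = np.dstack([luv, mag[..., None], hist])                  # H x W x 10 channels

    # Aggregate (sum-pool) every channel over block x block pixel cells.
    h, w = (s // block * block for s in gray.shape)
    chans = chans[:h, :w].reshape(h // block, block, w // block, block, -1)
    return chans.sum(axis=(1, 3))                                   # (H/4) x (W/4) x 10
```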
  • The aggregate channel features (referred to as real features) of some of the images (referred to as real feature layers) in the image pyramid may be calculated directly, and the aggregate channel features of the other images in the image pyramid (referred to as approximate feature layers) are obtained by interpolation from the real features, for example from the real features of the closest real feature layer.
  • the real feature layer in the image pyramid can be specified by the user as needed, or it can be system default.
  • s represents the scale ratio of the approximate feature layer to the real feature layer.
  • λ_Ω is a constant for each feature channel Ω, and the feature of the approximate layer is obtained from the real feature as f_Ω(I_s) ≈ f_Ω(I) · s^(-λ_Ω). The value of λ_Ω can be estimated as follows: compute μ_s = (1/N) Σ_i f_Ω(I_i,s) / f_Ω(I_i), where I_i,s denotes the image I_i scaled by the ratio s, f_Ω(I) denotes the feature Ω averaged over the image I, and N denotes the number of images participating in the estimation; λ_Ω is then obtained by fitting μ_s ≈ s^(-λ_Ω) using the least squares method. In a specific embodiment, N is taken to be 50000.
  • Step 103: Slide a first sliding window on each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and use the trained face detection model to classify the first detection frames according to the feature pyramid to obtain a plurality of candidate face frames.
  • the candidate face frame is a first detection frame classified as a face.
  • the size of the first sliding window is equal to the size of the input image received by the face detection model.
  • the size of the first sliding window is 32 ⁇ 32
  • the first preset step size is 2 (ie, 2 pixels).
  • the first sliding window and the first predetermined step size may be other sizes.
  • The first sliding window slides on each layer image of the image pyramid according to a preset direction (for example, from top to bottom and from left to right); each position yields a first detection frame, and the trained face detection model is used to classify the first detection frame to determine whether it is a candidate face frame.
  • the face detection model may be a classifier formed by cascading a plurality of (for example, 512) decision trees, that is, a strong classifier formed by cascading a plurality of weak classifiers.
  • A decision tree, also known as a judgment tree, is a tree structure applied to classification. Each internal node in the decision tree represents a test of an attribute, each edge represents a test result, each leaf node represents a class or class distribution, and the top node is the root node.
  • the decision tree constituting the face detection model may have a depth of 8 or other values.
  • the face detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • the training samples required to train the face detection model include positive samples and negative samples.
  • the positive sample of the trained face detection model is a face frame image
  • the negative sample is a non-face frame image.
  • A face frame image may be cropped from the monitoring image and scaled to a first predetermined size (for example, 32 × 32) as a positive sample for training the face detection model;
  • a non-face frame image is cropped and scaled to the first predetermined size (for example, 32 × 32) as a negative sample for training the face detection model.
  • The cropped non-face frame image is an image taken from an image area outside the area where the face frame is located.
  • the training of the face detection model can refer to the prior art, and details are not described herein again.
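  • A simplified sketch of the sliding-window classification in step 103, assuming the boosted decision-tree classifier is available as a scoring function: the 32 × 32 window and the step of 2 pixels follow the text, while the score threshold and the score_window interface are assumptions.

```python
def detect_on_layer(channels, score_window, win=32, stride=2, threshold=0.0):
    """Slide a win x win window over one pyramid layer's channel features.

    channels     : H x W x C channel features of this layer (kept at per-pixel
                   resolution to simplify the sketch)
    score_window : hypothetical scorer summing the outputs of the cascaded
                   decision trees on one window's features
    Returns candidate face frames (x, y, win, win, score).
    """
    h, w = channels.shape[:2]
    candidates = []
    for y in range(0, h - win + 1, stride):          # first preset step: 2 pixels
        for x in range(0, w - win + 1, stride):
            score = score_window(channels[y:y + win, x:x + win])
            if score > threshold:                    # window classified as a face
                candidates.append((x, y, win, win, score))
    return candidates
```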
  • Merging the candidate face frames means de-duplicating the candidate face frames.
  • the merged candidate face frame can be one or more. If the image to be detected includes a face, a merged candidate face frame may be obtained; if the image to be detected includes multiple faces, a merged candidate face frame may be obtained for each face.
  • Candidate face frames can be merged by a non-maximum suppression (NMS) algorithm, that is, the candidate face frames are merged according to the probability that each candidate face frame belongs to a face and the overlapping area ratio (Intersection over Union, IoU) of the candidate face frames.
  • Merging the candidate face frames by the NMS algorithm may include: sorting all candidate face frames by the probability of belonging to a face; selecting the candidate face frame with the highest probability and determining, for each other candidate face frame, whether its overlapping area ratio with the selected candidate face frame is greater than a first preset threshold (for example, 0.25); if the overlapping area ratio is greater than the first preset threshold, deleting that candidate face frame, and taking the selected candidate face frame as a merged candidate face frame; then selecting the candidate face frame with the highest probability from the remaining candidate face frames and repeating the above process until all merged candidate face frames are obtained.
  • The remaining candidate face frames are the candidate face frames left after excluding the deleted candidate face frames and the merged candidate face frames.
  • candidate face frames which are ranked as A, B, C, D, E, and F according to the probability of belonging to the face.
  • Select the candidate face frame F with the highest probability, and determine whether the overlapping area ratios of A through E with F are greater than the first preset threshold.
  • Assuming that the overlapping area ratios of B and D with F exceed the first preset threshold, B and D are deleted, and F is marked as the first merged candidate face frame obtained.
  • From the remaining candidate face frames A, C, and E, select the candidate face frame E with the highest probability, and determine whether the overlapping area ratios of A and C with E are greater than the first preset threshold.
  • the merged candidate face frames F and E are obtained by the NMS algorithm.
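  • The merging step can be sketched as a standard NMS routine; boxes are (x, y, w, h) with an attached probability, and the 0.25 threshold follows the example above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.25):
    """Keep the highest-probability box, delete boxes whose overlap ratio
    with it exceeds the threshold, then repeat on the remaining boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(boxes[best])
        order = [i for i in order if iou(boxes[i], boxes[best]) <= threshold]
    return kept
```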
  • Step 105: Slide a second sliding window on each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and use the trained head-shoulder detection model to classify the second detection frames according to the feature pyramid to obtain a plurality of candidate head-shoulder frames.
  • the candidate head-shoulder frame is a second detection frame classified into a head-shoulder frame.
  • the size of the second sliding window is equal to the size of the input image received by the head-shoulder detection model.
  • the size of the second sliding window may be 64 ⁇ 64, and the second preset step size may be 2.
  • the second sliding window and the second predetermined step size may be other sizes.
  • the second preset step size may be equal to the first preset step size.
  • the second preset step size may also be not equal to the first preset step size.
  • the first preset step size is 2, and the second preset step size is 4.
  • The second sliding window slides on each layer image of the image pyramid according to a preset direction (for example, from top to bottom and from left to right); each position yields a second detection frame, and the trained head-shoulder detection model is used to classify the second detection frame to determine whether it is a candidate head-shoulder frame.
  • the head-shoulder detection model may be a classifier formed by cascading a plurality (eg, 512) of decision trees.
  • the number of decision trees included in the head-shoulder detection model may be the same as or different from the number of decision trees included in the face detection model.
  • the decision tree constituting the head-shoulder detection model may have a depth of 8, or may be other values.
  • a training sample of the head-shoulder detection model can be obtained from the trained face detection model.
  • Several decision trees can be removed from the trained face detection model (a cascade of decision trees) to obtain a new face detection model.
  • The trained face detection model and the new face detection model both detect faces on the monitoring image; the new face detection model detects more faces than the trained face detection model does.
  • the position of the face frame in the monitoring image is marked, and the face frame is extended to obtain the head-shoulder frame, and the position of the head-shoulder frame in the monitoring image is marked.
  • the position of the head-shoulder frame is marked as [x', y', w', h'], x', y' represents the coordinates of the top left corner of the head-shoulder frame, and w' represents the width of the head-shoulder frame. h' indicates the height of the head-shoulder frame.
  • A head-shoulder frame image may be cropped from the surveillance image and scaled to a second predetermined size (for example, 64 × 64) as a positive sample for training the head-shoulder detection model; a non-head-shoulder frame image is cropped from the surveillance image and scaled to the second predetermined size as a negative sample for training the head-shoulder detection model.
  • The cropped non-head-shoulder frame image is an image taken from an image area outside the area where the head-shoulder frame is located.
  • the training sample required by the head-shoulder detection model can be conveniently obtained by the trained face detection model, and the obtained training samples are obtained from the monitoring image, and thus are more in line with the actual monitoring scene.
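  • A sketch of this sample-generation step: a detected face frame [x, y, w, h] is extended to a head-shoulder frame [x', y', w', h'] and both positions are recorded. The extension factors below are assumptions for illustration only; the excerpt does not state them.

```python
def face_to_head_shoulder(face, img_w, img_h, side=0.75, up=0.5, down=1.25):
    """Extend a face frame (x, y, w, h) to a head-shoulder frame (x', y', w', h').

    side, up and down are hypothetical extension factors relative to the face
    size; the result is clipped so the crop stays inside the image.
    """
    x, y, w, h = face
    x1 = max(0, x - side * w)                 # widen to cover the shoulders
    y1 = max(0, y - up * h)                   # raise to cover the top of the head
    x2 = min(img_w, x + w + side * w)
    y2 = min(img_h, y + h + down * h)         # lower to include the shoulders
    return (x1, y1, x2 - x1, y2 - y1)

# Each extended crop is then scaled to the second predetermined size
# (for example 64 x 64) and used as a positive training sample.
```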
  • the head-shoulder detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • Existing head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are more complex and computationally time consuming.
  • The invention performs head-shoulder detection according to the feature pyramid of the image to be detected and does not need to perform additional feature extraction, which saves the time of feature extraction in the head-shoulder detection process and accelerates head-shoulder detection, thereby improving the efficiency of the face detection method of the invention.
  • Merging the candidate head-shoulder frames means de-duplicating the candidate head-shoulder frames.
  • the combined candidate head-shoulder frame may be one or more. If the image to be detected includes a head-shoulder, a merged candidate head-shoulder frame can be obtained; if the image to be detected includes a plurality of head-shoulders, a merged candidate head-shoulder frame can be obtained for each head-shoulder. .
  • the candidate head-shoulder frames may be merged by a non-maximum suppression algorithm, that is, the candidate head-shoulder frames are merged according to the probability that the candidate head-shoulder frame belongs to the head-shoulder and the candidate head-shoulder frame overlap area ratio.
  • Merging the candidate head-shoulder frames by the non-maximum suppression algorithm may include: sorting all candidate head-shoulder frames by the probability of belonging to a head-shoulder; selecting the candidate head-shoulder frame with the highest probability and determining, for each other candidate head-shoulder frame, whether its overlapping area ratio with the selected candidate head-shoulder frame is greater than a second preset threshold (for example, 0.30); if the overlapping area ratio is greater than the second preset threshold, deleting that candidate head-shoulder frame, and taking the selected candidate head-shoulder frame as a merged candidate head-shoulder frame; then selecting the candidate head-shoulder frame with the highest probability from the remaining candidate head-shoulder frames and repeating the above process until all merged candidate head-shoulder frames are obtained.
  • the remaining candidate head-shoulder frame refers to the candidate head-shoulder frame remaining except the deleted candidate head-shoulder frame and the merged candidate head-shoulder frame.
  • candidate head-shoulder frames which are ranked as A', B', C', D', E', F' according to the probability of belonging to the head-shoulder.
  • The candidate head-shoulder frame F' with the highest probability is selected, and it is determined whether the overlapping area ratios of A' through E' with F' are greater than the second preset threshold. Assuming that the overlapping area ratios of B' and D' with F' exceed the second preset threshold, B' and D' are deleted, and F' is marked as the first merged candidate head-shoulder frame obtained.
  • Step 107: Predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame.
  • the face frame prediction model may be a convolutional neural network.
  • The face frame prediction model may be the convolutional neural network shown in FIG. 2. The convolutional neural network includes two 3 × 3 convolution layers, one 2 × 2 convolution layer, and one fully connected layer, and each of the first two convolution layers is followed by 3 × 3 max pooling.
  • the goal of regression is the position of the face frame [x, y, w, h].
  • The input of the face frame prediction model (for example, the convolutional neural network) is the head-shoulder region obtained by head-shoulder detection.
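  • A sketch of a network matching the description of FIG. 2, written with PyTorch for illustration: two 3 × 3 convolutions, each followed by 3 × 3 max pooling, a 2 × 2 convolution, and a fully connected layer regressing [x, y, w, h]. The channel counts and the 64 × 64 input size are assumptions not stated in this excerpt.

```python
import torch
import torch.nn as nn

class FaceFramePredictor(nn.Module):
    """CNN that regresses a face frame [x, y, w, h] from a head-shoulder crop."""

    def __init__(self, in_size=64):                            # assumed 3 x 64 x 64 input
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),        # first 3x3 convolution
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 3x3 max pooling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),       # second 3x3 convolution
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 3x3 max pooling
            nn.Conv2d(32, 64, kernel_size=2),                  # 2x2 convolution
            nn.ReLU(inplace=True),
        )
        feat = in_size // 4 - 1                                 # 64 -> 32 -> 16 -> 15
        self.fc = nn.Linear(64 * feat * feat, 4)                # regresses [x, y, w, h]

    def forward(self, x):
        return self.fc(torch.flatten(self.features(x), start_dim=1))

# Example: predict face frames for a batch of two 64x64 head-shoulder crops.
boxes = FaceFramePredictor()(torch.randn(2, 3, 64, 64))        # shape (2, 4)
```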
  • The merged candidate face frame and the predicted face frame may be merged by a non-maximum suppression algorithm, that is, the face frames are merged according to the probability that the merged candidate face frame and the predicted face frame belong to a face and the overlapping area ratio of the merged candidate face frame and the predicted face frame.
  • Merging the merged candidate face frame and the predicted face frame by the non-maximum suppression algorithm may include: sorting all merged candidate face frames and predicted face frames by the probability of belonging to a face, from high to low; selecting the face frame with the highest probability (which may be a merged candidate face frame or a predicted face frame) and determining, for each other face frame, whether its overlapping area ratio with the selected face frame is greater than a third preset threshold; if the overlapping area ratio is greater than the third preset threshold, deleting that face frame, and taking the selected face frame as a target face frame; then selecting the face frame with the highest probability from the remaining face frames and repeating the above process until all target face frames are obtained.
  • The remaining face frames are the face frames left after excluding the deleted face frames and the target face frames.
  • the first preset threshold, the second preset threshold, and the third preset threshold may be the same or different.
  • The face detection method of the first embodiment constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window on each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window on each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain the target face frame.
  • the normal face detection (that is, the face detection by the face detection model) has a high detection rate and a low false detection rate.
  • the face detection method of the first embodiment uses the usual face detection as the main detection scheme. However, the usual face detection is sensitive to changes in angle (upward head, low head, side face), changes in illumination (backlighting, shadow), occlusion (sunglasses, masks, hats), etc., and is prone to missed detection.
  • The face detection method of the first embodiment adopts head-shoulder detection as an auxiliary detection scheme, and after the head-shoulder area is detected, the face frame is extracted from it. Finally, the face frames obtained by the usual face detection and by head-shoulder detection are merged to form the final face frame output.
  • The face detection method of the first embodiment uses face detection and head-shoulder detection in combination to improve the face detection rate. Meanwhile, the face detection method of the first embodiment adopts the same features (i.e., the aggregate channel features, that is, the feature pyramid) in face detection and head-shoulder detection, which reduces the time for feature extraction and speeds up the detection process. Therefore, the face detection method of the first embodiment can realize fast face detection with a high detection rate.
  • FIG. 3 is a structural diagram of a face detecting apparatus according to Embodiment 2 of the present invention.
  • the face detecting apparatus 10 may include: a construction unit 301, an extraction unit 302, a first detection unit 303, a first merging unit 304, a second detection unit 305, a second merging unit 306, a prediction unit 307, and a third merging unit 308.
  • the construction unit 301 is configured to construct an image pyramid for the image to be detected.
  • the image to be detected is an image containing a human face, usually a surveillance image.
  • the image to be detected may include one face or multiple faces.
  • the image to be detected may be a grayscale image or a color image such as an RGB image, an LUV image, or an HSV image.
  • the color feature can be obtained directly from the image to be detected. For example, if the image to be detected is an RGB image, the RGB color feature can be directly obtained; if the image to be detected is an LUV image, the LUV color feature can be directly obtained; if the image to be detected is an HSV image, the HSV color feature can be directly obtained; The image is a grayscale image, and the grayscale feature can be directly obtained.
  • the image to be detected may be converted to obtain the color feature.
  • the image to be detected is an RGB image
  • the RGB image may be converted into a grayscale image (i.e., a corresponding gray value is calculated from the RGB values of each pixel) to obtain the grayscale feature of the image to be detected.
  • The image to be detected is scaled to different scales (it can be enlarged or reduced) to obtain scaled images of different sizes; the image to be detected and its scaled images constitute the image pyramid of the image to be detected.
  • For example, the image to be detected is scaled by 75% to obtain a first scaled image, by 50% to obtain a second scaled image, and by 25% to obtain a third scaled image; the image to be detected together with the first, second, and third scaled images constitutes an image pyramid.
  • the number of layers of the image pyramid of the image to be detected may be determined according to the size of the image to be detected and the size of the face detection model (see step 103) used in the present invention (ie, the size of the input image received by the face detection model).
  • the number of layers of the image pyramid of the image to be detected can be determined by the following formula:
  • n represents the number of layers of the image pyramid of the image to be detected;
  • k_up represents the upscaling multiple of the image to be detected (i.e., the factor by which the image to be detected is enlarged);
  • w_img and h_img respectively represent the width and height of the image to be detected;
  • w_m and h_m respectively represent the width and height of the face detection model (i.e., the width and height of the input image received by the face detection model);
  • n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the width and height of the image to be detected are known, and the width and height of the face detection model are also known.
  • k_up can be set by the user as needed, or take the system default (for example, the default is 2).
  • n_octave can be set by the user as needed, or take the system default (for example, the default is 8).
  • the extracting unit 302 is configured to extract an aggregate channel feature of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected.
  • the aggregate channel features can include color features, gradient magnitude features, and gradient direction histogram features.
  • the color features may include RGB color features, LUV color features, HSV color features, grayscale features, and the like.
  • Gradient has a variety of calculation methods, such as using Sobel, Prewitt or Roberts operators to calculate the gradient of each pixel (including horizontal gradient values and vertical gradient values).
  • the gradient magnitude and gradient direction of each pixel point are determined according to the gradient of each pixel point.
  • the gradient magnitude of each pixel of the image is the gradient magnitude feature of the image.
  • the gradient direction histogram of the image is the gradient direction histogram feature of the image.
  • the image may be divided into a plurality of equal-sized blocks (for example, 4 ⁇ 4 blocks), and the gradient direction histograms of the respective blocks are respectively obtained, and the image is obtained according to the gradient direction histogram of each block. Gradient direction histogram.
  • the gradient direction histogram of each block can be calculated as follows: according to the gradient direction of each pixel in the block, the pixels in the block are divided into a plurality of different angular ranges (for example, 6 angular ranges); the gradient amplitudes of the pixels in each angular range of the block are accumulated to obtain the gradient magnitude of each angular range in the block; and the gradient direction histogram of the block is obtained from the gradient magnitudes of the angular ranges in the block.
  • the gradient direction histogram of the image can be obtained from the gradient direction histogram of each block in the image.
  • the gradient direction histogram vectors of the respective blocks in the image may be connected in series to form a gradient direction histogram series vector, and the gradient direction histogram series vector is the gradient direction histogram feature of the image.
  • The aggregate channel features (referred to as real features) of some of the images (referred to as real feature layers) in the image pyramid may be calculated directly, and the aggregate channel features of the other images in the image pyramid (referred to as approximate feature layers) are obtained by interpolation from the real features, for example from the real features of the closest real feature layer.
  • the real feature layer in the image pyramid can be specified by the user as needed, or it can be system default.
  • s represents the scale ratio of the approximate feature layer to the real feature layer.
  • λ_Ω is a constant for each feature channel Ω, and the feature of the approximate layer is obtained from the real feature as f_Ω(I_s) ≈ f_Ω(I) · s^(-λ_Ω). The value of λ_Ω can be estimated as follows: compute μ_s = (1/N) Σ_i f_Ω(I_i,s) / f_Ω(I_i), where I_i,s denotes the image I_i scaled by the ratio s, f_Ω(I) denotes the feature Ω averaged over the image I, and N denotes the number of images participating in the estimation; λ_Ω is then obtained by fitting μ_s ≈ s^(-λ_Ω) using the least squares method. In a specific embodiment, N is taken to be 50000.
  • the first detecting unit 303 is configured to slide on each layer image of the image pyramid according to the first preset step by using the first sliding window to obtain a plurality of first detection frames, and use the trained face detection model according to the The feature pyramid classifies the first detection frame to obtain a plurality of candidate face frames.
  • the candidate face frame is a first detection frame classified as a face.
  • the size of the first sliding window is equal to the size of the input image received by the face detection model.
  • the size of the first sliding window is 32 ⁇ 32
  • the first preset step size is 2 (ie, 2 pixels).
  • the first sliding window and the first predetermined step size may be other sizes.
  • The first sliding window slides on each layer image of the image pyramid according to a preset direction (for example, from top to bottom and from left to right); each position yields a first detection frame, and the trained face detection model is used to classify the first detection frame to determine whether it is a candidate face frame.
  • the face detection model may be a classifier formed by cascading a plurality of (for example, 512) decision trees, that is, a strong classifier formed by cascading a plurality of weak classifiers.
  • A decision tree, also known as a judgment tree, is a tree structure applied to classification. Each internal node in the decision tree represents a test of an attribute, each edge represents a test result, each leaf node represents a class or class distribution, and the top node is the root node.
  • the decision tree constituting the face detection model may have a depth of 8 or other values.
  • the face detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • the training samples required to train the face detection model include positive samples and negative samples.
  • the positive sample of the trained face detection model is a face frame image
  • the negative sample is a non-face frame image.
  • A face frame image may be cropped from the monitoring image and scaled to a first predetermined size (for example, 32 × 32) as a positive sample for training the face detection model;
  • a non-face frame image is cropped and scaled to the first predetermined size (for example, 32 × 32) as a negative sample for training the face detection model.
  • The cropped non-face frame image is an image taken from an image area outside the area where the face frame is located.
  • the training of the face detection model can refer to the prior art, and details are not described herein again.
  • the first merging unit 304 is configured to merge the candidate face frames to obtain a merged candidate face frame.
  • Merging the candidate face frames means de-duplicating the candidate face frames.
  • the merged candidate face frame can be one or more. If the image to be detected includes a face, a merged candidate face frame may be obtained; if the image to be detected includes multiple faces, a merged candidate face frame may be obtained for each face.
  • Candidate face frames can be merged by a non-maximum suppression (NMS) algorithm, that is, the candidate face frames are merged according to the probability that each candidate face frame belongs to a face and the overlapping area ratio (Intersection over Union, IoU) of the candidate face frames.
  • Merging the candidate face frames by the NMS algorithm may include: sorting all candidate face frames by the probability of belonging to a face; selecting the candidate face frame with the highest probability and determining, for each other candidate face frame, whether its overlapping area ratio with the selected candidate face frame is greater than a first preset threshold (for example, 0.25); if the overlapping area ratio is greater than the first preset threshold, deleting that candidate face frame, and taking the selected candidate face frame as a merged candidate face frame; then selecting the candidate face frame with the highest probability from the remaining candidate face frames and repeating the above process until all merged candidate face frames are obtained.
  • The remaining candidate face frames are the candidate face frames left after excluding the deleted candidate face frames and the merged candidate face frames.
  • candidate face frames which are ranked as A, B, C, D, E, and F according to the probability of belonging to the face.
  • Select the candidate face frame F with the highest probability, and determine whether the overlapping area ratios of A through E with F are greater than the first preset threshold.
  • Assuming that the overlapping area ratios of B and D with F exceed the first preset threshold, B and D are deleted, and F is marked as the first merged candidate face frame obtained.
  • From the remaining candidate face frames A, C, and E, select the candidate face frame E with the highest probability, and determine whether the overlapping area ratios of A and C with E are greater than the first preset threshold.
  • the merged candidate face frames F and E are obtained by the NMS algorithm.
  • a second detecting unit 305 configured to slide on each layer of the image pyramid according to a second preset step by using a second sliding window, to obtain a plurality of second detecting frames, and using the trained head-shoulder detecting model according to the
  • the feature pyramid classifies the second detection frame to obtain a plurality of candidate head-shoulder frames.
  • the candidate head-shoulder frame is a second detection frame classified into a head-shoulder frame.
  • the size of the second sliding window is equal to the size of the input image received by the head-shoulder detection model.
  • the size of the second sliding window may be 64 ⁇ 64, and the second preset step size may be 2.
  • the second sliding window and the second predetermined step size may be other sizes.
  • the second preset step size may be equal to the first preset step size.
  • the second preset step size may also be not equal to the first preset step size.
  • the first preset step size is 2, and the second preset step size is 4.
  • The second sliding window slides on each layer image of the image pyramid according to a preset direction (for example, from top to bottom and from left to right); each position yields a second detection frame, and the trained head-shoulder detection model is used to classify the second detection frame to determine whether it is a candidate head-shoulder frame.
  • the head-shoulder detection model may be a classifier formed by cascading a plurality (eg, 512) of decision trees.
  • the number of decision trees included in the head-shoulder detection model may be the same as or different from the number of decision trees included in the face detection model.
  • the decision tree constituting the head-shoulder detection model may have a depth of 8, or may be other values.
  • a training sample of the head-shoulder detection model can be obtained from the trained face detection model.
  • Several decision trees can be removed from the trained face detection model (a cascade of decision trees) to obtain a new face detection model.
  • The trained face detection model and the new face detection model both detect faces on the monitoring image; the new face detection model detects more faces than the trained face detection model does.
  • the position of the face frame in the monitoring image is marked, and the face frame is extended to obtain the head-shoulder frame, and the position of the head-shoulder frame in the monitoring image is marked.
  • the position of the head-shoulder frame is marked as [x', y', w', h'], x', y' represents the coordinates of the top left corner of the head-shoulder frame, and w' represents the width of the head-shoulder frame. h' indicates the height of the head-shoulder frame.
  • A head-shoulder frame image may be cropped from the surveillance image and scaled to a second predetermined size (for example, 64 × 64) as a positive sample for training the head-shoulder detection model; a non-head-shoulder frame image is cropped from the surveillance image and scaled to the second predetermined size as a negative sample for training the head-shoulder detection model.
  • The cropped non-head-shoulder frame image is an image taken from an image area outside the area where the head-shoulder frame is located.
  • the training sample required by the head-shoulder detection model can be conveniently obtained by the trained face detection model, and the obtained training samples are obtained from the monitoring image, and thus are more in line with the actual monitoring scene.
  • the head-shoulder detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • Existing head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are more complex and computationally time consuming.
  • The invention performs head-shoulder detection according to the feature pyramid of the image to be detected and does not need to perform additional feature extraction, which saves the time of feature extraction in the head-shoulder detection process and accelerates head-shoulder detection, thereby improving the efficiency of the face detection method of the invention.
  • the second merging unit 306 is configured to combine the candidate head-shoulder frames to obtain a combined candidate head-shoulder frame.
  • Merging the candidate head-shoulder frames means de-duplicating the candidate head-shoulder frames.
  • the combined candidate head-shoulder frame may be one or more. If the image to be detected includes a head-shoulder, a merged candidate head-shoulder frame can be obtained; if the image to be detected includes a plurality of head-shoulders, a merged candidate head-shoulder frame can be obtained for each head-shoulder. .
  • the candidate head-shoulder frames may be merged by a non-maximum suppression algorithm, that is, the candidate head-shoulder frames are merged according to the probability that the candidate head-shoulder frame belongs to the head-shoulder and the candidate head-shoulder frame overlap area ratio.
  • Merging the candidate head-shoulder frames by the non-maximum suppression algorithm may include: sorting all candidate head-shoulder frames by the probability of belonging to a head-shoulder; selecting the candidate head-shoulder frame with the highest probability and determining, for each other candidate head-shoulder frame, whether its overlapping area ratio with the selected candidate head-shoulder frame is greater than a second preset threshold (for example, 0.30); if the overlapping area ratio is greater than the second preset threshold, deleting that candidate head-shoulder frame, and taking the selected candidate head-shoulder frame as a merged candidate head-shoulder frame; then selecting the candidate head-shoulder frame with the highest probability from the remaining candidate head-shoulder frames and repeating the above process until all merged candidate head-shoulder frames are obtained.
  • the remaining candidate head-shoulder frame refers to the candidate head-shoulder frame remaining except the deleted candidate head-shoulder frame and the merged candidate head-shoulder frame.
  • candidate head-shoulder frames which are ranked as A', B', C', D', E', F' according to the probability of belonging to the head-shoulder.
  • The candidate head-shoulder frame F' with the highest probability is selected, and it is determined whether the overlapping area ratios of A' through E' with F' are greater than the second preset threshold. Assuming that the overlapping area ratios of B' and D' with F' exceed the second preset threshold, B' and D' are deleted, and F' is marked as the first merged candidate head-shoulder frame obtained.
  • the prediction unit 307 is configured to predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame.
  • the face frame prediction model may be a convolutional neural network.
  • The face frame prediction model may be the convolutional neural network shown in FIG. 2. The convolutional neural network includes two 3 × 3 convolution layers, one 2 × 2 convolution layer, and one fully connected layer, and each of the first two convolution layers is followed by 3 × 3 max pooling.
  • the goal of regression is the position of the face frame [x, y, w, h].
  • The input of the face frame prediction model (for example, the convolutional neural network) is the head-shoulder region obtained by head-shoulder detection.
  • the third merging unit 308 is configured to combine the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • The merged candidate face frame and the predicted face frame may be merged by a non-maximum suppression algorithm, that is, the face frames are merged according to the probability that the merged candidate face frame and the predicted face frame belong to a face and the overlapping area ratio of the merged candidate face frame and the predicted face frame.
  • Merging the merged candidate face frame and the predicted face frame by the non-maximum suppression algorithm may include: sorting all merged candidate face frames and predicted face frames by the probability of belonging to a face, from high to low; selecting the face frame with the highest probability (which may be a merged candidate face frame or a predicted face frame) and determining, for each other face frame, whether its overlapping area ratio with the selected face frame is greater than a third preset threshold; if the overlapping area ratio is greater than the third preset threshold, deleting that face frame, and taking the selected face frame as a target face frame; then selecting the face frame with the highest probability from the remaining face frames and repeating the above process until all target face frames are obtained.
  • The remaining face frames are the face frames left after excluding the deleted face frames and the target face frames.
  • The face detecting device of the second embodiment constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window on each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window on each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain the target face frame.
  • the normal face detection (that is, the face detection by the face detection model) has a high detection rate and a low false detection rate.
  • the face detection device of the second embodiment uses the usual face detection as the main detection scheme. However, the usual face detection is sensitive to changes in angle (upward head, low head, side face), changes in illumination (backlighting, shadow), occlusion (sunglasses, masks, hats), etc., and is prone to missed detection.
  • the face detecting device of the second embodiment adopts head-shoulder detection as an auxiliary detection scheme, and after detecting the head-shoulder region, extracts the face frame from it. Finally, the face frames obtained by the usual face detection and by head-shoulder detection are combined to form the final face frame output.
  • the face detecting device of the second embodiment uses face detection and head-shoulder detection in combination to improve the face detection rate. Meanwhile, the face detecting device of the second embodiment adopts the same feature (i.e., the aggregate channel feature) in face detection and head-shoulder detection, which reduces the time for feature extraction and speeds up the detection process. Therefore, the face detecting device of the second embodiment can realize fast face detection with a high detection rate.
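  • the overall flow of the second embodiment can be summarized by the following high-level sketch; every stage is injected as a callable, and all helper and parameter names are hypothetical placeholders for the units described above, not an API defined by the patent.

```python
# High-level sketch of the detection flow of the second embodiment. Each stage
# is passed in as a callable so the outline stays self-contained; merge_frames
# can be the NMS sketch shown earlier, and t1/t2/t3 stand for the first, second,
# and third preset thresholds.
def detect_faces(image, build_pyramid, extract_acf,
                 classify_faces, classify_head_shoulders, predict_face,
                 merge_frames, t1, t2, t3):
    pyramid = build_pyramid(image)                                 # unit 301
    feature_pyramid = [extract_acf(layer) for layer in pyramid]    # unit 302

    # Usual face detection on the shared aggregate channel features.
    candidate_faces = classify_faces(feature_pyramid)              # unit 303
    merged_faces = merge_frames(candidate_faces, t1)               # unit 304

    # Auxiliary head-shoulder detection on the same feature pyramid.
    candidate_hs = classify_head_shoulders(feature_pyramid)        # unit 305
    merged_hs = merge_frames(candidate_hs, t2)                     # unit 306

    # Predict a face frame inside every merged candidate head-shoulder frame;
    # carrying over the head-shoulder probability is an assumption made here.
    predicted = [(predict_face(image, box), prob)
                 for box, prob in merged_hs]                       # unit 307

    # Combine both sources of face frames into the final target face frames.
    return merge_frames(merged_faces + predicted, t3)              # unit 308
```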
  • FIG. 3 is a schematic diagram of a computer apparatus according to Embodiment 3 of the present invention.
  • the computer device 1 includes a memory 20, a processor 30, and a computer program 40, such as a face detection program, stored in the memory 20 and operable on the processor 30.
  • when the processor 30 executes the computer program 40, the steps in the embodiment of the face detection method described above are implemented, for example, steps 101-108 shown in FIG.
  • alternatively, when the processor 30 executes the computer program 40, the functions of the modules/units in the above device embodiments are implemented, such as the units 301-308 in FIG.
  • the computer program 40 can be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to complete the present invention.
  • the one or more modules/units may be a series of computer program instruction segments capable of performing a particular function for describing the execution of the computer program 40 in the computer device 1.
  • for example, the computer program 40 may be divided into the construction unit 301, the extraction unit 302, the first detection unit 303, the first merging unit 304, the second detection unit 305, the second merging unit 306, the prediction unit 307, and the third merging unit 308 shown in FIG.; for the specific functions of each unit, refer to the second embodiment.
  • the computer device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. It will be understood by those skilled in the art that the schematic diagram is merely an example of the computer device 1 and does not constitute a limitation of the computer device 1, which may include more or fewer components than those illustrated, combine some components, or have different components; for example, the computer device 1 may also include input and output devices, network access devices, buses, and the like.
  • the processor 30 may be a central processing unit (CPU), or may be other general-purpose processors, a digital signal processor (DSP), an application specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc.
  • the general purpose processor may be a microprocessor, or the processor 30 may be any conventional processor or the like; the processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 by using various interfaces and lines.
  • the memory 20 can be used to store the computer program 40 and/or the modules/units; the processor 30 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20.
  • the memory 20 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the storage data area may store data (such as audio data, a phone book, etc.) created according to the use of the computer device 1.
  • the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other non-volatile solid-state storage device.
  • the modules/units integrated by the computer device 1 can be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the processes in the foregoing method embodiments of the present invention may also be implemented by a computer program instructing related hardware.
  • the computer program may be stored in a computer readable storage medium. The steps of the various method embodiments described above may be implemented when the program is executed by the processor.
  • the computer program comprises computer program code, which may be in the form of source code, object code form, executable file or some intermediate form.
  • the computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
  • the disclosed computer apparatus and method may be implemented in other manners.
  • the computer device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division, and the actual implementation may have another division manner.
  • each functional unit in each embodiment of the present invention may be integrated in the same processing unit, or each unit may exist physically separately, or two or more units may be integrated in the same unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software function modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a face detection method and device, a computer device, and a readable storage medium. The method comprises: constructing an image pyramid of an image to be detected; extracting aggregate channel features of the images at respective levels of the image pyramid to obtain a feature pyramid of the image; using a first sliding window to obtain multiple first detection frames of the image, and classifying the first detection frames to obtain multiple candidate face frames; merging the candidate face frames; using a second sliding window to obtain multiple second detection frames of the image, and classifying the second detection frames to obtain multiple candidate head-shoulder frames; merging the candidate head-shoulder frames; predicting a face from a merged candidate head-shoulder frame to obtain a predicted face frame; and combining a merged candidate face frame and the predicted face frame to obtain a target face frame. The method enables fast face detection with a high detection rate.
PCT/CN2017/119043 2017-12-12 2017-12-27 Procédé et dispositif de détection de visage, dispositif informatique et support d'informations lisible par ordinateur WO2019114036A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711319416.X 2017-12-12
CN201711319416.XA CN109918969B (zh) 2017-12-12 2017-12-12 人脸检测方法及装置、计算机装置和计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2019114036A1 true WO2019114036A1 (fr) 2019-06-20

Family

ID=66819559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119043 WO2019114036A1 (fr) 2017-12-12 2017-12-27 Procédé et dispositif de détection de visage, dispositif informatique et support d'informations lisible par ordinateur

Country Status (2)

Country Link
CN (1) CN109918969B (fr)
WO (1) WO2019114036A1 (fr)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179218A (zh) * 2019-12-06 2020-05-19 深圳市派科斯科技有限公司 传送带物料检测方法、装置、存储介质及终端设备
CN111538861A (zh) * 2020-04-22 2020-08-14 浙江大华技术股份有限公司 基于监控视频进行图像检索的方法、装置、设备及介质
CN111783601A (zh) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质
CN111832460A (zh) * 2020-07-06 2020-10-27 北京工业大学 一种基于多特征融合的人脸图像提取方法及系统
CN112183351A (zh) * 2020-09-28 2021-01-05 普联国际有限公司 结合肤色信息的人脸检测方法、装置、设备及可读存储介质
CN112825138A (zh) * 2019-11-21 2021-05-21 佳能株式会社 图像处理设备、图像处理方法、摄像设备及机器可读介质
CN113095284A (zh) * 2021-04-30 2021-07-09 平安国际智慧城市科技股份有限公司 人脸选取方法、装置、设备及计算机可读存储介质
CN113221812A (zh) * 2021-05-26 2021-08-06 广州织点智能科技有限公司 人脸关键点检测模型的训练方法和人脸关键点检测方法
CN113723274A (zh) * 2021-08-27 2021-11-30 上海科技大学 改进基于非极大抑制的目标物体检测方法
CN114444895A (zh) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 清洁质量评估方法及相关设备

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264396B (zh) * 2019-06-27 2022-11-18 杨骥 视频人脸替换方法、系统及计算机可读存储介质
CN113051960A (zh) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 深度图人脸检测方法、系统、设备及存储介质
CN111985439A (zh) * 2020-08-31 2020-11-24 中移(杭州)信息技术有限公司 人脸检测方法、装置、设备和存储介质
CN112507786B (zh) * 2020-11-03 2022-04-08 浙江大华技术股份有限公司 人体多部位检测框关联方法、装置、电子装置和存储介质
CN112714253B (zh) * 2020-12-28 2022-08-26 维沃移动通信有限公司 视频录制方法、装置、电子设备和可读存储介质
CN113095257A (zh) * 2021-04-20 2021-07-09 上海商汤智能科技有限公司 异常行为检测方法、装置、设备及存储介质
CN113269761A (zh) * 2021-05-31 2021-08-17 广东联通通信建设有限公司 一种倒影检测方法、装置及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131728A (zh) * 2007-09-29 2008-02-27 东华大学 一种基于Shape Context的人脸形状匹配方法
CN102163283A (zh) * 2011-05-25 2011-08-24 电子科技大学 一种基于局部三值模式的人脸特征提取方法
CN102254183A (zh) * 2011-07-18 2011-11-23 北京汉邦高科数字技术有限公司 一种基于AdaBoost算法的人脸检测方法
CN106529448A (zh) * 2016-10-27 2017-03-22 四川长虹电器股份有限公司 利用聚合通道特征进行多视角人脸检测的方法
CN107330390A (zh) * 2017-06-26 2017-11-07 上海远洲核信软件科技股份有限公司 一种基于图像分析和深度学习的人数统计方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096801A (zh) * 2009-12-14 2011-06-15 北京中星微电子有限公司 一种坐姿检测方法及装置
CN104361327B (zh) * 2014-11-20 2018-09-18 苏州科达科技股份有限公司 一种行人检测方法和系统
CN106650615B (zh) * 2016-11-07 2018-03-27 深圳云天励飞技术有限公司 一种图像处理方法及终端
CN106991688A (zh) * 2017-03-09 2017-07-28 广东欧珀移动通信有限公司 人体跟踪方法、人体跟踪装置和电子装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131728A (zh) * 2007-09-29 2008-02-27 东华大学 一种基于Shape Context的人脸形状匹配方法
CN102163283A (zh) * 2011-05-25 2011-08-24 电子科技大学 一种基于局部三值模式的人脸特征提取方法
CN102254183A (zh) * 2011-07-18 2011-11-23 北京汉邦高科数字技术有限公司 一种基于AdaBoost算法的人脸检测方法
CN106529448A (zh) * 2016-10-27 2017-03-22 四川长虹电器股份有限公司 利用聚合通道特征进行多视角人脸检测的方法
CN107330390A (zh) * 2017-06-26 2017-11-07 上海远洲核信软件科技股份有限公司 一种基于图像分析和深度学习的人数统计方法

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11670112B2 (en) 2019-11-21 2023-06-06 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and image capture apparatus
CN112825138A (zh) * 2019-11-21 2021-05-21 佳能株式会社 图像处理设备、图像处理方法、摄像设备及机器可读介质
EP3826293A1 (fr) * 2019-11-21 2021-05-26 Canon Kabushiki Kaisha Détection séparée de visage et tête et sélection du résultat pour détecter des caractéristiques faciales
CN111179218A (zh) * 2019-12-06 2020-05-19 深圳市派科斯科技有限公司 传送带物料检测方法、装置、存储介质及终端设备
CN111179218B (zh) * 2019-12-06 2023-07-04 深圳市燕麦科技股份有限公司 传送带物料检测方法、装置、存储介质及终端设备
CN111538861A (zh) * 2020-04-22 2020-08-14 浙江大华技术股份有限公司 基于监控视频进行图像检索的方法、装置、设备及介质
CN111538861B (zh) * 2020-04-22 2023-08-15 浙江大华技术股份有限公司 基于监控视频进行图像检索的方法、装置、设备及介质
CN111783601A (zh) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质
CN111783601B (zh) * 2020-06-24 2024-04-26 北京百度网讯科技有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质
CN111832460A (zh) * 2020-07-06 2020-10-27 北京工业大学 一种基于多特征融合的人脸图像提取方法及系统
CN111832460B (zh) * 2020-07-06 2024-05-21 北京工业大学 一种基于多特征融合的人脸图像提取方法及系统
CN112183351A (zh) * 2020-09-28 2021-01-05 普联国际有限公司 结合肤色信息的人脸检测方法、装置、设备及可读存储介质
CN112183351B (zh) * 2020-09-28 2024-03-29 普联国际有限公司 结合肤色信息的人脸检测方法、装置、设备及可读存储介质
CN113095284A (zh) * 2021-04-30 2021-07-09 平安国际智慧城市科技股份有限公司 人脸选取方法、装置、设备及计算机可读存储介质
CN113221812A (zh) * 2021-05-26 2021-08-06 广州织点智能科技有限公司 人脸关键点检测模型的训练方法和人脸关键点检测方法
CN113723274A (zh) * 2021-08-27 2021-11-30 上海科技大学 改进基于非极大抑制的目标物体检测方法
CN113723274B (zh) * 2021-08-27 2023-09-22 上海科技大学 改进基于非极大抑制的目标物体检测方法
CN114444895A (zh) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 清洁质量评估方法及相关设备

Also Published As

Publication number Publication date
CN109918969A (zh) 2019-06-21
CN109918969B (zh) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2019114036A1 (fr) Procédé et dispositif de détection de visage, dispositif informatique et support d'informations lisible par ordinateur
US11062123B2 (en) Method, terminal, and storage medium for tracking facial critical area
Wei et al. Multi-vehicle detection algorithm through combining Harr and HOG features
CN108121986B (zh) 目标检测方法及装置、计算机装置和计算机可读存储介质
WO2018103608A1 (fr) Procédé de détection de texte, dispositif et support d'enregistrement
US8750573B2 (en) Hand gesture detection
US8792722B2 (en) Hand gesture detection
WO2019218824A1 (fr) Procédé d'acquisition de piste de mouvement et dispositif associé, support de stockage et terminal
US8351662B2 (en) System and method for face verification using video sequence
WO2020107717A1 (fr) Appareil et procédé de détection de région de saillance visuelle
US9014467B2 (en) Image processing method and image processing device
CN104866616B (zh) 监控视频目标搜索方法
CN107273832B (zh) 基于积分通道特征与卷积神经网络的车牌识别方法及系统
JP2017531883A (ja) 画像の主要被写体を抽出する方法とシステム
JP2003030667A (ja) イメージ内で目を自動的に位置決めする方法
WO2020187160A1 (fr) Procédé et système de reconnaissance faciale basés sur un réseau neuronal à convolution profonde en cascade
WO2018082308A1 (fr) Procédé de traitement d'image et terminal
JP6095817B1 (ja) 物体検出装置
CN105046278B (zh) 基于Haar特征的Adaboost检测算法的优化方法
WO2019119515A1 (fr) Procédé d'analyse et de filtrage de visage, dispositif, appareil intégré, diélectrique et circuit intégré
WO2019095998A1 (fr) Procédé et dispositif de reconnaissance d'image, dispositif informatique et support de stockage lisible par ordinateur
KR101343623B1 (ko) 적응적 피부색 검출 방법, 그리고 이를 이용한 얼굴 검출 방법 및 그 장치
Ghandour et al. Building shadow detection based on multi-thresholding segmentation
Wo et al. A saliency detection model using aggregation degree of color and texture
CN113762027B (zh) 一种异常行为的识别方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17934931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17934931

Country of ref document: EP

Kind code of ref document: A1