CN112287769B - Face detection method, device, equipment and storage medium
- Publication number
- CN112287769B (application CN202011073284.9A)
- Authority
- CN
- China
- Prior art keywords
- pixel
- unit
- units
- segmentation
- calculating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The invention relates to the field of image recognition, and provides a face detection method, apparatus, device and storage medium. The method comprises the following steps: acquiring an input image; segmenting the input image to obtain a plurality of segmentation units; extracting the histogram of oriented gradients (HOG) feature of each segmentation unit to obtain an HOG vector; normalizing each segmentation unit to obtain a plurality of normalization units; calculating the mean, the variance and the image difference of each normalization unit to obtain the means, the variances and the image differences of the plurality of segmentation units; fusing the mean, the variance and the image difference of each segmentation unit with the HOG vector to obtain a target feature vector; and inputting the target feature vector into a trained support vector machine to obtain a face detection result. The accuracy of face detection is thereby improved.
Description
Technical Field
The present invention relates to the field of image recognition, and in particular, to a face detection method, apparatus, device, and storage medium.
Background
The face detection problem is derived from face recognition, which is one of the most effective and popular verification means at present and is widely applied to mobile phones, computers, access control and other devices. As face recognition has become widely used, face detection has also come to be treated as a separate problem. In practice, a face detection system must recognize faces in different environments, which requires that it adapt to those environments in addition to achieving a high recognition rate. At present, technologies related to face detection and face recognition are used in many fields and have important academic and application value in information retrieval, target monitoring, target tracking, automatic driving and the like. Existing algorithms based on the raw HOG descriptor use only gradient information to extract feature descriptors, which makes the model unstable and inefficient at extracting image information when facing blurred images and images with smooth edges.
Disclosure of Invention
The invention aims to solve the technical problem that face detection accuracy is too low, and provides a face detection method, which comprises the following steps:
acquiring an input image;
segmenting the input image to obtain a plurality of segmentation units;
extracting HOG characteristics of the directional gradient histogram of each segmentation unit to obtain HOG vectors;
normalizing each segmentation unit to obtain a plurality of normalization units;
calculating the mean value, the variance and the image difference of each normalization unit to obtain the mean value of a plurality of segmentation units, the variance of the plurality of segmentation units and the image difference of the plurality of segmentation units;
feature fusion is carried out on the mean value of each segmentation unit, the variance of the segmentation unit, the image difference of the segmentation unit and the HOG vector to obtain a target feature vector;
and inputting the target feature vector into a trained support vector machine to obtain a face detection result.
In some possible designs, extracting the histogram of oriented gradients (HOG) feature of each of the segmentation units to obtain a plurality of HOG vectors includes:
calculating a horizontal gradient of each pixel in each of the segmentation units by G_x(x, y) = I(x+1, y) − I(x−1, y), wherein G_x(x, y) is the horizontal gradient of the pixel at coordinate (x, y) in the segmentation unit and I(x, y) is the pixel value at coordinate (x, y);
calculating a vertical gradient of each pixel in each of the segmentation units by G_y(x, y) = I(x, y+1) − I(x, y−1), wherein G_y(x, y) is the vertical gradient of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient magnitude of the pixel at coordinate (x, y) in each of the segmentation units by G(x, y) = √(G_x(x, y)² + G_y(x, y)²), wherein G(x, y) is the gradient magnitude of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient direction of the pixel at coordinate (x, y) in each of the segmentation units by α(x, y) = arctan(G_y(x, y) / G_x(x, y)), wherein α(x, y) is the gradient direction of the pixel at coordinate (x, y) in the segmentation unit;
counting the gradient magnitudes of the pixels and the gradient directions of the pixels in each unit to obtain the HOG features;
and combining the HOG features to obtain the HOG vector.
In some possible designs, normalizing each of the segmentation units to obtain a plurality of normalization units includes:
taking any segmentation unit as a target segmentation unit;
obtaining a maximum pixel value in the target segmentation unit and a minimum pixel value in the target segmentation unit as a pixel upper threshold P_max and a pixel lower threshold P_min;
if P_max = P_min, setting the pixel values of all the pixel points in the target segmentation unit to 0;
if P_max ≠ P_min, calculating a normalized pixel value by P = (P_0 − P_min) / (P_max − P_min), wherein P is the normalized pixel value and P_0 is the original pixel value;
and obtaining a normalization unit through the normalized pixel values.
In some possible designs, calculating the mean, the variance and the image difference of each normalization unit to obtain the means, the variances and the image differences of the plurality of segmentation units includes:
calculating the pixel mean of the pixel points in each segmentation unit;
calculating the pixel variance of the pixel points in each segmentation unit;
and calculating the absolute value of the difference between each segmentation unit and a preset image.
In some possible designs, after the input image is acquired, the method further comprises:
graying the input image by Gray = 0.299r + 0.587g + 0.114b, wherein r is the pixel value of the red channel of the target pixel point, g is the pixel value of the green channel of the target pixel point, and b is the pixel value of the blue channel of the target pixel point.
In some possible designs, the inputting the target feature vector into a trained support vector machine to obtain a face detection result includes:
if multi-scale detection is to be performed, scaling the input image to multiple scales and merging the overlapping detection results by a non-maximum suppression algorithm.
In some possible designs, before the target feature vector is input to the trained support vector machine to obtain the face detection result, the method further includes:
acquiring a plurality of training data and labeling labels corresponding to the training data;
inputting the training data and the corresponding labeling label into the initial support vector machine;
training the initial support vector machine under a plurality of neural network model parameters through a training function to obtain a plurality of support vector machines;
calculating the loss function values of the support vector machines, and taking the support vector machine with the smallest loss function value as the target support vector machine;
and deploying the target support vector machine to obtain the trained support vector machine.
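For illustration, the method of the first aspect can be sketched end to end as follows. This is a minimal sketch, not the claimed implementation: scikit-image's hog and scikit-learn's LinearSVC stand in for the feature extraction and classifier described above, and the 64 × 64 detection window and the zero-valued standard-face statistics are assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Per-cell means of a "standard face"; zeros are a placeholder -- in
# practice these would be computed from a reference face image.
STANDARD_FACE_CELL_MEANS = np.zeros(64)

def extract_features(window):
    """Fuse per-cell mean, variance and mean-difference with the HOG vector.

    window: a 64x64 grayscale detection window (assumed size).
    """
    hog_vec = hog(window, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    # Reshape the 64x64 window into 64 cells of 8x8 pixels each.
    cells = window.reshape(8, 8, 8, 8).swapaxes(1, 2).reshape(64, -1)
    means, variances = cells.mean(axis=1), cells.var(axis=1)
    diff = np.abs(means - STANDARD_FACE_CELL_MEANS)  # difference to standard face
    return np.concatenate([means, variances, diff, hog_vec])

def train_detector(windows, labels):
    """windows: (N, 64, 64) grayscale crops; labels: 1 = face, 0 = non-face."""
    X = np.stack([extract_features(w) for w in windows])
    return LinearSVC().fit(X, labels)
```

A detector trained this way scores fixed-size windows; the multi-scale scan described later slides it over an image pyramid.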
In a second aspect, the present invention provides a face detection apparatus having a function of implementing a method corresponding to the face detection platform provided in the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
The face detection apparatus includes:
the input/output module is used for acquiring an input image;
the processing module is used for segmenting the input image to obtain a plurality of segmentation units; extracting HOG characteristics of the directional gradient histogram of each segmentation unit to obtain HOG vectors; normalizing each segmentation unit to obtain a plurality of normalization units; calculating the mean value, the variance and the image difference of each normalization unit to obtain the mean value of a plurality of segmentation units, the variance of the plurality of segmentation units and the image difference of the plurality of segmentation units; feature fusion is carried out on the mean value of each segmentation unit, the variance of the segmentation unit, the image difference of the segmentation unit and the HOG vector to obtain a target feature vector; and inputting the target feature vector into a trained support vector machine to obtain a face detection result.
In some possible designs, the processing module is further to:
calculating a horizontal gradient of each pixel in each of the segmentation units by G_x(x, y) = I(x+1, y) − I(x−1, y), wherein G_x(x, y) is the horizontal gradient of the pixel at coordinate (x, y) in the segmentation unit and I(x, y) is the pixel value at coordinate (x, y);
calculating a vertical gradient of each pixel in each of the segmentation units by G_y(x, y) = I(x, y+1) − I(x, y−1), wherein G_y(x, y) is the vertical gradient of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient magnitude of the pixel at coordinate (x, y) in each of the segmentation units by G(x, y) = √(G_x(x, y)² + G_y(x, y)²), wherein G(x, y) is the gradient magnitude of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient direction of the pixel at coordinate (x, y) in each of the segmentation units by α(x, y) = arctan(G_y(x, y) / G_x(x, y)), wherein α(x, y) is the gradient direction of the pixel at coordinate (x, y) in the segmentation unit;
counting the gradient magnitudes of the pixels and the gradient directions of the pixels in each unit to obtain the HOG features;
and combining the HOG features to obtain the HOG vector.
In some possible designs, the processing module is further to:
taking any segmentation unit as a target segmentation unit;
obtaining a maximum pixel value in the target segmentation unit and a minimum pixel value in the target segmentation unit as a pixel upper threshold P_max and a pixel lower threshold P_min;
if P_max = P_min, setting the pixel values of all the pixel points in the target segmentation unit to 0;
if P_max ≠ P_min, calculating a normalized pixel value by P = (P_0 − P_min) / (P_max − P_min), wherein P is the normalized pixel value and P_0 is the original pixel value;
and obtaining a normalization unit through the normalized pixel values.
In some possible designs, the processing module is further to:
calculating the pixel mean of the pixel points in each segmentation unit;
calculating the pixel variance of the pixel points in each segmentation unit;
and calculating the absolute value of the difference between each segmentation unit and a preset image.
In some possible designs, the processing module is further to:
graying the input image by Gray = 0.299r + 0.587g + 0.114b, wherein r is the pixel value of the red channel of the target pixel point, g is the pixel value of the green channel of the target pixel point, and b is the pixel value of the blue channel of the target pixel point.
In some possible designs, the processing module is further to:
if multi-scale detection is to be performed, scaling the input image to multiple scales and merging the overlapping detection results by a non-maximum suppression algorithm.
In some possible designs, the processing module is further to:
acquiring a plurality of training data and labeling labels corresponding to the training data;
inputting the training data and the corresponding labeling label into the initial support vector machine;
training the initial support vector machine under a plurality of neural network model parameters through a training function to obtain a plurality of support vector machines;
calculating the loss function values of the support vector machines, and taking the support vector machine with the smallest loss function value as the target support vector machine;
and deploying the target support vector machine to obtain the trained support vector machine.
In yet another aspect, the present invention provides a face detection device, which comprises at least one processor, a memory and an input/output unit that are connected to one another, wherein the memory is configured to store program code, and the processor is configured to invoke the program code in the memory to perform the method described in the foregoing aspects.
In yet another aspect, the invention provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
Compared with the prior art, the method first extracts HOG features from the image; the original image is then globally normalized and cut into independent units, and the mean, the variance and the difference information between each unit and a standard face are extracted from them; finally, the information extracted from all units is arranged into vectors and combined with the original HOG features.
Drawings
Fig. 1-1 is a schematic flow chart of a face detection method in an embodiment of the invention;
fig. 1-2 is a schematic diagram illustrating the effect of the non-maximum suppression algorithm of a face detection method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article or apparatus. The partitioning of modules in the present invention is merely one logical partitioning; other partitions are possible in practice, for example a plurality of modules may be combined or integrated in another system, or some features may be omitted or not implemented.
Referring to fig. 1-1, the following provides a face detection method, which includes:
101. an input image is acquired.
In this embodiment, the input image may be stored in a database or in a network hard disk.
102. Segmenting the input image to obtain a plurality of segmentation units.
In this embodiment, the image is divided into units of the same size. If the original image has a height H and a width W and the side length of a segmentation unit is d, the image can be divided into (H/d) × (W/d) units; where the dimensions are not exactly divisible, the edge portion of the image is discarded so that they become divisible. The edge portions can be discarded because the edges of the image do not contain significant facial features such as the eyes, nose and mouth. For example, a 64 × 64 image divided into 8 × 8 units yields 64 units of the same size in total: 8 units in the horizontal direction and 8 units in the vertical direction.
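For illustration, this division with edge discarding can be sketched as follows (a minimal sketch assuming a NumPy grayscale image and unit side length d):

```python
import numpy as np

def split_into_cells(img, d):
    """Split a grayscale image into d x d cells, discarding the right and
    bottom edge strips that do not fill a whole cell."""
    H, W = img.shape
    img = img[:H - H % d, :W - W % d]          # make dimensions divisible by d
    rows, cols = img.shape[0] // d, img.shape[1] // d
    # Result shape: (rows, cols, d, d) -- one d x d cell per grid position.
    return img.reshape(rows, d, cols, d).swapaxes(1, 2)
```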
103. And extracting the HOG characteristics of the directional gradient histogram of each segmentation unit to obtain HOG vectors.
In this embodiment, the gradient information of each cell is counted to obtain its gradient-direction histogram, and the statistical feature vectors of adjacent segmentation units are merged.
104. And normalizing each segmentation unit to obtain a plurality of normalization units.
In this embodiment, after the image is divided, unit normalization is performed. Assume that the maximum pixel value inside each cell is P_max and the minimum pixel value is P_min.
105. And calculating the mean value, the variance and the image difference of each normalization unit to obtain the mean value of the plurality of segmentation units, the variance of the plurality of segmentation units and the image difference of the plurality of segmentation units.
In this embodiment, the mean, variance, and mean image difference of each cell are extracted.
106. And carrying out feature fusion on the mean value of each segmentation unit, the variance of the segmentation unit, the image difference of the segmentation unit and the HOG vector to obtain a target feature vector.
In this embodiment, the unit mean feature vector, the unit variance feature vector, the mean image difference feature vector and the HOG feature vector of the original image are spliced to obtain the final feature vector of the original image.
107. And inputting the target feature vector into a trained support vector machine to obtain a face detection result.
In this embodiment, face detection is implemented on an image of arbitrary size. The support vector machine can only score feature vectors of a specific length; if the detector is trained on windows of 8 × 8 cells, the SVM can only detect targets of that fixed window size. To use it to detect faces of different sizes in a picture, multi-scale detection and non-maximum suppression are also required.
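For illustration, a conventional non-maximum suppression routine of the kind referred to here can be sketched as follows; the (u, d, l, r) box format and the 0.3 overlap threshold are assumptions:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.3):
    """Keep the highest-scoring box and drop boxes that overlap it too much.

    boxes: (N, 4) array of (u, d, l, r) rectangles; scores: SVM decision values.
    Returns the indices of the boxes to keep."""
    order = np.argsort(scores)[::-1]           # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of box i with every remaining box.
        u = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        d = np.minimum(boxes[i, 1], boxes[order[1:], 1])
        l = np.maximum(boxes[i, 2], boxes[order[1:], 2])
        r = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, d - u) * np.maximum(0, r - l)
        area_i = (boxes[i, 1] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 2])
        areas = (boxes[order[1:], 1] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 2])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]   # drop heavily overlapping boxes
    return keep
```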
Compared with the prior art, the method first extracts HOG features from the image; the original image is then globally normalized and cut into independent units, and the mean, the variance and the difference information between each unit and a standard face are extracted from them; finally, the information extracted from all units is arranged into vectors and combined with the original HOG features.
In some embodiments, extracting the histogram of oriented gradients (HOG) feature of each of the segmentation units to obtain a plurality of HOG vectors includes:
calculating a horizontal gradient of each pixel in each of the segmentation units by G_x(x, y) = I(x+1, y) − I(x−1, y), wherein G_x(x, y) is the horizontal gradient of the pixel at coordinate (x, y) in the segmentation unit and I(x, y) is the pixel value at coordinate (x, y);
calculating a vertical gradient of each pixel in each of the segmentation units by G_y(x, y) = I(x, y+1) − I(x, y−1), wherein G_y(x, y) is the vertical gradient of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient magnitude of the pixel at coordinate (x, y) in each of the segmentation units by G(x, y) = √(G_x(x, y)² + G_y(x, y)²), wherein G(x, y) is the gradient magnitude of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient direction of the pixel at coordinate (x, y) in each of the segmentation units by α(x, y) = arctan(G_y(x, y) / G_x(x, y)), wherein α(x, y) is the gradient direction of the pixel at coordinate (x, y) in the segmentation unit;
counting the gradient magnitudes of the pixels and the gradient directions of the pixels in each unit to obtain the HOG features;
and combining the HOG features to obtain the HOG vector.
In the above embodiment, the image is cut into units of the same size; for example, a 64 × 64 image divided into 8 × 8 units yields 64 units in total, 8 in the horizontal direction and 8 in the vertical direction. The gradient information of each unit is then counted to obtain its gradient-direction histogram: if the gradient directions over 0°–180° are divided into 9 intervals, the resulting vector length is 9. Initially the value of each of these 9 directions is 0; for each pixel in the cell, its gradient direction α(x, y) is calculated first to see which interval it belongs to, and its gradient magnitude G(x, y) is then added to the corresponding direction. Performing this operation for every pixel of the cell yields its gradient-direction histogram. Finally, the statistical feature vectors of adjacent cell units are merged to obtain the feature vector of each block; different cell-merging schemes result in different block feature-vector lengths. If 4 mutually adjacent units (2 × 2 cells) are merged into one block, the feature vector of each block has length 4 × 9 = 36, and the feature vector length of the whole image is the number of blocks multiplied by 36.
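For illustration, this procedure can be sketched directly from the formulas above (per-pixel gradients, 9-bin histograms over 0°–180°, and merging of 2 × 2 neighbouring cells into blocks); the overlapping block layout is an assumption:

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """9-bin gradient-direction histogram of one cell (angles in 0..180)."""
    gx = np.zeros_like(cell, dtype=float)
    gy = np.zeros_like(cell, dtype=float)
    gx[:, 1:-1] = cell[:, 2:].astype(float) - cell[:, :-2]  # G_x = I(x+1,y) - I(x-1,y)
    gy[1:-1, :] = cell[2:, :].astype(float) - cell[:-2, :]  # G_y = I(x,y+1) - I(x,y-1)
    mag = np.sqrt(gx ** 2 + gy ** 2)                        # border pixels keep zero gradient
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, mag)      # each pixel votes its magnitude into its direction bin
    return hist

def hog_vector(cells):
    """cells: (rows, cols, d, d). Merge 2x2 neighbouring cells into blocks."""
    rows, cols = cells.shape[:2]
    hists = np.array([[cell_histogram(cells[i, j]) for j in range(cols)]
                      for i in range(rows)])
    blocks = [np.concatenate([hists[i, j], hists[i, j + 1],
                              hists[i + 1, j], hists[i + 1, j + 1]])
              for i in range(rows - 1) for j in range(cols - 1)]
    # Assuming overlapping 2x2 blocks: 8x8 cells give 7*7 blocks of 36 dims = 1764.
    return np.concatenate(blocks)
```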
In some embodiments, normalizing each of the segmentation units to obtain a plurality of normalization units includes:
taking any segmentation unit as a target segmentation unit;
obtaining a maximum pixel value in the target segmentation unit and a minimum pixel value in the target segmentation unit as a pixel upper threshold P_max and a pixel lower threshold P_min;
if P_max = P_min, setting the pixel values of all the pixel points in the target segmentation unit to 0;
if P_max ≠ P_min, calculating a normalized pixel value by P = (P_0 − P_min) / (P_max − P_min), wherein P is the normalized pixel value and P_0 is the original pixel value;
and obtaining a normalization unit through the normalized pixel values.
In the above embodiment, the normalization processing is performed on the pixel points in the above manner.
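For illustration, this per-cell min–max normalization, including the degenerate case of a constant cell, can be sketched as follows:

```python
import numpy as np

def normalize_cell(cell):
    """Min-max normalize one cell; a constant cell is set to all zeros."""
    p_max, p_min = cell.max(), cell.min()
    if p_max == p_min:                        # degenerate cell: no contrast
        return np.zeros_like(cell, dtype=float)
    return (cell.astype(float) - p_min) / (p_max - p_min)
```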
In some embodiments, calculating the mean, the variance and the image difference of each normalization unit to obtain the means, the variances and the image differences of the plurality of segmentation units includes:
calculating the pixel mean of the pixel points in each segmentation unit;
calculating the pixel variance of the pixel points in each segmentation unit;
and calculating the absolute value of the difference between each segmentation unit and a preset image.
In the above embodiment, the unit mean is extracted as follows: the pixel mean of each cell is calculated first; if the original image is divided into n cells, the unit-mean feature vector of the image is an n-dimensional feature vector. The unit variance is extracted by first calculating the variance inside each cell and then combining the variances of all cells; the unit-variance feature vector is also n-dimensional. The mean-image difference is extracted by first extracting the unit-mean feature vectors of a standard face and of the original image, then subtracting them and taking the absolute value to obtain the mean-image-difference feature vector of the original image.
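For illustration, these statistics and the feature fusion of step 106 can be sketched as follows (assuming cells shaped (rows, cols, d, d) and a standard face divided the same way):

```python
import numpy as np

def statistical_features(cells, standard_cells):
    """Per-cell mean, variance, and absolute mean difference to a standard face."""
    means = cells.mean(axis=(2, 3)).ravel()        # n-dimensional unit-mean vector
    variances = cells.var(axis=(2, 3)).ravel()     # n-dimensional unit-variance vector
    std_means = standard_cells.mean(axis=(2, 3)).ravel()
    mean_diff = np.abs(means - std_means)          # mean-image-difference vector
    return means, variances, mean_diff

def fuse_features(means, variances, mean_diff, hog_vec):
    """Step 106: concatenate the statistics with the HOG vector."""
    return np.concatenate([means, variances, mean_diff, hog_vec])
```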
In some embodiments, after the input image is acquired, the method further comprises:
graying the input image by Gray = 0.299r + 0.587g + 0.114b, wherein r is the pixel value of the red channel of the target pixel point, g is the pixel value of the green channel of the target pixel point, and b is the pixel value of the blue channel of the target pixel point.
In the above embodiment, the image is first grayed: if the RGB values of a pixel of the original image are (r, g, b), the grayed pixel value is calculated by the weighted formula above.
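For illustration, the conversion can be sketched as follows (assuming the standard luminance weights given above):

```python
import numpy as np

def to_gray(rgb):
    """Weighted-average grayscale conversion: Gray = 0.299 r + 0.587 g + 0.114 b."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```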
In some embodiments, the inputting the target feature vector into a trained support vector machine to obtain a face detection result includes:
if multi-scale detection is to be performed, scaling the input image to multiple scales and merging the overlapping detection results by a non-maximum suppression algorithm.
In the above embodiment, as in fig. 1-2, the image is subjected to multi-scale detection. The multi-scale detection algorithm eliminates the influence of scale differences between images: by shrinking the image and scanning each reduced version, targets of different sizes can be found. To implement multi-scale detection, the usual practice is to record the position of each rectangle in relative coordinates. If a rectangle in an image of height H and width W has absolute coordinates (u, d, l, r), its relative coordinates are (u/H, d/H, l/W, r/W). By continuously scaling the image, a face in the image can be brought to a size the detector can handle; its relative coordinates are then recorded, and finally the absolute coordinates at the (H, W) scale can be computed for all face coordinates by inverting the above conversion.
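For illustration, the pyramid scan with relative coordinates can be sketched as follows; detect_faces is an assumed helper returning (u, d, l, r) boxes at the detector's native scale, and the 0.8 scale step is an assumption:

```python
import cv2

def multi_scale_detect(img, detect_faces, scale_step=0.8, min_size=64):
    """Scan an image pyramid; record detections as relative coordinates,
    then map them back to absolute coordinates at the original (H, W) scale."""
    H, W = img.shape[:2]
    results = []
    scale = 1.0
    while min(H * scale, W * scale) >= min_size:
        h, w = int(H * scale), int(W * scale)
        scaled = cv2.resize(img, (w, h))
        for (u, d, l, r) in detect_faces(scaled):
            rel = (u / h, d / h, l / w, r / w)        # relative coordinates
            results.append((rel[0] * H, rel[1] * H,   # invert to absolute
                            rel[2] * W, rel[3] * W))
        scale *= scale_step
    return results
```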
In some embodiments, before the target feature vector is input to the trained support vector machine to obtain the face detection result, the method further includes:
acquiring a plurality of training data and labeling labels corresponding to the training data;
inputting the training data and the corresponding labeling label into the initial support vector machine;
training the initial support vector machine under a plurality of neural network model parameters through a training function to obtain a plurality of support vector machines;
calculating the loss function values of the support vector machines, and taking the support vector machine with the smallest loss function value as the target support vector machine;
and deploying the target support vector machine to obtain the trained support vector machine.
In the above embodiment, the support vector machine face detector is trained. The feature vectors of all images of the training set are obtained and then fed into the support vector machine model for training to obtain the support vector machine face detector. The function of this classifier is to recognize faces: for a feature vector extracted from an image containing a face the output is 1; otherwise the output is 0.
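For illustration, the training and model-selection steps can be sketched with scikit-learn as follows; LinearSVC with a range of C values and the hinge loss stand in for the unspecified training function, model parameters and loss function:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import hinge_loss

def train_best_svm(X, y, C_values=(0.01, 0.1, 1.0, 10.0)):
    """Train one SVM per parameter setting and keep the one with the
    smallest loss value, as described above."""
    best_model, best_loss = None, np.inf
    for C in C_values:
        model = LinearSVC(C=C).fit(X, y)
        loss = hinge_loss(y, model.decision_function(X))
        if loss < best_loss:
            best_model, best_loss = model, loss
    return best_model
```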
The schematic structure of the face detection apparatus 20 shown in fig. 2 is applicable to face detection. The face detection apparatus according to the embodiment of the present invention can implement the steps of the face detection method performed in the embodiment corresponding to fig. 1-1. The functions of the face detection apparatus 20 may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware. The face detection apparatus may include an input/output module 201 and a processing module 202; for the functional implementation of the processing module 202 and the input/output module 201, reference may be made to the operations performed in the embodiment corresponding to fig. 1-1, which are not repeated here. The input/output module 201 may be used to control the input, output and acquisition operations of the apparatus.
In some embodiments, the input-output module 201 may be configured to obtain an input image;
the processing module 202 may be configured to segment the input image to obtain a plurality of segmentation units; extracting HOG characteristics of the directional gradient histogram of each segmentation unit to obtain HOG vectors; normalizing each segmentation unit to obtain a plurality of normalization units; calculating the mean value, the variance and the image difference of each normalization unit to obtain the mean value of a plurality of segmentation units, the variance of the plurality of segmentation units and the image difference of the plurality of segmentation units; feature fusion is carried out on the mean value of each segmentation unit, the variance of the segmentation unit, the image difference of the segmentation unit and the HOG vector to obtain a target feature vector; and inputting the target feature vector into a trained support vector machine to obtain a face detection result.
In some embodiments, the processing module 202 is further configured to:
calculating a horizontal gradient of each pixel in each of the segmentation units by G_x(x, y) = I(x+1, y) − I(x−1, y), wherein G_x(x, y) is the horizontal gradient of the pixel at coordinate (x, y) in the segmentation unit and I(x, y) is the pixel value at coordinate (x, y);
calculating a vertical gradient of each pixel in each of the segmentation units by G_y(x, y) = I(x, y+1) − I(x, y−1), wherein G_y(x, y) is the vertical gradient of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient magnitude of the pixel at coordinate (x, y) in each of the segmentation units by G(x, y) = √(G_x(x, y)² + G_y(x, y)²), wherein G(x, y) is the gradient magnitude of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient direction of the pixel at coordinate (x, y) in each of the segmentation units by α(x, y) = arctan(G_y(x, y) / G_x(x, y)), wherein α(x, y) is the gradient direction of the pixel at coordinate (x, y) in the segmentation unit;
counting the gradient magnitudes of the pixels and the gradient directions of the pixels in each unit to obtain the HOG features;
and combining the HOG features to obtain the HOG vector.
In some embodiments, the processing module 202 is further configured to:
taking any segmentation unit as a target segmentation unit;
obtaining a maximum pixel value in the target segmentation unit and a minimum pixel value in the target segmentation unit as a pixel upper threshold P_max and a pixel lower threshold P_min;
if P_max = P_min, setting the pixel values of all the pixel points in the target segmentation unit to 0;
if P_max ≠ P_min, calculating a normalized pixel value by P = (P_0 − P_min) / (P_max − P_min), wherein P is the normalized pixel value and P_0 is the original pixel value;
and obtaining a normalization unit through the normalized pixel values.
In some embodiments, the processing module 202 is further configured to:
calculating the pixel mean of the pixel points in each segmentation unit;
calculating the pixel variance of the pixel points in each segmentation unit;
and calculating the absolute value of the difference between each segmentation unit and a preset image.
In some embodiments, the processing module 202 is further configured to:
graying the input image by Gray = 0.299r + 0.587g + 0.114b, wherein r is the pixel value of the red channel of the target pixel point, g is the pixel value of the green channel of the target pixel point, and b is the pixel value of the blue channel of the target pixel point.
In some embodiments, the processing module 202 is further configured to:
if multi-scale detection is to be performed, scaling the input image to multiple scales and merging the overlapping detection results by a non-maximum suppression algorithm.
In some embodiments, the processing module 202 is further configured to:
acquiring a plurality of training data and labeling labels corresponding to the training data;
inputting the training data and the corresponding labeling label into the initial support vector machine;
training the initial support vector machine under a plurality of neural network model parameters through a training function to obtain a plurality of support vector machines;
calculating the loss function values of the support vector machines, and taking the support vector machine with the smallest loss function value as the target support vector machine;
and deploying the target support vector machine to obtain the trained support vector machine.
The apparatus in the embodiments of the present invention is described above from the point of view of modularized functional entities; a computer device is described below from the point of view of hardware. As shown in fig. 3, the computer device includes: a processor, a memory, an input/output unit (which may also be a transceiver, not separately identified in fig. 3) and a computer program stored in the memory and executable on the processor. The computer program may, for example, be a program corresponding to the face detection method in the embodiment corresponding to fig. 1-1. When the computer device implements the functions of the face detection apparatus 20 shown in fig. 2, the processor, when executing the computer program, implements the steps of the face detection method performed by the face detection apparatus 20 in the embodiment corresponding to fig. 2, or alternatively implements the functions of each module in the face detection apparatus 20 of that embodiment.
The processor may be a central processing unit (Central Processing Unit, CPU), another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data or video data). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input/output unit may be replaced by a receiver and a transmitter, which may be the same or different physical entities; when they are the same physical entity, they may be collectively referred to as the input/output unit. The input and output may be implemented by a transceiver.
The memory may be integrated in the processor or may be provided separately from the processor.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM), comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server or a network device, etc.) to perform the method according to the embodiments of the present invention.
While the embodiments of the present invention have been described above with reference to the drawings, the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many modifications may be made by those of ordinary skill in the art in light of the present invention without departing from the spirit of the invention and the scope of the appended claims; equivalent structures or equivalent flow changes made using the description and drawings of the present invention, whether applied directly or indirectly in other relevant technical fields, likewise fall within the scope of protection of the present invention.
Claims (3)
1. A face detection method, the method comprising:
acquiring an input image;
after the input image is acquired, graying the input image by Gray = 0.299r + 0.587g + 0.114b, wherein r is the pixel value of a red channel of a target pixel point, g is the pixel value of a green channel of the target pixel point, and b is the pixel value of a blue channel of the target pixel point;
segmenting the input image to obtain a plurality of segmentation units;
extracting HOG characteristics of the directional gradient histogram of each segmentation unit to obtain HOG vectors;
normalizing each segmentation unit to obtain a plurality of normalization units;
calculating the mean, the variance and the image difference of each normalization unit to obtain the means of the plurality of segmentation units, the variances of the plurality of segmentation units and the image differences of the plurality of segmentation units, which specifically comprises: calculating the pixel mean of the pixel points in each segmentation unit; calculating the pixel variance of the pixel points in each segmentation unit; and calculating the absolute value of the difference between each segmentation unit and a preset image;
feature fusion is carried out on the mean value of each segmentation unit, the variance of the segmentation unit, the image difference of the segmentation unit and the HOG vector to obtain a target feature vector;
acquiring a plurality of training data and labeling labels corresponding to the training data;
inputting the training data and the corresponding labeling label into an initial support vector machine;
training the initial support vector machine under a plurality of neural network model parameters through a training function to obtain a plurality of support vector machines;
calculating the loss function values of the support vector machines, and taking the support vector machine with the smallest loss function value as a target support vector machine;
deploying the target support vector machine to obtain a trained support vector machine;
inputting the target feature vector into a trained support vector machine to obtain a face detection result;
if multi-scale detection is to be performed, scaling the input image to multiple scales and merging the overlapping detection results by a non-maximum suppression algorithm.
2. The method of claim 1, wherein extracting the histogram of oriented gradients (HOG) feature of each of the segmentation units to obtain a plurality of HOG vectors comprises:
calculating a horizontal gradient of each pixel in each of the segmentation units by G_x(x, y) = I(x+1, y) − I(x−1, y), wherein G_x(x, y) is the horizontal gradient of the pixel at coordinate (x, y) in the segmentation unit and I(x, y) is the pixel value at coordinate (x, y);
calculating a vertical gradient of each pixel in each of the segmentation units by G_y(x, y) = I(x, y+1) − I(x, y−1), wherein G_y(x, y) is the vertical gradient of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient magnitude of the pixel at coordinate (x, y) in each of the segmentation units by G(x, y) = √(G_x(x, y)² + G_y(x, y)²), wherein G(x, y) is the gradient magnitude of the pixel at coordinate (x, y) in the segmentation unit;
calculating the gradient direction of the pixel at coordinate (x, y) in each of the segmentation units by α(x, y) = arctan(G_y(x, y) / G_x(x, y)), wherein α(x, y) is the gradient direction of the pixel at coordinate (x, y) in the segmentation unit;
counting the gradient magnitudes of the pixels and the gradient directions of the pixels in each unit to obtain the HOG features;
and merging the HOG features to obtain the HOG vector.
3. The method of claim 2, wherein normalizing each of the segmentation units to obtain a plurality of normalization units comprises:
taking any segmentation unit as a target segmentation unit;
obtaining a maximum pixel value in the target segmentation unit and a minimum pixel value in the target segmentation unit as a pixel upper threshold P_max and a pixel lower threshold P_min;
if P_max = P_min, setting the pixel values of all the pixel points in the target segmentation unit to 0;
if P_max ≠ P_min, calculating a normalized pixel value by P = (P_0 − P_min) / (P_max − P_min), wherein P is the normalized pixel value and P_0 is the original pixel value;
and obtaining a normalization unit through the normalized pixel values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011073284.9A CN112287769B (en) | 2020-10-09 | 2020-10-09 | Face detection method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112287769A CN112287769A (en) | 2021-01-29 |
CN112287769B true CN112287769B (en) | 2024-03-12 |
Family
ID=74422765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011073284.9A Active CN112287769B (en) | 2020-10-09 | 2020-10-09 | Face detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112287769B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115439938B (en) * | 2022-09-09 | 2023-09-19 | 湖南智警公共安全技术研究院有限公司 | Anti-splitting face archive data merging processing method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102169544A (en) * | 2011-04-18 | 2011-08-31 | 苏州市慧视通讯科技有限公司 | Face-shielding detecting method based on multi-feature fusion |
CN103778435A (en) * | 2014-01-16 | 2014-05-07 | 大连理工大学 | Pedestrian fast detection method based on videos |
CN104091157A (en) * | 2014-07-09 | 2014-10-08 | 河海大学 | Pedestrian detection method based on feature fusion |
CN105630906A (en) * | 2015-12-21 | 2016-06-01 | 苏州科达科技股份有限公司 | Person searching method, apparatus and system |
CN108428236A (en) * | 2018-03-28 | 2018-08-21 | 西安电子科技大学 | The integrated multiple target SAR image segmentation method of feature based justice |
CN109063208A (en) * | 2018-09-19 | 2018-12-21 | 桂林电子科技大学 | A kind of medical image search method merging various features information |
CN109389074A (en) * | 2018-09-29 | 2019-02-26 | 东北大学 | A kind of expression recognition method extracted based on human face characteristic point |
CN109460719A (en) * | 2018-10-24 | 2019-03-12 | 四川阿泰因机器人智能装备有限公司 | A kind of electric operating safety recognizing method |
CN110659608A (en) * | 2019-09-23 | 2020-01-07 | 河南工业大学 | Scene classification method based on multi-feature fusion |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10706294B2 (en) * | 2018-05-03 | 2020-07-07 | Volvo Car Corporation | Methods and systems for generating and using a road friction estimate based on camera image signal processing |
Non-Patent Citations (4)
Title |
---|
Application of an improved HOG algorithm to face detection; Zhu Guohua, Xu Kun; Computer Simulation; vol. 38, no. 9; pp. 185–189 *
A vehicle detection algorithm based on a multi-feature-fusion cascade classifier; Zhou Xing et al.; Modern Computer; pp. 38–43 *
Face gender recognition based on HOG and multi-scale LBP features; Yan Jingwen, Jiang Zhidong, Liu Lei; Journal of Yangzhou University (Natural Science Edition), no. 3 *
Pedestrian detection combining an SVM classifier with HOG feature extraction; Xu Yuan et al.; Computer Engineering; vol. 42, no. 1; pp. 56–65 *
Also Published As
Publication number | Publication date |
---|---|
CN112287769A (en) | 2021-01-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |