CN111507239B - Local feature face recognition method based on image pyramid

Info

Publication number: CN111507239B
Application number: CN202010288097.6A
Authority: CN
Inventors: 万洪达; 殷俊
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2020-04-14
Filing date: 2020-04-14
Publication date: 2023-09-22
Anticipated expiration: 2040-04-14
Also published as: CN111507239A

Abstract

The invention provides a local feature face recognition method based on an image pyramid. First, face images of the relevant persons are collected and used as a training set. Next, the face images are converted to grayscale and input into an image pyramid; each pyramid layer is decomposed into a low-frequency face image and a high-frequency face image, and noise elements in the high-frequency face image are removed. Center-symmetric dual-layer local ternary pattern features are then extracted from the images. Finally, the features are concatenated into a combined feature histogram, which is stored locally to facilitate subsequent recognition. The method is more robust to noise in face images, extracts fine facial features, and accurately reflects local feature changes of the face, so it can provide higher classification accuracy for subsequent face recognition and related work.

Description

Local feature face recognition method based on image pyramid

Technical Field

The invention relates to the field of image processing, in particular to a local feature face recognition method based on an image pyramid.

Background

With falling computer hardware costs and the rapid development of related technologies, the range of tasks that computers can help people solve has grown. Computers have advanced from simple calculation to extracting the data and information people need from massive collections of images, mainly owing to the vigorous development of artificial intelligence technology. Within artificial intelligence, face recognition is an active research field because of its challenges and broad application prospects. Identity verification is often completed by extracting and recognizing a person's facial information. In crowded places such as stations, ports, customs and airports, a computer can identify people quickly, conveniently and accurately by comparing them against facial information stored in a database in advance. In addition, face recognition has advantages that other biometric technologies cannot match:

First, acquisition is convenient. Collecting facial images typically requires only a camera, whereas fingerprints and palm prints require dedicated acquisition devices. Face acquisition also places few demands on the acquisition environment: fingerprint acquisition typically requires that the subject's finger be free of sweat and dirt, whereas face acquisition has no such limitation.

Second, the cost is low and the extensibility is good. To improve an existing face recognition algorithm, the modified algorithm only needs to be integrated into a new version of the software installation package; no new equipment or hardware upgrade is required, which saves cost. Likewise, face recognition can be added to equipment that did not originally support it by installing the corresponding software, so the technology extends well.

Third, recognition efficiency is high. As face recognition technology has developed, research institutions, universities and enterprises have proposed more and more algorithms that balance recognition accuracy and efficiency. Many current face recognition methods can complete authentication within seconds or milliseconds.

Because of these technical advantages, a large number of research institutions and scholars in China and abroad have devoted themselves to face recognition research. Face feature extraction based on local patterns has been widely discussed and explored since it was first proposed. However, most local feature extraction methods extract fine features incompletely, are easily disturbed by noise, and produce high-dimensional features. Improving existing local feature extraction methods is therefore of practical importance for improving face recognition accuracy.

In current face recognition technology, hand-crafted descriptors represented by the Local Binary Pattern (LBP) and the Local Ternary Pattern (LTP) are widely applied and studied because their principle is relatively simple, their time complexity is low, and they effectively reflect the texture characteristics, gray-level changes and illumination changes of an image. However, because their thresholds are fixed, LBP and LTP may not accurately represent subtle feature changes of the face, so recognition results sometimes fall short of expectations: when face images of the same person vary, LBP and LTP may misidentify that person as someone else. In addition, the face features obtained through these two descriptors are affected by image noise, which hampers final feature comparison and recognition. In weak illumination especially, captured face images contain a large number of noise points, which reduces face recognition accuracy.

Disclosure of Invention

The purpose of the invention is to provide a local feature face recognition method based on an image pyramid that addresses the technical problems described in the background. The technical scheme adopted by the invention is as follows:

A local feature face recognition method based on an image pyramid comprises the following specific steps:

S1, for each face to be recognized, collecting face images of that face (preferably 10 or more per subject);

S2, making the images acquired in step S1 into a training set;

S3, converting the images to grayscale and inputting them into an image pyramid for decomposition; each layer of the image pyramid is decomposed into a high-frequency image and a low-frequency image, and noise in the high-frequency image is removed;

S4, calculating the facial texture features of all images contained in the image pyramid using the Center-Symmetric Dual Local Ternary Pattern (CS-DLTP), and then concatenating all computed feature histograms to form a feature histogram vector;

S5, storing the feature histogram vector locally so that it can be applied directly in relevant scenes later.

As a further technical scheme of the invention, the face images in step S1 may be acquired in various ways, including:

directly downloading ID or work photographs of the persons to be identified from the network; photographing the person with camera equipment; and capturing relevant frames from a surveillance camera to complete acquisition of the face images.

As a further technical scheme of the invention, making the training set in step S2 includes the following steps:

S201, renaming and serializing all images in the face image library as required; a corresponding batch-processing program can be written according to the specific requirements;

S202, assigning a label to each image, the label serving as the identity of the person whose face appears in the image and being unique, i.e., each person corresponds to one unique label and all images of the same person carry the same label; storing the training image paths and labels in XML or TXT format, and ensuring that the data fully follows the PASCAL VOC format if stored as XML;

S203, placing the files generated in step S202 in the corresponding folder to await training, and specifying their storage location on disk in the corresponding training program.
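Steps S201-S203 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the invention's own tooling: the folder layout (one subfolder per person), the TXT layout (one `path label` pair per line) and the helper name `build_label_file` are all hypothetical.

```python
import os

def build_label_file(image_root, out_path):
    """Write one 'image_path label' line per image (hypothetical TXT layout).

    Assumes image_root/<person_name>/<image files>. Every image of the same
    person receives the same integer label, and labels are unique per person.
    """
    persons = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d))
    )
    label_of = {name: idx for idx, name in enumerate(persons)}
    lines = []
    for name in persons:
        folder = os.path.join(image_root, name)
        for fname in sorted(os.listdir(folder)):
            lines.append(f"{os.path.join(folder, fname)} {label_of[name]}")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return label_of
```

The returned dictionary maps each person to a label, so the same mapping can be reused when the recognition result is reported.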

As a further technical scheme of the invention, constructing the image pyramid, decomposing the images and removing noise from the high-frequency images in step S3 includes the following steps:

S301, constructing a Gaussian pyramid, in which all images are derived from the same original face image and the resolution decreases toward the apex of the pyramid; the original face image G_0 is taken as the input image of the Gaussian pyramid, the original image is first smoothed by low-pass filtering, and the smoothed image is then downsampled to obtain the first-layer pyramid image with reduced size and resolution; in this way, each layer of the Gaussian pyramid is constructed from the layer below it, and the related formula is expressed as follows:

where w is the Gaussian kernel matrix, w(m, n) is the element in the m-th row and n-th column of the kernel matrix, R_x and R_y are the downsampling rates of the layer image in the x-axis and y-axis directions, with R_x = 2 and R_y = 2, and κ(m, n) denotes a Gaussian low-pass filter, described by the formula:

where σ_x and σ_y are the standard deviations of the filter in the x-axis and y-axis directions, respectively;

in general, to reduce computational complexity, a Gaussian kernel of size 5×5 may be used in place of κ(m, n), and the image of each layer is obtained by convolution; the Gaussian kernel matrix is as follows:

letting w(m, n) be the element in the m-th row and n-th column of the kernel matrix, the construction of the Gaussian pyramid can be simplified to the following formula:

after face images at several resolutions are obtained, the Gaussian pyramid is built so that image resolution decreases gradually from the bottom layer to the top layer; therefore, if the original face image is downsampled n times, the constructed Gaussian pyramid also has n layers;

S302, constructing a Laplacian pyramid, which is used to reconstruct an upper-layer, non-downsampled image from a lower-layer pyramid image and decomposes each layer image of the Gaussian pyramid by means of prediction residuals; for the image G_i of the i-th layer of the Gaussian pyramid in step S301, an up-sampling operation is first performed to expand G_i to the size of the image G_i-1 of layer i-1, the missing rows and columns being filled with zeros by default during up-sampling; the expanded image is then convolved with the Gaussian kernel to obtain the corresponding low-frequency image L_i; finally, the difference between G_i and the low-frequency image L_i gives the high-frequency image H_i; the related formulas are as follows:

L_i = (↑G_i) ⊗ w,

H_i = G_i - L_i,

where ↑ denotes up-sampling and ⊗ denotes convolution;

the Laplacian pyramid is an extension of the Gaussian pyramid and its construction is based entirely on the Gaussian pyramid; its purpose is to decompose each layer image of the Gaussian pyramid into a low-frequency image and a high-frequency image; therefore, for a Gaussian pyramid with n layers, the constructed Laplacian pyramid also has n layers;

S303, for the high-frequency image H_i of resolution X × Y obtained by the decomposition in step S302, in order to further remove the noise elements contained in the image, pixels with small energy values are set to zero, as described by the following formula:

where H_i(x, y) is the gray value of the pixel at coordinate (x, y) in H_i and T is the adaptive threshold, with:

For convenience of description, the low-frequency information image L_i output by the i-th layer of the Laplacian pyramid is denoted LF_i, and the high-frequency information image H_i after noise removal is denoted HF_i.
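Steps S301-S303 can be illustrated with the following NumPy sketch. It rests on several assumptions that are not taken from the text: the 5×5 kernel is the standard binomial approximation of a Gaussian (the kernel matrix is not reproduced above); because L_i has the size of the layer that was reduced, the residual is taken against that larger layer, as in the standard Laplacian pyramid; pixel energy is taken as the squared pixel value; and the threshold T is passed in by the caller because its formula is not reproduced above. All function names are hypothetical.

```python
import numpy as np

_K1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
KERNEL = np.outer(_K1D, _K1D)  # assumed 5x5 kernel; entries sum to 1

def convolve5(img, kernel, pad_mode):
    """Convolve a 2-D image with a symmetric 5x5 kernel, padding via np.pad."""
    padded = np.pad(img, 2, mode=pad_mode)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for m in range(5):
        for n in range(5):
            out += kernel[m, n] * padded[m:m + h, n:n + w]
    return out

def reduce_layer(g_prev):
    """Smooth, then keep every second row and column (R_x = R_y = 2)."""
    return convolve5(g_prev, KERNEL, "edge")[::2, ::2]

def expand(g_i, shape):
    """Up-sample g_i to `shape`: insert zero rows/columns, then smooth."""
    up = np.zeros(shape, dtype=np.float64)
    up[::2, ::2] = g_i
    # The factor 4 restores the brightness lost to the inserted zeros.
    return convolve5(up, 4.0 * KERNEL, "reflect")

def suppress_noise(h_i, threshold):
    """Zero the pixels whose energy (assumed: squared value) is below T."""
    h_i = np.asarray(h_i, dtype=np.float64)
    return np.where(h_i ** 2 < threshold, 0.0, h_i)

def decompose(g0, levels, threshold):
    """Return [(LF_1, HF_1), ..., (LF_n, HF_n)] for n = levels."""
    bands = []
    g_prev = np.asarray(g0, dtype=np.float64)
    for _ in range(levels):
        g_i = reduce_layer(g_prev)           # next Gaussian layer
        l_i = expand(g_i, g_prev.shape)      # low-frequency image
        h_i = g_prev - l_i                   # high-frequency residual
        bands.append((l_i, suppress_noise(h_i, threshold)))
        g_prev = g_i
    return bands
```

For a grayscale image `gray`, a call such as `decompose(gray, levels=3, threshold=t)` would yield three (LF_i, HF_i) pairs of decreasing resolution; how `t` should be chosen is left open here.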

As a further technical scheme of the invention, calculating the facial texture features of the face image with CS-DLTP and forming the feature histogram vector in step S4 includes the following steps:

S401, for any low-frequency or high-frequency image output by the image pyramid constructed in step S303, first, 8 pixels are selected as sampling points on the circular neighborhood of radius R_in around the central pixel, the 8 sampling points being equally spaced; second, for each of these 8 pixels, one further pixel is selected along the ray from the central pixel through that pixel (all such pixels being at the same distance from the central pixel), so that the 8 outer pixels form a circular neighborhood of radius R_ou; the CS-DLTP feature is obtained by fusing these sampling points; the specific calculation formula is as follows:

where Δg_i denotes the fused feature value, given by the following formula:

in the above, g_c is the gray value of the central pixel of the local neighborhood, the remaining two gray-value symbols denote the two adjacent pixels taken along one direction from the central pixel, and t is a variable threshold, calculated as follows:

where g_a is derived from the sampling points g_i around the central pixel and is calculated by the following formula:

S402, for each pixel in the face image, the CS-DLTP pattern is split into an upper-layer pattern and a lower-layer pattern, expressed as follows:

where:

for convenience of description, the two sub-coding patterns are named separately; each is formed by combining the coding rules in every direction; the codes generated by the two sub-coding patterns at each pixel can be described as:

finally, the two sub-coding patterns are concatenated and the expression of the CS-DLTP pattern is updated:

S403, repeating S401-S402 for each low-frequency image and each high-frequency image output by the image pyramid constructed in step S303 to obtain the low-frequency image features and high-frequency image features of the i-th pyramid layer; concatenating these features forms the feature histogram vector:

where n is the number of layers of the image pyramid.
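The mechanics of steps S401-S403 can be illustrated with the sketch below. It is not the CS-DLTP encoding itself, since the fusion and variable-threshold formulas are not reproduced in this text. It only shows, under assumptions, (a) sampling 8 inner and 8 outer points along the same rays, with the circle approximated on the pixel grid by scaling the eight unit directions, (b) a generic LTP-style three-valued code with a caller-supplied threshold `t`, (c) the usual split of a ternary code into an upper and a lower binary code, and (d) concatenating per-image histograms. All names and default radii are hypothetical.

```python
import numpy as np

# Eight directions (dy, dx), counter-clockwise from the positive x-axis.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]

def sample_rings(img, y, x, r_in=1, r_ou=2):
    """Gray values of the 8 inner and 8 outer samples around pixel (y, x).

    The caller must keep (y, x) at least r_ou pixels away from the border.
    """
    inner = np.array([img[y + r_in * dy, x + r_in * dx] for dy, dx in DIRECTIONS])
    outer = np.array([img[y + r_ou * dy, x + r_ou * dx] for dy, dx in DIRECTIONS])
    return inner, outer

def ternary(diff, t):
    """Three-valued code: +1 if diff >= t, -1 if diff <= -t, otherwise 0."""
    diff = np.asarray(diff, dtype=np.float64)
    return np.where(diff >= t, 1, np.where(diff <= -t, -1, 0))

def split_codes(tern):
    """Split ternary codes (last axis = bits) into upper and lower binary codes."""
    weights = 2 ** np.arange(tern.shape[-1])
    upper = ((tern == 1) * weights).sum(axis=-1)
    lower = ((tern == -1) * weights).sum(axis=-1)
    return upper, lower

def histogram_vector(code_maps, bins):
    """Concatenate one histogram of length `bins` per code map."""
    hists = [np.bincount(np.ravel(codes), minlength=bins) for codes in code_maps]
    return np.concatenate(hists).astype(np.float64)
```

In a full pipeline the code maps of every LF_i and HF_i image would be passed to `histogram_vector` in a fixed order, so that stored and query vectors are comparable bin by bin.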

As a further technical scheme of the invention, storing the feature histogram vector locally in step S5 and later applying it directly to recognition in relevant scenes includes the following steps:

S501, storing the feature histogram vector calculated in step S4 locally in XML or TXT format, ensuring that each line of the file contains the features of only one image;

S502, in later face recognition work in a relevant scene, extracting the feature histogram vector of the face image to be recognized according to steps S3-S4, and then calculating the chi-square distance between the extracted feature and all stored features, with the following formula:

where H = {h_i | i = 1, 2, …, B} and K = {k_i | i = 1, 2, …, B} denote the locally stored face image feature histogram and the feature histogram of the face image to be recognized, respectively, and B is the dimension of the histogram; the label of the training-set image with the smallest chi-square distance to the image to be recognized is then taken as the identity of the face to be recognized.
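The matching in step S502 can be sketched as follows. The distance formula is not reproduced in this text, so the sketch assumes a common form of the chi-square histogram distance, the sum over bins of (h_i - k_i)² / (h_i + k_i); the small `eps` term and the function names are additions for illustration only.

```python
import numpy as np

def chi_square(h, k, eps=1e-12):
    """Chi-square distance between two histograms of equal length B.

    eps (an assumption) avoids division by zero for bins empty in both.
    """
    h = np.asarray(h, dtype=np.float64)
    k = np.asarray(k, dtype=np.float64)
    return float(np.sum((h - k) ** 2 / (h + k + eps)))

def identify(query, stored, labels):
    """Return the label of the stored histogram closest to the query."""
    dists = [chi_square(h, query) for h in stored]
    return labels[int(np.argmin(dists))]
```

Here `stored` would hold the feature histogram vectors read back from the file written in step S501, one per training image, and `labels` the corresponding identity labels.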

Compared with the prior art, the invention has the following advantages or positive effects:

1. The method decomposes the original face image at multiple levels by constructing an image pyramid. Because image resolution decreases toward the apex of the pyramid, and noise density becomes gradually sparser as the resolution is reduced or the image is decomposed until the noise tends to disappear, the proposed face recognition method is insensitive to image noise.

2. During construction of the image pyramid, each pyramid layer undergoes cross-band decomposition, i.e., it is divided into a high-frequency image and a low-frequency image, which further refines and distinguishes facial features at different levels; meanwhile, noise elements in the high-frequency image are automatically identified and removed using an adaptive threshold, which further strengthens the method's robustness to image noise.

3. The invention proposes the center-symmetric dual-layer local ternary pattern (CS-DLTP). While retaining the advantages of LBP and LTP (effective capture of texture characteristics and gray-level changes, and some adaptability to illumination changes), CS-DLTP strengthens the algorithm's robustness to strong illumination changes by extending the original coding scheme. To address the weakness of existing algorithms in extracting fine facial features, the CS-DLTP descriptor collects several information points along the direction of the facial texture and fuses them with weights according to their distance from the central pixel, so as to capture fine feature changes of the face. Sampling several information points also strengthens the robustness of CS-DLTP to Gaussian noise and salt-and-pepper noise; compared with the prior art, CS-DLTP can avoid interference of noise points with local feature coding as far as possible by controlling the choice of the two sampling radii, so the proposed method can markedly improve face recognition accuracy.

4. The proposed CS-DLTP coding scheme captures the detailed trend of change among face image features by introducing a variable threshold; compared with the prior art, the proposed method adapts better to variation between different face images.

Drawings

FIG. 1 is a flow chart of an implementation of the image pyramid-based local feature face recognition method of the present invention;

FIG. 2 is a schematic structural diagram of the Gaussian pyramid in the image pyramid-based local feature face recognition method of the present invention;

FIG. 3 is a schematic structural diagram of the Laplacian pyramid in the image pyramid-based local feature face recognition method of the present invention;

FIG. 4 is a diagram illustrating the coding scheme of the CS-DLTP descriptor in the image pyramid-based local feature face recognition method of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, a local feature face recognition method based on an image pyramid includes the following specific steps:

S2, making the images acquired in step S1 into a training set;

S301, constructing a Gaussian pyramid as shown in FIG. 2, in which all images are derived from the same original face image and the resolution decreases toward the apex of the pyramid; the original face image G_0 is taken as the input image of the Gaussian pyramid, the original image is first smoothed by low-pass filtering, and the smoothed image is then downsampled to obtain the first-layer pyramid image with reduced size and resolution; in this way, each layer G_1, G_2, …, G_n of the Gaussian pyramid is constructed from the layer below it, and the related formula is expressed as follows:

S302, constructing a Laplacian pyramid as shown in FIG. 3, which is used to reconstruct an upper-layer, non-downsampled image from a lower-layer pyramid image and decomposes each layer image of the Gaussian pyramid by means of prediction residuals; for the image G_i of the i-th layer of the Gaussian pyramid in step S301, an up-sampling operation is first performed to expand G_i to the size of the image G_i-1 of layer i-1, the missing rows and columns being filled with zeros by default during up-sampling; the expanded image is then convolved with the Gaussian kernel to obtain the corresponding low-frequency image L_i; finally, the difference between G_i and the low-frequency image L_i gives the high-frequency image H_i; by analogy, the low-frequency image L_2 and high-frequency image H_2 of layer 2 can be computed from the layer-1 image, and so on up to the low-frequency image L_n and high-frequency image H_n of the n-th layer; the related formula is as follows:

H_i = G_i - L_i,

S401, for any low-frequency or high-frequency image output by the image pyramid constructed in step S303, first, 8 pixels are selected as sampling points on the circular neighborhood of radius R_in around the central pixel, the 8 sampling points being equally spaced; second, for each of these 8 pixels, one further pixel is selected along the ray from the central pixel through that pixel (all such pixels being at the same distance from the central pixel), so that the 8 outer pixels form a circular neighborhood of radius R_ou; the CS-DLTP feature is obtained by fusing these sampling points; referring to FIG. 4, for a central pixel C, the sampling points A_0, A_1, …, A_7 are uniformly distributed on the inner circular sampling neighborhood of radius R_in, and the sampling points B_0, B_1, …, B_7 are uniformly distributed on the outer sampling neighborhood of radius R_ou; the specific calculation formula is as follows:

in the above, g_c is the gray value of the central pixel of the local neighborhood, the remaining two gray-value symbols denote the two adjacent pixels taken along one direction from the central pixel, and t is a variable threshold, calculated as follows:

where:

for convenience of description, the two sub-coding patterns are named separately; each is formed by combining the coding rules in every direction; the codes generated by the two sub-coding patterns at each pixel can be described as:

where n is the number of layers of the image pyramid.

Working principle: the invention provides a local feature face recognition method based on an image pyramid. The image pyramid decomposes the original face image into multi-level low-frequency images L_i and high-frequency images H_i, where i is the corresponding layer of the image pyramid; the low-frequency images represent the global features of the face, and the high-frequency images represent its fine features and edge features. For noise elements that may exist in the high-frequency images, the invention uses an adaptive threshold to evaluate the energy value of every pixel: pixels whose energy value is below the threshold are regarded as noise and their pixel values are set to 0. The invention proposes the center-symmetric dual-layer local ternary pattern (CS-DLTP), which samples several information points for each pixel of all low-frequency and high-frequency images obtained by the pyramid decomposition and fuses them with weights according to their distance from the central pixel. A variable threshold is further used to judge accurately the relative magnitudes of the gray values of the sampling points in the image. Finally, all CS-DLTP features of the low-frequency and high-frequency images are concatenated to form the feature histogram vector, which is stored locally and can be applied directly to relevant face recognition scenes later.

While the invention has been described with respect to the preferred embodiments, it will be apparent to those skilled in the art that the invention is not limited thereto, and that various modifications and changes can be made without departing from the spirit and scope of the invention.

Claims

1. A local feature face recognition method based on an image pyramid, characterized by comprising the following steps:

S1, for each face to be recognized, acquiring face images of that face;

S2, making the images acquired in step S1 into a training set;

S4, calculating the facial texture features of all images contained in the image pyramid using the CS-DLTP descriptor, and then concatenating all computed feature histograms to form a feature histogram vector;

S5, storing the feature histogram vector locally so that it can be applied directly in relevant scenes later;

the face image acquisition methods in step S1 include:

directly downloading ID or work photographs of the persons to be identified from the network; photographing the person with camera equipment; and capturing relevant frames from a surveillance camera to complete acquisition of the face images;

making the training set in step S2 includes the following steps:

S201, renaming and serializing all images in the face image library as required, which is done by writing a corresponding batch-processing program;

step S3 includes the following steps:

where w is the Gaussian kernel matrix, w(m, n) is the element in the m-th row and n-th column of the kernel matrix, R_x and R_y are the downsampling rates of the layer image in the x-axis and y-axis directions, with R_x = 2 and R_y = 2; κ(m, n) denotes a Gaussian low-pass filter, described by the formula:

where σ_x and σ_y are the standard deviations of the filter in the x-axis and y-axis directions, respectively;

S302, constructing a Laplacian pyramid, which is used to reconstruct an upper-layer, non-downsampled image from a lower-layer pyramid image and decomposes each layer image of the Gaussian pyramid by means of prediction residuals; for the image G_i of the i-th layer of the Gaussian pyramid in step S301, an up-sampling operation is first performed to expand G_i to the size of the image G_i-1 of layer i-1, the missing rows and columns being filled with zeros by default during up-sampling; the expanded image is then convolved with the Gaussian kernel to obtain the corresponding low-frequency image L_i; finally, the difference between G_i and the low-frequency image L_i gives the high-frequency image H_i; the related formulas are as follows:

L_i = (↑G_i) ⊗ w,

H_i = G_i - L_i,

where ↑ denotes up-sampling and ⊗ denotes convolution;

finally, the low-frequency information image L_i output by the i-th layer of the Laplacian pyramid is denoted LF_i, and the high-frequency information image H_i after noise removal is denoted HF_i;

step S4 includes the following steps:

S401, for any low-frequency or high-frequency image output by the image pyramid constructed in step S303, first, 8 pixels are selected as sampling points on the circular neighborhood of radius R_in around the central pixel, the 8 sampling points being equally spaced; second, for each of these 8 pixels, one further pixel is selected along the ray from the central pixel through that pixel, so that the 8 outer pixels form a circular neighborhood of radius R_ou; the CS-DLTP feature is obtained by fusing these sampling points; the specific calculation formula is as follows:

S402, for each pixel in the face image, the CS-DLTP pattern is split into an upper-layer pattern and a lower-layer pattern, expressed as follows:

where:

the two sub-coding patterns of the resulting CS-DLTP are each formed by combining the coding rules in every direction; the codes generated by the two sub-coding patterns at each pixel can be described as:

where n is the number of layers of the image pyramid;

step S5 includes the following steps:

S502, in later face recognition work in a relevant scene, extracting the feature histogram vector of the face image to be recognized according to steps S3-S4, and then calculating the chi-square distance between the extracted feature and all stored features, with the following formula: