CN111523414B - Face recognition method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111523414B
CN111523414B (application CN202010285535.3A)
Authority
CN
China
Prior art keywords
human body
training
face
area
image
Prior art date
Legal status
Active
Application number
CN202010285535.3A
Other languages
Chinese (zh)
Other versions
CN111523414A (en)
Inventor
张官兴
王赟
郭蔚
黄康莹
张铁亮
Current Assignee
Shanghai Ewa Intelligent Technology Co ltd
Shaoxing Ewa Technology Co ltd
Original Assignee
Shanghai Ewa Intelligent Technology Co ltd
Shaoxing Ewa Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Ewa Intelligent Technology Co ltd, Shaoxing Ewa Technology Co ltd filed Critical Shanghai Ewa Intelligent Technology Co ltd
Priority to CN202010285535.3A
Publication of CN111523414A
Application granted
Publication of CN111523414B
Legal status: Active

Classifications

    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T7/60 — Image analysis: analysis of geometric attributes
    • G06V10/25 — Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/267 — Image preprocessing: segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V40/168 — Human faces: feature extraction; face representation
    • G06V40/172 — Human faces: classification, e.g. identification
    • G06T2207/20081 — Image analysis indexing scheme: training; learning
    • Y02T10/40 — Engine management systems (climate-change mitigation tagging)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of computer technology, and in particular to a face recognition method, a face recognition device, computer equipment and a storage medium. The method comprises the following steps: acquiring an initial image and performing hierarchical face detection on the initial image to obtain a target face area, wherein hierarchical face detection comprises performing human body area recognition on the initial image and then performing face detection on the recognized human body areas; and cutting the target face area from the initial image and inputting the cut target face area into a face recognition model for face feature extraction and face recognition. The method improves detection accuracy.

Description

Face recognition method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular, to a face recognition method, apparatus, computer device, and storage medium.
Background
Traditional image and face detection and recognition methods mainly feed the full collected image into a trained end-to-end neural network model, which extracts and recognizes feature vectors.
During recognition, traditional face recognition methods are limited by available computing power and by the input size of the neural network model, so the input resolution must be reduced; this loses facial texture features and lowers model recognition accuracy. At the same time, the candidate regions used for face position detection must traverse the entire image, which increases the model's parameters, computation and runtime.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a face recognition method, apparatus, computer device, and storage medium capable of improving detection accuracy.
A face recognition method, the method comprising:
acquiring an initial image, and performing hierarchical face detection on the initial image to obtain a target face area, wherein the hierarchical face detection comprises performing human body area recognition on the initial image and performing face detection on the recognized human body area;
and cutting the target face area from the initial image, inputting the cut target face area into a face recognition model for face feature extraction, and using it for face recognition.
In one embodiment, the step of performing hierarchical face detection on the initial image to obtain a target face area includes:
detecting the initial image to obtain a target human body area;
preprocessing the obtained target human body area to obtain at least one image to be processed with standard size;
performing face detection on the image to be processed to obtain a face area to be processed;
and mapping the face area to be processed to the initial image to obtain a target face area.
In one embodiment, the detecting the initial image to obtain the target human body area includes:
scaling and graying the initial image to obtain an image to be identified;
and detecting the image to be identified to obtain a target human body area.
In one embodiment, the detecting the image to be identified to obtain the target human body area includes:
inputting the image to be identified into a human body detection model which is trained in advance;
obtaining a plurality of candidate frames with different aspect ratios and areas, predetermined in the human body detection model;
extracting, through the human body detection model, the image features corresponding to the candidate frames with different aspect ratios and areas, and calculating a target confidence for each candidate frame from the extracted image features;
calculating the target human body area from the target confidence of each candidate frame through a non-maximum suppression operation;
the training mode of the human body detection model comprises the following steps:
acquiring a training set marked with a real human body area;
downsampling each first training image in a training set to obtain a first feature map;
generating a plurality of first training candidate frames with different length-width ratios and areas according to the first feature map;
calculating a first training error of the first training candidate frame;
and adjusting model parameters according to the marked real human body areas in the training set and the first training error, and training to obtain the human body detection model.
In one embodiment, the aspect ratios of the first training candidate frames are preset; or
the aspect ratios of the first training candidate frames are obtained by clustering the marked real human body areas in the first training images, and the areas of the first training candidate frames are determined from the areas of the marked real human body areas in each cluster; or
the areas of the first training candidate frames are determined from the variation in marked human body area size caused by differences in distance from the camera.
In one embodiment, the performing face detection on the image to be processed to obtain a face area to be processed includes:
respectively inputting the images to be processed into a face detection model;
respectively extracting at least one preset position area of the image to be processed through the face detection model;
generating face candidate frames with different proportions and sizes according to the preset position area;
calculating the confidence corresponding to each face candidate frame region;
obtaining the face region through a non-maximum suppression operation according to the confidence of each face candidate frame;
the training mode of the face detection model comprises the following steps:
extracting a real human body region from a training set marked with the real human body region, wherein the real human body region is marked with a real human face region;
performing scale transformation on the real human body area to obtain a plurality of second training images with standard sizes;
generating a plurality of preset position areas of the second training images;
generating a plurality of second training candidate frames with different proportions and sizes according to the preset position area;
calculating a second training error corresponding to the second training candidate frame region;
and training according to the marked real face areas and the second training error to obtain the face detection model.
In one embodiment, the performing scale transformation on the obtained target human body area to obtain a plurality of to-be-processed images with standard sizes includes:
mapping the target human body area to the initial image to obtain the position of the target human body area;
extracting a human body image corresponding to the target human body region from the initial image according to the position;
and performing scale transformation on the human body image to obtain a plurality of images to be processed with standard sizes.
A face recognition device, the device comprising:
the face region detection module is used for acquiring an initial image and performing hierarchical face detection on the initial image to obtain a target face region, wherein the hierarchical face detection comprises performing human body region identification on the initial image and performing face detection on the identified human body region;
and the face recognition module is used for cutting the target face region from the initial image and inputting it into a face recognition model for face feature extraction and face recognition.
A computer device comprising a memory storing a computer program and a processor implementing the steps of any one of the methods described above when the processor executes the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the preceding claims.
According to the face recognition method, device, computer equipment and storage medium, hierarchical face detection is performed on the initial image: the human body area is detected first, and face detection is then performed only within the detected human body area. Because the human body area is located first, the position of the face is relatively constrained during detection, which improves detection accuracy.
Drawings
FIG. 1 is a flow chart of a face recognition method in one embodiment;
FIG. 2 is a flowchart of step S102 in the embodiment shown in FIG. 1;
FIG. 3 is an image transformation diagram of a face recognition method in one embodiment;
FIG. 4 is a schematic representation of a first training image in a training set, in one embodiment;
FIG. 5 is a schematic illustration of a target human body region in one embodiment;
FIG. 6 is a schematic diagram of a second training image obtained by preprocessing a target human body region in one embodiment;
fig. 7 is a schematic diagram of a face recognition method in another embodiment;
fig. 8 is a block diagram of a face recognition device in one embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, a face recognition method is provided. The embodiment is described as applied to a terminal; it is understood that the method may also be applied to a server, or to a system comprising a terminal and a server that implement the method through their interaction. In this embodiment, the method includes the following steps:
S102: acquiring an initial image and performing hierarchical face detection on the initial image to obtain a target face area, wherein hierarchical face detection comprises performing human body area recognition on the initial image and performing face detection on the recognized human body area.
Specifically, the initial image may be an image captured by a camera, and its resolution is generally fixed. Hierarchical face detection comprises performing human body area recognition on the initial image and performing face detection on the recognized human body area: human body area recognition means finding the human body areas of the initial image that contain a complete face, and face detection means locating the face area within each recognized human body area. Human body area recognition may be performed on an image to be identified obtained by reducing the precision of the initial image, which reduces the amount of data to process and improves processing efficiency. After the terminal obtains the target human body area, it maps that area back into the initial image or the image to be identified, performs face detection on the human body area to obtain the face area, maps the face area into the high-precision initial image, and crops the initial image there to obtain a high-precision face area for subsequent face recognition.
Taking a 1080P image as an example: when conventional terminal equipment performs face recognition, limitations such as computing power and power consumption force the image input to the neural network to be reduced in resolution. Because faces in the image may be near or far from the camera, a distant face loses the texture and color detail needed for face recognition once the original image is downscaled. Meanwhile, traversing the whole image to search for face areas places heavy demands on the terminal's limited computing power, memory and battery. To complete the face recognition task faster and more accurately, face detection and recognition are performed hierarchically, and by mapping the detected face area position back into the original image, the face image is cropped directly from the original image as the input to face recognition, which greatly improves recognition precision. Optionally, the terminal may scale and gray the initial image to obtain the image to be identified; using this coarse-texture image as the input to the detection network greatly reduces the demands on the terminal's computing power, memory and power consumption, as well as the complexity of the model.
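The hierarchical flow just described — body detection on a coarse copy, face detection inside each body crop, then mapping back to full resolution — amounts to coordinate bookkeeping. The sketch below illustrates only that bookkeeping; the detector callables and the 1/4 downscale factor are illustrative stand-ins, not the patent's actual models:

```python
def hierarchical_detect(image, detect_bodies, detect_faces, downscale=4):
    """Two-stage detection: bodies on a coarse copy, faces inside each body.

    `detect_bodies` and `detect_faces` stand in for the trained models;
    boxes are (x1, y1, x2, y2) tuples.
    """
    # Stage 1: the body detector runs on a 1/downscale copy of the image,
    # so each body box is mapped back to full-resolution coordinates.
    bodies = [tuple(v * downscale for v in box) for box in detect_bodies(image)]

    faces = []
    for bx1, by1, bx2, by2 in bodies:
        # Stage 2: the face detector sees only the body crop; its boxes
        # are relative to the crop, so offset them by the crop origin.
        for fx1, fy1, fx2, fy2 in detect_faces((bx1, by1, bx2, by2)):
            faces.append((fx1 + bx1, fy1 + by1, fx2 + bx1, fy2 + by1))
    return faces
```

The returned boxes are in initial-image coordinates, ready for the high-precision crop described in S104.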
S104: cutting the target face area from the initial image, inputting the cut target face area into a face recognition model for face feature extraction, and using it for face recognition.
Specifically, after the terminal identifies the target face area, it maps the target face area into the initial image according to the area's position coordinates, crops the initial image at those coordinates to obtain a high-precision target face area, and inputs that area into the face recognition model to extract face features for face recognition. Optionally, after obtaining the high-precision target face area, the terminal may perform feature extraction and other processing on it. For example, the terminal may uniformly scale the high-precision target face area to a fixed resolution, for example 250×250, so that the face image can be input to the face feature extraction network to extract the face features for further processing.
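Cropping the mapped face box from the full-resolution image and scaling it to the fixed input size might look like the following. The list-of-lists image and nearest-neighbour resampling are simplifications for illustration (the 250 default echoes the example resolution above):

```python
def crop_and_resize(image, box, out=250):
    """Cut the face box (x1, y1, x2, y2) from the full-resolution image
    and scale the crop to a fixed out*out input for the recognition
    network, using nearest-neighbour resampling to stay dependency-free."""
    x1, y1, x2, y2 = box
    crop = [row[x1:x2] for row in image[y1:y2]]
    h, w = len(crop), len(crop[0])
    # pick the source pixel whose fractional position matches the target
    return [[crop[r * h // out][c * w // out] for c in range(out)]
            for r in range(out)]
```

A real implementation would use bilinear or bicubic interpolation, but the coordinate logic is the same.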
According to this face recognition method, hierarchical face detection is performed on the initial image: the human body area is detected first, and face detection is then performed within the detected human body area, so the position of the face is relatively constrained during detection, which improves detection accuracy.
In one embodiment, referring to fig. 2, performing hierarchical face detection on an initial image to obtain a target face area includes:
S202: detecting the initial image to obtain the target human body area.
First, a target human body area is an area of the initial image containing a human body with a complete face area. The terminal performs coarse-grained detection on the initial image to obtain the human body areas that include complete face areas. Specifically, as shown in fig. 3, which is an image transformation diagram of a face recognition method in one embodiment, the 1000px initial image contains two human bodies, and the terminal identifies the initial image to obtain target human body areas F1 and F2; in other embodiments there may be one or more such areas, without specific limitation.
Alternatively, the terminal may process the initial image through a pre-trained human body detection model to obtain the target human body area. The human body detection model may be trained in advance on a number of training images marked with human body areas. The size and area of the candidate frames in the human body detection model may be predetermined from the training images, based on the aspect ratio and area of actual human bodies at different distances from the camera, which improves the detection efficiency of the candidate detection windows (anchor boxes).
S204: and preprocessing the obtained target human body area to obtain a plurality of images to be processed with standard sizes.
Specifically, preprocessing comprises cutting, scaling and padding. Because the target human body areas obtained by the human body detection model are cut, scaled and padded to the same resolution, the subsequent terminal only needs a face detection frame of fixed size to obtain the face area, without adjusting for differences in face size; since the size ratio of face to body is generally fixed within a certain range, scaling the obtained target human body areas facilitates the subsequent recognition of the face area. With reference to fig. 3, the resolutions of the two obtained human body areas are 350px×950px and 190px×520px respectively, so the terminal scales the obtained human body areas to the same resolution. Optionally, after identifying the target human body area, the terminal fine-tunes its position, cuts the human body area out of the initial image, and then performs scale transformation on the cut-out area. The size and area of the candidate frames in the face detection model may likewise be predetermined from training images, based on the aspect ratio of the face within an actual human body, which improves detection efficiency.
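The cut–scale–pad preprocessing reduces to computing, for each body crop, a scale factor and symmetric padding that fit it into a fixed square without distorting its aspect ratio. The 256-pixel target below is an assumed example value, not one taken from the patent:

```python
def letterbox_params(w, h, target=256):
    """Scale factor, scaled size, and (left, right, top, bottom) padding
    that fit a w*h body crop into a target*target square while
    preserving aspect ratio (the scale-then-pad step described above)."""
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = target - new_w, target - new_h
    # split leftover pixels evenly between the two sides
    return scale, (new_w, new_h), (pad_x // 2, pad_x - pad_x // 2,
                                   pad_y // 2, pad_y - pad_y // 2)
```

For the 350px×950px crop from fig. 3 this yields a 94×256 scaled body with 81 pixels of padding on each side, so both example crops end up at the same standard resolution.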
S206: performing face detection on the image to be processed to obtain the face area to be processed.
Specifically, the face area to be processed is the face area extracted from the preprocessed image to be processed. Optionally, the terminal may input the image to be processed into a face detection model to obtain the face area to be processed. The face detection model may be trained in advance: for example, from training images labeled with both human body areas and face areas, the labeled human body areas are first extracted, and the extracted human body areas that carry face labels are then used as training samples.
S208: mapping the face area to be processed to the initial image to obtain the target face area.
Specifically, the mapping operation determines the position of the face area to be processed in the initial image. Face detection operates on the low-precision image to be processed, so its output has low precision; mapping that output into the high-precision initial image yields a high-resolution target face area, which facilitates the subsequent face feature extraction used for face recognition.
When the terminal identifies the human body area to be processed, it records the coordinate position of the face area to be processed within the image to be processed, so that the coordinate position of the face area within the identified human body area can be determined from the size transformation between the image to be processed and the human body area. The position of the face area to be processed in the initial image is then determined from the position of the human body area in the initial image, so the terminal can locate the target face area in the initial image.
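Expressing a detected face box in initial-image coordinates is a matter of reversing, in order, the padding, the scaling, and the body-crop origin offset recorded during preprocessing. The function name and the (left, right, top, bottom) pad ordering are illustrative assumptions:

```python
def map_face_to_initial(face_box, scale, pads, body_origin):
    """Map a face box detected in a scaled-and-padded body crop back to
    initial-image coordinates: subtract the padding, divide out the
    scale factor, then add the body crop's origin in the initial image."""
    x1, y1, x2, y2 = face_box
    left, _right, top, _bottom = pads
    bx, by = body_origin
    inv = 1.0 / scale
    return ((x1 - left) * inv + bx, (y1 - top) * inv + by,
            (x2 - left) * inv + bx, (y2 - top) * inv + by)
```

Because every transform applied on the way down is invertible, the high-precision crop in the initial image lands exactly where the low-precision detection occurred.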
According to this face recognition method, the initial image is first recognized to obtain the target human body areas, which are then scale-transformed into several images to be processed of standard size; face detection on these images yields the face areas to be processed, which are mapped back into the initial image to obtain high-resolution target face areas. Because the human body area is detected first and the scales are then unified, the position of the face is relatively constrained during detection, which improves detection accuracy.
In one embodiment, detecting the initial image to obtain the target human body region includes: scaling and graying the initial image to obtain an image to be identified; and detecting the image to be identified to obtain a target human body area.
Specifically, since the resolution of the initial image is high and only the target human body area needs to be detected, an image of such high resolution is unnecessary. The terminal may therefore scale and gray the initial image: for example, first convert it to a grayscale image, then normalize the grayscale image, retaining only the outline and coarse details of the human body as the input data for model detection. This reduces the size of the input data and improves the efficiency of target human body area recognition; and because the input data are smaller, the computing-power requirement on the hardware is reduced and memory overhead drops at the same time.
In the above embodiment, the scaling process and the graying process are performed on the initial image in advance, so that the size of the input data is reduced, and the recognition efficiency of the target human body region is improved.
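A minimal sketch of the scaling-and-graying step, using the common ITU-R BT.601 luma weights and integer-stride downsampling; the factor of 4 is an assumed example, and the nested-list image stands in for a real pixel buffer:

```python
def preprocess(image_rgb, factor=4):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale with
    BT.601 luma weights, then downscale by keeping every `factor`-th
    pixel — producing the coarse-texture detector input described above."""
    gray = [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image_rgb]
    return [row[::factor] for row in gray[::factor]]
```

In practice the downscale would use area averaging rather than striding, but either way the detector sees 1/factor² of the original data.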
In one embodiment, identifying the image to be identified to obtain the target human body area includes: inputting the image to be identified into a pre-trained human body detection model; obtaining a plurality of candidate frames with different aspect ratios and areas, predetermined in the human body detection model; extracting, through the human body detection model, the image features corresponding to the candidate frames with different aspect ratios and areas, and calculating a target confidence for each candidate frame from the extracted image features; and obtaining the target human body area from the target confidence of each candidate frame through a non-maximum suppression operation.
The candidate frames are preset. Their aspect ratios may be fixed in advance, or obtained by clustering the aspect ratios of the marked real human body areas in the training images when the human body detection model is trained. Candidate frames may be preset empirically, for example three candidate frames with aspect ratios of 1:4, 5:8 and 1:1. Alternatively, cluster analysis is performed directly on the label frames of the annotated data set used to train the human body detection model, and suitable candidate frame proportions are selected, which improves the adaptivity and detection efficiency of the model.
The area of the candidate frames may be set according to the difference between far and near targets, or predetermined during training, for example empirically or by cluster analysis.
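Generating candidate frames from preset aspect ratios and areas is a small computation: each (ratio, area) pair determines a unique width and height. The 1:4, 5:8 and 1:1 ratios below come from the text above, while the pixel areas are assumed example values:

```python
import math

def make_anchors(ratios=(1 / 4, 5 / 8, 1.0), areas=(32 * 32, 64 * 64)):
    """Candidate frames (w, h) from preset aspect ratios (w:h) and areas.

    For each pair, solve w * h == area and w / h == ratio, giving
    w = sqrt(area * ratio) and h = w / ratio."""
    anchors = []
    for area in areas:
        for r in ratios:
            w = math.sqrt(area * r)
            anchors.append((w, w / r))
    return anchors
```

Cluster-derived ratios would simply replace the `ratios` tuple with the cluster centroids of the labeled boxes' aspect ratios.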
The terminal inputs the image to be identified into the pre-trained human body detection model. Through the model it first obtains the predetermined candidate frames with different aspect ratios and areas, then extracts the image features corresponding to those candidate frames and calculates the target confidence of each frame from the extracted features, giving several candidate human body areas whose target confidence is non-zero. The target human body area is then obtained by non-maximum suppression: the terminal sorts all target confidences, selects the candidate frame with the highest confidence, computes the overlap (IoU) of every other candidate frame with the current one, and deletes those whose overlap exceeds a threshold, because several high-scoring candidate frames may cover the same person, and only one frame per person is needed.
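The suppression step just described — sort by confidence, keep the best frame, drop overlapping rivals, repeat — is a standard routine. A self-contained version, with an assumed 0.5 IoU threshold:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Non-maximum suppression: repeatedly keep the highest-confidence
    box and delete any remaining box overlapping it by more than
    `thresh`; returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Two frames on the same person overlap heavily and collapse to one, while a frame on a second person survives.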
In the above embodiment, the identification process of the target human body area is provided, and the process only needs to process the image to be identified with low resolution, so that the speed is high.
In one embodiment, the training manner of the human body detection model includes: acquiring a training set marked with real human body areas; downsampling each first training image in the training set to obtain a first feature map; generating a plurality of first training candidate frames with different aspect ratios and areas from the first feature map; calculating a first training error for the first training candidate frames; and adjusting model parameters according to the marked real human body areas in the training set and the first training error, and training to obtain the human body detection model.
Specifically, the terminal may first obtain a training set labeled with real human body regions, for example as shown in fig. 4, where human body regions in each first training image in which the head is not captured, or is only partially captured, are used as negative samples. Then, to improve processing efficiency, the terminal may preprocess each first training image, converting it to grayscale and scaling it to a uniform size. The processed first training images are then input into a preset detection network, whose structure can be configured as required, and the human body detection model is obtained after the detection network is trained.
The terminal downsamples the first training image to obtain a plurality of first feature maps, and divides each first feature map into n×n grid cells (that is, the feature map of a given layer is divided into cells of equal area). The terminal then generates, for each grid cell, a plurality of first training candidate frames with different aspect ratios and areas, performs position logistic regression on the first candidate frames, and calculates the first training error of each first training candidate frame, namely its x, y, w, h and confidence. The first training error covers two aspects: one is the likelihood that the first candidate frame contains a target, and the other is the accuracy of the first candidate frame. The former is denoted Pr(Object); when the first candidate frame is a negative sample such as background, i.e. contains no target, Pr(Object) = 0, and when the first candidate frame contains a target, Pr(Object) = 1. The accuracy of the first candidate frame may be characterized by its degree of overlap with the actual frame (ground truth), denoted IOU^truth_pred. The first training error may thus be defined as Confidence = Pr(Object) × IOU^truth_pred (the intersection-over-union of the frame of the labeled real human body region and the first training candidate frame); the first term Pr(Object) is taken as 1 if a manually labeled real human body region falls in the cell, and as 0 otherwise, while the second term IOU^truth_pred is the IoU value between the predicted first training candidate frame and the frame of the actually labeled real human body region.
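The definition Confidence = Pr(Object) × IOU^truth_pred can be illustrated by a small sketch. The function names and box format are hypothetical; only the formula itself comes from the text above.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def training_confidence(cell_has_object, pred_box, truth_box):
    """Confidence = Pr(Object) * IOU^truth_pred.

    Pr(Object) is 1 when a labeled real human body region falls in
    the cell, and 0 otherwise (e.g. a background negative sample).
    """
    pr_object = 1.0 if cell_has_object else 0.0
    return pr_object * iou(pred_box, truth_box)
```

A background cell therefore scores exactly 0 regardless of the predicted frame, while a cell containing a target is scored by how well the predicted frame overlaps the labeled one.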
Optionally, since the aspect ratio of the human body is relatively fixed, the terminal may use a plurality of first candidate frames of fixed scale as human body detection windows. In one embodiment, the aspect ratio of the first training candidate frames is preset, or is obtained by clustering the labeled real human body regions in the first training images. That is, the aspect ratio of the first candidate frames may be set empirically or obtained through k-means clustering. If empirical parameters are adopted, three candidate frames with aspect ratios of 1:4, 5:8 and 1:1 may be selected. If clustering is used, the terminal directly performs cluster analysis on the frames of the real human body regions labeled in the first training images, so as to select suitable candidate frame proportions.
In one embodiment, the area of the first training candidate frames may be predefined. For example, the area is determined from the area of the labeled real human body regions in each class after clustering the labeled human body regions in the first training images; or the areas are determined from the differences in labeled human body region area caused by differences in shooting distance, so that the user can set first training candidate frames of different areas accordingly. In this way, the sensitivity of the human body detection model to targets of different sizes can be further improved.
In practical application, candidate frames of different areas can be generated by clustering the sizes of the human body regions in the training set, or the areas of the candidate regions can be determined from the differences in the area occupied by the human body regions in the first training images caused by differences in shooting distance.
Here w is the width of the first training candidate frame, h is its height, s is its area, and ratio is its aspect ratio.
For example, assume the input size of the first training image is 416×416 and the feature layer obtained by downsampling is 13×13, where 32 is the downsampling (reduction) ratio and s generally ranges from 0.2 to 0.9. If a candidate frame's size, normalized to the 416×416 input, is 0.2:0.6 (or 0.2:0.2), then its size on the feature layer is:

w = 0.2 × 416 / 32 = 0.2 × 13 = 2.6, h = 0.6 × 416 / 32 = 0.6 × 13 = 7.8

w = 0.2 × 416 / 32 = 0.2 × 13 = 2.6, h = 0.2 × 416 / 32 = 0.2 × 13 = 2.6
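The arithmetic above amounts to scaling the normalized candidate-frame size by the feature-map grid size. A one-line sketch (the function name is hypothetical; 416 and 32 are the example's values):

```python
def anchor_on_feature_map(norm_w, norm_h, input_size=416, stride=32):
    """Map a candidate frame, normalised to the network input size,
    onto the downsampled feature layer (e.g. 416 / 32 = 13 cells)."""
    cells = input_size / stride
    return norm_w * cells, norm_h * cells

# e.g. a 0.2:0.6 candidate frame occupies about 2.6 x 7.8 feature cells
```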
In this way, the terminal may further downsample the first feature map to obtain a second feature map (16-times downsampled), a third feature map (32-times downsampled), and so on up to an nth feature map, with the downsampling factors set according to the actual situation; alternatively, upsampling may be performed starting from the nth feature map. If the nth feature map was obtained by 32-times downsampling, upsampling it by a factor of 2 yields a 16-times second feature map, and similarly upsampling that result by a further factor of 2 yields an 8-times feature map. The feature map obtained at each upsampling step can be fused with the same-sized feature map output by the corresponding downsampling step, and the candidate-frame logistic regression and first-training-error calculation are then performed on the basis of the upsampled feature maps.
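The upsample-then-fuse step can be illustrated on toy 2-D grids. This is a sketch under assumptions: nearest-neighbour upsampling and element-wise addition are only one common choice of upsampling and fusion, and the text does not specify which the patent uses.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def fuse(a, b):
    """Element-wise sum of two same-sized feature maps (one simple fusion)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Upsampling a 32-times map by 2 gives a map the same size as the 16-times map, which can then be fused with it element by element.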
The terminal iterates the above operations until the human body detection model converges. Optionally, the terminal may further obtain a verification set to verify whether the human body detection model meets the requirements, and continue training if it does not.
In the above embodiment, the aspect ratios and area value ranges of the candidate frames are fixed, which improves the training speed and the adaptability of the human body detection model.
In one embodiment, performing face recognition on the images to be processed to obtain the face region to be processed includes: inputting the images to be processed into a face detection model; extracting at least one preset position area of each image to be processed through the face detection model; generating a plurality of face candidate frames with different proportions and sizes according to the preset position area; calculating the confidence corresponding to each face candidate frame region; and obtaining the face region through a non-maximum suppression operation according to the confidence of each face candidate frame.
Since the face region is generally located in the upper part of the human body region, the terminal can extract the preset position areas of the images to be processed through the face detection model. For example, the terminal may downsample an image to be processed into, typically, 1×8 grid cells, and then extract the first 4 grid cells as the preset position area in which to identify the face region.
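The "first 4 of 1×8 cells" idea can be sketched as keeping the top half of the cropped human body image. This is an illustrative sketch; the function name and the row-list image representation are assumptions.

```python
def preset_position_cells(image_rows, n_cells=8, keep=4):
    """Divide an image (a list of pixel rows) into n_cells horizontal bands
    and keep only the top `keep` bands, where the face is expected to lie."""
    band = len(image_rows) // n_cells
    return image_rows[: keep * band]
```

On a 16-row body crop this keeps rows 0-7, i.e. the upper half, which is then searched for face candidate frames.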
The face candidate frames are preset, and may be preset empirically; alternatively, cluster analysis may be performed directly on the labeled frames in the dataset used to train the face detection model, so that suitable candidate frame proportions are selected, which improves the adaptability of the model.
The terminal inputs the preset position area into a pre-trained face detection model. The model first obtains a plurality of predetermined candidate frames with different aspect ratios and areas, then extracts the image features corresponding to each of these candidate frames, and calculates a confidence for each face candidate frame from the extracted features. The terminal thereby obtains a plurality of face regions whose confidence is not 0, and then obtains the final face region through non-maximum suppression. For example, the terminal sorts all the confidences, selects the face candidate frame with the highest confidence, calculates the degree of overlap (IoU) between each remaining face candidate frame and the current one, and deletes any face candidate frame whose overlap exceeds a certain threshold. This is because the same face may produce several high-scoring candidate frames, but only one frame per face is needed.
The above embodiment describes the face region recognition process. Because this process only needs to handle a low-resolution image, it is fast.
In one embodiment, the training manner of the face detection model includes: extracting real human body regions from a training set labeled with real human body regions, wherein real face regions are labeled within the real human body regions; performing scale transformation on the real human body regions to obtain a plurality of second training images of standard size; generating preset position areas for the plurality of second training images; generating a plurality of second training candidate frames with different proportions and sizes according to the preset position areas; calculating a second training error corresponding to the second training candidate frame regions; and training according to the labeled real face regions and the confidence to obtain the face detection model.
Specifically, referring to fig. 5 and 6, fig. 5 is a schematic diagram of a target human body region in one embodiment, and fig. 6 is a schematic diagram of a second training image obtained by performing scale transformation on the target human body region. Because people appear at different distances and positions in an image, the obtained image regions vary in size and completeness. To facilitate processing by the subsequent face position detection model, the terminal crops the target human body region from the initial image and scales and pads it to a common resolution (i.e. the standard size), as shown in fig. 6. The scaled and padded region is input to the face detection model to obtain the face region position, which is then mapped back into the initial (high-resolution) image; the corresponding face image is cropped out and input into the face recognition network for face feature extraction and face recognition.
During training, the terminal extracts real human body regions from the training set labeled with real human body regions, in which the real face regions are labeled; the terminal can then perform scale transformation on the real human body regions to obtain a plurality of second training images of standard size.
The terminal then downsamples the second training image to obtain a second feature map and divides it into n×m cells, preferably 1×8 cells. The terminal may then take a preset position area, for example the first 4 cells, and use it to generate a plurality of second training candidate frames with different proportions and sizes. The terminal then calculates the second training error corresponding to the second training candidate frame regions, for example by performing position logistic regression on the second candidate frame regions, for use in face detection; the remaining cells are discarded.
Finally, the terminal trains according to the labeled real face regions and the confidence until the model converges, obtaining the face detection model.
In one embodiment, performing scale transformation on the obtained target human body region to obtain a plurality of images to be processed with standard sizes, including: mapping the target human body area to the initial image to obtain the position of the target human body area; extracting a human body image corresponding to the target human body area from the initial image according to the position; and performing scale transformation on the human body image to obtain a plurality of images to be processed with standard sizes.
Specifically, because the target human body region is identified from the low-resolution image to be identified, which reduces the model's parameter and computation load by a large factor, the terminal can derive the position of the target human body region in the initial image from its position in the image to be identified. The terminal then extracts the human body image corresponding to the target human body region from the initial image according to the determined position, and performs scale transformation, padding and the like on the extracted human body image to obtain a plurality of images to be processed of standard size. (Scale transformation is needed because the face recognition network constrains the size of its input image; simple, rough scaling alone would deform the human body region, so some images must be padded so that the aspect ratio of the content is preserved.) In this way, the images to be processed used by the terminal for subsequent face detection all have the same size, which is convenient for processing. Through this hierarchical arrangement, the terminal detects in a coarse-grained low-resolution image (e.g. a binarized image whose data contain only 0s and 1s, so the overall data volume is small) to obtain the target human body region and the face position information, and then maps the face position information into the high-resolution initial image so as to crop the high-resolution face image.
It should be noted that extracting the human body image corresponding to the target human body region from the initial image here means cropping the initial image to obtain the target human body region and then expanding the edges, zero-padding, and scaling it to a uniform standard size to obtain the image to be processed. In the above embodiment, the human body image is extracted from the initial image according to the target human body region, so that its resolution meets the requirement, and the human body image is then scale-transformed into a plurality of images to be processed of standard size, laying the foundation for subsequent face detection.
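The map-back-then-crop-and-pad procedure can be sketched as follows. This is a simplified sketch on list-of-rows images: the uniform scale factor, function names, and zero-padding into the top-left corner are assumptions (the patent pads to standard size but does not fix these details).

```python
def map_box_to_initial(box, scale):
    """Map a box found in the downscaled image back to initial-image
    coordinates, assuming a single uniform downscaling factor."""
    return tuple(int(round(v * scale)) for v in box)

def crop_and_pad(image, box, out_size):
    """Crop the target region from the initial image, then zero-pad it to
    out_size x out_size so the content's aspect ratio is not distorted."""
    x1, y1, x2, y2 = box
    crop = [row[x1:x2] for row in image[y1:y2]]
    padded = [[0] * out_size for _ in range(out_size)]
    for r, row in enumerate(crop[:out_size]):
        for c, v in enumerate(row[:out_size]):
            padded[r][c] = v
    return padded
```

A box detected at (1, 2, 3, 4) in an image downscaled 4x corresponds to (4, 8, 12, 16) in the initial image; the crop is then zero-padded up to the standard input size rather than stretched.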
In one embodiment, referring to fig. 7, which is a schematic diagram of a face recognition method in another embodiment, the terminal acquires an initial image and preprocesses it, including graying and scaling normalization, to obtain the image to be identified, thereby reducing the data volume. The image to be identified is input into the trained human body detection model, which generates a plurality of candidate frames with different proportions and areas, extracts the image features corresponding to the candidate frames with different aspect ratios and areas, and calculates the target confidence of each candidate frame from the extracted features; the target human body region is then obtained through non-maximum suppression according to the target confidence of each candidate frame. The terminal maps the target human body region into the initial image, crops the initial image according to the mapped coordinates of the target human body region to obtain a human body image, and pads and scales it to obtain a plurality of images to be processed of standard size. Finally, the terminal inputs the images to be processed into the face detection model to obtain the face region: the face detection model first extracts the preset position area in the image to be processed, generates a plurality of face candidate frames with different proportions and sizes according to the preset position area, calculates the confidence corresponding to each face candidate frame region, and obtains the face region through non-maximum suppression according to the confidence of each face candidate frame. The face region is then mapped into the original image to obtain the target face region.
In the above embodiment, the terminal first detects the human body region and then detects the face region within it, which improves detection accuracy; moreover, because only a grayscale image is used for human body region detection, speed is also improved.
It should be understood that, although the steps in the flowcharts of fig. 1, 2, and 7 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least a portion of the steps of fig. 1, 2, and 7 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided a face recognition apparatus including: a face region detection module 100 and a face recognition module 200, wherein
The face region detection module 100 is configured to obtain an initial image, and perform hierarchical face detection on the initial image to obtain a target face region, where the hierarchical face detection includes performing human body region recognition on the initial image and performing face recognition on the recognized human body region;
the face recognition module 200 is configured to cut the target face area from the initial image, and input the cut target face area to a face recognition model for face feature extraction, so as to perform face recognition.
In one embodiment, the face region detection module 100 may include:
the human body region identification unit is used for acquiring an initial image and detecting the initial image to obtain a target human body region;
the size conversion unit is used for preprocessing the obtained target human body area to obtain a plurality of images to be processed with standard sizes;
the face region identification unit is used for carrying out face detection on the image to be processed to obtain a face region to be processed;
and the mapping unit is used for mapping the face area to be processed into the initial image to obtain a target face area.
In one embodiment, the human body region identifying unit may include:
the preprocessing unit is used for performing scaling processing and graying processing on the initial image to obtain an image to be identified;
and the human body identification unit is used for detecting the image to be identified to obtain a target human body area.
In one embodiment, the human body region identifying unit may include:
the first input unit is used for inputting the image to be identified into the human body detection model which is trained in advance;
a candidate frame acquisition unit for acquiring a plurality of candidate frames of different aspect ratio and area predetermined in the human body detection model;
the target confidence coefficient calculation unit is used for extracting image features corresponding to a plurality of candidate frames with different length-width ratios and areas through the human body detection model, and calculating the target confidence coefficient of each candidate frame according to the extracted image features;
and the human body area determining unit is used for obtaining the target human body area through non-maximum value inhibition operation and calculation according to the target confidence coefficient of each candidate frame.
The face recognition device may further include:
the first training set acquisition module is used for acquiring a training set marked with a real human body area;
The sampling module is used for downsampling each first training image in the training set to obtain a first feature map;
the first training candidate frame generation module is used for generating a plurality of first training candidate frames with different length-width ratios and areas according to the first feature map;
the first training error calculation module is used for calculating a first training error of the first training candidate frame;
the first training module is used for adjusting model parameters according to the real human body region marked in the training set and the first training error, and training to obtain a human body detection model.
In one embodiment, the aspect ratio of the first training candidate frame is preset; or alternatively
The aspect ratio of the first training candidate frame is obtained by clustering the marked real human body areas in the first training image.
In one embodiment, the area of the first training candidate frame is determined according to the area of the marked real human body area in each class after the marked human body areas in the first training image are clustered; or alternatively
The area of the first training candidate frame is determined by the difference of areas of the marked human body areas caused by the difference of the distance.
In one embodiment, the face region identification unit may include:
The second input unit is used for respectively inputting the images to be processed into the face detection model;
the position extraction unit is used for respectively extracting preset position areas of at least one image to be processed through the face detection model;
the face candidate frame acquisition unit is used for generating face candidate frames with different proportions and sizes according to the preset position area;
the confidence coefficient calculating unit is used for calculating the confidence coefficient corresponding to the face candidate frame area;
the face region determining unit is used for obtaining the face region through non-maximum value suppression operation and calculation according to the confidence coefficient of each face candidate frame.
The face recognition device may further include:
the second training set acquisition module is used for extracting a real human body area from the training set marked with the real human body area, wherein the real human face area is marked in the real human body area;
the second training image acquisition module is used for performing scale transformation on the real human body area to obtain a plurality of second training images with standard sizes;
the preset position area extraction module is used for generating preset position areas of a plurality of second training images;
the second training candidate frame generation module is used for generating a plurality of second training candidate frames with different proportions and sizes according to the preset position area;
The second training error calculation module is used for calculating a second training error corresponding to the second training candidate frame area;
and the second training module is used for training according to the labeled real face region and the confidence level to obtain a face detection model.
In one embodiment, the size conversion unit includes:
the first mapping unit is used for mapping the target human body area to the initial image to obtain the position of the target human body area;
the human body image extraction unit is used for extracting a human body image corresponding to the target human body area from the initial image according to the position;
the scale transformation unit is used for performing scale transformation on the human body image to obtain a plurality of images to be processed with standard sizes.
For specific limitations of the face recognition apparatus, reference may be made to the above limitations of the face recognition method, which are not repeated here. The respective modules in the above face recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware in, or be independent of, a processor in the computer device, or may be stored in software in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal or a server, and when it is a terminal, its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a face recognition method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of: acquiring an initial image, and performing hierarchical face detection on the initial image to obtain a target face region, wherein the hierarchical face detection comprises performing human body region recognition on the initial image and performing face recognition on the recognized human body region; and cropping the target face region from the initial image, and inputting the cropped target face region into a face recognition model for face feature extraction, for use in face recognition.
In one embodiment, the step of performing hierarchical face detection on the initial image implemented when the processor executes the computer program to obtain a target face area includes: detecting the initial image to obtain a target human body area; preprocessing the obtained target human body area to obtain at least one image to be processed with standard size; performing face detection on the image to be processed to obtain a face area to be processed; and mapping the face area to be processed to the initial image to obtain a target face area.
In one embodiment, the detecting the initial image to obtain the target human body region implemented by the processor when executing the computer program includes: scaling and graying the initial image to obtain an image to be identified; and detecting the image to be identified to obtain a target human body area.
In one embodiment, the detecting the image to be identified, which is implemented when the processor executes the computer program, obtains a target human body area, including: inputting the image to be identified into a human body detection model which is trained in advance; obtaining a plurality of candidate frames with different length-width ratios and areas, which are predetermined in the human body detection model; extracting image features corresponding to a plurality of candidate frames with different length-width ratios and areas through the human body detection model, and calculating to obtain target confidence coefficient of each candidate frame according to the extracted image features; and obtaining the target human body region through non-maximum value inhibition operation and calculation according to the target confidence coefficient of each candidate frame.
Wherein the training mode of the human body detection model involved in the execution of the computer program by the processor comprises the following steps: acquiring a training set marked with a real human body area; downsampling each first training image in a training set to obtain a first feature map; generating a plurality of first training candidate frames with different length-width ratios and areas according to the first feature map; calculating a first training error of the first training candidate frame; and adjusting model parameters according to the marked real human body region in the training set and the first training error, and training to obtain a human body detection model.
In one embodiment, the aspect ratio of the first training candidate frame involved in the execution of the computer program by the processor is preset; or the aspect ratio of the first training candidate frame is obtained by clustering the marked real human body areas in the first training image; the area of the first training candidate frame involved in the execution of the computer program by the processor is determined according to the area of the marked real human body area in each class after the marked human body areas in the first training image are clustered; or the area of the first training candidate frame is determined by the difference of areas of the marked human body areas caused by the difference of the distance.
In one embodiment, the performing face detection on the image to be processed, which is implemented when the processor executes the computer program, obtains a face area to be processed, and the method includes: respectively inputting the images to be processed into a face detection model; respectively extracting at least one preset position area of the image to be processed through the face detection model; generating face candidate frames with different proportions and sizes according to the preset position area; calculating the confidence coefficient corresponding to the face candidate frame region; and obtaining the face area through non-maximum value inhibition operation and calculation according to the confidence coefficient of each face candidate frame.
The training of the face detection model involved when the processor executes the computer program comprises the following steps: extracting the real human body regions from a training set annotated with real human body regions, wherein each real human body region is annotated with a real face region; scale-transforming the real human body regions to obtain a plurality of second training images of standard size; generating a plurality of preset position regions for the second training images; generating a plurality of second training candidate boxes of different proportions and sizes from the preset position regions; calculating a second training error corresponding to the second training candidate box regions; and training the face detection model according to the annotated real face regions and the second training error.
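The exact form of the "training error" of a candidate box is not spelled out above. A common formulation, used here purely as an illustrative assumption, combines a classification term on the predicted confidence with a localization term on the box coordinates for positive candidates:

```python
import math

def box_training_error(pred_conf, pred_box, is_face, true_box=None, lam=1.0):
    """Per-candidate training error: binary cross-entropy on the confidence,
    plus (for positive candidates) a mean squared error on box coordinates.
    'lam' weights the localization term against the classification term."""
    eps = 1e-12  # avoid log(0)
    target = 1.0 if is_face else 0.0
    cls_err = -(target * math.log(pred_conf + eps)
                + (1 - target) * math.log(1 - pred_conf + eps))
    loc_err = 0.0
    if is_face and true_box is not None:
        loc_err = sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / 4
    return cls_err + lam * loc_err

# a positive candidate whose box matches its annotation exactly:
# only the classification term remains, -ln(0.9) ~= 0.1054
err = box_training_error(0.9, (10, 10, 50, 50), True, (10, 10, 50, 50))
print(round(err, 4))
```

Summing this error over all candidate boxes gives a scalar loss against which the model parameters can be adjusted.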
In one embodiment, scale-transforming the obtained target human body region to obtain a plurality of images to be processed of standard size, as implemented when the processor executes the computer program, comprises: mapping the target human body region onto the initial image to obtain the position of the target human body region; extracting the human body image corresponding to the target human body region from the initial image according to that position; and scale-transforming the human body image to obtain a plurality of images to be processed of standard size.
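Mapping a detected region back onto the initial image and rescaling the crop to a standard size can be sketched as follows. Pure-Python nearest-neighbour resampling is used only for illustration; a real system would use an image-processing library:

```python
def map_region_to_initial(region, scale_x, scale_y):
    """Map a box detected on a scaled-down image back to initial-image
    coordinates by dividing out the scale factors."""
    x1, y1, x2, y2 = region
    return (int(x1 / scale_x), int(y1 / scale_y),
            int(x2 / scale_x), int(y2 / scale_y))

def crop_and_resize(image, box, out_h, out_w):
    """Crop box (x1, y1, x2, y2) from a 2-D list 'image' and resize the
    crop to (out_h, out_w) with nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    crop = [row[x1:x2] for row in image[y1:y2]]
    h, w = len(crop), len(crop[0])
    return [[crop[min(h - 1, i * h // out_h)][min(w - 1, j * w // out_w)]
             for j in range(out_w)] for i in range(out_h)]

# detection was run on a half-scale image; map the box back, then crop
image = [[y * 10 + x for x in range(10)] for y in range(10)]  # toy 10x10 "image"
box = map_region_to_initial((1, 1, 4, 4), 0.5, 0.5)           # -> (2, 2, 8, 8)
patch = crop_and_resize(image, box, 3, 3)
print(box, len(patch), len(patch[0]))
```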
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored, which, when executed by a processor, performs the steps of: acquiring an initial image and performing hierarchical face detection on the initial image to obtain a target face region, wherein the hierarchical face detection comprises performing human body region detection on the initial image and then performing face detection within the detected human body regions; and cropping the target face region from the initial image and inputting it into a face recognition model for face feature extraction and face recognition.
In one embodiment, performing hierarchical face detection on the initial image to obtain a target face region, as implemented when the computer program is executed by the processor, comprises: detecting the initial image to obtain a target human body region; preprocessing the obtained target human body region to obtain at least one image to be processed of standard size; performing face detection on the image to be processed to obtain a face region to be processed; and mapping the face region to be processed onto the initial image to obtain the target face region.
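The four hierarchical steps above can be tied together in a sketch like the following, where `detect_bodies` and `detect_faces` are hypothetical stand-ins for the trained body and face detection models, and the coordinate mapping assumes each body crop is normalised to a fixed standard size:

```python
def hierarchical_face_detection(initial_image, detect_bodies, detect_faces,
                                standard_size=(224, 224)):
    """Two-stage detection: find body regions first, run face detection only
    inside each standardised body crop, then map each face box back to
    initial-image coordinates."""
    target_faces = []
    for (bx1, by1, bx2, by2) in detect_bodies(initial_image):
        sx = standard_size[0] / (bx2 - bx1)  # scale applied to the crop width
        sy = standard_size[1] / (by2 - by1)  # scale applied to the crop height
        for (fx1, fy1, fx2, fy2) in detect_faces((bx1, by1, bx2, by2)):
            # undo the crop scaling, then offset by the body box origin
            target_faces.append((bx1 + fx1 / sx, by1 + fy1 / sy,
                                 bx1 + fx2 / sx, by1 + fy2 / sy))
    return target_faces

# stubs standing in for the two models (in practice detect_faces would
# receive the standardised crop pixels, not the body box)
bodies = lambda img: [(100, 50, 200, 350)]        # one 100x300 body box
faces = lambda body: [(56, 0, 168, 74.6666667)]   # face box in the 224x224 crop
print(hierarchical_face_detection(None, bodies, faces))
```

Searching for faces only inside body regions is what makes the detection "hierarchical": the face detector never scans the full image.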
In one embodiment, detecting the initial image to obtain a target human body region, as implemented when the computer program is executed by the processor, comprises: scaling and graying the initial image to obtain an image to be identified; and detecting the image to be identified to obtain the target human body region.
In one embodiment, detecting the image to be identified to obtain a target human body region, as implemented when the computer program is executed by the processor, comprises: inputting the image to be identified into a pre-trained human body detection model; obtaining a plurality of candidate boxes of different aspect ratios and areas predetermined in the human body detection model; extracting, through the human body detection model, the image features corresponding to the candidate boxes and calculating a target confidence for each candidate box from the extracted features; and obtaining the target human body region from the target confidence of each candidate box through non-maximum suppression.
The training of the human body detection model when the computer program is executed by the processor comprises the following steps: acquiring a training set annotated with real human body regions; downsampling each first training image in the training set to obtain a first feature map; generating a plurality of first training candidate boxes of different aspect ratios and areas from the first feature map; calculating a first training error for the first training candidate boxes; and adjusting the model parameters according to the annotated real human body regions in the training set and the first training error to obtain the trained human body detection model.
In one embodiment, the aspect ratios of the first training candidate boxes referred to when the computer program is executed by the processor are preset, or are obtained by clustering the annotated real human body regions in the first training images. The areas of the first training candidate boxes are determined from the area of the annotated real human body regions in each cluster after the annotated regions in the first training images are clustered, or from the variation in annotated region area caused by differences in subject distance.
In one embodiment, performing face detection on the images to be processed to obtain a face region to be processed, as implemented when the computer program is executed by the processor, comprises: inputting each image to be processed into a face detection model; extracting at least one preset position region from each image to be processed through the face detection model; generating face candidate boxes of different proportions and sizes from the preset position regions; calculating the confidence corresponding to each face candidate box region; and obtaining the face region from the confidence of each face candidate box through non-maximum suppression.
The training of the face detection model when the computer program is executed by the processor comprises the following steps: extracting the real human body regions from a training set annotated with real human body regions, wherein each real human body region is annotated with a real face region; scale-transforming the real human body regions to obtain a plurality of second training images of standard size; generating a plurality of preset position regions for the second training images; generating a plurality of second training candidate boxes of different proportions and sizes from the preset position regions; calculating a second training error corresponding to the second training candidate box regions; and training the face detection model according to the annotated real face regions and the second training error.
In one embodiment, scale-transforming the obtained target human body region to obtain a plurality of images to be processed of standard size, as implemented when the computer program is executed by the processor, comprises: mapping the target human body region onto the initial image to obtain the position of the target human body region; extracting the human body image corresponding to the target human body region from the initial image according to that position; and scale-transforming the human body image to obtain a plurality of images to be processed of standard size.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that contains no contradiction should be considered within the scope of this specification.
The above examples illustrate only a few embodiments of the application in detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is defined by the appended claims.

Claims (10)

1. A method of face recognition, the method comprising:
acquiring an initial image, and detecting the initial image to obtain a target human body region;
preprocessing the obtained target human body region to obtain at least one image to be processed of standard size;
inputting each image to be processed into a face detection model;
extracting at least one preset position region from each image to be processed through the face detection model;
generating face candidate boxes of different proportions and sizes from the preset position regions;
calculating the confidence corresponding to each face candidate box region;
obtaining a face region to be processed through non-maximum suppression according to the confidence of each face candidate box;
mapping the face region to be processed onto the initial image to obtain a target face region; and
cropping the target face region from the initial image and inputting it into a face recognition model for face feature extraction and face recognition;
wherein the training of the face detection model comprises:
extracting real human body regions from a training set annotated with real human body regions, wherein each real human body region is annotated with a real face region;
scale-transforming the real human body regions to obtain a plurality of second training images of standard size;
generating a plurality of preset position regions for the second training images;
generating a plurality of second training candidate boxes of different proportions and sizes from the preset position regions;
calculating a second training error corresponding to the second training candidate box regions; and
training the face detection model according to the annotated real face regions and the second training error.
2. The method of claim 1, wherein detecting the initial image to obtain a target human body region comprises:
scaling and graying the initial image to obtain an image to be identified; and
detecting the image to be identified to obtain the target human body region.
3. The method of claim 2, wherein detecting the image to be identified to obtain the target human body region comprises:
inputting the image to be identified into a pre-trained human body detection model;
obtaining a plurality of candidate boxes of different aspect ratios and areas predetermined in the human body detection model;
extracting, through the human body detection model, the image features corresponding to the candidate boxes, and calculating a target confidence for each candidate box from the extracted features; and
obtaining the target human body region through non-maximum suppression according to the target confidence of each candidate box;
wherein the training of the human body detection model comprises:
acquiring a training set annotated with real human body regions;
downsampling each first training image in the training set to obtain a first feature map;
generating a plurality of first training candidate boxes of different aspect ratios and areas from the first feature map;
calculating a first training error for the first training candidate boxes; and
adjusting the model parameters according to the annotated real human body regions in the training set and the first training error to obtain the trained human body detection model.
4. The method of claim 3, wherein the aspect ratios of the first training candidate boxes are preset; or
the aspect ratios of the first training candidate boxes are obtained by clustering the annotated real human body regions in the first training images, and the areas of the first training candidate boxes are determined from the area of the annotated real human body regions in each cluster; or
the areas of the first training candidate boxes are determined from the variation in annotated region area caused by differences in subject distance.
5. The method of any one of claims 1-4, wherein scale-transforming the obtained target human body region to obtain a plurality of images to be processed of standard size comprises:
mapping the target human body region onto the initial image to obtain the position of the target human body region;
extracting the human body image corresponding to the target human body region from the initial image according to that position; and
scale-transforming the human body image to obtain a plurality of images to be processed of standard size.
6. A face recognition device, the device comprising:
a face region detection module, configured to acquire an initial image and detect the initial image to obtain a target human body region; preprocess the obtained target human body region to obtain at least one image to be processed of standard size; input each image to be processed into a face detection model; extract at least one preset position region from each image to be processed through the face detection model; generate face candidate boxes of different proportions and sizes from the preset position regions; calculate the confidence corresponding to each face candidate box region; obtain a face region to be processed through non-maximum suppression according to the confidence of each face candidate box; and map the face region to be processed onto the initial image to obtain a target face region; and
a face recognition module, configured to crop the target face region from the initial image and input it into a face recognition model for face feature extraction and face recognition;
the device further comprising:
a second training set acquisition module, configured to extract real human body regions from a training set annotated with real human body regions, wherein each real human body region is annotated with a real face region;
a second training image acquisition module, configured to scale-transform the real human body regions to obtain a plurality of second training images of standard size;
a preset position region extraction module, configured to generate a plurality of preset position regions for the second training images;
a second training candidate box generation module, configured to generate a plurality of second training candidate boxes of different proportions and sizes from the preset position regions;
a second training error calculation module, configured to calculate a second training error corresponding to the second training candidate box regions; and
a second training module, configured to train the face detection model according to the annotated real face regions and the second training error.
7. The device of claim 6, wherein the face region detection module comprises:
a preprocessing unit, configured to scale and gray the initial image to obtain an image to be identified; and
a human body identification unit, configured to detect the image to be identified to obtain the target human body region.
8. The device of claim 7, wherein the human body identification unit comprises:
a first input unit, configured to input the image to be identified into a pre-trained human body detection model;
a candidate box acquisition unit, configured to obtain a plurality of candidate boxes of different aspect ratios and areas predetermined in the human body detection model;
a target confidence calculation unit, configured to extract, through the human body detection model, the image features corresponding to the candidate boxes, and to calculate a target confidence for each candidate box from the extracted features; and
a human body region determination unit, configured to obtain the target human body region through non-maximum suppression according to the target confidence of each candidate box;
the device further comprising:
a first training set acquisition module, configured to acquire a training set annotated with real human body regions;
a sampling module, configured to downsample each first training image in the training set to obtain a first feature map;
a first training candidate box generation module, configured to generate a plurality of first training candidate boxes of different aspect ratios and areas from the first feature map;
a first training error calculation module, configured to calculate a first training error for the first training candidate boxes; and
a first training module, configured to adjust the model parameters according to the annotated real human body regions in the training set and the first training error to obtain the trained human body detection model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN202010285535.3A 2020-04-13 2020-04-13 Face recognition method, device, computer equipment and storage medium Active CN111523414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010285535.3A CN111523414B (en) 2020-04-13 2020-04-13 Face recognition method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010285535.3A CN111523414B (en) 2020-04-13 2020-04-13 Face recognition method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111523414A CN111523414A (en) 2020-08-11
CN111523414B true CN111523414B (en) 2023-10-24

Family

ID=71902924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010285535.3A Active CN111523414B (en) 2020-04-13 2020-04-13 Face recognition method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111523414B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257491B (en) * 2020-08-20 2021-12-24 江苏正赫通信息科技有限公司 Adaptive scheduling face recognition and attribute analysis method and device
CN112069959A (en) * 2020-08-27 2020-12-11 北京锐安科技有限公司 Human body detection method, human body detection device, electronic equipment and storage medium
CN111968102B (en) * 2020-08-27 2023-04-07 中冶赛迪信息技术(重庆)有限公司 Target equipment detection method, system, medium and electronic terminal
CN111985439A (en) * 2020-08-31 2020-11-24 中移(杭州)信息技术有限公司 Face detection method, device, equipment and storage medium
CN112132136A (en) * 2020-09-11 2020-12-25 华为技术有限公司 Target tracking method and device
CN112132074A (en) * 2020-09-28 2020-12-25 平安养老保险股份有限公司 Face image verification method and device, computer equipment and storage medium
CN112766235B (en) * 2021-02-24 2024-07-05 北京嘀嘀无限科技发展有限公司 Face recognition method, apparatus, device, storage medium and computer program product
CN113011279A (en) * 2021-02-26 2021-06-22 清华大学 Method and device for recognizing mucosa contact action, computer equipment and storage medium
CN112949526B (en) * 2021-03-12 2024-03-29 深圳海翼智新科技有限公司 Face detection method and device
CN113361413B (en) * 2021-06-08 2024-06-18 南京三百云信息科技有限公司 Mileage display area detection method, device, equipment and storage medium
CN113378696A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN113887330A (en) * 2021-09-10 2022-01-04 国网吉林省电力有限公司 Target detection system based on remote sensing image
CN114067386A (en) * 2021-10-14 2022-02-18 北京地平线机器人技术研发有限公司 Face detection method and device, electronic equipment and computer readable storage medium
CN115214430B (en) * 2022-03-23 2023-11-17 广州汽车集团股份有限公司 Vehicle seat adjusting method and vehicle
CN114842527A (en) * 2022-04-01 2022-08-02 武汉虹信技术服务有限责任公司 Smart city-based hierarchical face recognition system and method
CN115311514A (en) * 2022-07-25 2022-11-08 阿波罗智能技术(北京)有限公司 Sample updating method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650699A (en) * 2016-12-30 2017-05-10 中国科学院深圳先进技术研究院 CNN-based face detection method and device
CN107153806A (en) * 2016-03-03 2017-09-12 炬芯(珠海)科技有限公司 A kind of method for detecting human face and device
CN108875833A (en) * 2018-06-22 2018-11-23 北京智能管家科技有限公司 Training method, face identification method and the device of neural network
CN109145855A (en) * 2018-08-31 2019-01-04 北京诚志重科海图科技有限公司 A kind of method for detecting human face and device
CN110956079A (en) * 2019-10-12 2020-04-03 深圳壹账通智能科技有限公司 Face recognition model construction method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130108119A1 (en) * 2011-10-28 2013-05-02 Raymond William Ptucha Image Recomposition From Face Detection And Facial Features
JP7023613B2 (en) * 2017-05-11 2022-02-22 キヤノン株式会社 Image recognition device and learning device
US10867161B2 (en) * 2017-09-06 2020-12-15 Pixart Imaging Inc. Auxiliary filtering device for face recognition and starting method for electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network; Lingxue Song et al.; 2019 IEEE/CVF International Conference on Computer Vision (ICCV); pp. 773-782 *
Face Detection Based on Cascaded Neural Networks; Li Shuaijie, Chen Hu, Lan Shiyong; Modern Computer (Professional Edition), No. 29, pp. 23-28 *

Also Published As

Publication number Publication date
CN111523414A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111523414B (en) Face recognition method, device, computer equipment and storage medium
CN111950329B (en) Target detection and model training method, device, computer equipment and storage medium
CN108304820B (en) Face detection method and device and terminal equipment
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
US20180075290A1 (en) Object detection based on joint feature extraction
CN111199206A (en) Three-dimensional target detection method and device, computer equipment and storage medium
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN111652217A (en) Text detection method and device, electronic equipment and computer storage medium
WO2013065220A1 (en) Image recognition device, image recognition method, and integrated circuit
CN111814794A (en) Text detection method and device, electronic equipment and storage medium
CN107610177B (en) The method and apparatus of characteristic point is determined in a kind of synchronous superposition
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN112241952B (en) Brain midline identification method, device, computer equipment and storage medium
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN110544298A (en) transformer substation modeling method and device, computer equipment and storage medium
CN111738319B (en) Clustering result evaluation method and device based on large-scale samples
CN113793370A (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN111325184A (en) Intelligent interpretation and change information detection method for remote sensing image
CN116152334A (en) Image processing method and related equipment
CN111552751B (en) Three-dimensional landmark control point generation and application method, generation and application device
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
CN113610971B (en) Fine-grained three-dimensional model construction method and device and electronic equipment
CN116091998A (en) Image processing method, device, computer equipment and storage medium
CN113822871A (en) Target detection method and device based on dynamic detection head, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant