CN111914601A - Efficient batch face recognition and matting system based on deep learning - Google Patents

Efficient batch face recognition and matting system based on deep learning

Info

Publication number
CN111914601A
Authority
CN
China
Prior art keywords
face
batch
module
matting
image
Prior art date
Legal status
Pending
Application number
CN201910387472.XA
Other languages
Chinese (zh)
Inventor
陈支泽
朱振宇
Current Assignee
Nanjing Shineng Intelligent Technology Co ltd
Original Assignee
Nanjing Shineng Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Shineng Intelligent Technology Co ltd filed Critical Nanjing Shineng Intelligent Technology Co ltd
Priority to CN201910387472.XA
Publication of CN111914601A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field, by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering

Abstract

The invention discloses an efficient batch face recognition and matting system based on deep learning, comprising: a multi-thread video decoding module that acquires pedestrian video streams from multiple cameras and decodes each stream in its own thread to obtain a digital representation of each image; a face detection module that maintains a queue for receiving the decoded images, splices multiple images into a batch, and applies an improved cascade face detection network to obtain face regions and face key point coordinates; a face recognition module that extracts face features with an improved lightweight neural network and performs comparison and recognition; and a face matting module that crops, for each recognized face, a region centered on the face with user-specified width and height, blurring any other faces contained in the region to protect their portrait rights and privacy.

Description

Efficient batch face recognition and matting system based on deep learning
Technical Field
The invention relates to the field of face detection and recognition, in particular to an efficient batch face recognition and matting system based on deep learning.
Background
Face detection and recognition in real scenes has long been a hot topic in computer vision. The difficulty lies in the fact that faces in real scenes are accompanied by interference from complex poses, lighting, expressions, occlusion and other factors, so designing a robust and fast algorithm to detect and recognize faces is a key problem that urgently needs to be solved. Traditional face detection and recognition methods extract regions of interest using hand-crafted local descriptors such as LBP, HOG and Gabor features, classify them with an ensemble classifier, and judge the detection and recognition results. However, hand-crafted feature extraction is not robust enough, struggles to cope with noise variation in real scenes, and is not efficient enough to meet real-time requirements. In recent years, face detection and recognition techniques based on deep learning have achieved remarkable results. Compared with traditional methods, deep learning approaches require no manual feature design: they extract features through multiple levels of neurons and nonlinear activation units, and learn from large numbers of samples in the training stage, greatly improving detection and recognition accuracy. On the other hand, owing to the development of modern parallel computing units and parallel techniques, their real-time performance can meet scene requirements, making deep learning an important means of bringing face detection and recognition into practical deployment.
MTCNN is an important deep learning-based face detection method that finds face regions and face key points in an image through multiple stages of candidate boxes. The original MTCNN network structure is relatively inefficient: when image resolutions in real scenes are large, inference time struggles to meet real-time requirements. In particular, when a single server accesses multiple cameras for face detection, sequentially detecting the images from each camera is very slow; if the detected faces are then recognized sequentially with an ordinary convolutional neural network, the running speed drops further. In some scenes it is also difficult to efficiently crop a background region containing a face and push it to the user while protecting the portrait rights and privacy of other people in the background.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an efficient batch face recognition and matting system based on deep learning, which can rapidly process face images collected by multiple cameras in batches, complete detection and recognition tasks, and crop the recognized face within a region containing some background; if the region contains other faces, they can be mosaicked to protect portrait rights and privacy. The system comprises a multi-thread video decoding module, a face detection module, a face recognition module and a face matting module.
In a first aspect, the multi-thread video decoding module receives the image streams collected by the cameras using multiple independent threads, decodes each image into the matrix form used inside the computer, and inserts it into a shared queue.
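As an illustration of this aspect, the following is a minimal sketch of per-camera decoding threads feeding one shared queue, assuming OpenCV handles the stream decoding; the stream URLs and queue size are hypothetical, not taken from the patent:

    import threading, queue
    import cv2  # OpenCV decodes RTSP/H.265 streams via its FFmpeg backend

    frame_queue = queue.Queue(maxsize=256)  # shared queue of decoded frames

    def decode_stream(url, cam_id):
        """Decode one camera stream in its own thread."""
        cap = cv2.VideoCapture(url)
        while cap.isOpened():
            ok, frame = cap.read()       # frame is an HxWx3 matrix
            if not ok:
                break
            frame_queue.put((cam_id, frame))
        cap.release()

    urls = ["rtsp://camera0/stream", "rtsp://camera1/stream"]  # hypothetical
    for i, u in enumerate(urls):
        threading.Thread(target=decode_stream, args=(u, i), daemon=True).start()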
In a second aspect, the present invention provides a face detection module for detecting a face contained in an image, including:
(1) copying the decoded image queue once and applying median filtering to the copies; the median filtering reduces salt-and-pepper noise in the images, avoiding false detections and speeding up detection and subsequent recognition;
(2) splicing the filtered images row-wise into one complete batch for the face detection module (see the sketch after this list); the purpose of batching is that, on hardware with multi-core computing units, detecting a batch of face data at once with parallel computing is far more efficient than detecting single images sequentially;
(3) designing a new cascade detection network, Alpha-MTCNN, which changes the minimum input of the PNet network in the original MTCNN from 12 × 12 to 24 × 24, satisfying the minimum face size required in real scenes while narrowing the range of the image pyramid to speed up multi-scale detection;
(4) changing the sizes and numbers of convolution kernels in PNet, RNet and ONet, reducing convolution computation with depthwise separable convolution, and replacing the original max pooling with strided convolutions; BN layers are introduced to counter the shifts in data distribution caused by frequent parameter updates in the network;
(5) attaching a separate LNet network after the ONet to extract the face key points.
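As referenced in step (2), here is a minimal sketch of row-wise splicing of decoded frames into one large image for batch detection; the common frame size is an assumption, not the patent's exact scheme:

    import cv2
    import numpy as np

    def splice_rows(frames, size=(1280, 720)):
        """Resize frames to a common size and stack them vertically."""
        resized = [cv2.resize(f, size) for f in frames]
        return np.concatenate(resized, axis=0)  # shape: (n*720, 1280, 3)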
In a third aspect, the present invention provides a face recognition module for recognizing the detected faces, including:
(1) aligning the detected faces, scaling each aligned face to a size of 112 × 112, and collecting all resulting faces into one batch;
(2) feeding the recognition network a four-dimensional tensor of shape batch × 3 × 112 × 112, where batch is the number of aligned faces, 3 is the number of RGB image channels, and 112 × 112 is the image width and height; extracting feature vectors in batch with the improved, efficient lightweight neural network MobileFaceNet on a GPU or a server with many computing cores maximizes the utilization of parallel computing units, compared with extracting features from single images sequentially (a tensor-assembly sketch follows this list);
(3) normalizing the extracted face feature vectors, comparing them by cosine distance with the face vectors of known face labels in the base library, and obtaining the recognized face label with a nearest neighbor classifier.
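A minimal sketch of assembling the batch × 3 × 112 × 112 input tensor from aligned face crops; the channel order conversion and normalization constants are assumptions for illustration:

    import cv2
    import numpy as np

    def to_batch_tensor(aligned_faces):
        """aligned_faces: list of 112x112 BGR crops -> (N, 3, 112, 112) float32."""
        batch = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in aligned_faces])
        batch = batch.astype(np.float32) / 127.5 - 1.0   # scale to [-1, 1] (assumed)
        return batch.transpose(0, 3, 1, 2)               # NHWC -> NCHW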
In a fourth aspect, the present invention provides a face matting module for cropping the region around a face of interest in an image, comprising:
(1) taking the mean of the four corner coordinates of the recognized face rectangle as the center point and expanding to the specified width and height to form a rectangular region;
(2) polling all other, non-interesting faces and computing the intersection over union (IoU) between each such face region and the rectangular region above;
(3) if the IoU is greater than 0, applying median filtering to that face; the other, unrecognized faces are thereby blurred, protecting their portrait rights.
Drawings
FIG. 1 is a basic flow diagram of the present invention;
FIG. 2 is a basic flow chart of the face detection method of the present invention;
FIG. 3 is a diagram of a novel PNet face detection network architecture in accordance with the present invention;
FIG. 4 is a diagram of a novel RNet face detection network architecture in accordance with the present invention;
FIG. 5 is a diagram of a novel ONet face detection network architecture in accordance with the present invention;
FIG. 6 is a diagram of a novel LNet face key point detection network architecture in accordance with the present invention;
FIG. 7 is a schematic view of face matting.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the specific embodiments. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
FIG. 1 is a schematic diagram of the efficient batch face recognition and matting system based on deep learning according to the present invention. Referring to FIG. 1, the system provided in this embodiment includes a multi-thread video decoding module, a face detection module, a face recognition module, and a face matting module. First, the video decoding module receives image data transmitted by multiple cameras; each camera is decoded in its own separate thread, which prevents artifacts such as screen corruption and frame disorder caused by delay. The decoding protocol can be H.265 or another protocol. The image data decoded by each thread is passed to the subsequent face detection module.
In this implementation, the face detection module completes the detection task after image decoding, obtaining the 4 coordinates of the face rectangle and the 5 coordinates of the face key points, where the 4 coordinates correspond to the four corners of the face rectangle and the 5 key points are the two eyes, the nose, and the two corners of the mouth. The face detection module comprises image batching, image smoothing, and the four novel multi-stage networks PNet, RNet, ONet and LNet.
The detection steps of the face detection module are shown in FIG. 2. The image is first smoothed and then sampled at multiple Gaussian scales; the processed image is fed to the novel PNet network to obtain preliminary candidate boxes, and Non-Maximum Suppression (NMS) is applied to the faces detected by PNet to filter repeated candidate regions. The filtered candidate boxes are passed to RNet to further screen out non-face regions, followed by another round of NMS. The candidates from RNet are refined by ONet into final candidate boxes, which are processed by NMS once more to obtain the final face regions. The LNet attached after ONet then extracts the face key points.
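Since NMS is applied after every stage of the cascade, a minimal sketch of standard greedy NMS over (x1, y1, x2, y2) boxes with scores may clarify this filtering step; the 0.5 IoU threshold is an assumption, not the patent's setting:

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        """Greedy non-maximum suppression; returns indices of kept boxes."""
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # IoU of the top-scoring box against the remaining boxes
            x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
            iou = inter / (area_i + areas - inter)
            order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
        return keep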
In a specific implementation, the face detection module runs the detection task in an independent thread that holds a shared queue receiving the images produced by the decoding threads. When the detector is idle, the images in the shared queue are spliced row-wise into one large image. The purpose of batching is to suit parallel computing cores, greatly improving detection efficiency and the utilization of hardware resources; parallel computing modes include but are not limited to multi-core, multi-thread and single-instruction-multiple-data, and are not detailed here. The large batched image is copied once, and the subsequent detection processing is applied to the copy: the preprocessing operations in the detection task lose some image information, while the later recognition stage needs that information intact.
Median filtering is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values in a neighborhood window around that pixel; its effect on impulse and salt-and-pepper noise is pronounced. Median filtering is used in this implementation to smooth the image and remove its noise. When the scene facing the camera is complex, for example under dim light or with many dense points, introducing the filtering operation reduces the number of preliminary face candidate boxes to some extent, which both lowers the false detection rate and speeds up detection.
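A one-line illustration of this preprocessing step with OpenCV; the 5 × 5 kernel size and input path are assumptions:

    import cv2
    frame = cv2.imread("frame.jpg")       # hypothetical decoded frame
    smoothed = cv2.medianBlur(frame, 5)   # each pixel -> median of its 5x5 neighborhood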
Optionally, the image smoothing preprocessing is not limited to median filtering; Gaussian filtering and mean filtering may also be used, and the specific choice can be tried out for the specific scene, which is not described in detail here.
In this embodiment, the face detection task uses a multi-stage Convolutional Neural Network (CNN). A convolutional neural network is a feedforward network composed of multiple layers of convolutional filters and activation units; it has driven major breakthroughs in computer vision because it requires no hand-designed feature extraction and performs prediction accurately when training data is abundant. MTCNN is a multi-stage convolutional network for face detection, but the original version runs inefficiently and is easily disturbed by noise into false detections. The invention improves and optimizes MTCNN in image input preprocessing, network structure and other respects, producing a new efficient face detection network, Alpha-MTCNN. Alpha-MTCNN passes the input image through the novel PNet and RNet networks in turn; the novel ONet network outputs the region coordinates of all faces in the image, and the novel LNet network outputs the faces' key point coordinates.
Specifically, the Alpha-MTCNN method changes the minimum input of the original PNet network from 12 × 12 to 24 × 24, satisfying the minimum face size required in real scenes, and narrows the range of the image Gaussian pyramid to speed up multi-scale detection. It changes the sizes and numbers of convolution kernels in PNet, RNet and ONet, reduces convolution computation with depthwise separable convolution, and replaces the original max pooling with strided convolutions. In addition, Batch Normalization (BN) is applied in each neural network to counter the shifts in data distribution caused by frequent parameter updates during training, accelerating training convergence; the steps are as follows:
(1) calculating the mean of the batch data:
$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$
(2) calculating the batch data variance:
$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$
(3) normalizing the input data:
$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$
(4) performing scale transformation and shift:
$y_i = \gamma \hat{x}_i + \beta$
where $m$ is the batch size, $\epsilon$ is a small constant for numerical stability, and $\gamma, \beta$ are learned parameters.
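For illustration, a minimal numpy sketch of the four steps above on one mini-batch; the default γ, β and ε values are assumptions:

    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        """x: (m, features). Implements the four BN steps above."""
        mu = x.mean(axis=0)                    # (1) mini-batch mean
        var = x.var(axis=0)                    # (2) mini-batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # (3) normalize
        return gamma * x_hat + beta            # (4) scale and shift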
In particular, the details of the new PNet network structure used above are shown in FIG. 3. The new PNet is a fully convolutional network that accepts any input larger than 24 × 24. A 3 × 3 convolution with stride 2 is applied to the input image; the resulting feature map goes through a depthwise separable convolution, i.e., a grouped 3 × 3 convolution followed by an ordinary 1 × 1 convolution; the new feature map goes through another depthwise separable convolution with the same kernel size; on that basis, a depthwise separable convolution with stride 2 is applied; finally, one more depthwise separable convolution with the same kernel size. After these convolution operations, a softmax classification layer and an ordinary regression layer yield the probability that a face is present and the coordinates of the face rectangle.
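A minimal PyTorch sketch of the depthwise separable convolution block this description relies on, i.e., a grouped 3 × 3 convolution followed by an ordinary 1 × 1 convolution, each with BN; the channel counts, activation choice and bias settings are assumptions, not the patent's exact layers:

    import torch.nn as nn

    def depthwise_separable(c_in, c_out, stride=1):
        """3x3 depthwise (grouped) conv + 1x1 pointwise conv, each with BN."""
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_out, 1, bias=False),   # pointwise conv mixes channels
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    block = depthwise_separable(16, 32, stride=2)  # e.g., a stride-2 stage (hypothetical widths)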
Details of the new RNet network structure are shown in FIG. 4. The new RNet is a fully convolutional network that accepts the face candidate boxes surviving PNet filtering, scaled to 24 × 24, as input. A 2 × 2 convolution with stride 2 is first applied to the input image; it is then followed in sequence by four depthwise separable convolutions of kernel size 3 × 3, the third of which uses stride 2. After these convolution operations, a softmax classification layer and an ordinary regression layer yield the face probability and the coordinates of the face rectangle.
Details of the new ONet network structure are shown in FIG. 5. The new ONet is a fully convolutional network that accepts the face candidate boxes surviving RNet filtering, scaled to 48 × 48, as input. A 3 × 3 convolution with stride 2 and padding 1 is applied first; then three depthwise separable convolutions with kernel size 3 × 3, stride 2 and padding 1 in sequence; finally, one depthwise separable convolution layer with kernel size 3 × 3. After these convolution operations, a softmax classification layer and an ordinary regression layer yield the face probability and the face rectangle coordinates; at this point the final face detection region coordinates are obtained.
Details of the new LNet network structure are shown in FIG. 6. The new LNet is a fully convolutional network that accepts the face candidate boxes surviving ONet filtering and extracts the face key point coordinates. Each candidate box is scaled to 48 × 48 as input. A 3 × 3 convolution with stride 2 and padding 1 is applied first; then a depthwise separable convolution layer with kernel size 2 × 2, stride 2 and padding 1; then three depthwise separable convolutions with kernel size 3 × 3, stride 2 and padding 1 in sequence; finally, a depthwise separable convolution with kernel size 3 × 3. After these convolution operations, a regression layer yields the coordinate positions of the face key points.
The network structures used in this implementation include but are not limited to the above; the specific number of convolution kernels per layer in each network can be tuned to the specific data and scenario, and is not described in detail here.
The face recognition module of the invention completes the tasks of feature extraction and comparison after detection.
Preferably, the face recognition module aligns the detected faces: using the 5 detected face key point coordinates, a similarity transformation maps each face into a uniform pose, the aligned faces are scaled to 112 × 112, and all detected faces form one batch. The purpose of batching is to suit parallel computing cores, greatly improving face recognition efficiency and the utilization of hardware resources. Parallel computing modes include but are not limited to multi-core, multi-thread and single-instruction-multiple-data, and are not described in detail here.
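A minimal sketch of this key-point-based alignment using a similarity transform from scikit-image; the 5 reference landmark coordinates are a commonly used 112 × 112 template and are an assumption, not values given in the patent:

    import numpy as np
    from skimage import transform

    # Assumed 5-point reference template (eyes, nose, mouth corners) for 112x112
    REF = np.array([[38.29, 51.70], [73.53, 51.50], [56.02, 71.74],
                    [41.55, 92.37], [70.73, 92.20]], dtype=np.float32)

    def align_face(img, landmarks):
        """landmarks: (5, 2) detected key points -> aligned 112x112 crop."""
        tform = transform.SimilarityTransform()
        tform.estimate(landmarks, REF)   # fit rotation + scale + translation
        return transform.warp(img, tform.inverse, output_shape=(112, 112))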
In this embodiment, the input format of the face recognition network is a four-dimensional tensor, batch × 3 × 112 × 112, where batch is the number of aligned faces, 3 is the number of RGB image channels, and 112 × 112 is the image width and height. The implementation extracts feature vectors in batch with the efficient, improved lightweight neural network MobileFaceNet; the MobileFaceNet used outputs a fully connected 512-dimensional feature vector to strengthen the image representation and give stronger discrimination between faces, improving face comparison accuracy.
The face recognition module normalizes the extracted 512-dimensional face feature vectors, compares them with the face vectors of known face labels, calculates the cosine distance, and obtains the recognized face label with a nearest neighbor classifier. The cosine distance is computed from the cosine of the angle between feature vectors $a$ and $b$:
$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$
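A minimal sketch of this comparison step; the normalization and nearest neighbor search follow the description above, while the function and variable names are hypothetical and the gallery rows are assumed pre-normalized:

    import numpy as np

    def identify(feature, gallery, labels):
        """feature: (512,) query; gallery: (N, 512) known-label vectors (L2-normalized)."""
        feature = feature / np.linalg.norm(feature)   # normalize the query vector
        cos = gallery @ feature                       # cosine similarity to each entry
        best = int(np.argmax(cos))                    # nearest neighbor classifier
        return labels[best], 1.0 - cos[best]          # recognized label, cosine distance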
the face matting module of the invention needs to find the area with the specified size containing the face (the interesting face) from the original image for each identified face, and the user can select to push the face area. If other faces (non-interesting faces) are found in the area, a blurring process is needed to protect the portrait right and privacy of the user. The method comprises the following specific steps:
(1) the method comprises the steps of taking the average value of four coordinate positions of an identified face rectangle as a central point, taking specified width and height as boundaries for expansion, wherein the width and height can be set by a user to form an expanded rectangular area;
(2) polling all other non-interesting faces, comparing the intersection ratios of these face regions and the above rectangular regions IoU;
(3) if IoU is greater than 0, median filtering is carried out on the face, and at the moment, other faces which are not recognized are blurred, so that the portrait right is protected.
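A minimal sketch of steps (1) to (3), assuming boxes given as (x1, y1, x2, y2) pixel coordinates; the crop size and blur kernel are hypothetical:

    import cv2

    def iou(a, b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    def matte_face(img, face_box, other_boxes, out_w=400, out_h=400):
        """Crop a region centered on the face; blur any other face inside it."""
        cx = (face_box[0] + face_box[2]) / 2           # (1) center from box corners
        cy = (face_box[1] + face_box[3]) / 2
        x1, y1 = int(cx - out_w / 2), int(cy - out_h / 2)
        region = [x1, y1, x1 + out_w, y1 + out_h]
        for b in other_boxes:                          # (2) poll non-interesting faces
            if iou(region, b) > 0:                     # (3) blur overlapping faces in place
                bx1, by1, bx2, by2 = map(int, b)
                img[by1:by2, bx1:bx2] = cv2.medianBlur(img[by1:by2, bx1:bx2], 31)
        return img[max(y1, 0):y1 + out_h, max(x1, 0):x1 + out_w]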
FIG. 7 demonstrates the face matting effect achieved by the present invention: the left side shows the original image, and the right side the image after face matting, in which the other faces are blurred. Through the face matting module, the invention crops out the face of the identified person of interest; the expanded rectangular region contains not only the face but also part of the background image, so the crop does not look abrupt, improving user experience to a certain extent. The non-interesting faces are mosaicked in the pushed picture, ensuring that the users' portrait rights and privacy are not disclosed.

Claims (14)

1. An efficient batch face recognition and matting system based on deep learning, characterized by comprising the following modules:
a multi-thread video decoding module, which acquires face video from multiple camera channels and decodes each video stream in a single thread to obtain a matrix representation of each image;
a face detection module, which maintains a queue receiving the images obtained by multi-thread decoding, splices multiple images to facilitate batch processing, and uses a newly designed improved cascade face detection network, Alpha-MTCNN (Alpha Multi-task Convolutional Neural Network), to extract face regions and face key point coordinates;
a face recognition module, which extracts face features using the improved lightweight neural network MobileFaceNet and performs comparison and recognition;
and a face matting module, which crops, for each face image of interest, a region centered on the face with the height and width specified by the user, and blurs any other faces contained in the region to protect the users' portrait rights and privacy.
2. The efficient batch face recognition and matting system based on deep learning of claim 1, wherein the video decoding module receives pedestrian videos of different regions collected by multiple high-definition cameras, decodes them using multiple independent threads on a shared server, and places the decoded images in a shared queue.
3. The efficient batch face recognition and matting system based on deep learning of claim 1, wherein the face detection module performs median filtering on the input face image to reduce noise, thereby reducing redundant face candidate boxes in the cascade network and improving the speed of face detection.
4. The efficient batch face recognition and matting system based on deep learning of claim 1, wherein the face detection module splices the images taken from the image queue row-wise into batches and performs batch detection on a graphics processing unit (GPU) or a server with multiple computing cores.
5. The efficient batch face recognition and matting system based on deep learning of claim 3, wherein the face detection module smooths the image with median filtering before face detection; the purpose is to smooth the image and remove its noise, and when the scene facing the camera is complex, for example under dim light or with many dense points, introducing the filtering operation reduces the number of preliminary face candidate boxes to some extent, which both lowers the false detection rate and speeds up detection.
6. The efficient batch face recognition and matting system based on deep learning of claim 3, wherein the face detection module improves the original cascade face detection network MTCNN (Multi-task Convolutional Neural Network) in multiple respects to obtain Alpha-MTCNN.
7. The efficient batch face recognition and matting system based on deep learning of claim 6, wherein the Alpha-MTCNN method changes the minimum input of the original PNet network from 12 × 12 to 24 × 24, satisfying the minimum face size required in real scenes, and narrows the range of the multi-scale Gaussian image pyramid, thereby speeding up multi-scale detection.
8. The efficient batch face recognition and matting system based on deep learning of claim 6, wherein the Alpha-MTCNN method changes the sizes and numbers of convolution kernels in PNet, RNet and ONet, reduces convolution computation using depthwise separable convolution, and replaces the original max pooling with strided convolutions during convolution; Batch Normalization (BN) is used to counter the shifts in data distribution caused by frequent parameter updates during training, with the following steps:
(1) calculating the mean of the batch data:
$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$
(2) calculating the batch data variance:
$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$
(3) normalizing the input data:
$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$
(4) performing scale transformation and shift:
$y_i = \gamma \hat{x}_i + \beta$
9. The system of claim 6, wherein in Alpha-MTCNN a separate LNet network is attached after the ONet for face key point extraction.
10. The efficient batch face recognition and matting system based on deep learning of claim 1, wherein the face detection module returns all detected face region coordinates and face key point coordinates in batch to the subsequent recognition module.
11. The system of claim 1, wherein the face recognition module aligns the detected faces using a similarity transformation, scales the aligned faces to 112 × 112, and combines all the detected faces into one batch.
12. The efficient batch face recognition and matting system based on deep learning according to claim 10, wherein the input format of the face recognition module's network is a four-dimensional tensor, batch × 3 × 112 × 112, where batch is the number of aligned faces, 3 is the number of RGB image channels, and 112 × 112 is the image width and height; extracting feature vectors in batch with the improved efficient lightweight neural network MobileFaceNet maximizes the efficiency of parallel computing units on a GPU or a server with multiple computing cores, compared with extracting features from single images sequentially.
13. The efficient batch face recognition and matting system based on deep learning of claim 1, wherein the face recognition module normalizes the extracted face feature vectors, compares them with the face vectors of known face labels, calculates cosine distances, and obtains the recognized face label using a nearest neighbor classifier.
14. The efficient batch face recognition and matting system according to claim 1, wherein for each recognized face the face matting module finds a region of the specified size containing that face in the original image, the specific implementation comprising:
taking the mean of the four corner coordinates of the recognized face rectangle as the center point and expanding to the specified width and height to form a rectangular region;
polling all other non-interesting faces and computing the intersection over union (IoU) between each such face region and the rectangular region above;
if the IoU is greater than 0, applying median filtering to that face, blurring the other non-interesting faces and protecting portrait rights and privacy.
CN201910387472.XA 2019-05-10 2019-05-10 Efficient batch face recognition and matting system based on deep learning Pending CN111914601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910387472.XA CN111914601A (en) 2019-05-10 2019-05-10 Efficient batch face recognition and matting system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910387472.XA CN111914601A (en) 2019-05-10 2019-05-10 Efficient batch face recognition and matting system based on deep learning

Publications (1)

Publication Number Publication Date
CN111914601A (en) 2020-11-10

Family

ID=73242521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910387472.XA Pending CN111914601A (en) 2019-05-10 2019-05-10 Efficient batch face recognition and matting system based on deep learning

Country Status (1)

Country Link
CN (1) CN111914601A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712489A (en) * 2020-12-31 2021-04-27 北京澎思科技有限公司 Method, system and computer readable storage medium for image processing
CN112734682A (en) * 2020-12-31 2021-04-30 杭州艾芯智能科技有限公司 Face detection surface vector data acceleration method, system, computer device and storage medium
CN112734682B (en) * 2020-12-31 2023-08-01 杭州芯炬视人工智能科技有限公司 Face detection surface vector data acceleration method, system, computer device and storage medium
CN113807327A (en) * 2021-11-18 2021-12-17 武汉博特智能科技有限公司 Deep learning side face image processing method and system based on light compensation
CN116030524A (en) * 2023-02-09 2023-04-28 摩尔线程智能科技(北京)有限责任公司 Face recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20201110)