CN108197613B - Face detection optimization method based on deep convolution cascade network - Google Patents

Info

Publication number
CN108197613B
Authority
CN
China
Prior art keywords
network
hot area
face
area
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810146901.XA
Other languages
Chinese (zh)
Other versions
CN108197613A (en)
Inventor
王思俊
刘琰
王国峰
慈红斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tiandy Technologies Co Ltd
Original Assignee
Tiandy Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tiandy Technologies Co Ltd
Priority to CN201810146901.XA
Publication of CN108197613A
Application granted
Publication of CN108197613B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face detection optimization method based on a deep convolution cascade network, comprising the following steps: detecting the areas in which a face may appear, referred to as hot areas, by using a deep cascade network; updating the hot areas and setting all areas outside the hot areas to zero; and performing sparse data compression on the resulting whole image. The invention substantially reduces redundant computation and improves the running efficiency of the algorithm. The efficiency of a CNN face detection algorithm on a front-end embedded platform can be improved by 20-30% without loss of detection precision. The method offers a better solution for running deep convolutional networks quickly on front-end devices and lays a foundation for later large-scale deployment of such networks.

Description

Face detection optimization method based on deep convolution cascade network
Technical Field
The invention belongs to the technical field of automatic detection, and particularly relates to a face detection optimization method based on a deep convolution cascade network.
Background
Wide and deep networks give better results, but their computational cost is also very large, so running a CNN on an embedded platform remains an open problem. The usual approaches are: 1. Design a lighter CNN to reduce the amount of computation; however, small networks generally sacrifice accuracy. 2. Accelerate the network with platform-specific instructions on each embedded platform; however, compiling and debugging such code is time-consuming and labor-intensive. 3. Implement the network on a dedicated CNN hardware core; this works, but is relatively costly.
Disclosure of Invention
In view of this, the present invention aims to provide a face detection optimization method based on a deep convolutional cascade network, so as to reduce redundant computation and improve the detection rate.
To achieve this purpose, the technical scheme of the invention is as follows:
A face detection optimization method based on a deep convolution cascade network comprises the following steps:
(1) detecting the areas in which a face may appear, referred to as hot areas, by using a deep cascade network;
(2) updating the hot areas, and setting all areas outside the hot areas to zero;
(3) performing sparse data compression on the resulting whole image.
Further, step (1) comprises the following steps:
(11) using a deep cascade network, quickly determining the suspected target areas with the first-layer deep network;
(12) filling the face areas in the pyramid images, zeroing the interference areas, and expanding the filled map;
(13) comparing the filled face hot areas with the original image to obtain the face hot area map that is finally sent to the subsequent network.
Further, step (2) comprises the following steps:
(21) dividing frame processing into an odd branch and an even branch according to the video frame number;
(22) performing full-image detection on odd frames while generating a hot area map, and passing the generated hot area map to the even frame processing branch;
(23) in the even frame processing branch, performing data sparse processing on the hot area map and sending it into the CNN for computation.
Further, step (3) comprises the following steps:
(31) performing a pixel-wise AND of the pyramid image and the corresponding hot area map, so that all non-face areas become 0;
(32) converting the resulting map into a sparse matrix.
Compared with the prior art, the face detection optimization method based on the deep convolution cascade network has the following advantages:
The invention substantially reduces redundant computation and improves the running efficiency of the algorithm. The efficiency of a CNN face detection algorithm on a front-end embedded platform can be improved by 20-30% without loss of detection precision. The method offers a better solution for running deep convolutional networks quickly on front-end devices and lays a foundation for later large-scale deployment of such networks.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic diagram of a face detection optimization method based on a deep convolutional cascade network according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a deep network model for face localization according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of image data sparseness according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in Fig. 1, the invention provides a face detection optimization method based on a deep convolution cascade network that substantially reduces redundant computation and improves the detection rate. The technical scheme consists of three main parts: face hot area calculation, hot area updating, and data sparsification. The first-layer deep network quickly detects the areas in which a face may appear, that is, the hot areas. The hot areas are then updated as described below, all areas outside the hot areas are set to zero, and sparse data compression is applied to the resulting whole image to reduce the amount of data that the subsequent networks must process.
1. Hot area calculation:
Because a deep cascade network is adopted, the front-layer network quickly detects face-like areas, and the subsequent networks remove false targets and correct the target positions. The invention exploits this property of the front-layer network and computes the hot areas in three steps:
1) The suspected target areas, that is, the hot areas, are quickly determined by the first-layer deep network.
Fig. 2 shows the structure of the deep network model used for face localization, which takes a 3-channel image of size 12 × 12 as input. In the figure, Conv denotes the size of a convolution kernel, MP denotes the step size with which the kernel moves, face classification indicates whether the 12 × 12 input block is a face, and bounding box regression gives the face position regressed by the network.
The input image is scanned with a 12 × 12 window as the basic unit to decide whether each position is a face, and the coordinates of a template-sized face are obtained by regression. The image is transformed into a pyramid using a scale coefficient (which can be chosen according to actual needs; typical values are 0.6-0.99, and a larger value gives more pyramid layers), so that a face of any size in the image passes through the template scale, which enables multi-scale face detection. In the implementation, a fully convolutional form can be used, which simplifies the computation.
2) The face areas in the pyramid images are filled, the interference areas are zeroed, and the filled map is expanded to avoid missed detections caused by target motion. Specifically, in the pyramid image that contains a bounding box whose face classification is 1, the area of that bounding box is filled and all other positions are cleared. To tolerate position shifts caused by target motion, the filled area is expanded: its center is generally kept unchanged and its width is enlarged to 2 times that of the original bounding box.
3) The filled face hot areas are compared with the original image to obtain the face hot area map that is finally sent to the subsequent network.
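The pyramid scaling and the filling and expansion of detected boxes can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the names Box, pyramid_scales, expand_box and hot_area_mask are hypothetical, the expansion is applied to both width and height (the text only mentions the width), and expanded boxes are clipped to the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Box { int x, y, w, h; };  // top-left corner and size, in pixels

// Scale factors of the image pyramid: start at full size and shrink by
// `factor` until the shorter side no longer fits the 12x12 template.
// A factor closer to 1 therefore yields more pyramid layers.
std::vector<double> pyramid_scales(int width, int height, double factor,
                                   int template_size = 12) {
    assert(factor > 0.0 && factor < 1.0);
    std::vector<double> scales;
    double scale = 1.0;
    double min_side = std::min(width, height);
    while (min_side >= template_size) {
        scales.push_back(scale);
        scale *= factor;
        min_side *= factor;
    }
    return scales;
}

// Enlarges a box to `ratio` times its size around the same center and
// clips the result to the image (assumption: height is scaled like width).
Box expand_box(const Box& b, int img_w, int img_h, int ratio = 2) {
    assert(ratio >= 1);
    int new_w = b.w * ratio, new_h = b.h * ratio;
    int x0 = b.x - (new_w - b.w) / 2;
    int y0 = b.y - (new_h - b.h) / 2;
    int x1 = std::min(x0 + new_w, img_w);
    int y1 = std::min(y0 + new_h, img_h);
    x0 = std::max(x0, 0);
    y0 = std::max(y0, 0);
    return {x0, y0, std::max(x1 - x0, 0), std::max(y1 - y0, 0)};
}

// Builds a hot area mask: 1 inside every expanded face box, 0 elsewhere.
std::vector<std::uint8_t> hot_area_mask(const std::vector<Box>& faces,
                                        int img_w, int img_h) {
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(img_w) * img_h, 0);
    for (const Box& f : faces) {
        Box e = expand_box(f, img_w, img_h);
        for (int y = e.y; y < e.y + e.h; ++y)
            for (int x = e.x; x < e.x + e.w; ++x)
                mask[static_cast<std::size_t>(y) * img_w + x] = 1;
    }
    return mask;
}
```

In this sketch one mask would be built per pyramid layer, with the box coordinates expressed in that layer's resolution.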
2. Hot area update:
Because targets in the scene move continuously and new targets appear from time to time, the invention uses a hot area updating algorithm to update, in real time, the suspected areas in which faces appear and so prevent missed detections. The specific steps are:
1) Frame processing is divided into an odd branch and an even branch according to the video frame number.
2) Full-image detection is performed on odd frames, and a hot area map is generated at the same time. The generated hot area map is passed to the even frame processing branch.
3) The even frame processing branch performs data sparse processing on the hot area map and then sends it into the CNN for computation.
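The odd/even split can be sketched as a small scheduler. The names Branch and HotAreaScheduler are hypothetical, and falling back to full detection when an even frame arrives before any hot area map exists is an assumption that the text does not address.

```cpp
#include <cassert>
#include <cstdint>

enum class Branch { FullDetection, HotAreaOnly };

// Chooses the processing branch for each frame number.
struct HotAreaScheduler {
    bool has_map = false;  // set once a hot area map has been generated

    Branch next(std::uint64_t frame_number) {
        bool odd = (frame_number % 2 == 1);
        if (odd || !has_map) {
            // Full-image detection also (re)generates the hot area map
            // that the following even frame will reuse.
            has_map = true;
            return Branch::FullDetection;
        }
        // Even frame with a map available: only the hot areas are processed.
        return Branch::HotAreaOnly;
    }
};
```

With this scheme the hot area map is never more than one frame old, which is why the expansion of the filled boxes only needs to cover the motion between two consecutive frames.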
3. Image data sparseness
In many cases sparse matrix-vector multiplication can replace dense matrix operations, which greatly reduces memory usage and computational cost. Unlike matrix-matrix multiplication, matrix-vector multiplication is an entirely memory-access-bound computation, so optimization mainly aims to improve memory access efficiency or reduce memory access overhead.
Image data sparsification can be illustrated by the example in Fig. 3. Taking the triplet representation of a sparse matrix as an example, the sparse matrix in the figure can be represented as: ((1,4,22), (1,7,15), (2,2,11), (3,4,-6), (4,6,39), (6,3,28)), where each triplet gives the row, the column, and the value of one nonzero element. A sparse matrix stored as a triplet table not only saves space but also lets certain matrix operations run in less time than the classical algorithms. There are, of course, many sparse matrix formats, and a faster or better-suited one can be chosen according to actual conditions.
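As a minimal sketch of the triplet representation, the code below converts a dense matrix into (row, column, value) triplets with 1-based indices, as in the example, and multiplies the result by a vector while touching only the nonzero elements. The names Triplet, Dense, to_triplets and multiply are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero element; row and col are 1-based, as in the example above.
struct Triplet { std::size_t row, col; float value; };

using Dense = std::vector<std::vector<float>>;

// Collects the nonzero elements of a dense matrix in row-major order.
std::vector<Triplet> to_triplets(const Dense& m) {
    std::vector<Triplet> out;
    for (std::size_t r = 0; r < m.size(); ++r)
        for (std::size_t c = 0; c < m[r].size(); ++c)
            if (m[r][c] != 0.0f) out.push_back({r + 1, c + 1, m[r][c]});
    return out;
}

// Sparse matrix-vector product y = A * x; the cost is proportional to the
// number of nonzero elements rather than to rows * cols.
std::vector<float> multiply(const std::vector<Triplet>& a, std::size_t rows,
                            const std::vector<float>& x) {
    std::vector<float> y(rows, 0.0f);
    for (const Triplet& t : a) {
        assert(t.row >= 1 && t.row <= rows);
        assert(t.col >= 1 && t.col <= x.size());
        y[t.row - 1] += t.value * x[t.col - 1];
    }
    return y;
}
```

For a matrix that holds the six nonzero values of the example (assumed here to be 6 × 7, since Fig. 3 is not reproduced), to_triplets returns exactly the six triplets listed, and the product needs six multiply-adds instead of 42.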
The image data sparsification of the invention mainly comprises the following steps:
1) A pixel-wise AND of the pyramid image and the corresponding hot area map is performed, so that all non-face areas become 0.
2) The resulting map is converted into a sparse matrix.
3) A linear algebra library is used to convert the dense matrix directly into a sparse matrix for the convolution computation of the neural network. Using Eigen syntax, the data is mapped as a dynamically sized, row-major floating-point matrix, and sparseView is then called for the sparse conversion:
Eigen::Map<const Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>> mat_b(B, mat_b_row, mat_b_col);
mat_b.sparseView(0.3f);
4) The CNN convolution layers perform the convolution on the sparse matrix, which improves running efficiency and is one of the key points of the efficiency gain of this scheme.
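To show why zeroing the non-hot areas saves work, the sketch below masks an image with a hot area map and then runs a single-channel convolution that skips every zero input. It uses only the standard library rather than Eigen, and it is an illustration under assumptions, not the implementation of the invention: apply_hot_area and sparse_conv2d are hypothetical names, the image is stored as floats, and the convolution is a valid (unpadded) cross-correlation with stride 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps pixels where the mask is nonzero and sets all others to 0
// (the floating-point counterpart of a pixel-wise AND with the mask).
std::vector<float> apply_hot_area(const std::vector<float>& image,
                                  const std::vector<std::uint8_t>& mask) {
    assert(image.size() == mask.size());
    std::vector<float> out(image.size(), 0.0f);
    for (std::size_t i = 0; i < image.size(); ++i)
        if (mask[i]) out[i] = image[i];
    return out;
}

// Valid k x k cross-correlation that iterates over nonzero inputs only and
// scatters each one into the output cells whose window covers it.
std::vector<float> sparse_conv2d(const std::vector<float>& img, int w, int h,
                                 const std::vector<float>& kernel, int k) {
    assert(static_cast<int>(img.size()) == w * h);
    assert(static_cast<int>(kernel.size()) == k * k && k <= w && k <= h);
    const int ow = w - k + 1, oh = h - k + 1;
    std::vector<float> out(static_cast<std::size_t>(ow) * oh, 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const float v = img[y * w + x];
            if (v == 0.0f) continue;  // zeroed (non-hot) pixels cost nothing
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    const int oy = y - ky, ox = x - kx;
                    if (oy < 0 || ox < 0 || oy >= oh || ox >= ow) continue;
                    out[oy * ow + ox] += v * kernel[ky * k + kx];
                }
            }
        }
    }
    return out;
}
```

The result equals that of a dense convolution over the masked image; only the amount of work changes, since it scales with the number of nonzero pixels instead of the full image size.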
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (1)

1. A face detection optimization method based on a deep convolution cascade network is characterized in that: the method specifically comprises the following steps:
(1) detecting a possibly appearing area of the human face, namely a hot area by using a deep convolution cascade network;
the step (1) comprises the following steps:
(11) a depth convolution cascade network is adopted, and a suspected area of a target is quickly determined by utilizing a first layer of depth convolution network;
(12) filling the pyramid face area, resetting the interference area, and expanding the filling graph;
(13) comparing the filled human face hot area with the original image to obtain a human face hot area map which is finally sent to a subsequent network;
(2) updating the hot area, and setting all areas which are not in the hot area to zero;
the step (2) comprises the steps of:
(21) dividing a processing frame into two odd-even processing branches according to the frame number of the video;
(22) carrying out full-image detection on odd frames, generating a hot area map at the same time, and transmitting the generated hot area map to an even frame processing branch;
(23) the even frame processing branch performs data sparse processing on the hot area map and sends the hot area map into a CNN network for operation;
(3) carrying out data sparse compression on the obtained whole image;
the step (3) comprises the following steps:
(31) performing pixel-wise AND operation on the pyramid image and the corresponding hot area map, and enabling all non-face areas to be 0;
(32) and converting the generated graph into a sparse matrix.
CN201810146901.XA 2018-02-12 2018-02-12 Face detection optimization method based on deep convolution cascade network Active CN108197613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810146901.XA CN108197613B (en) 2018-02-12 2018-02-12 Face detection optimization method based on deep convolution cascade network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810146901.XA CN108197613B (en) 2018-02-12 2018-02-12 Face detection optimization method based on deep convolution cascade network

Publications (2)

Publication Number Publication Date
CN108197613A (en) 2018-06-22
CN108197613B (en) 2022-02-08

Family

ID=62593312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810146901.XA Active CN108197613B (en) 2018-02-12 2018-02-12 Face detection optimization method based on deep convolution cascade network

Country Status (1)

Country Link
CN (1) CN108197613B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446964A (en) * 2018-10-19 2019-03-08 天津天地伟业投资管理有限公司 Face detection analysis method and device based on end-to-end single-stage multiple scale detecting device
CN109711322A (en) * 2018-12-24 2019-05-03 天津天地伟业信息系统集成有限公司 A kind of people's vehicle separation method based on RFCN

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1924894A (en) * 2006-09-27 2007-03-07 北京中星微电子有限公司 Multiple attitude human face detection and track system and method
CN104361363A (en) * 2014-11-25 2015-02-18 中国科学院自动化研究所 Deep deconvolution feature learning network, generating method thereof and image classifying method
CN106650682A (en) * 2016-12-29 2017-05-10 Tcl集团股份有限公司 Method and device for face tracking
CN107292272A (en) * 2017-06-27 2017-10-24 广东工业大学 A kind of method and system of the recognition of face in the video of real-time Transmission
CN107492115A (en) * 2017-08-30 2017-12-19 北京小米移动软件有限公司 The detection method and device of destination object

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
PL366550A1 (en) * 2001-08-24 2005-02-07 Koninklijke Philips Electronics N.V. Adding fields of a video frame
US9904874B2 (en) * 2015-11-05 2018-02-27 Microsoft Technology Licensing, Llc Hardware-efficient deep convolutional neural networks

Non-Patent Citations (2)

Title
A Code Selection Mechanism Using Deep Learning; Hang CUI et al.; 《2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip》; 20161208; full text *
Sentiment classification model based on word embedding and CNN; Cai Huiping et al.; 《Application Research of Computers》; 20161231; Vol. 33 (No. 10); pp. 2902-2909 *

Also Published As

Publication number Publication date
CN108197613A (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN111626128A (en) Improved YOLOv 3-based pedestrian detection method in orchard environment
CN112084923B (en) Remote sensing image semantic segmentation method, storage medium and computing device
CN106952304B (en) A kind of depth image calculation method using video sequence interframe correlation
CN111028335B (en) Point cloud data block surface patch reconstruction method based on deep learning
CN111079604A (en) Method for quickly detecting tiny target facing large-scale remote sensing image
KR20200063368A (en) Unsupervised stereo matching apparatus and method using confidential correspondence consistency
CN108197613B (en) Face detection optimization method based on deep convolution cascade network
CN104299241A (en) Remote sensing image significance target detection method and system based on Hadoop
CN100444190C (en) Human face characteristic positioning method based on weighting active shape building module
CN116402851A (en) Infrared dim target tracking method under complex background
CN117274515A (en) Visual SLAM method and system based on ORB and NeRF mapping
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN116523970B (en) Dynamic three-dimensional target tracking method and device based on secondary implicit matching
CN116051699B (en) Dynamic capture data processing method, device, equipment and storage medium
Li et al. Pillar‐based 3D object detection from point cloud with multiattention mechanism
CN108986212B (en) Three-dimensional virtual terrain LOD model generation method based on crack elimination
CN105184809A (en) Moving object detection method and moving object detection device
CN114399728B (en) Foggy scene crowd counting method
CN116228986A (en) Indoor scene illumination estimation method based on local-global completion strategy
CN114881850A (en) Point cloud super-resolution method and device, electronic equipment and storage medium
CN114494284A (en) Scene analysis model and method based on explicit supervision area relation
CN112819955A (en) Improved reconstruction method based on digital image three-dimensional model
CN107292850B (en) A kind of light stream parallel acceleration method based on Nearest Neighbor Search
Hou et al. Depth estimation and object detection for monocular semantic SLAM using deep convolutional network
Luo et al. Real-time pedestrian detection method based on improved YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220114

Address after: No.8, Haitai Huake 2nd Road, Huayuan Industrial Zone, Binhai New Area, Tianjin, 300450

Applicant after: TIANDY TECHNOLOGIES Co.,Ltd.

Address before: Room A310, building 4, No.8, Haitai Huake 2nd Road, Huayuan Industrial Zone, Binhai New Area, Tianjin, 300384

Applicant before: TIANJIN TIANDY INFORMATION SYSTEMS INTEGRATION Co.,Ltd.

GR01 Patent grant
GR01 Patent grant