CN110705366A - Real-time human head detection method based on stair scene - Google Patents

Real-time human head detection method based on stair scene

Info

Publication number
CN110705366A
CN110705366A
Authority
CN
China
Prior art keywords
human head
stair
fchd
real
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910844880.3A
Other languages
Chinese (zh)
Inventor
张发恩
胡太祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi (guangzhou) Technology Co Ltd
Original Assignee
Innovation Qizhi (guangzhou) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi (guangzhou) Technology Co Ltd filed Critical Innovation Qizhi (guangzhou) Technology Co Ltd
Priority to CN201910844880.3A priority Critical patent/CN110705366A/en
Publication of CN110705366A publication Critical patent/CN110705366A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time human head detection method for stair scenes in the field of computer vision, comprising the following specific steps: S1: collect a large image data set of stair scenes; S2: label the data set, where each labeling box must contain both the head and the shoulders; S3: divide the data set into a training set, a test set, and a validation set; S4: augment the training-set data; S5: extract the labeling-box data from the training set, perform k-means clustering on the box dimensions, and select cluster categories to obtain the different anchor sizes; S6: construct the FCHD + FPN network architecture; S7: train the FCHD + FPN network model on the labeled stair-head training set; S8: test the accuracy of the trained model on the validation set; S9: use the resulting model to detect human heads in real stair scenes. The method selects anchors by clustering, adjusts the labeled region by incorporating shoulder information, and improves the FCHD method by fusing an FPN network, thereby raising detection accuracy.

Description

Real-time human head detection method based on stair scene
Technical Field
The invention relates to the technical field of computer vision, in particular to a real-time human head detection method based on a stair scene.
Background
Existing human head detection methods follow two lines of thought. One is regression: a crowd density map is regressed from the image. This approach can only indicate how crowded the scene is; it cannot locate individual persons, and it demands high image resolution. The other is object detection, counting people with algorithms such as SSD, YOLO, and the Faster R-CNN family; these algorithms perform poorly when people occlude each other, and it is difficult for them to meet the accuracy and speed requirements of detection at the same time.
FCHD is a recent detection algorithm for this head detection scenario, but FCHD selects only two anchor sizes, so it generalizes poorly in practical applications and has a high miss rate, because the apparent head size varies greatly with the camera mounting position and the person's distance from the camera.
Based on this, a real-time human head detection method for stair scenes is designed: suitable anchors are selected by a clustering method, the labeled region is adjusted by incorporating shoulder information, and the FCHD method is improved by fusing an FPN network to raise detection accuracy, so as to solve the above problems.
Disclosure of Invention
The invention aims to provide a real-time human head detection method based on a stair scene so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: the real-time human head detection method based on the stair scene comprises the following specific steps:
s1: acquiring a large number of picture data sets of stair scenes in public places;
s2: labeling the data set, wherein a labeling box needs to contain information of human head and shoulders;
s3: dividing a data set into a training set, a testing set and a verification set;
s4: enhancing the training set data;
s5: extracting the labeling-box data from the training set, performing k-means clustering on the box dimensions, and selecting cluster categories to obtain the different anchor sizes;
s6: constructing an FCHD + FPN network architecture;
s7: training by using an FCHD + FPN network model according to the labeled stair head training set;
s8: testing the accuracy of the trained model in the verification set;
s9: and detecting the human head in a real stair scene by using the generated model.
Preferably, the public place of step S1 includes a shopping mall or a subway.
Preferably, in step S4, the augmentation methods include horizontal flipping, random cropping, color jittering, scale transformation, and rotation transformation.
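As an illustrative sketch (a hypothetical minimal example, not the patent's actual implementation), horizontal flipping must also mirror the labeling boxes so the head-and-shoulder annotations stay aligned; the (cx, cy, w, h) box convention here matches the center-plus-size convention used in the k-means clustering embodiment:

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an image and its labeling boxes.

    image: H x W x C array; boxes: N x 4 array of (cx, cy, w, h)
    in pixel coordinates (box center plus width/height).
    """
    h, w = image.shape[:2]
    flipped = image[:, ::-1].copy()      # mirror the pixel columns
    out = boxes.astype(float).copy()
    out[:, 0] = w - 1 - out[:, 0]        # only the x-center moves
    return flipped, out
```

The other augmentations listed (random cropping, color jittering, scaling, rotation) likewise require transforming the boxes consistently with the pixels.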
Preferably, in step S6, the FCHD + FPN network architecture adds an FPN network on the basis of FCHD, adopts resnet101 as the FCHD base model, and replaces the NMS algorithm with the SOFT-NMS algorithm.
Compared with the prior art, the invention has the beneficial effects that:
1. on the basis of the common detection framework Faster R-CNN, the FCHD (fully convolutional head detector) and FPN (feature pyramid network) architectures are fused; combining the one-stage FCHD model with the FPN greatly improves head detection speed;
2. the resnet101 + FPN network markedly improves detection accuracy, and the candidate boxes are partially optimized, reducing the miss rate;
3. in data processing, partial human-body features (the head plus the shoulders) are used when labeling the data, improving the accuracy of model detection;
4. the anchor boxes obtained by clustering the training set are closer to the real scene, reducing the miss rate of head detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a block diagram of the FCHD + FPN network model of the present invention;
fig. 3 is a diagram of the final required feature map generated by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution: the real-time human head detection method based on the stair scene comprises the following specific steps:
s1: acquiring a large number of picture data sets of stair scenes in public places, wherein the public places comprise shopping malls or subways;
s2: labeling the data set, wherein a labeling box needs to contain information of human head and shoulders;
s3: dividing a data set into a training set, a testing set and a verification set;
s4: augmenting the training-set data by horizontal flipping, random cropping, color jittering, scale transformation, and rotation transformation;
s5: extracting the labeling-box data from the training set, performing k-means clustering on the box dimensions, and selecting cluster categories to obtain the different anchor sizes;
s6: constructing the FCHD + FPN network architecture: an FPN network is added on the basis of FCHD, resnet101 is adopted as the FCHD base model, and the NMS algorithm is replaced with the SOFT-NMS algorithm;
s7: training by using an FCHD + FPN network model according to the labeled stair head training set;
s8: testing the accuracy of the trained model in the verification set;
s9: and detecting the human head in a real stair scene by using the generated model.
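The Soft-NMS replacement mentioned in step s6 can be sketched in plain NumPy (a minimal linear-penalty variant; the (x1, y1, x2, y2) box format and the threshold values are illustrative assumptions, not details disclosed by the patent):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one (x1, y1, x2, y2) box against an N x 4 array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Linear Soft-NMS: instead of deleting overlapping boxes outright
    (as hard NMS does), decay their scores in proportion to the overlap,
    so heavily occluded heads are not suppressed entirely."""
    scores = scores.astype(float).copy()
    keep = []
    idx = np.arange(len(scores))
    while len(idx) > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(best)
        idx = idx[idx != best]
        if len(idx) == 0:
            break
        ov = iou(boxes[best], boxes[idx])
        # decay only boxes that overlap the kept box strongly
        scores[idx] = np.where(ov > iou_thresh, scores[idx] * (1 - ov), scores[idx])
        idx = idx[scores[idx] > score_thresh]
    return keep
```

Soft-NMS suits the stair scenario because heads on stairs frequently overlap in the image, and hard NMS would discard the lower-scored of two genuinely distinct heads.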
As an embodiment of the present invention
k-means clustering:
1. the clustering data is a detection data set containing only labeling boxes; after the data is labeled, a file is generated listing the position and category of each labeling box, where each row contains (x_j, y_j, w_j, h_j), j ∈ {1, 2, …, N}, i.e. the coordinates of the ground-truth boxes relative to the original image: (x_j, y_j) is the center point of the box, (w_j, h_j) are its width and height, and N is the total number of labeling boxes;
2. first, k cluster center points (W_i, H_i), i ∈ {1, 2, …, k}, are given, where W_i and H_i are the width and height of the anchor boxes; since the anchor boxes have no fixed position, there are no (x, y) coordinates, only width and height;
3. the distance between each labeling box and each cluster center is computed as d = 1 − IoU(labeling box, cluster center), where the center point of the labeling box is placed on the cluster center during the computation so that the IoU can be evaluated, i.e. d = 1 − IoU[(x_j, y_j, w_j, h_j), (x_j, y_j, W_i, H_i)], j ∈ {1, 2, …, N}, i ∈ {1, 2, …, k}; each labeling box is assigned to its nearest cluster center;
4. after all labeling boxes have been assigned, the center point of each cluster is recomputed as the mean width and height of the boxes in that cluster: W_i = (1/N_i) Σ w, H_i = (1/N_i) Σ h, where N_i is the number of labeling boxes in the i-th cluster;
5. steps 3 and 4 are repeated until the change in the cluster centers is small.
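The clustering procedure above can be sketched in NumPy. Since a labeling box and a cluster center share the same center point, the IoU depends only on the widths and heights (this is an illustrative sketch, not the patent's code; the initialization and iteration cap are assumptions):

```python
import numpy as np

def iou_wh(wh, centers):
    """IoU between boxes and cluster centers that share a center point,
    so the intersection is min(w) * min(h). wh: N x 2, centers: k x 2."""
    inter = (np.minimum(wh[:, None, 0], centers[None, :, 0])
             * np.minimum(wh[:, None, 1], centers[None, :, 1]))
    union = wh.prod(1)[:, None] + centers.prod(1)[None, :] - inter
    return inter / union

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster labeling-box (w, h) pairs with distance d = 1 - IoU;
    the resulting cluster centers are the anchor sizes."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(wh, centers), axis=1)    # step 3
        new = np.array([wh[assign == i].mean(0) if np.any(assign == i)
                        else centers[i] for i in range(k)])       # step 4
        if np.allclose(new, centers):                             # step 5
            break
        centers = new
    return centers
```

Using 1 − IoU rather than Euclidean distance keeps large and small boxes comparable, which matters here because head sizes vary widely with camera distance.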
As another embodiment of the present invention
FCHD + FPN network model:
FPN module
The pre-trained resnet101 model is used as the base model of the whole framework. The higher-level feature map is first upsampled by a factor of 2 (nearest-neighbour upsampling), a 1 × 1 convolution makes the channel counts of the two levels consistent, and the result is merged with the corresponding lower-level feature map by element-wise addition. This process is iterated until the finest feature map is generated. Each merged feature map is then processed with a 3 × 3 convolution kernel (to eliminate the aliasing effects of upsampling) to generate the final required feature map, as shown in fig. 3.
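A minimal sketch of one top-down merge step, with the 1 × 1 channel-matching convolution and the 3 × 3 smoothing convolution omitted for brevity (an illustration of the upsample-and-add operation only, not the actual network code):

```python
import numpy as np

def upsample2_nearest(x):
    """2x nearest-neighbour upsampling of a C x H x W feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(top, lateral):
    """One top-down FPN step: upsample the coarser map 2x and add it
    element-wise to the lateral (lower-level) feature map."""
    return upsample2_nearest(top) + lateral

# toy pyramid: a 1-channel 2x2 high-level map merged into a 4x4 map
top = np.ones((1, 2, 2))
lateral = np.zeros((1, 4, 4))
merged = fpn_merge(top, lateral)
```

In the real network this step repeats down the pyramid, and each merged map is smoothed with a 3 × 3 convolution before use.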
Data set preparation
Brainwash public dataset: 11,917 images, 91,146 labeling boxes; source: store surveillance video data
SCUT_HEAD public dataset: 4,405 images, 111,251 labeling boxes; source: classroom surveillance video and web-crawled data
Self-annotated dataset: 2,000 images; source: subway video data
Loss function
The loss function used to train the model is a multi-task loss:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

where i is the index of a selected anchor, p_i is the predicted probability that anchor i contains a head, and p_i* is the ground-truth label (1 for a head anchor, 0 for background). L_cls denotes the classification loss and L_reg the regression loss: L_cls is computed over all sampled anchors, while L_reg is computed only over the positive anchors. L_cls is the log loss over the two classes (head and background), and L_reg is the smooth L1 loss. The two terms are normalized by N_cls and N_reg, the numbers of samples used for classification and regression respectively, with λ balancing them.
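The smooth L1 penalty used for the regression term has the standard Faster R-CNN form, which can be written out as a short sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise.
    Quadratic near zero (stable gradients for small box errors),
    linear for large errors (robust to outlier boxes)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)
```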
Hyper-parametric design
The base model is initialized with the pre-trained resnet101 weights, and both the pre-trained layers and the newly added layers are trained further. The new layers are initialized with random weights sampled from a standard normal distribution. The weight decay during training is 0.0005. The entire model is fine-tuned using SGD. The learning rate is set to 0.001 and the model is trained for 30 epochs, approaching 440k iterations. After 15 epochs, the learning rate is decayed by a factor of 0.1.
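The stated schedule (base rate 0.001, decayed by a factor of 0.1 after 15 of 30 epochs) amounts to a simple step schedule; this helper is an illustration, not code from the patent:

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.1, step=15):
    """Step schedule: base_lr for the first `step` epochs,
    then multiplied by `decay` every `step` epochs."""
    return base_lr * (decay ** (epoch // step))
```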
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (4)

1. A real-time human head detection method based on a stair scene is characterized by comprising the following steps: the method comprises the following specific steps:
s1: acquiring a large number of picture data sets of stair scenes in public places;
s2: labeling the data set, wherein a labeling box needs to contain information of human head and shoulders;
s3: dividing a data set into a training set, a testing set and a verification set;
s4: enhancing the training set data;
s5: extracting the labeling-box data from the training set, performing k-means clustering on the box dimensions, and selecting cluster categories to obtain the different anchor sizes;
s6: constructing an FCHD + FPN network architecture;
s7: training by using an FCHD + FPN network model according to the labeled stair head training set;
s8: testing the accuracy of the trained model in the verification set;
s9: and detecting the human head in a real stair scene by using the generated model.
2. The stair scene-based real-time human head detection method according to claim 1, wherein: the public place of the step S1 includes a shopping mall or a subway.
3. The stair scene-based real-time human head detection method according to claim 1, wherein: in step S4, the augmentation methods include horizontal flipping, random cropping, color jittering, scale transformation, and rotation transformation.
4. The stair scene-based real-time human head detection method according to claim 1, wherein: in step S6, the FCHD + FPN network architecture adds an FPN network on the basis of FCHD, adopts resnet101 as the FCHD base model, and replaces the NMS algorithm with the SOFT-NMS algorithm.
CN201910844880.3A 2019-09-07 2019-09-07 Real-time human head detection method based on stair scene Pending CN110705366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910844880.3A CN110705366A (en) 2019-09-07 2019-09-07 Real-time human head detection method based on stair scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910844880.3A CN110705366A (en) 2019-09-07 2019-09-07 Real-time human head detection method based on stair scene

Publications (1)

Publication Number Publication Date
CN110705366A true CN110705366A (en) 2020-01-17

Family

ID=69194806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910844880.3A Pending CN110705366A (en) 2019-09-07 2019-09-07 Real-time human head detection method based on stair scene

Country Status (1)

Country Link
CN (1) CN110705366A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368749A (en) * 2020-03-06 2020-07-03 创新奇智(广州)科技有限公司 Automatic identification method and system for stair area
CN111832465A (en) * 2020-07-08 2020-10-27 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
CN111950612A (en) * 2020-07-30 2020-11-17 中国科学院大学 FPN-based weak and small target detection method for fusion factor
CN113505771A (en) * 2021-09-13 2021-10-15 华东交通大学 Double-stage article detection method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ADITYA VORA: "FCHD: A fast and accurate head detector", arXiv *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368749A (en) * 2020-03-06 2020-07-03 创新奇智(广州)科技有限公司 Automatic identification method and system for stair area
CN111368749B (en) * 2020-03-06 2023-06-13 创新奇智(广州)科技有限公司 Automatic identification method and system for stair area
CN111832465A (en) * 2020-07-08 2020-10-27 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
CN111832465B (en) * 2020-07-08 2022-03-29 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
CN111950612A (en) * 2020-07-30 2020-11-17 中国科学院大学 FPN-based weak and small target detection method for fusion factor
CN113505771A (en) * 2021-09-13 2021-10-15 华东交通大学 Double-stage article detection method and device
CN113505771B (en) * 2021-09-13 2021-12-03 华东交通大学 Double-stage article detection method and device

Similar Documents

Publication Publication Date Title
US10019652B2 (en) Generating a virtual world to assess real-world video analysis performance
CN110705366A (en) Real-time human head detection method based on stair scene
Etten City-scale road extraction from satellite imagery v2: Road speeds and travel times
Marin et al. Learning appearance in virtual scenarios for pedestrian detection
US11854244B2 (en) Labeling techniques for a modified panoptic labeling neural network
CN108537743A (en) A kind of face-image Enhancement Method based on generation confrontation network
CN111598030A (en) Method and system for detecting and segmenting vehicle in aerial image
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN111832489A (en) Subway crowd density estimation method and system based on target detection
CN111553397A (en) Cross-domain target detection method based on regional full convolution network and self-adaption
CN112084869A (en) Compact quadrilateral representation-based building target detection method
US11853892B2 (en) Learning to segment via cut-and-paste
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN114117614A (en) Method and system for automatically generating building facade texture
CN112633220A (en) Human body posture estimation method based on bidirectional serialization modeling
CN116453121B (en) Training method and device for lane line recognition model
CN111626134A (en) Dense crowd counting method, system and terminal based on hidden density distribution
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN115577768A (en) Semi-supervised model training method and device
CN115829915A (en) Image quality detection method, electronic device, storage medium, and program product
Liu et al. Translational Symmetry-Aware Facade Parsing for 3-D Building Reconstruction
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117