CN111401418A - Employee dressing specification detection method based on improved Faster r-cnn


Info

Publication number
CN111401418A
CN111401418A (application CN202010147949.XA)
Authority
CN
China
Prior art keywords
layer
network
convolution
block
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010147949.XA
Other languages
Chinese (zh)
Inventor
包晓安
黄友
张娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University Of Science And Technology Tongxiang Research Institute Co ltd
Original Assignee
Zhejiang University Of Science And Technology Tongxiang Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University Of Science And Technology Tongxiang Research Institute Co ltd filed Critical Zhejiang University Of Science And Technology Tongxiang Research Institute Co ltd
Priority to CN202010147949.XA priority Critical patent/CN111401418A/en
Publication of CN111401418A publication Critical patent/CN111401418A/en
Pending legal-status Critical Current

Classifications

    • G (PHYSICS) › G06 (COMPUTING; CALCULATING OR COUNTING) › G06F (ELECTRIC DIGITAL DATA PROCESSING) › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation › G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques › G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 Fusion techniques
    • G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS) › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V (IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING) › G06V10/00 Arrangements for image or video recognition or understanding › G06V10/20 Image preprocessing › G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V20/00 Scenes; Scene-specific elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an employee dressing specification detection method based on an improved Faster r-cnn. The method collects, labels and augments a sample data set for different application scenarios; establishes an improved Faster r-cnn network model; trains the improved network with the enhanced training sample set; detects the test sample set with the trained network model; and analyzes whether the detection results meet the predefined dressing standard, feeding back any non-standard dressing that is detected. Compared with one-stage target detection algorithms, the improved algorithm achieves higher accuracy; the improved network both detects multi-scale targets more effectively and improves the detection rate. The method can be applied directly to real-time detection on surveillance video and serves to monitor employee dress compliance and issue reminders, thereby effectively improving the efficiency of staff management.

Description

Employee dressing specification detection method based on improved Faster r-cnn
Technical Field
The invention relates to the fields of computer vision, target detection and deep learning for detecting target behaviors, and particularly to an employee dressing specification detection method based on an improved Faster r-cnn.
Background
In recent years, with the rapid development of computers, networks, and image processing and transmission technologies, companies increasingly manage their employees through intelligent, information-based systems. To supervise employee behavior, many companies install monitoring equipment: cameras record the employees, and dedicated staff watch the screens for abnormal behavior. However, as a company grows, the numbers of employees and cameras grow with it; supervising employee behavior manually is time-consuming and labor-intensive, and violations are easily missed through visual fatigue or a moment's inattention. Given the present situation of large volumes of surveillance video, low utilization, complex management and heavy staffing requirements, and following the requirements and specifications that enterprise management defines for different scenes, the invention applies a deep-learning-based Faster r-cnn network model to detect the images captured by the cameras in real time. The main task is to apply deep-learning-based target detection within a monitoring system to detect moving targets of interest in the video in real time, which can effectively improve the efficiency of safety monitoring and save substantial financial and material resources.
Existing target detection algorithms fall into two main classes: image recognition methods based on traditional feature operators, and target detection methods based on deep learning. Methods based on traditional feature operators are not robust to scene changes and detect poorly once the scene varies. Deep-learning-based detection algorithms in turn divide into single-stage and two-stage algorithms, and both kinds struggle to balance detection precision against detection speed. The improved Faster r-cnn network model proposed by the invention achieves a higher detection speed while maintaining high detection accuracy.
Disclosure of Invention
In order to solve the problem of detecting the dressing standard of employees, the invention provides an employee dressing specification detection method based on an improved Faster r-cnn that extracts multi-level, more robust image features, so that the model attains a better detection speed while maintaining good detection precision. The specific technical scheme is as follows:
A. collecting and labeling sample data sets aiming at different application scenes;
B. performing data enhancement on the sample data set;
C. establishing an improved Faster r-cnn network model, which comprises a feature pyramid network, a guided anchor point frame generation network, region-of-interest mapping and feature map pooling, a classification sub-network and a regression frame sub-network; an image from the application scene is sent into the feature pyramid network to extract multi-scale semantic feature maps, the multi-scale semantic feature maps are sent into the guided anchor point frame generation network to generate anchor point frames, the multi-scale semantic feature maps with their anchor point frames are pooled to obtain feature maps of consistent scale, and the feature maps are sent into the classification sub-network to predict the category of each anchor point frame and into the regression frame sub-network to predict its position (the wiring of these components is sketched in the code following step E);
D. training the improved Faster r-cnn network model by adopting the sample data set after data enhancement to generate a training model;
E. acquiring an employee dressing image and inputting it into the training model generated in step D to obtain the category and position of the dressing to be detected; if the detected dressing category or position is found not to conform to the preset standard of the employee dressing specification, a reminding signal is sent.
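By way of illustration only, the following minimal sketch shows how the five components of step C could be wired together in a forward pass. It is written in PyTorch (an assumption; the disclosure specifies no framework), and the constructor arguments are hypothetical stand-in modules for the networks described above.

```python
import torch.nn as nn

class ImprovedFasterRCNN(nn.Module):
    """Wiring of the five step-C components; each argument is a hypothetical
    stand-in module for the corresponding network described above."""
    def __init__(self, fpn, anchor_net, roi_pool, cls_net, reg_net):
        super().__init__()
        self.fpn = fpn                 # feature pyramid network
        self.anchor_net = anchor_net   # guided anchor point frame generation network
        self.roi_pool = roi_pool       # region-of-interest mapping and pooling
        self.cls_net = cls_net         # classification sub-network
        self.reg_net = reg_net         # regression frame sub-network

    def forward(self, images):
        feats = self.fpn(images)                       # multi-scale semantic feature maps
        anchors, adapted = self.anchor_net(feats)      # anchor point frames + adapted features
        rois = self.roi_pool(adapted, anchors)         # feature maps of consistent scale
        return self.cls_net(rois), self.reg_net(rois)  # categories and positions
```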
Further, the step A specifically comprises:
A1, collecting a sample data set, including manual collection of images in the actual application scenes and downloading of network images;
A2, labeling the sample data set: the targets to be detected in the sample images are annotated with the LabelImg tool according to the employee dressing standard to be checked; the labels of the annotated rectangular boxes comprise employee clothing (staff), non-employee clothing (notstaff), apron (pinafore), hat (hat) and mask (mask), and after labeling is completed the annotations are automatically stored as an annotation file corresponding to the sample image.
Further, the step B specifically includes:
B1, data expansion, including flipping, scaling and brightness change;
flipping: the sample data set images are flipped vertically and horizontally, and each flipped image is used as a new sample image;
scaling: scaling operations are performed on the sample data set images, with scale ratios of 0.5, 0.8, 1.2 and 1.5 respectively;
brightness change: the brightness of the sample images is varied to simulate the change of illumination intensity under real conditions, with brightness ratios of 0.5, 0.75, 1.25 and 1.50 respectively;
B2, fusion: the sample data set images containing targets to be detected are fused with randomly selected normal images containing no target to be detected, with fusion coefficients of 0.3, 0.5 and 0.7 respectively, and the annotation files of the sample images are updated;
B3, cropping: the sample data set images are cropped randomly, in length only, in width only, or overall, with the random interval being 10% of the length and of the width respectively;
after each of the above operations, the annotation file of the sample image is updated as well (a code sketch of these operations follows).
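For concreteness, a minimal sketch of the step-B operations using the Pillow library (an assumption; the disclosure names no tooling). The coefficients follow the text; updating the bounding boxes in each derived annotation file is noted but omitted here, and the fusion coefficient is assumed to weight the annotated image.

```python
import random
from PIL import Image, ImageEnhance

FLIPS = [Image.FLIP_TOP_BOTTOM, Image.FLIP_LEFT_RIGHT]   # B1: flipping
SCALES = [0.5, 0.8, 1.2, 1.5]                             # B1: scaling ratios
BRIGHTNESS = [0.5, 0.75, 1.25, 1.50]                      # B1: brightness ratios
FUSION = [0.3, 0.5, 0.7]                                  # B2: fusion coefficients

def augment(sample: Image.Image, normal: Image.Image):
    """Yield augmented copies of one annotated sample image (steps B1-B3).
    The bounding boxes in the annotation file must be updated per copy."""
    for f in FLIPS:
        yield sample.transpose(f)
    w, h = sample.size
    for s in SCALES:
        yield sample.resize((int(w * s), int(h * s)))
    for b in BRIGHTNESS:
        yield ImageEnhance.Brightness(sample).enhance(b)
    normal = normal.resize(sample.size).convert(sample.mode)
    for a in FUSION:   # coefficient a weights the annotated image (assumed)
        yield Image.blend(normal, sample, a)
    dx, dy = int(0.10 * w), int(0.10 * h)   # B3: crop within 10% of each side
    yield sample.crop((random.randint(0, dx), random.randint(0, dy),
                       w - random.randint(0, dx), h - random.randint(0, dy)))
```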
Further, the feature pyramid network in the step C is composed of a bottom-up feedforward calculation network and a top-down lateral connection network;
the feedforward calculation network consists of an initialization convolutional layer, an initialization pooling layer, a first block layer, a second block layer, a third block layer and a fourth block layer which are sequentially stacked; the initialization convolutional layer consists of a convolutional layer, a batch normalization layer and a nonlinear activation layer, the size of the convolutional kernel is 7 x 7, the step length of the convolutional kernel is 2, and the number of generated feature channels is 64; the step length of the initialization pooling layer is 2; the output of the initialization pooling layer is connected to four block layers: the first block layer comprises 3 residual modules, the second block layer comprises 4 residual modules, the third block layer comprises 6 residual modules and the fourth block layer comprises 3 residual modules; each residual module comprises three convolution layers, with kernels of 1 x 1, 3 x 3 and 1 x 1 respectively, together with batch normalization layers and activation function layers, and the numbers of convolution kernels of the convolution layers in the four block layers are 64, 128, 256 and 512 in sequence; each block layer also comprises a branch consisting of a batch normalization layer, a nonlinear activation layer and a convolution layer, the convolution size of this convolution layer is 1 x 1 and the numbers of generated feature map channels are 256, 512, 1024 and 512 respectively; the input of the branch is the same as the input of the first residual module in each block layer, and the output of the branch and the output of the first residual module are added to form the input of the next residual module; the feature map output by each block layer serves on the one hand as the input of the next block layer and on the other hand as an input of the lateral connection network;
the lateral connection network takes the feature maps generated by the first layer block, the second layer block, the third layer block and the fourth layer block in the feedforward calculation network as the input of lateral connection, and uses convolution layers with convolution kernel size of 1 x 1 and step length of 1 to respectively operate, and then adds the result with the result of top-down up sampling to output four semantic feature maps with different levels; and operating the feature map generated by the fourth layer block in the feedforward calculation network through a convolution layer with the convolution kernel size of 1 x 1 and the step length of 2 to obtain a fifth semantic feature map.
Furthermore, the guided anchor frame generation network in the step C guides the generation of the anchor frame by using a semantic feature map, and comprises a position prediction branch, a shape prediction branch and a feature adaptive branch;
the position prediction branch is used for predicting which areas should be used as central points to generate anchor points, dividing the area of the whole feature map into a target central area, a peripheral area and an ignored area; the small area at the center of a real target frame, as it maps onto the feature map, is marked as the target central area and used as a positive sample during training, and the remaining areas are marked as ignored or negative samples according to their distance from the center; the position prediction branch consists of a convolution layer, a nonlinear activation layer and a loss layer, the convolution kernel size of the convolution layer is 1 x 1, the number of generated feature channels is 1, the output of the position prediction branch is the probability that each position of the feature map is a target center, and the positions predicted to lie in the target central area serve as candidate central areas of anchor points;
the shape prediction branch is used for predicting the optimal length and width by giving an anchor point central point, belongs to a regression problem, firstly, 9 groups of w and h are sampled in a target central area by adopting an approximate method, the overlapping degree of the 9 groups and a target real frame is calculated, and the w and h with the maximum overlapping degree are the w and h of the current anchor point position; the shape prediction network is a convolution layer with convolution kernel size of 1 x 1, the number of generated characteristic channels is 2, and the output of the network is the predicted value of the length and width of the anchor point frame at each position of the characteristic diagram;
the feature self-adaptive branch directly blends the shape information of the anchor frame into the feature map by using a deformable convolution operation, so that the newly obtained feature map can adapt to the shape of the anchor frame at each position; using the predicted values of the length and width of the anchor frame at each position of the feature map, the position offsets of the next layer of convolution kernels are obtained by a 1 x 1 convolution, and the original feature map is then corrected by a 3 x 3 deformable convolution operation to obtain a feature map adapted to the shape of the anchor frame.
Further, the region-of-interest mapping and feature map pooling in step C maps the anchor point frames into the feature map according to the coordinates and position information of the anchor point frames obtained from the guided anchor point frame generation network, and pooling is then performed to obtain feature maps of consistent scale.
Further, the classification sub-network and the regression frame sub-network described in step C both calculate the corresponding class and position coordinates through a fully connected network; the output size of the classification sub-network is 2 × k × A, and the output size of the regression frame sub-network is 4 × A, where k represents the number of classes and A represents the number of anchor points.
Further, the classification sub-network calculates a classification loss using a focus loss function.
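A minimal sketch of the focus (focal) loss referred to here, in PyTorch; the alpha and gamma values follow the common defaults of Lin et al., which the disclosure does not specify.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross-entropy down-weighted for easy examples so
    that hard and minority samples dominate the gradient."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # positive/negative weighting
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```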
Further, the step D specifically includes:
D1, pre-training of the network: the improved Faster r-cnn network model is pre-trained on the VOC data set, and the trained network model parameters are stored as a pth file;
D2, secondary training of the network: the pre-trained model is loaded and trained a second time with the data-enhanced sample data set; the initial learning rate is set to 0.01 and its value decreases stepwise as the number of training iterations grows; the batch training size is set to 16; the trained model is finally obtained and its parameter file is saved as a pth file.
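The D1/D2 schedule might look as follows in PyTorch; model, train_set and model_loss are placeholders from the surrounding pipeline, and the momentum value, decay interval and epoch count are assumptions, since only the initial rate (0.01), the stepwise decay and the batch size (16) are stated.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

model.load_state_dict(torch.load("pretrained_voc.pth"))     # D1: VOC pre-trained weights
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)  # initial lr 0.01
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)      # stepwise lr decay

for epoch in range(30):                 # number of epochs is an assumption
    for images, targets in loader:
        optimizer.zero_grad()
        loss = model_loss(model, images, targets)   # combined cls + box loss
        loss.backward()
        optimizer.step()
    scheduler.step()                    # lr decreases as training progresses

torch.save(model.state_dict(), "trained_model.pth")         # D2: saved as .pth
```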
Further, the step E specifically includes:
E1, the test set samples are detected with the trained model; the detection results, namely the predicted category, the confidence and the corresponding position coordinates, are stored as a pth file;
E2, the precision of each category to be detected and the average precision over all categories are calculated from the detection results of E1.
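A small sketch of the E2 computation under an assumed record layout: each detection carries its class, confidence, box and a flag saying whether it was matched to a ground-truth box beforehand (e.g. at IoU >= 0.5).

```python
def category_precision(detections):
    """Per-category precision and its average over categories (step E2).
    detections: iterable of (cls, conf, box, matched) records (assumed layout)."""
    tp, fp = {}, {}
    for cls, conf, box, matched in detections:
        bucket = tp if matched else fp
        bucket[cls] = bucket.get(cls, 0) + 1
    classes = set(tp) | set(fp)
    precision = {c: tp.get(c, 0) / (tp.get(c, 0) + fp.get(c, 0)) for c in classes}
    return precision, sum(precision.values()) / len(precision)
```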
Compared with the prior art, the invention has the beneficial effects that:
(1) the data enhancement operation is carried out on the collected sample set, so that the problems of insufficient network learning and poor detection effect caused by too small actual sample amount can be solved;
(2) by adopting a residual network built from pre-activated residual units and using the PReLU function instead of the ReLU function as the nonlinear activation function, image features with better robustness can be extracted at the cost of very few additional network parameters, improving the accuracy of network target detection;
(3) the image characteristic pyramid network is adopted to fuse the characteristics, so that multi-scale characteristics can be obtained, and good detection effects are achieved for targets with different scales;
(4) the network is generated by adopting the guide anchor point frame, so that the detection time can be greatly shortened while higher detection accuracy is ensured; the classification sub-network calculates the classification loss by adopting a focus loss function, so that the problems of unbalance of positive and negative samples of a candidate frame and unbalance of difficult and easy samples in target detection can be effectively solved;
(5) along with the increase of the training times, the learning rate is gradually reduced in a step-type mode, and the training speed can be effectively accelerated. Therefore, the invention is a technical breakthrough for the detection method of the dressing specification of the staff and solves the problems in the existing detection methods.
Drawings
FIG. 1 is a diagram of the steps of the method of the present invention;
FIG. 2 is a schematic diagram of an improved Faster r-cnn network;
FIG. 3 is a diagram of a pre-activation residual block architecture;
FIG. 4 is a schematic diagram of the PReLU function;
FIG. 5 is a diagram of an image pyramid network structure;
FIG. 6 is a diagram of a network structure generated by a guided anchor box;
FIG. 7 is a flow chart of employee dressing specification detection.
Detailed Description
The invention is further described by the following detailed description in conjunction with the accompanying drawings.
Referring to fig. 1, the implementation steps of the present invention are as follows:
A. collecting and labeling sample data set aiming at different application scenes
The sample data set is collected from two sources: images of the various targets to be detected downloaded from the network, and images collected manually in the actual application scene. The data set is then labeled with the LabelImg software: every target to be detected that appears in each image is annotated so that none is missed, and after annotation the labels are automatically stored in an annotation file with the same name as the image.
B. Data enhancement for training sample set
The sample set obtained in step A is enhanced mainly in three ways: data expansion, fusion and cropping. Data expansion comprises flipping, scaling and brightness change. Flipping: the sample data set images are flipped vertically and horizontally, and each flipped image is used as a new sample image. Scaling: scaling operations are performed on the sample data set images, with scale ratios of 0.5, 0.8, 1.2 and 1.5 respectively. Brightness change: the brightness of the sample data set images is varied to simulate the change of illumination intensity under real conditions, with coefficients of 0.5, 0.75, 1.25 and 1.50 respectively. Fusion: each sample data set image containing a target to be detected is fused with a randomly selected normal image containing no target to be detected, with fusion coefficients of 0.3, 0.5 and 0.7 respectively. Cropping: the sample data set images are cropped randomly, in length only, in width only, or overall, with the random interval being 10% of the length and of the width respectively. After each of these operations the annotation file of the affected sample image is updated. The new data obtained in these three ways are combined with the original data into the sample data set.
C. Establishing the improved Faster r-cnn network model
As shown in fig. 2, the improved Faster r-cnn network model consists of a feature pyramid network, a guided anchor point frame generation network, region-of-interest mapping and feature map pooling, a classification sub-network, and a regression frame sub-network. The image is sent into the feature pyramid network to extract multi-scale features; the extracted feature maps are sent into the guided anchor point frame generation network to generate anchor point frames; the multi-scale feature maps are then pooled to obtain feature maps of consistent scale; finally, the feature maps are sent into the classification sub-network to predict the category of each anchor point frame and into the regression sub-network to predict its position.
The characteristic pyramid network is composed of a bottom-up feedforward calculation network and a top-down lateral connection network, wherein the bottom-up feedforward calculation network is composed of an initialization convolution layer, an initialization pooling layer, a first block layer, a second block layer, a third block layer and a fourth block layer which are sequentially stacked.
The initialization convolutional layer consists of a convolutional layer, a batch normalization layer and a nonlinear activation layer. The convolution kernel size is 7 × 7, the convolution kernel step size is 2, and the number of generated feature channels is 64. The step size for initializing the pooling layer is 2.
As shown in fig. 3, each block layer is composed of several pre-activation residual modules, each module containing three groups of batch normalization layer, nonlinear activation layer and convolution layer. The first block layer has 3 blocks, where the first block consists of two parallel branches. The first branch consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes of the convolution layers are 1 x 1, 3 x 3 and 1 x 1 respectively, and the numbers of generated feature map channels are 64, 64 and 256 respectively. The second branch consists of a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution size of the convolution layer is 1 x 1 and the number of generated feature map channels is 256. The feature maps of the first branch and the second branch are added to form the input of the next block. The second block consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes are 1 x 1, 3 x 3 and 1 x 1 respectively, and the numbers of generated feature map channels are 64, 64 and 256 respectively; its output is then added to the input of the current block to form the input of the next block. The third block has the same structure as the second block.
The second block layer has 4 blocks, where the first block consists of two parallel branches. The first branch consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes of the convolution layers are 1 x 1, 3 x 3 and 1 x 1 in sequence, the numbers of generated feature map channels are 128, 128 and 512 respectively, and the step length of the first convolution layer is 2. The second branch consists of a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution size of the convolution layer is 1 x 1, the number of generated feature map channels is 512, and the step length of the convolution layer is 2. The feature maps of the first branch and the second branch are added to form the input of the next block. The second block consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes are 1 x 1, 3 x 3 and 1 x 1 respectively, and the numbers of generated feature map channels are 128, 128 and 512 respectively; its output is then added to the input of the current block to form the input of the next block. The latter two blocks have the same structure as the second block.
The third block layer has 6 blocks, where the first block consists of two parallel branches. The first branch consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes of the convolution layers are 1 x 1, 3 x 3 and 1 x 1 in sequence, the numbers of generated feature map channels are 256, 256 and 1024 respectively, and the step length of the first convolution layer is 2. The second branch consists of a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution size of the convolution layer is 1 x 1, the number of generated feature map channels is 1024, and the step length of the convolution layer is 2. The feature maps of the first branch and the second branch are added to form the input of the next block. The second block consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes are 1 x 1, 3 x 3 and 1 x 1 respectively, and the numbers of generated feature map channels are 256, 256 and 1024 respectively; its output is then added to the input of the current block to form the input of the next block. The remaining four blocks have the same structure as the second block.
The fourth block layer has 3 blocks, where the first block consists of two parallel branches. The first branch consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes of the convolution layers are 1 x 1, 3 x 3 and 1 x 1 in sequence, the numbers of generated feature map channels are 128, 128 and 512 respectively, and the step length of the first convolution layer is 2. The second branch consists of a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution size of the convolution layer is 1 x 1, the number of generated feature map channels is 512, and the step length of the convolution layer is 2. The feature maps of the first branch and the second branch are added to form the input of the next block. The second block consists of a batch normalization layer, a nonlinear activation layer, a convolution layer, a batch normalization layer, a nonlinear activation layer and a convolution layer; the convolution kernel sizes are 1 x 1, 3 x 3 and 1 x 1 respectively, and the numbers of generated feature map channels are 128, 128 and 512 respectively; its output is then added to the input of the current block to form the input of the next block. The third block has the same structure as the second block.
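Gathering the description above into code, a sketch of one pre-activation residual block and the first block layer; PyTorch is assumed, and the block follows the (BN, PReLU, conv) ordering of FIG. 3 rather than any official implementation.

```python
import torch.nn as nn

class PreActBlock(nn.Module):
    """Pre-activation residual block per FIG. 3: three (BN, PReLU, conv)
    groups with 1x1, 3x3, 1x1 kernels; the first block of a block layer
    adds a parallel (BN, PReLU, 1x1 conv) projection branch."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1, project=False):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.PReLU(),
            nn.Conv2d(in_ch, mid_ch, 1, stride=stride, bias=False),  # text: stride on 1st conv
            nn.BatchNorm2d(mid_ch), nn.PReLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.PReLU(),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
        )
        self.shortcut = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.PReLU(),
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
        ) if project else nn.Identity()

    def forward(self, x):
        return self.body(x) + self.shortcut(x)   # add the two branches

# First block layer: 3 blocks, channels 64/64/256, as described above.
block_layer1 = nn.Sequential(
    PreActBlock(64, 64, 256, project=True),
    PreActBlock(256, 64, 256),
    PreActBlock(256, 64, 256),
)
```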
The image is input into the feedforward computing network, which performs the series of operations shown in figure 3 to extract the multi-scale feature maps.
The nonlinear activation function is the PReLU function shown in FIG. 4:

PReLU(x) = x for x > 0, and PReLU(x) = a · x for x ≤ 0,

where a is a learnable parameter; experiments show that setting a to 0.8 gives a better effect.
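In PyTorch the same activation is available directly; a tiny usage sketch with the stated a = 0.8:

```python
import torch
import torch.nn as nn

act = nn.PReLU(init=0.8)   # learnable parameter a, initialised to the stated 0.8
x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
print(act(x))              # -> [-1.6, -0.4, 0.0, 1.0, 3.0] before any training
```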
The top-down lateral connection network takes the multi-scale feature maps computed by the feedforward calculation network as input and outputs feature maps in which high- and low-level features are fused. As shown in fig. 5, the feature maps generated by the first, second, third and fourth block layers of the feedforward computing network are used as the inputs of the lateral connections and denoted m1, m2, m3 and m4. Operating on m4 with a convolution layer of kernel size 1 x 1 gives a feature map p4 with 256 channels; operating on m3 with a 1 x 1 convolution layer gives a 256-channel feature map p31, and p4 is up-sampled and added to p31 to obtain p3; operating on m2 with a 1 x 1 convolution layer gives a 256-channel feature map p21, and p3 is up-sampled and added to p21 to obtain p2; operating on m1 with a 1 x 1 convolution layer gives a 256-channel feature map p11, and p2 is up-sampled and added to p11 to obtain p1; operating on m4 with a 1 x 1 convolution layer of step length 2 gives a feature map p5 with 256 channels.
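The lateral-connection arithmetic just described, as a PyTorch sketch; the input channel counts of m1-m4 (256, 512, 1024, 512) are taken from the block-layer description above, and nearest-neighbour up-sampling by a factor of 2 is an assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class LateralFPN(nn.Module):
    """Top-down lateral connections: 1x1 convs reduce m1-m4 to 256 channels,
    up-sampled maps are added top-down, and p5 is a strided 1x1 conv on m4."""
    def __init__(self, in_channels=(256, 512, 1024, 512)):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, 256, 1) for c in in_channels)
        self.p5_conv = nn.Conv2d(in_channels[3], 256, 1, stride=2)

    def forward(self, m1, m2, m3, m4):
        p4 = self.lateral[3](m4)
        p3 = self.lateral[2](m3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.lateral[1](m2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        p1 = self.lateral[0](m1) + F.interpolate(p2, scale_factor=2, mode="nearest")
        p5 = self.p5_conv(m4)          # fifth semantic feature map
        return p1, p2, p3, p4, p5
```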
The guided anchor frame generation network utilizes semantic features to guide the generation of anchor frames. As shown in fig. 6, the guided anchor block generation network is composed of three parts, namely a position prediction branch, a shape prediction branch and a feature adaptive branch:
The position prediction branch is used to predict which areas should serve as central points for generating anchor points. The area of the whole feature map is divided into a target central area, a peripheral area and an ignored area; the small area at the center of a real target frame, as it maps onto the feature map, is marked as the target central area and used as a positive sample during training, and the remaining areas are marked as ignored or negative samples according to their distance from the center. The position prediction network consists of a convolution layer, a nonlinear activation layer and a loss layer; the convolution kernel size of the convolution layer is 1 x 1 and the number of generated feature channels is 1. The output of the network is the probability that each location of the feature map is the center of a target, and the positions predicted to lie in the target central area serve as candidate central areas of anchor points;
The shape prediction branch predicts the optimal length and width given an anchor point center, which is a regression problem. First, nine candidate pairs of w and h are sampled in the target central area using an approximate method. The overlap of each of the nine pairs with the target real frame is then calculated, and the w and h with the maximum overlap become the w and h of the current anchor point position. The shape prediction network is a convolution layer with kernel size 1 x 1 generating 2 feature channels, and its output is the predicted value of the length and width of the anchor point frame at each position of the feature map;
The feature self-adaptive branch blends the shape information of the anchor point frame directly into the feature map using a deformable convolution operation, so that the newly obtained feature map adapts to the shape of the anchor point frame at each position; that is, a larger anchor point frame corresponds to a larger receptive field and a smaller anchor point frame to a smaller one. From the w and h predicted by the shape prediction branch, a 1 x 1 convolution yields the position offsets of the next convolution kernel, and a 3 x 3 deformable convolution operation then corrects the original feature map to obtain a feature map adapted to the anchor point frame shape;
The feature map obtained above is input, on the one hand, into the position prediction branch of the guided anchor point frame generation network to predict the target central region and, on the other hand, into its shape prediction branch to predict the optimal length and width at each anchor point. Combined with the relevant information on the image scale change, the anchor point candidate frames are obtained. A 1 x 1 convolution of the shape prediction result gives the position offsets of the next layer of convolution kernels, and a 3 x 3 deformable convolution operation then corrects the original feature map to obtain the feature map adapted to the anchor point frame shapes.
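A compact sketch of the three branches, using torchvision's DeformConv2d for the deformable convolution; the sigmoid on the location output and the offset-channel count (2 · 3 · 3 = 18) follow the standard guided-anchoring construction, which the text paraphrases.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class GuidedAnchorHead(nn.Module):
    """Position branch (1x1 conv -> 1 channel, sigmoid centre probability),
    shape branch (1x1 conv -> 2 channels, predicted w and h), and feature
    adaptation (1x1 conv on the shape output -> offsets for a 3x3 deformable
    conv that corrects the original feature map)."""
    def __init__(self, channels=256):
        super().__init__()
        self.loc = nn.Conv2d(channels, 1, 1)      # position prediction branch
        self.shape = nn.Conv2d(channels, 2, 1)    # shape prediction branch
        self.offset = nn.Conv2d(2, 18, 1)         # offsets for a 3x3 kernel (2*3*3)
        self.adapt = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, feat):
        centre_prob = torch.sigmoid(self.loc(feat))  # prob. each position is a target centre
        wh = self.shape(feat)                        # predicted anchor w, h per position
        adapted = self.adapt(feat, self.offset(wh))  # feature map adapted to anchor shape
        return centre_prob, wh, adapted
```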
Region-of-interest mapping and feature map pooling: according to the coordinates and position information of the anchor point frames obtained from the guided anchor point frame generation network, the anchor point frames are mapped into the feature map, and a pooling operation then yields feature maps of fixed size. The fixed-size feature maps are sent into the classification sub-network to obtain the category prediction of each anchor point frame and, at the same time, into the regression sub-network to obtain its coordinate prediction. The classification sub-network and the regression frame sub-network each calculate the corresponding category and position coordinates through a fully connected network; the output size of the classification sub-network is 2 × k × A and that of the regression frame sub-network is 4 × A, where k represents the number of categories and A the number of anchor points. The classification sub-network calculates the classification loss with a focus loss function, which effectively alleviates the sample imbalance problem.
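The pooling and the two sub-networks might be sketched as follows; the pooled size (7 x 7), the hidden width and the feature stride are assumptions, while K = 5 matches the five labels of step A and the output sizes 2 × k × A and 4 × A follow the text.

```python
import torch.nn as nn
from torchvision.ops import roi_align

K, A, POOL = 5, 1, 7   # K = 5 labels from step A; A anchors per position; 7x7 pool (assumed)

class DetectionHeads(nn.Module):
    def __init__(self, channels=256, width=1024):
        super().__init__()
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(channels * POOL * POOL, width), nn.PReLU())
        self.cls = nn.Linear(width, 2 * K * A)   # classification sub-network: 2 x k x A
        self.reg = nn.Linear(width, 4 * A)       # regression frame sub-network: 4 x A

    def forward(self, feature_map, boxes):
        # Map the anchor point frames into the feature map and pool to a fixed
        # scale; boxes is a list of [N, 4] (x1, y1, x2, y2) tensors per image.
        rois = roi_align(feature_map, boxes, output_size=(POOL, POOL),
                         spatial_scale=1 / 16)   # stride of this feature level (assumed)
        h = self.fc(rois)
        return self.cls(h), self.reg(h)
```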
D. Pre-training, training and selection of networks
The improved Faster r-cnn network of step C is pre-trained on the VOC data set, and the trained network model parameters are stored as a pth file. The sample set enhanced in step B is divided into a training set and a test set in a ratio of 8:2. The pre-trained model is then loaded and trained with the training sample set: the initial learning rate is set to 0.01 and decreases stepwise as the number of training iterations grows; the batch size is set to 16; and the trained network parameter file is saved as a pth file.
E. Detecting a test sample set by using a trained network model
And D, detecting the test set sample by using the model trained in the step D, wherein detection results are respectively a prediction type, a confidence coefficient and a corresponding position coordinate, and are stored as a pth file. And respectively calculating the precision of the to-be-detected category and the average precision of all the categories according to the detection result.
The flow of employee dressing specification detection is shown in fig. 7. First, an image to be detected is obtained from the video monitoring terminal every 15 s. Second, the image is fed into the neural network loaded with the trained model parameters and the detection result is computed; the result comprises the detected object categories, confidences and corresponding position coordinates. Then, according to the predefined dressing standard, the detection result is analyzed to decide whether the dressing of the staff in the image meets the standard. Finally, the detection result is output; if the dressing is non-standard, the reminding service is started and the detection result is stored and sent to the relevant responsible person. Tests show that the method can detect 4 images per minute with a detection precision of 92%.
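Putting the FIG. 7 flow into a sketch: a frame is taken every 15 s, detected, and checked against the predefined dress code. Here detect() and notify() are hypothetical stand-ins for the trained model and the reminding service, and the camera URL and required-item set are examples only.

```python
import time
import cv2

REQUIRED = {"staff", "pinafore", "hat", "mask"}   # example dress code only

cap = cv2.VideoCapture("rtsp://camera.example/stream")   # assumed video source
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    detections = detect(frame)          # (category, confidence, box) triples
    found = {cls for cls, conf, box in detections if conf > 0.5}
    if (REQUIRED - found) or ("notstaff" in found):   # dressing not up to standard
        notify(frame, detections)       # store the result and alert the person in charge
    time.sleep(15)                      # one image to detect every 15 s
```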
The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims (9)

1. An employee dressing specification detection method based on improved Faster r-cnn is characterized by comprising the following steps:
A. collecting and labeling sample data sets aiming at different application scenes;
B. performing data enhancement on the sample data set;
C. establishing an improved Faster r-cnn network model, which comprises a feature pyramid network, a guided anchor point frame generation network, region-of-interest mapping and feature map pooling, a classification sub-network and a regression frame sub-network; an image from the application scene is sent into the feature pyramid network to extract multi-scale semantic feature maps, the multi-scale semantic feature maps are sent into the guided anchor point frame generation network to generate anchor point frames, the multi-scale semantic feature maps with their anchor point frames are pooled to obtain feature maps of consistent scale, and the feature maps are sent into the classification sub-network to predict the category of each anchor point frame and into the regression frame sub-network to predict its position;
D. training the improved Faster r-cnn network model by adopting the sample data set after data enhancement to generate a training model;
E. acquiring an employee dressing image and inputting it into the training model generated in step D to obtain the category and position of the dressing to be detected; if the detected dressing category or position is found not to conform to the preset standard of the employee dressing specification, a reminding signal is sent.
2. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the step A specifically comprises:
A1, collecting a sample data set, including manual collection of images in the actual application scenes and downloading of network images;
A2, labeling the sample data set: the targets to be detected in the sample images are annotated with the LabelImg tool according to the employee dressing standard to be checked; the labels of the annotated rectangular boxes comprise employee clothing (staff), non-employee clothing (notstaff), apron (pinafore), hat (hat) and mask (mask), and after labeling is completed the annotations are automatically stored as an annotation file corresponding to the sample image.
3. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the step B specifically comprises:
B1, data expansion, including flipping, scaling and brightness change;
flipping: the sample data set images are flipped vertically and horizontally, each flipped image is used as a new sample image, and the annotation file of the sample image is updated;
scaling: scaling operations are performed on the sample data set images, with scale ratios of 0.5, 0.8, 1.2 and 1.5 respectively, and the annotation file of the sample image is updated;
brightness change: random brightness changes are applied to the sample images to simulate the change of illumination intensity under real conditions, with brightness ratios of 0.5, 0.75, 1.25 and 1.50 respectively, and the annotation file of the sample image is updated;
B2, fusion: the sample data set images containing targets to be detected are fused with randomly selected normal images containing no target to be detected, with fusion coefficients of 0.3, 0.5 and 0.7 respectively, and the annotation files of the sample images are updated;
B3, cropping: the sample data set images are cropped randomly, in length only, in width only, or overall, with the random interval being 10% of the length and of the width respectively, and the annotation file of the sample image is updated.
4. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the feature pyramid network of step C is composed of a bottom-up feedforward computing network and a top-down lateral connection network;
the feedforward calculation network consists of an initialization convolutional layer, an initialization pooling layer, a first block layer, a second block layer, a third block layer and a fourth block layer which are sequentially stacked; the initialization convolutional layer consists of a convolutional layer, a batch normalization layer and a nonlinear activation layer, the size of a convolutional layer convolutional kernel is 7 x 7, the step length of the convolutional kernel is 2, and the number of generated characteristic channels is 64; initializing the step length of the pooling layer to be 2; the output of the initialization pooling layer is connected to four block layers: the first block layer comprises 3 residual modules, the second block layer comprises 4 residual modules, the third block layer comprises 6 residual modules, the fourth block layer comprises 3 residual modules, each residual module comprises three convolution layers with 3 convolution kernels respectively being 1 x 1, 3 x 3 and 1 x 1, a batch normalization layer and an activation function layer, and the convolution kernels of the convolution layers in the four block layers are 64, 128, 256 and 512 in sequence; each block layer also comprises a branch consisting of a batch normalization layer, a nonlinear activation layer and a convolution layer, the convolution size of the convolution layer is 1 x 1, the number of generated characteristic diagram channels is 256, 512, 1024 and 512 respectively, the input of the branch is the same as the input of the first residual module in each block layer, and the output of the branch and the output of the first residual module are added to be used as the input of the next residual module; the feature diagram output by each block layer is used as the input of the next block layer on one hand and used as the input of a lateral connection network on the other hand;
the lateral connection network takes the feature maps generated by the first layer block, the second layer block, the third layer block and the fourth layer block in the feedforward calculation network as the input of lateral connection, and uses convolution layers with convolution kernel size of 1 x 1 and step length of 1 to respectively operate, and then adds the result with the result of top-down up sampling to output four semantic feature maps with different levels; and operating the feature map generated by the fourth layer block in the feedforward calculation network through a convolution layer with the convolution kernel size of 1 x 1 and the step length of 2 to obtain a fifth semantic feature map.
5. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the guided anchor point frame generation network of step C uses a semantic feature map to guide the generation of anchor point frames and comprises a position prediction branch, a shape prediction branch and a feature adaptive branch;
the position prediction branch is used for predicting which areas should be used as central points to generate anchor points, dividing the area of the whole feature map into a target central area, a peripheral area and an ignored area; the small area at the center of a real target frame, as it maps onto the feature map, is marked as the target central area and used as a positive sample during training, and the remaining areas are marked as ignored or negative samples according to their distance from the center; the position prediction branch consists of a convolution layer, a nonlinear activation layer and a loss layer, the convolution kernel size of the convolution layer is 1 x 1, the number of generated feature channels is 1, the output of the position prediction branch is the probability that each position of the feature map is a target center, and the positions predicted to lie in the target central area serve as candidate central areas of anchor points;
the shape prediction branch is used for predicting the optimal length and width by giving an anchor point central point, belongs to a regression problem, firstly, 9 groups of w and h are sampled in a target central area by adopting an approximate method, the overlapping degree of the 9 groups and a target real frame is calculated, and the w and h with the maximum overlapping degree are the w and h of the current anchor point position; the shape prediction network is a convolution layer with convolution kernel size of 1 x 1, the number of generated characteristic channels is 2, and the output of the network is the predicted value of the length and width of the anchor point frame at each position of the characteristic diagram;
the feature self-adaptive branch directly blends the shape information of the anchor frame into the feature map by using a deformable convolution operation, so that the newly obtained feature map can adapt to the shape of the anchor frame at each position; using the predicted values of the length and width of the anchor frame at each position of the feature map, the position offsets of the next layer of convolution kernels are obtained by a 1 x 1 convolution, and the original feature map is then corrected by a 3 x 3 deformable convolution operation to obtain a feature map adapted to the shape of the anchor frame.
6. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the region-of-interest mapping and feature map pooling of step C maps the anchor point frames into the feature map according to the coordinates and position information of the anchor point frames obtained from the guided anchor point frame generation network, and pooling is then performed to obtain feature maps of consistent scale.
7. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the classification sub-network and the regression frame sub-network of step C both calculate the corresponding category and position coordinates through a fully connected network; the output size of the classification sub-network is 2 × k × A and that of the regression frame sub-network is 4 × A, where k represents the number of categories and A represents the number of anchor points.
8. The employee dressing specification detection method based on improved Faster r-cnn according to claim 7, wherein said classification sub-network calculates the classification loss using a focus loss function.
9. The employee dressing specification detection method based on improved Faster r-cnn according to claim 1, wherein the step D specifically comprises:
D1, pre-training of the network: the improved Faster r-cnn network model is pre-trained on the VOC data set, and the trained network model parameters are stored as a pth file;
D2, secondary training of the network: the pre-trained model is loaded and trained a second time with the data-enhanced sample data set; the initial learning rate is set to 0.01 and its value decreases stepwise as the number of training iterations grows; the batch training size is set to 16; the trained model is finally obtained and its parameter file is saved as a pth file.
CN202010147949.XA 2020-03-05 2020-03-05 Employee dressing specification detection method based on improved Faster r-cnn Pending CN111401418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147949.XA CN111401418A (en) 2020-03-05 2020-03-05 Employee dressing specification detection method based on improved Faster r-cnn

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010147949.XA CN111401418A (en) 2020-03-05 2020-03-05 Employee dressing specification detection method based on improved Faster r-cnn

Publications (1)

Publication Number Publication Date
CN111401418A true CN111401418A (en) 2020-07-10

Family

ID=71432212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147949.XA Pending CN111401418A (en) 2020-03-05 2020-03-05 Employee dressing specification detection method based on improved Faster r-cnn

Country Status (1)

Country Link
CN (1) CN111401418A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183417A (en) * 2020-09-30 2021-01-05 重庆天智慧启科技有限公司 Business consultant service capability evaluation system and method
CN112434670A (en) * 2020-12-14 2021-03-02 武汉纺织大学 Equipment and method for detecting abnormal behavior of power operation
CN113012139A (en) * 2021-03-29 2021-06-22 南京奥纵智能科技有限公司 Deep learning algorithm for detecting defects of conductive particles of liquid crystal display
CN113139530A (en) * 2021-06-21 2021-07-20 城云科技(中国)有限公司 Method and device for detecting sleep post behavior and electronic equipment thereof
CN113496260A (en) * 2021-07-06 2021-10-12 浙江大学 Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN113642574A (en) * 2021-07-30 2021-11-12 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN113869249A (en) * 2021-09-30 2021-12-31 广州文远知行科技有限公司 Lane line marking method, device, equipment and readable storage medium
CN113902958A (en) * 2021-10-12 2022-01-07 广东电网有限责任公司广州供电局 Anchor point self-adaption based infrastructure field personnel detection method
WO2023077821A1 (en) * 2021-11-07 2023-05-11 西北工业大学 Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image
CN116503517A (en) * 2023-06-27 2023-07-28 江西农业大学 Method and system for generating image by long text

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170176190A1 (en) * 2017-03-09 2017-06-22 Thomas Danaher Harvey Devices and methods to facilitate escape from a venue with a sudden hazard
CN107463892A (en) * 2017-07-27 2017-12-12 北京大学深圳研究生院 Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics
CN108052900A (en) * 2017-12-12 2018-05-18 成都睿码科技有限责任公司 A kind of method by monitor video automatic decision dressing specification
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN109472226A (en) * 2018-10-29 2019-03-15 上海交通大学 A kind of sleep behavioral value method based on deep learning
CN109711401A (en) * 2018-12-03 2019-05-03 广东工业大学 A kind of Method for text detection in natural scene image based on Faster Rcnn
CN110059674A (en) * 2019-05-24 2019-07-26 天津科技大学 Standard dressing detection method based on deep learning
CN110097129A (en) * 2019-05-05 2019-08-06 西安电子科技大学 Remote sensing target detection method based on profile wave grouping feature pyramid convolution

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170176190A1 (en) * 2017-03-09 2017-06-22 Thomas Danaher Harvey Devices and methods to facilitate escape from a venue with a sudden hazard
CN107463892A (en) * 2017-07-27 2017-12-12 北京大学深圳研究生院 Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics
CN108052900A (en) * 2017-12-12 2018-05-18 成都睿码科技有限责任公司 A kind of method by monitor video automatic decision dressing specification
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network
CN109472226A (en) * 2018-10-29 2019-03-15 上海交通大学 A kind of sleep behavioral value method based on deep learning
CN109711401A (en) * 2018-12-03 2019-05-03 广东工业大学 A kind of Method for text detection in natural scene image based on Faster Rcnn
CN110097129A (en) * 2019-05-05 2019-08-06 西安电子科技大学 Remote sensing target detection method based on profile wave grouping feature pyramid convolution
CN110059674A (en) * 2019-05-24 2019-07-26 天津科技大学 Standard dressing detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIFENG DAI ET AL.: "Deformable Convolutional Networks" *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183417B (en) * 2020-09-30 2023-12-05 重庆天智慧启科技有限公司 System and method for evaluating service capability of consultant in department of industry
CN112183417A (en) * 2020-09-30 2021-01-05 重庆天智慧启科技有限公司 Business consultant service capability evaluation system and method
CN112434670A (en) * 2020-12-14 2021-03-02 武汉纺织大学 Equipment and method for detecting abnormal behavior of power operation
CN113012139A (en) * 2021-03-29 2021-06-22 南京奥纵智能科技有限公司 Deep learning algorithm for detecting defects of conductive particles of liquid crystal display
CN113139530A (en) * 2021-06-21 2021-07-20 城云科技(中国)有限公司 Method and device for detecting sleep post behavior and electronic equipment thereof
CN113139530B (en) * 2021-06-21 2021-09-03 城云科技(中国)有限公司 Method and device for detecting sleep post behavior and electronic equipment thereof
CN113496260A (en) * 2021-07-06 2021-10-12 浙江大学 Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN113496260B (en) * 2021-07-06 2024-01-30 浙江大学 Grain depot personnel non-standard operation detection method based on improved YOLOv3 algorithm
CN113642574A (en) * 2021-07-30 2021-11-12 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN113642574B (en) * 2021-07-30 2022-11-29 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN113869249A (en) * 2021-09-30 2021-12-31 广州文远知行科技有限公司 Lane line marking method, device, equipment and readable storage medium
CN113869249B (en) * 2021-09-30 2024-05-07 广州文远知行科技有限公司 Lane marking method, device, equipment and readable storage medium
CN113902958A (en) * 2021-10-12 2022-01-07 广东电网有限责任公司广州供电局 Anchor point self-adaption based infrastructure field personnel detection method
WO2023077821A1 (en) * 2021-11-07 2023-05-11 西北工业大学 Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image
CN116503517B (en) * 2023-06-27 2023-09-05 江西农业大学 Method and system for generating image by long text
CN116503517A (en) * 2023-06-27 2023-07-28 江西农业大学 Method and system for generating image by long text

Similar Documents

Publication Publication Date Title
CN111401418A (en) Employee dressing specification detection method based on improved Faster r-cnn
Rijal et al. Ensemble of deep neural networks for estimating particulate matter from images
CN113870260B (en) Welding defect real-time detection method and system based on high-frequency time sequence data
CN110084165B (en) Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation
CN111401419A (en) Improved RetinaNet-based employee dressing specification detection method
CN112801146B (en) Target detection method and system
CN112613569B (en) Image recognition method, training method and device for image classification model
CN111209958B (en) Substation equipment detection method and device based on deep learning
CN111723657B (en) River foreign matter detection method and device based on YOLOv3 and self-optimization
CN116310785B (en) Unmanned aerial vehicle image pavement disease detection method based on YOLO v4
CN112613454A (en) Electric power infrastructure construction site violation identification method and system
CN111368636A (en) Object classification method and device, computer equipment and storage medium
CN110175519B (en) Method and device for identifying separation and combination identification instrument of transformer substation and storage medium
US20230048386A1 (en) Method for detecting defect and method for training model
CN113222149A (en) Model training method, device, equipment and storage medium
CN112288700A (en) Rail defect detection method
CN114494168A (en) Model determination, image recognition and industrial quality inspection method, equipment and storage medium
CN115830399A (en) Classification model training method, apparatus, device, storage medium, and program product
CN116823793A (en) Device defect detection method, device, electronic device and readable storage medium
CN113408630A (en) Transformer substation indicator lamp state identification method
CN115984158A (en) Defect analysis method and device, electronic equipment and computer readable storage medium
CN112396104A (en) Plasma discharge identification method and system based on machine learning
CN117114420B (en) Image recognition-based industrial and trade safety accident risk management and control system and method
CN117670755B (en) Detection method and device for lifting hook anti-drop device, storage medium and electronic equipment
CN111626409B (en) Data generation method for image quality detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination