CN113139481A - Classroom people counting method based on yolov3 - Google Patents
- Publication number: CN113139481A (application CN202110466081.4A)
- Authority: CN (China)
- Prior art keywords: image; frame; classroom; convolution
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53 — Recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06V10/40 — Extraction of image or video features
- G06V2201/07 — Target detection
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a classroom people counting method based on yolov3, comprising the following steps: S1, acquiring original images of a classroom, through a camera mounted on the classroom ceiling, as an image set for model training; S2, labeling the top view of each student's head in the original images, generating a labeling file, and computing the labeling frames in that file; S3, feature extraction: performing convolution and downsampling on the input image; S4, establishing a detection model: clustering the data set with the k-means clustering algorithm, the detection model adopting an improved yolov3 network comprising a feature extraction network and a target detection layer; S5, training the detection model; S6, inputting the image to be detected into the detection model for people counting. Based on the yolov3 algorithm, the method can quickly and accurately count the number of students in a classroom at a given moment, so that a teacher can immediately see whether all students have arrived, and whether anyone has slipped in or left early during the lecture, helping to curb late arrival, early departure and class skipping.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a classroom people counting method based on yolov3.
Background
Under the university "walking-class" model of education, late arrival, early departure and class skipping by students are common. Having the teacher count the number of people in every class is time-consuming, and it is impossible to know in real time whether students leave or enter mid-class.
Existing people-detection methods, such as single-chip-microcomputer infrared detection, are easily affected by the environment, and suffer from low detection accuracy and high energy consumption.
Disclosure of Invention
The invention mainly aims to overcome the defects and shortcomings of the prior art by providing a classroom people counting method based on yolov3. Based on the yolov3 algorithm, the number of students in a classroom at a given moment can be counted quickly and accurately; a teacher can immediately see whether all students have arrived, and whether anyone has slipped in or left early during the lecture, helping to curb late arrival, early departure and class skipping.
In order to achieve the purpose, the invention adopts the following technical scheme:
the classroom people counting method based on yolov3 comprises the following steps:
s1, acquiring an original image of a classroom as an image set for model training through a camera arranged on the ceiling of the classroom;
s2, labeling the top view of the head of each student in the original image, generating a labeling file, and calculating a labeling frame in the labeling file;
s3, extracting features, namely performing convolution processing and downsampling on the input image;
s4, establishing a detection model, clustering the data set by using a k-means clustering algorithm, wherein the detection model adopts an improved yolov3 network and comprises a feature extraction network and a target detection layer;
s5, training a detection model;
and S6, inputting the image to be detected into the detection model for people number detection.
Further, the acquiring of the original image of the classroom specifically includes:
images of students in class are collected through the cameras on the classroom ceilings: images of not fewer than 20 classrooms are collected, with not fewer than 100 frames per classroom, and each frame is captured at a different moment.
Further, the labeling specifically includes:
the top view of each person's head in the acquired original images is labeled with the bounding-box image labeling tool LabelImg and uniformly labeled as a single "head" class; after labeling, an xml labeling file is generated and saved, and the original image is saved at the same time.
Further, the calculation of the labeling frame in the labeling file specifically includes:
the 4 parameters of each labeling frame in the xml labeling file are calculated and converted by the formulas:
w=W*(X1-X2)
h=H*(Y1-Y2)
where X1, Y1, X2 and Y2 are the 4 parameters of the labeling frame, W and H are the width and height of the original image, and the output values w and h correspond to the width and height of the labeling frame;
all output (w, h) pairs are collected into a wh-data set.
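The conversion above can be sketched in plain Python. The sample values, and the reading of X1/X2 (Y1/Y2) as normalized right/left (bottom/top) edges so that multiplying by the image size yields a positive pixel width, are assumptions for illustration; the patent only gives the two formulas.

```python
def box_to_wh(W, H, X1, Y1, X2, Y2):
    """Convert the 4 labeling-frame parameters to a (w, h) pair.

    Follows w = W*(X1-X2), h = H*(Y1-Y2) from the text; the reading of
    X1/X2 as normalized right/left edges is a hypothetical assumption.
    """
    return W * (X1 - X2), H * (Y1 - Y2)

# Collect every converted (w, h) pair into the "wh-data set".
# These box values are hypothetical examples.
boxes = [(0.40, 0.35, 0.30, 0.25), (0.62, 0.58, 0.55, 0.50)]
wh_data = [box_to_wh(416, 416, *b) for b in boxes]
```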
Further, the step S3 is specifically:
extracting the characteristics of the original image, and adopting an improved yolov3 network characteristic extraction network, wherein the algorithm applied by the characteristic extraction network is as follows:
the size of the input image is adjusted to 416 × 416, and the input image is convolved with 16 convolution kernels of size 3 × 3, with a step size of 1;
let the input image size be k × k and the convolution kernel n × n; the convolution formula is as follows:
y_ij = Σ_{u=1}^{n} Σ_{v=1}^{n} w_uv · x_{i-u+1, j-v+1}
where y_ij represents the pixel value of the convolution output map at index (i, j), w_uv the pixel value at index (u, v) of the corresponding convolution kernel, and x_{i-u+1, j-v+1} the pixel value of image x at (i-u+1, j-v+1);
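A direct plain-Python reading of this sum-of-products convolution (valid region only, 0-based indices; a minimal sketch of the formula, not the patent's network code):

```python
def conv2d(x, w):
    """2-D convolution over the valid region.

    Implements y_ij = sum_{u,v} w[u][v] * x[i-u+1][j-v+1] (1-based in
    the text, 0-based here), i.e. a flipped-kernel sum of products.
    """
    k, n = len(x), len(w)
    out = k - n + 1
    y = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    # flipped-kernel indexing matches x_{i-u+1, j-v+1}
                    s += w[u][v] * x[i + n - 1 - u][j + n - 1 - v]
            y[i][j] = s
    return y

# tiny worked example with a kernel that picks one neighbour
y = conv2d([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]],
           [[1, 0],
            [0, 0]])
```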
the net input y^(1) of the first layer is standardized by the formula:
ŷ^(1) = (y^(1) − E(y^(1))) / √Var(y^(1))
where E(y^(1)) and Var(y^(1)) are the expectation and variance of each dimension of y^(1) over the entire training set under the current parameters, and ŷ^(1) is the normalized output of the first layer;
the output is then corrected with the Leaky ReLU activation function:
f(x) = x, x ≥ 0;  f(x) = a·x, x < 0
where x represents the input and a is a positive real number;
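The normalization and activation steps can be sketched per-dimension in plain Python; the slope a = 0.1 and the small epsilon inside the square root are illustrative assumptions (the patent only requires a to be a positive real number):

```python
import math

def standardize(ys):
    """Standard normalization of one dimension of the net input:
    (y - E[y]) / sqrt(Var[y]), computed over the given samples."""
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    return [(y - mean) / math.sqrt(var + 1e-8) for y in ys]

def leaky_relu(x, a=0.1):
    """Leaky ReLU: identity for x >= 0, slope a for x < 0.
    a = 0.1 is an assumed example value."""
    return x if x >= 0 else a * x

normalized = standardize([1.0, 2.0, 3.0])
```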
performing 5 times of downsampling on the feature map output by the convolution operation;
the convolution kernels used are, in turn, 32 × 3, 64 × 3, 128 × 3, 256 × 3 and 512 × 3; the convolution step size is 2 in every case, and the object of each convolution is the previous layer's output feature map;
after the 5 downsamplings of the feature map, features are extracted from the previous layer's output feature map by 4 groups of convolutional residual modules consisting of 256 × 1 and 512 × 3 convolution kernels, with convolution step size 1; the output is a 26 × 26 feature map.
Further, the data set is clustered by using a k-means clustering algorithm, specifically, the wh-data set is clustered:
the number of cluster centers is set to k = 3; the k-means clustering algorithm is specifically as follows:
given a data sample set X = {X1, X2, X3, ..., Xn} containing n objects, where each object has attributes of m dimensions;
the aim of the k-means algorithm is to group the n objects into the specified k clusters according to the similarity between objects, where each object belongs to exactly one cluster, the one whose center is at the minimum distance from it;
initialize k cluster centers C = {C1, C2, C3, ..., Ck}, 1 < k ≤ n;
the Euclidean distance from each object to each cluster center is calculated as:
dis(Xi, Cj) = √( Σ_{t=1}^{m} (Xit − Cjt)² )
where Xi denotes the ith object, 1 ≤ i ≤ n; Cj the jth cluster center, 1 ≤ j ≤ k; Xit the tth attribute of the ith object, 1 ≤ t ≤ m; and Cjt the tth attribute of the jth cluster center;
the distances from each object to every cluster center are compared in turn, and the object is assigned to the cluster of its nearest center, yielding k clusters {S1, S2, S3, ..., Sk};
The k-means algorithm defines the prototype of a cluster by its center, the mean of all objects in the cluster along each dimension, calculated as:
Cl = (1 / |Sl|) · Σ_{Xi ∈ Sl} Xi
where Cl represents the center of the lth cluster, 1 ≤ l ≤ k; |Sl| the number of objects in the lth cluster; and Xi the ith object in the lth cluster, 1 ≤ i ≤ |Sl|;
And clustering the wh-data set by the k-means algorithm to obtain the width and height of three prior frames.
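A minimal plain-Python k-means over (w, h) pairs, as used here to pick the three prior-frame sizes. The Euclidean distance follows the steps above; note that many YOLO implementations cluster with an IOU-based distance instead, which the patent does not mention. The sample points are hypothetical:

```python
import random

def kmeans_wh(points, k=3, iters=20, seed=0):
    """Plain k-means over (w, h) pairs; returns k center tuples.
    A minimal Euclidean-distance sketch, not the patent's exact code."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # move each center to the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:  # keep the old center if the cluster emptied
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

# hypothetical head-box sizes in pixels
priors = kmeans_wh([(10, 12), (11, 11), (30, 32),
                    (29, 31), (60, 58), (61, 60)])
```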
Further, step S4 further includes the following steps:
inputting the images in the training set and the xml data into a modified yolov3 network for training, wherein the steps are as follows:
the input image is detected by using three prior boxes obtained previously, and the specific algorithm is as follows:
dividing the input image into S×S grids; if the center of a target falls within a grid, that grid is responsible for detecting the target; each grid predicts 3 bounding boxes, and the output dimensions are:
S×S×B×(4+1+C)
where B represents the number of prediction boxes per grid, set to B = 3; 4 stands for the center coordinates and width and height (bx, by, bw, bh) of each prediction box; 1 represents the confidence; and C represents the total number of classes, set to C = 1;
the relation between the prediction-box parameters and the network outputs is as follows:
bx=σ(tx)+cx
by=σ(ty)+cy
bw=pw·e^tw
bh=ph·e^th
where cx and cy are the top-left coordinates of the grid cell; pw and ph are the width and height of the prior frame mapped onto the feature map; the output values bx, by, bw and bh are the center coordinates, width and height of the prediction frame; tx, ty, tw and th are the network outputs regressed toward the center coordinates, width and height of the real frame; and σ(tx), σ(ty) denote compressing tx and ty into (0, 1) with the Sigmoid function;
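Decoding the network outputs into a prediction box can be sketched as below. The width/height relations bw = pw·e^tw and bh = ph·e^th are taken from the standard YOLOv3 formulation, since the source text lists only the two center equations:

```python
import math

def sigmoid(t):
    """Sigmoid, compressing any real t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs (tx, ty, tw, th) into a box (bx, by, bw, bh)
    relative to grid-cell corner (cx, cy) and prior size (pw, ph)."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)   # standard YOLOv3 relation (assumed)
    bh = ph * math.exp(th)
    return bx, by, bw, bh

# zero offsets place the box center at the middle of the cell
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0,
                            cx=3.0, cy=4.0, pw=2.0, ph=5.0)
```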
the confidence is calculated as follows:
C_i^j = Pr(object) · IOU_pred^truth
where C_i^j represents the confidence of the jth prediction box of the ith grid; Pr(object) the probability that the current prediction box contains an object; and IOU_pred^truth the IOU value between the prediction box and the real box it most closely matches;
the IOU is calculated as follows:
IOU = area(A ∩ B) / area(A ∪ B)
where A is the frame area of the prediction box and B is the frame area of the real box.
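The IOU of two boxes, here represented by (x1, y1, x2, y2) corner coordinates (a representation assumed for the sketch; the text only names the two areas A and B):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # overlap extent along each axis, clamped at zero
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 px² shared of 7 px² total
```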
Further, the loss function used by the detection model is defined over the following quantities: i denotes the ith grid and j the jth prediction box predicted by each grid; (xi, yi) are the center coordinates of the prediction box of the ith grid; wi and hi are the width and height of the prediction box; p(c) is the probability that the target belongs to the cth class; λcoord is a weight coefficient; λnoobj is a penalty weight coefficient; and I_ij^obj indicates whether the jth prediction box of the ith grid is responsible for predicting the target, taking the value 0 or 1.
Further, step S5 comprises training the network model: set the parameters of the original yolov3 network configuration file cfg; after setting, start training on the training set; stop when the loss function converges, and save the weights of the trained network model.
Further, step S6 is specifically:
the image to be detected is processed by the trained network model; for each grid on the image, the largest of the confidences of its three predicted boxes is selected; the threshold is set to 0.75; a confidence below the threshold is marked F, and one above the threshold is marked T; the number of T marks is the detected number of people.
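The per-grid counting rule in step S6 can be sketched as follows; the confidence values are hypothetical, and real pipelines would typically also apply non-maximum suppression, which the text does not describe:

```python
def count_people(grid_confidences, threshold=0.75):
    """For each grid, keep the largest of its three prediction-box
    confidences, then count the grids whose kept confidence exceeds
    the threshold (marked T); each counts as one detected head."""
    return sum(1 for confs in grid_confidences if max(confs) > threshold)

# hypothetical confidences for 4 grids, 3 predicted boxes each
grids = [(0.91, 0.40, 0.12), (0.30, 0.20, 0.10),
         (0.80, 0.76, 0.05), (0.74, 0.60, 0.50)]
n_people = count_people(grids)
```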
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the method uses the convolutional neural network which is more suitable for detecting the small target as the feature extraction network of the improved yolov3 algorithm, the size of the output feature map is more specific to the detection of the small target of the student head top view shot by the classroom camera, and the accuracy of target detection is greatly improved.
2. The loss function used by the method is more specific to the application of small target detection, and the influence of disappearance of the Sigmoid function gradient can be effectively reduced, so that the detection model is converged more quickly, and the detection result is more accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in FIG. 1, the classroom people counting method based on yolov3 of the invention comprises the following steps:
s1, acquiring an original image of a classroom as a data set for model training through a camera arranged on a ceiling of the classroom, specifically:
the method comprises the steps of collecting images of students in class through cameras on ceilings of the classrooms, collecting images of not less than 20 classrooms, and collecting images of not less than 100 frames in each classroom.
S2, labeling the top view of the head of each student in the original image, generating a labeling file, and calculating a labeling frame in the labeling file, wherein the labeling frame specifically comprises the following steps:
marking the top view of the head of each person in the acquired original image by using a Bounding box image marking tool LabelImg, uniformly marking the top views as a type of head, generating and storing an xml marking file after marking, and simultaneously storing an original image;
the specific calculation of the labeling frame in the labeling file is as follows:
4 parameters of each marking frame in the xml marking file are calculated and converted, and the formula is as follows:
w=W*(X1-X2)
h=H*(Y1-Y2)
where X1, Y1, X2 and Y2 are the 4 parameters of the labeling frame, W and H are the width and height of the original image, and the output values w and h correspond to the width and height of the labeling frame;
all output (w, h) pairs are collected into a wh-data set.
S3, feature extraction, namely performing convolution processing and down sampling on the input image, and clustering the operated data set by using a k-means algorithm, wherein the method specifically comprises the following steps:
extracting the characteristics of the original image, and adopting an improved yolov3 network characteristic extraction network, wherein the algorithm applied by the characteristic extraction network is as follows:
the size of the input image is adjusted to 416 × 416, and the input image is convolved with 16 convolution kernels of size 3 × 3, with a step size of 1;
let the input image size be k × k and the convolution kernel n × n; the convolution formula is as follows:
y_ij = Σ_{u=1}^{n} Σ_{v=1}^{n} w_uv · x_{i-u+1, j-v+1}
where y_ij represents the pixel value of the convolution output map at index (i, j), w_uv the pixel value at index (u, v) of the corresponding convolution kernel, and x_{i-u+1, j-v+1} the pixel value of image x at (i-u+1, j-v+1);
the net input y^(1) of the first layer is standardized by the formula:
ŷ^(1) = (y^(1) − E(y^(1))) / √Var(y^(1))
where E(y^(1)) and Var(y^(1)) are the expectation and variance of each dimension of y^(1) over the entire training set under the current parameters, and ŷ^(1) is the normalized output of the first layer;
the output is then corrected with the Leaky ReLU activation function:
f(x) = x, x ≥ 0;  f(x) = a·x, x < 0
where x represents the input and a is a positive real number;
performing 5 times of downsampling on the feature map output by the convolution operation;
The convolution kernels used are, in turn, 32 × 3, 64 × 3, 128 × 3, 256 × 3 and 512 × 3; the convolution step size is 2 in every case, and the object of each convolution is the previous layer's output feature map.
After the 5 downsamplings of the feature map, features are extracted from the previous layer's output feature map by 4 groups of convolutional residual modules consisting of 256 × 1 and 512 × 3 convolution kernels, with convolution step size 1; the output is a 26 × 26 feature map.
S4, establishing a detection model, clustering the data set by using a k-means clustering algorithm, wherein the detection model adopts an improved yolov3 network and comprises a feature extraction network and a target detection layer, and the method specifically comprises the following steps:
s41, clustering the data set by using a k-means clustering algorithm, specifically clustering the wh-data set:
the number of cluster centers is set to k = 3; the k-means clustering algorithm is specifically as follows:
given a data sample set X = {X1, X2, X3, ..., Xn} containing n objects, where each object has attributes of m dimensions;
the aim of the k-means algorithm is to group the n objects into the specified k clusters according to the similarity between objects, where each object belongs to exactly one cluster, the one whose center is at the minimum distance from it;
initialize k cluster centers C = {C1, C2, C3, ..., Ck}, 1 < k ≤ n;
the Euclidean distance from each object to each cluster center is calculated as:
dis(Xi, Cj) = √( Σ_{t=1}^{m} (Xit − Cjt)² )
where Xi denotes the ith object, 1 ≤ i ≤ n; Cj the jth cluster center, 1 ≤ j ≤ k; Xit the tth attribute of the ith object, 1 ≤ t ≤ m; and Cjt the tth attribute of the jth cluster center;
the distances from each object to every cluster center are compared in turn, and the object is assigned to the cluster of its nearest center, yielding k clusters {S1, S2, S3, ..., Sk};
The k-means algorithm defines the prototype of a cluster by its center, the mean of all objects in the cluster along each dimension, calculated as:
Cl = (1 / |Sl|) · Σ_{Xi ∈ Sl} Xi
where Cl represents the center of the lth cluster, 1 ≤ l ≤ k; |Sl| the number of objects in the lth cluster; and Xi the ith object in the lth cluster, 1 ≤ i ≤ |Sl|;
Clustering the wh-data set by the k-means algorithm to obtain the width and height of three prior frames;
s42, inputting the images in the training set and the xml data into the improved yolov3 network for training, wherein the steps are as follows:
the input image is detected by using three prior boxes obtained previously, and the specific algorithm is as follows:
dividing the input image into S×S grids; if the center of a target falls within a grid, that grid is responsible for detecting the target; each grid predicts 3 bounding boxes, and the output dimensions are:
S×S×B×(4+1+C)
where B represents the number of prediction boxes per grid, set to B = 3; 4 stands for the center coordinates and width and height (bx, by, bw, bh) of each prediction box; 1 represents the confidence; and C represents the total number of classes, set to C = 1;
the relation between the prediction-box parameters and the network outputs is as follows:
bx=σ(tx)+cx
by=σ(ty)+cy
bw=pw·e^tw
bh=ph·e^th
where cx and cy are the top-left coordinates of the grid cell; pw and ph are the width and height of the prior frame mapped onto the feature map; the output values bx, by, bw and bh are the center coordinates, width and height of the prediction frame; tx, ty, tw and th are the network outputs regressed toward the center coordinates, width and height of the real frame; and σ(tx), σ(ty) denote compressing tx and ty into (0, 1) with the Sigmoid function;
the confidence is calculated as follows:
C_i^j = Pr(object) · IOU_pred^truth
where C_i^j represents the confidence of the jth prediction box of the ith grid; Pr(object) the probability that the current prediction box contains an object; and IOU_pred^truth the IOU value between the prediction box and the real box it most closely matches;
the IOU is calculated as follows:
IOU = area(A ∩ B) / area(A ∪ B)
where A is the frame area of the prediction box and B is the frame area of the real box.
The loss function used by the detection model is defined over the following quantities: i denotes the ith grid and j the jth prediction box predicted by each grid; (xi, yi) are the center coordinates of the prediction box of the ith grid; wi and hi are the width and height of the prediction box; p(c) is the probability that the target belongs to the cth class; λcoord is a weight coefficient; λnoobj is a penalty weight coefficient; and I_ij^obj indicates whether the jth prediction box of the ith grid is responsible for predicting the target, taking the value 0 or 1;
s5, carrying out detection model training, specifically:
training the network model: set the parameters of the original yolov3 network configuration file cfg; after setting, start training on the training set; stop when the loss function converges, and save the weights of the trained network model.
S6, inputting the image to be detected into the detection model for people number detection, specifically:
the image to be detected is processed by the trained network model; for each grid on the image, the largest of the confidences of its three predicted boxes is selected; the threshold is set to 0.75; a confidence below the threshold is marked F, and one above the threshold is marked T; the number of T marks is the detected number of people.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. The classroom people counting method based on yolov3 is characterized by comprising the following steps:
s1, acquiring an original image of a classroom as an image set for model training through a camera arranged on the ceiling of the classroom;
s2, labeling the top view of the head of each student in the original image, generating a labeling file, and calculating a labeling frame in the labeling file;
s3, extracting features, namely performing convolution processing and downsampling on the input image;
s4, establishing a detection model, clustering the data set by using a k-means clustering algorithm, wherein the detection model adopts an improved yolov3 network and comprises a feature extraction network and a target detection layer;
s5, training a detection model;
and S6, inputting the image to be detected into the detection model for people number detection.
2. The yolov 3-based classroom people counting method as claimed in claim 1, wherein the raw images of the collected classroom are specifically:
the method comprises the steps of collecting images of students in class through cameras on ceilings of the classrooms, collecting not less than 20 images of the classrooms, collecting not less than 100 frames of images in each classroom, and enabling each frame to be an image at different moments.
3. The yolov 3-based classroom people counting method as claimed in claim 1, wherein said labeling is specifically:
and (3) marking the top view of the head of each person in the acquired original image by using a Bounding box image marking tool LabelImg, uniformly marking the top views as a type of head, generating and storing an xml marking file after marking, and simultaneously storing the original image.
4. The yolov 3-based classroom people counting method as defined in claim 3, wherein the calculation of the annotation box in the annotation file is specifically as follows:
4 parameters of each marking frame in the xml marking file are calculated and converted, and the formula is as follows:
w=W*(X1-X2)
h=H*(Y1-Y2)
where X1, Y1, X2 and Y2 are the 4 parameters of the labeling frame, W and H are the width and height of the original image, and the output values w and h correspond to the width and height of the labeling frame;
all output (w, h) pairs are collected into a wh-data set.
5. The yolov 3-based classroom people counting method as claimed in claim 1, wherein said step S3 specifically comprises:
extracting the characteristics of the original image, and adopting an improved yolov3 network characteristic extraction network, wherein the algorithm applied by the characteristic extraction network is as follows:
the size of the input image is adjusted to 416 × 416, and the input image is convolved with 16 convolution kernels of size 3 × 3, with a step size of 1;
let the input image size be k x k, the convolution kernel be n x n, and the convolution formula be as follows:
wherein ,yijRepresenting the pixel value, w, of the convolved output map at the subscript value i, juvRepresenting the pixel value, x, at the subscript value u, v in the corresponding convolution kerneli-u+1,j-v+1Representing the pixel values of image x at i-u +1, j-v + 1;
corresponding to the net input y of the first layer(1)The standard normalization formula is as follows:
wherein ,E(y(1)) And var (y)(1)) Means y under the current parameters(1)The expectation and variance of each dimension over the entire training set,an output for the first layer normalization;
and correcting the output image by adopting a Leaky ReLU function as an activation function, wherein the formula is as follows, x represents the input image, and a takes positive real numbers:
performing 5 times of downsampling on the feature map output by the convolution operation;
the convolution kernels used are 32 × 3, 64 × 3, 128 × 3, 256 × 3, 51 × 3 in turn, the convolution steps are all 2, and the convolution objects are all the previous layer of output feature map;
after 5 times of downsampling of the feature map, 4 groups of convolution incomplete modules consisting of 256 × 1 and 512 × 3 convolution kernels are used for extracting features of the feature map output from the previous layer, the convolution step is 1, and the output is 26 × 26 feature maps.
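The convolution and Leaky ReLU steps above can be sketched as follows. `conv2d` and `leaky_relu` are illustrative helper names; the flipped-kernel window indexing is one reading of the subscripts i - u + 1, j - v + 1 in the formula, and the toy 4 × 4 input is an assumption:

```python
def conv2d(x, w):
    """2-D convolution following the claim-5 formula:
    y[i][j] = sum over u, v of w[u][v] * x[i-u+1][j-v+1]
    (kernel flipped, 'valid' output of size k-n+1)."""
    k, n = len(x), len(w)
    out = k - n + 1
    y = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    # flipped-kernel indexing of the input window
                    s += w[u][v] * x[i + n - 1 - u][j + n - 1 - v]
            y[i][j] = s
    return y

def leaky_relu(v, a=10.0):
    """Leaky ReLU activation: v when v > 0, otherwise v / a (a > 0)."""
    return v if v > 0 else v / a

x = [[1.0] * 4 for _ in range(4)]   # toy 4x4 "image"
w = [[1.0] * 3 for _ in range(3)]   # 3x3 kernel of ones
y = [[leaky_relu(v) for v in row] for row in conv2d(x, w)]
print(y)  # 2x2 map, every entry 9.0
```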
6. The yolov3-based classroom people counting method as defined in claim 4, wherein the data set is clustered using a k-means clustering algorithm, specifically clustering the wh-data set:
the number of cluster centers k is selected as 3; the k-means clustering algorithm is specifically as follows:
given a data sample X containing n objects, X = {X1, X2, X3, ..., Xn}, where each object has attributes of m dimensions;
the aim of the k-means algorithm is to gather the n objects into the specified k clusters according to the similarity between the objects, where each object belongs to exactly one cluster, namely the one whose center is at the minimum distance from it;
k cluster centers C = {C1, C2, C3, ..., Ck}, 1 < k ≤ n, are initialized;
the Euclidean distance from each object to each cluster center is then calculated, as shown in the following formula:
dist(Xi, Cj) = √( Σt=1..m (Xit - Cjt)² )
where Xi denotes the ith object, 1 ≤ i ≤ n; Cj denotes the jth cluster center, 1 ≤ j ≤ k; Xit denotes the tth attribute of the ith object, 1 ≤ t ≤ m; and Cjt denotes the tth attribute of the jth cluster center;
the distances from each object to each cluster center are compared in turn, and each object is assigned to the cluster of its nearest cluster center, yielding k clusters {S1, S2, S3, ..., Sk};
the k-means algorithm defines the prototype of a cluster by its center, the cluster center being the mean of all objects in the cluster in each dimension, with the calculation formula as follows:
Cl = (1 / |Sl|) · Σ Xi,  Xi ∈ Sl
where Cl represents the center of the lth cluster, 1 ≤ l ≤ k; |Sl| represents the number of objects in the lth cluster; and Xi represents the ith object in the lth cluster, 1 ≤ i ≤ |Sl|;
the wh-data set is clustered by the k-means algorithm to obtain the widths and heights of the three prior frames.
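A minimal sketch of the k-means procedure described above, applied to a hypothetical wh-data set; the helper name `kmeans_wh`, the sample data, the iteration count and the fixed seed are all illustrative assumptions:

```python
import random

def kmeans_wh(points, k=3, iters=50, seed=0):
    """Plain k-means with Euclidean distance over (w, h) pairs, as in
    claim 6. Returns k cluster centers used as prior-box widths/heights."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each object to its nearest cluster center
            j = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # recompute each center as the per-dimension mean of its cluster
        for j, s in enumerate(clusters):
            if s:
                centers[j] = (sum(p[0] for p in s) / len(s),
                              sum(p[1] for p in s) / len(s))
    return centers

# hypothetical wh-data: small, medium and large head boxes
wh_data = [(10, 10), (12, 11), (30, 28), (32, 30), (60, 62), (58, 60)]
priors = kmeans_wh(wh_data, k=3)
print(sorted(priors))
```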
7. The yolov3-based classroom people counting method as claimed in claim 1, wherein said step S4 further comprises the steps of:
inputting the images in the training set and the xml data into the improved yolov3 network for training, as follows:
the input image is detected using the three prior boxes obtained previously, the specific algorithm being as follows:
the input image is divided into S × S grids, and if the center of a target falls in a certain grid, that grid is responsible for detecting the target; each grid predicts 3 bounding boxes, with the prediction dimensions:
S × S × B × (4 + 1 + C)
where B represents the number of prediction frames predicted by each grid, and B is set to 3; the 4 corresponds to the center coordinates and width and height (bx, by, bw, bh) of each prediction box; the 1 represents the confidence; C represents the total number of categories, and C is set to 1;
the formulas relating the parameters of the prediction box to the parameters of the real box are as follows:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw · e^tw
bh = ph · e^th
where cx and cy are the coordinates of the upper left corner of the grid; pw and ph are the width and height of the prior frame mapped onto the feature map; bx, by, bw, bh are the output values, namely the center coordinates, width and height of the prediction frame; tx, ty, tw, th are the regression outputs corresponding to the center coordinates, width and height of the real frame; σ(tx), σ(ty) indicate that tx, ty are compressed onto (0, 1) using the Sigmoid function;
the confidence is calculated as follows:
Cij = Pr(object) × IOU(truth, pred)
where Cij represents the confidence of the jth prediction box of the ith grid; Pr(object) represents the probability that the current prediction box contains an object; IOU(truth, pred) represents the IOU value between the prediction box and the real box that best matches it;
the calculation formula of the IOU is as follows:
IOU = area(A ∩ B) / area(A ∪ B)
where A is the frame region of the prediction frame and B is the frame region of the real frame.
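The box-decoding relations and the IOU can be sketched as below. `decode_box` assumes the standard yolov3 form bw = pw·e^tw, bh = ph·e^th, and `iou` assumes boxes given as (x1, y1, x2, y2) corner coordinates; both helper names are illustrative:

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode network outputs into a prediction box via the claim-7
    relations: bx = σ(tx)+cx, by = σ(ty)+cy, bw = pw*e^tw, bh = ph*e^th."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    return sig(tx) + cx, sig(ty) + cy, pw * math.exp(tw), ph * math.exp(th)

def iou(a, b):
    """IOU of two corner-coordinate boxes: intersection area over union area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.142857
```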
8. The yolov3-based classroom people counting method of claim 7, wherein the loss function used by the detection model is formulated as follows:
Loss = λcoord Σi Σj Iij^obj [(xi - x̂i)² + (yi - ŷi)²]
     + λcoord Σi Σj Iij^obj [(√wi - √ŵi)² + (√hi - √ĥi)²]
     + Σi Σj Iij^obj (Ci - Ĉi)²
     + λnoobj Σi Σj Iij^noobj (Ci - Ĉi)²
     + Σi Ii^obj Σc (pi(c) - p̂i(c))²
where i denotes the ith grid and j denotes the jth prediction frame predicted by each grid; (xi, yi) are the center coordinates of the prediction frame predicted by the ith grid; wi and hi represent the width and height of the prediction frame; p(c) represents the probability that the object belongs to the cth class; λcoord is a weight coefficient; λnoobj is a penalty weight coefficient; Iij^obj indicates whether the jth prediction box of the ith grid is responsible for predicting the target, taking the value 0 or 1; the hatted quantities denote the corresponding real-frame values.
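A single-box sketch of the loss terms the claim names (coordinate, width/height, confidence, class), assuming the YOLO-style squared-error form with square roots on width and height; the dict layout, default weights and helper name `yolo_loss_terms` are illustrative assumptions:

```python
def yolo_loss_terms(pred, truth, responsible, lam_coord=5.0, lam_noobj=0.5):
    """Sum the four claim-8 loss terms for a single prediction box.
    pred/truth: dicts with keys x, y, w, h, conf, p.
    responsible: whether this box is responsible for the target (I = 0 or 1)."""
    I = 1.0 if responsible else 0.0
    coord = lam_coord * I * ((pred["x"] - truth["x"]) ** 2
                             + (pred["y"] - truth["y"]) ** 2)
    size = lam_coord * I * ((pred["w"] ** 0.5 - truth["w"] ** 0.5) ** 2
                            + (pred["h"] ** 0.5 - truth["h"] ** 0.5) ** 2)
    conf = I * (pred["conf"] - truth["conf"]) ** 2 \
        + lam_noobj * (1.0 - I) * (pred["conf"] - truth["conf"]) ** 2
    cls = I * (pred["p"] - truth["p"]) ** 2
    return coord + size + conf + cls
```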
9. The yolov3-based classroom people counting method as claimed in claim 1, wherein step S5 comprises training the network model: the parameters in the cfg configuration file of the original yolov3 network are set; after the setting, training on the training set is started and continued until the loss function converges, and the weights of the trained network model are saved.
10. The yolov3-based classroom people counting method as claimed in claim 1, wherein said step S6 is specifically:
detecting the image to be detected with the trained network model, and selecting, for each grid on the image, the largest of the confidences of its three predicted prediction frames; a threshold of 0.75 is set; a grid is marked F if its confidence is smaller than the threshold and T if its confidence is larger than the threshold; the number of T marks is the detected number of people.
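The claim-10 counting rule can be sketched directly; the helper name `count_people` and the sample per-grid confidences are illustrative:

```python
def count_people(confidences, threshold=0.75):
    """Claim-10 counting rule: keep, for each grid, the highest of its
    three prediction-box confidences, then count the ones above the
    threshold (marked T) versus below it (marked F)."""
    flags = []
    for per_grid in confidences:  # three confidences per grid
        best = max(per_grid)
        flags.append("T" if best > threshold else "F")
    return flags.count("T"), flags

# hypothetical grid confidences from a trained model
grids = [(0.9, 0.2, 0.1), (0.4, 0.3, 0.7), (0.95, 0.8, 0.1)]
count, flags = count_people(grids)
print(count, flags)  # 2 ['T', 'F', 'T']
```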
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110466081.4A CN113139481B (en) | 2021-04-28 | 2021-04-28 | Classroom people counting method based on yolov3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113139481A true CN113139481A (en) | 2021-07-20 |
CN113139481B CN113139481B (en) | 2023-09-01 |
Family
ID=76816299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110466081.4A Active CN113139481B (en) | 2021-04-28 | 2021-04-28 | Classroom people counting method based on yolov3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113139481B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113989708A (en) * | 2021-10-27 | 2022-01-28 | 福州大学 | Campus library epidemic prevention and control method based on YOLO v4 |
CN114495003A (en) * | 2022-01-24 | 2022-05-13 | 上海申视信科技有限公司 | People number identification and statistics method and system based on improved YOLOv3 network |
CN116563797A (en) * | 2023-07-10 | 2023-08-08 | 安徽网谷智能技术有限公司 | Monitoring management system for intelligent campus |
CN117557820A (en) * | 2024-01-08 | 2024-02-13 | 浙江锦德光电材料有限公司 | Quantum dot optical film damage detection method and system based on machine vision |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108509860A (en) * | 2018-03-09 | 2018-09-07 | 西安电子科技大学 | HOh Xil Tibetan antelope detection method based on convolutional neural networks |
CN108647587A (en) * | 2018-04-23 | 2018-10-12 | 腾讯科技(深圳)有限公司 | Demographic method, device, terminal and storage medium |
CN108717798A (en) * | 2018-07-16 | 2018-10-30 | 辽宁工程技术大学 | A kind of intelligent public transportation system based on Internet of Things pattern |
CN108830145A (en) * | 2018-05-04 | 2018-11-16 | 深圳技术大学(筹) | A kind of demographic method and storage medium based on deep neural network |
CN110060233A (en) * | 2019-03-20 | 2019-07-26 | 中国农业机械化科学研究院 | A kind of corn ear damage testing method |
CN110837795A (en) * | 2019-11-04 | 2020-02-25 | 防灾科技学院 | Teaching condition intelligent monitoring method, device and equipment based on classroom monitoring video |
CN111626128A (en) * | 2020-04-27 | 2020-09-04 | 江苏大学 | Improved YOLOv 3-based pedestrian detection method in orchard environment |
Also Published As
Publication number | Publication date |
---|---|
CN113139481B (en) | 2023-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113139481B (en) | Classroom people counting method based on yolov3 | |
JP6892558B2 (en) | Theological assistance method and the theological assistance system that adopts the method | |
CN110334765B (en) | Remote sensing image classification method based on attention mechanism multi-scale deep learning | |
CN110321361B (en) | Test question recommendation and judgment method based on improved LSTM neural network model | |
CN107292246A (en) | Infrared human body target identification method based on HOG PCA and transfer learning | |
CN109376637A (en) | Passenger number statistical system based on video monitoring image processing | |
CN110889672A (en) | Student card punching and class taking state detection system based on deep learning | |
CN109299707A (en) | A kind of unsupervised pedestrian recognition methods again based on fuzzy depth cluster | |
CN106156765A (en) | safety detection method based on computer vision | |
CN110321862B (en) | Pedestrian re-identification method based on compact ternary loss | |
CN108256486B (en) | Image identification method and device based on nonnegative low-rank and semi-supervised learning | |
CN107392251B (en) | Method for improving target detection network performance by using classified pictures | |
CN109902615A (en) | A kind of multiple age bracket image generating methods based on confrontation network | |
CN109784288B (en) | Pedestrian re-identification method based on discrimination perception fusion | |
CN107808376A (en) | A kind of detection method of raising one's hand based on deep learning | |
CN110163567A (en) | Classroom roll calling system based on multitask concatenated convolutional neural network | |
CN114898460B (en) | Teacher nonverbal behavior detection method based on graph convolution neural network | |
CN111860297A (en) | SLAM loop detection method applied to indoor fixed space | |
CN109190458A (en) | A kind of person of low position's head inspecting method based on deep learning | |
CN107832747A (en) | A kind of face identification method based on low-rank dictionary learning algorithm | |
CN116052211A (en) | Knowledge distillation-based YOLOv5s lightweight sheep variety identification method and system | |
CN114627553A (en) | Method for detecting classroom scene student behaviors based on convolutional neural network | |
CN114299279A (en) | Unmarked group rhesus monkey motion amount estimation method based on face detection and recognition | |
CN108280516A (en) | The optimization method of Intelligent evolution is mutually won between a kind of multigroup convolutional neural networks | |
Pei et al. | Convolutional neural networks for class attendance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||