CN109978035B - Pedestrian detection method based on improved k-means and loss function

Info

Publication number
CN109978035B
CN109978035B
Authority
CN
China
Prior art keywords
data
xml
picture
loss
training
Prior art date
Legal status
Active
Application number
CN201910202078.4A
Other languages
Chinese (zh)
Other versions
CN109978035A (en)
Inventor
郭杰
郑佳卉
吴宪云
李云松
解静
邱尚锋
林朋雨
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910202078.4A
Publication of CN109978035A
Application granted
Publication of CN109978035B
Legal status: Active

Classifications

    • G - Physics
    • G06 - Computing; Calculating or Counting
    • G06F - Electric Digital Data Processing
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 - Clustering techniques

Abstract

The invention provides a pedestrian detection method based on an improved k-means algorithm and an improved loss function, used for classifying and identifying videos or images containing pedestrian targets. It mainly addresses two problems in the prior art: inaccurate clustering results, and prediction frames that cannot learn the loss according to their own size characteristics. The method comprises the following steps: constructing a training set and a test set; clustering the training set with the improved k-means algorithm; improving the loss function of the YOLOv3 detection network; training on the training set with the improved loss function; and detecting the test set. In the clustering stage the invention screens invalid data out of the annotation information of the training set and clusters the remaining valid data, so that more accurate initial candidate-frame sizes are obtained and different prediction frames learn different prediction losses according to their own size characteristics, yielding a more accurate pedestrian target detection network.

Description

Pedestrian detection method based on improved k-means and loss function
Technical Field
The invention belongs to the technical field of target detection, relates to a pedestrian detection method, and particularly relates to a pedestrian detection method based on improved k-means and an improved loss function, which can be used for classifying and identifying videos or images containing pedestrian targets.
Background
Pedestrian detection refers to detecting the position coordinates and confidence of pedestrians in videos or images. The main indexes for measuring detection results are detection accuracy and detection speed, of which the most important is detection accuracy; it is often influenced by the pedestrian features and by the loss function.
At present, common pedestrian detection methods can be divided into two categories according to how the pedestrian features are extracted: pedestrian detection based on traditional algorithms and pedestrian detection based on deep learning.
Traditional pedestrian detection methods mainly comprise global-feature detection methods, detection methods based on local feature extraction, and multi-feature detection methods. Global-feature detection mainly detects the contour of a pedestrian through the histogram of oriented gradients of the whole image to find the pedestrian's position. Detection based on local feature extraction mainly extracts local features of the input picture and matches them against pedestrian features. Multi-feature detection mainly extracts and detects several kinds of features, such as gray scale and contour, and integrates their detection results. The common advantages of the three methods are simplicity and speed, but because pedestrian features are sensitive to factors such as illumination, background, and occlusion, background noise and light interference are easily introduced during detection, so the detection accuracy of traditional pedestrian detection methods is low.
The development of deep learning has brought new ideas to pedestrian detection research. Deep-learning-based pedestrian detection methods mainly comprise detection methods based on candidate-frame selection and end-to-end detection methods. Candidate-frame-based methods manually select candidate frames and then train the network; although they detect well, selecting candidate frames in advance makes the network's detection efficiency low.
In recent years, end-to-end detection methods have gradually become mainstream in the pedestrian detection field owing to their good detection accuracy and efficiency. Such a method takes a deep-learning-based target detection network as the base network and initializes the candidate-frame sizes with a clustering method, so that the initial candidate-frame sizes are close to the sizes of pedestrian features and the network converges more easily; it then trains on the training set with a loss function to obtain a pedestrian detection network model, and finally uses the model to detect the test set pictures, obtaining the position coordinates and confidence of all pedestrian targets. However, the base networks adopted by most current pedestrian detection algorithms, such as YOLOv1 and YOLOv2, still have unsatisfactory detection accuracy, so the detection accuracy of these pedestrian target detection algorithms is low. For example, the patent application with publication number CN 109325418A, entitled "pedestrian identification method in road traffic environment based on improved YOLOv3", discloses a pedestrian detection method using an improved YOLOv3. The method takes YOLOv3 as the base network, increases the number of candidate frames in the k-means clustering process to strengthen the network's feature extraction capability, and then, when training the network with the loss function, increases the weight of the coordinate loss term, obtaining a pedestrian detection network model. However, when clustering with k-means, the method does not consider that some annotation information in the training set may be invalid, so the clustering result is inaccurate; moreover, when calculating the loss, it does not consider that prediction frames of different sizes learn the coordinate errors and the width-height errors in different proportions, so the prediction frames cannot learn the loss according to their own size characteristics. Therefore, how to screen out the valid data in the training set's annotation information and compute a more accurate loss remains an urgent problem in this field.
Disclosure of Invention
The invention aims to overcome the defects of the existing pedestrian detection technology by providing a pedestrian detection method based on improved k-means and an improved loss function, so as to improve the detection accuracy of pedestrian targets in different scenes.
The technical idea of the invention is as follows: first, a training set and a test set are constructed; second, the improved k-means clustering algorithm is used to cluster the annotation information of the training set, and the clustering result is used as the size initialization value of the YOLOv3 network candidate frames; then the training set is trained in the YOLOv3 network based on the improved loss function; finally, the trained pedestrian detection network model is used to detect the test set.
According to the technical idea, the technical scheme adopted for achieving the purpose of the invention comprises the following steps:
(1) constructing a training set and a testing set:
(1a) storing N continuous or discontinuous frames of a pedestrian video from any scene into the JPEGImages folder as jpg pictures, and naming each picture, wherein N > 1000;
(1b) taking more than half of pictures in a JPEGImages folder as a training picture set, taking the rest pictures as a test picture set, writing the names of all the pictures in the training picture set into a train.txt file under an ImageSets/Main folder, and simultaneously writing the names of all the pictures in the test picture set into a test.txt file under the ImageSets/Main folder;
(1c) frame-labeling the different pedestrians contained in each picture of the training picture set and the test picture set, storing the coordinate data of the labeling frames, and then storing the class person of the pedestrian targets and the coordinate data of the labeling frames contained in each picture into an xml file, obtaining an Annotations folder consisting of a number of xml files, wherein each xml file has the same name as its corresponding pedestrian picture;
(1d) taking the xml files in the Annotations folder with the same names as the pictures in train.txt as the annotation information set of the training picture set, and the xml files with the same names as the pictures in test.txt as the annotation information set of the test picture set; writing the annotation information set of the training picture set into the train.txt file under the darknet folder and the annotation information set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation information set form the training set, and the test picture set and its corresponding xml annotation information set form the test set;
(2) clustering the training set based on an improved k-means algorithm:
(2a) screening the annotation information in the training set:
(2a1) writing the coordinate data extracted from the xml annotation files corresponding to the training set into an array data_xml of length l, taking the first group of coordinate data read from data_xml as the current coordinate data, and initializing its current index value q in data_xml to 0;
(2a2) defining the coordinate data corresponding to q in data_xml: the x-axis projection coordinate of the upper left corner of the labeling frame is defined as x_min, the y-axis projection coordinate of the upper left corner as y_min, the x-axis projection coordinate of the lower right corner as x_max, and the y-axis projection coordinate of the lower right corner as y_max;
(2a3) calculating the difference x_d between x_max and x_min and the difference y_d between y_max and y_min, and judging whether the data corresponding to x_d and y_d in data_xml are valid: if x_d = 0 or y_d = 0, the data corresponding to x_d and y_d in data_xml are invalid; delete the invalid data, let l = l - 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data corresponding to x_d and y_d in data_xml are valid; execute step (2a4);
(2a4) calculating the ratio div of x_d and y_d, and judging the validity of the data corresponding to div in data_xml according to whether div > 3 holds: if it holds, the data are invalid; delete the invalid data, let l = l - 1, and execute step (2a5); otherwise the data are valid; let q = q + 1 and execute step (2a5);
(2a5) repeating steps (2a2)-(2a4) until q = l, obtaining the valid annotation information;
(2b) clustering the valid annotation information:
(2b1) setting the number of clustering centers to k, k > 0; constructing a two-dimensional matrix data_k whose number of rows is the length l of data_xml and whose number of columns is k, the rows of data_k representing the valid annotation information stored in data_xml and the columns representing the values of the clustering centers; initializing data_k to 0;
(2b2) randomly initializing the k clustering centers;
(2b3) calculating the distance values between the l pieces of valid annotation information in data_xml and the k clustering centers, and writing each distance value into the position of data_k at the row corresponding to the valid annotation information and the column corresponding to the clustering center;
(2b4) assigning the valid annotation information corresponding to each row of data_k as a member of the clustering center of the column holding the minimum distance value in that row, and updating the value of each clustering center to the mean width and height of its members;
(2b5) repeating steps (2b3) and (2b4) until the values of the k clustering centers no longer change, and taking the values of the k clustering centers as the clustering result;
(3) improving the loss function of the YOLOv3 detection network:
modifying the coordinate loss function in the YOLOv3 detection network loss function into Loss'_coord:
Loss'_coord = λ_coord Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} t_i [(x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)²]
t_i = 2 - w_i × h_i
wherein λ_coord represents the network's weight parameter for the coordinates of the prediction box; l.w represents the division size of the network over the picture width, l.h the division size over the picture height, and l.n the number of prediction boxes in the network; i is the iteration variable over l.w × l.h and j the iteration variable over l.n; 1_{ij}^{obj} is the parameter indicating whether the prediction box contains a target; w_i denotes the width of the prediction box and ŵ_i the width of the labeling box; h_i denotes the height of the prediction box and ĥ_i the height of the labeling box; x_i denotes the x-axis projection of the upper-left-corner coordinate of the prediction box and x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left-corner coordinate of the prediction box and ŷ_i denotes y_min;
(4) Training the training set based on the improved loss function:
(4a) taking the clustering result as a size initialization value of a YOLOv3 network candidate box;
(4b) performing K times of iterative training on the training set based on an improved loss function in the YOLOv3 network, wherein K is more than 10000, and obtaining a pedestrian detection network model;
(5) detecting the test set:
inputting the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
Compared with the prior art, the invention has the following advantages:
the method improves the loss function in the YOLOv3, increases the learning weight of the coordinate error in the coordinate loss function for the small-size prediction frame, and avoids the defect that the prediction frame cannot learn loss according to the size characteristics of the prediction frame, and simultaneously improves the k-means clustering algorithm, screens the values of the width-height size and the width-height ratio of the marking frame in the training set, removes invalid data while retaining the valid data, clusters the valid data, and avoids the defect that the detection precision is influenced by the inaccurate clustering result caused by the invalid marking information, and simulation results show that compared with the prior art, the method effectively improves the detection precision of pedestrian detection.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the present invention includes the steps of:
step 1) constructing a training set and a testing set:
step 1a) extracting one picture every 10 frames from the continuous or discontinuous N frames of pedestrian video shot in any scene by a camera, an unmanned aerial vehicle, or a mobile phone, and storing the pictures into the JPEGImages folder, wherein N > 10000; this embodiment uses 12000 continuous frames from a video of pedestrians on a road shot by a mobile phone, at a resolution of 1920 × 1080, and names the pictures with distinct names; the number of pictures stored in the JPEGImages folder is not less than 1000;
step 1b) taking more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set (this embodiment divides them in a 7:3 ratio; a short python sketch of this division is given after step 1d); writing the names of all pictures in the training picture set into the train.txt file under the ImageSets/Main folder and the names of all pictures in the test picture set into the test.txt file under the same folder, with each picture name occupying one line in train.txt and test.txt;
step 1c) performing frame labeling on the pedestrian targets contained in each picture of the training picture set and the test picture set:
step 1c1) labeling the class and the position coordinates (x_min, y_min, x_max, y_max) of each pedestrian target, wherein the class of every pedestrian target is person, x_min is the x-axis projection coordinate of the upper left corner of the labeling box, y_min the y-axis projection coordinate of the upper left corner, x_max the x-axis projection coordinate of the lower right corner, and y_max the y-axis projection coordinate of the lower right corner;
step 1c2) storing the annotation information of all pedestrian targets in each picture of the training picture set and the test picture set in xml format, obtaining an Annotations folder consisting of a number of xml-format files, wherein each xml file has the same name as the picture its annotation information belongs to; for example, the annotation file corresponding to picture 000001.jpg is named 000001.xml; the JPEGImages folder, the Annotations folder, and the ImageSets folder are placed in the darknet folder;
step 1d) taking the xml files in the Annotations folder with the same names as the pictures in train.txt as the annotation information set of the training picture set and the xml files with the same names as the pictures in test.txt as the annotation information set of the test picture set; writing the annotation information set of the training picture set into the train.txt file under the darknet folder and the annotation information set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation information set form the training set, and the test picture set and its corresponding xml annotation information set form the test set;
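For concreteness, the following short python sketch (not part of the patent; the file names simply follow the VOC-style layout described above) shows one way to perform the division of step 1b and write the two name lists:

import os
import random

# collect picture names (without the .jpg suffix) from JPEGImages
names = [f[:-4] for f in os.listdir("JPEGImages") if f.endswith(".jpg")]
random.shuffle(names)
split = int(len(names) * 0.7)   # 7:3 division used in this embodiment

os.makedirs("ImageSets/Main", exist_ok=True)
with open("ImageSets/Main/train.txt", "w") as f:
    f.write("\n".join(names[:split]))
with open("ImageSets/Main/test.txt", "w") as f:
    f.write("\n".join(names[split:]))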
step 2) clustering the training set based on an improved k-means algorithm:
step 2a) screening the annotation information in the training set:
step 2a1) constructing an array data_xml; extracting the coordinate data from the xml files of the whole training set in python (e.g., by iterating over their object nodes) and writing them into data_xml in sequence, each member of data_xml representing one group of coordinate data; calculating the length l of data_xml with python's len function; reading the first group of coordinate data in data_xml and initializing the current index value q to 0;
step 2a2) defining the coordinate data corresponding to q in data_xml: the x-axis projection coordinate of the upper left corner of the labeling frame is defined as x_min, the y-axis projection coordinate of the upper left corner as y_min, the x-axis projection coordinate of the lower right corner as x_max, and the y-axis projection coordinate of the lower right corner as y_max;
step 2a3) calculating the difference x_d between x_max and x_min and the difference y_d between y_max and y_min, where x_min, x_max, y_min, and y_max are all floating point numbers, and judging whether the data corresponding to x_d and y_d in data_xml are valid: if x_d = 0 or y_d = 0, the data are invalid; delete the group of invalid data from data_xml with python's del function, let l = l - 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data are valid; execute step (2a4);
step 2a4) calculating the ratio div of x_d and y_d and judging whether div > 3 holds: if it holds, the corresponding data in data_xml are invalid; delete the group of invalid data from data_xml with python's del function, let l = l - 1, and execute step (2a5); otherwise the data are valid; let q = q + 1 and execute step (2a5);
step 2a5) repeating steps (2a2)-(2a4) until q = l; the annotation information remaining in data_xml at this point is the valid annotation information, as illustrated by the sketch below;
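The following python sketch, given only as an illustration and not taken from the patent, implements steps 2a1-2a5: it builds data_xml from the VOC-style xml files and deletes the invalid labeling frames. The file path and the orientation of the ratio div (here taken as the larger of the two width-height ratios) are assumptions, since the patent states only that div > 3 marks a frame invalid:

import glob
import xml.etree.ElementTree as ET

# step 2a1: extract (xmin, ymin, xmax, ymax) from every annotation file
data_xml = []
for path in glob.glob("Annotations/*.xml"):
    for obj in ET.parse(path).findall("object"):
        b = obj.find("bndbox")
        data_xml.append([float(b.find(t).text)
                         for t in ("xmin", "ymin", "xmax", "ymax")])

# steps 2a2-2a5: screen out invalid labeling frames in place
q = 0
while q < len(data_xml):
    xmin, ymin, xmax, ymax = data_xml[q]
    xd, yd = xmax - xmin, ymax - ymin
    if xd == 0 or yd == 0:               # degenerate frame (step 2a3)
        del data_xml[q]                  # l shrinks by 1
        continue
    div = max(xd / yd, yd / xd)          # width-height ratio (step 2a4)
    if div > 3:                          # implausible pedestrian frame
        del data_xml[q]
        continue
    q += 1                               # valid frame: keep and advance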
step 2b) clustering the valid annotation information:
step 2b1) manually setting the number of clustering centers to k, k > 0 (k = 9 in this embodiment); constructing a two-dimensional matrix data_k whose number of rows is the current length l of data_xml and whose number of columns is k, the rows representing the valid annotation information stored in data_xml and the columns representing the values of the clustering centers; initializing data_k to 0 with np.zeros in python;
step 2b2) randomly initializing the k clustering centers in python, each clustering center being a floating point array of length 2, and writing the values of the clustering centers into an array named clusters;
step 2b3) calculating the distance values d(box, centroid) between the l pieces of valid annotation information in data_xml and the k clustering centers, with the calculation expression:
d(box, centroid) = 1 - IOU(box, centroid)
IOU(box, centroid) = (box ∩ centroid) / (box ∪ centroid)
box = x_d × y_d
wherein centroid represents the product of the two floating point members of a clustering center, box ∩ centroid represents the intersection of box and centroid, and box ∪ centroid represents the union of box and centroid; each d(box, centroid) is then written into the position of data_k at the row corresponding to the valid annotation information and the column corresponding to the clustering center;
step 2b4) using np.argmin in python to find the column holding the minimum distance value in each row of data_k, recording it in the variable nearest_clusters, and updating each clustering center in python with the following statement:
clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
wherein cluster is the index of the clustering center, incremented in python until all clustering centers are updated, and dist denotes the width-height averaging described in this step (e.g., np.mean); the updated clustering centers are still stored in the array named clusters;
step 2b5) repeating steps (2b3) and (2b4) until the values of the k clustering centers no longer change, and taking the values of the k clustering centers as the clustering result; a compact python sketch of this clustering procedure follows;
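The sketch below illustrates steps 2b1-2b5 under stated assumptions; it is not the patent's code. boxes is assumed to be an (l, 2) float numpy array of the valid (x_d, y_d) width-height pairs, the 1 - IOU distance of step 2b3 treats boxes and centers as corner-aligned rectangles, and np.mean stands in for the averaging function written as dist above:

import numpy as np

def iou(boxes, clusters):
    # intersection over union of (w, h) pairs, corner-aligned
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, seed=0):
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]  # step 2b2
    while True:
        nearest = np.argmin(1 - iou(boxes, clusters), axis=1)   # step 2b3
        # step 2b4: each center becomes the mean (w, h) of its members
        # (an empty cluster would need re-seeding; omitted for brevity)
        new = np.array([boxes[nearest == c].mean(axis=0) for c in range(k)])
        if np.allclose(new, clusters):                          # step 2b5
            return clusters
        clusters = new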
step 3) improving the loss function of the YOLOv3 detection network:
modifying the coordinate loss function in the delta_region_box function of the region_layer.c file under the darknet/src folder into Loss'_coord:
Loss'_coord = λ_coord Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} t_i [(x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)²]
t_i = 2 - w_i × h_i
The complete modified loss function Loss' in YOLOv3 is:
Loss' = Loss_noobj + Loss_obj + Loss_class + Loss'_coord
Loss_noobj = λ_noobj Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{noobj} (c_i - ĉ_i)²
Loss_obj = λ_obj Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} (c_i - ĉ_i)²
Loss_class = λ_class Σ_{i=0}^{l.w×l.h} 1_{i}^{obj} Σ_{c∈classes} (p_i(c) - p̂_i(c))²
wherein Loss_noobj represents the confidence loss function of prediction boxes not containing a target; Loss_obj represents the confidence loss function of prediction boxes containing a target; Loss_class represents the class loss function; Loss'_coord represents the improved coordinate loss function; λ_coord represents the network's weight parameter for the coordinates of the prediction box; l.w represents the division size of the network over the picture width, l.h the division size over the picture height, and l.n the number of prediction boxes in the network; i is the iteration variable over l.w × l.h and j the iteration variable over l.n; w_i denotes the width of the prediction box and ŵ_i the width of the labeling box; h_i denotes the height of the prediction box and ĥ_i the height of the labeling box; x_i denotes the x-axis projection of the upper-left-corner coordinate of the prediction box and x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left-corner coordinate of the prediction box and ŷ_i denotes y_min; λ_noobj represents the coefficient corresponding to prediction boxes not containing a target and 1_{ij}^{noobj} is the parameter indicating whether the prediction box does not contain a target; c_i is the confidence of the prediction box and ĉ_i the confidence of the labeling box; λ_obj represents the coefficient corresponding to prediction boxes containing a target and 1_{ij}^{obj} is the parameter indicating whether the prediction box contains a target; λ_class represents the coefficient corresponding to prediction boxes containing a target class, c is the iteration variable over classes, classes denotes the set of classes in the dataset, p_i(c) is the probability that the prediction box contains class c, and p̂_i(c) is the probability that the labeling box contains class c;
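To make the effect of the scale factor t_i concrete, here is a minimal numpy sketch of Loss'_coord as reconstructed above; it assumes pred and truth are arrays of normalized (x, y, w, h) rows for the prediction boxes responsible for a target (1_ij^obj = 1), and it is not the darknet C implementation itself:

import numpy as np

def improved_coord_loss(pred, truth, lambda_coord=1.0):
    # t_i = 2 - w_i * h_i: small boxes (small w*h) get a weight near 2,
    # large boxes a weight near 1, per the patent's description
    t = 2.0 - pred[:, 2] * pred[:, 3]
    sq = np.sum((pred - truth) ** 2, axis=1)  # (x, y, w, h) squared errors
    return lambda_coord * np.sum(t * sq)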
step 4) training the training set based on the improved loss function:
step 4a) initializing the training parameters of the pedestrian detection network:
modifying the paths of the training set and the test set in the voc.data file, setting the maximum iteration number max_batches to 50200, the picture batch size to 64, the initial learning rate to 10⁻³, and the momentum to 0.9;
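For reference, a darknet voc.data file of the kind edited in step 4a usually has the following form; the paths are placeholders for this embodiment's layout rather than values disclosed in the patent:

classes = 1
train   = /path/to/darknet/train.txt
valid   = /path/to/darknet/test.txt
names   = data/voc.names
backup  = backup/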
step 4b) taking the clustering result as the size initialization value of the YOLOv3 network candidate boxes:
writing the clustering result into the anchors entries of the yolov3-voc.cfg file;
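For illustration, the anchors entry in yolov3-voc.cfg has the following form; the nine width,height pairs shown are the stock YOLOv3 values and only indicate the format, as the patent's own clustering result would replace them:

anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326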
step 4c) performing K iterations of training on the training set based on the improved loss function in the YOLOv3 network, wherein K > 10000 (K = 20000 in this embodiment), obtaining the pedestrian detection network model;
step 5) detecting the test set:
step 5a) entering the following shell command under the darknet folder:
./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg yolov3-voc_20000.weights
step 5b) according to the input shell command, the pedestrian detection network model, trained with the improved loss function, performs forward calculation on the read-in test set pictures, obtaining the position coordinates and confidence of each pedestrian target, which are stored in the data/out folder.
The technical effects of the invention are further explained below in combination with simulation experiments:
1. Simulation conditions and contents:
The simulation experiments of the invention were implemented in a configuration environment of an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, GeForce GTX 1080 Ti ×4, and 32GB of memory. The pedestrian video data used in the experiments come from pedestrians on roads in and near the campus of Xidian University, actually shot with a Redmi Note 7 mobile phone.
Simulation experiment: the invention was compared with the prior art by simulation. After constructing the training set and the test set according to the invention, the improved k-means was first used to screen the valid data out of the training set's annotation information; the valid data and the full annotation data were then clustered separately to obtain two clustering results, which were used as the initial candidate-frame sizes of the YOLOv3 network based on the improved loss function and of the prior-art network, respectively. The training set was then trained 20000 times with the improved loss function in YOLOv3 and, in parallel, 20000 times with the prior-art network, yielding two pedestrian detection network models. Finally, the test set was input into both models to obtain the position coordinates and confidence of each pedestrian target detected by each model, and the detection accuracy of the two methods was counted; the comparison is shown in the table below.
2. Simulation result analysis:
Compared with the prior art, the pedestrian detection result obtained by the invention has obvious advantages; the detection accuracy of the prior art and of the invention is shown in Table 1:

TABLE 1 Detection accuracy comparison

Evaluation index      Prior art    The invention
Detection accuracy    87.3         89.0
As is apparent from the table, the detection accuracy obtained by the invention is higher, and the invention's detection effect on pedestrian targets is better than that of the prior art.
The above description is only a specific example of the invention and should not be construed as limiting it in any way. It will be apparent to persons skilled in the relevant art that various modifications and changes in form and detail can be made without departing from the principle and structure of the invention, but such modifications and changes based on the inventive idea remain within the protection scope of the claims.

Claims (2)

1. A pedestrian detection method based on improved k-means and a loss function, comprising the steps of:
(1) constructing a training set and a testing set:
(1a) storing N continuous or discontinuous frames of a pedestrian video from any scene into the JPEGImages folder as jpg pictures, and naming each picture, wherein N > 10000;
(1b) taking more than half of pictures in a JPEGImages folder as a training picture set, taking the rest pictures as a test picture set, writing the names of all the pictures in the training picture set into a train.txt file under an ImageSets/Main folder, and simultaneously writing the names of all the pictures in the test picture set into a test.txt file under the ImageSets/Main folder;
(1c) frame-labeling the different pedestrians contained in each picture of the training picture set and the test picture set, storing the coordinate data of the labeling frames, and then storing the class person of the pedestrian targets and the coordinate data of the labeling frames contained in each picture into an xml file, obtaining an Annotations folder consisting of a number of xml files, wherein each xml file has the same name as its corresponding pedestrian picture;
(1d) taking the xml files in the Annotations folder with the same names as the pictures in train.txt as the annotation information set of the training picture set, and the xml files with the same names as the pictures in test.txt as the annotation information set of the test picture set; writing the annotation information set of the training picture set into the train.txt file under the darknet folder and the annotation information set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation information set form the training set, and the test picture set and its corresponding xml annotation information set form the test set;
(2) clustering the training set based on an improved k-means algorithm:
(2a) screening the annotation information in the training set:
(2a1) writing the coordinate data extracted from the xml annotation files corresponding to the training set into an array data_xml of length l, taking the first group of coordinate data read from data_xml as the current coordinate data, and initializing its current index value q in data_xml to 0;
(2a2) defining the coordinate data corresponding to q in data_xml: the x-axis projection coordinate of the upper left corner of the labeling frame is defined as x_min, the y-axis projection coordinate of the upper left corner as y_min, the x-axis projection coordinate of the lower right corner as x_max, and the y-axis projection coordinate of the lower right corner as y_max;
(2a3) calculating the difference x_d between x_max and x_min and the difference y_d between y_max and y_min, and judging whether the data corresponding to x_d and y_d in data_xml are valid: if x_d = 0 or y_d = 0, the data corresponding to x_d and y_d in data_xml are invalid; delete the invalid data, let l = l - 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data corresponding to x_d and y_d in data_xml are valid; execute step (2a4);
(2a4) calculating the ratio div of x_d and y_d, and judging the validity of the data corresponding to div in data_xml according to whether div > 3 holds: if it holds, the data are invalid; delete the invalid data, let l = l - 1, and execute step (2a5); otherwise the data are valid; let q = q + 1 and execute step (2a5);
(2a5) repeating steps (2a2)-(2a4) until q = l, obtaining the valid annotation information;
(2b) clustering the valid annotation information:
(2b1) setting the number of clustering centers to k, k > 0; constructing a two-dimensional matrix data_k whose number of rows is the length l of data_xml and whose number of columns is k, the rows of data_k representing the valid annotation information stored in data_xml and the columns representing the values of the clustering centers; initializing data_k to 0;
(2b2) randomly initializing the k clustering centers;
(2b3) calculating the distance values between the l pieces of valid annotation information in data_xml and the k clustering centers, and writing each distance value into the position of data_k at the row corresponding to the valid annotation information and the column corresponding to the clustering center;
(2b4) assigning the valid annotation information corresponding to each row of data_k as a member of the clustering center of the column holding the minimum distance value in that row, and updating the value of each clustering center to the mean width and height of its members;
(2b5) repeating steps (2b3) and (2b4) until the values of the k clustering centers no longer change, and taking the values of the k clustering centers as the clustering result;
(3) improving the loss function of the YOLOv3 detection network:
modifying the coordinate loss function in the YOLOv3 detection network loss function into Loss'_coord:
Loss'_coord = λ_coord Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} t_i [(x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)²]
t_i = 2 - w_i × h_i
wherein λ_coord represents the network's weight parameter for the coordinates of the prediction box; l.w represents the division size of the network over the picture width, l.h the division size over the picture height, and l.n the number of prediction boxes in the network; i is the iteration variable over l.w × l.h and j the iteration variable over l.n; 1_{ij}^{obj} is the parameter indicating whether the prediction box contains a target; w_i denotes the width of the prediction box and ŵ_i the width of the labeling box; h_i denotes the height of the prediction box and ĥ_i the height of the labeling box; x_i denotes the x-axis projection of the upper-left-corner coordinate of the prediction box and x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left-corner coordinate of the prediction box and ŷ_i denotes y_min;
(4) Training the training set based on the improved loss function:
(4a) taking the clustering result as a size initialization value of a YOLOv3 network candidate box;
(4b) performing K times of iterative training on the training set based on an improved loss function in the YOLOv3 network, wherein K is more than 10000, and obtaining a pedestrian detection network model;
(5) detecting the test set:
inputting the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
2. The pedestrian detection method based on improved k-means and a loss function according to claim 1, wherein the loss function of the YOLOv3 detection network in step (3) has the calculation expression:
Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord
Loss_noobj = λ_noobj Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{noobj} (c_i - ĉ_i)²
Loss_obj = λ_obj Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} (c_i - ĉ_i)²
Loss_class = λ_class Σ_{i=0}^{l.w×l.h} 1_{i}^{obj} Σ_{c∈classes} (p_i(c) - p̂_i(c))²
Loss_coord = λ_coord Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_{ij}^{obj} t_i [(x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)²]
t_i = 2 - w_i × h_i
wherein Loss denotes the loss function; Loss_noobj denotes the confidence loss function of prediction boxes not containing a target, Loss_obj the confidence loss function of prediction boxes containing a target, Loss_class the class loss function, and Loss_coord the coordinate loss function; λ_noobj denotes the coefficient corresponding to prediction boxes not containing a target; l.w denotes the division size of the network over the picture width, l.h the division size over the picture height, and i, j the corresponding iteration variables; 1_{ij}^{noobj} is the parameter indicating whether the prediction box does not contain a target, c_i is the confidence of the prediction box, and ĉ_i is the confidence of the labeling box; λ_obj denotes the coefficient corresponding to prediction boxes containing a target, and 1_{ij}^{obj} is the parameter indicating whether the prediction box contains a target; λ_class denotes the coefficient corresponding to prediction boxes containing a target class, c is the iteration variable over classes, classes denotes the set of classes in the dataset, p_i(c) is the probability that the prediction box contains class c, and p̂_i(c) is the probability that the labeling box contains class c; λ_coord denotes the network's weight parameter for the coordinates of the prediction box; w_i denotes the width of the prediction box and ŵ_i the width of the labeling box; h_i denotes the height of the prediction box and ĥ_i the height of the labeling box; x_i denotes the x-axis projection of the upper-left-corner coordinate of the prediction box and x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left-corner coordinate of the prediction box and ŷ_i denotes y_min.
CN201910202078.4A 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function Active CN109978035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910202078.4A CN109978035B (en) 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function


Publications (2)

Publication Number Publication Date
CN109978035A CN109978035A (en) 2019-07-05
CN109978035B true CN109978035B (en) 2021-04-02

Family

ID=67079213





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant