CN109978035A - Pedestrian detection method based on improved k-means and loss function - Google Patents


Info

Publication number
CN109978035A
CN109978035A (application CN201910202078.4A)
Authority
CN
China
Prior art keywords: data, indicate, xml, prediction block, picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910202078.4A
Other languages
Chinese (zh)
Other versions
CN109978035B (en)
Inventor
郭杰
郑佳卉
吴宪云
李云松
解静
邱尚锋
林朋雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910202078.4A
Publication of CN109978035A
Application granted
Publication of CN109978035B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian detection method based on an improved k-means algorithm and an improved loss function, for classifying and identifying pedestrian targets in video or images. It mainly solves two problems of the prior art: inaccurate clustering results, and prediction boxes that cannot learn a loss matched to their own size characteristics. The steps are: construct a training set and a test set; cluster the training set with the improved k-means algorithm; improve the loss function of the YOLOv3 detection network; train on the training set with the improved loss function; and detect the test set. In the clustering stage the invention screens invalid data out of the training-set annotations and clusters only the valid data, yielding more accurate initial candidate-box sizes; different prediction boxes learn different losses according to their own size characteristics, yielding a more accurate pedestrian detection network.

Description

Pedestrian detection method based on improved k-means and loss function
Technical field
The invention belongs to the field of target detection technology and relates to a pedestrian detection method, in particular to a pedestrian detection method based on an improved k-means algorithm and an improved loss function, which can be used to classify and identify pedestrian targets in video or images.
Background technique
Pedestrian detection refers to detecting the position coordinates and confidence of pedestrians in video or images. Detection results are mainly measured by detection accuracy and detection speed; the most important index is detection accuracy, which is strongly influenced by the pedestrian features used and by the loss function.
Current pedestrian detection methods can be divided, according to how pedestrian features are extracted, into two classes: pedestrian detection based on traditional algorithms and pedestrian detection based on deep learning.
Traditional pedestrian detection methods mainly comprise detection based on global features, detection based on local feature extraction, and detection based on multiple features. Detection based on global features mainly finds the position of a pedestrian by detecting the pedestrian's contour through the gradient orientation histogram of the whole image. Detection based on local feature extraction mainly extracts local features of the input image and detects by matching pedestrian features. Detection based on multiple features extracts several types of features, such as gray level and contour, and combines the detection results of these features. The common advantage of the three methods is that they are simple and fast; however, because pedestrian features are sensitive to illumination, background, occlusion, and similar factors, background noise and light interference are easily introduced during detection, so the detection accuracy of traditional pedestrian detection methods is low.
The development of deep learning has brought new ideas to pedestrian detection research. Deep-learning-based pedestrian detection methods mainly comprise detection based on candidate-box selection and end-to-end detection. Methods based on candidate-box selection choose candidate boxes manually and then train the network; although this gives good detection results, selecting candidate boxes in advance makes the detection efficiency of the network very low.
In recent years, end-to-end detection has become the mainstream approach in the pedestrian detection field because of its good detection accuracy and efficiency. It takes a deep-learning-based target detection network as the basic network and initializes the sizes of the candidate boxes by clustering, so that the initial candidate-box sizes are close to the sizes of pedestrians and the network converges more easily; the training set is then trained with a loss function to obtain a pedestrian detection network model, which is finally used to detect the test-set pictures and obtain the position coordinates and confidence of all pedestrian targets. However, the detection accuracy of the basic networks used by most current pedestrian detection algorithms, such as YOLOv1 and YOLOv2, is still unsatisfactory, so the detection accuracy of these algorithms is low. For example, the patent application with publication number CN 109325418A, entitled "Pedestrian recognition method under road traffic environment based on improved YOLOv3", discloses a method of pedestrian detection through an improved YOLOv3. Taking YOLOv3 as the basic network, it first increases the number of candidate boxes during k-means clustering, thereby increasing the network's ability to extract features, and then increases the weight of the coordinate loss term when training the network with the loss function, obtaining a pedestrian detection network model. However, this method does not consider the case of invalid annotation information in the training set when clustering with k-means, so the clustering result is inaccurate; and when computing the coordinate loss it does not consider that prediction boxes of different sizes should learn the errors of coordinates and of width and height with different weights, so a prediction box cannot learn a loss matched to its own size characteristics. How to screen out the valid data in the training-set annotations and compute a more accurate loss therefore remains an urgent problem in this field.
Summary of the invention
The purpose of the invention is to address the deficiencies of the above existing pedestrian detection techniques by proposing a pedestrian detection method based on an improved k-means algorithm and an improved loss function, aiming to improve the detection accuracy of pedestrian targets in different scenes.
The technical idea of the invention is: first construct a training set and a test set; then cluster the annotation information of the training set with the improved k-means clustering algorithm and use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network; then train the training set with the improved loss function in the YOLOv3 network; and finally detect the test set with the trained pedestrian detection network model.
According to the above technical idea, the technical solution adopted to achieve the purpose of the invention comprises the following steps:
(1) Construct the training set and test set:
(1a) Save N frames, continuous or discontinuous, from a pedestrian video of any scene into the JPEGImages folder as jpg pictures, and name each picture, N > 1000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all pictures in the training set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test set into the test.txt file under the ImageSets/Main folder;
(1c) Mark each pedestrian contained in every picture of the training and test picture sets with a bounding box and save the coordinate data of each annotation box; then save the class person of each pedestrian target and the coordinate data of the annotation boxes contained in each picture into an xml file, obtaining the Annotations folder composed of multiple xml files, where each xml file has the same name as its corresponding pedestrian picture;
(1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation information set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation information set of the test pictures; write the annotation information set of the training pictures into the train.txt file under the darknet folder, and write the annotation information set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotation information form the training set, and the test pictures together with their corresponding xml annotation information form the test set;
(2) Cluster the training set based on the improved k-means algorithm:
(2a) Screen the annotation information in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files of the training set into an array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize its current index in data_xml to q = 0;
(2a2) Define the coordinate data at index q of data_xml: the x-axis projection coordinate of the upper-left corner of the annotation box is defined as x_min, the y-axis projection coordinate of the upper-left corner as y_min, the x-axis projection coordinate of the lower-right corner as x_max, and the y-axis projection coordinate of the lower-right corner as y_max;
(2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data are invalid; delete them, set l = l - 1, and execute step (2a2). If x_d ≠ 0 and y_d ≠ 0, the data are valid; execute step (2a4);
(2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data in data_xml corresponding to div according to whether div > 3 holds: if so, the data are invalid; delete them, set l = l - 1, and execute step (2a5). Otherwise the data are valid; set q = q + 1 and execute step (2a5);
(2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the valid annotation information;
(2b) Cluster the valid annotation information:
(2b1) Set the number of cluster centres to k, k > 0. Construct a two-dimensional matrix data_k with the length l of data_xml as the number of rows and k as the number of columns; the rows of data_k represent the valid annotation information saved in data_xml, and the columns represent the values of the cluster centres. Initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centres;
(2b3) Compute the distance values between the l pieces of valid annotation information in data_xml and the k cluster centres, and write each distance value into data_k at the position of the row of the corresponding valid annotation information and the column of the corresponding cluster centre;
(2b4) Assign the valid annotation information of each row of data_k as a member of the cluster centre whose column holds the lowest distance value in that row, and update the value of each cluster centre to the mean width and height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
(3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the YOLOv3 network loss function to Loss'_coord:

Loss'_coord = λ_coord · Σ_{i=0..l.w×l.h} Σ_{j=0..l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

where λ_coord denotes the weight parameter of the network for prediction-box coordinates; l.w denotes the size of the network's division along the picture width and l.h the size of its division along the picture height; l.n denotes the number of prediction boxes in the network; i is the variable iterating up to l.w × l.h and j the variable iterating up to l.n; w_i denotes the width of a prediction box and ŵ_i the width of the annotation box; h_i denotes the height of a prediction box and ĥ_i the height of the annotation box; x_i denotes the projection of the upper-left corner of a prediction box on the x-axis and x̂_i denotes x_min; y_i denotes the projection of the upper-left corner of a prediction box on the y-axis and ŷ_i denotes y_min.
(4) Train on the training set based on the improved loss function:
(4a) Use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network;
(4b) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
Compared with the prior art, the invention has the following advantages:
The invention improves the loss function of YOLOv3 by increasing, in the coordinate loss function, the learning weight of the coordinate error of small prediction boxes, avoiding the defect that a prediction box cannot learn a loss matched to its own size characteristics. At the same time, the invention improves the k-means clustering algorithm: the width and height values and the aspect ratios of the annotation boxes in the training set are screened, invalid data are removed while valid data are retained, and only the valid data are clustered, avoiding the defect that invalid annotation information makes the clustering result inaccurate and degrades detection accuracy. Simulation results show that, compared with the prior art, the invention effectively improves the detection accuracy of pedestrian detection.
Detailed description of the invention
Fig. 1 is the implementation flow chart of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
Referring to Fig. 1, the present invention comprises the following steps:
Step 1) Construct the training set and test set:
Step 1a) From a pedestrian video of any scene shot by a video camera, an unmanned aerial vehicle, or a mobile phone, extract one frame every 10 frames from N continuous or discontinuous frames and save the extracted frames into the JPEGImages folder, N > 10000. In this embodiment, 12000 continuous frames of a road pedestrian video shot by a mobile phone are used, and each picture is given a different name; the resolution of the video is 1920 × 1080, and no fewer than 1000 pictures are saved in the JPEGImages folder;
Step 1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; in this embodiment the training and test picture sets are divided in a ratio of 7:3. Write the names of all pictures in the training set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test set into the test.txt file under the ImageSets/Main folder, where the name of each picture occupies one line in trainval.txt and test.txt;
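Steps 1a)–1b) above amount to a VOC-style file-list split. The following python sketch illustrates it; the function name split_dataset, the fixed random seed, and reading names directly from the JPEGImages folder are illustrative assumptions, not the patent's code:

```python
import os
import random

def split_dataset(jpeg_dir, main_dir, train_ratio=0.7, seed=0):
    """Split the pictures in JPEGImages into training and test name lists,
    writing one basename (without extension) per line, VOC-style."""
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(jpeg_dir) if f.endswith(".jpg"))
    random.Random(seed).shuffle(names)          # reproducible shuffle
    n_train = int(len(names) * train_ratio)     # e.g. 7:3 split as in this embodiment
    os.makedirs(main_dir, exist_ok=True)
    with open(os.path.join(main_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(names[:n_train]))
    with open(os.path.join(main_dir, "test.txt"), "w") as f:
        f.write("\n".join(names[n_train:]))
    return names[:n_train], names[n_train:]
```

With the 7:3 ratio of this embodiment, 12000 saved frames would yield 8400 names in trainval.txt and 3600 in test.txt.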
Step 1c) Mark the pedestrian targets contained in every picture of the training and test picture sets with bounding boxes:
Step 1c1) Annotate the class and position coordinates (x_min, y_min, x_max, y_max) of each pedestrian target, where the class of every pedestrian target is person, x_min is the x-axis projection coordinate of the upper-left corner of the annotation box, y_min the y-axis projection coordinate of the upper-left corner, x_max the x-axis projection coordinate of the lower-right corner, and y_max the y-axis projection coordinate of the lower-right corner;
Step 1c2) Save the annotation information of all pedestrian targets in every picture of the training and test picture sets in xml format, obtaining the Annotations folder composed of multiple xml-format files, where each xml file has the same name as the picture whose annotation information it contains; for example, the annotation file corresponding to picture 000001.jpg is named 000001.xml. Place the JPEGImages, Annotations, and ImageSets folders in the darknet folder;
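For reference, an annotation file such as 000001.xml saved in step 1c2) might look like the following Pascal-VOC-style sketch; the exact tag layout and the coordinate values are illustrative assumptions:

```xml
<annotation>
  <filename>000001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>603.0</xmin><ymin>412.0</ymin>
      <xmax>678.0</xmax><ymax>580.0</ymax>
    </bndbox>
  </object>
</annotation>
```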
Step 1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation information set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation information set of the test pictures; write the annotation information set of the training pictures into the train.txt file under the darknet folder, and write the annotation information set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotation information form the training set, and the test pictures together with their corresponding xml annotation information form the test set;
Step 2) Cluster the training set based on the improved k-means algorithm:
Step 2a) Screen the annotation information in the training set:
Step 2a1) Construct the array data_xml; extract the coordinate data from the xml files of all training pictures using obj.findtext in python and write them into data_xml in sequence, where each member of data_xml represents one group of coordinate data. Compute the length l of data_xml using the len function in python, read the first group of coordinate data in data_xml, and initialize the current index of data_xml to q = 0;
Step 2a2) Define the coordinate data at index q of data_xml: the x-axis projection coordinate of the upper-left corner of the annotation box is defined as x_min, the y-axis projection coordinate of the upper-left corner as y_min, the x-axis projection coordinate of the lower-right corner as x_max, and the y-axis projection coordinate of the lower-right corner as y_max;
Step 2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, where x_min, x_max, y_min, and y_max are floating-point numbers, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data are invalid; delete this group of invalid data from data_xml using the del function in python, set l = l - 1, and execute step (2a2). If x_d ≠ 0 and y_d ≠ 0, the data are valid; execute step (2a4);
Step 2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data in data_xml corresponding to div according to whether div > 3 holds: if so, the data are invalid; delete this group of invalid data from data_xml using the del function in python, set l = l - 1, and execute step (2a5). Otherwise the data are valid; set q = q + 1 and execute step (2a5);
Step 2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the valid annotation information, that is, all annotation information remaining in data_xml;
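The screening of steps 2a1)–2a5) can be sketched in python as a single filtering pass. The function name, the list-of-tuples representation of data_xml, and the max_ratio parameter name are illustrative assumptions; the threshold 3 and the two validity tests follow the text:

```python
def screen_annotations(data_xml, max_ratio=3.0):
    """Filter coordinate tuples (xmin, ymin, xmax, ymax), keeping only
    valid ones: non-zero width and height, and width/height <= max_ratio
    (a box much wider than tall is unlikely to be a standing pedestrian)."""
    valid = []
    for xmin, ymin, xmax, ymax in data_xml:
        xd = xmax - xmin            # annotation-box width
        yd = ymax - ymin            # annotation-box height
        if xd == 0 or yd == 0:      # degenerate box: invalid data
            continue
        if xd / yd > max_ratio:     # aspect ratio implausible: invalid data
            continue
        valid.append((xmin, ymin, xmax, ymax))
    return valid
```

Building a filtered copy rather than deleting in place with del avoids the index bookkeeping of q and l, but implements the same screen.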
Step 2b) Cluster the valid annotation information:
Step 2b1) Manually set the number of cluster centres to k, k > 0; in this embodiment k is 9. Construct the two-dimensional matrix data_k, whose number of rows is the current length l of data_xml and whose number of columns is k; the rows of data_k represent the valid annotation information saved in data_xml, and the columns represent the values of the cluster centres. Initialize data_k to 0 using np.zeros in python;
Step 2b2) Randomly initialize each of the k cluster centres using np.random.choice in python, where each cluster centre is a floating-point array of length 2; write the values of the cluster centres into the array named clusters;
Step 2b3) Compute the distance value d(box, centroid) between each of the l pieces of valid annotation information in data_xml and each of the k cluster centres, with the calculation expressions:

d(box, centroid) = 1 − IOU(box, centroid)
IOU(box, centroid) = (box ∩ centroid) / (box ∪ centroid)
box = x_d × y_d

where centroid denotes the product of the two floating-point members of a cluster centre, box ∩ centroid denotes the intersection of box and centroid, and box ∪ centroid denotes the union of box and centroid. Then write each d(box, centroid) into data_k at the position of the row of the corresponding valid annotation information and the column of the corresponding cluster centre;
Step 2b4) Compute, using np.argmin in python, the column where the lowest distance value lies in each row of data_k and record it in the variable nearest_clusters, then update each cluster centre using the following python statement:

clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

where cluster is the index of a cluster centre; each time the above statement is executed, cluster is incremented by one, until all cluster centres have been updated. The updated cluster centres are still stored in the array named clusters;
Step 2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
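Steps 2b1)–2b5) describe k-means over (width, height) pairs with d = 1 − IOU as the distance. A compact python sketch under that reading follows; the names clusters, boxes, and nearest_clusters follow the text, while np.median for the unspecified dist function and the corner-aligned IoU computation are assumptions:

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between boxes and cluster centres, all given as (w, h) pairs
    and treated as if anchored at a common corner."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_iou(boxes, k, dist=np.median, seed=0):
    """k-means over (w, h) pairs with d = 1 - IoU as the distance,
    iterating until the cluster assignment no longer changes."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    last = None
    while True:
        # nearest_clusters: column of the lowest distance in each row
        nearest_clusters = np.argmin(1 - iou_wh(boxes, clusters), axis=1)
        if last is not None and (nearest_clusters == last).all():
            return clusters
        for cluster in range(k):
            members = boxes[nearest_clusters == cluster]
            if len(members):
                clusters[cluster] = dist(members, axis=0)
        last = nearest_clusters
```

The converged (w, h) pairs are what step 4b) later writes into the anchors field of the network configuration.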
Step 3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the delta_region_box function of the region_layer.c file under the darknet/src folder to Loss'_coord:

Loss'_coord = λ_coord · Σ_{i=0..l.w×l.h} Σ_{j=0..l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

The complete improved loss function Loss' in YOLOv3 is then:

Loss' = Loss_noobj + Loss_obj + Loss_class + Loss'_coord

where Loss_noobj denotes the confidence loss function of the prediction boxes that do not contain a target, Loss_obj the confidence loss function of the prediction boxes that contain a target, Loss_class the classification loss function, and Loss'_coord the improved coordinate loss function; λ_coord denotes the weight parameter of the network for prediction-box coordinates; l.w denotes the size of the network's division along the picture width and l.h the size of its division along the picture height; l.n denotes the number of prediction boxes in the network; i is the variable iterating up to l.w × l.h and j the variable iterating up to l.n; w_i denotes the width of a prediction box and ŵ_i the width of the annotation box; h_i denotes the height of a prediction box and ĥ_i the height of the annotation box; x_i denotes the projection of the upper-left corner of a prediction box on the x-axis and x̂_i denotes x_min; y_i denotes the projection of the upper-left corner of a prediction box on the y-axis and ŷ_i denotes y_min; λ_noobj denotes the coefficient corresponding to prediction boxes not containing a target, together with an indicator parameter denoting whether a prediction box does not contain a target; c_i is the prediction-box confidence and ĉ_i the annotation-box confidence; λ_obj denotes the coefficient corresponding to prediction boxes containing a target, together with an indicator parameter denoting whether a prediction box contains a target; λ_class denotes the coefficient corresponding to prediction boxes containing a target class; c denotes the iteration variable over classes and class the total number of classes in the data set; p_i(c) denotes the probability that a prediction box contains class c, and p̂_i(c) the probability that the annotation box contains class c.
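Under the reading above, the improved coordinate loss weights each matched box pair by t_i = 2 − w_i × h_i, so small boxes (small w × h) contribute a larger learning weight. A hedged python sketch follows, assuming widths and heights normalised to [0, 1] and prediction/ground-truth pairs already matched; the function name and λ_coord default are assumptions:

```python
import numpy as np

def coord_loss(pred, truth, lambda_coord=1.0):
    """Improved coordinate loss for matched prediction/ground-truth boxes.
    pred, truth: arrays of shape (n, 4) holding (x, y, w, h) with w, h
    normalised to [0, 1]. Each pair is weighted by t = 2 - w*h, so smaller
    ground-truth boxes receive a larger learning weight."""
    t = 2.0 - truth[:, 2] * truth[:, 3]       # t_i = 2 - w_i * h_i
    sq = ((pred - truth) ** 2).sum(axis=1)    # squared x, y, w, h errors
    return lambda_coord * (t * sq).sum()
```

For an identical absolute error, a small ground-truth box thus yields a larger loss than a large one, which is exactly the size-dependent learning behaviour the invention targets.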
Step 4) Train on the training set based on the improved loss function:
Step 4a) Initialize the training parameters of the pedestrian detection network:
Modify the paths of the training set and test set in the voc.data file, and set the maximum number of iterations max_batches to 50200, the picture batch size to 64, the initial learning rate to 10⁻³, and the momentum to 0.9;
Step 4b) Use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network:
Write the clustering result into the anchors field of the yolov3-voc.cfg file;
Step 4c) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000; in this embodiment K is 20000, obtaining the pedestrian detection network model;
Step 5) Detect the test set:
Step 5a) Enter the following shell command under the darknet folder:
./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg yolov3-voc_20000.weights
Step 5b) According to the entered shell command, the pedestrian detection network model performs forward calculation on the read test-set pictures through the improved loss function, obtains the position coordinates and confidence of each pedestrian target, and saves them in the data/out folder.
The technical effect of the invention is further described below in combination with a simulation experiment:
1. Simulation conditions and content:
The simulation experiment of the invention was implemented in an environment with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, GeForce GTX 1080ti ×4, and 32 GB of memory. The pedestrian video data used in the experiment were shot with a Redmi Note 7 mobile phone on the campus of Xidian University and on nearby roads.
Simulation experiment: the detection accuracy of pedestrian detection is compared between the method based on improved k-means and improved loss function and the prior art. After constructing the training set and test set according to the invention, valid-data screening is first performed on the annotation information of the training set using the improved k-means; then the valid data and the complete data of the training-set annotation information are clustered separately to obtain the respective clustering results, and the two clustering results are used as the initial candidate-box sizes of the YOLOv3 network based on the improved loss function and of the prior-art network, respectively. The training set is then trained for 20000 iterations using the improved loss function in YOLOv3 while the prior-art network is also trained for 20000 iterations on the training set, finally obtaining the respective pedestrian detection network models. The test set is input separately into the two pedestrian detection network models to obtain the position coordinates and confidence of each pedestrian target detected by the two models, and the detection accuracy of the two methods is counted; the specific comparison of detection accuracy is shown in the table below.
2. Analysis of simulation results:
Compared with the prior art, the pedestrian detection result obtained by the invention has an obvious advantage; the detection accuracies of the prior art and of the invention are shown in Table 1:
Table 1  Detection accuracy comparison

Evaluation index      Prior art    Present invention
Detection accuracy    87.3         89.0

It is apparent from the table that the detection accuracy obtained by the invention is higher, showing that the detection effect of the invention on pedestrian targets is better than that of the prior art.
The above description is only an example of the present invention and does not constitute any limitation of the invention. Professionals in this field who have understood the content and principle of the invention may make various modifications and variations in form and detail without departing from the principle and structure of the invention, but such modifications and variations based on the inventive concept still fall within the scope of the claims of the invention.

Claims (2)

1. a kind of pedestrian detection method based on improved k-means and loss function, which comprises the steps of:
(1) Construct the training set and the test set:
(1a) Save N frames, acquired continuously or discontinuously from a pedestrian video of any scene, as jpg pictures in a JPEGImages folder and name each picture, where N > 10000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all pictures in the training picture set into the trainval.txt file under the ImageSets/Main folder, and at the same time write the names of all pictures in the test picture set into the test.txt file under the ImageSets/Main folder;
(1c) Draw a bounding box around each distinct pedestrian contained in every picture of the training and test picture sets and save the coordinate data of the annotation boxes; then save the class label "person" of each pedestrian target contained in the annotation boxes, together with the coordinate data of the annotation boxes contained in each picture, into an xml file, obtaining an Annotations folder composed of multiple xml files, where the name of each xml file is identical to the name of the corresponding pedestrian picture;
(1d) From the Annotations folder, select the xml files whose names match the picture names in trainval.txt as the annotation set of the training picture set, and the xml files whose names match the picture names in test.txt as the annotation set of the test picture set; write the annotation set of the training picture set into the train.txt file under the darknet folder and the annotation set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation set constitute the training set, and the test picture set and its corresponding xml annotation set constitute the test set;
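Steps (1a)-(1d) follow the PASCAL VOC directory layout. Below is a minimal sketch, not part of the claimed method, of the split in steps (1a)-(1b); the folder and file names come from the claim, while the random shuffle and the exact "more than half" fraction are assumptions:

```python
import os
import random

def split_dataset(jpeg_dir, main_dir, train_fraction=0.5):
    # Collect picture names (without extension) from the JPEGImages folder.
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(jpeg_dir) if f.endswith(".jpg"))
    random.shuffle(names)
    # "More than half" of the pictures go to the training picture set.
    n_train = int(len(names) * train_fraction) + 1
    os.makedirs(main_dir, exist_ok=True)
    # trainval.txt / test.txt under ImageSets/Main, one picture name per line.
    with open(os.path.join(main_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(names[:n_train]))
    with open(os.path.join(main_dir, "test.txt"), "w") as f:
        f.write("\n".join(names[n_train:]))
    return names[:n_train], names[n_train:]
```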
(2) Cluster the training set based on the improved k-means algorithm:
(2a) Screen the annotation information in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files corresponding to the training set into an array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize the current index in data_xml to q = 0;
(2a2) Define the coordinate data corresponding to index q in data_xml: the x-axis projection of the upper-left corner of the annotation box is defined as x_min, the y-axis projection of the upper-left corner as y_min, the x-axis projection of the lower-right corner as x_max, and the y-axis projection of the lower-right corner as y_max;
(2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data corresponding to x_d and y_d in data_xml are invalid; delete the invalid data, set l = l - 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data corresponding to x_d and y_d in data_xml are valid; execute step (2a4);
(2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data corresponding to div in data_xml according to whether div > 3 holds: if so, the data corresponding to div in data_xml are invalid; delete the invalid data, set l = l - 1, and execute step (2a5); otherwise the data corresponding to div in data_xml are valid; set q = q + 1 and execute step (2a5);
(2a5) Repeat steps (2a2)-(2a4) until q = l, obtaining the valid annotation information;
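Steps (2a1)-(2a5) amount to a simple filter over the annotation boxes. A sketch of that filter (not part of the claim), assuming each box is given as an (xmin, ymin, xmax, ymax) tuple:

```python
def screen_annotations(boxes):
    """Keep only valid annotation boxes per steps (2a2)-(2a5):
    a box is invalid if its width or height is zero, or if its
    width-to-height quotient exceeds 3."""
    valid = []
    for xmin, ymin, xmax, ymax in boxes:
        xd = xmax - xmin  # annotation-box width
        yd = ymax - ymin  # annotation-box height
        if xd == 0 or yd == 0:
            continue      # degenerate box: delete
        if xd / yd > 3:
            continue      # implausible aspect ratio for a pedestrian: delete
        valid.append((xd, yd))
    return valid
```

The surviving (width, height) pairs are exactly the data clustered in step (2b).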
(2b) Cluster the valid annotation information:
(2b1) Set the number of cluster centers to k, where k > 0; construct a two-dimensional matrix data_k with the length l of data_xml as the number of rows and k as the number of columns, where each row of data_k corresponds to one piece of valid annotation information stored in data_xml and each column corresponds to the value of one cluster center, and initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centers;
(2b3) Compute the distance values between the l pieces of valid annotation information in data_xml and the k cluster centers, and write each distance value into data_k at the row corresponding to the valid annotation information and the column corresponding to the cluster center;
(2b4) Assign the valid annotation information of each row of data_k as a member of the cluster center whose column holds the smallest distance value in that row, and update the value of each cluster center to the mean width and mean height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centers no longer change, and take the values of the k cluster centers as the clustering result;
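Steps (2b1)-(2b5) describe a k-means loop over the valid (width, height) pairs. The claim does not pin down the distance metric; the sketch below assumes the 1 - IoU distance commonly used for anchor-box clustering, which is one assumption and not a statement of the claimed method:

```python
import random

def iou_wh(box, centre):
    """IoU of two boxes given as (w, h) and aligned at a common corner."""
    inter = min(box[0], centre[0]) * min(box[1], centre[1])
    union = box[0] * box[1] + centre[0] * centre[1] - inter
    return inter / union

def cluster_boxes(boxes, k, seed=0):
    """k-means over (w, h) pairs, per steps (2b1)-(2b5)."""
    rng = random.Random(seed)
    centres = rng.sample(boxes, k)  # (2b2) random initialisation
    while True:
        # (2b3)-(2b4): assign each box to its nearest centre.
        members = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda c: 1 - iou_wh(b, centres[c]))
            members[j].append(b)
        # Update each centre to the mean width and height of its members.
        new = [tuple(sum(m[d] for m in ms) / len(ms) for d in (0, 1))
               if ms else centres[c] for c, ms in enumerate(members)]
        if new == centres:  # (2b5) centres no longer change
            return centres
        centres = new
```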
(3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the loss function of the YOLOv3 detection network to Loss'_coord:

$$Loss'_{coord}=\lambda_{coord}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\,t_i\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

$$t_i=2-w_i\times h_i$$

where λ_coord denotes the weight parameter of the network for the prediction-box coordinates, l.w denotes the number of divisions of the network along the picture width, l.h denotes the number of divisions of the network along the picture height, l.n denotes the number of prediction boxes in the network, i is the iteration variable over l.w × l.h, j is the iteration variable over l.n, 1_ij^obj is the parameter indicating whether the prediction box contains a target, w_i denotes the width of the prediction box, ŵ_i denotes the width of the annotation box, h_i denotes the height of the prediction box, ĥ_i denotes the height of the annotation box, x_i denotes the projection of the upper-left-corner coordinate of the prediction box on the x-axis, x̂_i denotes x_min, y_i denotes the projection of the upper-left-corner coordinate of the prediction box on the y-axis, and ŷ_i denotes y_min.
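The weight t_i = 2 - w_i × h_i makes errors on small pedestrians count more than the same errors on large ones. A minimal sketch of this coordinate loss for a list of matched prediction/annotation pairs, assuming widths and heights are normalised to [0, 1] (an assumption, not stated in the claim):

```python
def coord_loss(preds, truths, lambda_coord=1.0):
    """Improved coordinate loss: each matched (x, y, w, h) pair is
    weighted by t = 2 - w * h, so small boxes contribute more."""
    loss = 0.0
    for (x, y, w, h), (tx, ty, tw, th) in zip(preds, truths):
        t = 2.0 - w * h  # scale weight: close to 2 for small boxes
        loss += t * ((x - tx) ** 2 + (y - ty) ** 2
                     + (w - tw) ** 2 + (h - th) ** 2)
    return lambda_coord * loss
```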
(4) Train on the training set based on the improved loss function:
(4a) Use the clustering result as the size initialization values of the YOLOv3 network candidate boxes;
(4b) Perform K training iterations on the training set in the YOLOv3 network based on the improved loss function, where K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
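In darknet, the candidate-box sizes of step (4a) are normally supplied through the anchors entry of each [yolo] section in the network .cfg file. A small helper that formats the clustering result accordingly; the integer rounding and the area ordering are assumptions, not part of the claim:

```python
def anchors_line(centres):
    """Format (w, h) cluster centres as a darknet 'anchors =' cfg entry,
    rounded to integer pixels and sorted by box area."""
    cs = sorted(centres, key=lambda c: c[0] * c[1])
    return "anchors = " + ",  ".join(
        "%d,%d" % (round(w), round(h)) for w, h in cs)
```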
2. The pedestrian detection method based on improved k-means and loss function according to claim 1, characterized in that the loss function of the YOLOv3 detection network in step (3) has the calculation expression Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord:

$$Loss_{noobj}=\lambda_{noobj}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{noobj}(c_i-\hat{c}_i)^2$$

$$Loss_{obj}=\lambda_{obj}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}(c_i-\hat{c}_i)^2$$

$$Loss_{class}=\lambda_{class}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\sum_{c=0}^{class}\left(p_i(c)-\hat{p}_i(c)\right)^2$$

$$Loss_{coord}=\lambda_{coord}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\,t_i\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

$$t_i=2-w_i\times h_i$$

where Loss denotes the loss function, Loss_noobj denotes the confidence loss function of the prediction boxes that do not contain a target, Loss_obj denotes the confidence loss function of the prediction boxes that contain a target, Loss_class denotes the class loss function, and Loss_coord denotes the coordinate loss function; λ_noobj denotes the coefficient of the prediction boxes that do not contain a target, l.w denotes the number of divisions of the network along the picture width, l.h denotes the number of divisions of the network along the picture height, i and j are the corresponding iteration variables, 1_ij^noobj is the parameter indicating whether the prediction box does not contain a target, c_i is the prediction-box confidence, and ĉ_i is the annotation-box confidence; λ_obj denotes the coefficient of the prediction boxes that contain a target, and 1_ij^obj is the parameter indicating whether the prediction box contains a target; λ_class denotes the coefficient of the prediction boxes that contain a target class, c is the iteration variable over classes, class denotes the total number of classes in the data set, p_i(c) denotes the probability that the prediction box contains class c, and p̂_i(c) denotes the probability that the annotation box contains class c; λ_coord denotes the weight parameter of the network for the prediction-box coordinates, w_i denotes the width of the prediction box, ŵ_i denotes the width of the annotation box, h_i denotes the height of the prediction box, ĥ_i denotes the height of the annotation box, x_i denotes the projection of the upper-left-corner coordinate of the prediction box on the x-axis, x̂_i denotes x_min, y_i denotes the projection of the upper-left-corner coordinate of the prediction box on the y-axis, and ŷ_i denotes y_min.
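The four loss components can be sketched together as one function. This is an illustrative toy implementation, not the darknet code: each entry of `cells` is a hypothetical per-prediction-box record, and the field names (`obj`, `c`, `c_hat`, `p`, `p_hat`, `box`, `box_hat`) are assumptions introduced here:

```python
def yolo_loss(cells, lambdas):
    """Total loss: Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord.
    cells: list of dicts, one per prediction box, with keys
      obj          -- 1 if the box is responsible for a target, else 0
      c, c_hat     -- predicted / annotated confidence
      p, p_hat     -- per-class probability lists
      box, box_hat -- (x, y, w, h), with w and h normalised to [0, 1]
    lambdas: dict mapping 'noobj'/'obj'/'class'/'coord' to coefficients."""
    noobj = obj = cls = coord = 0.0
    for e in cells:
        conf_err = (e["c"] - e["c_hat"]) ** 2
        if not e["obj"]:
            noobj += conf_err  # confidence loss, no target in the box
            continue
        obj += conf_err        # confidence loss, target present
        cls += sum((p - q) ** 2 for p, q in zip(e["p"], e["p_hat"]))
        (x, y, w, h), (tx, ty, tw, th) = e["box"], e["box_hat"]
        coord += (2 - w * h) * ((x - tx) ** 2 + (y - ty) ** 2
                                + (w - tw) ** 2 + (h - th) ** 2)
    return (lambdas["noobj"] * noobj + lambdas["obj"] * obj
            + lambdas["class"] * cls + lambdas["coord"] * coord)
```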
CN201910202078.4A 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function Active CN109978035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910202078.4A CN109978035B (en) 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function


Publications (2)

Publication Number Publication Date
CN109978035A true CN109978035A (en) 2019-07-05
CN109978035B CN109978035B (en) 2021-04-02

Family

ID=67079213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910202078.4A Active CN109978035B (en) 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function

Country Status (1)

Country Link
CN (1) CN109978035B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866476A (en) * 2019-11-06 2020-03-06 南京信息职业技术学院 Dense stacking target detection method based on automatic labeling and transfer learning
CN110929646A (en) * 2019-11-22 2020-03-27 国网福建省电力有限公司 Power distribution tower reverse-off information rapid identification method based on unmanned aerial vehicle aerial image
CN110942005A (en) * 2019-11-21 2020-03-31 网易(杭州)网络有限公司 Object recognition method and device
CN111104965A (en) * 2019-11-25 2020-05-05 河北科技大学 Vehicle target identification method and device
CN111274894A (en) * 2020-01-15 2020-06-12 太原科技大学 Improved YOLOv 3-based method for detecting on-duty state of personnel
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
CN113361347A (en) * 2021-05-25 2021-09-07 东南大学成贤学院 Job site safety detection method based on YOLO algorithm
CN113537257A (en) * 2020-04-13 2021-10-22 山西农业大学 Wheat detection method realized based on YoLov3 network
CN113807472A (en) * 2021-11-19 2021-12-17 智道网联科技(北京)有限公司 Hierarchical target detection method and device
CN114119583A (en) * 2021-12-01 2022-03-01 常州市新创智能科技有限公司 Industrial visual inspection system, method, network model selection method and warp knitting machine

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100272366A1 (en) * 2009-04-24 2010-10-28 Sony Corporation Method and device of detecting object in image and system including the device
CN103186776A (en) * 2013-04-03 2013-07-03 西安电子科技大学 Human detection method based on multiple features and depth information
US20140133698A1 (en) * 2012-11-09 2014-05-15 Analog Devices Technology Object detection
CN107358223A * 2017-08-16 2017-11-17 上海荷福人工智能科技(集团)有限公司 Face detection and face alignment method based on YOLO
CN108460403A * 2018-01-23 2018-08-28 上海交通大学 Object detection method and system based on multi-scale feature fusion in images
CN108647665A * 2018-05-18 2018-10-12 西安电子科技大学 Real-time aerial vehicle detection method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUI ZHANG et al.: "Pedestrian Detection Method Based on Faster R-CNN", 2017 13th International Conference on Computational Intelligence and Security *
PIOTR DOLLAR et al.: "Pedestrian Detection: An Evaluation of the State of the Art", IEEE Transactions on Pattern Analysis and Machine Intelligence *


Also Published As

Publication number Publication date
CN109978035B (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN109978035A (en) Pedestrian detection method based on improved k-means and loss function
CN111104898B (en) Image scene classification method and device based on target semantics and attention mechanism
Chen et al. City-scale map creation and updating using GPS collections
Chu et al. Camera as weather sensor: Estimating weather information from single images
CN108171233 Method and apparatus for object detection using a region-based deep learning model
CN108647665A Real-time aerial vehicle detection method based on deep learning
Yap et al. A comparative study of mobile-based landmark recognition techniques
US9858503B2 (en) Acceleration of linear classifiers
CN105574550A (en) Vehicle identification method and device
CN108052966 Automatic extraction and classification method for remote sensing image scenes based on convolutional neural networks
CN107346328A Cross-modal association learning method based on multi-granularity hierarchical networks
CN110309780A Fast supervised building recognition in high-resolution images based on the BFD-IGA-SVM model
CN109766835A SAR target recognition method based on multi-parameter optimized generative adversarial networks
Chen et al. Clues from the beaten path: Location estimation with bursty sequences of tourist photos
KR102516588B1 (en) A measurement device, method and program that provides measured and estimated values of fine dust concentration through satellite image analysis using an artificial intelligence model
CN108492298 Multispectral image change detection method based on generative adversarial networks
CN113918837B (en) Method and system for generating city interest point category representation
CN103955709B Polarimetric synthetic aperture radar (SAR) image classification method based on weighted composite kernels and triplet Markov fields (TMF)
CN110276363A Small bird target detection method based on density map estimation
Sathish et al. Detection and localization of multiple objects using VGGNet and single shot detection
CN106250918B Gaussian mixture model matching method based on an improved earth mover's distance
CN110659601A Dense vehicle detection method for remote sensing images based on center points using a deep fully convolutional network
CN105205807B Remote sensing image change detection method based on sparse autoencoders
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN108154511 SAR image segmentation method based on submodular dictionary learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant