CN109978035A - Pedestrian detection method based on improved k-means and loss function - Google Patents


Info

Publication number
CN109978035A
CN109978035A (application CN201910202078.4A)
Authority
CN
China
Prior art keywords: data, indicate, xml, prediction block, picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910202078.4A
Other languages
Chinese (zh)
Other versions
CN109978035B (en)
Inventor
郭杰
郑佳卉
吴宪云
李云松
解静
邱尚锋
林朋雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910202078.4A
Publication of CN109978035A
Application granted
Publication of CN109978035B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian detection method based on an improved k-means algorithm and an improved loss function, for classifying and identifying pedestrian targets in video or images. It mainly solves two problems of the prior art: inaccurate clustering results, and prediction boxes that cannot learn a loss matched to their own size characteristics. The steps are: construct a training set and a test set; cluster the training set with the improved k-means algorithm; improve the loss function of the YOLOv3 detection network; train on the training set with the improved loss function; and detect the test set. In the clustering stage the invention screens invalid data out of the training-set annotations and clusters only the valid data, yielding more accurate initial candidate-box sizes; different prediction boxes learn different losses according to their own size characteristics, yielding a more accurate pedestrian detection network.

Description

Pedestrian detection method based on improved k-means and loss function
Technical field
The invention belongs to the field of target detection technology and relates to a pedestrian detection method, in particular to a pedestrian detection method based on an improved k-means algorithm and an improved loss function, which can be used to classify and identify pedestrian targets in video or images.
Background technique
Pedestrian detection refers to detecting the position coordinates and confidence of pedestrians in video or images. Detection results are mainly measured by detection accuracy and detection speed; the most important index is detection accuracy, which is strongly influenced by the pedestrian features used and by the loss function.
Current pedestrian detection methods can be divided, according to how pedestrian features are extracted, into two classes: pedestrian detection based on traditional algorithms and pedestrian detection based on deep learning.
Traditional pedestrian detection methods mainly comprise detection based on global features, detection based on local feature extraction, and detection based on multiple features. Detection based on global features mainly finds the position of a pedestrian by detecting the pedestrian's contour through the gradient orientation histogram of the whole image. Detection based on local feature extraction mainly extracts local features of the input image and detects by matching pedestrian features. Detection based on multiple features extracts several types of features, such as gray level and contour, and combines the detection results of these features. The common advantage of the three methods is that they are simple and fast; however, because pedestrian features are sensitive to illumination, background, occlusion, and similar factors, background noise and light interference are easily introduced during detection, so the detection accuracy of traditional pedestrian detection methods is low.
The development of deep learning has brought new ideas to pedestrian detection research. Deep-learning-based pedestrian detection methods mainly comprise detection based on candidate-box selection and end-to-end detection. Methods based on candidate-box selection choose candidate boxes manually and then train the network; although this gives good detection results, selecting candidate boxes in advance makes the detection efficiency of the network very low.
In recent years, end-to-end detection has become the mainstream approach in the pedestrian detection field because of its good detection accuracy and efficiency. It takes a deep-learning-based target detection network as the basic network and initializes the sizes of the candidate boxes by clustering, so that the initial candidate-box sizes are close to the sizes of pedestrians and the network converges more easily; the training set is then trained with a loss function to obtain a pedestrian detection network model, which is finally used to detect the test-set pictures and obtain the position coordinates and confidence of all pedestrian targets. However, the detection accuracy of the basic networks used by most current pedestrian detection algorithms, such as YOLOv1 and YOLOv2, is still unsatisfactory, so the detection accuracy of these algorithms is low. For example, the patent application with publication number CN 109325418A, entitled "Pedestrian recognition method under road traffic environment based on improved YOLOv3", discloses a method of pedestrian detection through an improved YOLOv3. Taking YOLOv3 as the basic network, it first increases the number of candidate boxes during k-means clustering, thereby increasing the network's ability to extract features, and then increases the weight of the coordinate loss term when training the network with the loss function, obtaining a pedestrian detection network model. However, this method does not consider the case of invalid annotation information in the training set when clustering with k-means, so the clustering result is inaccurate; and when computing the coordinate loss it does not consider that prediction boxes of different sizes should learn the errors of coordinates and of width and height with different weights, so a prediction box cannot learn a loss matched to its own size characteristics. How to screen out the valid data in the training-set annotations and compute a more accurate loss therefore remains an urgent problem in this field.
Summary of the invention
The purpose of the invention is to address the deficiencies of the above existing pedestrian detection techniques by proposing a pedestrian detection method based on an improved k-means algorithm and an improved loss function, aiming to improve the detection accuracy of pedestrian targets in different scenes.
The technical idea of the invention is: first construct a training set and a test set; then cluster the annotation information of the training set with the improved k-means clustering algorithm and use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network; then train the training set with the improved loss function in the YOLOv3 network; and finally detect the test set with the trained pedestrian detection network model.
According to the above technical idea, the technical solution adopted to achieve the purpose of the invention comprises the following steps:
(1) Construct the training set and test set:
(1a) Save N frames, continuous or discontinuous, from a pedestrian video of any scene into the JPEGImages folder as jpg pictures, and name each picture, N > 1000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all pictures in the training set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test set into the test.txt file under the ImageSets/Main folder;
(1c) Mark each pedestrian contained in every picture of the training and test picture sets with a bounding box and save the coordinate data of each annotation box; then save the class person of each pedestrian target and the coordinate data of the annotation boxes contained in each picture into an xml file, obtaining the Annotations folder composed of multiple xml files, where each xml file has the same name as its corresponding pedestrian picture;
(1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation information set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation information set of the test pictures; write the annotation information set of the training pictures into the train.txt file under the darknet folder, and write the annotation information set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotation information form the training set, and the test pictures together with their corresponding xml annotation information form the test set;
(2) Cluster the training set based on the improved k-means algorithm:
(2a) Screen the annotation information in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files of the training set into an array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize its current index in data_xml to q = 0;
(2a2) Define the coordinate data at index q of data_xml: the x-axis projection coordinate of the upper-left corner of the annotation box is defined as x_min, the y-axis projection coordinate of the upper-left corner as y_min, the x-axis projection coordinate of the lower-right corner as x_max, and the y-axis projection coordinate of the lower-right corner as y_max;
(2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data are invalid; delete them, set l = l - 1, and execute step (2a2). If x_d ≠ 0 and y_d ≠ 0, the data are valid; execute step (2a4);
(2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data in data_xml corresponding to div according to whether div > 3 holds: if so, the data are invalid; delete them, set l = l - 1, and execute step (2a5). Otherwise the data are valid; set q = q + 1 and execute step (2a5);
(2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the valid annotation information;
(2b) Cluster the valid annotation information:
(2b1) Set the number of cluster centres to k, k > 0. Construct a two-dimensional matrix data_k with the length l of data_xml as the number of rows and k as the number of columns; the rows of data_k represent the valid annotation information saved in data_xml, and the columns represent the values of the cluster centres. Initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centres;
(2b3) Compute the distance values between the l pieces of valid annotation information in data_xml and the k cluster centres, and write each distance value into data_k at the position of the row of the corresponding valid annotation information and the column of the corresponding cluster centre;
(2b4) Assign the valid annotation information of each row of data_k as a member of the cluster centre whose column holds the lowest distance value in that row, and update the value of each cluster centre to the mean width and height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
(3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the YOLOv3 network loss function to Loss'_coord:

Loss'_coord = λ_coord · Σ_{i=0..l.w×l.h} Σ_{j=0..l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

where λ_coord denotes the weight parameter of the network for prediction-box coordinates; l.w denotes the size of the network's division along the picture width and l.h the size of its division along the picture height; l.n denotes the number of prediction boxes in the network; i is the variable iterating up to l.w × l.h and j the variable iterating up to l.n; w_i denotes the width of a prediction box and ŵ_i the width of the annotation box; h_i denotes the height of a prediction box and ĥ_i the height of the annotation box; x_i denotes the projection of the upper-left corner of a prediction box on the x-axis and x̂_i denotes x_min; y_i denotes the projection of the upper-left corner of a prediction box on the y-axis and ŷ_i denotes y_min.
(4) Train on the training set based on the improved loss function:
(4a) Use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network;
(4b) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
Compared with the prior art, the invention has the following advantages:
The invention improves the loss function of YOLOv3 by increasing, in the coordinate loss function, the learning weight of the coordinate error of small prediction boxes, avoiding the defect that a prediction box cannot learn a loss matched to its own size characteristics. At the same time, the invention improves the k-means clustering algorithm: the width and height values and the aspect ratios of the annotation boxes in the training set are screened, invalid data are removed while valid data are retained, and only the valid data are clustered, avoiding the defect that invalid annotation information makes the clustering result inaccurate and degrades detection accuracy. Simulation results show that, compared with the prior art, the invention effectively improves the detection accuracy of pedestrian detection.
Detailed description of the invention
Fig. 1 is the implementation flow chart of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
Referring to Fig. 1, the present invention comprises the following steps:
Step 1) Construct the training set and test set:
Step 1a) From a pedestrian video of any scene shot by a video camera, an unmanned aerial vehicle, or a mobile phone, extract one frame every 10 frames from N continuous or discontinuous frames and save the extracted frames into the JPEGImages folder, N > 10000. In this embodiment, 12000 continuous frames of a road pedestrian video shot by a mobile phone are used, and each picture is given a different name; the resolution of the video is 1920 × 1080, and no fewer than 1000 pictures are saved in the JPEGImages folder;
Step 1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; in this embodiment the training and test picture sets are divided in a ratio of 7:3. Write the names of all pictures in the training set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test set into the test.txt file under the ImageSets/Main folder, where the name of each picture occupies one line in trainval.txt and test.txt;
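Steps 1a)–1b) above amount to a VOC-style file-list split. The following python sketch illustrates it; the function name split_dataset, the fixed random seed, and reading names directly from the JPEGImages folder are illustrative assumptions, not the patent's code:

```python
import os
import random

def split_dataset(jpeg_dir, main_dir, train_ratio=0.7, seed=0):
    """Split the pictures in JPEGImages into training and test name lists,
    writing one basename (without extension) per line, VOC-style."""
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(jpeg_dir) if f.endswith(".jpg"))
    random.Random(seed).shuffle(names)          # reproducible shuffle
    n_train = int(len(names) * train_ratio)     # e.g. 7:3 split as in this embodiment
    os.makedirs(main_dir, exist_ok=True)
    with open(os.path.join(main_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(names[:n_train]))
    with open(os.path.join(main_dir, "test.txt"), "w") as f:
        f.write("\n".join(names[n_train:]))
    return names[:n_train], names[n_train:]
```

With the 7:3 ratio of this embodiment, 12000 saved frames would yield 8400 names in trainval.txt and 3600 in test.txt.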
Step 1c) Mark the pedestrian targets contained in every picture of the training and test picture sets with bounding boxes:
Step 1c1) Annotate the class and position coordinates (x_min, y_min, x_max, y_max) of each pedestrian target, where the class of every pedestrian target is person, x_min is the x-axis projection coordinate of the upper-left corner of the annotation box, y_min the y-axis projection coordinate of the upper-left corner, x_max the x-axis projection coordinate of the lower-right corner, and y_max the y-axis projection coordinate of the lower-right corner;
Step 1c2) Save the annotation information of all pedestrian targets in every picture of the training and test picture sets in xml format, obtaining the Annotations folder composed of multiple xml-format files, where each xml file has the same name as the picture whose annotation information it contains; for example, the annotation file corresponding to picture 000001.jpg is named 000001.xml. Place the JPEGImages, Annotations, and ImageSets folders in the darknet folder;
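For reference, an annotation file such as 000001.xml saved in step 1c2) might look like the following Pascal-VOC-style sketch; the exact tag layout and the coordinate values are illustrative assumptions:

```xml
<annotation>
  <filename>000001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>603.0</xmin><ymin>412.0</ymin>
      <xmax>678.0</xmax><ymax>580.0</ymax>
    </bndbox>
  </object>
</annotation>
```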
Step 1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation information set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation information set of the test pictures; write the annotation information set of the training pictures into the train.txt file under the darknet folder, and write the annotation information set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotation information form the training set, and the test pictures together with their corresponding xml annotation information form the test set;
Step 2) Cluster the training set based on the improved k-means algorithm:
Step 2a) Screen the annotation information in the training set:
Step 2a1) Construct the array data_xml; extract the coordinate data from the xml files of all training pictures using obj.findtext in python and write them into data_xml in sequence, where each member of data_xml represents one group of coordinate data. Compute the length l of data_xml using the len function in python, read the first group of coordinate data in data_xml, and initialize the current index of data_xml to q = 0;
Step 2a2) Define the coordinate data at index q of data_xml: the x-axis projection coordinate of the upper-left corner of the annotation box is defined as x_min, the y-axis projection coordinate of the upper-left corner as y_min, the x-axis projection coordinate of the lower-right corner as x_max, and the y-axis projection coordinate of the lower-right corner as y_max;
Step 2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, where x_min, x_max, y_min, and y_max are floating-point numbers, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data are invalid; delete this group of invalid data from data_xml using the del function in python, set l = l - 1, and execute step (2a2). If x_d ≠ 0 and y_d ≠ 0, the data are valid; execute step (2a4);
Step 2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data in data_xml corresponding to div according to whether div > 3 holds: if so, the data are invalid; delete this group of invalid data from data_xml using the del function in python, set l = l - 1, and execute step (2a5). Otherwise the data are valid; set q = q + 1 and execute step (2a5);
Step 2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the valid annotation information, that is, all annotation information remaining in data_xml;
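The screening of steps 2a1)–2a5) can be sketched in python as a single filtering pass. The function name, the list-of-tuples representation of data_xml, and the max_ratio parameter name are illustrative assumptions; the threshold 3 and the two validity tests follow the text:

```python
def screen_annotations(data_xml, max_ratio=3.0):
    """Filter coordinate tuples (xmin, ymin, xmax, ymax), keeping only
    valid ones: non-zero width and height, and width/height <= max_ratio
    (a box much wider than tall is unlikely to be a standing pedestrian)."""
    valid = []
    for xmin, ymin, xmax, ymax in data_xml:
        xd = xmax - xmin            # annotation-box width
        yd = ymax - ymin            # annotation-box height
        if xd == 0 or yd == 0:      # degenerate box: invalid data
            continue
        if xd / yd > max_ratio:     # aspect ratio implausible: invalid data
            continue
        valid.append((xmin, ymin, xmax, ymax))
    return valid
```

Building a filtered copy rather than deleting in place with del avoids the index bookkeeping of q and l, but implements the same screen.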
Step 2b) Cluster the valid annotation information:
Step 2b1) Manually set the number of cluster centres to k, k > 0; in this embodiment k is 9. Construct the two-dimensional matrix data_k, whose number of rows is the current length l of data_xml and whose number of columns is k; the rows of data_k represent the valid annotation information saved in data_xml, and the columns represent the values of the cluster centres. Initialize data_k to 0 using np.zeros in python;
Step 2b2) Randomly initialize each of the k cluster centres using np.random.choice in python, where each cluster centre is a floating-point array of length 2; write the values of the cluster centres into the array named clusters;
Step 2b3) Compute the distance value d(box, centroid) between each of the l pieces of valid annotation information in data_xml and each of the k cluster centres, with the calculation expressions:

d(box, centroid) = 1 − IOU(box, centroid)
IOU(box, centroid) = (box ∩ centroid) / (box ∪ centroid)
box = x_d × y_d

where centroid denotes the product of the two floating-point members of a cluster centre, box ∩ centroid denotes the intersection of box and centroid, and box ∪ centroid denotes the union of box and centroid. Then write each d(box, centroid) into data_k at the position of the row of the corresponding valid annotation information and the column of the corresponding cluster centre;
Step 2b4) Compute, using np.argmin in python, the column where the lowest distance value lies in each row of data_k and record it in the variable nearest_clusters, then update each cluster centre using the following python statement:

clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

where cluster is the index of a cluster centre; each time the above statement is executed, cluster is incremented by one, until all cluster centres have been updated. The updated cluster centres are still stored in the array named clusters;
Step 2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
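Steps 2b1)–2b5) describe k-means over (width, height) pairs with d = 1 − IOU as the distance. A compact python sketch under that reading follows; the names clusters, boxes, and nearest_clusters follow the text, while np.median for the unspecified dist function and the corner-aligned IoU computation are assumptions:

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between boxes and cluster centres, all given as (w, h) pairs
    and treated as if anchored at a common corner."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_iou(boxes, k, dist=np.median, seed=0):
    """k-means over (w, h) pairs with d = 1 - IoU as the distance,
    iterating until the cluster assignment no longer changes."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    last = None
    while True:
        # nearest_clusters: column of the lowest distance in each row
        nearest_clusters = np.argmin(1 - iou_wh(boxes, clusters), axis=1)
        if last is not None and (nearest_clusters == last).all():
            return clusters
        for cluster in range(k):
            members = boxes[nearest_clusters == cluster]
            if len(members):
                clusters[cluster] = dist(members, axis=0)
        last = nearest_clusters
```

The converged (w, h) pairs are what step 4b) later writes into the anchors field of the network configuration.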
Step 3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the delta_region_box function of the region_layer.c file under the darknet/src folder to Loss'_coord:

Loss'_coord = λ_coord · Σ_{i=0..l.w×l.h} Σ_{j=0..l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

The complete improved loss function Loss' in YOLOv3 is then:

Loss' = Loss_noobj + Loss_obj + Loss_class + Loss'_coord

where Loss_noobj denotes the confidence loss function of the prediction boxes that do not contain a target, Loss_obj the confidence loss function of the prediction boxes that contain a target, Loss_class the classification loss function, and Loss'_coord the improved coordinate loss function; λ_coord denotes the weight parameter of the network for prediction-box coordinates; l.w denotes the size of the network's division along the picture width and l.h the size of its division along the picture height; l.n denotes the number of prediction boxes in the network; i is the variable iterating up to l.w × l.h and j the variable iterating up to l.n; w_i denotes the width of a prediction box and ŵ_i the width of the annotation box; h_i denotes the height of a prediction box and ĥ_i the height of the annotation box; x_i denotes the projection of the upper-left corner of a prediction box on the x-axis and x̂_i denotes x_min; y_i denotes the projection of the upper-left corner of a prediction box on the y-axis and ŷ_i denotes y_min; λ_noobj denotes the coefficient corresponding to prediction boxes not containing a target, together with an indicator parameter denoting whether a prediction box does not contain a target; c_i is the prediction-box confidence and ĉ_i the annotation-box confidence; λ_obj denotes the coefficient corresponding to prediction boxes containing a target, together with an indicator parameter denoting whether a prediction box contains a target; λ_class denotes the coefficient corresponding to prediction boxes containing a target class; c denotes the iteration variable over classes and class the total number of classes in the data set; p_i(c) denotes the probability that a prediction box contains class c, and p̂_i(c) the probability that the annotation box contains class c.
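Under the reading above, the improved coordinate loss weights each matched box pair by t_i = 2 − w_i × h_i, so small boxes (small w × h) contribute a larger learning weight. A hedged python sketch follows, assuming widths and heights normalised to [0, 1] and prediction/ground-truth pairs already matched; the function name and λ_coord default are assumptions:

```python
import numpy as np

def coord_loss(pred, truth, lambda_coord=1.0):
    """Improved coordinate loss for matched prediction/ground-truth boxes.
    pred, truth: arrays of shape (n, 4) holding (x, y, w, h) with w, h
    normalised to [0, 1]. Each pair is weighted by t = 2 - w*h, so smaller
    ground-truth boxes receive a larger learning weight."""
    t = 2.0 - truth[:, 2] * truth[:, 3]       # t_i = 2 - w_i * h_i
    sq = ((pred - truth) ** 2).sum(axis=1)    # squared x, y, w, h errors
    return lambda_coord * (t * sq).sum()
```

For an identical absolute error, a small ground-truth box thus yields a larger loss than a large one, which is exactly the size-dependent learning behaviour the invention targets.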
Step 4) Train on the training set based on the improved loss function:
Step 4a) Initialize the training parameters of the pedestrian detection network:
Modify the paths of the training set and test set in the voc.data file, and set the maximum number of iterations max_batches to 50200, the picture batch size to 64, the initial learning rate to 10⁻³, and the momentum to 0.9;
Step 4b) Use the clustering result as the initial sizes of the candidate boxes of the YOLOv3 network:
Write the clustering result into the anchors field of the yolov3-voc.cfg file;
Step 4c) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000; in this embodiment K is 20000, obtaining the pedestrian detection network model;
Step 5) Detect the test set:
Step 5a) Enter the following shell command under the darknet folder:
./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg yolov3-voc_20000.weights
Step 5b) According to the entered shell command, the pedestrian detection network model performs forward calculation on the read test-set pictures through the improved loss function, obtains the position coordinates and confidence of each pedestrian target, and saves them in the data/out folder.
The technical effect of the invention is further described below in combination with a simulation experiment:
1. Simulation conditions and content:
The simulation experiment of the invention was implemented in an environment with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, GeForce GTX 1080ti ×4, and 32 GB of memory. The pedestrian video data used in the experiment were shot with a Redmi Note 7 mobile phone on the campus of Xidian University and on nearby roads.
Simulation experiment: the detection accuracy of pedestrian detection is compared between the method based on improved k-means and improved loss function and the prior art. After constructing the training set and test set according to the invention, valid-data screening is first performed on the annotation information of the training set using the improved k-means; then the valid data and the complete data of the training-set annotation information are clustered separately to obtain the respective clustering results, and the two clustering results are used as the initial candidate-box sizes of the YOLOv3 network based on the improved loss function and of the prior-art network, respectively. The training set is then trained for 20000 iterations using the improved loss function in YOLOv3 while the prior-art network is also trained for 20000 iterations on the training set, finally obtaining the respective pedestrian detection network models. The test set is input separately into the two pedestrian detection network models to obtain the position coordinates and confidence of each pedestrian target detected by the two models, and the detection accuracy of the two methods is counted; the specific comparison of detection accuracy is shown in the table below.
2. Analysis of simulation results:
Compared with the prior art, the pedestrian detection result obtained by the invention has an obvious advantage; the detection accuracies of the prior art and of the invention are shown in Table 1:
Table 1  Detection accuracy comparison

Evaluation index      Prior art    Present invention
Detection accuracy    87.3         89.0

It is apparent from the table that the detection accuracy obtained by the invention is higher, showing that the detection effect of the invention on pedestrian targets is better than that of the prior art.
The above description is only an example of the present invention and does not constitute any limitation of the invention. Professionals in this field who have understood the content and principle of the invention may make various modifications and variations in form and detail without departing from the principle and structure of the invention, but such modifications and variations based on the inventive concept still fall within the scope of the claims of the invention.

Claims (2)

1. a kind of pedestrian detection method based on improved k-means and loss function, which comprises the steps of:
(1) Construct the training set and the test set:
(1a) Save N frames, acquired continuously or discontinuously from a pedestrian video of any scene, as jpg pictures in a JPEGImages folder and name each picture, where N > 10000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all pictures in the training picture set into the trainval.txt file under the ImageSets/Main folder, and at the same time write the names of all pictures in the test picture set into the test.txt file under the ImageSets/Main folder;
(1c) Draw a bounding box around each distinct pedestrian contained in every picture of the training and test picture sets and save the coordinate data of the annotation boxes; then save the class label "person" of each pedestrian target contained in the annotation boxes, together with the coordinate data of the annotation boxes contained in each picture, into an xml file, obtaining an Annotations folder composed of multiple xml files, where the name of each xml file is identical to the name of the corresponding pedestrian picture;
(1d) From the Annotations folder, select the xml files whose names match the picture names in trainval.txt as the annotation set of the training picture set, and the xml files whose names match the picture names in test.txt as the annotation set of the test picture set; write the annotation set of the training picture set into the train.txt file under the darknet folder and the annotation set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation set constitute the training set, and the test picture set and its corresponding xml annotation set constitute the test set;
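Steps (1a)-(1d) follow the PASCAL VOC directory layout. Below is a minimal sketch, not part of the claimed method, of the split in steps (1a)-(1b); the folder and file names come from the claim, while the random shuffle and the exact "more than half" fraction are assumptions:

```python
import os
import random

def split_dataset(jpeg_dir, main_dir, train_fraction=0.5):
    # Collect picture names (without extension) from the JPEGImages folder.
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(jpeg_dir) if f.endswith(".jpg"))
    random.shuffle(names)
    # "More than half" of the pictures go to the training picture set.
    n_train = int(len(names) * train_fraction) + 1
    os.makedirs(main_dir, exist_ok=True)
    # trainval.txt / test.txt under ImageSets/Main, one picture name per line.
    with open(os.path.join(main_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(names[:n_train]))
    with open(os.path.join(main_dir, "test.txt"), "w") as f:
        f.write("\n".join(names[n_train:]))
    return names[:n_train], names[n_train:]
```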
(2) Cluster the training set based on the improved k-means algorithm:
(2a) Screen the annotation information in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files corresponding to the training set into an array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize the current index in data_xml to q = 0;
(2a2) Define the coordinate data corresponding to index q in data_xml: the x-axis projection of the upper-left corner of the annotation box is defined as x_min, the y-axis projection of the upper-left corner as y_min, the x-axis projection of the lower-right corner as x_max, and the y-axis projection of the lower-right corner as y_max;
(2a3) Compute the difference x_d between x_min and x_max and the difference y_d between y_min and y_max, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data corresponding to x_d and y_d in data_xml are invalid; delete the invalid data, set l = l - 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data corresponding to x_d and y_d in data_xml are valid; execute step (2a4);
(2a4) Compute the quotient div of x_d and y_d, and judge the validity of the data corresponding to div in data_xml according to whether div > 3 holds: if so, the data corresponding to div in data_xml are invalid; delete the invalid data, set l = l - 1, and execute step (2a5); otherwise the data corresponding to div in data_xml are valid; set q = q + 1 and execute step (2a5);
(2a5) Repeat steps (2a2)-(2a4) until q = l, obtaining the valid annotation information;
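Steps (2a1)-(2a5) amount to a simple filter over the annotation boxes. A sketch of that filter (not part of the claim), assuming each box is given as an (xmin, ymin, xmax, ymax) tuple:

```python
def screen_annotations(boxes):
    """Keep only valid annotation boxes per steps (2a2)-(2a5):
    a box is invalid if its width or height is zero, or if its
    width-to-height quotient exceeds 3."""
    valid = []
    for xmin, ymin, xmax, ymax in boxes:
        xd = xmax - xmin  # annotation-box width
        yd = ymax - ymin  # annotation-box height
        if xd == 0 or yd == 0:
            continue      # degenerate box: delete
        if xd / yd > 3:
            continue      # implausible aspect ratio for a pedestrian: delete
        valid.append((xd, yd))
    return valid
```

The surviving (width, height) pairs are exactly the data clustered in step (2b).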
(2b) Cluster the valid annotation information:
(2b1) Set the number of cluster centers to k, where k > 0; construct a two-dimensional matrix data_k with the length l of data_xml as the number of rows and k as the number of columns, where each row of data_k corresponds to one piece of valid annotation information stored in data_xml and each column corresponds to the value of one cluster center, and initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centers;
(2b3) Compute the distance values between the l pieces of valid annotation information in data_xml and the k cluster centers, and write each distance value into data_k at the row corresponding to the valid annotation information and the column corresponding to the cluster center;
(2b4) Assign the valid annotation information of each row of data_k as a member of the cluster center whose column holds the smallest distance value in that row, and update the value of each cluster center to the mean width and mean height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centers no longer change, and take the values of the k cluster centers as the clustering result;
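Steps (2b1)-(2b5) describe a k-means loop over the valid (width, height) pairs. The claim does not pin down the distance metric; the sketch below assumes the 1 - IoU distance commonly used for anchor-box clustering, which is one assumption and not a statement of the claimed method:

```python
import random

def iou_wh(box, centre):
    """IoU of two boxes given as (w, h) and aligned at a common corner."""
    inter = min(box[0], centre[0]) * min(box[1], centre[1])
    union = box[0] * box[1] + centre[0] * centre[1] - inter
    return inter / union

def cluster_boxes(boxes, k, seed=0):
    """k-means over (w, h) pairs, per steps (2b1)-(2b5)."""
    rng = random.Random(seed)
    centres = rng.sample(boxes, k)  # (2b2) random initialisation
    while True:
        # (2b3)-(2b4): assign each box to its nearest centre.
        members = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda c: 1 - iou_wh(b, centres[c]))
            members[j].append(b)
        # Update each centre to the mean width and height of its members.
        new = [tuple(sum(m[d] for m in ms) / len(ms) for d in (0, 1))
               if ms else centres[c] for c, ms in enumerate(members)]
        if new == centres:  # (2b5) centres no longer change
            return centres
        centres = new
```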
(3) Improve the loss function of the YOLOv3 detection network:
Modify the coordinate loss function in the loss function of the YOLOv3 detection network to Loss'_coord:

$$Loss'_{coord}=\lambda_{coord}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\,t_i\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

$$t_i=2-w_i\times h_i$$

where λ_coord denotes the weight parameter of the network for the prediction-box coordinates, l.w denotes the number of divisions of the network along the picture width, l.h denotes the number of divisions of the network along the picture height, l.n denotes the number of prediction boxes in the network, i is the iteration variable over l.w × l.h, j is the iteration variable over l.n, 1_ij^obj is the parameter indicating whether the prediction box contains a target, w_i denotes the width of the prediction box, ŵ_i denotes the width of the annotation box, h_i denotes the height of the prediction box, ĥ_i denotes the height of the annotation box, x_i denotes the projection of the upper-left-corner coordinate of the prediction box on the x-axis, x̂_i denotes x_min, y_i denotes the projection of the upper-left-corner coordinate of the prediction box on the y-axis, and ŷ_i denotes y_min.
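The weight t_i = 2 - w_i × h_i makes errors on small pedestrians count more than the same errors on large ones. A minimal sketch of this coordinate loss for a list of matched prediction/annotation pairs, assuming widths and heights are normalised to [0, 1] (an assumption, not stated in the claim):

```python
def coord_loss(preds, truths, lambda_coord=1.0):
    """Improved coordinate loss: each matched (x, y, w, h) pair is
    weighted by t = 2 - w * h, so small boxes contribute more."""
    loss = 0.0
    for (x, y, w, h), (tx, ty, tw, th) in zip(preds, truths):
        t = 2.0 - w * h  # scale weight: close to 2 for small boxes
        loss += t * ((x - tx) ** 2 + (y - ty) ** 2
                     + (w - tw) ** 2 + (h - th) ** 2)
    return lambda_coord * loss
```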
(4) Train on the training set based on the improved loss function:
(4a) Use the clustering result as the size initialization values of the YOLOv3 network candidate boxes;
(4b) Perform K training iterations on the training set in the YOLOv3 network based on the improved loss function, where K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
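In darknet, the candidate-box sizes of step (4a) are normally supplied through the anchors entry of each [yolo] section in the network .cfg file. A small helper that formats the clustering result accordingly; the integer rounding and the area ordering are assumptions, not part of the claim:

```python
def anchors_line(centres):
    """Format (w, h) cluster centres as a darknet 'anchors =' cfg entry,
    rounded to integer pixels and sorted by box area."""
    cs = sorted(centres, key=lambda c: c[0] * c[1])
    return "anchors = " + ",  ".join(
        "%d,%d" % (round(w), round(h)) for w, h in cs)
```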
2. The pedestrian detection method based on improved k-means and loss function according to claim 1, characterized in that the loss function of the YOLOv3 detection network in step (3) has the calculation expression Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord:

$$Loss_{noobj}=\lambda_{noobj}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{noobj}(c_i-\hat{c}_i)^2$$

$$Loss_{obj}=\lambda_{obj}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}(c_i-\hat{c}_i)^2$$

$$Loss_{class}=\lambda_{class}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\sum_{c=0}^{class}\left(p_i(c)-\hat{p}_i(c)\right)^2$$

$$Loss_{coord}=\lambda_{coord}\sum_{i=0}^{l.w\times l.h}\sum_{j=0}^{l.n}1_{ij}^{obj}\,t_i\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

$$t_i=2-w_i\times h_i$$

where Loss denotes the loss function, Loss_noobj denotes the confidence loss function of the prediction boxes that do not contain a target, Loss_obj denotes the confidence loss function of the prediction boxes that contain a target, Loss_class denotes the class loss function, and Loss_coord denotes the coordinate loss function; λ_noobj denotes the coefficient of the prediction boxes that do not contain a target, l.w denotes the number of divisions of the network along the picture width, l.h denotes the number of divisions of the network along the picture height, i and j are the corresponding iteration variables, 1_ij^noobj is the parameter indicating whether the prediction box does not contain a target, c_i is the prediction-box confidence, and ĉ_i is the annotation-box confidence; λ_obj denotes the coefficient of the prediction boxes that contain a target, and 1_ij^obj is the parameter indicating whether the prediction box contains a target; λ_class denotes the coefficient of the prediction boxes that contain a target class, c is the iteration variable over classes, class denotes the total number of classes in the data set, p_i(c) denotes the probability that the prediction box contains class c, and p̂_i(c) denotes the probability that the annotation box contains class c; λ_coord denotes the weight parameter of the network for the prediction-box coordinates, w_i denotes the width of the prediction box, ŵ_i denotes the width of the annotation box, h_i denotes the height of the prediction box, ĥ_i denotes the height of the annotation box, x_i denotes the projection of the upper-left-corner coordinate of the prediction box on the x-axis, x̂_i denotes x_min, y_i denotes the projection of the upper-left-corner coordinate of the prediction box on the y-axis, and ŷ_i denotes y_min.
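The four loss components can be sketched together as one function. This is an illustrative toy implementation, not the darknet code: each entry of `cells` is a hypothetical per-prediction-box record, and the field names (`obj`, `c`, `c_hat`, `p`, `p_hat`, `box`, `box_hat`) are assumptions introduced here:

```python
def yolo_loss(cells, lambdas):
    """Total loss: Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord.
    cells: list of dicts, one per prediction box, with keys
      obj          -- 1 if the box is responsible for a target, else 0
      c, c_hat     -- predicted / annotated confidence
      p, p_hat     -- per-class probability lists
      box, box_hat -- (x, y, w, h), with w and h normalised to [0, 1]
    lambdas: dict mapping 'noobj'/'obj'/'class'/'coord' to coefficients."""
    noobj = obj = cls = coord = 0.0
    for e in cells:
        conf_err = (e["c"] - e["c_hat"]) ** 2
        if not e["obj"]:
            noobj += conf_err  # confidence loss, no target in the box
            continue
        obj += conf_err        # confidence loss, target present
        cls += sum((p - q) ** 2 for p, q in zip(e["p"], e["p_hat"]))
        (x, y, w, h), (tx, ty, tw, th) = e["box"], e["box_hat"]
        coord += (2 - w * h) * ((x - tx) ** 2 + (y - ty) ** 2
                                + (w - tw) ** 2 + (h - th) ** 2)
    return (lambdas["noobj"] * noobj + lambdas["obj"] * obj
            + lambdas["class"] * cls + lambdas["coord"] * coord)
```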
CN201910202078.4A 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function Active CN109978035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910202078.4A CN109978035B (en) 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function


Publications (2)

Publication Number Publication Date
CN109978035A true CN109978035A (en) 2019-07-05
CN109978035B CN109978035B (en) 2021-04-02

Family

ID=67079213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910202078.4A Active CN109978035B (en) 2019-03-18 2019-03-18 Pedestrian detection method based on improved k-means and loss function

Country Status (1)

Country Link
CN (1) CN109978035B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866476A (en) * 2019-11-06 2020-03-06 南京信息职业技术学院 Dense stacking target detection method based on automatic labeling and transfer learning
CN110929646A (en) * 2019-11-22 2020-03-27 国网福建省电力有限公司 Power distribution tower reverse-off information rapid identification method based on unmanned aerial vehicle aerial image
CN110942005A (en) * 2019-11-21 2020-03-31 网易(杭州)网络有限公司 Object recognition method and device
CN111104965A (en) * 2019-11-25 2020-05-05 河北科技大学 Vehicle target identification method and device
CN111274894A (en) * 2020-01-15 2020-06-12 太原科技大学 Improved YOLOv 3-based method for detecting on-duty state of personnel
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
CN113361347A (en) * 2021-05-25 2021-09-07 东南大学成贤学院 Job site safety detection method based on YOLO algorithm
CN113537257A (en) * 2020-04-13 2021-10-22 山西农业大学 Wheat detection method realized based on YoLov3 network
CN113807472A (en) * 2021-11-19 2021-12-17 智道网联科技(北京)有限公司 Hierarchical target detection method and device
CN114119583A (en) * 2021-12-01 2022-03-01 常州市新创智能科技有限公司 Industrial visual inspection system, method, network model selection method and warp knitting machine

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100272366A1 (en) * 2009-04-24 2010-10-28 Sony Corporation Method and device of detecting object in image and system including the device
CN103186776A (en) * 2013-04-03 2013-07-03 西安电子科技大学 Human detection method based on multiple features and depth information
US20140133698A1 (en) * 2012-11-09 2014-05-15 Analog Devices Technology Object detection
CN107358223A * 2017-08-16 2017-11-17 上海荷福人工智能科技(集团)有限公司 Face detection and face alignment method based on YOLO
CN108460403A * 2018-01-23 2018-08-28 上海交通大学 Object detection method and system based on multi-scale feature fusion in images
CN108647665A * 2018-05-18 2018-10-12 西安电子科技大学 Real-time aerial vehicle detection method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUI ZHANG et al.: "Pedestrian Detection Method Based on Faster R-CNN", 2017 13th International Conference on Computational Intelligence and Security *
PIOTR DOLLAR et al.: "Pedestrian Detection: An Evaluation of the State of the Art", IEEE Transactions on Pattern Analysis and Machine Intelligence *


Also Published As

Publication number Publication date
CN109978035B (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN109978035A (en) Pedestrian detection method based on improved k-means and loss function
CN111104898B (en) Image scene classification method and device based on target semantics and attention mechanism
Chen et al. City-scale map creation and updating using GPS collections
Chu et al. Camera as weather sensor: Estimating weather information from single images
CN108171233 Method and apparatus for object detection using a region-based deep learning model
CN108647665A Real-time aerial vehicle detection method based on deep learning
Yap et al. A comparative study of mobile-based landmark recognition techniques
US9858503B2 (en) Acceleration of linear classifiers
CN105574550A (en) Vehicle identification method and device
CN108052966 Automatic extraction and classification method for remote sensing image scenes based on convolutional neural networks
CN107346328A Cross-modal association learning method based on multi-granularity hierarchical networks
CN110309780A Fast supervised building recognition in high-resolution images based on the BFD-IGA-SVM model
CN109766835A SAR target recognition method based on multi-parameter optimized generative adversarial networks
Chen et al. Clues from the beaten path: Location estimation with bursty sequences of tourist photos
KR102516588B1 (en) A measurement device, method and program that provides measured and estimated values of fine dust concentration through satellite image analysis using an artificial intelligence model
CN108492298 Multispectral image change detection method based on generative adversarial networks
CN113918837B (en) Method and system for generating city interest point category representation
CN103955709B Polarimetric synthetic aperture radar (SAR) image classification method based on weighted composite kernels and triplet Markov fields (TMF)
CN110276363A Small bird target detection method based on density map estimation
Sathish et al. Detection and localization of multiple objects using VGGNet and single shot detection
CN106250918B Gaussian mixture model matching method based on an improved earth mover's distance
CN110659601A Dense vehicle detection method for remote sensing images based on center points using a deep fully convolutional network
CN105205807B Remote sensing image change detection method based on sparse autoencoders
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN108154511 SAR image segmentation method based on submodular dictionary learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant