CN109978035A - Pedestrian detection method based on improved k-means and loss function - Google Patents
- Publication number: CN109978035A (application CN201910202078.4)
- Country: China (CN)
- Legal status: Granted (the legal status is an assumption by Google Patents, not a legal conclusion)
Classifications
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/23 — Pattern recognition; clustering techniques
Abstract
The invention proposes a pedestrian detection method based on an improved k-means algorithm and an improved loss function, used to classify and identify videos or images containing pedestrian targets. It mainly addresses two problems of the prior art: inaccurate clustering results, and prediction boxes that cannot learn a loss suited to their own size. The steps are: construct a training set and a test set; cluster the training set with the improved k-means algorithm; improve the loss function of the YOLOv3 detection network; train on the training set with the improved loss function; detect the test set. In the clustering phase the invention screens out the invalid data from the training-set annotations and clusters only the valid data, yielding more accurate initial candidate-box sizes; prediction boxes of different sizes learn different losses according to their own size, yielding a more accurate pedestrian target detection network.
Description
Technical field
The invention belongs to the field of target detection and relates to a pedestrian detection method, specifically a pedestrian detection method based on an improved k-means algorithm and an improved loss function, which can be used to classify and identify videos or images containing pedestrian targets.
Background art
Pedestrian detection refers to detecting the position coordinates and confidence of pedestrians in a video or image. The main metrics of a detection result are detection accuracy and detection speed, of which detection accuracy is the most important; it is strongly influenced by the pedestrian features used and by the loss function.
Current pedestrian detection methods can be divided, according to how pedestrian features are extracted, into two classes: pedestrian detection based on traditional algorithms and pedestrian detection based on deep learning.
Traditional pedestrian detection methods comprise detection based on global features, detection based on local feature extraction, and detection based on multiple features. Global-feature methods locate pedestrians mainly by detecting pedestrian contours with a histogram of oriented gradients over the whole image. Local-feature methods extract local features of the input image and detect pedestrians by feature matching. Multi-feature methods extract several feature types, such as gray level and contour, and fuse their detection results. All three methods share the advantage of being simple and fast, but because pedestrian features are sensitive to factors such as illumination, background, and occlusion, ambient noise and lighting interference are easily introduced during detection, so the detection accuracy of traditional pedestrian detection methods is low.
The development of deep learning has brought new ideas to pedestrian detection research. Deep-learning pedestrian detection methods fall into candidate-box-based methods and end-to-end methods. Candidate-box-based methods select candidate boxes manually before training the network; although they detect well, pre-selecting candidate boxes makes the network's detection efficiency very low.
In recent years, end-to-end detection methods, with their good balance of detection accuracy and detection efficiency, have become the mainstream in pedestrian detection. These methods take a deep-learning target detection network as the base network and initialize the candidate-box sizes by clustering, so that the initial candidate-box sizes are close to the sizes of the pedestrians and the network converges more easily; the training set is then trained with a loss function to obtain a pedestrian detection network model, which is finally used to detect the test-set pictures and output the position coordinates and confidence of every pedestrian target. However, the base networks used by most current pedestrian detection algorithms, such as YOLOv1 and YOLOv2, still have unsatisfactory detection accuracy, so the detection accuracy of these algorithms is low. For example, the patent application with publication number CN 109325418A, entitled "Pedestrian recognition method under road traffic environment based on improved YOLOv3", discloses a pedestrian detection method using an improved YOLOv3. It takes YOLOv3 as the base network, first increases the number of candidate boxes during k-means clustering to increase the network's feature-extraction ability, and then, when training the network with the loss function, increases the weight of the coordinate loss term, obtaining the pedestrian detection network model. However, this method does not consider invalid annotations in the training set when clustering with k-means, so its clustering results are inaccurate; nor does it consider, when computing the coordinate and width-height errors of the coordinate loss, that prediction boxes of different sizes should learn these errors with different weights, so the prediction boxes cannot learn a loss suited to their own size. How to screen out the valid data in the training-set annotations and compute a more accurate loss therefore remains an urgent problem in this field.
Summary of the invention
The object of the invention is to address the deficiencies of the above existing pedestrian detection techniques by proposing a pedestrian detection method based on improved k-means and an improved loss function, intended to improve the detection accuracy of pedestrian targets in different scenes.
The technical idea of the invention is: first construct the training set and test set; next cluster the annotations of the training set with the improved k-means clustering algorithm and use the clustering result as the size initialization value of the YOLOv3 network's candidate boxes; then train on the training set with the improved loss function in the YOLOv3 network; finally detect the test set with the trained pedestrian detection network model.
According to this technical idea, the technical solution realizing the object of the invention includes the following steps:
(1) Construct the training set and test set:
(1a) Save N frames, continuous or discontinuous, from a pedestrian video of any scene into the JPEGImages folder as jpg pictures, and name each picture, N > 1000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all training pictures into the trainval.txt file under the ImageSets/Main folder, and write the names of all test pictures into the test.txt file under the ImageSets/Main folder;
(1c) Draw bounding-box annotations for the different pedestrians contained in every picture of the training and test picture sets, and save the coordinate data of each callout box; then save the class person of each annotated pedestrian target and the coordinate data of the callout boxes contained in every picture into an xml file, obtaining the Annotations folder composed of multiple xml files, where the name of each xml file is identical to the name of its corresponding pedestrian picture;
(1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation set of the test pictures; write the annotation set of the training pictures into the train.txt file under the darknet folder and the annotation set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotations constitute the training set; the test pictures together with their corresponding xml annotations constitute the test set;
(2) Cluster the training set with the improved k-means algorithm:
(2a) Screen the annotations in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files of the training set into an array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize its current index in data_xml to q = 0;
(2a2) Define the coordinate data at index q of data_xml: define the x-axis projection coordinate of the upper-left corner of the callout box as xmin, the y-axis projection coordinate of the upper-left corner as ymin, the x-axis projection coordinate of the lower-right corner as xmax, and the y-axis projection coordinate of the lower-right corner as ymax;
(2a3) Compute the difference xd of xmin and xmax and the difference yd of ymin and ymax, and judge whether the corresponding data in data_xml are valid: if xd = 0 or yd = 0, the corresponding data are invalid, so delete them, set l = l − 1, and go to step (2a2); if xd ≠ 0 and yd ≠ 0, the corresponding data are valid, so go to step (2a4);
(2a4) Compute the quotient div of xd and yd, and judge the validity of the corresponding data in data_xml by whether div > 3 holds: if it holds, the corresponding data are invalid, so delete them, set l = l − 1, and go to step (2a5); otherwise the corresponding data are valid, so set q = q + 1 and go to step (2a5);
(2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the effective annotations;
(2b) Cluster the effective annotations:
(2b1) Set the number of cluster centres to k, k > 0; construct a two-dimensional matrix data_k with the length l of data_xml as its row count and k as its column count, where the rows correspond to the effective annotations stored in data_xml and the columns correspond to the values of the cluster centres, and initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centres;
(2b3) Compute the distance values between the l effective annotations in data_xml and the k cluster centres, and write each distance value into data_k at the position given by the row of the corresponding effective annotation and the column of the corresponding cluster centre;
(2b4) Assign the effective annotation of each row of data_k as a member of the cluster centre whose column holds the smallest distance value in that row, and update the value of each cluster centre to the mean width and height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
(3) Improve the loss function of the YOLOv3 detection network:
Replace the coordinate loss function in the YOLOv3 network loss with Loss'coord:

Loss'coord = λcoord · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

where λcoord is the network's weight parameter for prediction-box coordinates, l.w is the size of the network's division along the picture width, l.h is the size of its division along the picture height, l.n is the number of prediction boxes in the network, i is the variable iterating over l.w × l.h, j is the variable iterating over l.n, w_i is the width of the prediction box, ŵ_i the width of the callout box, h_i the height of the prediction box, ĥ_i the height of the callout box, x_i the x-axis projection of the prediction box's upper-left corner, x̂_i denotes xmin, y_i the y-axis projection of the prediction box's upper-left corner, and ŷ_i denotes ymin;
(4) Train on the training set with the improved loss function:
(4a) Use the clustering result as the size initialization value of the YOLOv3 network's candidate boxes;
(4b) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
Compared with the prior art, the invention has the following advantages:
The present invention improves the loss function in YOLOv3, increasing in the coordinate loss function the learning weight of the coordinate error of small prediction boxes, which avoids the defect that prediction boxes cannot learn a loss suited to their own size. Meanwhile, the present invention improves the k-means clustering algorithm by screening the width-height values and aspect ratios of the callout boxes in the training set, removing the invalid data while retaining the valid data, and clustering only the valid data; this avoids the defect that invalid annotations make the clustering result inaccurate and degrade detection accuracy. Simulation results show that, compared with the prior art, the present invention effectively improves the detection accuracy of pedestrian detection.
Brief description of the drawings
Fig. 1 is the implementation flow chart of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
Referring to Fig. 1, the present invention includes the following steps:
Step 1) Construct the training set and test set:
Step 1a) From a pedestrian video of any scene shot by a video camera, drone, or mobile phone, extract N continuous or discontinuous frames, keeping one frame every 10 frames, and save them into the JPEGImages folder, N > 10000. This embodiment uses 12000 continuous frames of a road pedestrian video shot with a mobile phone, and each picture is given a distinct name. The resolution of the video is 1920 × 1080, and the number of pictures saved in the JPEGImages folder is no less than 1000;
Step 1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; this embodiment splits training and test picture sets in a 7:3 ratio. Write the names of all pictures in the training set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test set into the test.txt file under the ImageSets/Main folder, where each picture name occupies one line in trainval.txt and test.txt;
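The split-and-write procedure of step 1b) can be sketched in python, the language the embodiment already uses. This is a minimal sketch under assumed paths; the helper name split_dataset and the 7:3 default are illustrative and not taken from the patent, and the names are written without their .jpg extension, as is conventional for such list files.

```python
import os

def split_dataset(image_names, out_dir, train_ratio=0.7):
    """Split picture names into training and test lists and write the
    trainval.txt / test.txt files (illustrative helper; paths assumed)."""
    n_train = int(len(image_names) * train_ratio)
    train_names, test_names = image_names[:n_train], image_names[n_train:]
    os.makedirs(out_dir, exist_ok=True)
    # One picture name per line, extension stripped.
    with open(os.path.join(out_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(os.path.splitext(n)[0] for n in train_names))
    with open(os.path.join(out_dir, "test.txt"), "w") as f:
        f.write("\n".join(os.path.splitext(n)[0] for n in test_names))
    return train_names, test_names
```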
Step 1c) Draw bounding-box annotations for the pedestrian targets contained in every picture of the training and test picture sets:
Step 1c1) Annotate the class and position coordinates (xmin, ymin, xmax, ymax) of each pedestrian target, where the class of each pedestrian target is person, xmin is the x-axis projection coordinate of the upper-left corner of the callout box, ymin the y-axis projection coordinate of the upper-left corner, xmax the x-axis projection coordinate of the lower-right corner, and ymax the y-axis projection coordinate of the lower-right corner;
Step 1c2) Save the annotations of all pedestrian targets in every picture of the training and test picture sets in xml format, obtaining the Annotations folder composed of multiple xml files, where the name of each xml file is identical to the name of the picture its annotations correspond to; for example, the annotation file corresponding to picture 000001.jpg is named 000001.xml. Place the JPEGImages, Annotations, and ImageSets folders into the darknet folder;
Step 1d) From the Annotations folder, take the xml files whose names match the picture names in trainval.txt as the annotation set of the training pictures, and the xml files whose names match the picture names in test.txt as the annotation set of the test pictures; write the annotation set of the training pictures into the train.txt file under the darknet folder and the annotation set of the test pictures into the test.txt file under the darknet folder. The training pictures together with their corresponding xml annotations constitute the training set; the test pictures together with their corresponding xml annotations constitute the test set;
Step 2) Cluster the training set with the improved k-means algorithm:
Step 2a) Screen the annotations in the training set:
Step 2a1) Construct the array data_xml; extract the coordinate data from all xml files of the training set using obj.findtext in python and write them into data_xml in order, where each member of data_xml represents one group of coordinate data. Compute the length l of data_xml with the len function in python, read the first group of coordinate data in data_xml, and initialize the current index of data_xml to q = 0;
Step 2a2) Define the coordinate data at index q of data_xml: define the x-axis projection coordinate of the upper-left corner of the callout box as xmin, the y-axis projection coordinate of the upper-left corner as ymin, the x-axis projection coordinate of the lower-right corner as xmax, and the y-axis projection coordinate of the lower-right corner as ymax;
Step 2a3) Compute the difference xd of xmin and xmax and the difference yd of ymin and ymax, where xmin, xmax, ymin, and ymax are floating-point numbers, and judge whether the corresponding data in data_xml are valid: if xd = 0 or yd = 0, the corresponding data are invalid, so delete that group of data from data_xml with the del function in python, set l = l − 1, and go to step (2a2); if xd ≠ 0 and yd ≠ 0, the corresponding data are valid, so go to step (2a4);
Step 2a4) Compute the quotient div of xd and yd, and judge the validity of the corresponding data in data_xml by whether div > 3 holds: if it holds, the corresponding data are invalid, so delete that group of data from data_xml with the del function in python, set l = l − 1, and go to step (2a5); otherwise the corresponding data are valid, so set q = q + 1 and go to step (2a5);
Step 2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the effective annotations, i.e. all annotations remaining in data_xml at this point;
Step 2b) Cluster the effective annotations:
Step 2b1) Manually set the number of cluster centres to k, k > 0; k is 9 in this embodiment. Construct a two-dimensional matrix data_k whose row count is the current length l of data_xml and whose column count is k, where the rows correspond to the effective annotations stored in data_xml and the columns correspond to the values of the cluster centres, and initialize data_k to 0 with np.zeros in python;
Step 2b2) Randomly initialize each of the k cluster centres with np.random.choice in python, where each cluster centre is a floating-point array of length 2; write the values of the cluster centres into an array named clusters;
Step 2b3) Compute the distance value d(box, centroid) between each of the l effective annotations in data_xml and each of the k cluster centres:

d(box, centroid) = 1 − IOU(box, centroid)
IOU(box, centroid) = (box ∩ centroid) / (box ∪ centroid)
box = xd × yd

where centroid is the product of the two floating-point members of a cluster centre, box ∩ centroid is the intersection of box and centroid, and box ∪ centroid is the union of box and centroid. Then write each d(box, centroid) into data_k at the position given by the row of the corresponding effective annotation and the column of the corresponding cluster centre;
Step 2b4) Use np.argmin in python to find the column holding the smallest distance value in each row of data_k, record it in the variable nearest_clusters, and update each cluster centre with the following python statement:

clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

where cluster is the index of a cluster centre and is incremented by one each time the statement executes, until all cluster centres have been updated; the updated cluster centres are still stored in the array named clusters;
Step 2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
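Steps 2b1)~2b5) are k-means over (width, height) pairs with the 1 − IOU distance, where each box and each centre are compared as rectangles anchored at a common corner. A minimal numpy sketch under that assumption; the function names are illustrative, and the patent's dist(..., axis=0) update is taken here to be the per-column mean, matching the "mean width and height of its members" description.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) boxes and (w, h) cluster centres, both anchored
    at the origin, as used in the distance d = 1 - IOU above."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (centers[:, 0] * centers[:, 1])[None, :] - inter)
    return inter / union

def kmeans_iou(boxes, k, seed=0):
    """Cluster (w, h) pairs with the 1 - IOU distance (steps 2b1-2b5)."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]  # random init
    nearest = np.full(len(boxes), -1)
    while True:
        d = 1.0 - iou_wh(boxes, clusters)   # the data_k distance matrix
        new_nearest = d.argmin(axis=1)      # nearest centre for each row
        if (new_nearest == nearest).all():  # centres stopped changing
            return clusters
        nearest = new_nearest
        for c in range(k):                  # centre <- mean width/height
            members = boxes[nearest == c]
            if len(members):
                clusters[c] = members.mean(axis=0)
```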
Step 3) Improve the loss function of the YOLOv3 detection network:
In the delta_region_box function of the region_layer.c file under the darknet/src folder, replace the coordinate loss function with Loss'coord:

Loss'coord = λcoord · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

The complete improved loss function Loss' in YOLOv3 is then:

Loss' = Lossnoobj + Lossobj + Lossclass + Loss'coord

where Lossnoobj is the confidence loss function of the prediction boxes that do not contain a target, Lossobj the confidence loss function of the prediction boxes that contain a target, Lossclass the classification loss function, and Loss'coord the improved coordinate loss function; λcoord is the network's weight parameter for prediction-box coordinates, l.w is the size of the network's division along the picture width, l.h the size of its division along the picture height, l.n the number of prediction boxes in the network, i the variable iterating over l.w × l.h, j the variable iterating over l.n, w_i the width of the prediction box, ŵ_i the width of the callout box, h_i the height of the prediction box, ĥ_i the height of the callout box, x_i the x-axis projection of the prediction box's upper-left corner, x̂_i denotes xmin, y_i the y-axis projection of the prediction box's upper-left corner, and ŷ_i denotes ymin; λnoobj is the coefficient of the prediction boxes that do not contain a target, 1_ij^noobj the parameter indicating whether a prediction box does not contain a target, c_i the confidence of a prediction box, ĉ_i the confidence of a callout box, λobj the coefficient of the prediction boxes that contain a target, 1_ij^obj the parameter indicating whether a prediction box contains a target, λclass the coefficient of the prediction boxes containing a target class, c the iteration variable over classes, class the set of all classes in the data set, p_i(c) the probability that a prediction box contains class c, and p̂_i(c) the probability that a callout box contains class c;
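The effect of the scale t_i = 2 − w_i × h_i can be seen in a single term of the improved coordinate loss. A minimal python sketch follows; the helper name coord_loss_term and the default lambda_coord = 1.0 are illustrative, and w and h are assumed normalised to [0, 1] so that t lies in (1, 2], as in the symbol definitions above where w_i and h_i are the prediction box's width and height.

```python
def coord_loss_term(pred, truth, lambda_coord=1.0):
    """One prediction-box term of the improved coordinate loss.

    pred and truth are (x, y, w, h) tuples with w, h normalised to [0, 1].
    The scale t = 2 - w*h grows as the box shrinks, so small prediction
    boxes receive a larger learning weight for their coordinate error,
    which is the improvement described above.
    """
    x, y, w, h = pred
    xt, yt, wt, ht = truth
    t = 2.0 - w * h  # t_i = 2 - w_i * h_i from step 3
    return lambda_coord * t * ((x - xt) ** 2 + (y - yt) ** 2
                               + (w - wt) ** 2 + (h - ht) ** 2)
```

With the same coordinate error, a small box contributes more loss than a large one, which is the intended behaviour.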
Step 4) Train on the training set with the improved loss function:
Step 4a) Initialize the training parameters of the pedestrian detection network:
Modify the paths of the training set and test set in the voc.data file, and set the maximum number of iterations max_batches to 50200, the picture batch size to 64, the initial learning rate to 10⁻³, and the momentum to 0.9;
Step 4b) Use the clustering result as the size initialization value of the YOLOv3 network's candidate boxes:
Write the clustering result into the anchors entry of the yolov3-voc.cfg file;
Step 4c) Perform K training iterations on the training set with the improved loss function in the YOLOv3 network, K > 10000; K is 20000 in this embodiment. This yields the pedestrian detection network model;
Step 5) Detect the test set:
Step 5a) Under the darknet folder, input the shell command:

./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg yolov3-voc_20000.weights

Step 5b) According to the input shell command, the pedestrian detection network model performs a forward pass with the improved loss function on the test-set pictures it reads, obtains the position coordinates and confidence of each pedestrian target, and stores the results in the data/out folder.
The technical effect of the invention is further described below in conjunction with a simulation experiment:
1. Simulation conditions and content:
The simulation experiment of the invention is carried out in a configuration environment of an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, GeForce GTX 1080ti x4, and 32 GB of memory. The pedestrian video data used in the experiment were actually shot with a Redmi Note 7 mobile phone on the Xidian University campus and on nearby roads with pedestrians.
Simulation experiment: the detection accuracy of the pedestrian detection method based on the improved k-means and the improved loss function is compared by simulation against the prior art. After constructing the training set and test set according to the present invention, valid-data screening is first performed on the training-set annotations with the improved k-means; then the valid data and the complete data of the training-set annotations are clustered separately to obtain two clustering results, which serve as the candidate-box initialization sizes of the improved-loss-function YOLOv3 network and of the prior-art network, respectively. The training set is then trained for 20000 iterations with the improved loss function in YOLOv3 while the prior-art network is also trained on the training set for 20000 iterations, yielding the respective pedestrian detection network models. Finally, the test set is input separately into the two pedestrian detection network models to obtain the position coordinates and confidence of each pedestrian target, and the detection accuracy of the two methods is counted; the detection-accuracy comparison is shown in the table below.
2. Analysis of simulation results:
The pedestrian detection result obtained by the present invention has an obvious advantage over the prior art; the detection accuracies of the prior art and of the present invention are shown in Table 1:

Table 1. Detection accuracy comparison

| Evaluation index | The prior art | The present invention |
| --- | --- | --- |
| Detection accuracy | 87.3 | 89.0 |

It is apparent from the table that the detection accuracy obtained by the present invention is higher, showing that the detection effect of the present invention on pedestrian targets is better than that of the prior art.
The above description is only a specific example of the present invention and does not constitute any limitation of the invention. After understanding the content and principles of the present invention, those skilled in the art may make various modifications and variations in form and detail without departing from the principles and structure of the invention, but such modifications and variations based on the inventive concept still fall within the scope of the claims of the invention.
Claims (2)
1. A pedestrian detection method based on improved k-means and loss function, comprising the following steps:
(1) Construct the training set and test set:
(1a) Save N continuous or discontinuous frames captured from a pedestrian video of an arbitrary scene as jpg pictures in the JPEGImages folder and assign a name to each picture, where N > 10000;
(1b) Take more than half of the pictures in the JPEGImages folder as the training picture set and the remaining pictures as the test picture set; write the names of all pictures in the training picture set into the trainval.txt file under the ImageSets/Main folder, and write the names of all pictures in the test picture set into the test.txt file under the ImageSets/Main folder;
(1c) Draw an annotation box around each distinct pedestrian contained in every picture of the training and test picture sets and save the coordinate data of each annotation box; then save the class label of the pedestrian target contained in each annotation box, together with the coordinate data of the annotation boxes contained in each picture, into an xml file, obtaining the Annotations folder composed of multiple xml files, where the name of each xml file is identical to the name of its corresponding pedestrian picture;
(1d) From the Annotations folder, select the xml files whose names match the picture names in trainval.txt as the annotation information set of the training picture set, and the xml files whose names match the picture names in test.txt as the annotation information set of the test picture set; write the annotation information set of the training picture set into the train.txt file under the darknet folder, and write the annotation information set of the test picture set into the test.txt file under the darknet folder; the training picture set and its corresponding xml annotation information set constitute the training set, and the test picture set and its corresponding xml annotation information set constitute the test set;
(2) Cluster the training set based on the improved k-means algorithm:
(2a) Screen the annotation information in the training set:
(2a1) Write the coordinate data extracted from the xml annotation files of the training set into the array data_xml of length l, take the first group of coordinate data read from data_xml as the current coordinate data, and initialize the current index in data_xml to q = 0;
(2a2) Define the coordinate data corresponding to index q in data_xml: the x-axis projection of the upper-left corner of the annotation box is defined as x_min, the y-axis projection of the upper-left corner as y_min, the x-axis projection of the lower-right corner as x_max, and the y-axis projection of the lower-right corner as y_max;
(2a3) Compute the difference x_d = x_max − x_min and the difference y_d = y_max − y_min, and judge whether the data in data_xml corresponding to x_d and y_d are valid: if x_d = 0 or y_d = 0, the data in data_xml corresponding to x_d and y_d are invalid data; delete the invalid data, set l = l − 1, and execute step (2a2); if x_d ≠ 0 and y_d ≠ 0, the data in data_xml corresponding to x_d and y_d are valid data; execute step (2a4);
(2a4) Compute the quotient div = x_d / y_d, and judge the validity of the data in data_xml corresponding to div according to whether div > 3 holds: if it holds, the data in data_xml corresponding to div are invalid data; delete the invalid data, set l = l − 1, and execute step (2a5); otherwise, the data in data_xml corresponding to div are valid data; set q = q + 1 and execute step (2a5);
(2a5) Repeat steps (2a2)~(2a4) until q = l, obtaining the valid annotation information;
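The screening of steps (2a1)~(2a5) amounts to one filtering pass over the extracted coordinates. A minimal Python sketch (names are illustrative, and the claim's in-place deletion with index bookkeeping is replaced by the equivalent construction of a new list):

```python
def screen_annotations(data_xml):
    """Screen coordinate data extracted from the xml annotation files
    (steps 2a1~2a5). Each entry is (xmin, ymin, xmax, ymax); a box is
    invalid if its width or height is zero, or if width/height > 3.
    Returns the (width, height) pairs of the valid annotation boxes."""
    valid = []
    for xmin, ymin, xmax, ymax in data_xml:
        xd = xmax - xmin  # box width
        yd = ymax - ymin  # box height
        if xd == 0 or yd == 0:
            continue      # degenerate box: invalid data, step (2a3)
        if xd / yd > 3:
            continue      # implausible aspect ratio for a pedestrian, step (2a4)
        valid.append((xd, yd))
    return valid
```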
(2b) Cluster the valid annotation information:
(2b1) Set the number of cluster centres to k, k > 0; construct the two-dimensional matrix data_k with the length l of data_xml as its number of rows and k as its number of columns, where each row of data_k corresponds to one piece of valid annotation information saved in data_xml and each column corresponds to the value of one cluster centre, and initialize data_k to 0;
(2b2) Randomly initialize each of the k cluster centres;
(2b3) Compute the distance between each of the l pieces of valid annotation information in data_xml and each of the k cluster centres, and write each distance value into data_k at the row corresponding to the annotation information and the column corresponding to the cluster centre;
(2b4) Assign the valid annotation information of each row of data_k as a member of the cluster centre corresponding to the column holding the lowest distance value in that row, and update the value of each cluster centre to the mean width and mean height of its members;
(2b5) Repeat steps (2b3) and (2b4) until the values of the k cluster centres no longer change, and take the values of the k cluster centres as the clustering result;
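Steps (2b1)~(2b5) describe a k-means loop over the screened (width, height) pairs. A minimal sketch using Euclidean distance; the claim does not fix the distance metric, so that choice is an assumption (YOLO implementations often use an IoU-based distance instead):

```python
import random

def kmeans_boxes(sizes, k, iters=100, seed=0):
    """Cluster (w, h) pairs of valid annotation boxes into k centres
    (steps 2b1~2b5). Centres are initialized from random samples and
    updated to the mean width/height of their members until stable."""
    rng = random.Random(seed)
    centres = rng.sample(sizes, k)        # step (2b2): random initialization
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for w, h in sizes:                # step (2b3): distances to every centre
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centres]
            members[dists.index(min(dists))].append((w, h))  # step (2b4)
        new = []
        for idx, group in enumerate(members):
            if group:  # update centre to the mean width/height of its members
                new.append((sum(w for w, _ in group) / len(group),
                            sum(h for _, h in group) / len(group)))
            else:      # keep an empty centre unchanged
                new.append(centres[idx])
        if new == centres:                # step (2b5): centres no longer change
            break
        centres = new
    return centres
```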
(3) Improve the loss function of the YOLOv3 detection network:
Replace the coordinate loss term of the YOLOv3 detection network loss function with Loss'_coord:

Loss'_coord = λ_coord · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_ij^obj · t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

where λ_coord denotes the network's weight parameter for prediction box coordinates; l.w denotes the size of the network's division along the picture width; l.h denotes the size of the network's division along the picture height; l.n denotes the number of prediction boxes in the network; i is the iteration variable over l.w × l.h; j is the iteration variable over l.n; 1_ij^obj is the parameter indicating whether the prediction box contains a target; w_i denotes the width of the prediction box; ŵ_i denotes the width of the annotation box; h_i denotes the height of the prediction box; ĥ_i denotes the height of the annotation box; x_i denotes the x-axis projection of the upper-left corner of the prediction box; x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left corner of the prediction box; ŷ_i denotes y_min;
(4) Train on the training set based on the improved loss function:
(4a) Use the clustering result as the size initialization value of the YOLOv3 network candidate boxes;
(4b) Perform K training iterations on the training set in the YOLOv3 network based on the improved loss function, K > 10000, obtaining the pedestrian detection network model;
(5) Detect the test set:
Input the test set to be detected into the pedestrian detection network model for detection, obtaining the position coordinates and confidence of each pedestrian target.
2. The pedestrian detection method based on improved k-means and loss function according to claim 1, characterized in that the loss function of the YOLOv3 detection network described in step (3) has the calculation expression Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord, where:

Loss_noobj = λ_noobj · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_ij^noobj · (c_i − ĉ_i)²

Loss_obj = λ_obj · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_ij^obj · (c_i − ĉ_i)²

Loss_class = λ_class · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_ij^obj · Σ_{c=0}^{class} (p_i(c) − p̂_i(c))²

Loss_coord = λ_coord · Σ_{i=0}^{l.w×l.h} Σ_{j=0}^{l.n} 1_ij^obj · t_i · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

t_i = 2 − w_i × h_i

where Loss denotes the loss function; Loss_noobj denotes the confidence loss function of prediction boxes not containing a target; Loss_obj denotes the confidence loss function of prediction boxes containing a target; Loss_class denotes the classification loss function; Loss_coord denotes the coordinate loss function; λ_noobj denotes the coefficient corresponding to prediction boxes not containing a target; l.w denotes the size of the network's division along the picture width; l.h denotes the size of the network's division along the picture height; i and j are the corresponding iteration variables; 1_ij^noobj is the parameter indicating whether the prediction box does not contain a target; c_i is the prediction box confidence; ĉ_i is the annotation box confidence; λ_obj denotes the coefficient corresponding to prediction boxes containing a target; 1_ij^obj is the parameter indicating whether the prediction box contains a target; λ_class denotes the coefficient corresponding to prediction boxes containing a target class; c denotes the iteration variable over classes; class denotes the total number of classes in the dataset; p_i(c) denotes the probability that the prediction box contains class c; p̂_i(c) denotes the probability that the annotation box contains class c; λ_coord denotes the network's weight parameter for prediction box coordinates; w_i denotes the width of the prediction box; ŵ_i denotes the width of the annotation box; h_i denotes the height of the prediction box; ĥ_i denotes the height of the annotation box; x_i denotes the x-axis projection of the upper-left corner of the prediction box; x̂_i denotes x_min; y_i denotes the y-axis projection of the upper-left corner of the prediction box; ŷ_i denotes y_min.
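The four terms combine additively as in the expression of claim 2. A minimal sketch assuming one record per candidate box and illustrative λ defaults (the claim does not fix the coefficient values):

```python
def yolo_loss(boxes, lam_noobj=0.5, lam_obj=1.0, lam_class=1.0, lam_coord=5.0):
    """Sketch of Loss = Loss_noobj + Loss_obj + Loss_class + Loss_coord.
    Each element of `boxes` is a dict holding the prediction and label
    values for one candidate box; keys and lambda defaults are assumed."""
    l_noobj = l_obj = l_class = l_coord = 0.0
    for b in boxes:
        if not b["has_obj"]:
            # confidence loss for boxes whose label confidence is "no object"
            l_noobj += (b["c"] - b["c_hat"]) ** 2
            continue
        l_obj += (b["c"] - b["c_hat"]) ** 2
        l_class += sum((p - p_hat) ** 2
                       for p, p_hat in zip(b["p"], b["p_hat"]))
        x, y, w, h = b["box"]
        xt, yt, wt, ht = b["box_hat"]
        t = 2.0 - w * h  # small boxes weighted more heavily, as in claim 1
        l_coord += t * ((x - xt) ** 2 + (y - yt) ** 2
                        + (w - wt) ** 2 + (h - ht) ** 2)
    return (lam_noobj * l_noobj + lam_obj * l_obj
            + lam_class * l_class + lam_coord * l_coord)
```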
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910202078.4A CN109978035B (en) | 2019-03-18 | 2019-03-18 | Pedestrian detection method based on improved k-means and loss function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109978035A true CN109978035A (en) | 2019-07-05 |
CN109978035B CN109978035B (en) | 2021-04-02 |
Family
ID=67079213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910202078.4A Active CN109978035B (en) | 2019-03-18 | 2019-03-18 | Pedestrian detection method based on improved k-means and loss function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109978035B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100272366A1 (en) * | 2009-04-24 | 2010-10-28 | Sony Corporation | Method and device of detecting object in image and system including the device |
CN103186776A (en) * | 2013-04-03 | 2013-07-03 | 西安电子科技大学 | Human detection method based on multiple features and depth information |
US20140133698A1 (en) * | 2012-11-09 | 2014-05-15 | Analog Devices Technology | Object detection |
CN107358223A (en) * | 2017-08-16 | 2017-11-17 | 上海荷福人工智能科技(集团)有限公司 | A kind of Face datection and face alignment method based on yolo |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN108647665A (en) * | 2018-05-18 | 2018-10-12 | 西安电子科技大学 | Vehicle real-time detection method of taking photo by plane based on deep learning |
Non-Patent Citations (2)
Title |
---|
HUI ZHANG et al.: "Pedestrian Detection Method Based on Faster R-CNN", 2017 13th International Conference on Computational Intelligence and Security * |
PIOTR DOLLAR et al.: "Pedestrian Detection: An Evaluation of the State of the Art", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866476A (en) * | 2019-11-06 | 2020-03-06 | 南京信息职业技术学院 | Dense stacking target detection method based on automatic labeling and transfer learning |
CN110866476B (en) * | 2019-11-06 | 2023-09-01 | 南京信息职业技术学院 | Dense stacking target detection method based on automatic labeling and transfer learning |
CN110942005A (en) * | 2019-11-21 | 2020-03-31 | 网易(杭州)网络有限公司 | Object recognition method and device |
CN110929646A (en) * | 2019-11-22 | 2020-03-27 | 国网福建省电力有限公司 | Power distribution tower reverse-off information rapid identification method based on unmanned aerial vehicle aerial image |
CN111104965A (en) * | 2019-11-25 | 2020-05-05 | 河北科技大学 | Vehicle target identification method and device |
CN111274894A (en) * | 2020-01-15 | 2020-06-12 | 太原科技大学 | Improved YOLOv 3-based method for detecting on-duty state of personnel |
CN113537257A (en) * | 2020-04-13 | 2021-10-22 | 山西农业大学 | Wheat detection method realized based on YoLov3 network |
CN112800906A (en) * | 2021-01-19 | 2021-05-14 | 吉林大学 | Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile |
CN113361347A (en) * | 2021-05-25 | 2021-09-07 | 东南大学成贤学院 | Job site safety detection method based on YOLO algorithm |
CN113807472A (en) * | 2021-11-19 | 2021-12-17 | 智道网联科技(北京)有限公司 | Hierarchical target detection method and device |
CN113807472B (en) * | 2021-11-19 | 2022-02-22 | 智道网联科技(北京)有限公司 | Hierarchical target detection method and device |
CN114119583A (en) * | 2021-12-01 | 2022-03-01 | 常州市新创智能科技有限公司 | Industrial visual inspection system, method, network model selection method and warp knitting machine |
Also Published As
Publication number | Publication date |
---|---|
CN109978035B (en) | 2021-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||