CN109740662A - Image object detection method based on YOLO framework - Google Patents
- Publication number
- CN109740662A (application CN201811621484.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an image object detection method based on the YOLO framework. The method builds a YOLO framework model; converts the image under test to a fixed size and feeds it into the model to obtain multiple prediction tensors; back-calculates a predicted rectangular box from each prediction tensor through a conversion formula; processes the resulting boxes with a non-maximum suppression algorithm to keep the most reliable box; and transforms that box back into the original image to obtain the class and position of the target. The YOLO framework model improves on the original YOLO framework so that detection capability is greatly enhanced: small objects that previously could not be detected are now detected, and both what a target is and its approximate location are identified accurately. Compared with traditional modeling methods, the method shortens development time and substantially improves detection accuracy and speed, so that the algorithm runs in real time. It is suitable for application in the technical field of data processing.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an image object detection method based on the YOLO framework.
Background art
The task of object detection is to find all targets of interest in an image and to determine their positions and classes; it is one of the key problems in computer vision. Object detection essentially answers two questions about each target: "what is it?" and "where is it?" — that is, what the target in the image is and where it is located.
Small objects are everywhere in daily life: an ordinary camera photographing a distant object produces small objects; in automatic driving, a vehicle must recognize small traffic lights far ahead; cell images captured under a medical microscope also contain small objects. Small objects are thus closely bound up with everyday life, and studying them can make life finer and more convenient.
There are two ways to define a small object: by absolute size or by relative size. Under the absolute-size definition, a target occupying fewer than 32*32 pixels in the image is regarded as a small object. Under the relative-size definition, a target whose width and height are each less than one tenth of the image's width and height is considered a small object. In the research of this patent, small objects are defined by absolute size.
In the history of object detection, research first concentrated on large targets, i.e. targets occupying many pixels in an image. As research progressed, small objects gradually came under study; the earliest small-object work used images captured by infrared cameras rather than ordinary cameras. Only after machine learning was introduced into image analysis did research begin to use images taken by ordinary cameras.
Research on small-object detection in ordinary-camera images currently follows two main routes: traditional machine learning methods and the better-performing deep learning methods. Traditional machine learning methods mainly include HOG+SVM, DPM and Haar+AdaBoost. These algorithms complete the detection task using a single image feature plus one classifier, which gives them a fatal defect: their generalization ability is too weak. Put simply, HOG features are mainly used for pedestrian detection and Haar features mainly for face detection; a model trained this way performs very poorly on other targets. This defect troubled researchers until the appearance of deep learning methods, which gradually alleviated the problem.
Among mainstream object detection algorithms today, deep learning dominates, not only because its detection results are good but also because the time it requires keeps decreasing. Current deep learning methods fall into two "schools": single-stage detection and two-stage detection. Each school has its own advantages, and a method can be chosen to suit the actual application. Two-stage methods hold their ground through accuracy; classic examples are Fast R-CNN, Faster R-CNN and Mask R-CNN. Their main idea inherits from traditional methods and has been improved continuously to reach today's results: a convolutional network first extracts features and generates regions where targets may lie, and these candidate regions are then fed into a classifier and a regressor, which perform the recognition and localization tasks respectively, finally yielding the class and exact position of each target. Although this approach is accurate, it does not run in real time, so single-stage methods came into being.
Single-stage methods hold their ground through speed; the typical single-stage family is the YOLO series. YOLO's main idea is that, given a picture, the class and position of each target are obtained directly by regression. Although this method is fast, its accuracy is unsatisfactory, especially when detecting small objects.
The methods above are all general object detection algorithms; algorithms dedicated to small-object detection are still rare. Some use information around a small object to help detect it, and some use shallower layers of the network for prediction, but such methods improve small-object detection only marginally while requiring enormous time.
Summary of the invention
The technical problem to be solved by the invention is to provide an image object detection method based on the YOLO framework that can detect small objects, accurately identify what a target is and its approximate location, and enable the algorithm to run in real time.
The technical solution adopted by the present invention to solve this problem is as follows. The image object detection method based on the YOLO framework comprises the following steps:
1) Establish the YOLO framework model;
The establishment of the YOLO framework model includes the following steps:
A. Collect images of power transmission lines to build a data set;
B. Preprocess each image in the data set; the preprocessing includes image cropping, scaling, flipping, translation, rotation, brightness adjustment and noise addition;
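A minimal sketch of the kind of preprocessing step B describes, on a NumPy image array. The patent does not specify parameter ranges, so the flip probability, translation range, brightness factor and noise level below are illustrative assumptions, and cropping/rotation are omitted for brevity.

```python
import numpy as np

def augment(image, rng):
    """Apply a few of the step-B augmentations to an HxWx3 uint8 image.

    Flip, translation, brightness adjustment and additive noise are shown;
    parameter ranges are illustrative, not taken from the patent.
    """
    img = image.astype(np.float32)
    if rng.random() < 0.5:                      # random horizontal flip
        img = img[:, ::-1, :]
    shift = rng.integers(-10, 11)               # small horizontal translation
    img = np.roll(img, shift, axis=1)
    img *= rng.uniform(0.8, 1.2)                # brightness adjustment
    img += rng.normal(0.0, 5.0, img.shape)      # additive Gaussian noise
    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
out = augment(np.full((64, 64, 3), 128, dtype=np.uint8), rng)
```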
C. Perform feature extraction on the images processed in step B. The feature extraction proceeds as follows: each image processed in step B is fed into a darknet53 network, which produces three feature maps of sizes 13*13, 26*26 and 52*52 respectively;
D. For each image, the three feature maps obtained by feature extraction are passed through the prediction network to obtain three tensors. In detail: the 13*13 feature map first passes through 5 transition convolutional layers and then through 2 prediction convolutional layers, finally yielding at each feature point a vector of length 3*(4+1+the number of classes in the data set); combined with the size of the 13*13 feature map, this gives a tensor of shape (batch_size, 3, 13, 13, 3*(4+1+number of classes)). The 13*13 feature map is then up-sampled to a 26*26 feature map and fused with the 26*26 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 26*26 feature map this gives a tensor of shape (batch_size, 3, 26, 26, 3*(4+1+number of classes)). The 26*26 feature map is then up-sampled to a 52*52 feature map and fused with the 52*52 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 52*52 feature map this gives a tensor of shape (batch_size, 3, 52, 52, 3*(4+1+number of classes)). Here the number 3 denotes the number of anchors on the feature map, the number 4 denotes the center coordinates and the width and height of the predicted result, the number 1 denotes the confidence of the predicted box, and the number of classes denotes the class probabilities predicted at the feature point;
E. Obtain the label data of each image; the label data includes the center coordinates bx, by, the width and height bw, bh, and the class. Convert the label data into training data as follows: substitute the center coordinates bx, by and the width and height bw, bh of the label data into the conversion formula below to obtain the center coordinates tx, ty, the width and height tw, th, and the confidence of the training data; the class number of the training data is identical to that of the label data. The conversion formula is:

tx = σ⁻¹(bx − cx), ty = σ⁻¹(by − cy), tw = ln(bw/pw), th = ln(bh/ph)

where σ denotes the sigmoid function, cx, cy are the coordinates of the predicted grid cell, and pw, ph are the preset anchor values;
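The label-to-training-data conversion of step E can be sketched as the inverse of the standard YOLO box transform; the grid-cell and anchor values in the example are illustrative assumptions.

```python
import math

def label_to_target(bx, by, bw, bh, cx, cy, pw, ph):
    """Invert the YOLO box transform: recover (tx, ty, tw, th) from a labeled box.

    bx, by are box-center coordinates in grid units and bw, bh the box width/height;
    cx, cy are the grid-cell coordinates and pw, ph the preset anchor sizes.
    """
    inv_sigmoid = lambda p: math.log(p / (1.0 - p))
    tx = inv_sigmoid(bx - cx)         # fractional offset inside the cell
    ty = inv_sigmoid(by - cy)
    tw = math.log(bw / pw)            # log-scale ratio to the anchor
    th = math.log(bh / ph)
    return tx, ty, tw, th

# Illustrative values: a box centered at (6.5, 6.5) in a 13x13 grid, anchor 3.0 x 3.0.
tx, ty, tw, th = label_to_target(6.5, 6.5, 3.0, 3.0, cx=6, cy=6, pw=3.0, ph=3.0)
print(tx, ty, tw, th)   # 0.0 0.0 0.0 0.0
```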
F. Compute the loss values; the loss includes a coordinate loss, a confidence loss and a classification loss, calculated as follows:
The center coordinates and width-height of the training data at each feature point are compared with the center coordinates and width-height obtained at the same feature point in the (batch_size, 3, 13, 13, 3*(4+1+number of classes)) tensor, in the (batch_size, 3, 26, 26, 3*(4+1+number of classes)) tensor, and in the (batch_size, 3, 52, 52, 3*(4+1+number of classes)) tensor; the coordinate loss is computed from these by a squared-error loss.
The confidence of the training data at each feature point is compared with the confidence obtained at the same feature point in each of the three tensors above; the confidence loss is computed from these by a squared-error loss.
The class number of the training data at each feature point is compared with the class probabilities obtained at the same feature point in each of the three tensors above; the classification loss is computed from these by a cross-entropy loss;
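A condensed sketch of the step-F losses at a single scale, with random stand-in tensors. The per-anchor vector is split out into its own axis here, and the anchor-matching and masking details the patent leaves implicit are omitted; shapes and values are illustrative assumptions.

```python
import numpy as np

def yolo_losses(pred, target, num_classes):
    """Squared-error loss on coordinates and confidence, cross-entropy on classes.

    pred and target have shape (batch, anchors, grid, grid, 4 + 1 + num_classes):
    4 box values, 1 confidence, then per-class probabilities.
    """
    coord_loss = np.sum((pred[..., :4] - target[..., :4]) ** 2)
    conf_loss = np.sum((pred[..., 4] - target[..., 4]) ** 2)
    eps = 1e-9
    p = np.clip(pred[..., 5:5 + num_classes], eps, 1.0 - eps)
    t = target[..., 5:5 + num_classes]
    class_loss = -np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return coord_loss + conf_loss + class_loss

rng = np.random.default_rng(0)
pred = rng.uniform(0.01, 0.99, (2, 3, 13, 13, 7))   # 4+1+2 classes, illustrative
target = np.zeros_like(pred)
loss = yolo_losses(pred, target, num_classes=2)
```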
G. Iterate with the back-propagation gradient algorithm, gradually decreasing the loss values until they no longer decrease, to obtain the final YOLO framework model;
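Step G's iterate-until-the-loss-stops-falling rule can be sketched as a plain gradient-descent loop on a stand-in one-parameter loss; the real model would of course update all network weights by back-propagation, and the quadratic loss, learning rate and tolerance below are illustrative assumptions.

```python
def train(grad, w=0.0, lr=0.1, tol=1e-8, max_iter=10000):
    """Gradient descent until the loss no longer decreases (stand-in for step G)."""
    loss = lambda w: (w - 3.0) ** 2          # illustrative quadratic loss, minimum at 3
    prev = loss(w)
    for _ in range(max_iter):
        w -= lr * grad(w)                    # gradient step
        cur = loss(w)
        if prev - cur < tol:                 # loss no longer reduces: stop
            break
        prev = cur
    return w

w = train(lambda w: 2.0 * (w - 3.0))         # gradient of (w - 3)^2
print(round(w, 3))   # 3.0
```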
2) Convert the image under test into an image of fixed size and feed it into the YOLO framework model to obtain multiple prediction tensors. Each prediction tensor contains the center coordinates tx, ty, the width and height tw, th, a confidence and a class number. Substitute the center coordinates tx, ty and the width and height tw, th of each prediction tensor into the conversion formula below to back-calculate the center coordinates bx, by, the width and height bw, bh and the confidence of the predicted rectangular box; the class number of the predicted box is identical to that of the prediction tensor. The conversion formula is:

bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw * e^tw, bh = ph * e^th

where σ denotes the sigmoid function, cx, cy are the coordinates of the predicted grid cell, and pw, ph are the preset anchor values;
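The back-calculation in step 2 is the forward YOLO box transform; a short sketch, with the grid-cell and anchor values as illustrative assumptions:

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Apply the step-2 conversion formula:
    bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy, bw = pw * e^tw, bh = ph * e^th.
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh

# Zero raw predictions land in the middle of cell (6, 6) at exactly the anchor size.
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=3.0, ph=3.0)
print(bx, by, bw, bh)   # 6.5 6.5 3.0 3.0
```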
3) Process the multiple predicted rectangular boxes with a non-maximum suppression algorithm to obtain the most reliable box, and transform the most reliable box back into the original image to obtain the class and position of the target.
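A minimal non-maximum suppression routine of the kind step 3 calls for, on axis-aligned boxes; the 0.5 IoU threshold is an illustrative assumption, not specified in the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop any box overlapping it above thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]
```

The second box overlaps the first too heavily (IoU 0.81) and is suppressed; the third, disjoint box survives.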
Beneficial effects of the present invention: the image object detection method based on the YOLO framework builds a YOLO framework model; converts the image under test to a fixed size and feeds it into the model to obtain multiple prediction tensors; back-calculates a predicted rectangular box from each prediction tensor through the conversion formula; processes the resulting boxes with non-maximum suppression to keep the most reliable box; and transforms that box back into the original image to obtain the class and position of the target. The YOLO framework model of the present invention uses deep learning and improves on the original YOLO framework, so that detection capability is greatly enhanced: small objects that previously could not be detected are now detected, and both what a target is and its approximate location are identified accurately. Compared with traditional modeling methods, a large amount of mathematical formula derivation and model construction is avoided, development time is reduced, and the accuracy and speed of detection are greatly improved, so that the algorithm runs in real time. Moreover, the YOLO framework model of the present invention uses the single-stage YOLO method, which holds a huge speed advantage over two-stage methods, and detection accuracy can be further improved by cascading.
Specific embodiment
The image object detection method based on the YOLO framework comprises the following steps:
1) Establish the YOLO framework model;
The establishment of the YOLO framework model includes the following steps:
A. Collect images of power transmission lines to build a data set;
B. Preprocess each image in the data set; the preprocessing includes image cropping, scaling, flipping, translation, rotation, brightness adjustment and noise addition;
C. Perform feature extraction on the images processed in step B. The feature extraction proceeds as follows: each image processed in step B is fed into a darknet53 network, which produces three feature maps of sizes 13*13, 26*26 and 52*52 respectively;
D. For each image, the three feature maps obtained by feature extraction are passed through the prediction network to obtain three tensors. In detail: the 13*13 feature map first passes through 5 transition convolutional layers and then through 2 prediction convolutional layers, finally yielding at each feature point a vector of length 3*(4+1+the number of classes in the data set); combined with the size of the 13*13 feature map, this gives a tensor of shape (batch_size, 3, 13, 13, 3*(4+1+number of classes)). The 13*13 feature map is then up-sampled to a 26*26 feature map and fused with the 26*26 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 26*26 feature map this gives a tensor of shape (batch_size, 3, 26, 26, 3*(4+1+number of classes)). The 26*26 feature map is then up-sampled to a 52*52 feature map and fused with the 52*52 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 52*52 feature map this gives a tensor of shape (batch_size, 3, 52, 52, 3*(4+1+number of classes)). Here the number 3 denotes the number of anchors on the feature map, the number 4 denotes the center coordinates and the width and height of the predicted result, the number 1 denotes the confidence of the predicted box, and the number of classes denotes the class probabilities predicted at the feature point;
E. Obtain the label data of each image; the label data includes the center coordinates bx, by, the width and height bw, bh, and the class. Convert the label data into training data as follows: substitute the center coordinates bx, by and the width and height bw, bh of the label data into the conversion formula below to obtain the center coordinates tx, ty, the width and height tw, th, and the confidence of the training data; the class number of the training data is identical to that of the label data. The conversion formula is:

tx = σ⁻¹(bx − cx), ty = σ⁻¹(by − cy), tw = ln(bw/pw), th = ln(bh/ph)

where σ denotes the sigmoid function, cx, cy are the coordinates of the predicted grid cell, and pw, ph are the preset anchor values;
F. Compute the loss values; the loss includes a coordinate loss, a confidence loss and a classification loss, calculated as follows:
The center coordinates and width-height of the training data at each feature point are compared with the center coordinates and width-height obtained at the same feature point in the (batch_size, 3, 13, 13, 3*(4+1+number of classes)) tensor, in the (batch_size, 3, 26, 26, 3*(4+1+number of classes)) tensor, and in the (batch_size, 3, 52, 52, 3*(4+1+number of classes)) tensor; the coordinate loss is computed from these by a squared-error loss.
The confidence of the training data at each feature point is compared with the confidence obtained at the same feature point in each of the three tensors above; the confidence loss is computed from these by a squared-error loss.
The class number of the training data at each feature point is compared with the class probabilities obtained at the same feature point in each of the three tensors above; the classification loss is computed from these by a cross-entropy loss;
G. Iterate with the back-propagation gradient algorithm, gradually decreasing the loss values until they no longer decrease, to obtain the final YOLO framework model;
2) Convert the image under test into an image of fixed size and feed it into the YOLO framework model to obtain multiple prediction tensors; the fixed size is ordinarily chosen as 416*416. Each prediction tensor contains the center coordinates tx, ty, the width and height tw, th, a confidence and a class number. Substitute the center coordinates tx, ty and the width and height tw, th of each prediction tensor into the conversion formula below to back-calculate the center coordinates bx, by, the width and height bw, bh and the confidence of the predicted rectangular box; the class number of the predicted box is identical to that of the prediction tensor. The conversion formula is:

bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw * e^tw, bh = ph * e^th

where σ denotes the sigmoid function, cx, cy are the coordinates of the predicted grid cell, and pw, ph are the preset anchor values;
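The fixed 416*416 input mentioned in the embodiment is commonly produced by scaling with preserved aspect ratio and padding (letterboxing). The patent does not specify the resize method, so the sketch below is one common choice, using nearest-neighbor scaling in plain NumPy to stay dependency-free; the gray pad value is an illustrative assumption.

```python
import numpy as np

def letterbox(image, size=416, pad_value=128):
    """Resize an HxWx3 image to size x size, preserving aspect ratio with padding.

    Nearest-neighbor scaling keeps the example dependency-free; the patent does
    not prescribe an interpolation method, so this is an illustrative choice.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]              # nearest-neighbor resample
    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

out = letterbox(np.zeros((200, 400, 3), dtype=np.uint8))
print(out.shape)   # (416, 416, 3)
```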
3) Process the multiple predicted rectangular boxes with a non-maximum suppression algorithm to obtain the most reliable box, and transform the most reliable box back into the original image to obtain the class and position of the target.
Beneficial effects of the present invention: the image object detection method based on the YOLO framework builds a YOLO framework model; converts the image under test to a fixed size and feeds it into the model to obtain multiple prediction tensors; back-calculates a predicted rectangular box from each prediction tensor through the conversion formula; processes the resulting boxes with non-maximum suppression to keep the most reliable box; and transforms that box back into the original image to obtain the class and position of the target. The YOLO framework model of the present invention uses deep learning and improves on the original YOLO framework, so that detection capability is greatly enhanced: small objects that previously could not be detected are now detected, and both what a target is and its approximate location are identified accurately. Compared with traditional modeling methods, a large amount of mathematical formula derivation and model construction is avoided, development time is reduced, and the accuracy and speed of detection are greatly improved, so that the algorithm runs in real time. Moreover, the YOLO framework model of the present invention uses the single-stage YOLO method, which holds a huge speed advantage over two-stage methods, and detection accuracy can be further improved by cascading. Experimental verification shows that the YOLO framework model detects targets with an average accuracy of 85.4%, reaches a processing speed of 31 frames per second on a GTX1080Ti, and achieves an average accuracy of 72.4% on small objects. The image object detection method based on the YOLO framework can therefore detect small objects, accurately identify what a small object is and where it is located, and process images fast enough to achieve real-time capability.
Claims (1)
1. An image object detection method based on the YOLO framework, characterized by comprising the following steps:
1) Establish the YOLO framework model;
The establishment of the YOLO framework model includes the following steps:
A. Collect images of power transmission lines to build a data set;
B. Preprocess each image in the data set; the preprocessing includes image cropping, scaling, flipping, translation, rotation, brightness adjustment and noise addition;
C. Perform feature extraction on the images processed in step B. The feature extraction proceeds as follows: each image processed in step B is fed into a darknet53 network, which produces three feature maps of sizes 13*13, 26*26 and 52*52 respectively;
D. For each image, the three feature maps obtained by feature extraction are passed through the prediction network to obtain three tensors. In detail: the 13*13 feature map first passes through 5 transition convolutional layers and then through 2 prediction convolutional layers, finally yielding at each feature point a vector of length 3*(4+1+the number of classes in the data set); combined with the size of the 13*13 feature map, this gives a tensor of shape (batch_size, 3, 13, 13, 3*(4+1+number of classes)). The 13*13 feature map is then up-sampled to a 26*26 feature map and fused with the 26*26 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 26*26 feature map this gives a tensor of shape (batch_size, 3, 26, 26, 3*(4+1+number of classes)). The 26*26 feature map is then up-sampled to a 52*52 feature map and fused with the 52*52 feature map of this image obtained in step C, giving a new feature map; after this new feature map passes through 5 transition convolutional layers and 2 prediction convolutional layers, a vector of the same length is obtained at each feature point, and combined with the size of the 52*52 feature map this gives a tensor of shape (batch_size, 3, 52, 52, 3*(4+1+number of classes)). Here the number 3 denotes the number of anchors on the feature map, the number 4 denotes the center coordinates and the width and height of the predicted result, the number 1 denotes the confidence of the predicted box, and the number of classes denotes the class probabilities predicted at the feature point;
E. Obtain the label data of each image; the label data includes the center coordinates bx, by, the width and height bw, bh, and the class. Convert the label data into training data as follows: substitute the center coordinates bx, by and the width and height bw, bh of the label data into the conversion formula below to obtain the center coordinates tx, ty, the width and height tw, th, and the confidence of the training data; the class number of the training data is identical to that of the label data. The conversion formula is:

tx = σ⁻¹(bx − cx), ty = σ⁻¹(by − cy), tw = ln(bw/pw), th = ln(bh/ph)

where σ denotes the sigmoid function, cx, cy are the coordinates of the predicted grid cell, and pw, ph are the preset anchor values;
F. Compute the loss values; the loss includes a coordinate loss, a confidence loss and a classification loss, calculated as follows:
The center coordinates and width-height of the training data at each feature point are compared with the center coordinates and width-height obtained at the same feature point in the (batch_size, 3, 13, 13, 3*(4+1+number of classes)) tensor, in the (batch_size, 3, 26, 26, 3*(4+1+number of classes)) tensor, and in the (batch_size, 3, 52, 52, 3*(4+1+number of classes)) tensor; the coordinate loss is computed from these by a squared-error loss.
The confidence of the training data at each feature point is compared with the confidence obtained at the same feature point in each of the three tensors above; the confidence loss is computed from these by a squared-error loss.
The class number of the training data at each feature point is compared with the class probabilities obtained at the same feature point in each of the three tensors above; the classification loss is computed from these by a cross-entropy loss;
G. The loss is iteratively reduced by the back-propagation gradient algorithm until it no longer decreases, yielding the final YOLO framework model;
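Step G above is ordinary gradient-descent training: step against the gradient until the loss stops decreasing. A toy one-parameter sketch (the loss function, learning rate, and stopping tolerance are illustrative assumptions, not values from the patent):

```python
def train(grad, w0, lr=0.1, tol=1e-8, max_iters=10000):
    """Iteratively step against the gradient until the loss stops decreasing."""
    w = w0
    for _ in range(max_iters):
        step = lr * grad(w)
        if abs(step) < tol:   # loss no longer decreases appreciably: stop
            break
        w -= step
    return w

# minimize loss(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w_final = train(lambda w: 2 * (w - 3), w0=0.0)
```

In the actual method the single parameter is replaced by all network weights and the gradient is obtained by back-propagation through the network.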
2) The test image is converted into an image of fixed size and input into the YOLO framework model to obtain multiple prediction tensors. Each prediction tensor contains center coordinates tx, ty, width/height values tw, th, a confidence, and class values. The center coordinates tx, ty and width/height values tw, th of each prediction tensor are substituted into the following conversion formulas to back-calculate the center coordinates bx, by and width/height values bw, bh of the predicted rectangular box; the confidence and class values of the predicted box are the same as those of the prediction tensor. The conversion formulas (the standard YOLOv3 decoding) are:

bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw · e^tw
bh = ph · e^th

where σ is the logistic sigmoid, cx, cy are the coordinates of the grid cell making the prediction, and pw, ph are the preset anchor dimensions;
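The back-calculation in step 2) can be sketched as follows, assuming the standard YOLOv3 decoding (σ is the logistic sigmoid; the grid offsets and anchor sizes in the example are illustrative):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Back-calculate a predicted box from raw tensor outputs.
    cx, cy: offsets of the predicting grid cell; pw, ph: preset anchor sizes."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx      # box center x, in grid units
    by = sigmoid(ty) + cy      # box center y, in grid units
    bw = pw * math.exp(tw)     # box width, scaled from the anchor
    bh = ph * math.exp(th)     # box height, scaled from the anchor
    return bx, by, bw, bh

bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=10, ph=20)
```

With raw outputs of 0 the sigmoid gives 0.5, so the box center lands in the middle of cell (3, 4) and the width/height equal the anchor sizes.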
3) The multiple predicted rectangular boxes so obtained are processed by the non-maximum suppression algorithm to retain the most reliable boxes, and the most reliable boxes are mapped back onto the original image to obtain the class and position of each target.
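The non-maximum suppression in step 3) might be sketched as follows (a minimal greedy NMS; the IoU threshold of 0.5 and the corner-coordinate box format are illustrative assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-confidence box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # box 1 overlaps box 0 heavily and is suppressed
```

The surviving indices are then mapped back from fixed-size-image coordinates to original-image coordinates to report each target's class and position.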
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811621484.6A CN109740662A (en) | 2018-12-28 | 2018-12-28 | Image object detection method based on YOLO frame |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811621484.6A CN109740662A (en) | 2018-12-28 | 2018-12-28 | Image object detection method based on YOLO frame |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109740662A true CN109740662A (en) | 2019-05-10 |
Family
ID=66361722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811621484.6A Pending CN109740662A (en) | 2018-12-28 | 2018-12-28 | Image object detection method based on YOLO frame |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740662A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170147905A1 (en) * | 2015-11-25 | 2017-05-25 | Baidu Usa Llc | Systems and methods for end-to-end object detection |
CN107742093A (en) * | 2017-09-01 | 2018-02-27 | 国网山东省电力公司电力科学研究院 | A kind of infrared image power equipment component real-time detection method, server and system |
CN108921875A (en) * | 2018-07-09 | 2018-11-30 | 哈尔滨工业大学(深圳) | A kind of real-time traffic flow detection and method for tracing based on data of taking photo by plane |
CN109064461A (en) * | 2018-08-06 | 2018-12-21 | 长沙理工大学 | A kind of detection method of surface flaw of steel rail based on deep learning network |
2018-12-28: CN application CN201811621484.6A, publication CN109740662A (en), status: Pending
Non-Patent Citations (3)
Title |
---|
JOSEPH REDMON: "YOLOv3: An Incremental Improvement", arXiv *
木盏 (Muzhan): "YOLO series: YOLOv3 [in-depth analysis]", CSDN *
龚静 (Gong Jing) et al.: "Research on a moving-vehicle object detection method based on the YOLOv2 algorithm", 《电子科技》 (Electronic Science and Technology) *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490155B (en) * | 2019-08-23 | 2022-05-17 | 电子科技大学 | Method for detecting unmanned aerial vehicle in no-fly airspace |
CN110490155A (en) * | 2019-08-23 | 2019-11-22 | 电子科技大学 | A kind of no-fly airspace unmanned plane detection method |
CN110508510A (en) * | 2019-08-27 | 2019-11-29 | 广东工业大学 | A kind of plastic pump defect inspection method, apparatus and system |
CN111337789A (en) * | 2019-10-23 | 2020-06-26 | 西安科技大学 | Method and system for detecting fault electrical element in high-voltage transmission line |
CN111104906A (en) * | 2019-12-19 | 2020-05-05 | 南京工程学院 | Transmission tower bird nest fault detection method based on YOLO |
CN111753666B (en) * | 2020-05-21 | 2024-01-23 | 西安科技大学 | Small target fault detection method, detection system and storage medium for power transmission line |
CN111753666A (en) * | 2020-05-21 | 2020-10-09 | 西安科技大学 | Method and system for detecting faults of small targets in power transmission line and storage medium |
CN111724355B (en) * | 2020-06-01 | 2022-06-14 | 厦门大学 | Image measuring method for abalone body type parameters |
CN111724355A (en) * | 2020-06-01 | 2020-09-29 | 厦门大学 | Image measuring method for abalone body type parameters |
CN111915558A (en) * | 2020-06-30 | 2020-11-10 | 成都思晗科技股份有限公司 | Pin state detection method for high-voltage transmission line |
CN111915558B (en) * | 2020-06-30 | 2023-12-01 | 成都思晗科技股份有限公司 | Pin state detection method for high-voltage transmission line |
CN111738212B (en) * | 2020-07-20 | 2020-11-20 | 平安国际智慧城市科技股份有限公司 | Traffic signal lamp identification method, device, equipment and medium based on artificial intelligence |
CN111738212A (en) * | 2020-07-20 | 2020-10-02 | 平安国际智慧城市科技股份有限公司 | Traffic signal lamp identification method, device, equipment and medium based on artificial intelligence |
CN112668497A (en) * | 2020-12-30 | 2021-04-16 | 南京佑驾科技有限公司 | Vehicle accurate positioning and identification method and system |
CN114332465A (en) * | 2022-01-05 | 2022-04-12 | 上海秦润数据科技有限公司 | Method for discovering abnormity of natural gas terminal equipment based on computer vision and deep learning |
CN114648685A (en) * | 2022-03-23 | 2022-06-21 | 成都臻识科技发展有限公司 | Method and system for converting anchor-free algorithm into anchor-based algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740662A (en) | Image object detection method based on YOLO frame | |
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
WO2018214195A1 (en) | Remote sensing imaging bridge detection method based on convolutional neural network | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN111160269A (en) | Face key point detection method and device | |
CN108460403A (en) | The object detection method and system of multi-scale feature fusion in a kind of image | |
CN105608455B (en) | A kind of license plate sloped correcting method and device | |
CN108596108B (en) | Aerial remote sensing image change detection method based on triple semantic relation learning | |
Gong et al. | Object detection based on improved YOLOv3-tiny | |
CN108960404B (en) | Image-based crowd counting method and device | |
CN111340123A (en) | Image score label prediction method based on deep convolutional neural network | |
Tan et al. | Vehicle detection in high resolution satellite remote sensing images based on deep learning | |
CN109948593A (en) | Based on the MCNN people counting method for combining global density feature | |
CN105513080B (en) | A kind of infrared image target Salience estimation | |
CN109635634A (en) | A kind of pedestrian based on stochastic linear interpolation identifies data enhancement methods again | |
CN103080979A (en) | System and method for synthesizing portrait sketch from photo | |
CN101923637A (en) | Mobile terminal as well as human face detection method and device thereof | |
CN104361357A (en) | Photo set classification system and method based on picture content analysis | |
CN116229248A (en) | Ocean species distribution prediction method, device, equipment and storage medium | |
Zhang et al. | Research on Surface Defect Detection of Rare‐Earth Magnetic Materials Based on Improved SSD | |
Li et al. | A self-attention feature fusion model for rice pest detection | |
Shi et al. | YOLOv5s_2E: Improved YOLOv5s for Aerial Small Target Detection | |
CN116503750A (en) | Large-range remote sensing image rural block type residential area extraction method and system integrating target detection and visual attention mechanisms | |
Ouyang et al. | An Anchor-free Detector with Channel-based Prior and Bottom-Enhancement for Underwater Object Detection | |
Zhang et al. | Design and implementation of object image detection interface system based on PyQt5 and improved SSD algorithm |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190510 |