CN113435269A - Improved water surface floating object detection and identification method and system based on YOLOv3 - Google Patents

Improved water surface floating object detection and identification method and system based on YOLOv3

Info

Publication number
CN113435269A
Authority
CN
China
Prior art keywords
water surface
training
drifter
yolov3
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110647573.3A
Other languages
Chinese (zh)
Inventor
刘献忠
徐浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202110647573.3A priority Critical patent/CN113435269A/en
Publication of CN113435269A publication Critical patent/CN113435269A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a water surface floating object detection and identification method based on an improved YOLOv3 recognition model, relating to the technical field of computer vision and comprising the following steps: acquiring water surface floating object data in advance, augmenting the image data with geometric and color transformations, and labeling the floating objects to obtain a water surface floating object data set, which is split into a training set and a test set; constructing an improved YOLOv3 network model and training it with the training set; and detecting and identifying the test set with the trained improved YOLOv3 network model. The improved YOLOv3 has strong generalization ability, occupies little storage and GPU memory, improves detection and identification accuracy while preserving real-time performance, and enables accurate and fast monitoring and identification of water surface floating objects on client devices with limited computing power and memory.

Description

Improved water surface floating object detection and identification method and system based on YOLOv3
Technical Field
The invention belongs to the technical field of computer vision, and relates to an improved method and system for detecting and identifying water surface floating objects based on YOLOv3.
Background
In recent years, urbanization and industrialization in China have accelerated, and while the economy has developed rapidly, the problem of water pollution has grown serious. Large amounts of floating debris in rivers and lakes not only spoil the natural ecological landscape but also seriously threaten human health and sustainable economic development, so research on effectively monitoring floating objects in rivers and lakes has important practical significance.
Existing video-image-based water surface floating object detection techniques mainly target remote sensing images, detecting floating objects by extracting their spectral, spatial and texture features. Because remote sensing images are usually captured from a great distance, small floating objects in urban river channels are hard to detect; moreover, forming remote sensing images places requirements on the imaging equipment, so collecting a remote sensing data set containing many floating objects is difficult, which hinders practical adoption. Traditional image segmentation techniques also perform poorly here: reflections on the water surface and changing illumination prevent large reflective areas from being segmented correctly, so the segmentation effect is not ideal.
Deep-learning object detection uses convolutional neural networks to extract features and achieves strong adaptability and generalization through training. YOLOv3 is currently a mainstream algorithm in the object detection field, but the model is too large to be embedded in devices with limited computing power (such as automatic salvage ships and other river patrol equipment) while meeting real-time detection requirements. Because no public, professional data set exists for water surface floating object detection, models also suffer from class imbalance. YOLOv3 sacrifices some detection speed to improve accuracy, yet detecting small floating objects on the water surface remains difficult.
Disclosure of Invention
In order to remedy the defects of the prior art, the invention aims to provide a water surface floating object detection and identification method based on an improved YOLOv3 recognition model, which maintains and improves the performance of the YOLOv3 algorithm on water surface floating object detection and identification while reducing the size of the detection model.
The technical scheme of the invention is realized as follows:
step one, acquiring a data set for training water surface floating object detection in advance, augmenting the image data with geometric and color transformations, labeling the floating objects to obtain a water surface floating object data set, and splitting it into a training set and a test set;
step two, constructing an improved YOLOv3 network model;
step three, training the improved YOLOv3 network model constructed in step two with the training set obtained in step one;
and step four, detecting and identifying the test set obtained in step one with the improved YOLOv3 network model trained in step three.
The inventive data set was collected manually on site and must be processed into the PASCAL VOC format used by YOLOv3.
Further, the step one is divided into the following two steps:
1.1, acquiring a training data set of water surface floating objects manually on site; performing color transformation by adjusting hue, contrast, saturation and brightness; applying geometric transformation and random cropping to the images; and then randomly selecting pictures and stitching them to generate new images; the geometric transformation refers to scaling, translation and rotation;
and 1.2, manually labeling the data with Labelme, converting the label format to the PASCAL VOC format, and splitting into training and test sets at a ratio between 8:2 and 9:1.
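As a concrete illustration of the split in step 1.2, the following sketch divides a list of annotated image ids at the 8:2 ratio; the function name, seed handling and ratio parameter are my own assumptions, not part of the patent:

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=0):
    # Shuffle deterministically, then cut at the requested ratio
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]
```

A 9:1 split, the upper end of the patent's range, is obtained by passing `train_ratio=0.9`.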
Further, in the second step, the construction of the improved YOLOv3 network model specifically includes the following steps:
2.1, replacing the original DarkNet53 backbone of the YOLOv3 network model with a GhostNet network, and adding an attention layer (SELayer) into each Ghost bottleneck of GhostNet to strengthen attention to important channel features;
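For illustration, a minimal PyTorch sketch of a squeeze-and-excitation attention layer of the kind inserted into each Ghost bottleneck; the class name, reduction ratio and layer layout are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Squeeze-and-excitation channel attention: re-weights channels
    so the network attends to the most informative ones."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: one value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # excitation: weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # re-scale each channel
```

The output shape equals the input shape, so the layer can be dropped into an existing bottleneck without changing the surrounding architecture.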
and 2.2, extracting feature maps with down-sampling factors of 4, 8, 16 and 32 in turn from the GhostNet structure via average pooling, then successively up-sampling them and fusing them with the original features to obtain four new feature maps.
In addition to the final 13 × 13 output feature map, the GhostNet backbone provides three branch feature maps of sizes 26 × 26, 52 × 52 and 104 × 104.
Feature extraction runs through the whole backbone, and feature maps of different sizes carry feature information of different levels: the shallower the level, the more limited the feature semantics; the deeper the level, the richer the semantics. The 104 × 104 feature map is extracted by the first three Ghost blocks of GhostNet; two further Ghost blocks yield the 52 × 52 feature map; six more produce the 26 × 26 feature map; and a final five Ghost blocks yield the 13 × 13 feature map, completing backbone feature extraction.
2.3, replacing the original positioning loss function of the YOLOv3 network model by GIOU loss;
the IOU is used for respectively solving intersection and union of any A, B frames and finally solving the ratio of the intersection and the union; the IOU expression is:
IOU = |A ∩ B| / |A ∪ B|
GIOU means that for any two boxes A and B, the smallest closed shape C enclosing both is found first; then the ratio of the area of C \ (A ∪ B) to the area of C is computed, where the area of C \ (A ∪ B) is the area of C minus the area of A ∪ B; this ratio is then subtracted from the IOU of A and B to obtain GIOU. The GIOU expression is:
GIOU = IOU - |C \ (A ∪ B)| / |C|
wherein: A. b is two arbitrary convex regions, C refers to the smallest closed shape containing A and B;
the expression for the final localization loss is:
L_GIOU = 1 - GIOU.
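The IOU and GIOU formulas above can be sketched as plain Python for axis-aligned boxes given as (x1, y1, x2, y2); the function names are mine, not from the patent:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes, plus their union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union, union

def giou_loss(a, b):
    """L_GIOU = 1 - GIOU, where GIOU = IOU - |C \\ (A∪B)| / |C|."""
    i, union = iou(a, b)
    cw = max(a[2], b[2]) - min(a[0], b[0])  # smallest enclosing box C
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    return 1.0 - (i - (c_area - union) / c_area)
```

Unlike plain IOU loss, this penalty stays informative even when the boxes do not overlap, because the enclosing-box term still varies with their separation.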
2.4, replacing the class loss function of the original YOLOv3 model with Focal Loss;
the Focal Loss is computed as:
FL(p_t) = -α (1 - p_t)^γ log(p_t)
where α = 2, γ = 0.25, and p_t is the probability for positive and negative samples, defined as:

p_t = p if y = 1, and p_t = 1 - p otherwise,

where p is the predicted positive-sample probability and y is the label value;
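The Focal Loss for a single binary prediction is small enough to sketch directly; the code below uses the patent's stated values α = 2, γ = 0.25 (note that the original Focal Loss paper uses α = 0.25, γ = 2), and the function name is mine:

```python
import math

def focal_loss(p, y, alpha=2.0, gamma=0.25):
    # p_t = p for a positive label, 1 - p otherwise (the piecewise
    # definition above); alpha and gamma follow the patent's values
    pt = p if y == 1 else 1.0 - p
    return -alpha * (1.0 - pt) ** gamma * math.log(pt)
```

The (1 - p_t)^γ factor down-weights well-classified samples, so training gradient is dominated by the hard, misclassified ones.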
in step 2.2, deeper features are obtained through four-scale feature fusion, with 13 × 13, 26 × 26, 52 × 52 and 104 × 104 selected as the four output feature maps; the number of network iterations is set to 1000. The features of the four feature maps are further extracted by an improved up-sampling module, dw_res2net_block, then up-sampled and fused with the original features as new candidate features.
The dw_res2net_block is built on the original inverted_res_block structure with reference to the basic structures of GhostNet and Res2Net. A smaller residual connection is added, which alleviates vanishing gradients and increases communication between feature maps, and the original 3 × 3 convolution layer is replaced by 3 × 1 and 1 × 3 convolutions, giving finer feature extraction while cutting the parameter count by 1/3. To further reduce model parameters, DWConv replaces part of the convolution operations, achieving a convolution effect close to the original layers with less computation.
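The factorised-convolution and depthwise-convolution ideas behind dw_res2net_block can be sketched in PyTorch; the patent does not disclose the exact block layout, so the class name, channel handling and layer order below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DwRes2NetBlock(nn.Module):
    """Sketch of the dw_res2net_block idea: a 3x3 convolution factorised
    into 3x1 + 1x3, a depthwise conv (DWConv) to cut parameters, and a
    small residual connection to ease gradient flow."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, (3, 1), padding=(1, 0)),  # 3x1
            nn.Conv2d(channels, channels, (1, 3), padding=(0, 1)),  # 1x3
            nn.Conv2d(channels, channels, 3, padding=1,
                      groups=channels),  # depthwise convolution
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.branch(x)  # residual connection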
In step 2.3, the improvement to the YOLOv3 backbone includes acquiring more effective channel information through the SE self-attention mechanism, which is added into each Ghost bottleneck so that the network pays more attention to training important channel features.
Further, the improvement to the YOLOv3 backbone also includes clustering anchor box sizes on the water surface floating object data set with k-means++, generating 12 different sizes. This technique improves the model's detection performance and accelerates convergence.
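A numpy sketch of k-means++ anchor clustering over labelled box (width, height) pairs; it uses plain Euclidean distance for brevity, whereas YOLO implementations often use 1 - IoU as the distance, and the function name and defaults are mine:

```python
import numpy as np

def kmeanspp_anchors(wh, k=12, iters=20, seed=0):
    """Cluster (width, height) pairs into k anchor sizes with
    k-means++ seeding followed by standard Lloyd iterations."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=float)
    # k-means++ initialisation: each new centre is drawn with probability
    # proportional to its squared distance from the nearest existing centre
    centres = [wh[rng.integers(len(wh))]]
    while len(centres) < k:
        d2 = np.min([((wh - c) ** 2).sum(1) for c in centres], axis=0)
        centres.append(wh[rng.choice(len(wh), p=d2 / d2.sum())])
    centres = np.array(centres)
    # Lloyd iterations: assign points to nearest centre, recompute means
    for _ in range(iters):
        labels = np.argmin(((wh[:, None] - centres[None]) ** 2).sum(2), axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = wh[labels == j].mean(0)
    return centres
```

Because each new seed is pushed away from existing ones, the 12 anchors spread over the box-size distribution rather than collapsing into one region, which is the property the patent relies on.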
In step three, the training of the improved YOLOv3 network model comprises the following steps:
3.1, initializing the weights and parameters of the improved YOLOv3 network model, the parameters including convolution layer values, the learning rate, the number of iterations (epochs) and the per-batch data volume (batch_size);
3.2, placing the training set and test set in the agreed directory and running the program to train;
3.3, before training, the program selects 1/10 of the training set as a validation set; each iteration evaluates the validation set and records poorly performing difficult samples. Once the iteration count is reached, the model metrics are output; the metrics include mean average precision (mAP), per-class average precision (AP), precision, and recall.
3.4, after training finishes, analyzing the recorded difficult samples, augmenting them, and repeating steps 3.3-3.4 three times.
3.5, when the iteration count is reached, training ends and the model parameters and weights are saved.
In step four, the improved YOLOv3 network model's ability to detect water surface floating objects is evaluated using mean class accuracy, model parameter count and FLOPs as performance indicators.
The invention also provides a system for realizing the detection method, which comprises the following steps: the system comprises a data input module, a data processing module, a YOLOv3 network module and a result output module.
The data input module is used for transmitting the acquired image data to the data processing module;
the data processing module is used for carrying out geometric transformation and color transformation on the image data, labeling the target object to obtain a data set, and splitting the data set into a training set and a test set;
the YOLOv3 network module is used for training a model and detecting and identifying a target object;
the result output module is used for outputting a corresponding result to the input image data.
Based on the foregoing method, the present invention further provides a system implementing the detection and identification method, comprising:
a user login interface, a water surface floating object detection page, picture upload, video upload, a detection function, and camera recognition; wherein,
the user login interface: displays the application name, author and version information, and provides an entrance to the detection interface;
the water surface floating object detection page: this interface is the display area for the application's main function buttons and for the uploaded and predicted pictures; the main functions include picture upload, video upload, start detection, and camera detection;
picture upload: uploading a picture to be detected;
video upload: uploading a video to be detected;
the detection function: clicking the button performs recognition and displays the recognized result;
camera recognition: opening the camera to monitor detected objects in real time.
The beneficial effects of the invention include:
the improved YOLOv3 network model in the invention adopts GhostNet network structure to greatly reduce the parameters and FLOPs of the model; meanwhile, the detection effect of the network on the small target water surface drifter can be improved by adopting four-scale feature map fusion; a GIOU loss function is adopted; the Focal local Loss function is adopted, so that the training of difficult samples is emphasized more in the training process of the model, and the problem of sample class imbalance is solved; and the detection effect of the model is improved by means of data enhancement and multi-scale training. Through the method, compared with a YOLOv3 original edition algorithm, the method has the advantages that the detection effect is higher, the volume of the model is greatly reduced, the parameter quantity is also greatly reduced, and accurate and rapid detection and identification can be realized in a mobile client with limited calculation capacity.
Drawings
FIG. 1 is a class-frequency chart of the difficult-sample data set before enhancement according to an embodiment of the present invention.
FIG. 2 is a class-frequency chart of the difficult-sample data set after enhancement according to an embodiment of the present invention.
Fig. 3 is a general framework diagram of the YOLOv3 improved network of the present invention.
Fig. 4 is a detailed diagram of dw _ res2net _ block modules in the network of the present invention.
FIG. 5 is a flow chart of model training according to the present invention.
FIG. 6 is a diagram of the improved YOLOv3 network mAP convergence.
Fig. 7 is a graph of the raw YOLOv3 network mAP convergence.
Detailed Description
The invention is described in further detail below with reference to specific examples and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions and experimental methods used to implement the invention are general and common knowledge in the art, and the invention is not particularly limited thereto.
The invention provides an improved water surface floating object detection and identification method based on YOLOv3, which comprises the following steps:
step one, acquiring a data set for training water surface floating object detection in advance, augmenting the image data with geometric and color transformations, labeling the floating objects to obtain a water surface floating object data set, and splitting it into a training set and a test set;
step two, constructing an improved YOLOv3 network model;
step three, training the improved YOLOv3 network model constructed in step two with the training set obtained in step one;
and step four, detecting and identifying the test set obtained in step one with the improved YOLOv3 network model trained in step three.
The invention also provides a system for realizing the detection method, which comprises the following steps: the system comprises a data input module, a data processing module, a YOLOv3 network module and a result output module.
The data input module is used for transmitting the acquired image data to the data processing module;
the data processing module is used for carrying out geometric transformation and color transformation on the image data, labeling the target object to obtain a data set, and splitting the data set into a training set and a test set;
the YOLOv3 network module is used for training a model and detecting and identifying a target object;
the result output module is used for outputting a corresponding result to the input image data.
Examples
The embodiment provides an improved water surface drifter detection and identification method based on YOLOv3, which comprises the following steps:
step 1, preparing a water surface drift object data set and performing data enhancement on the data, wherein the method specifically comprises the following steps:
(1) The data set was collected in real scenes and contains 3443 pictures in total, split into training and test sets at an 8:2 ratio: 2755 training pictures and 688 test pictures, all uniformly resized to 416 × 416 and labeled with Labelme. Because the number of instances and training difficulty differ per class, rarely occurring difficult samples are insufficiently learned by the model, so samples with precision below 50% are marked during training and trained again after training finishes. The annotation files are in json format and are converted to VOC format by script for network training.
(2) To give the network more data, a series of augmentation methods is used, covering geometric and color transformations. First, a small number of samples from minority classes are randomly selected; color transformations adjust hue, contrast, saturation and brightness; geometric transformations such as scaling, translation and rotation plus random cropping are applied; and pictures are then randomly selected and stitched to generate new images. After training, samples with precision below 50% are augmented again in the same way and used as training data together with the original data set. Data augmentation alleviates sample imbalance: FIG. 1 shows the class frequency of the difficult samples before enhancement in this embodiment, and FIG. 2 the frequency after enhancement.
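A minimal numpy sketch of part of the augmentation pipeline above, operating on an HxWx3 uint8 image array: a colour transform (brightness jitter) plus geometric transforms (horizontal flip and random crop). The hue, saturation, rotation and stitching steps from the patent are omitted, and all names and parameter values are illustrative assumptions:

```python
import numpy as np

def augment(img, rng):
    """Brightness jitter, random horizontal flip, and a random 90% crop."""
    out = img.astype(np.int16)
    out = np.clip(out + rng.integers(-30, 31), 0, 255).astype(np.uint8)
    if rng.random() < 0.5:
        out = out[:, ::-1]                   # horizontal flip
    h, w = out.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)      # 90% crop window
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    return out[y0:y0 + ch, x0:x0 + cw]
```

In a real pipeline the bounding-box labels must be transformed alongside the pixels, which this sketch does not do.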
And 2, building a YOLOv3 improved network model.
(1) FIG. 3 is the overall framework of the improved YOLOv3 network. First, the YOLOv3 backbone is improved: the ResNet blocks in Darknet53 are replaced by Ghost blocks, which apply a partial convolution to the input, generate additional maps through cheap linear operations (DWConv), and concatenate the two results into a new feature map. This GhostNet connection style greatly deepens the network while reducing model parameters, obtaining better feature extraction with less computation.
(2) Enhancing multi-scale feature fusion capability. Feature maps of sizes 13 × 13, 26 × 26, 52 × 52 and 104 × 104 are extracted in turn from the GhostNet structure: Route-1, Route-2, Route-3 and Route-4. The dw_res2net_block shown in FIG. 4 further extracts and fuses these features, which are then up-sampled to obtain fused features m1, m2, m3 and m4, improving detection of small floating objects.
(3) The dw_res2net_block is built on the original inverted_res_block structure with reference to the basic structures of GhostNet and Res2Net. A smaller residual connection is added, which alleviates vanishing gradients and increases communication between feature maps, and the original 3 × 3 convolution layer is replaced by 3 × 1 and 1 × 3 convolutions, giving finer feature extraction while cutting the parameter count by 1/3. To further reduce model parameters, DWConv replaces part of the convolution operations, achieving a convolution effect close to the original layers with less computation. In addition, an SE block is added to weight the feature channels, improving the model's feature extraction.
(4) Replacing the original positioning loss function of the YOLOv3 network model by GIOU loss;
For any two boxes A and B, IOU is computed as the ratio of the area of their intersection to the area of their union; the IOU expression is:
IOU = |A ∩ B| / |A ∪ B|
GIOU means that for any two boxes A and B, the smallest closed shape C enclosing both is found first; then the ratio of the area of C \ (A ∪ B) to the area of C is computed, where the area of C \ (A ∪ B) is the area of C minus the area of A ∪ B; this ratio is then subtracted from the IOU of A and B to obtain GIOU. The GIOU expression is:
GIOU = IOU - |C \ (A ∪ B)| / |C|
wherein: A. b is two arbitrary convex regions, C refers to the smallest closed shape containing A and B;
the final positioning loss expression is:
L_GIOU = 1 - GIOU
(5) Replacing the category Loss function of the original model of YOLOv3 with Focal local Loss;
the Focal Loss is computed as:
FL(p_t) = -α (1 - p_t)^γ log(p_t)
where α = 2, γ = 0.25, and p_t is the probability for positive and negative samples, defined as:

p_t = p if y = 1, and p_t = 1 - p otherwise,

where p is the predicted positive-sample probability and y is the label value;
(6) The original YOLOv3 network clusters anchors with the k-means algorithm; the invention replaces it with k-means++. Rather than randomly choosing all k cluster centres at once, k-means++ picks each new centre far from those already chosen, which is more reasonable; 12 anchors are selected this way. k-means++ can reduce the final localization error to some extent.
Step 3, training the improved model
The model's input picture size is 416 × 416 and the initial learning rate is set to 1e-3. The processed training set is fed to the model in batches of the configured batch_size (set according to hardware conditions) for forward propagation and loss computation; the network parameters are then updated by back-propagating the loss. After a number of iterations, when the network loss stabilizes, training stops and the network model's parameters are saved.
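The forward/backward cycle described above can be sketched generically in PyTorch; the function name, the choice of Adam, and the loop structure are assumptions for illustration, not details taken from the patent:

```python
import torch

def train(model, loader, loss_fn, epochs, lr=1e-3, device="cpu"):
    """Generic training loop: forward pass, loss, back-propagation,
    parameter update, repeated for the given number of epochs."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # initial lr = 1e-3
    for epoch in range(epochs):
        for images, targets in loader:
            preds = model(images.to(device))
            loss = loss_fn(preds, targets.to(device))
            opt.zero_grad()
            loss.backward()  # back-propagate the loss
            opt.step()       # update the network parameters
    return model
```

In practice the loop would also monitor the loss and stop once it stabilizes, then save the model weights, as the text describes.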
Step 4, using the trained improved model to perform detection test
The trained model is used to detect the test data and the results are averaged. The results show that the improved model's detection precision on water surface floating objects is greatly improved, especially for small targets.
As shown in Table 1, comparing the per-class AP of YOLOv3 and the improved YOLOv3, every class's AP improves considerably, though recognition of low-frequency classes such as waterweeds, branches and lotus leaves remains weak; adding Focal Loss to the improved YOLOv3 raises recognition of these rare difficult classes, at the cost of only a small amount of precision on easy-to-train samples. As shown in Table 2, in overall mean accuracy the improved YOLOv3's detection precision is 13% higher than the original YOLOv3, and adding Focal Loss brings roughly a further 1% improvement.
TABLE 1 average AP of each model
[Table 1 is provided as an image in the original document.]
TABLE 2 Overall average accuracy of the model
[Table 2 is provided as an image in the original document.]
As shown in Table 3, comparing model parameters with the required computing power, the improved YOLOv3 is much smaller than YOLOv3 and also requires less computation, making it better suited for deployment on devices with limited computing power and high real-time requirements.
TABLE 3 comparison of the parameters of the model with the calculated forces required
[Table 3 is provided as an image in the original document.]
With the iteration count set to 1000 and training on an RTX 2080 Ti GPU, the improved YOLOv3 network converges in about three and a half days, while the original network takes about a week. The convergence curves are shown in FIG. 6 (improved network) and FIG. 7 (original network): the improved network's training is smoother and converges faster, while the original network fluctuates more and trains slowly. As Table 3 also shows, the improved model has far fewer parameters than the original, so fewer parameters must be computed and updated during training, making the improved model's training and convergence faster.
The protection scope of the present invention is not limited to the above embodiments. Variations and advantages that occur to those skilled in the art may be incorporated without departing from the spirit and scope of the inventive concept, and the scope of protection is defined by the appended claims.

Claims (10)

1. A water surface floating object detection and identification method based on an improved YOLOv3 recognition model, characterized by comprising the following steps:
step one, acquiring in advance a data set for training on water surface drifters, enhancing and augmenting the image data with geometric and color transformations, labeling the drifters in the data set to obtain a water surface drifter data set, and splitting the water surface drifter data set into a training set and a test set;
step two, constructing an improved YOLOv3 network model;
step three, training the improved YOLOv3 network model constructed in step two with the water surface drifter training set obtained in step one;
step four, detecting and identifying the water surface drifter test set, split from the water surface drifter image data in step one, with the improved YOLOv3 network model trained in step three.
2. The water surface drifter detection and recognition method based on the improved YOLOv3 recognition model according to claim 1, wherein step one specifically comprises the following two substeps:
1.1, acquiring a training data set of water surface drifters manually on site; performing color transformation by adjusting hue, contrast, saturation and brightness; performing geometric transformation by scaling, translation and rotation, together with random cropping; and then randomly selecting pictures and splicing them to generate new images;
1.2, labeling the data manually with Labelme, converting the data set label format into the PASCAL VOC format, and dividing the data into training and test sets at a ratio of between 8:2 and 9:1.
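The color transformations of substep 1.1 can be sketched with Python's standard `colorsys` module. This is illustrative only: the jitter ranges, function names and per-pixel formulation below are assumptions, not values from the patent, and a real pipeline would apply such transforms to whole images via an image library.

```python
import colorsys
import random

def jitter_pixel(rgb, hue_shift=0.02, sat_scale=1.1, val_scale=0.9):
    """Apply a hue/saturation/brightness jitter to one RGB pixel (values 0-255)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0        # hue transform (wraps around the color wheel)
    s = min(1.0, s * sat_scale)      # saturation transform
    v = min(1.0, v * val_scale)      # brightness transform
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(int(round(c * 255)) for c in (r2, g2, b2))

def augment_image(pixels, seed=0):
    """Jitter every pixel of a small image (list of RGB tuples) with random factors."""
    rng = random.Random(seed)
    hue = rng.uniform(-0.05, 0.05)
    sat = rng.uniform(0.8, 1.2)
    val = rng.uniform(0.8, 1.2)
    return [jitter_pixel(p, hue, sat, val) for p in pixels]
```

Geometric transforms (scaling, translation, rotation, random cropping) and the mosaic-style splicing of randomly selected pictures would be applied on top of this color jitter.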
3. The method for detecting and identifying water surface drifters based on the improved YOLOv3 recognition model according to claim 1, wherein in step two, constructing the improved YOLOv3 network model specifically comprises the following steps:
2.1, replacing the original DarkNet53 backbone of the YOLOv3 network model with a GhostNet network; an attention-mechanism layer (SELayer) is added into the Ghost bottleneck of the GhostNet, strengthening attention to the main channel features;
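The SELayer added to the Ghost bottleneck follows the usual squeeze-and-excitation pattern: global average pooling per channel, two small fully connected layers, and a sigmoid gate that reweights each channel. A minimal pure-Python sketch (the weight shapes and activation choices are assumptions; the patent does not give the layer's internals):

```python
import math

def se_layer(channels, reduce_w, expand_w):
    """Squeeze-and-Excitation over per-channel feature maps.

    `channels` is a list of 2-D feature maps (each a list of rows);
    `reduce_w` and `expand_w` stand in for the two fully connected layers.
    """
    # squeeze: global average pooling, one scalar per channel
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]
    # excite: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(w * s for w, s in zip(ws, squeezed))) for ws in reduce_w]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(ws, hidden))))
             for ws in expand_w]
    # reweight each channel map by its learned gate
    return [[[v * g for v in row] for row in ch] for ch, g in zip(channels, gates)]
```

With zero excitation weights every gate is sigmoid(0) = 0.5, i.e. all channels are scaled equally; training drives the gates apart so that informative channels are emphasized.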
2.2, extracting feature maps with down-sampling factors of 4, 8, 16 and 32 in turn from the GhostNet network structure through average pooling, then up-sampling them in turn and fusing them with the original features to obtain four new feature maps;
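The up-sample-and-fuse step of substep 2.2 can be illustrated with nearest-neighbour 2x up-sampling. Element-wise addition is used for fusion here purely for brevity, an assumption on my part; YOLO-style necks typically concatenate along the channel axis instead.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x up-sampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(a, b):
    """Fuse two equally sized maps element-wise (stand-in for channel concat)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

A coarse 13x13 map would be up-sampled this way to 26x26 and fused with the 26x26 backbone features, and so on down to the 104x104 scale.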
2.3, replacing the original localization loss function of the YOLOv3 network model with the GIOU loss;
the IOU denotes, for any two boxes A and B, taking their intersection and their union respectively and computing the ratio of the two; the IOU expression is:
IOU = |A ∩ B| / |A ∪ B|
the GIOU denotes, for any two boxes A and B, first finding the smallest closed shape C that encloses both boxes, then computing the ratio of the area of C \ (A ∪ B) to the area of C, where the area of C \ (A ∪ B) is the area of C minus the area of A ∪ B, and finally subtracting this ratio from the IOU value of A and B to obtain the GIOU; the GIOU expression is:
GIOU = IOU − |C \ (A ∪ B)| / |C|
wherein A and B are two arbitrary convex regions, and C is the smallest closed shape containing both A and B;
the expression for the final localization loss is:
L_GIOU = 1 − GIOU;
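For axis-aligned boxes, the IOU, GIOU and the localization loss L_GIOU = 1 − GIOU of substep 2.3 reduce to a few lines; a minimal sketch (the real model computes this over batched predicted and ground-truth boxes):

```python
def giou_loss(box_a, box_b):
    """GIoU localization loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # intersection A ∩ B
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # C: smallest enclosing box of A and B
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    area_c = cw * ch
    giou = iou - (area_c - union) / area_c   # GIOU = IOU − |C \ (A∪B)| / |C|
    return 1.0 - giou                        # L_GIOU
```

Unlike a plain 1 − IOU loss, this still produces a useful gradient for disjoint boxes: two non-overlapping boxes give IOU = 0 but a GIOU below 0, so the loss grows with their separation.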
2.4, replacing the category loss function of the original YOLOv3 model with Focal Loss;
the calculation formula of Focal local is as follows:
FL(p_t) = −α(1 − p_t)^γ log(p_t),
wherein α = 2 and γ = 0.25, and p_t denotes the probability of the true class for positive and negative samples, given by:
p_t = p when y = 1, and p_t = 1 − p otherwise,
where p denotes the positive sample probability and y denotes the label value.
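The Focal Loss of substep 2.4 in a few lines of Python. Note one hedge: the claim states α = 2 and γ = 0.25, but the defaults below follow the more common Focal Loss convention (α = 0.25 as the balancing weight, γ = 2 as the focusing exponent); pass the claim's values explicitly if you want to match the text as written.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal Loss for one prediction: FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).

    p is the predicted positive-class probability, y the label (1 or 0).
    Defaults are the common convention, not the values stated in the claim.
    """
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

A well-classified sample (p_t near 1) is down-weighted by the (1 − p_t)^γ factor, which is how the method trades a little easy-sample precision for better recognition of hard, under-represented classes such as waterweeds and lotus leaves.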
4. The method for detecting and identifying water surface drifters based on the improved YOLOv3 recognition model according to claim 3, wherein in step 2.2, deeper features are obtained through four-scale feature fusion, with 13×13, 26×26, 52×52 and 104×104 selected as the four output feature maps; the number of network iterations is set to 1000; features are further extracted from the four-layer feature maps by the improved up-sampling module dw_res2net_block and then up-sampled and fused with the original features to serve as new candidate features;
the dw _ res2net _ block is added with a smaller residual connecting module, so that the interaction among all sections of feature maps is increased while the gradient disappears is relieved, the module feature extraction capability becomes finer and finer, and the parameter quantity of 1/3 is reduced; while DWConv is used instead of part of the operation of convolution.
5. The method for detecting and identifying water surface drifters based on the improved YOLOv3 recognition model according to claim 3, wherein in step two, the improvement of the YOLOv3 backbone network further comprises adding an SE self-attention mechanism into the Ghost bottleneck, so that during training the network focuses more on important channel features.
6. The method for detecting and identifying water surface drifters based on the improved YOLOv3 recognition model according to claim 3, wherein in step two, the improvement of the YOLOv3 backbone network further comprises clustering the anchor box sizes on the water surface drifter data set with k-means++ to generate 12 different sizes, improving the detection performance of the model and accelerating convergence.
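Anchor clustering as in claim 6 can be sketched with plain k-means++ over (width, height) box sizes. One simplifying assumption: distances here are squared Euclidean on (w, h), whereas YOLO implementations often cluster with a 1 − IoU distance instead; the patent specifies only "k-means++".

```python
import random

def kmeanspp_anchors(boxes, k, seed=0, iters=20):
    """Cluster (w, h) box sizes into k anchor sizes with k-means++ seeding."""
    rng = random.Random(seed)
    d2 = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    # k-means++ seeding: each new center drawn with probability ~ distance^2
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        dists = [min(d2(b, c) for c in centers) for b in boxes]
        r, acc = rng.uniform(0, sum(dists)), 0.0
        for b, dd in zip(boxes, dists):
            acc += dd
            if acc >= r:
                centers.append(b)
                break
    # standard Lloyd iterations
    for _ in range(iters):
        groups = [[] for _ in centers]
        for b in boxes:
            groups[min(range(len(centers)), key=lambda i: d2(b, centers[i]))].append(b)
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return sorted(centers)
```

Run with k = 12 on the labeled drifter boxes, the resulting sizes would replace YOLOv3's default anchors, which is what gives the claimed detection and convergence benefit.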
7. The method for detecting and identifying water surface floating objects based on the improved YOLOv3 recognition model according to claim 1, wherein in step three, training the improved YOLOv3 network model comprises the following steps:
3.1, initializing the weights and parameters of the improved YOLOv3 network model, the parameters comprising the convolutional-layer parameter values, the learning rate, the number of iterations (epochs) and the amount of data per batch (batch_size);
3.2, placing the training set and the test set in the agreed directory, and running the program for training;
3.3, before training, selecting 1/10 of the data from the training set as a validation set; validating on the validation set at each iteration and recording difficult samples with poor performance; after the number of iterations is reached, outputting the metrics of the model, the metrics comprising the mean average precision (mAP), the per-category average precision (AP), the precision rate and the recall rate;
3.4, after training finishes, analyzing the recorded difficult samples, enhancing them, and repeating steps 3.3 to 3.4 three times;
3.5, upon reaching the number of iterations, finishing training and saving the parameters and weights of the model.
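The bookkeeping in substeps 3.3 and 3.4 (hold out 1/10 as validation, record hard samples) can be sketched in a few lines. The score-threshold criterion for "poor performance" is an assumption; the claim does not define it.

```python
import random

def split_validation(train_set, frac=0.1, seed=0):
    """Hold out `frac` of the training set as a validation set (substep 3.3)."""
    items = list(train_set)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * frac))
    return items[n_val:], items[:n_val]   # (remaining training set, validation set)

def record_hard_samples(val_scores, threshold=0.5):
    """Return validation samples scoring below a threshold (assumed criterion
    for the claim's 'difficult samples with poor performance')."""
    return [name for name, score in val_scores if score < threshold]
```

The recorded hard samples would then be fed back into the augmentation of step one before the training round is repeated, as substep 3.4 describes.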
8. The method for detecting and identifying water surface floating objects based on the improved YOLOv3 recognition model according to claim 1, wherein in step four, the detection capability of the improved YOLOv3 network model on water surface floating objects is evaluated using the average class accuracy, the number of model parameters and FLOPs as performance indicators.
9. A system for implementing the detection and identification method according to any one of claims 1 to 8, the system comprising: a data input module, a data processing module, a YOLOv3 network module and a result output module;
the data input module is used for transmitting the acquired image data to the data processing module;
the data processing module is used for carrying out geometric transformation and color transformation on the image data, labeling the target object to obtain a data set, and splitting the data set into a training set and a test set;
the YOLOv3 network module is used for training a model and detecting and identifying a target object;
the result output module is used for outputting a corresponding result to the input image data.
10. A system for implementing the detection and identification method according to any one of claims 1 to 8, the system comprising:
a user login interface, a water surface drifter detection page, picture uploading, video uploading, a detection function and camera identification; wherein,
the user login interface: displays the application's name, author and version information, and provides an entrance to the detection interface;
the water surface drifter detection page: the display area for the application's main function buttons and for the uploaded and predicted pictures; the main functions displayed comprise: uploading pictures, uploading videos, starting detection and camera detection;
the picture uploading: uploads a picture to be detected;
the video uploading: uploads a video to be detected;
the detection function: clicking the button performs identification and displays the identified result;
the camera identification: opens the camera to monitor the detected objects in real time.
CN202110647573.3A 2021-06-10 2021-06-10 Improved water surface floating object detection and identification method and system based on YOLOv3 Pending CN113435269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110647573.3A CN113435269A (en) 2021-06-10 2021-06-10 Improved water surface floating object detection and identification method and system based on YOLOv3


Publications (1)

Publication Number Publication Date
CN113435269A true CN113435269A (en) 2021-09-24

Family

ID=77755681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110647573.3A Pending CN113435269A (en) 2021-06-10 2021-06-10 Improved water surface floating object detection and identification method and system based on YOLOv3

Country Status (1)

Country Link
CN (1) CN113435269A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257794A (en) * 2020-10-27 2021-01-22 东南大学 YOLO-based lightweight target detection method
CN112434672A (en) * 2020-12-18 2021-03-02 天津大学 Offshore human body target detection method based on improved YOLOv3
CN112465057A (en) * 2020-12-08 2021-03-09 中国人民解放军空军工程大学 Target detection and identification method based on deep convolutional neural network
CN112528934A (en) * 2020-12-22 2021-03-19 燕山大学 Improved YOLOv3 traffic sign detection method based on multi-scale feature layer
CN112784748A (en) * 2021-01-22 2021-05-11 大连海事大学 Microalgae identification method based on improved YOLOv3


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周飞: "水面清污机器人垃圾检测算法的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
李汉冰: "基于Yolov3改进的实时车辆检测方法", 《激光与光电子学进展》 *
郭飞: "基于交通夜视场景的改进 YOLOv3 轻量化网络模型", 《电子技术与软件工程》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926501A (en) * 2021-03-23 2021-06-08 哈尔滨理工大学 Traffic sign detection algorithm based on YOLOv5 network structure
CN114332584A (en) * 2021-12-16 2022-04-12 淮阴工学院 Lake water surface floater identification method and medium based on image processing
CN114170483A (en) * 2022-02-11 2022-03-11 南京甄视智能科技有限公司 Training and using method, device, medium and equipment of floater identification model
CN114170483B (en) * 2022-02-11 2022-05-20 南京甄视智能科技有限公司 Training and using method, device, medium and equipment of floater identification model
CN114937195A (en) * 2022-03-29 2022-08-23 江苏海洋大学 Water surface floating object target detection system based on unmanned aerial vehicle aerial photography and improved YOLO v3
CN115019243A (en) * 2022-04-21 2022-09-06 山东大学 Monitoring floater lightweight target detection method and system based on improved YOLOv3
CN114758206A (en) * 2022-06-13 2022-07-15 武汉珈鹰智能科技有限公司 Steel truss structure abnormity detection method and device
CN114758206B (en) * 2022-06-13 2022-10-28 武汉珈鹰智能科技有限公司 Steel truss structure abnormity detection method and device
CN115035354A (en) * 2022-08-12 2022-09-09 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX
CN115035354B (en) * 2022-08-12 2022-11-08 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX

Similar Documents

Publication Publication Date Title
CN113435269A (en) Improved water surface floating object detection and identification method and system based on YOLOv3
CN111104898B (en) Image scene classification method and device based on target semantics and attention mechanism
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN114445670B (en) Training method, device and equipment of image processing model and storage medium
CN110287806A (en) A kind of traffic sign recognition method based on improvement SSD network
CN112784756B (en) Human body identification tracking method
CN113160246A (en) Image semantic segmentation method based on depth supervision
CN114511710A (en) Image target detection method based on convolutional neural network
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN110738132A (en) target detection quality blind evaluation method with discriminant perception capability
Tu et al. Scale effect on fusing remote sensing and human sensing to portray urban functions
CN115909280A (en) Traffic sign recognition algorithm based on multi-head attention mechanism
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114694185A (en) Cross-modal target re-identification method, device, equipment and medium
CN115908789A (en) Cross-modal feature fusion and asymptotic decoding saliency target detection method and device
CN114693952A (en) RGB-D significance target detection method based on multi-modal difference fusion network
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN115661932A (en) Fishing behavior detection method
CN114842180A (en) Point cloud completion method, device, equipment and medium
CN113361496B (en) City built-up area statistical method based on U-Net
CN114926826A (en) Scene text detection system
CN114549959A (en) Infrared dim target real-time detection method and system based on target detection model
CN114283315A (en) RGB-D significance target detection method based on interactive guidance attention and trapezoidal pyramid fusion
CN115471901B (en) Multi-pose face frontization method and system based on generation of confrontation network
CN116957921A (en) Image rendering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210924