CN113901944B - Marine organism target detection method based on improved YOLO algorithm


Info

Publication number
CN113901944B
Authority
CN
China
Prior art keywords
images
image
data set
reading
marine organism
Prior art date
Legal status
Active
Application number
CN202111251729.2A
Other languages
Chinese (zh)
Other versions
CN113901944A (en)
Inventor
刘洋
李英平
孔程玉
丛禹塵
刘胜蓝
王飞龙
张津榕
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202111251729.2A
Publication of CN113901944A
Application granted
Publication of CN113901944B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Pattern recognition; classification techniques
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention relates to the technical field of computer vision and provides a marine organism target detection method based on an improved YOLO algorithm, comprising the following steps: step 100, collecting a marine organism data set; step 200, performing preprocessing and data augmentation on the data set; step 300, constructing a YOLO V3 detector improved with deformable convolution; step 400, training the model on the data set formed in step 200; step 500, deploying the trained model for the marine organism target detection task. The invention improves the accuracy and reliability of marine organism target detection.

Description

Marine organism target detection method based on improved YOLO algorithm
Technical Field
The invention relates to the technical field of computer vision, in particular to a marine organism target detection method based on an improved YOLO algorithm.
Background
As a major segment of the marine industry, fishery is an important component of China's national economy, and China is the largest fishery producer in the world. In recent years, China's fisheries have been moving toward modernization, and large-scale marine ranches offer a feasible solution for building and managing modern fisheries. At the current stage, however, modern fishing operations still lack a suitable working mode, and a marine organism target detection method is needed to assist robots in performing underwater operations.
Object detection, a fundamental problem in the field of computer vision, has a research history spanning decades. In recent years, with the rise of deep learning, a series of representative methods has emerged in the object detection field: two-stage detectors represented by Faster R-CNN (Faster Region-based Convolutional Neural Network), one-stage detectors represented by YOLO V3 (You Only Look Once, version 3), anchor-free detectors represented by CenterNet, and so on.
Nevertheless, marine organisms live under water, and because seawater and the suspended particles in it attenuate and scatter light, underwater images degrade severely, commonly suffering from low brightness, low contrast and color cast, which makes conventional target detection methods ineffective. Moreover, underwater images are difficult to acquire, so purely data-driven approaches struggle to work. How to overcome underwater image degradation and accomplish marine organism target detection with small-scale data is therefore an urgent problem.
Disclosure of Invention
The invention mainly addresses two shortcomings of the prior art, namely its unsuitability for training on small-scale data and its vulnerability to underwater image degradation, and provides a marine organism target detection method based on an improved YOLO algorithm, so as to improve the accuracy and reliability of marine organism target detection.
The marine organism target detection method based on the improved YOLO algorithm provided by the invention comprises the following steps:
step 100, collecting a marine organism data set;
step 200, preprocessing and data augmentation are carried out on the data set;
step 300, constructing a YOLO V3 detector modified using deformable convolution;
step 400, training a model on the dataset formed by performing step 200, comprising the following steps 401 to 406:
step 401, reading a parameter file, and loading pre-trained model weights on a large-scale image classification data set;
step 402, reading images from the data set generated in the step 200 processing, and dividing a training set and a verification set;
step 403, inputting training data into the backbone network in batches, and generating three feature maps with scales of 13 x 13, 26 x 26 and 52 x 52 through successive convolution processing, for subsequent feature fusion;
step 404, fusing the information contained in the three feature maps at different scales by up-sampling followed by convolution, for subsequent prediction;
step 405, predicting three target boxes at each grid cell on the three feature maps using the two-branch head network, and giving their location information and category information;
step 406, post-processing the target boxes with the BBox Voting method to obtain the final prediction result;
step 500, deploying a trained model for a marine organism target detection task.
Further, the step 100 of acquiring a marine organism dataset includes the steps of:
step 101, shooting an original image containing marine organisms in a coastal sea area by a diver or an underwater vehicle carrying a camera device;
step 102, screening the photographed images, and labeling the marine organisms on the images of good shooting quality to form standard images and labeling information;
and step 103, exporting the standard image and the labeling information according to the standard format of the PASCAL VOC to form a data set.
Further, the step 200 of preprocessing and data augmentation of the data set includes the steps of:
step 201, reading in images in a data set, and adopting a histogram equalization method to adjust gray distribution of the images so as to generate an image set A;
step 202, reading in images in an image set A, randomly overturning the images by adopting a Flip method, increasing the number of samples in a data set under different view angles, and generating an image set B;
step 203, reading in the images in the image set A, adopting a Mixup method to randomly fuse the images, increasing the number of samples with shielding and dense distribution phenomena in the data set, and generating an image set C;
step 204, reading in the images in the image set A, randomly cropping the images by adopting the Crop method, increasing the number of large-scale target samples in the data set, and generating an image set D;
step 205, reading in the images in the image set A, randomly splicing the images by adopting the Expand method, increasing the number of small-scale target samples in the data set, and generating an image set E;
step 206, reading in the images in the image set A, counting the distribution of samples of different categories, and increasing the number of samples of the category with fewer samples in the data set by adopting a Copy-Paste method to generate an image set F;
in step 207, the images in the image set A, B, C, D, E, F are read in, and the image size is scaled to the standard size, so as to generate the image set G to be trained.
Further, the step 300, constructing an improved YOLO V3 detector, includes the steps of:
step 301, an input layer is constructed and used for receiving underwater image input;
step 302, constructing a backbone network improved by using a deformable convolution block for extracting features;
step 303, constructing a first large-scale detection head network modified by using a deformable convolution block for detecting large-scale marine organisms;
step 304, constructing a second mesoscale detection head network modified by using deformable convolution blocks for detecting mesoscale marine organisms;
step 305, constructing a third small-scale detection head network modified by using a deformable convolution block for detecting small-scale marine organisms;
step 306, combining the networks constructed in steps 301 to 305 to obtain an improved YOLO V3 detector.
Further, the calculation of the deformable convolution in the improved YOLO V3 detector includes the following processes:
acquiring a characteristic diagram of upper-layer convolution output;
performing convolution operation on the feature map to obtain an offset matrix;
obtaining a sampling grid containing offset according to the offset matrix;
resampling the feature map according to a sampling grid containing offsets;
and carrying out convolution operation on the sampling result and outputting new characteristics.
Further, after step 406, the method further includes:
step 407, calculating the error of the prediction result and the true value through the loss function, and executing back propagation;
and step 408, testing the model effect on the verification set, judging whether the loss exceeds a loss threshold value, if so, repeating the steps 403 to 408 until the loss is smaller than the loss threshold value, and storing the current model weight into the weight file.
Further, the step 500 of deploying the trained model for the marine organism target detection task includes the steps of:
step 501, deploying a modified YOLO V3 detector onto a computing device for performing the detection;
step 502, reading a parameter file, and loading the model weight trained in step 400;
step 503, reading an image on the camera device, and preprocessing the image by using a histogram equalization method;
step 504, the processed image is sent to a modified YOLO V3 detector to perform marine organism target prediction;
step 505, the position information and the category information included in the detection result are visualized.
Compared with the prior art, the marine organism target detection method based on the improved YOLO algorithm provided by the invention has the following advantages:
1. The traditional underwater image preprocessing pipeline is replaced, avoiding the loss of detection accuracy and recall caused by unbalanced image brightness and color distributions as the working depth changes.
2. Data augmentation methods such as Mixup and Crop are introduced, alleviating the difficulties that underwater images are hard to acquire and the data set is small in scale, and enhancing the detection of marine organisms under conditions such as dense distribution and occlusion.
3. The detection performance of the traditional YOLO V3 detector is improved: introducing deformable convolution yields a backbone network with an adaptive receptive field and improves the detector's adaptability to the shapes of marine organisms.
Drawings
FIG. 1 is a flow chart of the marine organism target detection method based on the improved YOLO algorithm provided by the present invention;
FIG. 2 is a flow chart of data preprocessing and data augmentation provided by the present invention;
FIG. 3 is a schematic diagram of the network architecture of the improved YOLO V3 detector provided by the present invention;
FIG. 4 is a flow chart of the deformable convolution.
Detailed Description
To make the technical problems solved, the technical solutions adopted and the technical effects achieved by the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the whole.
Fig. 1 is a step diagram of the present method. As shown in fig. 1, the marine organism target detection method based on the improved YOLO algorithm provided by the embodiment of the invention comprises the following steps:
step 100, a marine organism dataset is acquired.
Step 101, a camera device is carried by a diver or an underwater vehicle to capture an original image containing marine life in an offshore area.
Specifically, a diver or an underwater vehicle carries the camera device and submerges into a natural sea area or cultivation area rich in marine organisms, and enough images are shot; the images should cover as many kinds of marine organisms as possible.
And 102, screening the photographed images, and marking marine organisms on the images with better photographing quality.
After the shooting operation is completed, the captured images are screened: invalid photos that contain no marine organisms, as well as photos unsuitable for the data set, such as those blurred by the shooting conditions, are filtered out, and the marine organisms present in the remaining well-shot images are labeled.
And step 103, exporting the standard image and the labeling information according to the standard format of the PASCAL VOC to form a data set.
And marking the remaining effective photos according to the standard format required by the PASCAL VOC, marking marine organisms in the images, and manufacturing a standard format data set for training of the target detector.
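For illustration, below is a minimal sketch of producing one PASCAL VOC annotation file with Python's standard library; the file name, class name and box coordinates are hypothetical examples, not values from the patent.

    import xml.etree.ElementTree as ET

    def voc_annotation(filename, width, height, objects):
        # objects: list of (class_name, xmin, ymin, xmax, ymax) tuples
        ann = ET.Element("annotation")
        ET.SubElement(ann, "filename").text = filename
        size = ET.SubElement(ann, "size")
        for tag, val in (("width", width), ("height", height), ("depth", 3)):
            ET.SubElement(size, tag).text = str(val)
        for name, xmin, ymin, xmax, ymax in objects:
            obj = ET.SubElement(ann, "object")
            ET.SubElement(obj, "name").text = name
            ET.SubElement(obj, "difficult").text = "0"
            box = ET.SubElement(obj, "bndbox")
            for tag, val in (("xmin", xmin), ("ymin", ymin),
                             ("xmax", xmax), ("ymax", ymax)):
                ET.SubElement(box, tag).text = str(val)
        return ET.ElementTree(ann)

    # hypothetical example: one sea cucumber in a 1920 x 1080 frame
    voc_annotation("000001.jpg", 1920, 1080,
                   [("holothurian", 504, 310, 638, 402)]).write("000001.xml")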
Step 200, preprocessing and data augmentation are performed on the data set.
Specifically, the detailed implementation of this step is shown in FIG. 2. Marine organism target detection faces two main challenges: first, underwater images commonly suffer from degradations such as low brightness, low contrast and color cast, and the degree of degradation varies with the operating depth, which adds many inconveniences to the target detection task; second, underwater images are difficult to acquire, so the data scale is small and the model is hard to train in a data-driven fashion.
Step 201, reading in images in a data set, adopting a histogram equalization method to adjust gray distribution of the images, solving the problem of underwater image distortion, and generating an image set A;
the method mainly aims at the first challenge, and solves the problem of underwater image distortion. The invention adopts a histogram equalization algorithm to preprocess the underwater photographic image, the histogram equalization is a gray scale method in the digital image processing technology, the gray scale distribution of the image is counted, and the gray scale level is distributed on the interval of [0,255] approximately uniformly again by applying a nonlinear transformation. For the photographed RGB three-channel color image, the three channels are subjected to histogram equalization conversion and then superposition independently, so that the colors of the three channels can be restored to be more uniform distribution, and meanwhile, the algorithm has very remarkable effect of improving the brightness and contrast of the image. The algorithm has excellent self-adaptive characteristics for the problem that the image degradation degree is accompanied with the continuous change of the water depth, and effectively solves the problem of underwater image degradation.
Step 202, reading in the images in the image set A, randomly turning over the images by adopting a Flip method, increasing the number of samples in the data set under different viewing angles, and generating the image set B.
The Flip method randomly performs a horizontal or vertical flip on an image, and can expand underwater image data collected in different directions and from different viewing angles.
And 203, reading in the images in the image set A, randomly fusing the images by adopting a Mixup method, increasing the number of samples with shielding and dense distribution phenomena in the data set, and generating an image set C.
The Mixup (mixing) method randomly selects two images and fuses them, which can expand the underwater image data that exhibit dense distribution or occlusion.
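A minimal sketch of image-level Mixup, assuming equally sized inputs and a Beta-distributed mixing weight (a common choice; the patent does not specify the sampling distribution):

    import numpy as np

    def mixup(img_a, img_b, boxes_a, boxes_b, alpha=1.5):
        # Blend two images with a random weight; the fused image keeps
        # the bounding boxes of both sources, which simulates occluded
        # and densely distributed targets.
        lam = np.random.beta(alpha, alpha)
        mixed = (lam * img_a.astype(np.float32)
                 + (1.0 - lam) * img_b.astype(np.float32)).astype(np.uint8)
        return mixed, boxes_a + boxes_b  # concatenated label lists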
And 204, reading in the images in the image set A, randomly cutting the images by adopting a loop method, increasing the sample number of the large-scale targets in the data set, and generating the image set D.
The Crop method randomly crops a sub-region containing a marine organism target from an image; after scaling in the subsequent step, the cropped sub-region becomes a sample with a larger target scale.
step 205, the images in the image set a are read in, the images are randomly spliced by adopting an expansion method, the sample number of the small-scale targets in the data set is increased, and the image set E is generated.
The Expand method randomly splices several underwater images together; after scaling in the subsequent step, the spliced image becomes a sample with smaller target scales. Introducing the Expand and Crop methods can expand underwater image data across different target scales.
step 206, reading in the images in the image set A, counting the distribution of samples of different categories, and increasing the number of samples of the category with fewer samples in the data set by adopting a Copy-Paste method to generate the image set F.
Considering that the numbers of marine organisms of the various classes are unevenly distributed, the Copy-Paste method cuts out samples of the under-represented classes and pastes them onto other images to increase their numbers; introducing this method expands the number of difficult samples and makes the class distribution of marine organisms more uniform.
In step 207, the images in the image set A, B, C, D, E, F are read in, and the image size is scaled to the standard size, so as to generate the image set G to be trained.
After the above preprocessing and augmentation methods are executed, to meet the input requirements of the YOLO V3 detector in step 300, all images are uniformly scaled to the standard size of 416 x 416, forming a new data set to be trained, which is used for model training in step 400.
Steps 202 to 206 address the second challenge by increasing the number and diversity of samples in the data set. The invention introduces five data augmentation methods, namely Flip, Mixup, Crop, Expand and Copy-Paste, which can expand the training data.
Step 300, build a YOLO V3 detector modified using deformable convolution.
Specifically, the network architecture of the YOLO V3 detector improved with deformable convolution is shown in FIG. 3; it comprises a backbone network, a first large-scale detection head network, a second mesoscale detection head network and a third small-scale detection head network. YOLO V3 follows the core idea of the YOLO family of methods: the input image is evenly divided into a number of grid cells, and if the center of a target falls in a cell, that cell is responsible for predicting the target. The traditional YOLO V3 detector uses a fully convolutional network with 53 convolution layers as the backbone to extract features from the input image, fuses feature maps from different layers, and directly predicts the shape, position and category of targets on that basis. This step improves the traditional YOLO V3 detector by using deformable convolution in place of ordinary convolution in the backbone network, the second mesoscale detection head network and the third small-scale detection head network. The procedure is as follows:
in step 301, an input layer is constructed for accepting underwater image input.
Necessary image preprocessing, such as normalization and image-to-tensor conversion, is performed as required by the YOLO V3 detector to meet the input needs of the backbone network.
Step 302, constructing a backbone network modified by using deformable convolution blocks for extracting features.
The backbone network comprises a first convolutional layer, a second convolutional layer, a first residual block, a third convolutional layer, a second residual block, a fourth convolutional layer, a third residual block, a fifth convolutional layer, a fourth residual block, a sixth convolutional layer, a fifth residual block, and a first deformable convolutional block. The tensor from step 301 passes through this network to yield feature maps that characterize the image.
At step 303, a first large scale detection head network modified using deformable convolution blocks is constructed for detecting large scale marine organisms.
The first large scale detection header network includes a seventh convolution layer and a first output layer. The feature map output in step 302 is subjected to this step to obtain a first set of prediction target frames.
A second mesoscale detection head network modified using deformable convolution blocks is constructed for detecting mesoscale marine organisms, step 304.
The second mesoscale detection header network comprises an eighth convolution layer, a first upsampling layer, a join operation, a second deformable convolution block, a ninth convolution layer, and a second output layer. The output feature maps of the first deformable convolution block and the fourth residual block in the backbone network constructed in step 302 undergo this step to obtain a second set of prediction target frames.
A third small scale detection head network, modified using deformable convolution blocks, is constructed for detection of small scale marine organisms, step 305.
The third small-scale detection head network comprises a tenth convolution layer, a second upsampling layer, a join operation, a third deformable convolution block, an eleventh convolution layer, and a third output layer. The third residual block in the backbone network constructed in step 302 and the output feature map of the second deformable convolution block in the second mesoscale detection header network constructed in step 304 are subjected to this step to obtain a third set of prediction target frames.
Step 306, combining the networks constructed in steps 301 to 305 to obtain an improved YOLO V3 detector.
Considering that the appearance and shape of marine organisms are irregular, and in order to adapt to this better, the invention equips the traditional YOLO V3 detector with an adaptive receptive field. The receptive field refers to the range of the original input image from which a convolution operator can gather information in one operation. The invention introduces the deformable convolution block, that is, a convolution sequence in which deformable convolution replaces the ordinary 3 x 3 convolution.
Specifically, the deformable convolution is a plug and play module, and as shown in fig. 4, the calculation of the deformable convolution in the improved YOLO V3 detector includes the following procedures:
and a step a, obtaining a characteristic diagram of upper-layer convolution output.
And b, performing convolution operation on the feature map to obtain an offset matrix.
A convolution operation is performed on the feature map with a convolution kernel whose output has twice the number of feature channels, and the operation result is taken as the offset matrix. Each channel of the original feature map corresponds to two offset maps in the result, which represent the offsets of the sampling points in the width and height directions, respectively.
And c, obtaining a sampling grid containing offset according to the offset matrix.
The offset maps computed in step b are queried, and the offset correction of each sampling point is applied to the corresponding point of the standard sampling grid, giving a corrected sampling grid.
And d, resampling the characteristic diagram according to the sampling grid containing the offset.
Sampling is performed according to the grid generated in step c; since the corrected sampling points generally fall on fractional coordinates, the value at each target position is computed by bilinear interpolation over the neighborhood of the sampling point.
And e, carrying out convolution operation on the sampling result and outputting the characteristic.
The sampling result of step d is convolved with the convolution kernel; once the sliding window has traversed the whole feature map, the output feature map is obtained.
The deformable convolution block changes the sampling range of conventional convolution, thereby expanding its receptive field. As a part of the network, the module learns automatically during training, so that the YOLO V3 detector obtains an adaptive receptive field and can better adapt to the appearance and shape of marine organisms.
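A minimal sketch of such a block, assuming PyTorch with torchvision's DeformConv2d operator; the channel sizes are illustrative, and the offset layout (two offsets per kernel sampling point, shared across channels) follows the torchvision convention rather than the per-channel description above.

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformableConvBlock(nn.Module):
        # A 3x3 deformable convolution: a plain convolution first predicts
        # per-position sampling offsets (steps a-b); the sampling grid is
        # shifted by those offsets, resampled with bilinear interpolation,
        # and convolved (steps c-e, fused inside DeformConv2d).
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            # two offsets (x and y) for each of the k*k kernel sampling points
            self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
            self.deform_conv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

        def forward(self, x):
            offsets = self.offset_conv(x)         # step b: offset matrix
            return self.deform_conv(x, offsets)   # steps c, d and e

    feat = torch.randn(1, 256, 26, 26)            # illustrative shapes only
    out = DeformableConvBlock(256, 256)(feat)     # -> (1, 256, 26, 26)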
Step 400, training a model on the dataset formed by step 200.
Step 401, reading a parameter file, loading model weights pre-trained on a large-scale image classification dataset (ImageNet).
Step 402, reading an image from the data set generated by the processing of step 200, and dividing the training set and the verification set.
Specifically, this step imports the preprocessed and augmented data set formed in step 200 and divides it into a training set and a verification set at a ratio of 3:1. The training set data are loaded and randomly shuffled, and the images are read batch by batch at a fixed batch size and synchronized into the GPU memory. The weights pre-trained on ImageNet (a large-scale image classification data set) are loaded, and the model is synchronized into the GPU memory as well.
Step 403, inputting the training data into the backbone network in batches, and generating three feature maps with dimensions of 13 x 13, 26 x 26 and 52 x 52 respectively through continuous convolution processing, so as to enable subsequent feature fusion.
For each batch of image data, this step feeds the data into the YOLO V3 detector for processing by the backbone network, which extracts features from the batch and outputs feature maps at three different scales: 13 x 13, 26 x 26 and 52 x 52.
Step 404, fusing the information contained in the feature maps at the three different scales by up-sampling followed by convolution, so as to execute prediction subsequently.
Through the up-sampling and convolution operations of the neck network, the three feature maps at different scales are fused from deep to shallow, yielding three feature maps that contain fused semantic features of different levels, as sketched below.
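A minimal sketch of the fusion between two adjacent scales, assuming PyTorch; the channel counts and the nearest-neighbor upsampling mode are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ScaleFusion(nn.Module):
        # Fuse a deep, low-resolution feature map into a shallower,
        # higher-resolution one: 1x1 convolution to reduce channels,
        # 2x upsampling, channel-wise concatenation, then convolution.
        def __init__(self, deep_ch, shallow_ch, out_ch):
            super().__init__()
            self.reduce = nn.Conv2d(deep_ch, deep_ch // 2, 1)
            self.up = nn.Upsample(scale_factor=2, mode="nearest")
            self.fuse = nn.Conv2d(deep_ch // 2 + shallow_ch, out_ch, 3, padding=1)

        def forward(self, deep, shallow):
            x = self.up(self.reduce(deep))        # e.g. 13 x 13 -> 26 x 26
            x = torch.cat([x, shallow], dim=1)    # join operation
            return self.fuse(x)

    deep = torch.randn(1, 1024, 13, 13)
    shallow = torch.randn(1, 512, 26, 26)
    fused = ScaleFusion(1024, 512, 512)(deep, shallow)  # -> (1, 512, 26, 26)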
Step 405 predicts three target boxes at each grid cell on the three feature maps using the two-branch head network and gives their location information and category information.
Each point on a feature map corresponds to a grid cell, representing a block of the original image; on each grid cell, the head network of YOLO V3 predicts three boxes that may contain marine organisms and gives information such as the coordinates, size, category and confidence of each predicted box in the image.
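A minimal sketch of how one grid cell's raw outputs are decoded into a box, following the standard YOLO V3 parameterization (sigmoid for center offsets and confidence, exponential scaling of anchor sizes for width and height); the anchor sizes and stride passed in are illustrative.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def decode_cell(raw, cx, cy, anchor_w, anchor_h, stride):
        # raw = (tx, ty, tw, th, objectness, class scores ...) for one box
        tx, ty, tw, th, obj = raw[:5]
        bx = (cx + sigmoid(tx)) * stride   # box center x in input pixels
        by = (cy + sigmoid(ty)) * stride   # box center y in input pixels
        bw = anchor_w * np.exp(tw)         # box width
        bh = anchor_h * np.exp(th)         # box height
        cls_scores = sigmoid(raw[5:])
        conf = sigmoid(obj) * cls_scores.max()   # box confidence
        return (bx, by, bw, bh), float(conf), int(np.argmax(cls_scores))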
Step 406, post-processing the target boxes with the BBox Voting (bounding box voting) method to obtain the final prediction result.
The candidate box information is post-processed with the BBox Voting method: the shape and location of each selected box are corrected by a weighted average over the predicted boxes whose IoU (Intersection-over-Union) with the selected box exceeds a certain threshold. Finally, a batch of marine organism predictions is output, completing the forward propagation.
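A minimal sketch of IoU-weighted box voting for axis-aligned boxes in (x1, y1, x2, y2) form; the 0.5 threshold and the use of confidence scores as weights are assumptions, as the patent does not fix these values.

    import numpy as np

    def iou(a, b):
        # Intersection-over-Union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def bbox_vote(selected, candidates, scores, thr=0.5):
        # Correct the selected box by the score-weighted average of all
        # candidate boxes whose IoU with it exceeds the threshold.
        kept = [(b, s) for b, s in zip(candidates, scores)
                if iou(selected, b) >= thr]
        arr = np.array([b for b, _ in kept], dtype=np.float64)
        w = np.array([s for _, s in kept], dtype=np.float64)
        return (arr * w[:, None]).sum(axis=0) / w.sum()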
After step 406, further includes:
in step 407, the error between the predicted result and the true value is calculated by the loss function, and back propagation is performed.
The output prediction results are compared with the ground-truth values provided by the training data annotations; a loss value is computed with the YOLO V3 loss function and applied in the back propagation of the network to update the weights and advance the network's learning process.
And step 408, testing the model effect on the verification set, judging whether the loss exceeds a loss threshold value, if so, repeating the steps 403 to 408 until the loss is smaller than the loss threshold value, and storing the current model weight into the weight file.
After the network has learned all the images in the training set, the performance of the model is evaluated on the verification set using the loss function. Once the loss value of the model falls below a certain threshold, the training process can stop; otherwise the process is repeated: the training set is shuffled again and the model continues to learn on the training images. After the training process ends, the model has an excellent marine organism target detection capability, and the weights in the model are exported for deployment. A sketch of this loop follows.
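A minimal sketch of the epoch loop with the loss-threshold stopping rule described above, assuming PyTorch; the optimizer, learning rate, threshold value and file name are illustrative assumptions, and yolo_loss stands for the YOLO V3 loss function.

    import torch

    def train(model, train_loader, val_loader, yolo_loss,
              loss_threshold=0.5, max_epochs=300):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model.to(device)
        opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
        for epoch in range(max_epochs):
            model.train()
            for images, targets in train_loader:   # shuffled each epoch
                # targets assumed already formatted for yolo_loss
                loss = yolo_loss(model(images.to(device)), targets)
                opt.zero_grad()
                loss.backward()                    # back propagation
                opt.step()
            model.eval()                           # evaluate on the verification set
            with torch.no_grad():
                val_loss = sum(yolo_loss(model(im.to(device)), tg)
                               for im, tg in val_loader) / len(val_loader)
            if val_loss < loss_threshold:          # stopping rule of step 408
                torch.save(model.state_dict(), "weights.pt")
                break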
Step 500, deploying a trained model for a marine organism target detection task.
Step 501, a modified YOLO V3 detector is deployed onto a computing device for performing the detection.
Specifically, this step converts the model trained in step 400 into a format that is easy to deploy; the choice of conversion is determined by the nature and requirements of the target platform. For example, the model can be deployed in TensorRT form on Nvidia Jetson series embedded terminals, or deployed on mobile devices running Android or iOS through the MNN framework. Correspondingly, the weight file should also be converted into the format required by the deployment.
Step 502, reading the parameter file, and loading the model weight trained in step 400.
In step 503, the image on the camera device is read, and the image is preprocessed by using a histogram equalization method.
When detection is performed, the camera equipment on the terminal is invoked to acquire a real-time video stream, and each video frame is histogram-equalized by the process described in step 201 to mitigate the degradation of the underwater image in the frame. The processed frame data are then scaled to 416 x 416.
Step 504, the processed image is sent to a modified YOLO V3 detector to perform marine organism target prediction.
Step 505, the position information and the category information included in the detection result are visualized.
The prediction results output in step 504 are parsed to obtain the position information and category information of the marine organism targets; the image editing and display interface provided by OpenCV is used to draw boxes at the target positions in the frame data and label the marine organism categories, and the result is supplied to the display device, as sketched below.
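A minimal sketch of this visualization with OpenCV's drawing interface; the color, font and detection tuple layout are placeholders.

    import cv2

    def draw_detections(frame, detections, class_names):
        # detections: list of (x1, y1, x2, y2, confidence, class_id)
        for x1, y1, x2, y2, conf, cls in detections:
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)),
                          color=(0, 255, 0), thickness=2)
            label = "%s %.2f" % (class_names[cls], conf)
            cv2.putText(frame, label, (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        return frame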
According to the marine organism target detection method based on the improved YOLO algorithm provided by the embodiment of the invention, a marine organism target detection data set conforming to the standard PASCAL VOC format is first constructed; image preprocessing and augmentation are completed through a pipeline comprising histogram equalization, Flip, Crop and other methods, forming a data set for training; further, a YOLO V3 detector improved with deformable convolution is constructed and trained on the above data set so that it acquires an excellent underwater target detection capability; finally, a deployable model is exported and deployed on the terminal operating equipment, executing detection tasks on the video stream input from the camera.
Tests show that the marine organism target detection method based on the improved YOLO algorithm provided by the embodiment of the invention can cope with the underwater image degradation caused by the complex marine environment, can train a marine organism target detection model with small-scale data, and achieves excellent precision and recall while meeting the requirements of practical target detection tasks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, without the essence of the corresponding technical solutions departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A marine organism target detection method based on an improved YOLO algorithm, comprising the steps of:
step 100, collecting a marine organism data set;
step 200, preprocessing and data augmentation are carried out on the data set;
step 300, constructing a YOLO V3 detector modified using deformable convolution;
step 400, training a model on the dataset formed by performing step 200, comprising the following steps 401 to 406:
step 401, reading a parameter file, and loading pre-trained model weights on a large-scale image classification data set;
step 402, reading images from the data set generated in the step 200 processing, and dividing a training set and a verification set;
step 403, inputting training data into the backbone network in batches, and generating three feature maps with scales of 13 x 13, 26 x 26 and 52 x 52 through successive convolution processing, for subsequent feature fusion;
step 404, fusing the information contained in the three feature maps at different scales by up-sampling followed by convolution, for subsequent prediction;
step 405, predicting three target boxes at each grid cell on the three feature maps using the two-branch head network, and giving their location information and category information;
step 406, post-processing the target boxes with the BBox Voting method to obtain the final prediction result;
step 500, deploying a trained model for a marine organism target detection task;
the step 200 of preprocessing and data augmentation of the data set includes the steps of:
step 201, reading in images in a data set, and adopting a histogram equalization method to adjust gray distribution of the images so as to generate an image set A;
step 202, reading in images in an image set A, randomly overturning the images by adopting a Flip method, increasing the number of samples in a data set under different view angles, and generating an image set B;
step 203, reading in the images in the image set A, adopting a Mixup method to randomly fuse the images, increasing the number of samples with shielding and dense distribution phenomena in the data set, and generating an image set C;
step 204, reading in the images in the image set A, randomly cropping the images by adopting the Crop method, increasing the number of large-scale target samples in the data set, and generating an image set D;
step 205, reading in the images in the image set A, randomly splicing the images by adopting the Expand method, increasing the number of small-scale target samples in the data set, and generating an image set E;
step 206, reading in the images in the image set A, counting the distribution of samples of different categories, and increasing the number of samples of the category with fewer samples in the data set by adopting a Copy-Paste method to generate an image set F;
step 207, reading in the images in the image set A, B, C, D, E, F, scaling the image size to the standard size, and generating an image set G to be trained;
said step 300, constructing an improved YOLO V3 detector, comprises the steps of:
step 301, an input layer is constructed and used for receiving underwater image input;
step 302, constructing a backbone network improved by using a deformable convolution block for extracting features;
step 303, constructing a first large-scale detection head network modified by using a deformable convolution block for detecting large-scale marine organisms;
step 304, constructing a second mesoscale detection head network modified by using deformable convolution blocks for detecting mesoscale marine organisms;
step 305, constructing a third small-scale detection head network modified by using a deformable convolution block for detecting small-scale marine organisms;
step 306, combining the networks constructed in steps 301 to 305 to obtain an improved YOLO V3 detector;
calculation of the deformable convolution in the improved YOLO V3 detector, comprising the following process:
acquiring a characteristic diagram of upper-layer convolution output;
performing convolution operation on the feature map to obtain an offset matrix;
obtaining a sampling grid containing offset according to the offset matrix;
resampling the feature map according to a sampling grid containing offsets;
and carrying out convolution operation on the sampling result and outputting new characteristics.
2. The improved YOLO algorithm-based marine organism target detection method of claim 1, wherein the acquiring of the marine organism dataset in step 100 comprises the steps of:
step 101, shooting an original image containing marine organisms in a coastal sea area by a diver or an underwater vehicle carrying a camera device;
step 102, screening the photographed images, and marking marine organisms on the images with better photographing quality to form standard images and marking information;
and step 103, exporting the standard image and the labeling information according to the standard format of the PASCAL VOC to form a data set.
3. The improved YOLO algorithm-based marine organism target detection method of claim 1, further comprising, after step 406:
step 407, calculating the error of the prediction result and the true value through the loss function, and executing back propagation;
and step 408, testing the model effect on the verification set, judging whether the loss exceeds a loss threshold value, if so, repeating the steps 403 to 408 until the loss is smaller than the loss threshold value, and storing the current model weight into the weight file.
4. The improved YOLO algorithm-based marine organism target detection method of claim 1, wherein the deploying trained models in step 500 for marine organism target detection tasks comprises the steps of:
step 501, deploying a modified YOLO V3 detector onto a computing device for performing the detection;
step 502, reading a parameter file, and loading the model weight trained in step 400;
step 503, reading an image on the camera device, and preprocessing the image by using a histogram equalization method;
step 504, the processed image is sent to a modified YOLO V3 detector to perform marine organism target prediction;
step 505, the position information and the category information included in the detection result are visualized.
CN202111251729.2A; priority date 2021-10-25; filing date 2021-10-25; Marine organism target detection method based on improved YOLO algorithm; Active; granted as CN113901944B (en)

Priority Applications (1)

Application Number: CN202111251729.2A (granted as CN113901944B); Priority Date: 2021-10-25; Filing Date: 2021-10-25; Title: Marine organism target detection method based on improved YOLO algorithm


Publications (2)

Publication Number Publication Date
CN113901944A CN113901944A (en) 2022-01-07
CN113901944B true CN113901944B (en) 2024-04-09

Family

ID=79026845

Family Applications (1)

Application Number: CN202111251729.2A (Active, granted as CN113901944B); Priority Date: 2021-10-25; Filing Date: 2021-10-25; Title: Marine organism target detection method based on improved YOLO algorithm

Country Status (1)

Country Link
CN (1) CN113901944B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN108711172A (en) * 2018-04-24 2018-10-26 中国海洋大学 Unmanned plane identification based on fine grit classification and localization method
CN109409443A (en) * 2018-11-28 2019-03-01 北方工业大学 Multi-scale deformable convolution network target detection method based on deep learning
CN110070142A (en) * 2019-04-29 2019-07-30 上海大学 A kind of marine vessel object detection method based on YOLO neural network
CN111143146A (en) * 2019-12-26 2020-05-12 深圳大普微电子科技有限公司 Health state prediction method and system of storage device
KR102180884B1 (en) * 2020-04-21 2020-11-19 피앤더블유시티 주식회사 Apparatus for providing product information based on object recognition in video content and method therefor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Image retrieval based on primitive correlation descriptors; Wu Jun; Feng Lin; Liu Shenglan; Yu Laixing; Journal of Computer Research and Development; 2016-12-31; Vol. 53, No. 12; full text *
Shallow-sea organism detection model based on improved YOLO-V4 network; Mao Guojun; Weng Weidong; Zhu Jinde; Zhang Yuan; Wu Fucun; Mao Yuze; Transactions of the Chinese Society of Agricultural Engineering; 2021-06-30; No. 12; full text *

Also Published As

Publication number Publication date
CN113901944A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN110473173A (en) A kind of defect inspection method based on deep learning semantic segmentation
CN109272060B (en) Method and system for target detection based on improved darknet neural network
CN110599445A (en) Target robust detection and defect identification method and device for power grid nut and pin
CN112070727B (en) Metal surface defect detection method based on machine learning
CN112396635B (en) Multi-target detection method based on multiple devices in complex environment
CN110909615B (en) Target detection method based on multi-scale input mixed perception neural network
CN111339902B (en) Liquid crystal display indication recognition method and device for digital display instrument
CN111161224A (en) Casting internal defect grading evaluation system and method based on deep learning
CN111080531A (en) Super-resolution reconstruction method, system and device for underwater fish image
CN111144491A (en) Image processing method, device and electronic system
CN114581432A (en) Tongue appearance tongue image segmentation method based on deep learning
CN112528782A (en) Underwater fish target detection method and device
CN112651989A (en) SEM image molecular sieve particle size statistical method and system based on Mask RCNN example segmentation
CN115223043A (en) Strawberry defect detection method and device, computer equipment and storage medium
CN111915565B (en) Method for analyzing cracks of porcelain insulator of power transmission and transformation line in real time based on YOLACT algorithm
CN113901944B (en) Marine organism target detection method based on improved YOLO algorithm
CN110827375B (en) Infrared image true color coloring method and system based on low-light-level image
CN112561813A (en) Face image enhancement method and device, electronic equipment and storage medium
CN115578624A (en) Agricultural disease and pest model construction method, detection method and device
CN115240057A (en) Overhead transmission line monitoring image detection method based on deep learning
CN111127327B (en) Picture inclination detection method and device
CN113902958A (en) Anchor point self-adaption based infrastructure field personnel detection method
CN111582076A (en) Picture freezing detection method based on pixel motion intelligent perception
CN111797737A (en) Remote sensing target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant