CN116543295A - Lightweight underwater target detection method and system based on degradation image enhancement - Google Patents
- Publication number: CN116543295A (application CN202310366420.0A)
- Authority: CN (China)
- Prior art keywords: underwater, image, target detection, network, URPC
- Legal status: Pending
Classifications
- G06V20/05 — Underwater scenes
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T5/50 — Image enhancement or restoration by the use of more than one image
- G06T5/90
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/764 — Recognition using classification
- G06V10/766 — Recognition using regression
- G06V10/82 — Recognition using neural networks
- G06T2207/20052 — Discrete cosine transform [DCT]
- G06T2207/20221 — Image fusion; image merging
- G06V2201/07 — Target detection
Abstract
The invention discloses a lightweight underwater target detection method and system based on degraded image enhancement, relating to the technical field of deep-learning image processing. The method receives an image enhancement dataset UIE and an underwater target detection dataset URPC, preprocesses them, and divides them into a training set, a verification set and a test set; the preprocessed UIE underwater images are fed into a pre-established underwater imaging model for enhancement to obtain clear underwater images, and the URPC images are fed into the same model to obtain enhanced images; the network weights that produce the best enhancement effect are saved as a weight file; finally, the weight file, training set, verification set and test set are input into a pre-established lightweight underwater target detection model, which outputs images containing underwater target detection boxes, identifies and marks the targets inside the boxes, and computes the average precision.
Description
Technical Field
The invention relates to the technical field of deep-learning image processing, and in particular to a lightweight underwater target detection method and system based on degraded image enhancement.
Background
Ocean exploration cannot be separated from underwater target detection, which has important value in resource development, seabed fishing, ecological protection and military operations. However, owing to the complex underwater environment, manual or semi-manual detection suffers from high cost and low safety, posing great challenges to underwater tasks. With the vigorous development of deep learning, vision-based target detection has gradually become a research hot spot: it is widely used in underwater target recognition tasks, plays an important role in resource development, underwater monitoring and ecological protection, and provides powerful support for underwater detection tasks.
The complex underwater environment greatly hinders the target detection task. Light is selectively attenuated under water; that is, its propagation is wavelength-dependent. Red light decays fastest, followed by green and then blue, so collected underwater images always present a blue-green background and exhibit color cast. At the same time, suspended particles in the water scatter light, blurring image details. These problems degrade image quality and directly affect the accuracy of the subsequent detection task.
In recent years, deep learning has developed rapidly, and target detection technology has made great progress and is widely applied across scenes. However, current networks have complex structures and huge parameter counts, which is unfavorable for real-time detection. Present target detection algorithms fall into two classes. One-stage detectors, such as SSD and the YOLO series, directly predict category and position on the feature map and are characterized by high speed. Two-stage algorithms based on region proposals and classifiers, such as R-CNN and Faster R-CNN, first generate region candidate boxes and then classify each candidate with a convolutional neural network; they achieve high accuracy but are slower.
Disclosure of Invention
In order to overcome the above-mentioned shortcomings of the background art, the present invention aims to provide a lightweight underwater target detection method and system based on degraded image enhancement.
the aim of the invention can be achieved by the following technical scheme: a method for detecting a lightweight underwater target based on degradation image enhancement comprises the following steps:
receiving an image enhancement data set UIE and an underwater target detection data set URPC, preprocessing the UIE and the URPC, and dividing the preprocessed UIE and the preprocessed URPC into a training set, a verification set and a test set;
embedding a pre-established underwater imaging model into a UWCNN-SD network; inputting the underwater images of the UIE dataset into the UWCNN-SD network for training, where the network is trained on pairs of original underwater images and real (clear) images; saving the trained weights, loading them back into the UWCNN-SD network, and inputting the URPC dataset into the trained network to finally obtain clear underwater images;
inputting the clear underwater image, the training set, the verification set and the test set into a pre-established lightweight underwater target detection model, finally outputting an image containing an underwater target detection frame, identifying and marking the target in the underwater target detection frame, and calculating average precision to obtain a detection result.
Optionally, the UIE dataset comprises 950 pairs of underwater images and real images, where the real images are clear underwater images without color cast.
Optionally, the URPC dataset contains 7600 pictures and corresponding label files, where each label file comprises the bounding boxes, their position information, and the ground-truth category of each box's content.
Optionally, the underwater imaging model performs a discrete cosine transform on the underwater image to separate it into a high-frequency part and a low-frequency part; a CNN and a loss function are then built and the underwater imaging model is embedded into the network, so that color cast can be removed from the low-frequency part and texture details highlighted in the high-frequency part; finally the two parts are fused to output a clear underwater image.
Optionally, the underwater imaging model is:

I_λ(x) = J_λ(x) · t_λ(x) + A_λ · (1 − t_λ(x))

where I_λ(x) is the captured underwater image, J_λ(x) the clear image, t_λ(x) the transmittance, A_λ the global background light, and λ indexes the RGB channels.

Transforming the model in the manner of the atmospheric scattering model gives:

J_λ(x) = K_λ(x) · I_λ(x) − K_λ(x) + 1

where t_λ(x) and A_λ have been combined into the single variable K_λ(x). A discrete cosine transform separates the underwater image into a low-frequency and a high-frequency component:

I_λ(x) = I_λ^LF(x) + I_λ^HF(x)
J_λ^LF(x) = K_λ(x) · I_λ^LF(x) − K_λ(x) + 1
J_λ^HF(x) = K_λ(x) · I_λ^HF(x) − K_λ(x) + 1

where LF denotes the low-frequency component and HF the high-frequency component. A CNN is constructed, the underwater images of the UIE dataset are input for training, the parameter K_λ(x) is learned, and K_λ(x) is substituted back into the underwater imaging model to solve inversely for the clear underwater image.
Optionally, the CNN training process is as follows:

the network is trained by minimizing a loss function; the network parameters are initialized from a Gaussian distribution and optimized with an Adam optimizer; the learned weights are saved and loaded into a test file, and the underwater images of the URPC dataset are input into the test file to obtain enhanced underwater target detection images. The loss function is the SSIM loss:

L_SSIM = 1 − SSIM(J_λ, I_λ),  SSIM = ((2·μ_J·μ_I + C_1)(2·σ_JI + C_2)) / ((μ_J² + μ_I² + C_1)(σ_J² + σ_I² + C_2))

where μ and σ denote the mean and standard deviation of the gray images J_λ(x) and I_λ(x), σ_JI denotes their covariance, C_1 = (K_1·L)², C_2 = (K_2·L)², with K_1 = 0.01, K_2 = 0.03, L = 1.
Optionally, the lightweight underwater target detection model replaces the feature extraction network in YOLOV5 with a GhostNet lightweight module to extract three feature maps of different sizes, adds a CA attention mechanism in the neck, and then inputs the three feature maps of different scales into three classification-regression layers for prediction.
Optionally, the CA attention mechanism comprises the following three operations:

Information embedding operation: for a given input feature map, global average pooling is performed along the horizontal and vertical directions to obtain two embedded information feature maps. In the horizontal direction an H×1 pooling kernel is used, and the H×W×C input features yield an H×1×C information feature map:

z_c^h(h) = (1/W) · Σ_{0≤i<W} x_c(h, i)

In the vertical (Y) direction a 1×W pooling kernel is used, and the H×W×C input features yield a 1×W×C information feature map:

z_c^w(w) = (1/H) · Σ_{0≤j<H} x_c(j, w)

Attention generation operation: the two information feature maps z_c^h and z_c^w generated in the previous step are concatenated along the spatial dimension and passed through a 1×1 convolution and an activation function. The result is then sliced along the spatial dimension into two separate feature maps f^h and f^w, each of which undergoes a transformation and an activation function to obtain the two attention vectors g^h and g^w:

g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))

Feature map correction operation: the previous operation yields two attention vectors g^h ∈ C×H×1 and g^w ∈ C×1×W; they are broadcast to the C×H×W dimension and multiplied element-wise, via a residual operation, with the input feature map x_c to obtain the final attention feature.
Optionally, the average precision is computed as follows:

accuracy is measured with the precision p and recall r statistics. Precision is the ratio of true positives tp to all predicted positives, p = tp / (tp + fp), and represents the proportion of correct predictions among the predicted results; recall is the ratio of true positives to all actual positives, r = tp / (tp + fn), and represents the proportion of all targets that are correctly predicted. The average precision AP is the mean of the precisions obtained over all possible values of the recall:

AP = ∫₀¹ p(r) dr
a degraded image enhancement based lightweight underwater target detection system, comprising:
An image processing module: used for receiving the image enhancement dataset UIE and the underwater target detection dataset URPC, preprocessing them, and dividing the preprocessed UIE and URPC into a training set, a verification set and a test set;
an image enhancement module: embedding a pre-established underwater imaging model into a UWCNN-SD network, inputting the underwater image in the preprocessed UIE data set into the UWCNN-SD network for training, storing the trained weight through an original underwater image and a real image training network, loading the weight into the UWCNN-SD network, and inputting the URPC data set into the trained UWCNN-SD network to obtain a clear underwater image;
An image generation module: used for inputting the clear underwater images together with the training, verification and test sets into a pre-established lightweight underwater target detection model, finally outputting images containing underwater target detection boxes, identifying and marking the targets in the boxes, and calculating the average precision.
The invention has the beneficial effects that:
the invention utilizes the image enhancement technology and the target detection technology of the forefront edge, and realizes the practicability of the front edge technology. Aiming at the difficulty of underwater target detection, the invention firstly applies UWCNN-SD algorithm to enhance the degraded underwater image, eliminates color deviation caused by light attenuation, then improves the model based on the YOLOV5 model, replaces the feature extraction network of the model with GhostNet to reduce parameters and calculated amount, improves reasoning speed, introduces CA attention mechanism and enhances the extraction of the features. Finally, the underwater target detection precision is higher, the speed is faster, and the generalization capability is good.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a network structure diagram of the UWCNN-SD algorithm used in the present invention;
FIG. 3 is a network structure diagram of the improved YOLOV5 model of the present invention;
FIG. 4 is a flow chart of a CA attention mechanism used in the present invention;
FIG. 5 is a detection effect graph of YOLOV5;
fig. 6 is an effect detection diagram of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a method for detecting a lightweight underwater target based on degraded image enhancement includes the steps of:
s1, acquiring an image enhancement dataset UIE and an underwater target detection dataset URPC, wherein the UIE dataset comprises 960 pairs of underwater images and corresponding clear images, the URPC dataset comprises 7600 pictures and labels, the label format of the URPC dataset is converted from xml to txt by PASCALVOC, and the dataset is divided into a training set, a verification set and a test set according to the ratio of 7:2:1. The URPC data set is divided into an image and a label, the image comprises four targets of sea urchins, starfish, scallops and sea cucumbers, and the label comprises target category information and position information.
S2, unify the image size of the dataset from step S1 to 640×640×3 and enhance the underwater images with the UWCNN-SD algorithm. The UWCNN-SD algorithm is based on the underwater imaging model:

I_λ(x) = J_λ(x) · t_λ(x) + A_λ · (1 − t_λ(x))

where I_λ(x) is the captured underwater image, J_λ(x) the clear image, t_λ(x) the transmittance, A_λ the global background light, and λ indexes the RGB channels.

Transforming the model in the manner of the atmospheric scattering model gives:

J_λ(x) = K_λ(x) · I_λ(x) − K_λ(x) + 1

where t_λ(x) and A_λ have been combined into the single variable K_λ(x). A discrete cosine transform separates the underwater image into a low-frequency and a high-frequency component:

I_λ(x) = I_λ^LF(x) + I_λ^HF(x)
J_λ^LF(x) = K_λ(x) · I_λ^LF(x) − K_λ(x) + 1
J_λ^HF(x) = K_λ(x) · I_λ^HF(x) − K_λ(x) + 1

where LF denotes the low-frequency component and HF the high-frequency component. A CNN is constructed, the underwater images of the UIE dataset are input for training, the parameter K_λ(x) is learned, and K_λ(x) is substituted back into the underwater imaging model to solve inversely for the clear underwater image.
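Recovering the clear image from the learned K_λ(x) is a pointwise evaluation of J = K·I − K + 1. A minimal pure-Python sketch for a single channel (`recover_clear` and the toy values are illustrative; real code would operate on tensors):

```python
def recover_clear(I, K):
    """Invert the underwater imaging model per pixel: J = K*I - K + 1."""
    H, W = len(I), len(I[0])
    return [[K[y][x] * I[y][x] - K[y][x] + 1.0 for x in range(W)]
            for y in range(H)]

# Toy 2x2 channel: K = 1 everywhere corresponds to no degradation,
# so the recovered image matches the input (up to float rounding).
I = [[0.2, 0.4], [0.6, 0.8]]
K1 = [[1.0, 1.0], [1.0, 1.0]]
J = recover_clear(I, K1)
```

The same inversion is applied separately to the low- and high-frequency components before they are fused.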
The network is trained by minimizing the loss function. The network parameters are first initialized from a Gaussian distribution and are optimized with an Adam optimizer. During training the learning rate is set to 0.0001, the batch size to 16, and the number of training epochs to 30. Finally, the learned weights are saved and loaded into a test file, and the underwater images of the URPC dataset are input into the test file to obtain the enhanced underwater target detection images. The loss function is as follows:
L_SSIM = 1 − SSIM(J_λ, I_λ),  SSIM = ((2·μ_J·μ_I + C_1)(2·σ_JI + C_2)) / ((μ_J² + μ_I² + C_1)(σ_J² + σ_I² + C_2))

where μ and σ denote the mean and standard deviation of the gray images J_λ(x) and I_λ(x), σ_JI denotes their covariance, C_1 = (K_1·L)², C_2 = (K_2·L)², with K_1 = 0.01, K_2 = 0.03, L = 1.
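With these constants the loss is the SSIM loss. A pure-Python sketch over a single global window (practical implementations compute SSIM over sliding local windows; `ssim_loss` is an illustrative helper, not the patent's code):

```python
def ssim_loss(a, b, K1=0.01, K2=0.03, L=1.0):
    """1 - SSIM between two equal-length grayscale pixel sequences."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((x - mu_b) ** 2 for x in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    ssim = ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))
    return 1.0 - ssim

img = [0.1, 0.5, 0.9, 0.3]
print(ssim_loss(img, img))                        # identical images -> loss ~ 0
print(ssim_loss(img, [0.9, 0.1, 0.2, 0.8]))      # dissimilar images -> positive loss
```

Minimizing this loss drives the enhanced image toward the structure of the clear reference image rather than just matching pixel values.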
S3, construct the lightweight underwater target detection network: replace the feature extraction network in YOLOV5 with a GhostNet lightweight module and extract three feature maps of different sizes, namely 80×80×40, 40×40×112 and 20×20×160; add a CA attention mechanism in the neck; then input the three feature maps of different scales into three classification-regression layers for prediction;
the GhosNet module firstly performs feature extraction by using fewer convolution check input feature graphs, and then performs linear transformation on the feature graphs phi_i of all channels by using Depth-wise con-volumtion to obtain a Ghost feature graph. And finally, compressing the Ghost feature map and the feature map to generate a final feature map. The module consists of two stacked Ghost parts, the front of which is the extension of Ghost-BottleNeck, the main effect being to increase the number of channels and thus the dimension of the feature map. The latter part is to reduce the dimension of the feature map, to ensure consistency with the input, and finally to connect the two parts together by means of a jump connection. Meanwhile, the ReLu activation function is used in the front part to ensure the gradient disappearance phenomenon in the backward propagation process of the data. The ReLu activation function is not used later, because the distribution of the data of the next layer and the previous layer is different after the activation function is used, so that the difference of data input is continuously adapted, and the training speed of the network is reduced.
The CA attention mechanism comprises the following three operations:

Information embedding operation (Coordinate Information Embedding): for a given input feature map, global average pooling is performed along the horizontal and vertical directions to obtain two embedded information feature maps (as shown in FIG. 3). In the horizontal direction an H×1 pooling kernel is used, and the H×W×C input features yield an H×1×C information feature map:

z_c^h(h) = (1/W) · Σ_{0≤i<W} x_c(h, i)

In the vertical (Y) direction a 1×W pooling kernel is used, and the H×W×C input features yield a 1×W×C information feature map:

z_c^w(w) = (1/H) · Σ_{0≤j<H} x_c(j, w)

Attention generation operation: the two information feature maps z_c^h and z_c^w generated in the previous step are concatenated along the spatial dimension and passed through a 1×1 convolution and an activation function. The result is then sliced along the spatial dimension into two separate feature maps f^h and f^w, each of which undergoes a transformation and an activation function to obtain the two attention vectors g^h and g^w:

g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))

Feature map correction operation: the previous operation yields two attention vectors g^h ∈ C×H×1 and g^w ∈ C×1×W; they are broadcast to the C×H×W dimension and multiplied element-wise, via a residual operation, with the input feature map x_c to obtain the final attention feature.
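The two directional poolings of the information-embedding step can be sketched for a single channel as follows (`directional_pool` is an illustrative helper; x is one channel as an H×W list of lists):

```python
def directional_pool(x):
    """Coordinate-attention embedding for one channel:
    z_h[h] averages row h over the width, z_w[w] averages column w over the height."""
    H, W = len(x), len(x[0])
    z_h = [sum(row) / W for row in x]                             # H x 1 pooling
    z_w = [sum(x[i][j] for i in range(H)) / H for j in range(W)]  # 1 x W pooling
    return z_h, z_w

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
z_h, z_w = directional_pool(x)
print(z_h)  # -> [2.0, 5.0]
print(z_w)  # -> [2.5, 3.5, 4.5]
```

Unlike a single global average pool, each 1-D pooling retains position along one axis, which is what lets the generated attention vectors encode coordinate information.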
S4, input the images of the training and verification sets enhanced in step S2 into the lightweight target detection model; the training set trains the model, the verification set gives timely feedback on training progress, and the test set checks the final detection performance. The input image size is 640×640×3, the training batch size is 32, the number of training epochs is 300, the IOU threshold is 0.45 and the initial learning rate is 0.01; the learning rate is updated with a cosine annealing schedule to accelerate training. The trained weight file is saved as "best.pt" and loaded into the model; the test set images are then input into the model, which outputs pictures with target bounding boxes and target information, and the average precision is calculated.
The average precision is measured with the precision p and recall r statistics. Precision is the ratio of true positives (tp) to all predicted positives (tp + fp), p = tp / (tp + fp), and represents the proportion of correct predictions among the predicted results. Recall is the ratio of true positives to all actual positives (tp + fn), r = tp / (tp + fn), and represents the proportion of all targets that are correctly predicted. The average precision AP is the mean of the precisions obtained over all possible values of the recall:

AP = ∫₀¹ p(r) dr
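The precision/recall bookkeeping behind AP can be sketched as follows (a simplified all-point-interpolation version; `average_precision` is an illustrative helper, not the URPC evaluation code):

```python
def average_precision(hits, n_gt):
    """hits: per-detection True/False flags, already sorted by confidence
    (True = the detection matches a ground-truth box). n_gt: number of
    ground-truth boxes. Returns all-point-interpolated AP."""
    precisions, recalls = [], []
    tp = 0
    for i, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / i)      # p = tp / (tp + fp)
        recalls.append(tp / n_gt)      # r = tp / (tp + fn)
    # interpolate: at each recall level use the max precision to its right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)         # area under the p-r curve
        prev_r = r
    return ap

print(average_precision([True, True, False, True], n_gt=4))  # 0.6875
```

The mAP reported in the experiments is this quantity averaged over the four target classes.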
the simulation experiment uses an underwater image from a URPC2020 official data set, wherein the data set is from a national underwater robot major optical image game, comprises 7600 underwater real optical images with marks, comprises color cast, weak contrast and the like caused by various illumination conditions, and has four targets of sea urchins, starfish, scallops and sea cucumbers, and complex detection scenes such as target overlapping and shielding. The Python3.7 and Pytorch frameworks were used to implement on a server with 2 NVIDIARTX2080TI GPUs (11 GB memory).
The data set is divided into a training set, a verification set and a test set, and the dividing ratio is 7:2:1.
Table 1 compares the underwater target detection accuracy, model size and detection speed of YOLOV5 and the method of the invention on the test images.
Table 1. Comparison of underwater target detection precision and model parameters in the simulation experiment
Quantitative analysis of Table 1 shows that the method can effectively detect underwater targets and has remarkable advantages over the YOLOV5L algorithm: the average precision is improved by 3.2% and the model size is reduced by 53.1%.
A degraded image enhancement based lightweight underwater target detection system, comprising:
An image processing module: used for receiving the image enhancement dataset UIE and the underwater target detection dataset URPC, preprocessing them, and dividing the preprocessed UIE and URPC into a training set, a verification set and a test set;
An image enhancement module: used for inputting the preprocessed UIE underwater images into a pre-established underwater imaging model for enhancement to obtain clear underwater images, and for inputting the URPC images into the pre-established underwater imaging model to obtain enhanced images;
A storage module: used for saving the network weights that produce the best enhancement effect as a weight file;
an image generation module: the method is used for inputting the weight file, the training set, the verification set and the test set into a pre-established lightweight underwater target detection model, finally outputting an image containing an underwater target detection frame, identifying and marking targets in the underwater target detection frame, and calculating average precision.
Based on the same inventive concept, the present invention also provides a computer apparatus comprising one or more processors and a memory for storing one or more computer programs; each program includes program instructions, and the processor is configured to execute the program instructions stored in the memory. The processor may be a central processing unit (CPU), or another general-purpose processor such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. As the computational and control core of the terminal, the processor loads and executes one or more instructions within a computer storage medium to implement the method described above.
It should be further noted that, based on the same inventive concept, the present invention also provides a computer storage medium having a computer program stored thereon which, when executed by a processor, performs the above method. The storage medium may take the form of any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features, and advantages of the present disclosure. It will be understood by those skilled in the art that the present disclosure is not limited to the embodiments described above; the foregoing embodiments and description merely illustrate the principles of the disclosure, and various changes and modifications may be made without departing from the spirit and scope of the disclosure, which is defined by the appended claims.
Claims (10)
1. A lightweight underwater target detection method based on degradation image enhancement, characterized by comprising the following steps:
receiving an image enhancement dataset UIE and an underwater target detection dataset URPC, preprocessing both, and dividing the preprocessed UIE and URPC into a training set, a validation set, and a test set;
embedding a pre-established underwater imaging model into a UWCNN-SD network, inputting the underwater images of the preprocessed UIE dataset into the UWCNN-SD network for training, training the network on pairs of original underwater images and reference images, storing the trained weights, loading the weights into the UWCNN-SD network, and inputting the URPC dataset into the trained UWCNN-SD network to obtain clear underwater images;
inputting the clear underwater images and the training, validation, and test sets into a pre-established lightweight underwater target detection model, outputting images containing underwater target detection boxes, identifying and labeling the targets inside the boxes, and computing the average precision to obtain the detection result.
2. The degradation image enhancement based lightweight underwater target detection method of claim 1, wherein the UIE dataset comprises 950 pairs of underwater images and reference images, the reference images being clear underwater images without color cast.
3. The degradation image enhancement based lightweight underwater target detection method of claim 1, wherein the URPC dataset comprises 7600 pictures and corresponding label files, each label file containing label boxes, the position information of each box, and the true category of the box contents.
4. The degradation image enhancement based lightweight underwater target detection method of claim 1, wherein the underwater imaging model applies a discrete cosine transform to the underwater image to separate it into a high-frequency part and a low-frequency part; a CNN network and a loss function are then built and the underwater imaging model is embedded into the network to eliminate color deviation in the low-frequency part and highlight texture details in the high-frequency part; finally, the two parts are fused to output a clear underwater image.
5. The degradation image enhancement based lightweight underwater target detection method of claim 4, wherein the underwater imaging model is as follows:
$$I_\lambda(x) = J_\lambda(x)\,t_\lambda(x) + A_\lambda\bigl(1 - t_\lambda(x)\bigr)$$
where $I_\lambda(x)$ is the captured underwater image, $J_\lambda(x)$ is the clear image to be recovered, $t_\lambda(x)$ is the transmittance, $A_\lambda$ is the global background light, and $\lambda$ indexes the RGB channels;
converting the atmospheric scattering model:
$$J_\lambda(x) = K_\lambda(x)\,I_\lambda(x) - K_\lambda(x) + 1$$
where $t_\lambda(x)$ and $A_\lambda$ are combined into the single variable $K_\lambda(x)$; the underwater image is then separated by a discrete cosine transform into a high-frequency component and a low-frequency component:
$$I_\lambda(x) = I_\lambda^{LF}(x) + I_\lambda^{HF}(x)$$
$$J_\lambda^{LF}(x) = K_\lambda(x)\,I_\lambda^{LF}(x) - K_\lambda(x) + 1$$
$$J_\lambda^{HF}(x) = K_\lambda(x)\,I_\lambda^{HF}(x) - K_\lambda(x) + 1$$
where LF denotes the low-frequency component and HF the high-frequency component; a CNN network is constructed, the underwater images in the UIE dataset are input into the network for training to learn the parameter $K_\lambda(x)$, and $K_\lambda(x)$ is substituted into the underwater imaging model to solve inversely for the clear underwater image.
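Once $K_\lambda(x)$ has been learned, the inverse solve of claim 5 is a pointwise operation. A minimal sketch (the function name is hypothetical, and `K` stands for the network's predicted $K_\lambda(x)$):

```python
import numpy as np

def recover_clear_image(I, K):
    """Invert the converted model J = K*I - K + 1 per pixel and channel."""
    J = K * I - K + 1.0
    return np.clip(J, 0.0, 1.0)  # keep the result in a valid intensity range
```

Note that, by the formula itself, K = 1 leaves the image unchanged (J = I), while smaller K pushes the output toward the background-light value of 1.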
6. The degradation image enhancement-based lightweight underwater target detection method of claim 5, wherein the CNN network training process is as follows:
the network is trained by minimizing a loss function; the network parameters are initialized with a Gaussian distribution and optimized with the Adam optimizer; the learned weights are stored and loaded into a test file, and the underwater images in the URPC dataset are input into the test file to obtain enhanced underwater target detection images; the loss function is as follows:
$$\mathcal{L} = 1 - \mathrm{SSIM}\bigl(J_\lambda(x), I_\lambda(x)\bigr), \qquad \mathrm{SSIM}(J, I) = \frac{(2\mu_J\mu_I + C_1)(2\sigma_{JI} + C_2)}{(\mu_J^2 + \mu_I^2 + C_1)(\sigma_J^2 + \sigma_I^2 + C_2)}$$
where $\mu$ and $\sigma$ denote the mean and standard deviation of the grayscale images $J_\lambda(x)$ and $I_\lambda(x)$, $\sigma_{JI}$ denotes their covariance, $C_1 = (K_1 + L)^2$ and $C_2 = (K_2 + L)^2$, with $K_1 = 0.01$, $K_2 = 0.03$, and $L = 1$.
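The constants in claim 6 match the standard structural-similarity (SSIM) measure; under that reading, a minimal NumPy sketch of the loss (function names are assumptions, not from the patent):

```python
import numpy as np

def ssim(J, I, K1=0.01, K2=0.03, L=1.0):
    """Global SSIM between two grayscale images, claim-6 constants."""
    C1, C2 = (K1 + L) ** 2, (K2 + L) ** 2   # constants as written in claim 6
    mu_j, mu_i = J.mean(), I.mean()
    var_j, var_i = J.var(), I.var()
    cov = ((J - mu_j) * (I - mu_i)).mean()  # covariance of the two images
    return ((2 * mu_j * mu_i + C1) * (2 * cov + C2)) / \
           ((mu_j ** 2 + mu_i ** 2 + C1) * (var_j + var_i + C2))

def ssim_loss(J, I):
    """Loss minimized during training: identical images give zero loss."""
    return 1.0 - ssim(J, I)
```

This global variant computes one SSIM value over the whole image; practical implementations usually average SSIM over local windows instead.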
7. The degradation image enhancement based lightweight underwater target detection method of claim 1, wherein the lightweight underwater target detection model replaces the feature extraction network in YOLOv5 with GhostNet lightweight modules to extract three feature maps of different sizes, adds a CA attention mechanism to the neck, and then inputs the three feature maps of different scales into three classification-regression layers for prediction.
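The GhostNet idea referenced in claim 7 is to produce part of the output channels with an ordinary convolution and the rest with cheap per-channel operations. A structural sketch on a (C, H, W) array, where the callables `primary_conv` and `cheap_op` are placeholders for the real learned layers:

```python
import numpy as np

def ghost_module(x, primary_conv, cheap_op):
    """GhostNet-style module: intrinsic features from a normal convolution,
    'ghost' features from a cheap linear transform of those features."""
    primary = primary_conv(x)                        # intrinsic feature maps
    ghost = cheap_op(primary)                        # cheap per-channel transforms
    return np.concatenate([primary, ghost], axis=0)  # stack along channels

# Toy usage: the 'convolutions' are stand-ins, not learned layers.
x = np.ones((4, 8, 8))
out = ghost_module(x, lambda t: t[:2], lambda t: 0.5 * t)
```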
8. The degradation image enhancement based lightweight underwater target detection method of claim 7, wherein the CA attention mechanism comprises the following three operations:
information embedding operation: for a given input feature map, global average pooling is applied along the horizontal and vertical directions of the feature map to obtain two embedded information feature maps; the horizontal (X) direction uses an H×1 pooling kernel, so an H×W×C input feature yields an H×1×C information feature map:
$$z_c^{h}(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$
the vertical (Y) direction uses a 1×W pooling kernel, so the H×W×C input feature yields a 1×W×C information feature map:
$$z_c^{w}(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$
attention generation operation: the two information feature maps $z_c^{h}$ and $z_c^{w}$ generated in the previous step are concatenated along the spatial dimension and passed through a 1×1 convolution and an activation function:
$$f = \delta\bigl(F_1([z^{h}, z^{w}])\bigr)$$
then $f$ is sliced along the spatial dimension into two separate feature maps, each of which is transformed and activated to obtain the two attention vectors $g^{h}$ and $g^{w}$:
$$g^{h} = \sigma\bigl(F_h(f^{h})\bigr)$$
$$g^{w} = \sigma\bigl(F_w(f^{w})\bigr)$$
feature map correction operation: the two attention vectors $g^{h} \in \mathbb{R}^{C \times H \times 1}$ and $g^{w} \in \mathbb{R}^{C \times 1 \times W}$ obtained above are broadcast to the C×H×W dimension and multiplied position-wise with the input feature map $x_c$ in a residual manner to obtain the final attention feature.
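The three CA operations above can be sketched on a single (C, H, W) array. The learned 1×1 convolutions are replaced by optional callables (identity by default), and the concatenate/slice step is folded into two independent branches for brevity, so this is a structural sketch rather than the trained module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, Fh=None, Fw=None):
    """Coordinate-attention sketch on x of shape (C, H, W)."""
    # information embedding: global average pooling along each spatial axis
    zh = x.mean(axis=2, keepdims=True)   # (C, H, 1): pool over the width
    zw = x.mean(axis=1, keepdims=True)   # (C, 1, W): pool over the height
    # attention generation: transform + activation (Fh, Fw stand in for 1x1 convs)
    gh = sigmoid(Fh(zh) if Fh else zh)   # attention vector g^h, values in (0, 1)
    gw = sigmoid(Fw(zw) if Fw else zw)   # attention vector g^w, values in (0, 1)
    # feature map correction: broadcast to (C, H, W) and reweight the input
    return x * gh * gw
```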
9. The degradation image enhancement based lightweight underwater target detection method of claim 1, wherein the average precision is calculated as follows:
precision p and recall r statistics are used to measure accuracy: precision is the ratio of true positives tp to all predicted positives tp + fp, i.e., the proportion of predictions that are real targets, and recall is the ratio of true positives tp to all actual positives tp + fn, i.e., the proportion of all targets that are detected; the average precision AP is the mean of the precisions obtained at all possible recall values:
$$p = \frac{tp}{tp + fp}, \qquad r = \frac{tp}{tp + fn}, \qquad AP = \int_0^1 p(r)\,dr$$
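A minimal all-point AP computation matching claim 9: detections are sorted by confidence and precision is integrated over recall (the function name and argument layout are assumptions for illustration):

```python
def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # by confidence
    tp = fp = 0
    ap = prev_recall = 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)          # p = tp / (tp + fp)
        recall = tp / n_ground_truth        # r = tp / (tp + fn)
        ap += precision * (recall - prev_recall)  # accumulate area under p(r)
        prev_recall = recall
    return ap
```

A detector that ranks every true target above every false alarm attains AP = 1.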
10. A lightweight underwater target detection system based on degradation image enhancement, comprising:
an image processing module, configured to receive an image enhancement dataset UIE and an underwater target detection dataset URPC, preprocess both, and divide the preprocessed UIE and URPC into a training set, a validation set, and a test set;
an image enhancement module, configured to embed a pre-established underwater imaging model into a UWCNN-SD network, input the underwater images of the preprocessed UIE dataset into the UWCNN-SD network for training, train the network on pairs of original underwater images and reference images, store the trained weights, load the weights into the UWCNN-SD network, and input the URPC dataset into the trained UWCNN-SD network to obtain clear underwater images;
an image generation module, configured to input the clear underwater images and the training, validation, and test sets into a pre-established lightweight underwater target detection model, output images containing underwater target detection boxes, identify and label the targets inside the boxes, and compute the average precision.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310366420.0A CN116543295A (en) | 2023-04-07 | 2023-04-07 | Lightweight underwater target detection method and system based on degradation image enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116543295A true CN116543295A (en) | 2023-08-04 |
Family
ID=87449698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310366420.0A Pending CN116543295A (en) | 2023-04-07 | 2023-04-07 | Lightweight underwater target detection method and system based on degradation image enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116543295A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912675A (en) * | 2023-09-13 | 2023-10-20 | Jilin University | Underwater target detection method and system based on feature migration |
CN116912675B (en) * | 2023-09-13 | 2023-11-28 | Jilin University | Underwater target detection method and system based on feature migration |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188765B (en) | Image semantic segmentation model generation method, device, equipment and storage medium | |
CN109086811B (en) | Multi-label image classification method and device and electronic equipment | |
CN113033537B (en) | Method, apparatus, device, medium and program product for training a model | |
CN112801169B (en) | Camouflage target detection method, system, device and storage medium based on improved YOLO algorithm | |
CN112634209A (en) | Product defect detection method and device | |
CN113569667B (en) | Inland ship target identification method and system based on lightweight neural network model | |
CN110390340B (en) | Feature coding model, training method and detection method of visual relation detection model | |
CN113469088B (en) | SAR image ship target detection method and system under passive interference scene | |
CN110135505B (en) | Image classification method and device, computer equipment and computer readable storage medium | |
CN111914843B (en) | Character detection method, system, equipment and storage medium | |
US20200065664A1 (en) | System and method of measuring the robustness of a deep neural network | |
CN113052834A (en) | Pipeline defect detection method based on convolution neural network multi-scale features | |
CN111539456B (en) | Target identification method and device | |
CN114612832A (en) | Real-time gesture detection method and device | |
CN116543295A (en) | Lightweight underwater target detection method and system based on degradation image enhancement | |
CN111368634B (en) | Human head detection method, system and storage medium based on neural network | |
CN115546171A (en) | Shadow detection method and device based on attention shadow boundary and feature correction | |
CN114429208A (en) | Model compression method, device, equipment and medium based on residual structure pruning | |
CN116310850B (en) | Remote sensing image target detection method based on improved RetinaNet | |
CN116311004B (en) | Video moving target detection method based on sparse optical flow extraction | |
CN112614108A (en) | Method and device for detecting nodules in thyroid ultrasound image based on deep learning | |
CN114998222A (en) | Automobile differential shell surface detection method, electronic equipment and medium | |
CN113076819A (en) | Fruit identification method and device under homochromatic background and fruit picking robot | |
CN112541915A (en) | Efficient cloth defect detection method, system and equipment for high-resolution images | |
CN111369508A (en) | Defect detection method and system for metal three-dimensional lattice structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||