CN116612378A - Unbalanced data and underwater small target detection method under complex background based on SSD improvement

Info

Publication number
CN116612378A
CN116612378A CN202310578589.2A CN202310578589A CN116612378A CN 116612378 A CN116612378 A CN 116612378A CN 202310578589 A CN202310578589 A CN 202310578589A CN 116612378 A CN116612378 A CN 116612378A
Authority
CN
China
Prior art keywords
feature
ssd
network
distillation
improvement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310578589.2A
Other languages
Chinese (zh)
Inventor
于俊洋
何义茹
谷航宇
潘顺杰
辛致宜
赵宇曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN202310578589.2A
Publication of CN116612378A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/05 Underwater scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a method, based on an improved SSD (Single Shot MultiBox Detector), for detecting small underwater targets with unbalanced data under a complex background, which comprises the following steps: improving the SSD network by using VGG16 as the front-end backbone network, taking the output at conv3_3 as the first feature layer, and embedding a multidimensional pixel attention network after conv3_3; using dilated (hole) convolution with dilation rate r and ReLU activation after the multidimensional pixel attention network to generate, in sequence, the remaining feature layers used for prediction; inputting the feature maps of the generated feature layers into a joint weighted knowledge distillation and multi-scale feature distillation module; resizing the pictures to be detected in the original unbalanced underwater image data set and inputting them into the improved SSD network; and detecting small underwater targets with the improved SSD network. The application greatly improves the detection of rare categories and reduces the effect of unbalanced sample distribution on the model's detection capability.

Description

Unbalanced data and underwater small target detection method under complex background based on SSD improvement
Technical Field
The application relates to the technical field of computer vision and image processing, and in particular to a method, based on an improved SSD (Single Shot MultiBox Detector), for detecting small underwater targets with unbalanced data under a complex background.
Background
With the advent of deep convolutional neural networks, researchers have made significant progress in the task of target detection. Underwater target detection aims to locate and identify objects in an underwater scene. This research has been receiving continuous attention for its wide application in the fields of oceanography, underwater navigation, fish farming, etc. However, this is still a challenging task due to the complex underwater environment and lighting conditions.
The traditional SSD (Single Shot MultiBox Detector) algorithm is a single-stage target detection algorithm that predicts from feature maps at multiple levels, which effectively improves its ability to detect targets of different scales. It achieves good results on many generic detection data sets. However, it still has deficiencies in underwater target detection. First, underwater detection data sets are scarce, and the objects in the available data sets and in practical use are typically small. In SSD, the resolution of the feature map decreases as the network deepens, so small objects in the image cannot be detected effectively. Second, the images in real underwater data sets are cluttered. In underwater scenes, wavelength-dependent absorption and scattering significantly reduce image quality and cause problems such as loss of visibility, weak contrast, and color variation, which make it difficult for a general-purpose target detection algorithm to reach the required accuracy in underwater environments with complex backgrounds.
Disclosure of Invention
To address the low visibility of underwater images, their complex backgrounds, the small fraction of the image occupied by targets, the imbalance of samples, and the resulting difficulty of meeting accuracy requirements with the traditional SSD algorithm in underwater environments with complex backgrounds, the application provides a method, based on an improved SSD, for detecting small underwater targets with unbalanced data under a complex background. In most underwater images the background is complex and the target occupies only a small part of the image, and the multidimensional pixel attention network can suppress the influence of the complex background during detection. Finally, to handle the unbalanced distribution of sample categories in underwater images, knowledge distillation is introduced into the classifier to strengthen training on categories with few samples, which effectively improves detection accuracy.
In order to achieve the above purpose, the present application adopts the following technical scheme:
A method, based on an improved SSD, for detecting small underwater targets with unbalanced data under a complex background comprises the following steps:
step 1: using VGG16 as the front-end backbone network, taking the output at Conv3_3 of VGG16 as the first feature layer used for prediction, and embedding a multidimensional pixel attention network after Conv3_3 of the VGG16 model to eliminate the complex background of the underwater environment; using dilated (hole) convolution with dilation rate r and ReLU activation after the multidimensional pixel attention network to generate, in sequence, the remaining feature layers used for prediction; inputting the feature maps of the generated feature layers into a joint weighted knowledge distillation and multi-scale feature distillation module to improve the detection of small underwater targets in categories with few samples;
step 2: resizing the pictures to be detected in the original unbalanced underwater image data set and inputting them into the SSD network improved in step 1;
step 3: detecting small underwater targets with the SSD network improved in step 1.
Further, the feature maps of the plurality of feature layers (including the feature layer of the first layer for prediction, and the plurality of feature layers remaining to be predicted) have different sizes.
Further, the multi-dimensional pixel attention network is composed of a combination of pixel attention and CBAM attention.
Further, in the multidimensional pixel attention network, the following steps are specifically executed:
in the pixel attention network, the feature map of the first feature layer F_1 passes through an Inception-style structure with convolution kernels of different ratios, and a two-channel saliency map is then learned through a convolution operation; the two channels of the saliency map represent the foreground and background scores, respectively; a Softmax operation is then performed on the saliency map, and one of its channels is selected and multiplied with the feature map of F_1; finally, a new information feature map A_1 is obtained;
supervised learning is adopted: first, a binary map is obtained from the ground-truth labels of the sample and used as the label, and the cross-entropy loss between the binary map and the saliency map is then used as the attention loss; in addition, CBAM is used as an auxiliary attention network.
Further, in the combined weighted knowledge distillation and multi-scale feature distillation module, the following steps are specifically executed:
based on the feature maps of the generated feature layers, training is continued with instance sampling and cross-entropy loss to obtain a teacher model;
a student model ψ_{θ,ω} is then trained by adding a weighted knowledge distillation loss and a multi-scale feature distillation loss; in this process, the multi-scale features and predictions of the trained teacher model are used, which leaves room for the student model to learn and improve.
Further, the weighted knowledge distillation loss L_RKL is calculated according to the following formula:
where abs(·) denotes the absolute value function and the weight factor w_i is given by:
where t_i and s_i denote the class prediction probabilities of the teacher model and the student model, respectively, C denotes the number of classes, and N_i denotes the number of samples of class i.
Further, the multi-scale feature distillation loss L_KF is calculated according to the following formula:
where the two normalized features are obtained by normalizing, respectively, the multi-scale features v_t learned by the teacher model and the student-side features v_s, and n denotes the number of feature maps extracted by the model.
Further, in the joint weighted knowledge distillation and multi-scale feature distillation module, the final classification loss L_JWAFD is:
where L_CE denotes the cross-entropy loss between the ground-truth labels and the student model's predictions; y = (y_1, y_2, ..., y_C) ∈ R^C denotes the true label vector of an image data point, C denotes the number of classes, and y_i denotes the i-th component of y; z_s denotes the output of the student model, together with the estimated class probability that the student model outputs; the hyperparameters α and β control the respective distillation amounts; δ and γ denote the scaling parameters for multi-scale feature distillation and weighted knowledge distillation, respectively; and T denotes the temperature of the knowledge distillation.
Compared with the prior art, the application has the beneficial effects that:
In this method, the supervised multidimensional pixel attention network effectively suppresses the influence of the complex background of underwater images. For the unbalanced class distribution in underwater environments, a joint weighted knowledge distillation and multi-scale feature distillation module is provided, so that classes with few samples are recognized and detected better; compared with the original SSD algorithm, detection of rare classes is greatly improved and the effect of unbalanced sample distribution on the model's detection capability is reduced. During training, the resolution of the feature map decreases as the backbone network extracts features from the underwater image; the application therefore adds a dilated (hole) convolution layer with a dilation rate of 3, which gives the network multi-scale convolution kernels and further strengthens the model's ability to detect small objects. Overall, compared with the original SSD target detection model, the design in the application is better suited to detecting underwater images with complex backgrounds.
Drawings
FIG. 1 is a diagram of a network structure improved by an SSD-based improved unbalanced data and underwater small target detection method in a complex background;
FIG. 2 is a diagram of an SSD network structure;
FIG. 3 is a diagram of a supervised multidimensional pixel attention network architecture;
FIG. 4 is a block diagram of a CBAM module;
FIG. 5 is a schematic diagram of a hole convolution structure;
FIG. 6 is a schematic diagram of a weighted knowledge distillation and multi-scale feature distillation module architecture.
Detailed Description
The application is further illustrated by the following description of specific embodiments in conjunction with the accompanying drawings:
The application provides a method, based on an improved SSD (Single Shot MultiBox Detector), for detecting small underwater targets with unbalanced data under a complex background. The network structure of the application is shown in FIG. 1, and the method comprises the following steps:
S101: improving the SSD network (the original network is shown in FIG. 2), comprising: the front-end feature extraction layers follow the architecture of the standard VGG16 model (truncated at the conv3_3 layer), and its output is used as the first feature layer F_1 for prediction. A multidimensional pixel attention network (MDA-Net, shown in FIG. 3), composed of a combination of pixel attention and CBAM attention (shown in FIG. 4), is embedded after conv3_3 to eliminate the complex background of the underwater environment. After this, dilated (hole) convolution with dilation rate r (in this embodiment, r=3) and ReLU activation (shown in FIG. 5) is used to generate, in sequence, the remaining feature layers used for prediction (in this embodiment, 6 feature layers F_2, F_3, F_4, F_5, F_6, F_7). This achieves a larger receptive field without sacrificing feature map resolution (a large receptive field yields strong semantics). The feature maps of the 7 feature layers have different sizes, which allows targets of different sizes to be predicted effectively. Finally, to address sample imbalance in underwater target detection, a joint weighted knowledge distillation and multi-scale feature distillation module (shown in FIG. 6) is used, which greatly improves performance on classes with few samples.
S102: resizing the pictures to be detected in the original unbalanced underwater image data set and inputting them into the SSD network improved in S101;
S103: detecting small underwater targets with the SSD network improved in S101.
For a better understanding of the present application, the present application will be specifically described by:
(1) Supervised multidimensional pixel attention network for eliminating complex background effects
To capture small underwater objects in complex backgrounds more effectively, we introduce a supervised multidimensional attention network (MDA-Net) into the original SSD network. Specifically, in the pixel attention network, the feature map of feature layer F_1 passes through an Inception-style structure with convolution kernels of different ratios, and a two-channel saliency map is then learned through a convolution operation. The two channels of the saliency map represent the foreground and background scores, respectively. A Softmax operation is then performed on the saliency map, and one of its channels is selected and multiplied with the feature map of F_1. Finally, a new information feature map A_1 is obtained. Note that the saliency map values after the Softmax function lie in [0, 1], so the map reduces noise and relatively enhances target information. Because the saliency map is continuous, non-object information is not completely eliminated, which helps preserve some context information and improves robustness. To guide network learning, we use supervised learning: a binary map is first derived from the ground-truth labels of the sample and used as the label, and the cross-entropy loss between the binary map and the saliency map is then used as the attention loss. In addition, we use CBAM with a reduction ratio of 16 as an auxiliary attention network.
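The pixel attention step described above (Softmax over a two-channel saliency map, then multiplying one channel with the feature map of F_1) can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function name `pixel_attention`, the nested-list tensor layout, and the choice of channel 0 as the foreground channel are all assumptions.

```python
import math

def pixel_attention(feature, saliency_logits, fg_channel=0):
    """Weight a feature map by the softmax of a two-channel saliency map.

    feature:         nested lists shaped [C][H][W]
    saliency_logits: nested lists shaped [2][H][W] (foreground/background scores)
    fg_channel:      which saliency channel is treated as foreground (assumption)
    """
    h, w = len(saliency_logits[0]), len(saliency_logits[0][0])
    attn = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a, b = saliency_logits[0][y][x], saliency_logits[1][y][x]
            m = max(a, b)  # subtract the max for numerical stability
            ea, eb = math.exp(a - m), math.exp(b - m)
            probs = (ea / (ea + eb), eb / (ea + eb))
            attn[y][x] = probs[fg_channel]
    # Broadcast the single attention channel over all feature channels.
    attended = [[[ch[y][x] * attn[y][x] for x in range(w)] for y in range(h)]
                for ch in feature]
    return attended, attn

# One feature channel, two pixels: an undecided pixel and a confident foreground pixel.
feature = [[[2.0, 2.0]]]
logits = [[[0.0, 3.0]], [[0.0, -3.0]]]
attended, attn = pixel_attention(feature, logits)
```

Because the attention values stay strictly between 0 and 1, background responses are attenuated rather than zeroed, which matches the point above about preserving some context.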
(2) Hole convolution
As the backbone network extracts features from the underwater image, the resolution of the feature map keeps decreasing. Convolution preserves a small number of key features in the data to reduce learning and training costs. Large convolution kernels favor the detection of large targets, while smaller kernels favor the detection of small targets. The model adds a dilated (hole) convolution layer with a dilation rate of 3, which gives the network multi-scale convolution kernels and further strengthens the model's ability to detect objects of different sizes. In addition, a dilated convolution with a relatively small dilation rate is more effective at extracting information from low-resolution feature maps.
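To make the effect of the dilation rate concrete, the sketch below computes the effective extent of a dilated kernel and applies a one-dimensional dilated convolution. It is an illustration only: the helper names are hypothetical, and the 1D case stands in for the 2D layers of the network. A kernel of size k with dilation r spans k + (k - 1)(r - 1) input positions.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input positions spanned by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1D dilated convolution, written as cross-correlation."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

# A 3-tap kernel with dilation 3 covers 7 input positions with only 3 weights.
extent = effective_kernel_size(3, 3)
outputs = dilated_conv1d(list(range(10)), [1, 1, 1], 3)
```

With r = 3, a 3×3 kernel therefore sees a 7×7 neighborhood at the cost of nine weights, which is how the receptive field grows without reducing feature map resolution.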
(3) Combined weighted knowledge distillation and multi-scale feature distillation module
Unlike the original SSD, class prediction here must account for the class imbalance in underwater target detection: a classifier trained on the original data is biased toward classes with many samples, and the remaining classes tend to be misclassified. The application therefore designs and implements a new two-stage method for training the classifier, called the joint weighted knowledge distillation and multi-scale feature distillation module. The module first trains a teacher model on the class data of the training samples; then, in the prediction stage, the teacher model guides the class prediction process so that a more balanced classifier is learned, which benefits the detection performance of the final model.
The module consists of a weighted knowledge distillation module and a feature distillation module. The weighted knowledge distillation module takes class priors into account and assigns different weights to different classes, so that the model focuses on classes with fewer samples. The feature distillation module compensates for the insufficient feature representation caused by the weighted knowledge distillation method. The module is suited to data sets with unbalanced class sizes and combines the advantages of re-weighting with those of standard knowledge distillation.
The weighted knowledge distillation loss L_RKL is calculated as follows:
where abs(·) denotes the absolute value function and the weight factor w_i is calculated by the following formula; this effectively increases the weight of classes with fewer samples and reduces the negative gradients that classes with more samples impose on the other classes.
where t_i and s_i denote the class prediction probabilities of the teacher model and the student model, respectively, C denotes the number of classes, and N_i denotes the number of samples of class i.
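The exact expressions for L_RKL and w_i are not reproduced in this text, so the sketch below only illustrates the general idea of a class-weighted distillation term. Everything in it is an assumption made for illustration: the inverse-frequency form of `class_weights`, the normalization to mean 1, and the weighted absolute Kullback-Leibler term in `weighted_kd_loss` are one plausible reading, not the patent's formula.

```python
import math

def class_weights(counts):
    """Hypothetical inverse-frequency weights, scaled so that they average to 1."""
    inverse = [1.0 / n for n in counts]
    total = sum(inverse)
    return [len(counts) * v / total for v in inverse]

def weighted_kd_loss(teacher_probs, student_probs, counts, eps=1e-12):
    """Assumed form: sum_i w_i * abs(t_i * log(t_i / s_i))."""
    weights = class_weights(counts)
    return sum(
        w * abs(t * math.log((t + eps) / (s + eps)))
        for w, t, s in zip(weights, teacher_probs, student_probs)
    )

# Two classes with 1000 and 10 samples: the rare class gets the larger weight.
weights = class_weights([1000, 10])
matched = weighted_kd_loss([0.5, 0.5], [0.5, 0.5], [1000, 10])
mismatched = weighted_kd_loss([0.5, 0.5], [0.9, 0.1], [1000, 10])
```

Under this assumed weighting, a disagreement between teacher and student on a rare class costs more than the same disagreement on a frequent class, which is the behavior the text attributes to w_i.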
The multi-scale feature distillation loss L_KF is calculated as follows:
where the two normalized features are obtained by normalizing, respectively, the multi-scale features v_t learned by the teacher model and the student-side features v_s, and n denotes the number of feature maps extracted by the model; in this embodiment, n=5.
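As with the previous loss, the formula for L_KF is not reproduced in this text. The sketch below shows one common way to compare normalized teacher and student features and should be read as an assumption: L2 normalization and a squared-distance term averaged over the n feature maps are illustrative choices, and the function names are hypothetical.

```python
import math

def l2_normalize(vector, eps=1e-12):
    """Scale a flattened feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / (norm + eps) for x in vector]

def feature_distillation_loss(teacher_feats, student_feats):
    """Assumed form: mean over n feature maps of squared distance between normalized features."""
    n = len(teacher_feats)
    total = 0.0
    for v_t, v_s in zip(teacher_feats, student_feats):
        n_t, n_s = l2_normalize(v_t), l2_normalize(v_s)
        total += sum((a - b) ** 2 for a, b in zip(n_t, n_s))
    return total / n

# Same directions at different magnitudes give (almost) zero loss after normalization.
aligned = feature_distillation_loss([[1.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 1.0]])
orthogonal = feature_distillation_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

Normalizing first makes the term depend on the direction of the features rather than their scale, so the student is not penalized merely for producing smaller or larger activations than the teacher.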
Some mispredictions of the teacher model give the student model erroneous guidance. The application therefore also includes the cross-entropy loss when training the student network, so that the student model has the opportunity to learn the true labels of the samples. The student model is finally trained by minimizing the sum of these three losses, and on this basis the classification loss of the original SSD target detection algorithm is redesigned as:
where L_CE denotes the cross-entropy loss between the ground-truth labels and the student model's predictions; y = (y_1, y_2, ..., y_C) ∈ R^C denotes the true label vector of an image data point, C denotes the number of classes, and y_i denotes the i-th component of y; z_s denotes the output of the student model, together with the estimated class probability that the student model outputs; the hyperparameters α and β control the respective distillation amounts; δ and γ denote the scaling parameters of multi-scale feature distillation and weighted knowledge distillation, respectively; t_i and s_i denote the prediction probabilities of the teacher model and the student model, respectively; and T denotes the temperature of the knowledge distillation.
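How T enters L_JWAFD is not shown in this text. In standard knowledge distillation the temperature divides the logits before the Softmax, and the sketch below illustrates only that standard usage; whether the patent applies T in exactly this way is an assumption, and the function name is hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / T; a larger T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# The same logits at T = 1 and T = 4.
sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
soft = softmax_with_temperature([2.0, 1.0, 0.0], temperature=4.0)
```

A higher temperature keeps the class ranking but spreads probability mass toward the less likely classes, which exposes more of the teacher's information about them to the student.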
As a specific implementation, the method, based on an improved SSD, for detecting small underwater targets with unbalanced data under a complex background comprises the following steps:
Step 1: the picture to be detected in the original unbalanced underwater image data set is resized to 512×512×3 and input into the network.
Step 2: the improved SSD network is as follows: VGG-16 (truncated at the Conv3_3 layer) is used as the front-end backbone network, and feature layer F_1 is taken at Conv3_3; the feature map of F_1 has size 64×64×512 and is processed by the supervised multidimensional pixel attention network to remove the complex, noisy background and obtain a cleaner feature map A_1.
Step 3: after A_1, a dilated (hole) convolution block with a dilation rate of 3 is used to obtain more effective features, followed by a 3×3 convolution and a max pooling operation to obtain feature layer F_2, whose feature map size is 32×32×1024; similar operations then yield the remaining 5 prediction feature layers F_3 (feature map size 16×16×512), F_4 (8×8×256), F_5 (4×4×256), F_6 (2×2×256), and F_7 (1×1×256). Finally, class prediction and position offset prediction are performed on the 7 feature layers to obtain confidences and coordinates, respectively; in the prediction process the features are fed into a 3×3 convolution, each pixel of F_1, F_6, and F_7 generates 4 prior boxes, and each pixel of F_2, F_3, F_4, and F_5 generates 6 prior boxes.
Step 4: a teacher model is first trained on the class data of the training samples, and the teacher model then guides the class prediction process in the prediction stage so that a more balanced classifier is learned, completing the detection of small underwater targets.
Further, the step 4 includes:
Step 4.1: based on the feature maps of the generated feature layers, training is continued with instance sampling and cross-entropy loss to obtain a teacher model.
Step 4.2: a student model ψ_{θ,ω} is then trained by adding a weighted knowledge distillation loss and a multi-scale feature distillation loss; in this process, the multi-scale features and predictions of the trained teacher model are fully used, which leaves room for the student model to learn and improve.
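The prior box layout given in Step 3 can be tallied with a short calculation. The sketch assumes F_1 has a 64×64 spatial size (consistent with F_2 being 32×32 after pooling); the variable and function names are illustrative.

```python
# Spatial side length of each prediction feature map, F_1 through F_7.
feature_map_sides = [64, 32, 16, 8, 4, 2, 1]
# Prior boxes generated per pixel: 4 for F_1, F_6, F_7 and 6 for F_2 through F_5.
boxes_per_location = [4, 6, 6, 6, 6, 4, 4]

def count_prior_boxes(sides, per_location):
    """Total prior boxes = sum over layers of (side * side * boxes per pixel)."""
    return sum(side * side * k for side, k in zip(sides, per_location))

total_priors = count_prior_boxes(feature_map_sides, boxes_per_location)
f1_share = 64 * 64 * 4 / total_priors  # fraction of priors on the finest layer
```

Under these assumptions the network evaluates 24564 prior boxes per image, about two thirds of them on F_1, the highest-resolution layer and the one most relevant to small targets.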
To verify the effect of the application, the following experiments were performed:
(a) Experimental configuration and data set
The experiments used the Ubuntu 20.04 operating system, an NVIDIA Tesla V100 GPU, and 32 GB of memory. Training, validation, and testing were carried out on the URPC2019 and ChinaMM data sets using the PyTorch framework. The batch size is set to 16, the input picture size to 512×512, and the initial learning rate to 0.001; after each detector is trained for 80 epochs, the learning rate is reduced to 0.0001 and training continues for another 40 epochs; the momentum is set to 0.9, and training runs for 120 epochs in total.
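The learning rate schedule described above is a single step decay, which can be written as a small function. The function name and the zero-based epoch indexing are assumptions made for illustration.

```python
def learning_rate(epoch, base_lr=0.001, drop_epoch=80, factor=0.1):
    """Step schedule: base_lr for the first 80 epochs, then base_lr * 0.1."""
    return base_lr if epoch < drop_epoch else base_lr * factor

# Learning rate for each of the 120 training epochs (epochs numbered from 0).
schedule = [learning_rate(epoch) for epoch in range(120)]
```

The first 80 entries of `schedule` hold the initial rate and the last 40 hold the reduced rate, matching the 80 + 40 epoch split in the experimental configuration.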
The URPC data set is an image data set of sea cucumbers, sea urchins, scallops, and other seafood, established by the National Natural Science Foundation of China for underwater robot target grasping. It is a public image data set used mainly for underwater image detection tasks. The application uses the URPC2019 data set; because its test set is not public, the URPC2019 training set is divided into 3409 training images and 1000 test images covering four object categories: sea cucumbers, sea urchins, scallops, and starfish. The ChinaMM data set, from an underwater image enhancement competition, contains 2071 training images and 676 validation images in total.
All data sets have the problem of class imbalance, i.e. scallops and starfish contain much more data than sea cucumbers and sea urchins.
(b) Evaluation index
The application uses three indexes commonly used in target detection tasks: the precision-recall (P-R) curve, average precision (AP), and mean average precision (mAP). Precision and recall are calculated as follows:
$$P = \frac{T_{TP}}{T_{TP} + F_{FP}}, \qquad R = \frac{T_{TP}}{T_{TP} + F_{FN}}$$
where T_TP denotes correct predictions; F_FP denotes mispredictions, including detecting objects that are not sea cucumbers as sea cucumbers and missed detections; F_FN denotes cases in which a sea cucumber target is erroneously detected as another type; P is the precision; and R is the recall. The area enclosed by the P-R curve and the coordinate axes equals the AP value. In particular, to evaluate the imbalance problem in the underwater data sets, AP_r, AP_m, and AP_f are used to evaluate the performance on rare, common, and frequent categories, respectively. Finally, the mAP is obtained by averaging the AP values of all classes and is generally used to evaluate the detection performance of the whole target detection network model.
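The evaluation indexes can be illustrated with a short sketch. Precision and recall follow the usual definitions from true positives, false positives, and false negatives; AP is computed here as the area under the raw P-R curve with a simple rectangle rule, which is an assumption, since benchmark protocols often add interpolation. The function names and the toy detections are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(scored_detections, num_ground_truth):
    """Area under the P-R curve; input is a list of (confidence, is_true_positive)."""
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap

def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)

p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
m_ap = mean_average_precision([0.5, 1.0])
```

Grouping the per-class AP values by class frequency before averaging gives the AP_r, AP_m, and AP_f figures used to examine the imbalance problem.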
Finally, based on the experimental setting, compared with an original SSD target detection model, the design in the application is more suitable for detecting the underwater image with the complex background, and the small target detection effect is better.
In summary, the application provides a method, based on an improved SSD, for detecting small underwater targets with unbalanced data under a complex background, in which the supervised multidimensional pixel attention network effectively suppresses the influence of the complex background of underwater images. For the unbalanced class distribution in underwater environments, a joint weighted knowledge distillation and multi-scale feature distillation module is provided, so that classes with few samples are recognized and detected better; compared with the original SSD algorithm, detection of rare classes is greatly improved and the effect of unbalanced sample distribution on the model's detection capability is reduced. During training, the resolution of the feature map decreases as the backbone network extracts features from the underwater image; the application therefore adds a dilated (hole) convolution layer with a dilation rate of 3, which gives the network multi-scale convolution kernels and further strengthens the model's ability to detect small objects. Overall, compared with the original SSD target detection model, the design in the application is better suited to detecting underwater images with complex backgrounds.
The foregoing is merely illustrative of the preferred embodiments of this application, and it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of this application, and it is intended to cover such modifications and changes as fall within the true scope of the application.

Claims (8)

1. A method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD, characterized by comprising the following steps:
step 1: improving the SSD network, comprising: using VGG16 as the front-end backbone network and taking the output of Conv3_3 of VGG16 as the first prediction feature layer; embedding a multidimensional pixel attention network after Conv3_3 of the VGG16 model to eliminate the complex background of the underwater environment; applying dilated convolution with dilation rate r, followed by ReLU activation, after the multidimensional pixel attention network to generate in sequence the remaining feature layers to be predicted; and inputting the feature maps corresponding to the generated feature layers into a joint weighted knowledge distillation and multi-scale feature distillation module, so as to improve the detection of small underwater targets in classes with few samples;
step 2: resizing the images to be detected in the original unbalanced underwater image dataset and inputting them into the SSD network improved in step 1;
step 3: detecting small underwater targets with the SSD network improved in step 1.
2. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 1, wherein the feature maps of the plurality of feature layers have different sizes.
3. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 1, wherein the multidimensional pixel attention network consists of a combination of pixel attention and CBAM attention.
4. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 3, wherein the multidimensional pixel attention network specifically performs the following steps:
in the pixel attention network, the feature map of the first feature layer F_1 is passed through an Inception structure with convolution kernels of different ratios, and a two-channel saliency map is then learned through a convolution operation, the two channels representing the foreground and background scores respectively; a Softmax operation is then applied to the saliency map, and one of its channels is selected and multiplied with the feature map of F_1; finally, a new information feature map A_1 is obtained;
supervised learning is adopted: a binary map is first derived from the ground-truth annotations of the samples and used as the label, and the cross-entropy loss between the binary map and the saliency map is then used as the attention loss; in addition, CBAM is used as an auxiliary attention network.
5. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 1, wherein the joint weighted knowledge distillation and multi-scale feature distillation module specifically performs the following steps:
based on the feature maps corresponding to the generated feature layers, training is performed with instance sampling and cross-entropy loss to obtain a teacher model;
a student model ψ_{θ,ω} is then retrained with an added weighted knowledge distillation loss and multi-scale feature distillation loss; in this process the multi-scale features and predictions of the trained teacher model are used, which leaves room for learning a better student model.
6. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 5, wherein the weighted knowledge distillation loss L_RKL is calculated according to the following formula:
[formula not reproduced in this text]
where abs() denotes the absolute value function and the weight factor w_i is:
[formula not reproduced in this text]
where t_i and s_i denote the class prediction probabilities of the teacher model and the student model respectively, C denotes the number of classes, and N_i denotes the number of samples of class i.
7. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 6, wherein the multi-scale feature distillation loss L_KF is calculated according to the following formula:
[formula not reproduced in this text]
where the two normalized features are obtained by normalizing, respectively, the multi-scale feature v_t learned by the teacher model and the student-side feature v_s, and n denotes the number of feature maps extracted by the model.
8. The method for detecting small underwater targets under unbalanced data and complex backgrounds based on an improved SSD according to claim 7, wherein the final classification loss L_JWAFD of the joint weighted knowledge distillation and multi-scale feature distillation module is obtained as follows:
[formula not reproduced in this text]
where L_CE denotes the cross-entropy loss between the ground-truth labels and the student model's predictions, y denotes the true label vector of an image data point, C denotes the number of classes, y_i denotes the i-th component of y, z_s denotes the output of the student model and a companion symbol denotes the estimated class probabilities it outputs, the hyperparameters α and β control the respective distillation amounts, δ and γ denote the scaling parameters for multi-scale feature distillation and weighted knowledge distillation respectively, and T denotes the temperature of the knowledge distillation.
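The pixel attention step of claim 4 and its supervised attention loss can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the nested-list tensor layout, the choice of the foreground channel as the one multiplied with F_1, and all function names are hypothetical, and the Inception structure and the convolution that produce the saliency scores are omitted.

```python
import math


def softmax2(bg_score: float, fg_score: float):
    """Numerically stable softmax over the two saliency channels."""
    peak = max(bg_score, fg_score)
    e_bg = math.exp(bg_score - peak)
    e_fg = math.exp(fg_score - peak)
    total = e_bg + e_fg
    return e_bg / total, e_fg / total


def pixel_attention(features, saliency_bg, saliency_fg):
    """Reweight every channel of `features` by the foreground probability.

    features: list of channels, each an H x W nested list (assumed layout).
    saliency_bg, saliency_fg: H x W score maps (the two saliency channels).
    Returns the attended feature map and the foreground probability map.
    """
    height, width = len(saliency_fg), len(saliency_fg[0])
    fg_prob = [[softmax2(saliency_bg[i][j], saliency_fg[i][j])[1]
                for j in range(width)] for i in range(height)]
    attended = [[[channel[i][j] * fg_prob[i][j] for j in range(width)]
                 for i in range(height)] for channel in features]
    return attended, fg_prob


def attention_loss(fg_prob, binary_mask, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between the mask label and the saliency map."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(fg_prob, binary_mask):
        for p, y in zip(prob_row, mask_row):
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count


# Toy input: one channel, one row, two pixels (values chosen for illustration).
toy_features = [[[2.0, 4.0]]]
toy_bg = [[0.0, 0.0]]
toy_fg = [[0.0, 10.0]]
toy_attended, toy_prob = pixel_attention(toy_features, toy_bg, toy_fg)
toy_loss = attention_loss(toy_prob, [[0, 1]])
```

In this toy input the second pixel has a much higher foreground score, so its feature value is kept almost unchanged while the first pixel, which is undecided, is halved. A binary mask that marks the second pixel as target yields a lower attention loss than the opposite mask.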
CN202310578589.2A 2023-05-22 2023-05-22 Unbalanced data and underwater small target detection method under complex background based on SSD improvement Pending CN116612378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310578589.2A CN116612378A (en) 2023-05-22 2023-05-22 Unbalanced data and underwater small target detection method under complex background based on SSD improvement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310578589.2A CN116612378A (en) 2023-05-22 2023-05-22 Unbalanced data and underwater small target detection method under complex background based on SSD improvement

Publications (1)

Publication Number Publication Date
CN116612378A true CN116612378A (en) 2023-08-18

Family

ID=87677640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310578589.2A Pending CN116612378A (en) 2023-05-22 2023-05-22 Unbalanced data and underwater small target detection method under complex background based on SSD improvement

Country Status (1)

Country Link
CN (1) CN116612378A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823864A (en) * 2023-08-25 2023-09-29 锋睿领创(珠海)科技有限公司 Data processing method, device, equipment and medium based on balance loss function
CN117173550A (en) * 2023-08-22 2023-12-05 中国科学院声学研究所 Method and system for detecting underwater small target of synthetic aperture sonar image

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation
CN111860693A (en) * 2020-07-31 2020-10-30 元神科技(杭州)有限公司 Lightweight visual target detection method and system
CN111898617A (en) * 2020-06-29 2020-11-06 南京邮电大学 Target detection method and system based on attention mechanism and parallel void convolution network
US10970598B1 (en) * 2020-05-13 2021-04-06 StradVision, Inc. Learning method and learning device for training an object detection network by using attention maps and testing method and testing device using the same
CN112860183A (en) * 2021-01-07 2021-05-28 西安交通大学 Multisource distillation-migration mechanical fault intelligent diagnosis method based on high-order moment matching

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation
US10970598B1 (en) * 2020-05-13 2021-04-06 StradVision, Inc. Learning method and learning device for training an object detection network by using attention maps and testing method and testing device using the same
CN111898617A (en) * 2020-06-29 2020-11-06 南京邮电大学 Target detection method and system based on attention mechanism and parallel void convolution network
CN111860693A (en) * 2020-07-31 2020-10-30 元神科技(杭州)有限公司 Lightweight visual target detection method and system
CN112860183A (en) * 2021-01-07 2021-05-28 西安交通大学 Multisource distillation-migration mechanical fault intelligent diagnosis method based on high-order moment matching

Non-Patent Citations (3)

Title
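LINFENG LI et al.: "Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms", NEURAL PROCESSING LETTERS, vol. 55, 3 January 2023 (2023-01-03), pages 6165 *
ZHANG Tongtong; DONG Junyu; ZHAO Haoran; LI Qiong; SUN Xin: "Lightweight phytoplankton detection network based on knowledge distillation", Journal of Applied Sciences (应用科学学报), no. 03, 30 May 2020 (2020-05-30) *
YUAN Zehao et al.: "Human pose estimation based on feature knowledge distillation", Software (软件), vol. 41, no. 12, 31 December 2020 (2020-12-31), pages 198 - 207 *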

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117173550A (en) * 2023-08-22 2023-12-05 中国科学院声学研究所 Method and system for detecting underwater small target of synthetic aperture sonar image
CN116823864A (en) * 2023-08-25 2023-09-29 锋睿领创(珠海)科技有限公司 Data processing method, device, equipment and medium based on balance loss function
CN116823864B (en) * 2023-08-25 2024-01-05 锋睿领创(珠海)科技有限公司 Data processing method, device, equipment and medium based on balance loss function

Similar Documents

Publication Publication Date Title
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN112750140B (en) Information mining-based disguised target image segmentation method
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN116612378A (en) Unbalanced data and underwater small target detection method under complex background based on SSD improvement
CN110598029A (en) Fine-grained image classification method based on attention transfer mechanism
CN110942471B (en) Long-term target tracking method based on space-time constraint
CN112085059B (en) Breast cancer image feature selection method based on improved sine and cosine optimization algorithm
CN111612008A (en) Image segmentation method based on convolution network
CN109743642B (en) Video abstract generation method based on hierarchical recurrent neural network
CN107862680B (en) Target tracking optimization method based on correlation filter
CN115131760B (en) Lightweight vehicle tracking method based on improved feature matching strategy
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN112163530B (en) SSD small target detection method based on feature enhancement and sample selection
CN113159066B (en) Fine-grained image recognition algorithm of distributed labels based on inter-class similarity
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN113408472A (en) Training method of target re-recognition model, target re-recognition method and device
CN114897782B (en) Gastric cancer pathological section image segmentation prediction method based on generation type countermeasure network
CN111123232B (en) Radar individual identification system with task adaptability
CN114627424A (en) Gait recognition method and system based on visual angle transformation
CN114049503A (en) Saliency region detection method based on non-end-to-end deep learning network
CN114065831A (en) Hyperspectral image classification method based on multi-scale random depth residual error network
CN111832463A (en) Deep learning-based traffic sign detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination