CN115170943A - Improved visual transform seabed substrate sonar image classification method based on transfer learning - Google Patents

Improved visual transform seabed substrate sonar image classification method based on transfer learning

Publication number
CN115170943A
Authority
CN
China
Prior art keywords
network
training
image
layer
transform
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210889544.2A
Other languages
Chinese (zh)
Inventor
赵玉新
郑良锋
朱可心
邓雄
卢志忠
杨德全
何健新
吴昌哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Application filed by Harbin Engineering University
Priority to CN202210889544.2A
Publication of CN115170943A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/05 Underwater scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an improved vision Transformer method for classifying seabed sediment sonar images based on transfer learning, and belongs to the field of deep learning. An image is first processed by a pre-convolution layer to form patches, and then passes in sequence through patch embedding, position embedding, a Transformer encoding layer and a multi-layer perceptron output layer to obtain a classification result. The classifier is trained by minimizing the network residual through back-propagation. Deep learning aims to learn the internal regularities and representation levels of sample data; it has a strong capacity for nonlinear fitting and can effectively discover the features shared by images of the same class. By stacking multiple encoding layers, the network gradually learns global and local features and actively attends to the 'important' parts of the image.

Description

Improved visual transform seabed substrate sonar image classification method based on transfer learning
Technical Field
The invention belongs to the fields of acoustic exploration and inversion of seabed substrate, image classification and artificial intelligence, and particularly relates to an improved vision Transformer method for classifying seabed substrate sonar images based on transfer learning.
Background
Seabed detection aims to obtain the spatial distribution of seabed substrates and the way they change over time by technical means; it is a form of pattern research based on a continuous space-time system. Sound waves propagate very well in seawater, and with the rapid development of sonar technology, seabed substrate sonar images now contain rich information about seabed landforms and substrate characteristics. Inverting the seabed landform from these images is of great significance for seabed exploration, underwater salvage and deep-sea protection. However, because of the complex underwater sound field and the limited performance of sonar equipment, sonar images may suffer from speckle noise, blurred edges and similar problems, which places stricter demands on classifier performance.
In seabed research, a variety of seabed acoustic detection technologies have been developed, and underwater acoustics plays an important role in detecting the shallow profile and even the deep stratum structure of the seabed. Because parameters such as the acoustic absorption coefficient, reflection coefficient and surface roughness differ markedly between substrate types, traditional acoustic seabed inversion identifies the substrate type from the acoustic reflection characteristics of the seabed. Although this inversion method is computationally simple, its classification accuracy is unsatisfactory in complex seabed environments.
Modern acoustic seabed sediment classification is based on sonar images acquired by a sonar system, combined with real sediment samples obtained by mechanical sampling; a classification algorithm is then used to build an automatic sediment-type analysis model, enabling efficient and accurate classification. Diesing classified 4 types of substrate sonar images using a random forest algorithm and obtained better results than the traditional target analysis and geostatistical analysis methods. Many researchers in China and abroad have used artificial neural network methods to classify and identify seabed substrate. Chakraborty et al. studied classification and identification of seabed sediment using self-organizing feature map (SOFM) and learning vector quantization (LVQ) neural networks. Tang Qiuhua et al. combined a genetic algorithm (GA) with a back-propagation (BP) neural network to automatically classify and identify substrate types such as bedrock, gravel, sand, fine sand and mud, with classification accuracies of 92.2%, 81.9%, 89.3%, 85.9% and 88.2%, respectively. However, LVQ neural networks do not make full use of their neurons and are sensitive to initial values. BP neural networks converge slowly, rely on randomly initialized parameters and easily fall into local optima; the algorithm is inefficient, and training takes a long time when the network has many layers and high complexity. Later work combined the texture and statistical features of sonar images with a convolutional neural network (CNN) to build a classifier that identifies and classifies the substrate automatically.
However, ordinary convolutional neural networks are prone to over-fitting during training and, because of how convolution works, cannot effectively capture global information or the positional information of a feature map over a large field of view. In other words, they cannot learn long-range dependencies within an image, which limits classification performance. In addition, deep learning methods need a large number of training samples to learn the common features of images, and the shortage of seabed sediment sonar images limits the classification accuracy that deep learning can achieve on them.
Disclosure of Invention
The invention aims to provide an improved vision Transformer method for classifying seabed substrate sonar images based on transfer learning.
The purpose of the invention is realized by the following technical scheme:
An improved vision Transformer seabed sediment sonar image classification method based on transfer learning comprises the following steps:
step 1: acquiring a data set for training;
acquiring sonar images of n types of seabed substrate, the substrate type of each image being known;
step 2: preprocessing the images;
the preprocessing comprises image denoising and image feature enhancement;
step 3: acquiring source domain images for pre-training;
step 4: establishing an improved vision Transformer network;
the network comprises a pre-convolution transform layer, a patch and position embedding layer, a Transformer encoding layer and a multi-layer perceptron output layer; the layers are connected to one another to form the complete improved vision Transformer network;
step 5: training the network on the source domain texture images to obtain a pre-trained model;
dividing the texture images into a training set and a verification set, inputting them into the network, and training the network with a cross-entropy loss function;
step 6: dividing the images preprocessed in step 2 into a training set and a verification set according to the proportion of 4;
step 7: inputting the training-set images and their actual classes into the network for fine-tuning, and verifying the classification accuracy on the verification set in each training round;
step 8: adjusting network parameters to obtain a trained Transformer classifier;
step 9: inputting a sonar image whose substrate type is to be determined into the classifier, which automatically outputs the classification result.
Further, the images selected for each class in step 1 belong to the same substrate type, sonar image slices of the same size are used, and the network takes input images of 100 × 100 resolution.
Further, step 2 specifically includes:
step 2.1: performing smoothing denoising and feature enhancement on the seabed substrate sonar image;
step 2.2: filtering the smoothed, denoised and feature-enhanced seabed substrate sonar image using a wavelet transform;
step 2.3: enhancing the filtered seabed substrate sonar image using a multi-stage median filtering method.
Further, in step 4, the structure of each layer is as follows:
the pre-convolution transform layer converts an input picture into a 12 × 12 × 512 feature map for the subsequent patch embedding operation, the dimension of its output tensor being 12 × 12 × 512;
the patch and position embedding layer converts the tensor output by the pre-convolution transform layer into patches that can be used by the Transformer, records the position information of each patch, and adds a category token;
the Transformer encoding layer comprises 12 encoding blocks, each containing a multi-head attention structure, and does not change the dimension of the tensor; the multi-head attention structure uses 12 heads, that is, 12 sets of q, k, v values, and performs the following calculation:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_{12})W^{O}$$
where the W matrices are transformation parameter matrices and d_k is the dimension of the key vectors; the Softmax function maps its input to the probabilities of the classes as follows:
$$\mathrm{Softmax}(h(x, y_j)) = \frac{e^{h(x, y_j)}}{\sum_{k=1}^{n} e^{h(x, y_k)}}$$
where h(x, y_j) is the prediction result for the j-th item;
the multi-layer perceptron output layer uses a fully connected layer to perform the dimension transformation and obtain the class output; the output is an n-dimensional tensor, n being the number of sample classes, from which the type of the input picture is obtained directly.
Further, in step 5, the difference between the network output and the true output is used as the basis for network training; the training-set images are divided into a plurality of batches, which are input into the network batch by batch for training; training is complete when the verification accuracy of the network no longer improves.
Further, the adjustment in step 7 is performed as follows:
the parameters of the Transformer encoding layer in the pre-trained model are copied into a new model to obtain the migrated model; the training-set images are input into the network and trained with a cross-entropy loss function, in a process similar to that of step 5; after each iteration over the training set, the classification accuracy is verified on the verification set, and the final fine-tuned model is obtained when the verification accuracy of the network no longer improves.
Further, the output of step 9 is presented as an n-dimensional vector, where n is the number of picture categories; the value at each position is the predicted probability of the corresponding category, and the position of the maximum value in the result vector gives the category output by the classifier.
The invention has the beneficial effects that:
The invention uses transfer learning to carry prior knowledge over to a new model. Pre-training in a source domain and fine-tuning in a target domain effectively improves training speed, classification accuracy and convergence speed when few samples are available. The pre-trained model can also be applied to sonar images of other seabed substrates. Through pre-training, a deep learning classifier that uses transfer learning can apply what it has learned in a similar data domain to a new target domain. The migrated model reaches a good classification result with fine-tuning alone, and the number of training samples required is greatly reduced.
The invention uses a Transformer, a neural network architecture characterized by extensive use of attention modules. The Transformer classifier adopts a multi-head attention mechanism, which captures the detail in an image that helps classification and suppresses irrelevant information. It learns useful information over a large range, is not limited by the window size of a convolutional neural network, and can therefore grasp both the global and the local features of a seabed substrate sonar image, giving it strong generalization ability. The multiple heads retain the ability of the attention mechanism to grasp important features while reducing overfitting; they select several pieces of information from the input in parallel, which gives high computational efficiency in highly parallel environments such as a GPU. The method quickly locates the regions that need attention, and multi-head attention lets the network integrate global and local features so that the distinctive features of each image class are captured. Regularization methods such as layer normalization (LayerNorm) and DropPath are used throughout the network to reduce overfitting. In addition, the network structure is adapted to the characteristics of seabed sonar images so that it can process single-channel images, solving the problem that the original vision Transformer cannot process gray-scale images. Adjusting how the input patches are constructed improves the convergence of network training on few samples. The method identifies substrate categories quickly and automatically, supports classification of sonar images in large batches, and achieves high classification accuracy with few training samples.
Drawings
FIG. 1 is a flow chart of the steps of the vision Transformer-based seabed substrate sonar image classification method of the present invention;
FIG. 2 is a schematic diagram of transfer learning employed by the present invention;
FIG. 3 is a diagram of an overall network structure of an improved Transformer network according to the present invention;
FIG. 4 is a detailed diagram of the pre-convolution transform layer in the network of the present invention;
FIG. 5 is a detailed structure diagram of a single encoding block of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention discloses an improved vision Transformer classification method, which belongs to the field of deep learning. An image is first processed by a pre-convolution layer to form patches, and then passes in sequence through patch embedding, position embedding, a Transformer encoding layer and a multi-layer perceptron output layer to obtain a classification result. The classifier is trained by minimizing the network residual through back-propagation. Deep learning aims to learn the internal regularities and representation levels of sample data; it has a strong capacity for nonlinear fitting and can effectively discover the features shared by images of the same class. By stacking multiple encoding layers, the network gradually learns global and local features and actively attends to the 'important' parts of the image.
The invention will be described in further detail with reference to the accompanying drawings based on the steps of the technical solution.
Step one: acquire a data set for training. The substrate type of every image must be known, and sonar images of n types of substrate are acquired. Because deep learning classifies by learning the features shared by images of the same class, a sufficient number of sonar images of each class is indispensable. In this step, the images selected for each class must belong to the same substrate type. To obtain consistent features, sonar image slices of the same size are used; the network takes input images of 100 × 100 resolution.
Step two: preprocess the data set images. Sonar images may have low resolution, severe noise interference and degraded edge texture, so the seabed substrate sonar images must be preprocessed. The preprocessing is as follows. First, smoothing denoising and feature enhancement are applied to the seabed substrate sonar image, where feature enhancement comprises correction enhancement and edge sharpening. Next, the smoothed and enhanced image is filtered using a wavelet transform to obtain an image with a higher signal-to-noise ratio; the filtering uses wavelet band-pass filters at different scales, meaning that the wavelet functions use different thresholds for the low-frequency and high-frequency parts. Finally, the filtered image is enhanced using a multi-level median filtering method, the enhancement covering texture and edge shape information. Filtering and feature enhancement reduce the edge blurring caused by noise from the sound field environment and the limits of the sonar equipment, and bring out the feature information of the sonar image.
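The median-filtering stage of step two can be illustrated with a single 3 × 3 pass. This is only a minimal sketch: the text does not specify window sizes, the number of stages, or the wavelet family, so the function name `median_filter_3x3`, the 3 × 3 window, and the edge-replication border handling are all assumptions made for illustration.

```python
from statistics import median

def median_filter_3x3(image):
    """Apply one 3x3 median pass to a 2-D list of gray levels.

    Border pixels are handled by clamping indices (edge replication),
    which is an assumption; the text does not describe border handling.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)
    return out

# A flat patch with one speckle-like outlier in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

A median window suppresses isolated outliers such as speckle while leaving step edges largely intact, which is presumably why it is used here for the final enhancement stage.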
Step three: acquire source domain images for pre-training. Transfer learning requires the source domain images used for pre-training and the target domain images used for fine-tuning to have similar features and distributions, so that the patterns learned in pre-training can be used to classify target domain images. Most existing pre-trained models are based on ImageNet images, whose features correlate only weakly with those of seabed substrate sonar images. The invention instead pre-trains on a texture data set as the source domain; its images are highly similar to sonar images and closely distributed, which makes it better suited to model migration. The flow of data selection and transfer learning is shown in FIG. 2.
Step four: establish the improved vision Transformer network. The network used by the invention consists of a pre-convolution transform layer, a patch and position embedding layer, a Transformer encoding layer and a multi-layer perceptron output layer. The layers are connected to one another to form the complete improved vision Transformer network, as shown in FIG. 3, where the dimension of the current tensor is shown to the right of each layer. The structure of each layer is described in detail below.
Pre-convolution transform layer: it converts the input picture into a 12 × 12 × 512 feature map for the subsequent patch embedding operation. Its detailed structure is shown in FIG. 4. The input first passes through a 7 × 7 two-dimensional convolution layer, followed by group normalization and dropout to prevent overfitting, and then through two residual blocks. The residual connections help preserve information across layers, and the final output tensor dimension is 12 × 12 × 512. The tensor dimensions obtained at each stage are listed in the figure. This added pre-convolution module is tailored to sonar images: it lets the network process single-channel sonar images directly and converge faster than the original network.
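To see how a 100 × 100 input could shrink to a 12 × 12 grid, the standard convolution output-size formula can be applied stage by stage. The sketch below is purely illustrative: only the 100 × 100 input, the 7 × 7 first kernel and the 12 × 12 result come from the text, while every stride, padding and residual-block kernel size is a hypothetical choice that happens to reproduce those numbers. The actual configuration is in the figure, which is not available here.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical stage settings (kernel, stride, padding). Only the 7x7 kernel
# of the first stage comes from the text; everything else is assumed.
stages = [
    (7, 2, 3),  # 7x7 convolution
    (3, 2, 1),  # first residual block (assumed downsampling)
    (3, 2, 0),  # second residual block (assumed downsampling)
]

size = 100  # input resolution stated in the text
sizes = [size]
for kernel, stride, padding in stages:
    size = conv_out_size(size, kernel, stride, padding)
    sizes.append(size)
```

Under these assumed settings the spatial size goes 100, 50, 25, 12; other combinations of stride and padding could reach 12 × 12 as well.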
Patch and position embedding layer: it converts the tensor output by the pre-convolution layer into patches that can be used by the Transformer and records the position information of each patch. A category token is also added; it is a trainable parameter that changes as the network is trained. As shown in the corresponding structure of FIG. 3, a 1 × 1 convolution raises the patch dimension to 768, and the tensor is flattened to two dimensions. The additional category-token patch is then concatenated, giving a 145 × 768 tensor, and a trainable parameter that is also 145 × 768 is added to it to realize position embedding.
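The 145 × 768 figure follows from simple shape arithmetic, sketched below. The variable names are illustrative, and reading the extra spliced-in patch as the category token is an interpretation of the text rather than something it states outright.

```python
grid_h, grid_w, channels = 12, 12, 512   # output of the pre-convolution layer
embed_dim = 768                           # patch dimension after the 1x1 convolution

num_patches = grid_h * grid_w             # one patch per spatial position
num_tokens = num_patches + 1              # plus the category token (assumed reading)

# A 1x1 convolution is a per-position linear map from 512 to 768 channels,
# so it needs channels * embed_dim weights (bias not counted).
projection_weights = channels * embed_dim
position_embedding_shape = (num_tokens, embed_dim)
```

Because every spatial position of the 12 × 12 feature map becomes one patch, the sequence length is fixed at 144 patches plus one token regardless of the image content.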
Transformer encoding layer: it is formed by stacking 12 encoding blocks; the structure of each encoding block and of multi-head attention is shown in FIG. 5, and this part does not change the tensor dimension. The most important component is the multi-head attention module; the invention uses 12 heads, that is, 12 sets of q, k, v values. Multi-head attention runs several attention operations, each initialized with different parameters, in parallel to compute the matching between pairs of elements, and concatenates the results of all heads into one input for the subsequent projection network. The multi-head attention layer is calculated as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_{12})W^{O}$$
where the W matrices are transformation parameter matrices and d_k is the dimension of the key vectors. The Softmax function maps its input to the probabilities of the classes as follows:
$$\mathrm{Softmax}(h(x, y_j)) = \frac{e^{h(x, y_j)}}{\sum_{k=1}^{n} e^{h(x, y_k)}}$$
where h(x, y_j) is the predicted result for the j-th item.
In addition, the detailed structure of the multi-layer perceptron in the encoding block is also given in FIG. 5. The main function of the Transformer encoding layer is to learn the key information in the input data; it is the core of the whole network and does not change the dimension of the data.
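The attention computation of a single head can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the network of the invention: the learned projection matrices are omitted, the helper names `softmax`, `attention` and `split_heads` are hypothetical, and the 64-dimensional head width is simply 768 divided by 12, which the text does not state explicitly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    queries, keys: lists of d_k-dimensional vectors; values: list of vectors.
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

def split_heads(vector, num_heads):
    """Split one embedding into num_heads equal slices (projections omitted)."""
    width = len(vector) // num_heads
    return [vector[h * width:(h + 1) * width] for h in range(num_heads)]

# With 12 heads and a 768-dimensional embedding, each head sees 64 dimensions.
head_width = len(split_heads([0.0] * 768, 12)[0])

# Identical keys give uniform weights, so the output is the mean of the values.
out = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]])
```

In a full multi-head layer, each head would first apply its own learned projections to Q, K and V, and the concatenated head outputs would pass through a final output projection.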
Multilayer perceptron output layer: the invention uses a fully connected layer to perform the dimension transformation and obtain the category output. The output is an n-dimensional tensor, where n is the number of sample classes, so the type of the input picture is obtained directly from the output.
The invention uses the SGD optimizer for gradient descent, because it achieves higher accuracy on the seabed substrate sonar image classification task.
Step five: train the network on the source domain texture images to obtain a pre-trained model.
The texture images are divided into a training set and a verification set and input into the network, which is trained with a cross-entropy loss function. The difference between the network output and the true output serves as the basis for training, and the network parameters are learned and updated continuously. The training-set images are divided into a number of batches, which are input into the network batch by batch. Training is complete when the verification accuracy of the network no longer improves. At this point, the network has acquired the ability to extract features from source domain data.
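The batching and the stopping rule of step five can be sketched as follows. This is an illustrative outline only: `run_epoch` and `evaluate` are hypothetical caller-supplied callbacks standing in for the real training and verification passes, the `patience` threshold is an assumed way of deciding that accuracy "no longer improves", and the accuracy curve is a made-up stand-in, not a result from the patent.

```python
def make_batches(samples, batch_size):
    """Cut a list of samples into consecutive batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def train_until_plateau(run_epoch, evaluate, patience=3, max_epochs=100):
    """Train until verification accuracy stops improving.

    run_epoch() performs one pass over all batches; evaluate() returns the
    verification accuracy. Both are caller-supplied, hypothetical callbacks.
    """
    best_acc, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        run_epoch()
        acc = evaluate()
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_acc, best_epoch

# Stand-in accuracy curve, purely illustrative (not results from the patent).
curve = iter([0.50, 0.62, 0.70, 0.69, 0.70, 0.68, 0.71])
best_acc, best_epoch = train_until_plateau(lambda: None, lambda: next(curve), patience=3)
batches = make_batches(list(range(10)), batch_size=4)
```

With this stand-in curve the loop stops three epochs after the best accuracy was reached; in practice the model weights from the best epoch would be kept as the pre-trained model.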
Step six: the sizes of the training set and the verification set are set according to the proportion of 4. The data set processed in step two is divided into a training set and a verification set in this proportion. The aim is to give the training set and the verification set reasonable and similar distributions so that the network can better learn the common features of the images.
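A data split of this kind can be sketched as below. The text only says "the proportion of 4"; reading it as 4 parts training to 1 part verification is an assumption, which is why the ratio is exposed as parameters. The function name `split_dataset` and the fixed seed are likewise illustrative.

```python
import random

def split_dataset(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle and split samples into training and verification subsets.

    The 4:1 default is an assumed reading of the ratio; adjust as needed.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]

train_set, val_set = split_dataset(range(100))
```

To keep the class distributions of the two subsets similar, as the step requires, the same split could be applied separately to the images of each substrate class and the results merged.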
Step seven: input the training-set images and their actual categories into the network for fine-tuning, and verify the classification accuracy on the verification set in every training round. Fine-tuning proceeds as follows. The parameters of the Transformer encoding layer in the pre-trained model are copied to a new model, which becomes the migrated model. The training-set images are input into the network and trained with the cross-entropy loss function, as in step five. Because the network has already learned prior knowledge from the source domain, it converges after only a few rounds of training. After each iteration over the training set, the classification accuracy is verified on the verification set; the final fine-tuned model is obtained when the verification accuracy no longer improves.
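The parameter copy at the start of fine-tuning can be sketched with models represented as plain dictionaries. Everything about the naming here is hypothetical: the `encoder.` prefix, the parameter names and the toy values are stand-ins, since the text does not describe how parameters are stored.

```python
def transfer_encoder_weights(pretrained, target, prefix="encoder."):
    """Copy parameters whose names start with `prefix` into a new model.

    Both models are represented as plain dicts mapping parameter names to
    values; the naming scheme and the prefix are hypothetical.
    """
    migrated = dict(target)
    for name, value in pretrained.items():
        if name.startswith(prefix) and name in migrated:
            migrated[name] = value
    return migrated

pretrained = {"encoder.block0.w": [0.3, -0.1], "head.w": [0.9]}   # source-domain model
fresh = {"encoder.block0.w": [0.0, 0.0], "head.w": [0.0]}         # newly initialised model
migrated = transfer_encoder_weights(pretrained, fresh)
```

Only the encoding-layer parameters are carried over; the output layer keeps its fresh initialisation, which matters because the number of classes in the source and target domains may differ.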
Step eight: adjust parameters such as the learning rate and the momentum of the optimizer so that the classifier performs best on the verification set, giving the trained Transformer classifier. To obtain the network with the best classification result, parameters such as the learning rate and batch size must be tuned. A larger learning rate makes the network converge faster but can cause the loss to explode or oscillate; a learning rate that is too small makes the network prone to falling into a local optimum, so that the training loss stops decreasing and the verification accuracy stops improving, which limits the performance of the network. The learning rate therefore has to be adjusted empirically.
Increasing the batch size improves the stability of training loss convergence, but the batch size is limited by memory. If the batch size is too small, the network may fail to converge.
The invention recommends a learning rate of 0.0001 to 0.001. As the number of iterations grows and the training loss falls, the learning rate can be reduced appropriately to approach the optimal solution. The batch size should suit the classification of seabed substrate sonar images while taking the limits of memory, video memory and computing resources into account, and must not exceed the maximum these allow.
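One simple way to reduce the learning rate within the stated 0.0001 to 0.001 range is a step decay with a floor. The sketch below is an assumption about how such a reduction might be scheduled; the decay factor, the step length and the function name `learning_rate` are illustrative and do not come from the text, which only says the rate can be reduced appropriately.

```python
def learning_rate(epoch, start=1e-3, floor=1e-4, decay=0.5, step=10):
    """Step decay: multiply the rate by `decay` every `step` epochs,
    never going below `floor`."""
    return max(floor, start * decay ** (epoch // step))

# Rates at a few sample epochs; all stay inside the 0.0001-0.001 range.
schedule = [learning_rate(e) for e in (0, 10, 20, 30, 40)]
```

The schedule starts at the top of the recommended range and is clamped at the bottom, so later epochs take smaller steps toward the optimum without leaving the range.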
Step nine: input the sonar image whose substrate type is to be determined into the classifier, which automatically outputs the classification result. The output is an n-dimensional vector, where n is the number of picture categories. The value at each position is the predicted probability of the corresponding category, so the position of the maximum value in the result vector gives the category chosen by the classifier.
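Reading the category off the output vector amounts to taking the position of its maximum. A minimal sketch follows; the class names are borrowed from the substrate types mentioned in the background section, and the probability values are made up for illustration.

```python
def predict_class(probabilities, class_names):
    """Return the class whose position holds the largest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best_index], probabilities[best_index]

# Class names taken from the substrate types mentioned in the background;
# the probability values are made up for illustration.
classes = ["bedrock", "gravel", "sand", "fine sand", "mud"]
label, confidence = predict_class([0.05, 0.10, 0.70, 0.10, 0.05], classes)
```

Here the third position holds the largest value, so the image would be labelled "sand".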
The factors that influence the accuracy of the classifier in the invention are as follows:
1. The images used as the classifier training set should represent the characteristics of each substrate type well. Image quality problems such as noise and low resolution affect how well the extracted features generalize, and high-quality training images give the best classification performance. The method aims to reduce the influence of training-set image quality through image denoising, feature enhancement and similar means.
2. The learning rate is set according to the training set so that the network converges well and classifies accurately. A learning rate that is too high may prevent the network from converging, leaving it to oscillate around the optimum. A learning rate that is too low may cause the network to fall into a local optimum.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. An improved vision Transformer seabed substrate sonar image classification method based on transfer learning, characterized in that the method comprises the following steps:
step 1: acquiring a data set for training;
acquiring sonar images of n types of seabed substrate, the substrate type of each image being known;
step 2: preprocessing the images;
the preprocessing comprises image denoising and image feature enhancement;
step 3: acquiring source domain images for pre-training;
step 4: establishing an improved vision Transformer network;
the network comprises a pre-convolution transform layer, a patch and position embedding layer, a Transformer encoding layer and a multi-layer perceptron output layer; the layers are connected to one another to form the complete improved vision Transformer network;
step 5: training the network on the source domain texture images to obtain a pre-trained model;
dividing the texture images into a training set and a verification set, inputting them into the network, and training the network with a cross-entropy loss function;
step 6: dividing the images preprocessed in step 2 into a training set and a verification set according to the proportion of 4;
step 7: inputting the training-set images and their actual classes into the network for fine-tuning, and verifying the classification accuracy on the verification set in each training round;
step 8: adjusting network parameters to obtain a trained Transformer classifier;
step 9: inputting a sonar image whose substrate type is to be determined into the classifier, which automatically outputs the classification result.
2. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: the images selected in step 1 belong to the same substrate type, sonar image slices of the same size are used, and the network takes input images of 100 × 100 resolution.
3. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that step 2 specifically comprises the following steps:
step 2.1: carrying out smoothing denoising and feature enhancement, respectively, on the seabed substrate sonar images;
step 2.2: filtering the smoothed, denoised and feature-enhanced seabed substrate sonar images using a wavelet transform technique;
step 2.3: enhancing the filtered seabed substrate sonar images using a multi-stage median filtering method.
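The median filtering part of this preprocessing can be sketched in a few lines. The claim does not specify the window size, the number of stages or the exact multi-stage variant, so the code below substitutes a plain 3 × 3 median filter applied in repeated passes. The function names, the two-pass default and the sample image are assumptions, and the wavelet step is not shown.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply one 3x3 median pass; border pixels reuse the nearest valid pixel."""
    rows, cols = len(image), len(image[0])
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    window.append(image[rr][cc])
            filtered[r][c] = median(window)
    return filtered


def repeated_median_filter(image, passes=2):
    """Run the 3x3 median filter several times in sequence."""
    for _ in range(passes):
        image = median_filter_3x3(image)
    return image


# Hypothetical 4x4 grey-level slice with two impulse-noise pixels (255 and 0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 0],
    [10, 10, 10, 10],
]
cleaned = repeated_median_filter(noisy)
print(cleaned)
```

A median filter replaces each pixel with the middle value of its neighbourhood, which removes isolated outliers such as speckle while preserving edges better than an averaging filter would.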
4. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that, in step 4, the structure of each layer is as follows:
the pre-convolution transformation layer: converts an input picture into a feature map of 12 × 512, the dimension of the output tensor being 12 × 512, for the subsequent patch embedding operation;
the patch and position embedding layer: converts the tensor output by the pre-convolution transformation layer into patches usable by the Transformer, records the position information of each patch, and adds a class token;
the Transformer encoding layer: comprises 12 encoding blocks with a multi-head attention structure and does not change the dimensions of the tensor; the multi-head attention structure uses 12 heads, and for the 12 sets of q, k, v values the following calculation is performed:
Figure FDA0003766960810000021
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_{12})W^O$$
where the W matrices are transformation parameter matrices, and the Softmax function maps its input to the probabilities of the classes as follows:
Figure FDA0003766960810000022
where $h(x, y_j)$ is the prediction result for the j-th class;
the multi-layer perceptron output layer: uses a fully connected layer to perform the dimension transformation and obtain the class output; the output is an n-dimensional tensor, where n is the number of sample classes, from which the type of the input picture can be obtained directly.
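The multi-head attention and Softmax calculations named in this claim can be sketched as follows. The two formula images are not legible in this text, so the code uses the standard scaled dot-product form softmax(QKᵀ/√d_k)V from the Transformer literature; that this matches the patent's figures is an assumption. The helper names and the toy matrices are hypothetical, and the example uses 2 heads rather than 12 to stay short.

```python
import math


def softmax(scores):
    """Map a list of real scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """Scaled dot-product attention (assumed form): softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(q, k, v, head_params, w_o):
    """Concatenate the per-head outputs and project them with w_o.

    head_params holds one (W_q, W_k, W_v) triple per head.
    """
    heads = [attention(matmul(q, w_q), matmul(k, w_k), matmul(v, w_v))
             for w_q, w_k, w_v in head_params]
    # Join the head outputs token by token along the feature axis.
    concatenated = [sum((head[t] for head in heads), []) for t in range(len(q))]
    return matmul(concatenated, w_o)


# Toy input: 2 tokens of dimension 2, 2 heads with identity projections.
tokens = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
head_params = [(identity, identity, identity)] * 2
w_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # averages the two heads
output = multi_head_attention(tokens, tokens, tokens, head_params, w_o)
print(output)
```

The output has the same shape as the input, in line with the statement that the encoding layer does not change the dimensions of the tensor.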
5. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: in step 5, the difference between the output of the network and the true output is taken as the basis for network training, and the training set images are divided into a plurality of batches that are input into the network batch by batch for training; training is complete when the verification accuracy of the network no longer improves.
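The batching and stopping rule of this claim can be sketched as follows. The claim only says that training ends when the verification accuracy no longer improves; the `patience` parameter, the function names and the simulated accuracy values below are assumptions added to make the rule concrete.

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=3):
    """Train until verification accuracy has not improved for `patience` epochs."""
    best_accuracy = float("-inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch()
        accuracy = evaluate()
        if accuracy > best_accuracy:
            best_accuracy, best_epoch = accuracy, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_accuracy


# Stand-ins for a real network: count processed batches, replay fixed accuracies.
batches = make_batches(list(range(10)), batch_size=4)
processed = []
simulated_accuracy = iter([0.60, 0.72, 0.80, 0.81, 0.81, 0.80, 0.81, 0.79, 0.78])

best_epoch, best_accuracy = train_with_early_stopping(
    train_one_epoch=lambda: processed.extend(batches),
    evaluate=lambda: next(simulated_accuracy),
)
print(best_epoch, best_accuracy, len(processed))
```

A patience of more than one epoch avoids stopping on a single noisy verification result, at the cost of a few extra epochs.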
6. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1 or 4, characterized in that the adjustment in step 7 is performed as follows:
copying the parameters of the Transformer encoding layer in the pre-trained model to a new model to obtain a post-transfer model; inputting the images in the training set into the network and training with a cross entropy loss function as in step 5; after each pass over the training set, the classification accuracy is verified on the verification set, and when the verification accuracy of the network no longer improves, the final fine-tuned model is obtained.
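The parameter copy described in this claim can be sketched with plain dictionaries standing in for model parameters. The `encoder.` key prefix, the parameter names and all numeric values are hypothetical; a real framework would hold tensors rather than lists.

```python
def transfer_encoder_parameters(pretrained, new_model, prefix="encoder."):
    """Return a copy of new_model whose encoder entries come from pretrained."""
    migrated = dict(new_model)
    for name, value in pretrained.items():
        if name.startswith(prefix) and name in migrated:
            # Copy the values so later fine-tuning cannot alter the source model.
            migrated[name] = list(value)
    return migrated


# Hypothetical parameter sets; only the encoder entries are transferred.
pretrained_model = {
    "encoder.block0.weight": [0.2, -0.1],
    "encoder.block1.weight": [0.4, 0.3],
    "head.weight": [0.9, 0.8, 0.7],   # sized for the source-domain classes
}
new_model = {
    "encoder.block0.weight": [0.0, 0.0],
    "encoder.block1.weight": [0.0, 0.0],
    "head.weight": [0.0, 0.0],        # sized for the sonar substrate classes
}
post_transfer = transfer_encoder_parameters(pretrained_model, new_model)
print(post_transfer)
```

The output layer is left untouched because the number of classes in the source domain generally differs from the number of substrate types, so it is trained afresh during fine-tuning.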
7. The improved vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: in step 9, the output result is presented as an n-dimensional vector, where n is the number of picture categories; the value at the position corresponding to each category is its probability, and the position of the maximum value in the result vector is the category output by the classifier.
CN202210889544.2A 2022-07-27 2022-07-27 Improved visual transform seabed substrate sonar image classification method based on transfer learning Pending CN115170943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210889544.2A CN115170943A (en) 2022-07-27 2022-07-27 Improved visual transform seabed substrate sonar image classification method based on transfer learning

Publications (1)

Publication Number Publication Date
CN115170943A true CN115170943A (en) 2022-10-11

Family

ID=83496334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210889544.2A Pending CN115170943A (en) 2022-07-27 2022-07-27 Improved visual transform seabed substrate sonar image classification method based on transfer learning

Country Status (1)

Country Link
CN (1) CN115170943A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197596A (en) * 2023-11-08 2023-12-08 自然资源部第二海洋研究所 Mixed substrate acoustic classification method based on small sample transfer learning
CN117197596B (en) * 2023-11-08 2024-02-13 自然资源部第二海洋研究所 Mixed substrate acoustic classification method based on small sample transfer learning
CN117437287A (en) * 2023-12-14 2024-01-23 深圳大学 Underwater positioning method for structure priori knowledge augmentation and migration
CN117437287B (en) * 2023-12-14 2024-03-19 深圳大学 Underwater positioning method for structure priori knowledge augmentation and migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination