CN115170943A - Improved Vision Transformer seabed substrate sonar image classification method based on transfer learning

Info

Publication number: CN115170943A
Application number: CN202210889544.2A
Authority: CN
Inventors: 赵玉新; 郑良锋; 朱可心; 邓雄; 卢志忠; 杨德全; 何健新; 吴昌哲
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-10-11

Abstract

The invention discloses an improved Vision Transformer method for classifying seabed substrate sonar images based on transfer learning, which belongs to the field of deep learning. The method applies pre-convolution processing to an image to form patches, then passes them in sequence through patch embedding, position embedding, a Transformer encoding layer and a multi-layer perceptron output layer to obtain the classification result. The classifier is trained by minimizing the network residual through back-propagation. Deep learning aims to learn the intrinsic rules and representation levels of sample data; it has strong nonlinear fitting capacity and can effectively discover the features shared by images of the same class. By stacking multiple encoding layers, the network gradually learns global and local features and actively attends to the 'important' parts of the image.

Description

Improved Vision Transformer seabed substrate sonar image classification method based on transfer learning

Technical Field

The invention belongs to the fields of seabed substrate acoustic detection, inversion, image classification and artificial intelligence, and particularly relates to an improved Vision Transformer seabed substrate sonar image classification method based on transfer learning.

Background

Seabed detection aims to obtain, by technical means, the spatial distribution of seabed substrates and the way they change over time; it is a pattern study based on a continuous space-time system. Sound waves propagate very well in seawater, and with the rapid development of sonar technology, seabed substrate sonar images can carry abundant information on seabed landforms and substrate characteristics. Inverting the seabed landform from such images is of great significance for seabed exploration, underwater salvage and deep-sea protection. However, because of the complex underwater sound field and the limited performance of sonar equipment, sonar images may suffer from speckle noise, edge blurring and similar problems, which places stricter demands on classifier performance.

In seabed research, a variety of seabed acoustic detection technologies have been developed, and underwater acoustics plays an important role in detecting shallow profiles and even deep stratum structures of the seabed. Because parameters such as the acoustic absorption coefficient, reflection coefficient and surface roughness differ markedly between substrate types, traditional acoustic seabed inversion identifies the substrate type from the acoustic reflection characteristics of the seabed. Although this inversion method is computationally simple, its classification accuracy is unsatisfactory in practice because of the complexity of the seabed environment.

Modern acoustic seabed substrate classification is based on sonar images acquired by a sonar system, combined with real substrate samples obtained by mechanical sampling; a classification algorithm is used to build an automatic substrate-type analysis model, so that seabed substrates can be classified efficiently and accurately. Diesing classified four types of substrate sonar images with a random forest algorithm and obtained results superior to those of the traditional target analysis method and the geostatistical analysis method. Many scholars in China and abroad have used various artificial neural network methods to classify and identify seabed substrates. Chakraborty et al. studied the classification and identification of seabed substrates using self-organizing feature map (SOFM) and learning vector quantization (LVQ) neural networks. Tang Qiuhua effectively combined a genetic algorithm (GA) with a BP neural network to automatically classify and identify substrate types such as bedrock, gravel, sand, fine sand and mud, with classification accuracies of 92.2%, 81.9%, 89.3%, 85.9% and 88.2%, respectively. However, LVQ neural networks do not make full use of their neurons and are sensitive to initial values. The BP neural network, for its part, converges slowly, starts from randomly initialized parameters and easily falls into local optima; the algorithm runs inefficiently, and training takes a long time, especially when the network has many layers and high complexity. Later, the texture and statistical features of sonar images were combined with a convolutional neural network (CNN) to build a classifier capable of automatically identifying and classifying substrates.
However, a general convolutional neural network is prone to over-fitting during training and, because of its working principle, cannot effectively capture global information or the position information of a feature map over a large field of view. That is, it cannot learn long-range dependencies in the image, which degrades the classification performance of the network. In addition, deep learning methods need a large number of training samples to learn the common features of images, and the scarcity of seabed substrate sonar images limits the improvement of deep-learning classification accuracy for them.

Disclosure of Invention

The invention aims to provide an improved Vision Transformer seabed substrate sonar image classification method based on transfer learning.

The object of the invention is achieved by the following technical solution:

An improved Vision Transformer seabed substrate sonar image classification method based on transfer learning comprises the following steps:

step 1: acquiring a data set for training;

acquiring n classes of seabed substrate sonar images for which the substrate type of each image is known;

step 2: preprocessing the images;

the preprocessing comprises image denoising and image feature enhancement;

step 3: acquiring source domain images for pre-training;

step 4: establishing an improved Vision Transformer network;

the network comprises a pre-convolution transform layer, a patch and position embedding layer, a Transformer encoding layer and a multi-layer perceptron output layer; the layers are connected to one another to form the whole improved Vision Transformer network;

step 5: training the network on the source domain texture images to obtain a pre-trained model;

dividing the texture images into a training set and a validation set, inputting them into the network, and training the network with a cross-entropy loss function;

step 6: dividing the data set preprocessed in step 2 into a training set and a validation set in the proportion of 4;

step 7: inputting the training-set images and their actual classes into the network for fine-tuning, and verifying the classification accuracy on the validation set in each training round;

step 8: adjusting network parameters to obtain the trained Transformer classifier;

step 9: inputting a sonar image whose substrate type is to be determined into the classifier, which automatically outputs the classification result.

Further, the images selected in step 1 belong to the same substrate type, sonar image slices of the same size are used, and the network takes input images of 100 × 100 resolution.

Further, step 2 specifically includes:

step 2.1: applying smoothing denoising and feature enhancement, respectively, to the seabed substrate sonar image;

step 2.2: filtering the denoised and feature-enhanced seabed substrate sonar image using wavelet transform;

step 2.3: enhancing the filtered seabed substrate sonar image using multi-level median filtering.

Further, in step 4, the structure of each layer is specifically:

the pre-convolution transform layer: converts the input picture into a 12 × 12 × 512 feature map, so that the output tensor has dimensions 12 × 12 × 512, for the subsequent patch embedding operation;

the patch and position embedding layer: converts the tensor output by the pre-convolution transform layer into patches usable by the Transformer, records the position information of each patch, and adds a class token;

the Transformer encoding layer: comprises 12 encoding blocks with a multi-head attention structure and does not change the dimensions of the tensor; the multi-head attention structure uses 12 heads, that is, it contains 12 sets of q, k, v values, and is computed as follows:

\[ \mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\; KW_i^K,\; VW_i^V) \]

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_i)\,W^O \]

The W matrices are transformation parameter matrices, and the Softmax function maps the input to the probabilities of the various classes as follows:

\[ \mathrm{Softmax}\big(h(x, y_j)\big) = \frac{e^{h(x, y_j)}}{\sum_{k=1}^{n} e^{h(x, y_k)}} \]

wherein h(x, y_j) is the prediction result for the j-th item;

the multi-layer perceptron output layer: uses a fully connected layer to perform the dimension transformation and obtain the class output; the output is an n-dimensional tensor, where n is the number of sample classes, from which the type of the input picture is obtained directly.

Further, in step 5, the difference between the network output and the true output serves as the basis for network training; the training-set images are split into several batches and fed to the network batch by batch for training; training is complete when the validation accuracy of the network no longer improves.

Further, the fine-tuning in step 7 is performed as follows:

copying the parameters of the Transformer encoding layer in the pre-trained model into a new model to obtain the migrated model; inputting the training-set images into the network and training with a cross-entropy loss function, in the same way as in step 5; after each pass over the training set, verifying the classification accuracy on the validation set; when the validation accuracy of the network no longer improves, the final fine-tuned model is obtained.

Further, the output of step 9 is presented as an n-dimensional vector, where n is the number of picture classes; the value at the position corresponding to each class is the probability of that class, and the position of the maximum value in the result vector gives the class output by the classifier.

The beneficial effects of the invention are as follows:

The invention adopts a transfer learning method to transfer prior knowledge to a new model. By pre-training in a source domain and fine-tuning in a target domain, it effectively improves training speed, classification accuracy and convergence speed with small samples. The pre-trained model is also applicable to sonar images of other seabed substrates. A deep-learning classifier using transfer learning can carry what it has learned in a similar data domain over to a new target domain through pre-training. After migration, only fine-tuning is needed to obtain a good classification result, and the number of training samples required is greatly reduced.

The invention uses the Transformer, a novel neural network architecture characterized by extensive use of attention modules. The Transformer classifier adopts a multi-head attention mechanism: it effectively captures the detailed information in an image that is useful for classification and suppresses useless information; it learns effective information over a large range without being limited by the window size of a convolutional neural network; and it grasps both the global and local features of a seabed substrate sonar image, giving it strong generalization capability. Multi-head attention retains the ability of the attention mechanism to capture important features while reducing over-fitting; the heads select multiple pieces of information from the input in parallel, which gives high computational efficiency in highly parallel environments such as GPUs. The method can quickly locate the regions that need attention, and multi-head attention lets the network integrate global and local features so that the distinctive features of each image class are captured. Regularization methods such as layer normalization (LayerNorm) and DropPath are used extensively in the network to reduce over-fitting. In addition, the invention adapts the network structure to the characteristics of seabed sonar images so that it can process single-channel images, solving the problem that the original Vision Transformer cannot process grayscale images. By changing how the input patches are constructed, the convergence of network training on few samples is improved. The method enables rapid, automatic identification of substrate classes and large-batch sonar image classification, and achieves high classification accuracy with only a small number of training samples.

Drawings

FIG. 1 is a flow chart of the steps of the Vision Transformer-based seabed substrate sonar image classification method of the present invention;

FIG. 2 is a schematic diagram of the transfer learning employed by the present invention;

FIG. 3 is the overall network structure diagram of the improved Transformer network of the present invention;

FIG. 4 is a detailed structure diagram of the pre-convolution transform layer in the network of the present invention;

FIG. 5 is a detailed structure diagram of a single encoding block of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

The invention discloses an improved Vision Transformer classification method, which belongs to the field of deep learning. The method applies pre-convolution processing to an image to form patches, then passes them in sequence through patch embedding, position embedding, a Transformer encoding layer and a multi-layer perceptron output layer to obtain the classification result. The classifier is trained by minimizing the network residual through back-propagation. Deep learning aims to learn the intrinsic rules and representation levels of sample data; it has strong nonlinear fitting capacity and can effectively discover the features shared by images of the same class. By stacking multiple encoding layers, the network gradually learns global and local features and actively attends to the 'important' parts of the image.

The invention will be described in further detail with reference to the accompanying drawings based on the steps of the technical solution.

Step one: acquire a data set for training. The substrate type of each image must be known, and sonar images of n substrate classes are acquired. Because deep learning classifies by learning the features shared by images of the same class, a sufficient number of sonar images of each class is indispensable. In this step, it must be ensured that the selected images belong to the same substrate type. To obtain consistent features, sonar image slices of the same size must also be used; the network takes input images of 100 × 100 resolution.
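The equal-size slicing described in this step can be illustrated with a minimal sketch. It assumes the sonar image is available as a 2-D list of grayscale values and that incomplete tiles at the borders are simply discarded; the function name `slice_into_tiles` and the dummy 250 × 320 swath are hypothetical and not part of the invention.

```python
def slice_into_tiles(image, tile=100):
    """Cut a 2-D grayscale image (list of rows) into non-overlapping tile x tile slices.

    Partial tiles at the right and bottom edges are dropped so that every
    slice has exactly the same size (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


# Hypothetical 250 x 320 sonar swath filled with a dummy intensity ramp.
swath = [[(r + c) % 256 for c in range(320)] for r in range(250)]
tiles = slice_into_tiles(swath)
print(len(tiles), len(tiles[0]), len(tiles[0][0]))  # 6 100 100
```

Dropping border remainders keeps every input at 100 × 100; padding or overlapping windows would be alternative choices that the text does not specify.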

Step two: preprocess the data set images. Because sonar images may have low resolution, severe noise interference and degraded edge texture, the seabed substrate sonar images need to be preprocessed. The preprocessing is as follows. First, smoothing denoising and feature enhancement are applied to the seabed substrate sonar image, where feature enhancement comprises correction enhancement and edge sharpening. Next, the denoised and enhanced image is filtered using wavelet transform to obtain an image with a higher signal-to-noise ratio; the filtering uses wavelet band-pass filters of different scales, where 'scale' means that the wavelet function applies different thresholds to the low-frequency and high-frequency parts. Finally, the filtered image is enhanced using multi-level median filtering, which enhances texture and edge shape information. These filtering and feature-enhancement operations reduce the edge blurring caused by noise from the sound field environment and by the limited performance of sonar equipment, and highlight the feature information of the sonar image.
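As an illustration of the median-filtering idea only, the following sketch applies a plain single-level 3 × 3 median filter to a small image containing one speckle. It is not the multi-level median filter of the invention, and the border handling (clamping coordinates to the image edge) and the name `median_filter` are assumptions made for the example.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel
    (an assumed border policy; the text does not specify one).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            out[r][c] = median(window)
    return out


# Uniform 5 x 5 patch with a single bright speckle in the centre.
flat = [[10] * 5 for _ in range(5)]
flat[2][2] = 255
cleaned = median_filter(flat)
print(cleaned[2][2])  # 10
```

The isolated outlier is removed while the uniform background is left unchanged, which is the behaviour that makes median filtering attractive against speckle noise.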

Step three: acquire source domain images for pre-training. Transfer learning requires the source domain images used for pre-training and the target domain images used for fine-tuning to have similar features and distributions, so that the patterns learned in pre-training can be used to classify target domain images. Most existing pre-trained models are based on ImageNet images, whose features correlate only weakly with those of seabed substrate sonar images. The invention proposes pre-training on a texture data set as the source domain; such images are highly similar to sonar images and closely distributed, and are therefore better suited to model migration. The flow of data selection and transfer learning is shown in FIG. 2.

Step four: establish the improved Vision Transformer network. The network used by the invention consists of a pre-convolution transform layer, a patch and position embedding layer, a Transformer encoding layer and a multi-layer perceptron (MLP) output layer. The layers are connected to one another to form the whole improved Vision Transformer network, as shown in FIG. 3, where the dimension of the current tensor is given to the right of each layer. The structure of each layer is described in detail below.

Pre-convolution transform layer: it converts the input picture into a 12 × 12 × 512 feature map for the subsequent patch embedding operation. Its detailed structure is shown in FIG. 4. The input first passes through a 7 × 7 two-dimensional convolution layer, followed by group normalization and dropout operations that prevent over-fitting, and then through two residual blocks. The residual connections help preserve information from before and after each block, and the final output tensor has dimensions 12 × 12 × 512. The tensor dimensions obtained at each stage are listed in the figure. This pre-convolution transform module is an addition to the original network and is optimized for sonar images: it lets the network process single-channel sonar images directly and converge faster than the original network.
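The strides and paddings of the pre-convolution layer are given only in the figure, so the following sketch merely shows, with the standard convolution output-size formula, one hypothetical configuration that takes a 100 × 100 input to a 12 × 12 map. The stride and padding values, and the 3 × 3 kernels assumed for the residual blocks, are illustrative assumptions rather than the parameters of the invention.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); every value except the 7 x 7 kernel is assumed.
stages = [
    ("7x7 convolution", 7, 2, 0),
    ("residual block 1", 3, 2, 1),
    ("residual block 2", 3, 2, 1),
]

size = 100
sizes = [size]
for _name, kernel, stride, padding in stages:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [100, 47, 24, 12]
```

Other combinations of stride and padding can reach the same 12 × 12 resolution; the point of the sketch is only that three stride-2 stages are sufficient to go from 100 to 12.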

Patch and position embedding layer: its role is to convert the tensor from the preceding layer into patches usable by the Transformer and to record the position information of each patch. A class token is also added; it is a trainable parameter that changes as the network is trained. As shown in the corresponding part of FIG. 3, the patch dimension is raised to 768 by a 1 × 1 convolution, and the tensor is flattened to two dimensions. One additional patch is then concatenated, giving 12 × 12 + 1 = 145 patches, and a trainable parameter of the same size, 145 × 768, is added to realize position embedding.
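The token bookkeeping of this layer can be sketched as follows. The example flattens a 12 × 12 grid into 144 patches, prepends one class token and adds a position embedding of matching shape. For brevity it uses a toy embedding width of 4 instead of 768; the helper name `build_tokens` and all numeric values are hypothetical, and placing the class token first is an assumption borrowed from the usual Vision Transformer layout.

```python
def build_tokens(feature_map, cls_token, pos_embed):
    """Flatten an H x W x D feature map, prepend a class token, add positions.

    Returns a list of (H*W + 1) token vectors of length D.
    """
    patches = [pixel for row in feature_map for pixel in row]  # H*W vectors
    tokens = [cls_token] + patches                             # class token first (assumed)
    return [
        [t + p for t, p in zip(token, position)]
        for token, position in zip(tokens, pos_embed)
    ]


H = W = 12
D = 4  # toy width; the text uses 768
feature_map = [[[1.0] * D for _ in range(W)] for _ in range(H)]
cls_token = [0.0] * D
pos_embed = [[0.01 * i] * D for i in range(H * W + 1)]

tokens = build_tokens(feature_map, cls_token, pos_embed)
print(len(tokens), len(tokens[0]))  # 145 4
```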

Transformer encoding layer: it is formed by stacking 12 encoding blocks; the structure of each encoding block and of multi-head attention is shown in FIG. 5, and the tensor dimensions are not changed in this part. The most important component is the multi-head attention module; the invention uses 12 heads, that is, 12 sets of q, k, v values. Multi-head attention runs several attention operations, initialized with different parameters, in parallel to compute matching scores between pairs of elements, and concatenates the individual attention results into a single input of the subsequent projection network. The multi-head attention layer is calculated as follows:

\[ \mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\; KW_i^K,\; VW_i^V) \]

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_i)\,W^O \]

The W matrices are transformation parameter matrices, and the Softmax function maps the input to the probabilities of the various classes as follows:

\[ \mathrm{Softmax}\big(h(x, y_j)\big) = \frac{e^{h(x, y_j)}}{\sum_{k=1}^{n} e^{h(x, y_k)}} \]

where h(x, y_j) is the prediction result for the j-th item.

In addition, the detailed structure of the multi-layer perceptron inside each encoding block is also given in FIG. 5. The main function of the Transformer encoding layer is to learn the key information in the input data; it is the core of the whole network and does not change the dimensions of the data.
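The multi-head attention calculation can be sketched in plain Python as below. The sketch assumes the common scaled dot-product form of attention, softmax(QK^T / sqrt(d_k))V, which the text does not spell out, and uses self-attention (Q = K = V = x) with two toy heads instead of twelve. All function names and weight values are hypothetical.

```python
import math


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax of one score vector."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention (assumed form): softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(q[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # q @ k^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x, w_q, w_k, w_v, w_o):
    """head_i = Attention(x W_i^Q, x W_i^K, x W_i^V); output = Concat(heads) W^O."""
    heads = [
        attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
        for wq, wk, wv in zip(w_q, w_k, w_v)
    ]
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, w_o)


# Toy input: 3 tokens of width 4, split across 2 heads of width 2.
x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
projections = [first_half, second_half]

out = multi_head_attention(x, projections, projections, projections, identity)
print(len(out), len(out[0]))  # 3 4
```

The output has the same shape as the input, consistent with the statement that the encoding layer does not change the tensor dimensions. With a width of 768 and 12 heads, an even split would give 64 dimensions per head; that split is an inference from the numbers, not something the text states.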

Multi-layer perceptron output layer: the invention uses a fully connected layer to perform the dimension transformation and obtain the class output. The output is an n-dimensional tensor, where n is the number of sample classes, so the type of the input picture can be read directly from the output.
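A fully connected output layer of this kind reduces to one matrix-vector product plus a bias. The sketch below assumes, as in the usual Vision Transformer design, that the layer is applied to the class token; the toy width of 3, the two classes and all weights are hypothetical.

```python
def linear_head(class_token, weights, bias):
    """Map a D-dimensional vector to n class scores: scores = W x + b."""
    return [
        sum(w * x for w, x in zip(row, class_token)) + b
        for row, b in zip(weights, bias)
    ]


# Toy example: D = 3 features, n = 2 classes.
scores = linear_head(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    [0.5, -1.0],
)
print(scores)  # [1.5, 4.0]
```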

The invention uses the SGD optimizer for gradient descent. In the seabed substrate sonar image classification task, the SGD optimizer achieves higher accuracy.

Step five: train the network on the source domain texture images to obtain a pre-trained model.

The texture images are divided into a training set and a validation set and input into the network, which is trained with a cross-entropy loss function. The difference between the network output and the true output serves as the basis for training, and the network parameters are learned and updated continuously. The training-set images are split into several batches and fed to the network batch by batch. Training is complete when the validation accuracy of the network no longer improves. At this point the network has acquired the ability to extract features from source domain data.
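The three ingredients of this step (cross-entropy loss, mini-batches, and stopping once validation accuracy stalls) can be sketched independently of any deep-learning framework. The `patience` parameter is an assumption: the text says only that training ends when validation accuracy no longer improves, without stating how many rounds to wait. All names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability that the model assigns to the true class index."""
    return -math.log(probs[label])


def batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def stopped_improving(val_acc_history, patience=3):
    """True once the best validation accuracy is at least `patience` epochs old."""
    if len(val_acc_history) <= patience:
        return False
    best_epoch = val_acc_history.index(max(val_acc_history))
    return len(val_acc_history) - 1 - best_epoch >= patience


print(list(batches(list(range(10)), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(stopped_improving([0.60, 0.70, 0.80, 0.80, 0.79, 0.80]))  # True
```

In a real training loop the per-epoch validation accuracy would be appended to the history and `stopped_improving` checked after every epoch.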

Step six: the sizes of the training set and validation set are set in the proportion of 4. The data set processed in step two is divided into a training set and a validation set in this proportion. The aim is for the training set and validation set to have reasonable, similar distributions, so that the network can better generalize the common features of the images it learns.
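One way to keep the class distribution similar in both subsets is to split each class separately (a stratified split). The sketch below does this; it is an illustration, not the procedure of the invention. The second term of the proportion is not given in the text, so the 4:1 reading used as the default here is purely an assumption, as are the function name, the class labels and the sample counts.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_parts=4, val_parts=1, seed=0):
    """Split (item, label) pairs class by class so both subsets keep similar class shares.

    train_parts:val_parts defaults to 4:1, an assumed reading of the proportion.
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = len(group) * train_parts // (train_parts + val_parts)
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val


# Hypothetical data set: 50 "sand" slices and 30 "mud" slices.
samples = [(i, "sand") for i in range(50)] + [(i, "mud") for i in range(50, 80)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 64 16
```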

Step seven: input the training-set images and their actual classes into the network for fine-tuning, and verify the classification accuracy on the validation set after every training round. Fine-tuning proceeds as follows: the parameters of the Transformer encoding layer in the pre-trained model are copied into a new model, which becomes the migrated model. The training-set images are then input into the network and training is carried out with the cross-entropy loss function, as in step five. Because the network has already learned prior knowledge from the source domain, it converges after only a small number of training rounds. Note that the classification accuracy must be verified on the validation set after each pass over the training set; when the validation accuracy of the network no longer improves, the final fine-tuned model is obtained.
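The parameter-copy part of fine-tuning can be pictured with models represented as plain name-to-weights dictionaries. In the sketch, only entries under an assumed `encoder.` prefix are transferred, while the output head keeps its fresh initialization because the number of classes may differ between source and target domains. The parameter names, the prefix and the values are all hypothetical.

```python
def transfer_encoder_weights(pretrained, target, prefix="encoder."):
    """Return a copy of `target` whose `prefix` entries are taken from `pretrained`.

    Also returns the list of parameter names that were copied.
    """
    migrated = dict(target)
    copied = []
    for name, value in pretrained.items():
        if name.startswith(prefix) and name in migrated:
            migrated[name] = value
            copied.append(name)
    return migrated, copied


pretrained = {
    "encoder.block0.attn": [0.1, 0.2],
    "encoder.block0.mlp": [0.3],
    "head.fc": [9.0, 9.0, 9.0],  # source domain head, not transferred
}
fresh = {
    "encoder.block0.attn": [0.0, 0.0],
    "encoder.block0.mlp": [0.0],
    "head.fc": [0.0, 0.0],       # target domain head with a different class count
}

migrated, copied = transfer_encoder_weights(pretrained, fresh)
print(sorted(copied))  # ['encoder.block0.attn', 'encoder.block0.mlp']
```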

Step eight: adjust parameters such as the learning rate and the optimizer momentum so that the classifier performs best on the validation set, yielding the trained Transformer classifier. To obtain the network with the best classification performance, parameters such as learning rate and batch size must be tuned so that the network reaches the optimal solution. A larger learning rate makes the network converge faster, but can cause loss explosion or oscillation; a learning rate that is too small makes the network prone to falling into a local optimum, so that the training loss stops decreasing, the validation accuracy stops improving and the network's performance is limited. The learning rate therefore has to be tuned empirically.

Increasing the batch size improves the stability of training-loss convergence, but the batch size is limited by memory. If the batch size is too small, the network may fail to converge.

The invention proposes a learning rate of 0.0001 to 0.001; as the number of iterations increases and the training loss falls, the learning rate can be reduced appropriately to approach the optimal solution. The batch size is chosen to suit seabed substrate sonar image classification while respecting the limits of memory, video memory and computing resources, and must not exceed the maximum these allow.
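A learning rate that starts at 0.001 and is lowered as training proceeds without leaving the stated range can be sketched with a simple step-decay rule, together with one SGD-with-momentum update. The decay factor, the decay interval and the momentum value of 0.9 are assumptions chosen for illustration; the text specifies only the range 0.0001 to 0.001.

```python
def decayed_lr(epoch, lr_max=1e-3, lr_min=1e-4, decay=0.5, every=10):
    """Step decay: multiply by `decay` every `every` epochs, never below lr_min."""
    return max(lr_min, lr_max * decay ** (epoch // every))


def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9):
    """One SGD update with momentum; returns new parameter and velocity lists.

    v <- momentum * v - lr * g
    p <- p + v
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


schedule = [decayed_lr(epoch) for epoch in (0, 10, 20, 30, 40)]
params, velocity = sgd_momentum_step([1.0], [2.0], [0.0], lr=0.1)
```

With these assumed settings the rate halves every ten epochs (0.001, 0.0005, 0.00025, 0.000125) and is then held at the 0.0001 floor.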

Step nine: input the sonar image whose substrate type is to be determined into the classifier, which automatically outputs the classification result. The output is an n-dimensional vector, where n is the number of picture classes. The value at the position corresponding to each class is the probability of that class, so the position of the maximum value in the result vector gives the class assigned by the classifier.
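Reading the class off the n-dimensional output amounts to a Softmax followed by an arg-max. The sketch below shows this for three classes; the class names and scores are hypothetical examples.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, class_names):
    """Return the most probable class name and the full probability vector."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


label, probs = classify([1.5, 4.0, 0.2], ["bedrock", "sand", "mud"])
print(label)  # sand
```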

The factors that influence the accuracy of the classifier in the invention are as follows:

1. The images used as the classifier training set should represent the characteristics of each substrate type well. Image quality problems such as noise and low resolution affect how well the features extracted by the algorithm generalize, and high-quality training images yield the best classification performance. The method seeks to reduce the influence of training-set image quality by means such as image denoising and feature enhancement.

2. The learning rate is set according to the training set so that the network achieves a good convergence rate and high classification accuracy. A learning rate that is too high may prevent the network from converging, leaving it oscillating around the optimum; one that is too low may cause the network to fall into a local optimum.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An improved Vision Transformer seabed substrate sonar image classification method based on transfer learning, characterized in that the method comprises the following steps:

step 1: acquiring a data set for training;

step 2: preprocessing the images;

the preprocessing comprises image denoising and image feature enhancement;

step 3: acquiring source domain images for pre-training;

step 4: establishing an improved Vision Transformer network;

step 8: adjusting network parameters to obtain the trained Transformer classifier;

2. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: the images selected in step 1 belong to the same substrate type, sonar image slices of the same size are used, and the network takes input images of 100 × 100 resolution.

3. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that step 2 specifically comprises the following steps:

step 2.2: filtering the denoised and feature-enhanced seabed substrate sonar image using wavelet transform;

4. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that, in step 4, the structure of each layer is specifically as follows:

the pre-convolution transform layer: converts the input picture into a 12 × 12 × 512 feature map, so that the output tensor has dimensions 12 × 12 × 512, for the subsequent patch embedding operation;

\[ \mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\; KW_i^K,\; VW_i^V) \]

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_i)\,W^O \]

wherein h(x, y_j) is the prediction result for the j-th item;

5. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: in step 5, the difference between the network output and the true output serves as the basis for network training; the training-set images are split into several batches and fed to the network batch by batch for training; training is complete when the validation accuracy of the network no longer improves.

6. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1 or 4, characterized in that the fine-tuning in step 7 is performed as follows:

copying the parameters of the Transformer encoding layer in the pre-trained model into a new model to obtain the migrated model; inputting the training-set images into the network and training with a cross-entropy loss function, in the same way as in step 5; after each pass over the training set, verifying the classification accuracy on the validation set; when the validation accuracy of the network no longer improves, the final fine-tuned model is obtained.

7. The improved Vision Transformer seabed substrate sonar image classification method based on transfer learning according to claim 1, characterized in that: in step 9, the output is presented as an n-dimensional vector, where n is the number of picture classes; the value at the position corresponding to each class is the probability of that class, and the position of the maximum value in the result vector gives the class output by the classifier.