CN115240058A - Side-scan sonar target detection method combining accurate image segmentation and target shadow information

Info

Publication number
CN115240058A
Authority
CN
China
Prior art keywords
target
image
sonar
shadow
segmentation
Prior art date
Legal status
Pending
Application number
CN202210669858.1A
Other languages
Chinese (zh)
Inventor
王惠刚
雷灿
Current Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University, Shenzhen Institute of Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210669858.1A
Publication of CN115240058A
Legal status: Pending

Classifications

    • G06V20/05 Underwater scenes
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/759 Region-based matching
    • G06V2201/07 Target detection


Abstract

The invention discloses a side-scan sonar target detection method combining accurate image segmentation and target shadow information. First, to address problems such as the excessive gray-level differences and the varying physical resolution in side-scan sonar images, prior information available at imaging time, such as height and angle, is used to perform autonomous gain compensation and resolution correction on the image. Image segmentation based on an improved DeepLabV3+ network model is then performed on the corrected sonar image. After segmentation, the highlight region and the shadow region of the same target are matched using the imaging principle and geometric relationships of the side-scan sonar, and the target of interest is selected. After image correction and accurate segmentation, a data set is produced, a sonar target detection model based on the YOLOv5s network is constructed and trained, and the trained model is finally used to perform target detection on the segmented sonar images to be detected. By introducing target acoustic-shadow information together with accurate image segmentation, the method effectively improves the detection and recognition accuracy of side-scan sonar targets.

Description

Side-scan sonar target detection method combining accurate image segmentation and target shadow information
Technical Field
The invention belongs to the field of underwater detection and identification, and particularly relates to a side-scan sonar target detection method combining accurate image segmentation and target shadow information.
Background
In offshore operations, underwater target detection and identification are among the most common applications. Underwater targets span a wide range: small targets include fish schools, reefs, shipwrecks, mines, submarines and underwater robots, while large targets include undersea volcanoes, ocean trenches and large areas of seabed sediment. In ocean resource development, side-scan sonar is a very important means of fine underwater survey. It is usually towed behind an unmanned surface vessel or mounted on both sides of an underwater vehicle, and uses echo ranging and beamforming to build up the echo intensity over areas at different ranges and bearings on the two sides of the vehicle. With its wide availability, large imaging swath, low cost and moderate image resolution, side-scan sonar is the most popular imaging sonar and is widely applied to topographic and geomorphological surveying and mapping, seabed search and rescue, ocean exploration and other fields.
Side-scan sonar imaging characteristics are closely related to the seabed topography. A sonar image can be roughly divided into three parts: target, shadow and background. The background region is seabed reverberation and contains many noise points, while the shadow region is formed where sound waves cannot reach because they are blocked by a target or a mound. Because sonar images suffer from heavy noise, severe distortion, blurred target edges, low resolution and poor texture, the accuracy of underwater target classification and recognition is low, and improving classification accuracy and speed while reducing model complexity is a key problem to be solved urgently. In view of this situation, deep learning algorithms can extract richer features and have strong robustness, practicality and excellent performance, which makes them very suitable for sonar target detection and recognition. YOLOv5s has the smallest model, the narrowest feature maps and the fastest recognition speed in the YOLO detection algorithm series, so using it as the main model for sonar target detection and recognition can provide a good balance of classification accuracy, efficiency and model complexity.
The shadow region in a side-scan sonar image contains information such as the shape and height of the sonar target. This region is often ignored in conventional sonar image recognition, yet it carries no less information than the target highlight region, so the target region and the shadow region need to be considered together to obtain more effective target information. Extracting the information of the highlight region and the shadow region of a sonar target simultaneously, and using both for target recognition, is therefore very effective for improving sonar target recognition.
Based on these considerations, this work addresses side-scan sonar target detection and recognition and proposes a side-scan sonar target detection and recognition method combining accurate image segmentation and target shadow information. According to the characteristics of side-scan sonar imaging, the side-scan sonar image is segmented simultaneously into a highlight target region and a shadow region, and the information of the two regions is finally used as the feature information for subsequent detection and recognition, completing the recognition of side-scan sonar targets.
Disclosure of Invention
To address these technical problems, the invention discloses a side-scan sonar target detection method combining accurate image segmentation and target shadow information. Starting from the imaging principle of side-scan sonar images and fully considering the information of the acoustic shadow region, the method first addresses the excessive gray-level differences and the varying physical resolution at different ranges in side-scan sonar images by using prior information available at imaging time, such as height and angle, to perform autonomous gain compensation and resolution correction on the image. The corrected sonar image is then segmented: based on an improved DeepLabV3+ network model, features are extracted and the highlight region and shadow region of the target of interest in the side-scan sonar image are accurately segmented. After segmentation, the highlight region and the shadow region of the same target are matched using the imaging principle and geometric relationships of the side-scan sonar, and the target of interest is selected. After these preprocessing operations, the segmented images are finally detected and recognised with a YOLOv5s network model. By introducing target acoustic-shadow information together with accurate image segmentation, the recognition accuracy of sonar targets is effectively improved.
The invention aims to provide a side-scan sonar target detection method combining accurate image segmentation and target shadow information, which comprises the following steps:
S1, performing autonomous gain compensation to address the excessive gray-level differences in side-scan sonar images caused by range-dependent energy attenuation, and the unstable port/starboard gray-level differences caused by the mobile platform itself in the harsh underwater environment.
S2, correcting geometric distortion to address the problem that side-scan sonar images have different physical resolutions at different ranges.
S3, performing image segmentation based on an improved DeepLabV3+ network on the corrected side-scan sonar image, and extracting the highlight region and the shadow region of the target of interest in the side-scan sonar image.
S4, after the segmented regions are obtained, matching the highlight region and the shadow region of the same target using the imaging principle and geometric relationships of the side-scan sonar.
S5, constructing a sonar target detection model based on the YOLOv5s network, sending the corrected and segmented images into the network for training, and using the trained model for detection and recognition of real sonar targets.
S6, setting up ablation experiments to verify the effectiveness of the shadow region in the sonar image and the effectiveness of sonar image segmentation.
Further, the step S1 includes the following steps:
S11, finding the seabed line:
The seabed line is first estimated with reference to the attitude information of the mobile platform, and its exact position is then searched for locally. During sonar data acquisition, the altitude and attitude information of the underwater mobile platform are stored together with the data. The sonar data are then parsed ping by ping to obtain the sound intensity data and the altitude information. Using the pre-stored altitude, a preliminary rough estimate of the seabed line is computed. Since the side-scan sonar is divided into port and starboard channels, the conversion from altitude to image pixel position is performed as follows:
line_orig = N_s - (altitude * N_s / range)   (1)
where line_orig denotes the initial seabed-line position point in the sonar image, altitude denotes the altitude information, range denotes the working range of the sonar, and N_s denotes the number of sampling points in one ping (ping(n)) of sound intensity data acquired on a single side. Then, among the 50 pixel values on either side of this initial value, the pixel with the maximum gray value is taken as the seabed-line position point of the sonar.
S12, calculating the width of the image area corresponding to the maximum towfish depth in the image:
The altitude of the vehicle changes constantly as it moves underwater. If every ping of data were processed over its full extent, the image rows would be ragged, so the minimum common area must be found in order to finally form regularly arranged image data. When the towfish depth is maximal, the range of collected underwater information is minimal. Taking the starboard side as an example, a complete side-scan sonar image is formed from N pings of data, and the gray values converted from the per-sample sound intensity values form a sequence s(n, i), where n is the ping index and i is the index of each sample within a ping. With a(n) denoting the seabed-line position of the n-th ping found by the above algorithm, the width N_min of the image area corresponding to the maximum towfish depth in the image can be calculated as:
N_min = min(N_s - a(n)),  n = 1...N   (2)
S13, computing the mean gray level of each ping section:
After the seabed-line position and the width of the image area corresponding to the maximum towfish depth have been obtained, gray-level correction must be applied to all pixels in that area. First, the mean gray level of each ping section is computed along the vertical direction of the image (the horizontal position is fixed and the statistics are taken vertically):
[Equation (3), shown as an image in the original: the mean gray level of each ping section.]
S14, obtaining the image gray-correction factor sequence:
After the mean gray level of each longitudinal ping section has been obtained, the mean is likewise taken along the horizontal direction of the sonar image, and finally the gray-correction factor sequence for all pixels in the image area is obtained:
[Equation (4), shown as an image in the original: the gray-correction factor sequence for all pixels in the image area.]
further, the step S2 includes the steps of:
s21, obtaining the relation among the slant distance, the horizontal distance and the depth:
According to the position of the sonar transducer and the ray direction of the sound wave, the geometric relationship among the slant range, the horizontal range and the depth can be obtained as follows:
PlanarRange = sqrt(SlantRange^2 - TowfishAlt^2)   (5)
where PlanarRange denotes the horizontal distance, SlantRange denotes the slant distance, and TowfishAlt denotes the height of the transducer above the seabed. TowfishAlt can be obtained from the seabed-line detection in the sonar image.
S22, obtaining a point on the original slant-distance image corresponding to the corrected point on the flat-distance image according to the geometric relation:
the side scan sonar is divided into port and starboard, and the sonar data is stored according to the return sequence of the sonar signals, so the resolution correction needs to take the problem of port and starboard into consideration. Let a certain point P (x) on the original slant range image 2 ,y 2 ) Corresponding to the point P (x) on the corrected flat pitch image 1 ,y 1 ) According to the geometrical relationship, the specific corresponding relationship between the straight-distance point and the slant-distance point can be obtained as follows:
port resolution correction factor:
[Equation (6), shown as an image in the original: the port resolution correction factor relating P(x1, y1) to P(x2, y2).]
starboard resolution correction factor:
[Equation (7), shown as an image in the original: the starboard resolution correction factor relating P(x1, y1) to P(x2, y2).]
where Res denotes the image resolution and width denotes the image width. Using the prior information on the vehicle's motion parameters at imaging time, the resolution-correction module automatically computes the pixel coordinates of every point on the corrected image and maps each of them to a pixel of the original image, finally producing a resolution-corrected side-scan sonar image.
Further, the step S3 includes the steps of:
s31, constructing a segmentation network model:
a sonar image segmentation model based on an improved DeepLabv3+ network is constructed, and an original Xconcentration series network is replaced by a MobilenetV2 serving as a main feature extraction network. Feature network extraction is enhanced in the Encoder and the Decoder, cross entropy is adopted for loss calculation, and a Dice loss index is introduced to evaluate semantic segmentation results.
Further, the step S31 includes the steps of:
s311, modifying the backbone feature extraction network:
For the DeepLabV3+ network model used for side-scan sonar image segmentation, the original network adopts the Xception series, which has a large number of parameters and trains slowly, as the backbone feature-extraction network; the invention therefore replaces it with the lightweight MobileNetV2 as the backbone. MobileNetV2 adopts an inverted residual structure that first expands and then compresses the channels, and finally a residual connection is added to directly connect the input and the output.
S312, reinforced feature network extraction:
after completing the feature extraction of mobilenetV2, two effective feature layers are obtained, and then the preliminary effective features are subjected to reinforced feature extraction. The enhanced feature extraction network is divided into an Encoder part and a Decoder part: in the Encoder, for a preliminary effective feature layer compressed four times, firstly, feature extraction is performed by using parallel hole convolutions with different rates (expansion rates), and for an input x and a convolution kernel w, an output feature y of the hole convolution at an ith position point is specifically calculated as follows:
y[i] = Σ_k x[i + r·k] · w[k],  k = 1...kernel_size   (8)
where r denotes the dilation rate of the atrous convolution and kernel_size denotes the convolution kernel size. After extraction, the features are merged and finally compressed by a 1x1 convolution.
In the Decoder, for the preliminary effective feature layer that has been compressed twice, the number of channels is first adjusted with a 1x1 convolution; the result is stacked with the up-sampled atrous-convolution features, and the stacked result is passed through two depthwise separable convolutions to obtain the final effective feature layer.
S313: modified loss function:
during model training, the training effect of the network model is evaluated by using a Focal loss function and a Dice loss function. Because the difficulty of classifying sonar image samples is inconsistent, in order to solve the problem of model training caused by sample imbalance, the improvement is made on the basis of a cross entropy loss function, a Focal loss function is provided, weight is added to the corresponding loss of the samples according to the difficulty of sample resolution, the weight occupied by a large number of simple negative samples in training is reduced, and a specific expression is written as follows:
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)   (9)
where p_t is the predicted probability of the true class (in multi-class classification, the Softmax output probability), α_t is the weighting factor of each class, and γ is an adjustment (focusing) factor; when γ = 0 the expression reduces to the cross-entropy loss.
The Dice coefficient is a set-similarity measure with value range [0, 1]; the larger the coefficient, the greater the overlap between the predicted result and the true result. Since a smaller loss value is preferred during model training, the Dice loss is defined as:
DiceLoss = 1 - 2|X ∩ Y| / (|X| + |Y|)   (10)
where |X ∩ Y| denotes the intersection of the predicted result and the true result, and |X| and |Y| denote the numbers of elements in the predicted result and the true result, respectively. In the DeepLabV3+ semantic segmentation model adopted by the invention, X denotes the ground truth and Y denotes the predicted segmentation image.
S32, making a sonar image data set, and performing model training:
and after the model is modified, a sonar image data set is made and is processed correspondingly. The purpose of the sonar image segmentation part is to segment a target highlight region formed by active sound wave echoes and a shadow region formed by object occlusion, and not to distinguish the target types represented by the respective regions, so in the segmentation task, only two types are labeled to the data set, namely a highlight region (light) and a shadow region (dark). Through experimental collection, 488 pieces of side scan sonar image training sets are arranged, 71 pieces of verification sets are arranged, and 136 pieces of testing sets are arranged. And marking a target bright area and a shadow area of each image in the test set and the verification set, sending the marked images into a network, modifying category parameters, a trunk model, pre-training weights and the like in the network, and finally finishing the training of the sonar image segmentation model.
S33, segmenting the target highlight region and the acoustic shadow region of the sonar images to be segmented with the trained model to obtain the final image segmentation result:
The weight file in the test code is changed to the best weights from training and the class parameters are modified; the sonar images to be segmented in the test set are then segmented into target highlight regions and shadow regions to obtain the final segmentation result. After segmentation, the original sonar image is converted into a mask containing only 3 pixel values, which represent the target highlight region, the shadow region and the background, respectively. This effectively alleviates the problems of blurred sonar target edges, heavy sonar image noise and low resolution.
The sonar image segmentation stage does not classify the targets; it is equivalent to a further preprocessing of the original sonar image, through which accurate image segmentation and extraction of the sonar target shadow information are completed simultaneously. The segmented images therefore still need to be detected and recognised.
Further, the step S4 includes the steps of:
and S41, matching the bright area and the shadow area of the same target by using the imaging principle of the side scan sonar.
After the segmented regions have been obtained, consider the imaging principle of the side-scan sonar image: the transmitted and received beams are perpendicular to the heading, and after a pulse is emitted and propagates through the acoustic channel to the seabed, the reflected wave returns to the transducer along the original path to form the corresponding target echo. The target shadow region is not insonified because the sound waves are blocked by the target, so the shadow region and the target highlight region lie along the same acoustic ray, i.e. the highlight region and the shadow lie on the same horizontal line, and the extents of the regions formed by the shadow and the highlight along the vehicle's direction of travel are consistent. For a target lying on the seabed, the shadow follows immediately behind the target's strong echo.
Because the segmentation network may produce multiple target highlight regions and shadow regions, the highlight region and the shadow region belonging to the same target are matched according to the imaging principle of the side-scan sonar image, and the target of interest is selected.
Further, the step S5 includes the steps of:
s51, constructing a network model and an algorithm:
YOLOv5s is a network with the smallest network model, the smallest width of a characteristic diagram and the fastest identification speed in a YOLOv5 detection algorithm series, so the network is used as a main model for detecting and identifying the sonar target, and the network structure of the YOLOv5s comprises 4 main parts: input (Input end), backbone (Backbone network), neck (multi-scale feature fusion) and Output (Output end), wherein the Input comprises Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling; backbone used Focus structure and CSP structure; the Neck adopts an FPN + PAN structure; output includes Bounding box loss function calculation and NMS non-maximum suppression.
S52, data collection, labeling and data set construction:
after the preprocessing operation, the problems of sonar target edge blurring and the like are solved through image correction and accurate segmentation, the accurate extraction of target shadow information is realized, and the side-scan sonar target detection based on the image accurate segmentation and the target shadow information is carried out in the part, so that a segmentation result obtained by a DeepLabV3+ network is used as original data of a sonar target detection network.
First, bounding-box labelling is performed on the segmentation results, with the target highlight region and the shadow region of the same target enclosed in a single box. Four classes of data are labelled: drowning victim, mine, aircraft and sunken ship. The labelled data are then split into a training set, a validation set and a test set. The training set contains 488 images, of which 98 are drowning-victim targets, 119 mine targets, 87 aircraft targets and 184 sunken-ship targets; the validation set contains 71 images, of which 17 are drowning-victim targets, 18 mine targets, 13 aircraft targets and 23 sunken-ship targets; and the test set contains 136 images, of which 13 are drowning-victim targets, 35 mine targets, 23 aircraft targets and 65 sunken-ship targets.
S53, experimental setting and model training:
configuration and parameter setting of experimental environment: the model of the invention finishes training and testing on NVIDIAGeForce RTX 3080 display card based on PyTorch deep learning frame, uses Python as programming language, modifies target type, pretrains weight path, configuration file of network structure, sets parameters such as epochs, batch-size, picture size, initial learning rate, cyclic learning rate, learning rate momentum, weight attenuation coefficient, ioU loss coefficient, cls loss coefficient and cls BCELoss positive sample weight, and completes construction and setting of the whole model.
Model training: after the parameters have been modified and set, the data set produced from the sonar image segmentation results obtained above is fed into the network for model training, and the training results are inspected with the TensorBoard visualisation tool. This yields the trends of the three loss-function means (the prediction-box regression loss, the target-detection (objectness) loss and the classification loss) with the number of iterations, the evolution of precision and recall with the number of iterations, and the curves of the mean average precision at an IoU threshold of 0.5 and of the mean average precision over the range of IoU thresholds from 0.5 to 0.95.
S54: model testing and result analysis:
after the network model training is completed, selecting a weight file with the best training effect to perform position regression on the data of the test set for recognizing the sonar target, thereby completing the test of the model, and finally performing multi-aspect evaluation on the model according to evaluation indexes such as Precision (Precision), recall rate (Recall), PR curve, F1 score and average Precision average value. TP (true positive) indicates that the true class of the sample is a positive case, and the result of the model prediction is also a positive case; TN (true negative) indicates that the true class of the sample is a negative case and the model predicts it as a negative case; FP (false positive) indicates that the true class of the sample is a negative case, but the model predicts it as a positive case; FN (false negative) indicates that the true class of the sample is a positive case, but the model predicts it as a negative case.
Precision is a measure of exactness and represents the proportion of samples predicted as positive that are truly positive. It is defined as:
Precision = TP / (TP + FP)   (11)
Recall is a measure of coverage and represents the proportion of truly positive samples that are predicted as positive. It is defined as:
Recall = TP / (TP + FN)   (12)
The PR curve is the curve obtained with Recall as the abscissa and Precision as the ordinate; the larger the area under it, the better the model performs on the data set. The area under the PR curve is the average precision (AP), defined as:
AP = ∫₀¹ Precision(Recall) d(Recall)   (13)
F1-score is the harmonic mean of precision and recall, a comprehensive index for evaluating the detection capability of the model, taking values in [0, 1]. It is defined as:
F1 = 2 · Precision · Recall / (Precision + Recall)   (14)
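As a brief illustration of how these indices relate, the following Python sketch computes precision, recall and F1 from TP, FP and FN counts according to Eqs. (11), (12) and (14); obtaining the counts themselves (by matching predicted boxes to ground-truth boxes) is not shown, and the function name is an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1-score from detection counts, Eqs. (11), (12), (14)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 60 correct detections, 10 false alarms, 5 missed targets
print(precision_recall_f1(60, 10, 5))   # (0.857..., 0.923..., 0.888...)
```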
further, the step S6 includes the steps of:
s61: verifying the effectiveness of the sonar target shadow area:
firstly, only labeling the highlight region of a target object on an original sonar image, and directly labeling the sonar original image without image segmentation. Decomposing the marked data set into a training set, a verification set and a test set, sending the training set data into a YOLOv5s network for model training, verifying the quality of the model by the verification set, predicting the result of the trained model by the test set, and finally obtaining index curve graphs such as Precision, recall, PR curve, F1-score and the like and a confusion matrix result.
And secondly, simultaneously labeling the highlight area and the shadow area of the target object of the original sonar image, and obtaining a series of index curve graphs and confusion matrix results. And finally, comparing a test result obtained based on the sonar target highlight area with a test result obtained based on the combination of the sonar target highlight area and the shadow area, thereby verifying the validity of the shadow area information in the acoustic image.
S62: verifying the effectiveness of accurate image segmentation:
according to the results obtained in the steps, an index curve result and a confusion matrix obtained by simultaneously detecting a target highlight area and a shadow area in an original sonar image are compared with a test result obtained by detecting the highlight area and the shadow area of the sonar target after the bright area and the shadow area are segmented by a DeepLabV3+ network in advance, so that the effectiveness of accurate segmentation of the side-scan sonar image in advance is verified.
S63: verifying the validity of the combined image accurate segmentation and target shadow information:
according to the obtained test result, the original detection and identification effect which only comprises a target bright area without division is compared with an index curve result and a confusion matrix which are obtained by simultaneously combining the shadow area and the accurate image division, so that the effectiveness of the combined image accurate division and the target shadow information is verified.
Drawings
FIG. 1 is a block flow diagram of the present invention.
Fig. 2 is a schematic view of side scan sonar imaging according to the present invention.
FIG. 3 is a schematic diagram of the result of the side scan sonar image preprocessing of the present invention.
FIG. 4 is a graph of the training-process curves of the YOLOv5s-based detection model of the present invention.
FIG. 5 is a graph showing the index values of all categories in the test results of the present invention.
FIG. 6 is a graph of Precision (Precision) versus confidence in the test results of the present invention.
FIG. 7 is a plot of Recall (Recall) versus confidence in test results of the present invention.
FIG. 8 is a diagram illustrating PR curves in the test results of the present invention.
FIG. 9 is a plot of F1-score versus confidence for the results of the test of the present invention.
FIG. 10 is a diagram illustrating confusion matrix results in the test results of the present invention.
FIG. 11 is a diagram illustrating an example of a part of the test results of the present invention.
FIG. 12 is a diagram illustrating comparison results of verifying the effectiveness of the shadow region according to the present invention.
Fig. 13 is a schematic diagram of a comparison result for verifying the image segmentation validity according to the present invention.
FIG. 14 is a comparison graph of the verification of the accurate segmentation effect of the combined target shadow region and the image according to the present invention.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of the side-scan sonar target detection method combining accurate image segmentation and target shadow information provided by the present invention, which comprises the following steps:
S1, performing autonomous gain compensation to address the excessive gray-level differences in side-scan sonar images caused by range-dependent energy attenuation, and the unstable port/starboard gray-level differences caused by the mobile platform itself in the harsh underwater environment.
S2, correcting geometric distortion to address the problem that side-scan sonar images have different physical resolutions at different ranges.
S3, performing image segmentation based on an improved DeepLabV3+ network on the corrected side-scan sonar image, and extracting the highlight region and the shadow region of the target of interest in the side-scan sonar image.
S4, after the segmented regions are obtained, matching the highlight region and the shadow region of the same target using the imaging principle and geometric relationships of the side-scan sonar.
S5, constructing a sonar target detection model based on the YOLOv5s network, sending the corrected and segmented images into the network for training, and using the trained model for detection and recognition of real sonar targets.
S6, setting up ablation experiments to verify the effectiveness of the shadow region in the sonar image and the effectiveness of sonar image segmentation.
Further, the step S1 includes the steps of:
S11, finding the seabed line, which is first estimated with reference to the attitude information of the mobile platform and then located precisely by a local search:
During sonar data acquisition, the altitude and attitude information of the underwater mobile platform are stored together with the data. The sonar data are then parsed ping by ping to obtain the sound intensity data and the altitude information. Using the pre-stored altitude, a preliminary rough estimate of the seabed line is computed. Since the side-scan sonar is divided into port and starboard channels, the conversion from altitude to image pixel position is performed as follows:
line_orig = N_s - (altitude * N_s / range)   (15)
where line_orig denotes the initial seabed-line position point in the sonar image, altitude denotes the altitude information, range denotes the working range of the sonar, and N_s denotes the number of sampling points in one ping (ping(n)) of sound intensity data acquired on a single side. Then, within 50 pixel values on either side of this initial value, the point with the maximum gray value is taken as the seabed-line position point a(n) of the sonar:
a(n) = max(line_orig ± 50)   (16)
For the sonar used in the invention, N_s = 1000 and range = 20 m, and the altitude information is collected in real time by the height sensor carried on the underwater mobile platform.
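A minimal Python sketch of the seabed-line search described above is given below. It follows Eqs. (15) and (16) with the stated parameters (N_s = 1000, range = 20 m, a search window of ±50 samples); the function and variable names and the synthetic test data are assumptions for illustration only.

```python
import numpy as np

def find_seabed_line(ping, altitude, sonar_range=20.0, n_samples=1000, window=50):
    """Estimate the seabed-line sample index for one ping of single-side data.

    ping     : 1-D array of per-sample gray values (length n_samples)
    altitude : towfish altitude for this ping, in metres
    """
    # Rough initial estimate from the pre-stored altitude, Eq. (15)
    line_orig = int(round(n_samples - altitude * n_samples / sonar_range))

    # Refine: take the sample with the maximum gray value within +/- 50 samples, Eq. (16)
    lo = max(0, line_orig - window)
    hi = min(n_samples, line_orig + window + 1)
    return lo + int(np.argmax(ping[lo:hi]))

# Example with synthetic data: a bright seabed return around sample 820
rng = np.random.default_rng(0)
ping = rng.random(1000) * 0.2
ping[820] = 1.0
print(find_seabed_line(ping, altitude=3.5))   # close to 820
```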
S12, calculating the width of the image area corresponding to the maximum towfish depth in the image:
The altitude of the vehicle changes constantly as it moves underwater. If every ping of data were processed over its full extent, the image rows would be ragged, so the minimum common area must be found in order to finally form regularly arranged image data. When the towfish depth is maximal, the range of collected underwater information is minimal. Taking the starboard side as an example, a complete side-scan sonar image is formed from N pings of data, and the gray values converted from the per-sample sound intensity values form a sequence s(n, i), where n is the ping index and i is the index of each sample within a ping. With a(n) denoting the seabed-line position of the n-th ping found by the above algorithm, the width N_min of the image area corresponding to the maximum towfish depth in the image can be calculated as:
N_min = min(N_s - a(n)),  n = 1...N   (17)
S13, computing the mean gray level of each ping section:
After the seabed-line position and the width of the image area corresponding to the maximum towfish depth have been obtained, gray-level correction must be applied to all pixels in that area. First, the mean gray level of each ping section is computed along the vertical direction of the image (the horizontal position is fixed and the statistics are taken vertically):
[Equation (18), shown as an image in the original: the mean gray level of each ping section.]
For convenient observation and a visually regular side-scan sonar image, N = 500 is adopted in the invention.
S14, obtaining an image gray correction factor sequence:
after the gray average value of the longitudinal ping section is obtained, the gray average value is made from the horizontal direction of the sonar image, and finally the gray correction factor sequences of all pixel points in the image area are obtained:
[Equation (19), shown as an image in the original: the gray-correction factor sequence for all pixels in the image area.]
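Since the exact correction-factor formula appears only as an image in the original, the following sketch illustrates the general idea with one common choice of per-column gain: each ping section is normalised so that its mean gray level matches the global mean of the corrected area. Both this normalisation scheme and the function name are assumptions, not the patent's formula.

```python
import numpy as np

def gain_compensate(img_area):
    """Columnwise gray-level gain compensation for an N x N_min image area.

    Assumption: the correction factor of each column is the ratio of the
    global mean gray level to that column's mean, which is one common way
    to realise a per-pixel correction-factor sequence.
    """
    img = img_area.astype(np.float64)
    col_mean = img.mean(axis=0)                        # mean of each ping section
    factors = img.mean() / np.maximum(col_mean, 1e-6)  # one factor per column
    return np.clip(img * factors[None, :], 0, 255).astype(np.uint8)
```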
further, the step S2 includes the steps of:
s21, obtaining the relation among the slant distance, the horizontal distance and the depth:
According to the position of the sonar transducer and the ray direction of the sound wave, the geometric relationship among the slant range, the horizontal range and the depth can be obtained as follows:
PlanarRange = sqrt(SlantRange^2 - TowfishAlt^2)   (20)
where PlanarRange denotes the horizontal distance, SlantRange denotes the slant distance, and TowfishAlt denotes the height of the transducer above the seabed. TowfishAlt can be obtained from the seabed-line detection in the sonar image.
S22, obtaining a point on the original slant-distance image which corresponds to the point on the corrected flat-distance image according to the geometrical relationship:
the side scan sonar is divided into a port side and a starboard side, and sonar data are stored according to the return sequence of sonar signals, so the problem of port and starboard sides needs to be considered in resolution correction. Let a certain point P (x) on the original slant range image 2 ,y 2 ) Corresponding to the point P (x) on the corrected flat pitch image 1 ,y 1 ) According to the geometric relationship, the following specific corresponding relationship between the straight pitch point and the slant pitch point can be obtained:
port resolution correction factor:
[Equation (21), shown as an image in the original: the port resolution correction factor relating P(x1, y1) to P(x2, y2).]
Starboard resolution correction factor:
[Equation (22), shown as an image in the original: the starboard resolution correction factor relating P(x1, y1) to P(x2, y2).]
where Res denotes the image resolution and width denotes the image width. In the invention, width = 2000, TowfishAlt = a, and Res = 1. During the calculation x2 is an integer, but the x1 obtained after the above calculations is generally non-integer, and a non-integer value has no corresponding pixel position in the image; therefore, after the correspondence has been computed, the pixel value at the non-integer image coordinate is calculated by bilinear interpolation. Since y1 = y2, the interpolation is only along the transverse x-axis, so the pixel value at the non-integer x1 can be obtained from the bilinear interpolation formula:
[Equation (23), shown as an image in the original: the interpolation formula giving the pixel value at the non-integer coordinate x1 from its neighbouring integer pixel positions.]
Using the prior information on the vehicle's motion parameters at imaging time, the resolution-correction module automatically computes the pixel coordinates of every point on the corrected image and maps each of them to a pixel of the original image, finally producing a resolution-corrected side-scan sonar image.
Further, the step S3 includes the steps of:
s31, constructing a segmentation network model:
a sonar image segmentation model based on an improved DeepLabV3+ network is constructed, and an original Xconcentration series network is replaced by a MobilenetV2 serving as a main feature extraction network. Feature network extraction is enhanced in the Encoder and the Decoder, loss calculation is carried out by adopting Focal loss, and a semantic segmentation result is evaluated by introducing a Dice loss index.
Further, the step S31 includes the steps of:
s311, modifying the backbone feature extraction network:
For the DeepLabV3+ network model used for side-scan sonar image segmentation, the original network adopts the Xception series, which has a large number of parameters and trains slowly, as the backbone feature-extraction network; the invention therefore replaces it with the lightweight MobileNetV2 as the backbone. MobileNetV2 adopts an inverted residual structure that first expands and then compresses: dimensionality is first increased with a 1x1 convolution, features are then extracted with a 3x3 depthwise separable convolution, and dimensionality is finally reduced with a 1x1 convolution. A new activation function, ReLU6, is used in the MobileNetV2 structure:
y=ReLU6(x)=min(max(x,0),6) (24)
in order to prevent the problem that the ReLU activation function causes relatively large loss to low-dimensional information and causes little loss to high-dimensional information in the last layer of 1x1 convolution dimensionality reduction, a linear activation function is adopted:
y=linear(x)=x (25)
and finally, a residual block is added into the whole structure to directly connect the input and the output.
S312, reinforced feature network extraction:
after completing the feature extraction of mobilenetV2, two effective feature layers are obtained, and then the preliminary effective features are subjected to reinforced feature extraction. The enhanced feature extraction network is divided into an Encoder part and a Decoder part: in the Encoder, for a preliminary effective feature layer compressed four times, firstly, feature extraction is performed by using parallel hole convolutions with different rates (expansion rates), and for an input x and a convolution kernel w, an output feature y of the hole convolution at an ith position point is specifically calculated as follows:
y[i] = Σ_k x[i + r·k] · w[k],  k = 1...kernel_size   (26)
where r denotes the dilation rate of the atrous convolution and kernel_size denotes the convolution kernel size. After extraction, the features are merged and finally compressed by a 1x1 convolution. Atrous convolutions at different rates can be used to obtain receptive fields of different sizes:
[Equation (27), shown as an image in the original: the recurrence relating the receptive field size V_i of layer i to that of layer i-1, the kernel size S_kernel-i and the step size.]
in the formula S kernal-i Is the ith layer convolution kernel size, V i Step, the size of the i-th layer receptive field i-1 Is the i-1 st layer core size.
In the Decoder, for the preliminary effective feature layer that has been compressed twice, the number of channels is first adjusted with a 1x1 convolution; the result is stacked with the up-sampled atrous-convolution features, and the stacked result is passed through two depthwise separable convolutions to obtain the final effective feature layer.
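The Encoder idea above can be sketched in PyTorch as parallel atrous convolutions at several dilation rates whose outputs are concatenated and compressed by a 1x1 convolution (Eq. (26)); the particular rates and channel widths below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParallelAtrous(nn.Module):
    """Parallel atrous convolutions (Eq. (26)) merged and compressed by a 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # 1x1 compression

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# Example: a deep encoder feature map
deep = torch.randn(1, 320, 40, 40)
print(ParallelAtrous(320, 256)(deep).shape)   # torch.Size([1, 256, 40, 40])
```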
S313: modified loss function:
and during model training, evaluating the training effect of the network model by using a cross entropy loss function and a Dice loss function. Wherein the cross entropy loss function is embodied as follows:
Figure BDA0003694409370000143
where θ represents a weight parameter, x represents a batch training sample size, p 1 Representing the expected class probability, p 2 Representing the prediction class probability.
Because sonar image samples differ in how hard they are to classify, the Focal loss is introduced as an improvement of the cross-entropy loss to alleviate the training problems caused by sample imbalance: the loss of each sample is weighted according to how hard the sample is to distinguish, which reduces the weight of the large number of easy negative samples in training. The expression is:
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)   (29)
where p_t is the predicted probability of the true class (in multi-class classification, the Softmax output probability), α_t is the weighting factor of each class, and γ is an adjustment (focusing) factor; when γ = 0 the expression reduces to the cross-entropy loss.
The Dice coefficient is a set-similarity measure commonly used to calculate the similarity of two samples, with value range [0, 1]. It is calculated as:
Dice = 2|X ∩ Y| / (|X| + |Y|)   (30)
where |X ∩ Y| denotes the intersection of the predicted result and the true result, and |X| and |Y| denote the numbers of elements in the predicted result and the true result, respectively. In the DeepLabV3+ semantic segmentation model adopted by the invention, X denotes the ground truth and Y denotes the predicted segmentation image. A larger coefficient indicates a greater overlap between the predicted result and the true result, but a smaller loss value is preferred during model training, so the Dice loss used as the semantic-segmentation loss is defined as:
DiceLoss = 1 - 2|X ∩ Y| / (|X| + |Y|)   (31)
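A minimal PyTorch sketch of the two losses is given below; it uses a single scalar alpha in place of the per-class α_t and a smoothing constant in the Dice term, both of which are simplifying assumptions rather than details from the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Focal loss, Eq. (29): FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    logits : raw class scores, shape (N, C, H, W); target : class indices (N, H, W).
    """
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)   # log p_t per pixel
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

def dice_loss(probs, target_onehot, eps=1e-6):
    """Dice loss, Eq. (31): 1 - 2|X ∩ Y| / (|X| + |Y|), averaged over classes."""
    inter = (probs * target_onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + target_onehot.sum(dim=(0, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()

# Example with 3 classes (highlight, shadow, background)
logits = torch.randn(2, 3, 64, 64)
target = torch.randint(0, 3, (2, 64, 64))
onehot = F.one_hot(target, 3).permute(0, 3, 1, 2).float()
print(focal_loss(logits, target).item(), dice_loss(logits.softmax(1), onehot).item())
```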
s32, making a sonar image data set, and performing model training:
and after the model is modified, a sonar image data set is made and is processed correspondingly. The purpose of the sonar image segmentation part is to segment a target highlight region formed by active sound wave echoes and a shadow region formed by object occlusion, and not to distinguish the target types represented by the respective regions, so in the segmentation task, only two types are labeled to the data set, namely a highlight region (light) and a shadow region (dark). Through experimental collection, 488 side-scan sonar image training sets, 71 verification sets and 136 test sets are arranged. And marking a target bright area and a shadow area of each image in the test set and the verification set, sending the marked images into a network, modifying category parameters, a trunk model, pre-training weights and the like in the network, and finally finishing the training of the sonar image segmentation model.
S33, segmenting the target highlight region and the acoustic shadow region of the sonar images to be segmented with the trained model to obtain the final image segmentation result:
The weight file in the test code is changed to the best weights from training and the class parameters are modified; the sonar images to be segmented in the test set are then segmented into target highlight regions and shadow regions to obtain the final segmentation result. After segmentation, the original sonar image is converted into a mask containing only 3 pixel values, which represent the target highlight region, the shadow region and the background, respectively. This effectively alleviates the problems of blurred sonar target edges, heavy sonar image noise and low resolution.
The side-scan sonar image results after the above correction and pre-segmentation are shown in fig. 3. The sonar image segmentation stage does not classify the targets; it is equivalent to a further preprocessing of the original sonar image, through which accurate image segmentation and extraction of the sonar target shadow information are completed simultaneously. The segmented images therefore still need to be detected and recognised.
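As a small illustration, the sketch below converts a predicted per-pixel class map into the 3-value mask described above; the particular gray values and class indices are assumptions.

```python
import numpy as np

# Assumed class indices from the segmentation network:
# 0 = background, 1 = target highlight region, 2 = shadow region.
GRAY_VALUES = {0: 0, 1: 255, 2: 128}

def class_map_to_mask(class_map):
    """Convert an (H, W) array of predicted class indices into a 3-value mask."""
    mask = np.zeros_like(class_map, dtype=np.uint8)
    for cls, gray in GRAY_VALUES.items():
        mask[class_map == cls] = gray
    return mask

# Example
pred = np.array([[0, 1, 1], [2, 2, 0]])
print(class_map_to_mask(pred))
```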
Further, the step S4 includes the following steps:
s41, matching the bright area and the shadow area of the same target by using the imaging principle of a side scan sonar:
After the segmented regions have been obtained, consider the imaging principle of the side-scan sonar image: the transmitted and received beams are perpendicular to the heading, and after a pulse is emitted and propagates through the acoustic channel to the seabed, the reflected wave returns to the transducer along the original path to form the corresponding target echo. The target shadow region is not insonified because the sound waves are blocked by the target, so the echo from this region is very weak; the shadow region and the target highlight region therefore lie along the same acoustic ray, i.e. the highlight region and the shadow lie on the same horizontal line, and the extents of the regions formed by the shadow and the highlight along the vehicle's direction of travel are consistent.
Fig. 2 shows the imaging principle of the side-scan sonar image and the form of the shadow generated by a target. It can be seen from the figure that the highlight region formed by the sonar target lies on the same horizontal line as the shadow region and that their extents are consistent. For the target in the figure, which lies on the seabed, the shadow follows immediately behind the target's strong echo.
Because the segmentation network may produce multiple target highlight regions and shadow regions, the highlight region and the shadow region belonging to the same target are matched according to the imaging principle of the side-scan sonar image, and the target of interest is selected.
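One possible way to implement this matching is sketched below: connected components of the highlight and shadow classes are extracted from the 3-value mask, and each highlight region is paired with the shadow region whose along-track (row) extent overlaps it and which lies immediately on its far-range side. The use of scipy.ndimage and the pairing criterion are assumptions, not the patent's algorithm.

```python
import numpy as np
from scipy import ndimage

def match_highlight_shadow(mask, highlight_val=255, shadow_val=128):
    """Pair each highlight region with a shadow region on the same rows.

    mask : 2-D array with the 3-value segmentation mask (rows = pings,
           columns = range samples, far range to the right for starboard).
    Returns a list of (highlight_bbox, shadow_bbox) slice pairs.
    """
    hl_lab, _ = ndimage.label(mask == highlight_val)
    sh_lab, _ = ndimage.label(mask == shadow_val)
    hl_boxes = ndimage.find_objects(hl_lab)
    sh_boxes = ndimage.find_objects(sh_lab)

    pairs = []
    for hb in hl_boxes:
        best, best_gap = None, None
        for sb in sh_boxes:
            # same horizontal line: the along-track (row) extents must overlap
            row_overlap = min(hb[0].stop, sb[0].stop) - max(hb[0].start, sb[0].start)
            # shadow immediately behind the target's strong echo (far-range side)
            gap = sb[1].start - hb[1].stop
            if row_overlap > 0 and gap >= 0 and (best_gap is None or gap < best_gap):
                best, best_gap = sb, gap
        if best is not None:
            pairs.append((hb, best))
    return pairs

# Example: one target echo with its shadow on the far-range side
m = np.zeros((10, 30), dtype=np.uint8)
m[3:6, 8:12] = 255      # highlight
m[3:6, 12:20] = 128     # shadow
print(match_highlight_shadow(m))
```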
Further, the step S5 includes the steps of:
s51, constructing a network model and an algorithm:
YOLOv5s is the network with the smallest model, the narrowest feature maps and the fastest recognition speed in the YOLOv5 detection algorithm series, so it is used as the main model for sonar target detection and recognition. It comprises 4 main parts: Input, Backbone, Neck (multi-scale feature fusion) and Output. The Input includes Mosaic data augmentation, adaptive anchor-box calculation and adaptive image scaling; the Backbone uses the Focus structure and the CSP structure; the Neck adopts an FPN + PAN structure; the Output includes the bounding-box loss calculation and NMS (non-maximum suppression). The approximate flow of side-scan sonar target detection with the YOLOv5s network is as follows:
(1) Image preprocessing. Mosaic data augmentation is first applied to the side-scan sonar images, the images are then adaptively scaled to the pre-specified image size (640 x 640), and finally the images are fed into the network in mini-batches of a preset size.
(2) Forward propagation. Forward propagation comprises Backbone-based feature extraction, Neck-based feature fusion and Prediction, and finally yields the position and size of the sonar-target prediction boxes and the class of the contained sonar targets.
(3) Error calculation. The error between the prediction boxes and the ground truth is calculated according to the loss function.
(4) Parameter update. The coefficient matrices and biases used in forward propagation are updated by gradient descent, thereby reducing the error between the prediction boxes and the ground truth. The coefficient matrices and biases corresponding to the minimum loss value are finally selected.
(5) Target prediction. The coefficient matrices and biases selected after the iterations are substituted into forward propagation, and the prediction information of the target objects is obtained from the side-scan sonar images to be detected.
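For orientation, a YOLOv5s model can be loaded and run on a segmented sonar image roughly as below. This uses the public ultralytics/yolov5 torch.hub interface rather than the patent's own training code, and the image path and weight path are placeholders.

```python
import torch

# Load a YOLOv5s model from the public ultralytics/yolov5 repository.
# For the method described here the model would instead be trained on the
# segmented sonar data set (4 classes) and loaded from the resulting weights:
# model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/best.pt')
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run detection on a segmented sonar image (path is a placeholder).
results = model('segmented_sonar_image.png')
results.print()          # class, confidence and box for each detection
boxes = results.xyxy[0]  # tensor: [x1, y1, x2, y2, confidence, class]
```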
S52, data collection, labeling and data set construction:
after the preprocessing operation, the problems of sonar target edge blurring and the like are solved through image correction and accurate segmentation, the accurate extraction of target shadow information is realized, and the side-scan sonar target detection based on the image accurate segmentation and the target shadow information is carried out in the part, so that a segmentation result obtained by a DeepLabV3+ network is used as original data of a sonar target detection network.
First, bounding-box labelling is performed on the segmentation results, with the target highlight region and the shadow region of the same target enclosed in a single box. Four classes of data are labelled: drowning victim, mine, aircraft and sunken ship. The labelled data are then split into a training set, a validation set and a test set. The training set contains 488 images, of which 98 are drowning-victim targets, 119 mine targets, 87 aircraft targets and 184 sunken-ship targets; the validation set contains 71 images, of which 17 are drowning-victim targets, 18 mine targets, 13 aircraft targets and 23 sunken-ship targets; and the test set contains 136 images, of which 13 are drowning-victim targets, 35 mine targets, 23 aircraft targets and 65 sunken-ship targets. The details are shown in the following table:
Category          Training set (images)   Validation set (images)   Test set (images)
Drowning person   98                      17                        13
Mine              119                     18                        35
Airplane          87                      13                        23
Sunken ship       184                     23                        65
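The split above can be reproduced with a simple partitioning script. The sketch below is illustrative only: the directory layout, file extension and 70/10/20 ratios are assumptions that merely approximate the counts in the table.

```python
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, out_dir, ratios=(0.70, 0.10, 0.20), seed=0):
    """Randomly partition labeled images into train/val/test folders (assumed layout)."""
    images = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(images)
    n_train = int(ratios[0] * len(images))
    n_val = int(ratios[1] * len(images))
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for name, files in splits.items():
        dst = Path(out_dir) / name / "images"
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dst / f.name)  # the matching label file would be copied in the same way
    return {name: len(files) for name, files in splits.items()}
```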
S53, experimental setting and model training:
Experimental environment configuration: training and testing of the model are completed under the Ubuntu system, with Python as the programming language and the PyTorch deep learning framework; the CPU is
an Intel Xeon Silver 4110 CPU @ 2.10 GHz, the memory is 64 GB, the GPU is an NVIDIA GeForce RTX 3080, and the GPU acceleration library is CUDA 11.4.
Parameter settings: the target class count (number of classes) is modified to 4, the pre-training weight path to yolov5s.pt, the network structure configuration file to yolov5s.yaml, and the dataset path to the self-created sonar dataset file sonar.yaml. The epochs are set to 150, the batch size to 16, the image size to 640, the initial learning rate to 0.01, the cyclic (final) learning rate factor to 0.1, the learning rate momentum to 0.937, the weight decay coefficient to 0.0005, the warm-up epochs to 3.0, the warm-up momentum to 0.8, the warm-up initial learning rate to 0.1, the IoU (box) loss coefficient to 0.05, the cls loss coefficient to 0.5, the cls BCELoss positive sample weight to 1.0, the IoU threshold used during training to 0.2, and the anchor aspect-ratio threshold to 4.0.
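For readability, these settings can be collected into Python dictionaries; the key names below follow the naming used in the publicly available YOLOv5 hyperparameter files and are given as a sketch of the configuration, not as an excerpt from the actual experiment files.

```python
# Hyperparameters from S53, written with YOLOv5-style key names (illustrative sketch).
hyp = {
    "lr0": 0.01,             # initial learning rate
    "lrf": 0.1,              # cyclic / final learning rate factor
    "momentum": 0.937,       # learning rate momentum
    "weight_decay": 0.0005,  # weight decay coefficient
    "warmup_epochs": 3.0,    # warm-up epochs
    "warmup_momentum": 0.8,  # warm-up momentum
    "warmup_bias_lr": 0.1,   # warm-up initial learning rate
    "box": 0.05,             # IoU (box) loss coefficient
    "cls": 0.5,              # classification loss coefficient
    "cls_pw": 1.0,           # cls BCELoss positive sample weight
    "iou_t": 0.20,           # IoU threshold used during training
    "anchor_t": 4.0,         # anchor aspect-ratio threshold
}

train_cfg = {
    "weights": "yolov5s.pt",  # pre-training weight path
    "cfg": "yolov5s.yaml",    # network structure configuration file
    "data": "sonar.yaml",     # self-created sonar dataset file
    "epochs": 150,
    "batch_size": 16,
    "img_size": 640,
    "num_classes": 4,
}
```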
Model training: after the parameters are modified and set, the dataset prepared from the sonar image segmentation results obtained above is fed into the network for model training, and the TensorBoard visualization tool is used to inspect the training results. This yields the trends of three loss-function means, namely the prediction box regression loss mean, the target detection loss mean and the classification loss mean, with the number of iterations, the changes in precision and recall, and the curves of the mean average precision at an IoU threshold of 0.5 (mAP@0.5) and averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95).
The trends of the various function values during training are shown in fig. 4.
The 1st graph in row 1 shows how the mean of the prediction box regression loss on the training set changes with the number of iterations; the smaller this value, the more accurate the box prediction. The training result stays below 0.02, so the box predictions are accurate. The 2nd graph in row 1 shows the mean of the target detection loss on the training set versus the number of iterations; the smaller the value, the more accurate the target detection. This loss finally drops to 0.01, so the detection results are accurate. The 3rd graph in row 1 shows the mean of the classification loss on the training set versus the number of iterations; the smaller the value, the more accurate the classification. This loss drops to 0.01, so the classification results are accurate. The means of these 3 loss functions decrease rapidly as the number of iterations increases and stabilize after about 100 iterations.
The 4th graph in row 1 shows how the precision on the training set changes with the number of iterations; the larger the value, the higher the prediction accuracy. The 5th graph in row 1 shows the recall on the training set versus the number of iterations; the larger the value, the more completely the true targets are recovered. The graphs show that both precision and recall rise rapidly with the number of iterations and then stabilize.
The 1st, 2nd and 3rd graphs in row 2 show the means of the prediction box regression loss, the target detection loss and the classification loss on the validation set versus the number of iterations. They likewise decrease rapidly as the iterations increase and stabilize after about 100 iterations, but the curves fluctuate more and are less smooth than those of the training set.
The 4th graph in row 2 is the mean average precision at an IoU threshold of 0.5 (mAP@0.5). The 5th graph in row 2 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95). Both the mAP@0.5 and mAP@0.5:0.95 values increase gradually and then stabilize.
S54: model testing and result analysis:
After the network model training is finished, the weight file with the best training effect is selected to perform position regression and sonar target recognition on the test set data, which completes the model testing. The model is then evaluated from multiple angles using evaluation indices such as Precision, Recall, the PR curve, the F1 score and the mean average precision. TP (true positive) indicates that the true class of the sample is positive and the model also predicts it as positive; TN (true negative) indicates that the true class of the sample is negative and the model predicts it as negative; FP (false positive) indicates that the true class of the sample is negative but the model predicts it as positive; FN (false negative) indicates that the true class of the sample is positive but the model predicts it as negative.
Precision is a measure of exactness and represents the proportion of samples predicted as positive that are truly positive. Its calculation formula is defined as follows:
Precision = TP / (TP + FP)
Recall is a measure of coverage and represents the proportion of truly positive samples that are correctly predicted as positive. Its calculation formula is defined as follows:
Recall = TP / (TP + FN)
The PR curve is a curve with Recall as the abscissa and Precision as the ordinate; the larger the area enclosed under the curve, the better the model performs on the data set. The area under the PR curve is the average precision (AP), and its calculation formula is defined as follows:
AP = ∫₀¹ P(R) dR
The F1 score is the harmonic mean of precision and recall; it is a comprehensive index for evaluating the detection capability of the model and takes values in [0, 1]. Its calculation formula is defined as follows:
F1 = 2 × Precision × Recall / (Precision + Recall)
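The four indices above can be computed directly from the prediction counts. The sketch below assumes that per-class TP/FP/FN counts and a sampled precision-recall curve have already been accumulated, and approximates AP by trapezoidal integration, which is one common convention rather than necessarily the exact protocol used in the experiments.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """AP as the area under the PR curve, via trapezoidal integration over recall."""
    order = np.argsort(recalls)
    r, p = recalls[order], precisions[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

# Example with made-up counts and a made-up sampled PR curve for a single class:
p, r, f1 = precision_recall_f1(tp=120, fp=7, fn=10)
ap = average_precision(np.array([0.0, 0.5, 0.9, 1.0]), np.array([1.0, 0.98, 0.95, 0.90]))
```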
The Precision, Recall, mAP@0.5 and mAP@0.5:0.95 values for all classes in the test set results are shown in fig. 5. At an IoU threshold of 0.5 the overall predicted values are P = 0.944, R = 0.925 and mAP@0.5 = 0.974; for mAP@0.5:0.95, the reported per-class values are 1 for the human class, 91.2% for the mine class, 92.7% for the plane class and 93.9% for the ship class. These predicted values on the test set are high and the prediction effect is good, achieving a high recognition precision.
Fig. 6 shows the relationship between Precision and confidence in the test set results. The prediction precision of all classes reaches 1 at a confidence of 0.753 and above, and the precision of the human class already reaches 1 at a confidence of 0.4.
Fig. 7 shows the relationship between Recall and confidence in the test set results. At the commonly used confidence threshold of 0.5, the recall of the human class is 0.915, that of the mine class is 0.886, that of the plane class is 0.886 and that of the ship class is 0.941. These recall values are all very high, so the prediction effect is good.
Fig. 8 shows the PR curves in the test set results, together with the area AP enclosed between each curve and the coordinate axes. The mean of the AP values over all classes, i.e. mAP@0.5, is 0.974, and the per-class AP values are 0.982 for human, 0.959 for mine, 0.977 for plane and 0.978 for ship. The mAP value is very high, which verifies that the model performs very well.
Fig. 9 shows the relationship between the F1 score and confidence in the test set results. The F1 score of all classes reaches 0.93 at a confidence of 0.393; since this value is close to 1, it verifies that the model of the invention has a strong detection capability.
Fig. 10 shows the confusion matrix of the test set results. The confusion matrix shows which classes the model confuses when making predictions. The values on the main diagonal from top-left to bottom-right are the probabilities of correct classification, which read from the figure are 0.85 for human, 0.89 for mine, 0.96 for plane and 0.94 for ship. Reading the matrix along the horizontal axis, the human class is most easily misidentified as the plane class, the mine class as the ship class, the plane class as background, and the ship class as the mine class.
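A confusion matrix such as the one in fig. 10 can be built from matched predicted and true class labels. The following sketch uses scikit-learn and row-normalizes the counts so that the diagonal gives the per-class probability of correct classification; the class names and the example labels are placeholders, and how detections are matched to ground truth is left out.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["human", "mine", "plane", "ship"]

def normalized_confusion(y_true, y_pred):
    """Row-normalized confusion matrix: entry (i, j) = P(predicted class j | true class i)."""
    cm = confusion_matrix(y_true, y_pred, labels=CLASSES).astype(float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Example with made-up matched detections:
y_true = ["human", "mine", "plane", "ship", "ship"]
y_pred = ["human", "mine", "plane", "ship", "mine"]
print(normalized_confusion(y_true, y_pred))
```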
Fig. 11 shows part of the detection results on the test set. The detection and recognition results for the 4 classes show that the targets can be accurately detected and the sonar target classes accurately identified.
Further, the step S6 includes the steps of:
S61: verifying the effectiveness of the sonar target shadow area:
First, only the highlight area of the target object is labeled on the original sonar image, i.e. the original sonar image is labeled directly without image segmentation. The labeled data set is split into a training set, a validation set and a test set; the training set is fed into the YOLOv5s network for model training, the validation set is used to check the quality of the model, and the test set is used to evaluate the trained model, finally yielding index curves such as Precision, Recall, the PR curve and the F1 score, together with a confusion matrix.
Second, both the highlight area and the shadow area of the target object are labeled on the original sonar image, and the same series of index curves and confusion matrix results is obtained. Finally, the test results obtained using only the sonar target highlight area are compared with those obtained using the combined highlight and shadow areas, thereby verifying the effectiveness of the shadow area information in the acoustic image.
Fig. 12 shows the comparison results used to verify the effectiveness of the shadow area. Comparing the indices in the data table, the target detection that uses both the shadow area and the highlight area outperforms the detection that uses only the target highlight area on every index: with shadow information the mAP@0.5 value is 3.8% higher, the mAP@0.5:0.95 value is 11.6% higher, the R value is 3.9% higher and the P value is 0.2% higher than without shadow information. The curves of the F1 score versus confidence, the P value versus confidence, the R value versus confidence, the PR curves and the confusion matrices are also compared, which verifies the correctness and effectiveness of combining the sonar target shadow area information as proposed in this algorithm.
S62: verifying the effectiveness of accurate image segmentation:
Based on the results obtained in the preceding steps, the index curves and confusion matrix obtained by detecting the target highlight area and shadow area simultaneously in the original sonar image are compared with the test results obtained after the bright area and the shadow area of the sonar target are first segmented by the DeepLabV3+ network, thereby verifying the effectiveness of accurately segmenting the side-scan sonar image in advance.
Fig. 13 shows the comparison results used to verify the effectiveness of image segmentation. Comparing the indices in the data table, the target detection that first segments the bright area and the shadow of the sonar target with the DeepLabV3+ network outperforms the detection without segmentation preprocessing (but still containing target shadow information): after image segmentation the mAP@0.5 value is 0.8% higher, the mAP@0.5:0.95 value is 6.3% higher, the R value is 1.3% higher and the P value is 4.3% higher than without segmentation. This demonstrates the correctness and effectiveness of the segmentation preprocessing of the target bright area and shadow in the sonar image before target detection, as proposed in this algorithm.
Comparing the results of fig. 12 and fig. 13, it can be found that adding the sonar target shadow information greatly improves the mAP value, indicating that the shadow information improves the detection and recognition effect of the model on all classes, while segmenting the target bright area and shadow in the sonar image before target detection greatly improves the P value, indicating that the segmentation preprocessing effectively improves the model's recognition of the target class. In short, the shadow area effectively improves the detection effect of the whole model, and the image segmentation effectively improves the target recognition effect of the model.
S63: verifying the effectiveness of combining accurate image segmentation with target shadow information:
Based on the test results obtained above, the original detection and recognition results, which use no segmentation and contain only the target bright area, are compared with the index curves and confusion matrix obtained when the shadow area and accurate image segmentation are combined, thereby verifying the effectiveness of combining accurate image segmentation with target shadow information.
Fig. 14 shows the comparison results. Comparing the indices in the data table, the target detection that combines the shadow area and image segmentation outperforms the original detection that uses only the target bright area without segmentation on every index: its mAP@0.5 value is 4.6% higher, its mAP@0.5:0.95 value is 17.9% higher, its R value is 5.2% higher and its P value is 4.5% higher than the original results, which verifies the correctness and effectiveness of combining the shadow area and image segmentation as proposed in this algorithm.
While the preferred embodiments and principles of this invention have been described in detail, it will be apparent to those skilled in the art that variations may be made in the embodiments based on the teachings of the invention and such variations are considered to be within the scope of the invention.

Claims (7)

1. A side-scan sonar target detection method combining image accurate segmentation and target shadow information is characterized by comprising the following steps:
s1, performing autonomous gain compensation to address the gray-level differences caused by distance-dependent energy attenuation in the side-scan sonar image and the large gray-level differences caused by the instability of the mobile platform in the harsh underwater environment;
s2, correcting geometric distortion to address the problem that side-scan sonar images have different physical resolutions at different distances;
s3, carrying out image segmentation based on an improved DeepLabV3+ network on the corrected side-scan sonar image, and extracting the highlight area and the shadow area of the target of interest in the side-scan sonar image;
s4, after the segmented areas are obtained, matching the highlight areas and the shadow areas of the same target by using the imaging principle and the geometric relation of the side-scan sonar;
s5, constructing a sonar target detection model based on a YOLOv5S network, sending the corrected and segmented image into the network for training, and using the trained model for detection and identification of a real sonar target;
and S6, ablation experiment setting, namely verifying the effectiveness of the shadow area in the sonar image and the effectiveness of sonar image segmentation.
2. The method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S1 comprises the following steps:
s11, finding a seabed line:
the seabed line is searched for using the attitude information of the mobile platform as a reference, and the seabed-line position is then searched accurately to the left and right; since the side-scan sonar is divided into port and starboard, the following rule is followed when converting from the height to the image pixel position point:
line_orig = N_s - (altitude * N_s / range)   (1)
where line_orig represents the initial seabed-line position point in the sonar image, altitude represents the height information, range represents the working range of the sonar, and N_s indicates the number of sound-intensity data sampling points of one line (ping(n)) acquired by a single side; then, among the 50 pixel points around this initial value, the pixel point corresponding to the maximum gray value is searched for as the seabed-line position point of the sonar;
s12, calculating the width of the image area corresponding to the maximum tow-fish height in the image:
from the perspective of the starboard side, a complete side-scan sonar image is formed from N pings of data, and the gray values converted from the sound intensity of each point form a sequence s(n, i), where n is the ping index and i is the index of each point within a ping; the seabed-line position found by the algorithm for the nth ping is a(n), so the width N_min of the image area corresponding to the maximum tow-fish height in the image can be calculated by the following formula:
N_min = min(N_s - a(n)), n = 1…N   (2)
S13, computing the mean gray level of each ping section:
after the seabed-line position and the width of the image area corresponding to the maximum tow-fish height are obtained, gray correction needs to be carried out on all pixel points in this area; first, the mean gray level of each ping section is computed along the vertical direction of the image (the transverse position in the sonar image is fixed and the statistics are taken in the vertical direction):
[formula for the mean gray level of each ping section]
s14, obtaining an image gray correction factor sequence:
after the gray-level mean of each longitudinal ping section is obtained, the mean is likewise taken along the horizontal direction of the sonar image, and finally the sequence of gray correction factors for all pixel points in the image area is obtained:
[formula for the gray correction factor sequence]
3. the method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S2 comprises the following steps:
s21, obtaining the relation among the slant distance, the horizontal distance and the depth:
according to the position of the sonar transducer and the ray direction of the sound wave, the geometric relation among the slant distance, the horizontal distance and the depth can be obtained, and the method is specifically as follows:
plantarRange = sqrt(slantRange^2 - towfishAlt^2)
wherein plantarRange represents the horizontal distance, slantRange represents the slant distance, and towfishAlt represents the height of the transducer from the sea floor; towfishAlt can be obtained from the seabed-line detection in the sonar image;
s22, obtaining the point on the original slant-range image corresponding to each corrected point on the ground-range image according to the geometric relation:
the side-scan sonar is divided into port and starboard; let a point P(x2, y2) on the original slant-range image correspond to the point P(x1, y1) on the corrected ground-range image; according to the geometric relationship, the following correspondence between the ground-range point and the slant-range point can be obtained:
port resolution correction factor:
[formula for the port resolution correction factor]
starboard resolution correction factor:
[formula for the starboard resolution correction factor]
where Res denotes the resolution of the image and width denotes the image width.
4. The method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S3 comprises the following steps:
s31, constructing a segmentation network model:
constructing a sonar image segmentation model based on an improved DeepLabV3+ network, and replacing the original Xception backbone with MobileNetV2 as the main feature extraction network; adding feature network extraction in the Encoder and Decoder, calculating the loss using cross entropy, and introducing the Dice loss index to evaluate the semantic segmentation results;
s32, making a sonar image data set, and performing model training:
after the model is modified, a sonar image data set is made and is processed correspondingly; the purpose of the sonar image segmentation part is to segment a target highlight area formed by active sound wave echoes and a shadow area formed by object occlusion, and does not distinguish the target types represented by the respective areas, so that in the segmentation task, only two types are labeled for a data set, namely a highlight area (light) and a shadow area (dark); through experimental collection, 488 pieces of side scan sonar image training sets, 71 pieces of verification sets and 136 pieces of testing sets are arranged; labeling a target bright area and a shadow area of each image in the test set and the verification set, sending the labeled images into a network, modifying category parameters, a trunk model, pre-training weights and the like in the network, and finally finishing training of a sonar image segmentation model;
and S33, segmenting the target bright area and the shadow area of the sonar image to be segmented by using the trained model, to obtain the final image segmentation result:
modifying the weight file in the test code to the best weight from the training results and modifying the class parameters; after modification, carrying out segmentation of the target bright area and shadow area on the sonar images to be segmented in the test set to obtain the final segmentation result; after segmentation, the original sonar image is converted into a mask containing only 3 pixel values, which respectively represent the target bright area, the shadow area and the background, thereby effectively resolving the problems of blurred sonar target edges, large sonar image noise and low resolution.
5. The method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S4 specifically comprises:
because the segmentation network may produce multiple target highlight areas and shadow areas, the bright area and the shadow area belonging to the same target need to be matched using the imaging principle of the side-scan sonar, and the target of interest is selected;
after the segmented regions are obtained, according to the imaging principle of the side-scan sonar image, a target shadow area is formed where the sound waves are blocked by the target and do not reach that region, so the target shadow area and the target highlight area lie along the same sound ray, i.e. the bright area and the shadow are on the same horizontal line, and the extents of the shadow and the bright area along the track (the direction of vehicle motion) are consistent; for a target lying on the sea floor, its shadow is immediately behind the target's strong echo.
6. The method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S5 comprises the following steps:
s51, constructing a network model and an algorithm:
the YOLOv5s network structure consists of 4 major components: the system comprises an Input (Input end), a Backbone (Backbone network), a Neck (multi-scale feature fusion) and an Output (Output end), wherein the Input comprises Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling; backbone uses Focus structure and CSP structure; the Neck adopts an FPN + PAN structure; output includes Bounding box loss function calculation and NMS non-maximum suppression;
s52, data collection, labeling and data set construction:
firstly, simultaneously carrying out frame marking on a target highlight area and a shadow area of a segmentation result, marking the frame in the same target frame, marking 4 types of data which are respectively a drowner, a mine, an airplane and a sunken ship, and then decomposing the marked data into a training set, a verification set and a test set; 488 training sets, wherein 98 images of a drowner target, 119 images of a mine target, 87 images of an airplane target and 184 images of a sunken ship target are included; 71 verification sets, wherein 17 target images of drowners, 18 target images of mines, 13 target images of airplanes and 23 target images of sunken ships are contained; 136 test sets, wherein 13 images of drowners, 35 images of mine targets, 23 images of airplane targets and 65 images of sunken ship targets are contained;
s53, experimental setting and model training:
configuration of the experimental environment and parameter setting: the model of the invention is trained and tested on an NVIDIA GeForce RTX 3080 graphics card based on the PyTorch deep learning framework, with Python as the programming language; the target class count, the pre-training weight path and the configuration file of the network structure are modified, and parameters such as the epochs, batch size, image size, initial learning rate, cyclic learning rate, learning rate momentum, weight decay coefficient, IoU loss coefficient, cls loss coefficient and cls BCELoss positive sample weight are set, completing the construction and setting of the whole model;
model training: after the modification and setting of the parameters are completed, the dataset made from the sonar image segmentation results obtained above is fed into the network for model training, and the TensorBoard visualization tool is used to inspect the training results; this yields the trends of the three loss-function means, namely the prediction box regression loss mean, the target detection loss mean and the classification loss mean, with the number of iterations, the changes in precision and recall with the number of iterations, and the curves of the mean average precision at an IoU threshold of 0.5 and averaged over IoU thresholds from 0.5 to 0.95;
s54: model testing and result analysis:
after the network model training is finished, selecting the weight file with the best training effect to perform position regression and sonar target recognition on the test set data, thereby completing the model testing; and finally evaluating the model from multiple angles according to evaluation indices such as Precision, Recall, the PR curve, the F1 score and the mean average precision.
7. The method for detecting the side-scan sonar target by combining the accurate image segmentation and the target shadow information according to claim 1, wherein the step S6 comprises the following steps:
s61: verifying the effectiveness of the sonar target shadow area:
firstly, only labeling a highlight area of a target object on an original sonar image, and directly labeling the sonar original image without image segmentation; decomposing the marked data set into a training set, a verification set and a test set, sending the training set data into a YOLOv5s network for model training, verifying the quality of the model by the verification set, predicting the result of the trained model by the test set, and finally obtaining index curve graphs such as Precision, recall, PR curve and F1-score and a confusion matrix result;
secondly, simultaneously labeling a highlight area and a shadow area of a target object on the original sonar image, and obtaining a series of index curve graphs and a confusion matrix result in the same way; finally, comparing a test result obtained based on the sonar target highlight area with a test result obtained based on the combination of the sonar target highlight area and the shadow area, and verifying the validity of the shadow area information in the acoustic image;
s62: verifying the effectiveness of accurate image segmentation:
according to the results obtained in the steps, an index curve result and a confusion matrix obtained by simultaneously detecting a target highlight area and a shadow area in an original sonar image are compared with a test result obtained by detecting the segmented bright area and the shadow part of the sonar target by a DeepLabv3+ network in advance, so that the effectiveness of the pre-accurate segmentation of the side-scan sonar image is verified;
s63: verifying the validity of the combined image accurate segmentation and target shadow information:
according to the obtained test result, the original detection and identification effect which only comprises a target bright area without division is compared with an index curve result and a confusion matrix which are obtained by simultaneously combining the shadow area and the accurate image division, so that the effectiveness of the combined image accurate division and the target shadow information is verified.
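As a numerical illustration of the gain-compensation and geometric-correction relations in claims 2 and 3, the following is a minimal sketch. The function names, argument units and the clamping of negative values under the square root are assumptions made for illustration; only the relations themselves, formula (1), formula (2) and the Pythagorean slant-to-ground-range relation, come from the text above.

```python
import numpy as np

def initial_seabed_line(altitude: float, sonar_range: float, n_samples: int) -> int:
    """Formula (1): initial seabed-line position point from the tow-fish altitude."""
    return int(n_samples - altitude * n_samples / sonar_range)

def min_seabed_area_width(seabed_lines: np.ndarray, n_samples: int) -> int:
    """Formula (2): width of the image area at the maximum tow-fish height, N_min = min(N_s - a(n))."""
    return int(np.min(n_samples - seabed_lines))

def slant_to_ground_range(slant_range: np.ndarray, towfish_alt: float) -> np.ndarray:
    """Pythagorean relation between slant range, horizontal range and transducer altitude (claim 3)."""
    return np.sqrt(np.maximum(slant_range**2 - towfish_alt**2, 0.0))

# Example: 1000 samples per ping over a 50 m working range, tow-fish about 8 m above the seabed.
line0 = initial_seabed_line(altitude=8.0, sonar_range=50.0, n_samples=1000)
ground = slant_to_ground_range(np.linspace(8.0, 50.0, 1000), towfish_alt=8.0)
```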