CN113283278A - Anti-interference laser underwater target recognition instrument

Anti-interference laser underwater target recognition instrument

Info

Publication number
CN113283278A
Authority
CN
China
Prior art keywords
feature
map
channel
neural network
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110025036.5A
Other languages
Chinese (zh)
Other versions
CN113283278B (en)
Inventor
吕以豪
陈金水
高洁
王文海
卢建刚
刘兴高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110025036.5A priority Critical patent/CN113283278B/en
Publication of CN113283278A publication Critical patent/CN113283278A/en
Application granted granted Critical
Publication of CN113283278B publication Critical patent/CN113283278B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/30 - Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an anti-interference laser underwater target recognition instrument formed by sequentially connecting an underwater laser scanning device, a computing device and a display device. The underwater laser scanning device scans the underwater area to be detected and transmits the acquired laser point cloud to the computing device, which converts it into two-dimensional image data and performs target recognition. The computing device comprises a neural network recognition module and a storage module. The neural network recognition module combines the EfficientNet convolutional neural network with an attention mechanism, guiding the network to ignore noise in the image and to pay more attention to the positions of the target's effective features, thereby increasing the anti-interference capability of the network. The invention realizes intelligent, anti-interference and high-precision laser underwater target recognition.

Description

Anti-interference laser underwater target recognition instrument
Technical Field
The invention relates to the field of laser underwater target identification, in particular to an anti-interference laser underwater target identification instrument.
Background
Laser underwater target recognition is an advanced detection technology currently under active development. It integrates laser technology, communication technology, signal processing, target recognition and electronic technology, and has broad application prospects. Research and development of this technology is of significant value both in theory and in practical application.
To improve the performance and capability of existing deep learning networks, many researchers have proposed further mimicking the working patterns of the human brain by introducing an attention mechanism into neural network models. There have already been many successful attempts in the two classical deep learning application areas of natural language processing and image understanding and analysis. The former introduces an attention mechanism into recurrent neural networks (RNNs) and, during training, assigns different importance to keywords, words, sentences, paragraphs and even inter-chapter relations in the input text, so that complex tasks such as content understanding and machine translation are completed more accurately. The latter combines an attention mechanism with a convolutional neural network and, during training, continuously learns which parts of the feature map matter more for the final task, assigning them higher weights; this amounts to a secondary enhancement of the network's own training results and has been successfully applied in many visual scenarios such as classification, detection and segmentation.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide an anti-interference laser underwater target recognition instrument.
The technical scheme adopted by the invention to solve the technical problem is as follows: an anti-interference laser underwater target recognition instrument comprising an underwater laser scanning device, a computing device and a display device connected in sequence. The underwater laser scanning device scans the underwater area to be detected and transmits the acquired laser point cloud to the computing device, which converts it into two-dimensional image data and performs target recognition. The computing device comprises a neural network recognition module, which combines a conventional channel-domain deep learning attention model with a class activation map and completes recognition using the highly structure-optimized EfficientNet convolutional neural network, and a storage module, which stores data such as network parameters and the operating system.
Further, the neural network recognition module performs target recognition as follows:
(1) Process the laser point cloud data into three-channel image data and input it into the EfficientNet convolutional neural network model, stored in the data storage module, that contains the attention mechanism module.
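The patent does not specify how the point cloud is rasterized into three channels. The following is a minimal illustrative sketch, assuming the point cloud arrives as an N × 4 array of (x, y, z, intensity) rows and that mean depth, mean intensity and point density are packed into the three image channels; all of these encoding choices are assumptions, not taken from the patent.

```python
import numpy as np

def point_cloud_to_image(points, grid=(224, 224)):
    """Project an (N, 4) point cloud of (x, y, z, intensity) rows onto a
    three-channel image. Channel 0: mean depth, channel 1: mean intensity,
    channel 2: point count per cell (all normalised to [0, 1])."""
    h, w = grid
    x, y, z, inten = points.T
    # Map x/y coordinates to pixel indices of the output grid.
    col = np.clip(((x - x.min()) / (np.ptp(x) + 1e-9) * (w - 1)).astype(int), 0, w - 1)
    row = np.clip(((y - y.min()) / (np.ptp(y) + 1e-9) * (h - 1)).astype(int), 0, h - 1)

    img = np.zeros((3, h, w), dtype=np.float32)
    cnt = np.zeros((h, w), dtype=np.float32)
    np.add.at(img[0], (row, col), z)
    np.add.at(img[1], (row, col), inten)
    np.add.at(cnt, (row, col), 1.0)

    nonzero = cnt > 0
    img[0][nonzero] /= cnt[nonzero]          # mean depth per cell
    img[1][nonzero] /= cnt[nonzero]          # mean intensity per cell
    img[2] = cnt / (cnt.max() + 1e-9)        # normalised point density
    for c in range(2):                       # scale depth/intensity channels to [0, 1]
        rng = img[c].max() - img[c].min()
        if rng > 0:
            img[c] = (img[c] - img[c].min()) / rng
    return img
```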
(2) The convolutional neural network extracts features from the input image and generates feature maps in multiple stages. Each channel of a feature map is given a weight, and the weight is multiplied back onto every element of that channel, so that the activation of the different channels shows a new numerical difference on top of the original values: the higher the weight of a channel, the larger the values of the newly generated feature map in that channel, i.e. the higher its importance. This is realized as follows:
(2.1) Perform global average pooling on the original feature map in the channel dimension, as in formula (1), where p_i is the i-th pixel of the channel image, n is the total number of pixels in the image, and $\bar{p}$ is the pooled output value. A feature map with C channels is thereby converted into a 1 × 1 × C feature vector F.

$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$        (1)
(2.2) Feed the feature vector F into a multi-layer fully connected structure, which first reduces and then restores the feature dimension through mappings, finally producing a weight vector W with the same dimension as the input.
(2.3) Pass W through a sigmoid activation layer and multiply it back onto the original feature map, giving different weights to the different channels of the feature map.
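A minimal PyTorch sketch of steps (2.1)-(2.3) is given below. It follows the squeeze-and-excitation pattern the text describes; the two-layer bottleneck and the reduction ratio are assumptions, since the patent only specifies a multi-layer fully connected structure that reduces and restores the feature dimension.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Steps (2.1)-(2.3): per-channel global average pooling, a fully connected
    bottleneck, sigmoid gating, and channel-wise reweighting of the feature map."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # reduce feature dimension
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore feature dimension
        )

    def forward(self, x: torch.Tensor):
        b, c, _, _ = x.shape
        f = x.mean(dim=(2, 3))            # formula (1): GAP -> 1 x 1 x C vector F
        w = torch.sigmoid(self.fc(f))     # step (2.3): weight vector W in (0, 1)
        out = x * w.view(b, c, 1, 1)      # multiply the weights back onto the channels
        return out, w                     # w is reused as h_i in formula (3)
```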
(3) Generate a Class Activation Map (CAM). The CAM is a way of visualizing how the convolutional neural network extracts abstract features, and can also be regarded as a visualization of the region the network attends to. The specific process is as follows:
and (3.1) performing global average pooling on the input feature map to finally obtain the feature vector with the dimension same as the number of the CONV channels.
(3.2) Compute the Class Activation Map (CAM) of the current input sample using formula (2):

$\mathrm{CAM} = f\left(\sum_{i=1}^{n} w_i M_i\right)$        (2)

where w_1, w_2, ..., w_n are the components of the weight vector w of the fully connected layer corresponding to the class of the current input sample (w has the same dimension as the feature vector); M_i is the i-th channel image of the feature map; and f is a function that zeroes out pixels smaller than the image mean. Formula (2) multiplies the weight vector w with the corresponding feature-map channels and sums the results; the resulting heat map marks the regions the network attends to and corresponds to the input image. The elements of w are the weights the network assigns to the elements of the feature vector when judging that the input sample belongs to its class, and since the feature vector in this method is obtained from the last feature map by global average pooling (GAP), it represents the activation level of each channel of that feature map, i.e. the distribution of the corresponding feature in the input sample.
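As an illustration of formula (2), a short sketch follows. It assumes the classifier is a single fully connected layer whose weight-matrix row for the sample's class serves as the class-specific vector w, which matches the CAM construction described above.

```python
import torch

def class_activation_map(features: torch.Tensor, fc_weight: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Formula (2): CAM = f(sum_i w_i * M_i).

    features:  (C, H, W) last-stage feature map of one sample (the M_i).
    fc_weight: (num_classes, C) weight matrix of the final fully connected layer;
               its row for class_idx is the class-specific weight vector w.
    f zeroes out pixels that are smaller than the mean of the summed map."""
    w = fc_weight[class_idx]                            # (C,) class-specific weights
    cam = (w.view(-1, 1, 1) * features).sum(dim=0)      # weighted sum over channels
    cam = torch.where(cam >= cam.mean(), cam, torch.zeros_like(cam))  # the function f
    return cam
```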
(4) For the feature map output by each block of the EfficientNet convolutional neural network, compute an attention heat map as in formula (3), where h_i is the channel weight from step (2), n is the number of channels of that layer's feature map, and M_i is the i-th channel image of the feature map.

$\mathrm{Heat} = \sum_{i=1}^{n} h_i M_i$        (3)
(5) Compute the mean squared error loss (MSE Loss) of formula (4) between the heat map (Heat) generated by each block of the EfficientNet convolutional neural network and the heat map generated by the deeper block, and likewise between the heat map of the last feature map and the Class Activation Map (CAM) of the network. All the computed losses are added to the total loss of the network.

$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n} (p_i - q_i)^2$        (4)

where p_i and q_i are the i-th pixels of the two input images and n is the total number of pixels in the image.
This process forms a hierarchical structure, so the feature map output by every block of the network, from shallow to deep, can be guided by the attention mechanism, strengthening the key features and improving the recognition accuracy of the network. Since each channel of the feature map is sensitive to one kind of feature in the input data, the weight of a channel is the weight of that feature, which makes this a feature selection process.
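A sketch of steps (4) and (5) is given below, under stated assumptions: formula (3) is taken as a plain weighted sum of channels, and the heat maps of different blocks are bilinearly resized to a common resolution before the MSE of formula (4) is computed, since the patent does not state how the differing spatial sizes are matched.

```python
import torch
import torch.nn.functional as F

def block_heat_map(feature_map: torch.Tensor, channel_weights: torch.Tensor) -> torch.Tensor:
    """Formula (3): Heat = sum_i h_i * M_i over the channels of one block's output.
    feature_map: (B, C, H, W); channel_weights: (B, C) from the channel attention."""
    return (channel_weights.unsqueeze(-1).unsqueeze(-1) * feature_map).sum(dim=1)

def attention_alignment_loss(heat_maps, cam):
    """Step (5): MSE of formula (4) between each block's heat map and the next
    deeper block's heat map, plus MSE between the last heat map and the CAM.
    heat_maps: list of (B, H_k, W_k) tensors ordered from shallow to deep;
    cam: (B, H, W). Bilinear resizing to the CAM resolution is an assumption."""
    size = cam.shape[-2:]
    maps = [F.interpolate(h.unsqueeze(1), size=size, mode="bilinear",
                          align_corners=False).squeeze(1) for h in heat_maps]
    loss = sum(F.mse_loss(a, b) for a, b in zip(maps[:-1], maps[1:]))
    return loss + F.mse_loss(maps[-1], cam)
```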
The technical conception of the invention is as follows: the underwater laser scanning device scans the underwater area to be detected and transmits the acquired laser point cloud to the computing device, which converts it into two-dimensional image data and performs target recognition. The computing device comprises a neural network recognition module, which combines a conventional channel-domain deep learning attention model with a class activation map and completes recognition using the highly structure-optimized EfficientNet convolutional neural network, and a storage module, which stores data such as network parameters and the operating system. Because of water flow and changes in refractive index, the collected laser images of underwater targets contain many noise points, which degrade the recognition performance of a deep learning network. The neural network recognition module in this patent combines the EfficientNet convolutional neural network with an attention mechanism, guiding the network to ignore noise in the image and to pay more attention to the positions of the target's effective features, thereby increasing the anti-interference capability of the network.
The invention has the following beneficial effects: the anti-interference laser underwater target recognition instrument has high inference speed and anti-interference capability, and can recognize laser underwater targets quickly, stably and accurately.
Drawings
FIG. 1 is a flow chart of the identification implementation process of an anti-interference laser underwater target identification instrument.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The anti-interference laser underwater target recognition instrument comprises an underwater laser scanning device, a computing device and a display device connected in sequence. The underwater laser scanning device scans the underwater area to be detected and transmits the acquired laser point cloud to the computing device, which converts it into two-dimensional image data and performs target recognition; the recognition result is shown on the display device. The computing device comprises a neural network recognition module, which combines a conventional channel-domain deep learning attention model with a class activation map and completes recognition using the highly structure-optimized EfficientNet convolutional neural network, and a storage module, which stores data such as network parameters and the operating system.
The working process of the invention is shown in figure 1, and the specific steps are as follows:
1) The underwater laser scanning device performs laser scanning over the operation area to form point cloud data, which is returned to the computing device through a cable.
2) The computing device integrates an EfficientNet convolutional neural network model containing an attention mechanism module together with a data storage module. The recognition process is as follows:
2.1) Process the laser point cloud data into three-channel image data and input it into the EfficientNet convolutional neural network model, stored in the data storage module, that contains the attention mechanism module.
2.2) The convolutional neural network extracts features from the input image and generates feature maps in multiple stages. Each channel of a feature map is given a weight, and the weight is multiplied back onto every element of that channel, so that the activation of the different channels shows a new numerical difference on top of the original values: the higher the weight of a channel, the larger the values of the newly generated feature map in that channel, i.e. the higher its importance. This is realized as follows:
2.2.1) Perform global average pooling on the original feature map in the channel dimension, as in formula (1), where p_i is the i-th pixel of the channel image, n is the total number of pixels in the image, and $\bar{p}$ is the pooled output value. A feature map with C channels is thereby converted into a 1 × 1 × C feature vector F.

$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$        (1)
2.2.2) Feed the feature vector F into a multi-layer fully connected structure, which first reduces and then restores the feature dimension through mappings, finally producing a weight vector W with the same dimension as the input.
2.2.3) Pass W through a sigmoid activation layer and multiply it back onto the original feature map, giving different weights to the different channels of the feature map.
2.3) Generate a Class Activation Map (CAM). The CAM is a way of visualizing how the convolutional neural network extracts abstract features, and can also be regarded as a visualization of the region the network attends to. The specific process is as follows:
2.3.1) Perform global average pooling on the input feature map to obtain a feature vector whose dimension equals the number of convolutional channels.
2.3.2) Compute the Class Activation Map (CAM) of the current input sample using formula (2):

$\mathrm{CAM} = f\left(\sum_{i=1}^{n} w_i M_i\right)$        (2)

where w_1, w_2, ..., w_n are the components of the weight vector w of the fully connected layer corresponding to the class of the current input sample (w has the same dimension as the feature vector); M_i is the i-th channel image of the feature map; and f is a function that zeroes out pixels smaller than the image mean. Formula (2) multiplies the weight vector w with the corresponding feature-map channels and sums the results; the resulting heat map marks the regions the network attends to and corresponds to the input image. The elements of w are the weights the network assigns to the elements of the feature vector when judging that the input sample belongs to its class, and since the feature vector in this method is obtained from the last feature map by global average pooling (GAP), it represents the activation level of each channel of that feature map, i.e. the distribution of the corresponding feature in the input sample.
2.4) For the feature map output by each block of the EfficientNet convolutional neural network, compute an attention heat map as in formula (3), where h_i is the channel weight from 2.2), n is the number of channels of that layer's feature map, and M_i is the i-th channel image of the feature map.

$\mathrm{Heat} = \sum_{i=1}^{n} h_i M_i$        (3)
2.5) Compute the mean squared error loss (MSE Loss) of formula (4) between the heat map (Heat) generated by each block of the EfficientNet convolutional neural network and the heat map generated by the deeper block, and likewise between the heat map of the last feature map and the Class Activation Map (CAM) of the network. All the computed losses are added to the total loss of the network.

$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n} (p_i - q_i)^2$        (4)

where p_i and q_i are the i-th pixels of the two input images and n is the total number of pixels in the image.
This process forms a hierarchical structure, so the feature map output by every block of the network, from shallow to deep, can be guided by the attention mechanism, strengthening the key features and improving the recognition accuracy of the network. Since each channel of the feature map is sensitive to one kind of feature in the input data, the weight of a channel is the weight of that feature, which makes this a feature selection process.
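For completeness, a sketch of how the losses of 2.5) might enter a training step is shown below. It re-uses the attention_alignment_loss helper from the sketch after step (5) above; the interface of `model` (returning class logits, the per-block heat maps and the CAM) and the balancing factor `attn_weight` are hypothetical, since the patent does not disclose training hyper-parameters or loss weighting.

```python
import torch.nn.functional as F

def training_step(model, images, labels, optimizer, attn_weight: float = 1.0):
    """One training step combining the usual classification loss with the
    attention alignment losses of 2.4)-2.5). The model interface and the
    attn_weight factor are assumptions, not taken from the patent."""
    optimizer.zero_grad()
    logits, heat_maps, cam = model(images)
    cls_loss = F.cross_entropy(logits, labels)
    attn_loss = attention_alignment_loss(heat_maps, cam)  # hierarchical MSE terms
    total = cls_loss + attn_weight * attn_loss            # "added to the total loss"
    total.backward()
    optimizer.step()
    return float(total.detach())
```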
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit and scope of the claims.

Claims (2)

1. An anti-interference laser underwater target recognition instrument, characterized by comprising an underwater laser scanning device, a computing device and a display device connected in sequence, wherein the underwater laser scanning device scans the underwater area to be detected and transmits the acquired laser point cloud to the computing device, which converts it into two-dimensional image data and performs target recognition, the recognition result being shown on the display device; the computing device comprises a neural network recognition module, which combines a conventional channel-domain deep learning attention model with a class activation map and completes recognition using the highly structure-optimized EfficientNet convolutional neural network, and a storage module, which stores data such as network parameters and the operating system.
2. The anti-interference laser underwater target recognition instrument according to claim 1, wherein the neural network recognition module performs target recognition as follows:
(1) Process the laser point cloud data into three-channel image data and input it into the EfficientNet convolutional neural network model, stored in the data storage module, that contains the attention mechanism module.
(2) The convolutional neural network extracts features from the input image and generates feature maps in multiple stages. Each channel of a feature map is given a weight, and the weight is multiplied back onto every element of that channel, so that the activation of the different channels shows a new numerical difference on top of the original values: the higher the weight of a channel, the larger the values of the newly generated feature map in that channel, i.e. the higher its importance. This is realized as follows:
(2.1) Perform global average pooling on the original feature map in the channel dimension, as in formula (1), where p_i is the i-th pixel of the channel image, n is the total number of pixels in the image, and $\bar{p}$ is the pooled output value. A feature map with C channels is thereby converted into a 1 × 1 × C feature vector F.

$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$        (1)
(2.2) Feed the feature vector F into a multi-layer fully connected structure, which first reduces and then restores the feature dimension through mappings, finally producing a weight vector W with the same dimension as the input.
(2.3) Pass W through a sigmoid activation layer and multiply it back onto the original feature map, giving different weights to the different channels of the feature map.
(3) Generate a Class Activation Map (CAM). The CAM is a way of visualizing how the convolutional neural network extracts abstract features, and can also be regarded as a visualization of the region the network attends to. The specific process is as follows:
and (3.1) performing global average pooling on the input feature map to finally obtain the feature vector with the dimension same as the number of the CONV channels.
(3.2) Compute the Class Activation Map (CAM) of the current input sample using formula (2):

$\mathrm{CAM} = f\left(\sum_{i=1}^{n} w_i M_i\right)$        (2)

where w_1, w_2, ..., w_n are the components of the weight vector w of the fully connected layer corresponding to the class of the current input sample (w has the same dimension as the feature vector); M_i is the i-th channel image of the feature map; and f is a function that zeroes out pixels smaller than the image mean. Formula (2) multiplies the weight vector w with the corresponding feature-map channels and sums the results; the resulting heat map marks the regions the network attends to and corresponds to the input image. The elements of w are the weights the network assigns to the elements of the feature vector when judging that the input sample belongs to its class, and since the feature vector in this method is obtained from the last feature map by global average pooling (GAP), it represents the activation level of each channel of that feature map, i.e. the distribution of the corresponding feature in the input sample.
(4) For the feature map output by each block of the EfficientNet convolutional neural network, compute an attention heat map as in formula (3), where h_i is the channel weight from step (2), n is the number of channels of that layer's feature map, and M_i is the i-th channel image of the feature map.

$\mathrm{Heat} = \sum_{i=1}^{n} h_i M_i$        (3)
(5) Compute the mean squared error loss (MSE Loss) of formula (4) between the heat map (Heat) generated by each block of the EfficientNet convolutional neural network and the heat map generated by the deeper block, and likewise between the heat map of the last feature map and the Class Activation Map (CAM) of the network. All the computed losses are added to the total loss of the network.

$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n} (p_i - q_i)^2$        (4)

where p_i and q_i are the i-th pixels of the two input images and n is the total number of pixels in the image.
CN202110025036.5A 2021-01-08 2021-01-08 Anti-interference laser underwater target recognition instrument Active CN113283278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110025036.5A CN113283278B (en) 2021-01-08 2021-01-08 Anti-interference laser underwater target recognition instrument

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110025036.5A CN113283278B (en) 2021-01-08 2021-01-08 Anti-interference laser underwater target recognition instrument

Publications (2)

Publication Number Publication Date
CN113283278A true CN113283278A (en) 2021-08-20
CN113283278B CN113283278B (en) 2023-03-24

Family

ID=77275485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110025036.5A Active CN113283278B (en) 2021-01-08 2021-01-08 Anti-interference laser underwater target recognition instrument

Country Status (1)

Country Link
CN (1) CN113283278B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140792A (en) * 2022-02-08 2022-03-04 山东力聚机器人科技股份有限公司 Micro target detection method and device based on dynamic sliding window

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153908A1 (en) * 2018-02-11 2019-08-15 北京达佳互联信息技术有限公司 Image recognition method and system based on attention model
US20190324439A1 (en) * 2017-08-02 2019-10-24 Strong Force Iot Portfolio 2016, Llc Data monitoring systems and methods to update input channel routing in response to an alarm state
CN110738698A (en) * 2019-11-08 2020-01-31 青岛合启立智能科技有限公司 Floating type seabed data measuring method and device and electronic equipment
CN111126134A (en) * 2019-11-11 2020-05-08 浙江大学 Radar radiation source deep learning identification method based on non-fingerprint signal eliminator
CN111243042A (en) * 2020-02-28 2020-06-05 浙江德尚韵兴医疗科技有限公司 Ultrasonic thyroid nodule benign and malignant characteristic visualization method based on deep learning


Also Published As

Publication number Publication date
CN113283278B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN110188598B (en) Real-time hand posture estimation method based on MobileNet-v2
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
CN107844743A (en) A kind of image multi-subtitle automatic generation method based on multiple dimensioned layering residual error network
CN111444968A (en) Image description generation method based on attention fusion
CN111259940B (en) Target detection method based on space attention map
US20230260255A1 (en) Three-dimensional object detection framework based on multi-source data knowledge transfer
CN110020681A (en) Point cloud feature extracting method based on spatial attention mechanism
CN110084240A (en) A kind of Word Input system, method, medium and calculate equipment
CN113536925A (en) Crowd counting method based on attention guide mechanism
CN113283278B (en) Anti-interference laser underwater target recognition instrument
CN114676852A (en) General countermeasure disturbance generation method based on correlation class activation mapping
CN116258931B (en) Visual finger representation understanding method and system based on ViT and sliding window attention fusion
Li et al. CLS-DETR: A DETR-series object detection network using classification information to accelerate convergence
CN116958740A (en) Zero sample target detection method based on semantic perception and self-adaptive contrast learning
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN112598065B (en) Memory-based gating convolutional neural network semantic processing system and method
Si et al. Image semantic segmentation based on improved DeepLab V3 model
CN111914751B (en) Image crowd density identification detection method and system
CN114529949A (en) Lightweight gesture recognition method based on deep learning
CN114821337A (en) Semi-supervised SAR image building area extraction method based on time phase consistency pseudo-label
CN114723049A (en) Class activation mapping method and device based on gradient optimization
CN114049567A (en) Self-adaptive soft label generation method and application in hyperspectral image classification
CN116563313B (en) Remote sensing image soybean planting region segmentation method based on gating and attention fusion
Yang et al. Underwater bubble plume image generative model based on noise prior and multi conditional labels
Yang et al. Research on real-time detection of road vehicle targets based on YOLOV4 improved algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant