CN117671472A - Underwater multi-target group identification method based on dynamic visual sensor - Google Patents
- Publication number
- CN117671472A (application CN202410128788.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target group
- underwater
- module
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/05 — Underwater scenes
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural network learning methods
- G06V10/36 — Applying a local operator; non-linear local filtering operations, e.g. median filtering
- G06V10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/764 — Recognition or understanding using classification, e.g. of video objects
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Recognition or understanding using neural networks
Abstract
An underwater multi-target group identification method based on a dynamic vision sensor. The method comprises the following steps: S1, collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor; S2, constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set in proportion; S3, constructing a multi-target group identification model based on a target detection model, with a self-adaptive image enhancement module embedded in front of the backbone network of the target detection model and a feature level mode fusion module embedded between the backbone network and the neck network; S4, inputting the data of the training set into the multi-target group identification model constructed in step S3 for training, so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set; S5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
Description
Technical Field
The invention belongs to the technical field of underwater machine vision target detection, and particularly relates to underwater multi-target group identification based on a dynamic vision sensor.
Background
With the development of deep-sea resources and growing demands for marine environmental protection, higher requirements are being placed on real-time monitoring and efficient operation in underwater environments. Against this background, developing techniques that can accurately detect and identify multiple targets in complex underwater environments has become particularly important.
The development of underwater multi-target group identification technology has a profound effect on advancing marine technology. As global climate change and marine pollution intensify, scientists increasingly rely on accurate underwater data to monitor the health of marine ecosystems, evaluate the impact of human activity on marine biodiversity, and formulate corresponding protective measures. In addition, accurate underwater target detection is essential for archaeologists exploring sunken-ship sites, biologists studying seabed biological communities, engineers maintaining submarine infrastructure, and more.
Currently, underwater multi-target population identification detection faces the following challenges:
Complex underwater environment: lighting conditions underwater are usually poor, and strong scattering and absorption lead to low image quality and poor contrast, making target detection difficult. In addition, the underwater environment contains various interference factors such as sediment and plankton.
Dynamically changing scenes: natural factors such as currents and waves, together with the motion of the targets themselves, cause underwater scenes to change dynamically, which poses challenges for stable detection and tracking of targets.
Multi-sensor data fusion: to improve detection accuracy, it is often necessary to fuse data from different sensors, such as sonar, optical cameras and thermal imagers, and data fusion techniques are still under development.
Dynamic vision sensors are a new type of vision sensor with several unique advantages over traditional cameras that make them well suited to dynamic scenes and poor lighting conditions, such as underwater environments. Rather than capturing frames at fixed time intervals, a dynamic vision sensor detects changes in pixel brightness and outputs an event whenever the brightness change at a pixel exceeds a preset threshold.
The advantages of dynamic vision sensors for underwater multi-target group identification stem from this unique operating principle: they can provide clear, real-time visual information in extremely challenging underwater environments. Their high dynamic range lets them capture clear images under highly reflective or dark underwater conditions without the overexposure or underexposure failures common to conventional cameras. Furthermore, because each pixel responds independently, motion blur does not occur even in fast-moving scenes, which is critical for accurately capturing the dynamics of underwater creatures or robots.
In summary, dynamic vision sensors offer clear advantages for underwater multi-target group identification, especially in underwater applications with high real-time requirements and limited energy that must handle dynamic scenes. However, processing event data and fusing it with RGB data requires specialized techniques, and this field is still developing.
Disclosure of Invention
The invention provides an underwater multi-target group identification method based on a dynamic vision sensor, aiming to solve the low-light and motion-blur problems caused by complex underwater environments in previous underwater target identification, as well as the insufficient fusion of underwater event data and RGB data.
The method comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
Further, the underwater multi-target group event image is obtained by characterizing the underwater multi-target group events as underwater multi-target group event images using a time-space voxel characterization method.
Further, the target detection model includes SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8.
Further, the multi-target group identification model comprises an event image input end, an RGB image input end, a self-adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
Further, the self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module;
the RGB image enhancement module is used to carry out self-adaptive image enhancement.
Further, the feature extraction network comprises four feature extraction modules with the same structure and a classification module which are connected in sequence;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
Further, the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter and a characteristic enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
Further, the feature level mode fusion module comprises two multi-scale feature fusion modules, a mode fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
Further, after receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$

to obtain the feature map $F_E$ output after fusing the multi-scale event features, where $E_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$

to obtain the feature map $F_I$ output after fusing the multi-scale image features, where $I_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
Further, after the mode fusion module receives the feature map $F_E$ and the feature map $F_I$, it computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$, and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, where $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
The method has the beneficial effects that:
(1) To cope with the low-light and motion-blur problems common in underwater environments, the method adopts an innovative self-adaptive image enhancement strategy specially optimized for RGB images of underwater multi-target groups. Through fine pixel-level adjustment, the module enhances key features in the image, improving the visibility and recognizability of targets.
(2) To further improve image quality under the insufficient light and visual disturbances of complex underwater environments, event data is introduced as a supplementary information source. To integrate the two data types effectively, a feature level mode fusion module was developed; it deeply mines and combines features from the RGB images and the event data, compensating for information that may be lost in a single modality and thereby significantly improving target detection accuracy and system adaptability. This bimodal information fusion overcomes the impact of extreme underwater environments on image quality and strengthens the stability and robustness of the target detection algorithm in complex underwater scenes. A target detection system adopting the method can achieve more efficient and accurate multi-target group identification under changeable underwater conditions, markedly improving the performance of underwater detection and monitoring tasks.
Drawings
FIG. 1 is a schematic diagram of a multi-objective group identification model in an embodiment of the present invention;
FIG. 2 is a block diagram of an adaptive image enhancement module in an embodiment of the present invention;
fig. 3 is a block diagram of a feature level modality fusion module in an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of the invention.
The embodiment provides an underwater multi-target group identification method based on a dynamic vision sensor, which comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
The underwater multi-target group event images are obtained as follows: the underwater multi-target group events are characterized as underwater multi-target group event images by a time-space voxel characterization method, and the event images correspond one-to-one with the underwater multi-target group RGB images.
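The patent does not disclose the exact form of its time-space voxel characterization; the sketch below shows one common voxel-grid construction from the event-camera literature, assuming events arrive as (x, y, t, polarity) rows of a NumPy array and choosing the number of time bins arbitrarily.

```python
import numpy as np

def events_to_voxel_image(events, height, width, num_bins=5):
    """Accumulate an event stream into a (num_bins, H, W) spatio-temporal voxel grid."""
    voxels = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxels
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins) so each event lands in one time bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1e-6)
    bins = t_norm.astype(np.int64)
    xs = events[:, 0].astype(np.int64)
    ys = events[:, 1].astype(np.int64)
    pol = np.where(events[:, 3] > 0, 1.0, -1.0)  # signed polarity contribution
    np.add.at(voxels, (bins, ys, xs), pol)       # scatter-add events into voxels
    return voxels
```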
The target detection model includes SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8; these models all contain three parts: a backbone network, a neck network and a head network.
As shown in fig. 1, the multi-target group identification model includes an event image input end, an RGB image input end, an adaptive image enhancement module, two backbone networks, a feature level modality fusion module, a neck network, a head network, and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
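Structurally, the model can be assembled as below. This is a schematic PyTorch sketch in which the enhancement, backbone, fusion, neck and head modules are placeholders for the components described in this embodiment; passing a single fused output to the neck is a simplifying assumption.

```python
import torch.nn as nn

class MultiTargetGroupModel(nn.Module):
    def __init__(self, enhance, backbone_rgb, backbone_event, fusion, neck, head):
        super().__init__()
        self.enhance = enhance                # self-adaptive image enhancement (RGB branch)
        self.backbone_rgb = backbone_rgb      # produces multi-scale image features
        self.backbone_event = backbone_event  # produces multi-scale event features
        self.fusion = fusion                  # feature level mode fusion module
        self.neck = neck
        self.head = head

    def forward(self, rgb, event_image):
        rgb = self.enhance(rgb)
        feats_i = self.backbone_rgb(rgb)          # multi-scale image features
        feats_e = self.backbone_event(event_image)  # multi-scale event features
        fused = self.fusion(feats_e, feats_i)     # fused feature map(s)
        return self.head(self.neck(fused))        # target recognition result
```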
The self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module; the RGB image enhancement module is used to carry out self-adaptive image enhancement.
As shown in fig. 2, the feature extraction network includes four feature extraction modules with the same structure and a classification module which are sequentially connected;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
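A minimal sketch of this parameter-prediction network follows; the channel width and the number of predicted adjustment parameters are assumptions, since the patent specifies only the layer types and their order.

```python
import torch.nn as nn

class ParamPredictor(nn.Module):
    def __init__(self, in_ch=3, width=16, num_params=5):
        super().__init__()
        blocks, ch = [], in_ch
        for _ in range(4):  # four identical feature extraction modules
            blocks += [nn.Conv2d(ch, width, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2)]
            ch = width
        self.features = nn.Sequential(*blocks)
        # Classification module: global average pooling -> activation -> 1x1 conv.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.SiLU(), nn.Conv2d(width, num_params, 1)
        )

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)  # (B, num_params)
```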
The self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module and include the white balance gains $w_r$, $w_g$, $w_b$, the sharpening strength $\lambda$ and the pixel-level filter parameter $\theta$. The filters are applied as follows:

$I_{wb} = (w_r \cdot R,\ w_g \cdot G,\ w_b \cdot B)$;

$I_{sh} = I_{wb} + \lambda \, (I_{wb} - \mathrm{Gau}(I_{wb}))$;

$I_{px} = f_{px}(I_{wb};\ \theta)$;

$I_{out} = \mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{Concat}[\,I_{sh} : I_{px}\,])))$;

where R, G and B represent the three channels of the input image, $\mathrm{Conv}$ represents a convolution operation, $\mathrm{SiLU}$ represents an activation function, $I_{wb}$ is the output feature map enhanced by the white balance filter, $\mathrm{Gau}(\cdot)$ is the Gaussian filtering operation, $I_{sh}$ is the output feature map enhanced by the sharpening filter, $I_{px}$ is the output feature map enhanced by the pixel-level filter (with $f_{px}$ its pixel-wise mapping), $I_{out}$ is the output feature map integrating $I_{sh}$ and $I_{px}$, and Concat represents splicing feature maps along the channel dimension.
As shown in fig. 2, the RGB image enhancement module includes a white balance filter, a sharpening filter, a pixel-level filter, and a feature enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
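Under the equations given earlier, the enhancement pipeline can be sketched as follows; the Gaussian kernel size, the gamma-style mapping chosen for the pixel-level filter $f_{px}$, and the channel widths of the feature enhancement tail are assumptions rather than disclosed details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_blur(x, k=5, sigma=1.0):
    # Depthwise Gaussian filtering, used by the sharpening (unsharp-mask) filter.
    ax = torch.arange(k, dtype=x.dtype, device=x.device) - (k - 1) / 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] @ g[None, :]).expand(x.shape[1], 1, k, k).contiguous()
    return F.conv2d(x, kernel, padding=k // 2, groups=x.shape[1])

def apply_filters(rgb, w, lam, gamma):
    # rgb: (B,3,H,W); w: (B,3) white-balance gains; lam, gamma: (B,1) predicted parameters.
    i_wb = rgb * w[:, :, None, None]                                     # white balance filter
    i_sh = i_wb + lam[:, :, None, None] * (i_wb - gaussian_blur(i_wb))   # sharpening filter
    i_px = i_wb.clamp(min=1e-6) ** gamma[:, :, None, None]               # assumed pixel-level (gamma) filter
    return i_sh, i_px

# Feature enhancement module: fusion (Concat) -> 3x3 conv -> activation -> 1x1 conv.
feature_enhance = nn.Sequential(
    nn.Conv2d(6, 16, 3, padding=1), nn.SiLU(), nn.Conv2d(16, 3, 1)
)
```

In use, the two filtered results would be concatenated along the channel dimension (the feature fusion layer) and passed through feature_enhance, mirroring the $I_{out}$ equation above.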
As shown in fig. 3, the feature-level mode fusion module includes two multi-scale feature fusion modules, a mode fusion module, and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
After receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$

to obtain the feature map $F_E$ output after fusing the multi-scale event features, where $E_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$

to obtain the feature map $F_I$ output after fusing the multi-scale image features, where $I_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
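A sketch of this multi-scale fusion is given below. Because an elementwise sum of features at different scales requires shape alignment, the 1×1 channel projections and the resizing to the shallow feature's resolution are assumptions added to make the sum well defined.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, in_channels, out_ch=256):
        super().__init__()
        # One 1x1 projection per input scale, mapping to a common channel width.
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)

    def forward(self, feats):  # feats: [shallow E_0, deep E_1, ..., deep E_n]
        target = feats[0].shape[-2:]  # align everything to the shallow feature's size
        out = 0
        for f, p in zip(feats, self.proj):
            f = p(f)
            if f.shape[-2:] != target:
                f = F.interpolate(f, size=target, mode="nearest")
            out = out + f  # F = E_0 + E_1 + ... + E_n after alignment
        return out
```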
After the mode fusion module receives the feature map $F_E$ and the feature map $F_I$, it computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$, and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, where $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
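The mode fusion step can be sketched as follows, following the equations above: a squeeze path (GAP → Conv → SiLU → Conv → Sigmoid) computed on $F = F_E + F_I$ gates each modality. Using two separate gating branches and the channel-reduction ratio are assumptions, since the patent does not specify the internal dimensions.

```python
import torch.nn as nn

class ModeFusion(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                         # GAP
                nn.Conv2d(ch, ch // reduction, 1), nn.SiLU(),    # Conv -> SiLU
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(), # Conv -> Sigmoid
            )
        self.gate_e, self.gate_i = gate(), gate()

    def forward(self, f_e, f_i):
        f = f_e + f_i                  # F = F_E + F_I
        fe_hat = f_e * self.gate_e(f)  # feature map integrating both modalities
        fi_hat = f_i * self.gate_i(f)
        return fe_hat, fi_hat
```

The joint output module would then concatenate fe_hat and fi_hat along the channel dimension and apply a 1×1 convolution, as described above.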
According to the technical scheme, underwater multi-target group RGB images and events are acquired by a dynamic vision sensor; the self-adaptive image enhancement module performs adaptive enhancement on the RGB images, the event data are characterized as event images by the time-space voxel characterization method, and the feature level mode fusion module integrates the multi-scale features of the RGB images and the event images, thereby achieving accurate and efficient target identification.
Claims (10)
1. An underwater multi-target group identification method based on a dynamic vision sensor is characterized by comprising the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
2. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the underwater multi-target group event image is obtained by characterizing the underwater multi-target group events as underwater multi-target group event images using a time-space voxel characterization method.
3. The dynamic vision sensor-based underwater multi-target group identification method of claim 1, wherein the target detection model comprises SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8.
4. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the multi-target group identification model comprises an event image input end, an RGB image input end, an adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
5. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, the feature extraction network is used for predicting adaptive adjustment parameters, and the adaptive adjustment parameters are used for adjusting enhancement effects of filters in the RGB image enhancement module;
the RGB image enhancement module is used for carrying out self-adaptive picture enhancement.
6. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 5, wherein the feature extraction network comprises four feature extraction modules with the same structure and a classification module which are connected in sequence;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the adaptive adjustment parameters.
7. The dynamic vision sensor-based underwater multi-target group identification method of claim 5, wherein the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter and a feature enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
8. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the feature level modal fusion module comprises two multi-scale feature fusion modules, a modal fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
9. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$;

obtaining the feature map $F_E$ output after fusing the multi-scale event features, wherein $E_0$ represents shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$;

obtaining the feature map $F_I$ output after fusing the multi-scale image features, wherein $I_0$ represents shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
10. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after receiving the feature map $F_E$ and the feature map $F_I$, the mode fusion module computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$ and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, wherein $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410128788.8A CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410128788.8A CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117671472A (en) | 2024-03-08
CN117671472B (en) | 2024-05-14
Family
ID=90079183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410128788.8A Active CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117671472B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627504A (en) * | 2021-08-02 | 2021-11-09 | 南京邮电大学 | Multi-mode multi-scale feature fusion target detection method based on generation of countermeasure network |
CN114202475A (en) * | 2021-11-24 | 2022-03-18 | 北京理工大学 | Adaptive image enhancement method and system |
CN114332559A (en) * | 2021-12-17 | 2022-04-12 | 安徽理工大学 | RGB-D significance target detection method based on self-adaptive cross-modal fusion mechanism and depth attention network |
CN114926514A (en) * | 2022-05-13 | 2022-08-19 | 北京交通大学 | Registration method and device of event image and RGB image |
CN116797946A (en) * | 2023-05-11 | 2023-09-22 | 南京航空航天大学 | Cross-modal fusion target detection method based on unmanned aerial vehicle |
CN117132759A (en) * | 2023-08-02 | 2023-11-28 | 上海无线电设备研究所 | Saliency target detection method based on multiband visual image perception and fusion |
Non-Patent Citations (1)
Title |
---|
YIMIN ZHANG et al.: "ESarDet: An Efficient SAR Ship Detection Method Based on Context Information and Large Effective Receptive Field", Remote Sensing, 9 June 2023 (2023-06-09)
Also Published As
Publication number | Publication date |
---|---|
CN117671472B (en) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020259118A1 (en) | Method and device for image processing, method and device for training object detection model | |
CN108764071B (en) | Real face detection method and device based on infrared and visible light images | |
WO2021022983A1 (en) | Image processing method and apparatus, electronic device and computer-readable storage medium | |
WO2021164234A1 (en) | Image processing method and image processing device | |
CN105740775A (en) | Three-dimensional face living body recognition method and device | |
CN112424795B (en) | Face anti-counterfeiting method, processor chip and electronic equipment | |
CN110276831B (en) | Method and device for constructing three-dimensional model, equipment and computer-readable storage medium | |
CN113591592B (en) | Overwater target identification method and device, terminal equipment and storage medium | |
CN110827375B (en) | Infrared image true color coloring method and system based on low-light-level image | |
CN112861987A (en) | Target detection method under dark light environment | |
CN113763261B (en) | Real-time detection method for far small target under sea fog weather condition | |
Liu et al. | SETR-YOLOv5n: A lightweight low-light lane curvature detection method based on fractional-order fusion model | |
CN115035010A (en) | Underwater image enhancement method based on convolutional network guided model mapping | |
CN116682000B (en) | Underwater frogman target detection method based on event camera | |
CN110688926B (en) | Subject detection method and apparatus, electronic device, and computer-readable storage medium | |
CN117671472B (en) | Underwater multi-target group identification method based on dynamic visual sensor | |
CN116385293A (en) | Foggy-day self-adaptive target detection method based on convolutional neural network | |
JP5278307B2 (en) | Image processing apparatus and method, and program | |
CN114332682B (en) | Marine panorama defogging target identification method | |
CN103400381B (en) | A kind of Method for Underwater Target Tracking based on optical imagery | |
CN113920455B (en) | Night video coloring method based on deep neural network | |
Li et al. | Multi-scale fusion framework via retinex and transmittance optimization for underwater image enhancement | |
CN113537397A (en) | Target detection and image definition joint learning method based on multi-scale feature fusion | |
CN112017128A (en) | Image self-adaptive defogging method | |
CN117218033B (en) | Underwater image restoration method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |