CN117671472A - Underwater multi-target group identification method based on dynamic visual sensor - Google Patents
- Publication number
- CN117671472A (application CN202410128788.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- target group
- underwater
- module
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/05 — Underwater scenes
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural network learning methods
- G06V10/36 — Applying a local operator; non-linear local filtering operations, e.g. median filtering
- G06V10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/764 — Recognition or understanding using classification, e.g. of video objects
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Recognition or understanding using neural networks
Abstract
An underwater multi-target group identification method based on a dynamic vision sensor. The method comprises the following steps: S1, collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor; S2, constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set in proportion; S3, constructing a multi-target group identification model based on a target detection model, with a self-adaptive image enhancement module embedded in front of the backbone network of the target detection model and a feature level mode fusion module embedded between the backbone network and the neck network; S4, inputting the data of the training set into the multi-target group identification model constructed in step S3 for training, so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set; S5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
Description
Technical Field
The invention belongs to the technical field of underwater machine vision target detection, and particularly relates to underwater multi-target group identification based on a dynamic vision sensor.
Background
With the development of deep-sea resources and growing demands for marine environmental protection, higher requirements are being placed on real-time monitoring and efficient operation in underwater environments. Against this background, developing techniques that can accurately detect and identify multiple targets in complex underwater environments has become particularly important.
The development of underwater multi-target group identification technology has a profound effect on advancing marine technology. As global climate change and marine pollution intensify, scientists increasingly rely on accurate underwater data to monitor the health of marine ecosystems, evaluate the impact of human activity on marine biodiversity, and formulate corresponding protective measures. In addition, accurate underwater target detection is essential for archaeologists exploring sunken-ship sites, biologists studying seabed biological communities, engineers maintaining submarine infrastructure, and more.
Currently, underwater multi-target population identification detection faces the following challenges:
Complex underwater environment: lighting conditions underwater are usually poor, and strong scattering and absorption lead to low image quality and poor contrast, making target detection difficult. In addition, the underwater environment contains various interference factors such as sediment and plankton.
Dynamically changing scenes: natural factors such as currents and waves, together with the motion of the targets themselves, cause underwater scenes to change dynamically, which poses challenges for stable detection and tracking of targets.
Multi-sensor data fusion: to improve detection accuracy, it is often necessary to fuse data from different sensors, such as sonar, optical cameras and thermal imagers, and data fusion techniques are still under development.
Dynamic vision sensors are a new type of vision sensor with several unique advantages over traditional cameras that make them well suited to dynamic scenes and poor lighting conditions, such as underwater environments. Rather than capturing frames at fixed time intervals, a dynamic vision sensor detects changes in pixel brightness and outputs an event whenever the brightness change at a pixel exceeds a preset threshold.
The advantages of dynamic vision sensors for underwater multi-target group identification stem from this unique operating principle: they can provide clear, real-time visual information in extremely challenging underwater environments. Their high dynamic range lets them capture clear images under highly reflective or dark underwater conditions without the overexposure or underexposure failures common to conventional cameras. Furthermore, because each pixel responds independently, motion blur does not occur even in fast-moving scenes, which is critical for accurately capturing the dynamics of underwater creatures or robots.
In summary, dynamic vision sensors offer clear advantages for underwater multi-target group identification, especially in underwater applications with high real-time requirements and limited energy that must handle dynamic scenes. However, processing event data and fusing it with RGB data requires specialized techniques, and this field is still developing.
Disclosure of Invention
The invention provides an underwater multi-target group identification method based on a dynamic vision sensor, aiming to solve the low-light and motion-blur problems caused by complex underwater environments in previous underwater target identification, as well as the insufficient fusion of underwater event data and RGB data.
The method comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
Further, the underwater multi-target group event image is obtained by characterizing the underwater multi-target group events as underwater multi-target group event images using a time-space voxel characterization method.
Further, the target detection model includes SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8.
Further, the multi-target group identification model comprises an event image input end, an RGB image input end, a self-adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
Further, the self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module;
the RGB image enhancement module is used to carry out self-adaptive image enhancement.
Further, the feature extraction network comprises four feature extraction modules with the same structure and a classification module which are connected in sequence;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
Further, the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter and a characteristic enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
Further, the feature level mode fusion module comprises two multi-scale feature fusion modules, a mode fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
Further, after receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$

to obtain the feature map $F_E$ output after fusing the multi-scale event features, where $E_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$

to obtain the feature map $F_I$ output after fusing the multi-scale image features, where $I_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
Further, after the mode fusion module receives the feature map $F_E$ and the feature map $F_I$, it computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$, and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, where $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
The method has the beneficial effects that:
(1) To cope with the low-light and motion-blur problems common in underwater environments, the method adopts an innovative self-adaptive image enhancement strategy specially optimized for RGB images of underwater multi-target groups. Through fine pixel-level adjustment, the module enhances key features in the image, improving the visibility and recognizability of targets.
(2) To further improve image quality under the insufficient light and visual disturbances of complex underwater environments, event data is introduced as a supplementary information source. To integrate the two data types effectively, a feature level mode fusion module was developed; it deeply mines and combines features from the RGB images and the event data, compensating for information that may be lost in a single modality and thereby significantly improving target detection accuracy and system adaptability. This bimodal information fusion overcomes the impact of extreme underwater environments on image quality and strengthens the stability and robustness of the target detection algorithm in complex underwater scenes. A target detection system adopting the method can achieve more efficient and accurate multi-target group identification under changeable underwater conditions, markedly improving the performance of underwater detection and monitoring tasks.
Drawings
FIG. 1 is a schematic diagram of a multi-objective group identification model in an embodiment of the present invention;
FIG. 2 is a block diagram of an adaptive image enhancement module in an embodiment of the present invention;
fig. 3 is a block diagram of a feature level modality fusion module in an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of the invention.
The embodiment provides an underwater multi-target group identification method based on a dynamic vision sensor, which comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
The underwater multi-target group event images are obtained as follows: the underwater multi-target group events are characterized as underwater multi-target group event images by a time-space voxel characterization method, and the event images correspond one-to-one with the underwater multi-target group RGB images.
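The patent does not disclose the exact form of its time-space voxel characterization; the sketch below shows one common voxel-grid construction from the event-camera literature, assuming events arrive as (x, y, t, polarity) rows of a NumPy array and choosing the number of time bins arbitrarily.

```python
import numpy as np

def events_to_voxel_image(events, height, width, num_bins=5):
    """Accumulate an event stream into a (num_bins, H, W) spatio-temporal voxel grid."""
    voxels = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxels
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins) so each event lands in one time bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1e-6)
    bins = t_norm.astype(np.int64)
    xs = events[:, 0].astype(np.int64)
    ys = events[:, 1].astype(np.int64)
    pol = np.where(events[:, 3] > 0, 1.0, -1.0)  # signed polarity contribution
    np.add.at(voxels, (bins, ys, xs), pol)       # scatter-add events into voxels
    return voxels
```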
The target detection model includes SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8; these models all contain three parts: a backbone network, a neck network and a head network.
As shown in fig. 1, the multi-target group identification model includes an event image input end, an RGB image input end, an adaptive image enhancement module, two backbone networks, a feature level modality fusion module, a neck network, a head network, and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
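Structurally, the model can be assembled as below. This is a schematic PyTorch sketch in which the enhancement, backbone, fusion, neck and head modules are placeholders for the components described in this embodiment; passing a single fused output to the neck is a simplifying assumption.

```python
import torch.nn as nn

class MultiTargetGroupModel(nn.Module):
    def __init__(self, enhance, backbone_rgb, backbone_event, fusion, neck, head):
        super().__init__()
        self.enhance = enhance                # self-adaptive image enhancement (RGB branch)
        self.backbone_rgb = backbone_rgb      # produces multi-scale image features
        self.backbone_event = backbone_event  # produces multi-scale event features
        self.fusion = fusion                  # feature level mode fusion module
        self.neck = neck
        self.head = head

    def forward(self, rgb, event_image):
        rgb = self.enhance(rgb)
        feats_i = self.backbone_rgb(rgb)          # multi-scale image features
        feats_e = self.backbone_event(event_image)  # multi-scale event features
        fused = self.fusion(feats_e, feats_i)     # fused feature map(s)
        return self.head(self.neck(fused))        # target recognition result
```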
The self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module; the RGB image enhancement module is used to carry out self-adaptive image enhancement.
As shown in fig. 2, the feature extraction network includes four feature extraction modules with the same structure and a classification module which are sequentially connected;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
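A minimal sketch of this parameter-prediction network follows; the channel width and the number of predicted adjustment parameters are assumptions, since the patent specifies only the layer types and their order.

```python
import torch.nn as nn

class ParamPredictor(nn.Module):
    def __init__(self, in_ch=3, width=16, num_params=5):
        super().__init__()
        blocks, ch = [], in_ch
        for _ in range(4):  # four identical feature extraction modules
            blocks += [nn.Conv2d(ch, width, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2)]
            ch = width
        self.features = nn.Sequential(*blocks)
        # Classification module: global average pooling -> activation -> 1x1 conv.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.SiLU(), nn.Conv2d(width, num_params, 1)
        )

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)  # (B, num_params)
```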
The self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module and include the white balance gains $w_r$, $w_g$, $w_b$, the sharpening strength $\lambda$ and the pixel-level filter parameter $\theta$. The filters are applied as follows:

$I_{wb} = (w_r \cdot R,\ w_g \cdot G,\ w_b \cdot B)$;

$I_{sh} = I_{wb} + \lambda \, (I_{wb} - \mathrm{Gau}(I_{wb}))$;

$I_{px} = f_{px}(I_{wb};\ \theta)$;

$I_{out} = \mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{Concat}[\,I_{sh} : I_{px}\,])))$;

where R, G and B represent the three channels of the input image, $\mathrm{Conv}$ represents a convolution operation, $\mathrm{SiLU}$ represents an activation function, $I_{wb}$ is the output feature map enhanced by the white balance filter, $\mathrm{Gau}(\cdot)$ is the Gaussian filtering operation, $I_{sh}$ is the output feature map enhanced by the sharpening filter, $I_{px}$ is the output feature map enhanced by the pixel-level filter (with $f_{px}$ its pixel-wise mapping), $I_{out}$ is the output feature map integrating $I_{sh}$ and $I_{px}$, and Concat represents splicing feature maps along the channel dimension.
As shown in fig. 2, the RGB image enhancement module includes a white balance filter, a sharpening filter, a pixel-level filter, and a feature enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
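Under the equations given earlier, the enhancement pipeline can be sketched as follows; the Gaussian kernel size, the gamma-style mapping chosen for the pixel-level filter $f_{px}$, and the channel widths of the feature enhancement tail are assumptions rather than disclosed details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_blur(x, k=5, sigma=1.0):
    # Depthwise Gaussian filtering, used by the sharpening (unsharp-mask) filter.
    ax = torch.arange(k, dtype=x.dtype, device=x.device) - (k - 1) / 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] @ g[None, :]).expand(x.shape[1], 1, k, k).contiguous()
    return F.conv2d(x, kernel, padding=k // 2, groups=x.shape[1])

def apply_filters(rgb, w, lam, gamma):
    # rgb: (B,3,H,W); w: (B,3) white-balance gains; lam, gamma: (B,1) predicted parameters.
    i_wb = rgb * w[:, :, None, None]                                     # white balance filter
    i_sh = i_wb + lam[:, :, None, None] * (i_wb - gaussian_blur(i_wb))   # sharpening filter
    i_px = i_wb.clamp(min=1e-6) ** gamma[:, :, None, None]               # assumed pixel-level (gamma) filter
    return i_sh, i_px

# Feature enhancement module: fusion (Concat) -> 3x3 conv -> activation -> 1x1 conv.
feature_enhance = nn.Sequential(
    nn.Conv2d(6, 16, 3, padding=1), nn.SiLU(), nn.Conv2d(16, 3, 1)
)
```

In use, the two filtered results would be concatenated along the channel dimension (the feature fusion layer) and passed through feature_enhance, mirroring the $I_{out}$ equation above.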
As shown in fig. 3, the feature-level mode fusion module includes two multi-scale feature fusion modules, a mode fusion module, and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
After receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$

to obtain the feature map $F_E$ output after fusing the multi-scale event features, where $E_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$

to obtain the feature map $F_I$ output after fusing the multi-scale image features, where $I_0$ represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
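A sketch of this multi-scale fusion is given below. Because an elementwise sum of features at different scales requires shape alignment, the 1×1 channel projections and the resizing to the shallow feature's resolution are assumptions added to make the sum well defined.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, in_channels, out_ch=256):
        super().__init__()
        # One 1x1 projection per input scale, mapping to a common channel width.
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)

    def forward(self, feats):  # feats: [shallow E_0, deep E_1, ..., deep E_n]
        target = feats[0].shape[-2:]  # align everything to the shallow feature's size
        out = 0
        for f, p in zip(feats, self.proj):
            f = p(f)
            if f.shape[-2:] != target:
                f = F.interpolate(f, size=target, mode="nearest")
            out = out + f  # F = E_0 + E_1 + ... + E_n after alignment
        return out
```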
After the mode fusion module receives the feature map $F_E$ and the feature map $F_I$, it computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$, and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, where $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
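The mode fusion step can be sketched as follows, following the equations above: a squeeze path (GAP → Conv → SiLU → Conv → Sigmoid) computed on $F = F_E + F_I$ gates each modality. Using two separate gating branches and the channel-reduction ratio are assumptions, since the patent does not specify the internal dimensions.

```python
import torch.nn as nn

class ModeFusion(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                         # GAP
                nn.Conv2d(ch, ch // reduction, 1), nn.SiLU(),    # Conv -> SiLU
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(), # Conv -> Sigmoid
            )
        self.gate_e, self.gate_i = gate(), gate()

    def forward(self, f_e, f_i):
        f = f_e + f_i                  # F = F_E + F_I
        fe_hat = f_e * self.gate_e(f)  # feature map integrating both modalities
        fi_hat = f_i * self.gate_i(f)
        return fe_hat, fi_hat
```

The joint output module would then concatenate fe_hat and fi_hat along the channel dimension and apply a 1×1 convolution, as described above.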
According to the technical scheme, underwater multi-target group RGB images and events are acquired by a dynamic vision sensor; the self-adaptive image enhancement module performs adaptive enhancement on the RGB images, the event data are characterized as event images by the time-space voxel characterization method, and the feature level mode fusion module integrates the multi-scale features of the RGB images and the event images, thereby achieving accurate and efficient target identification.
Claims (10)
1. An underwater multi-target group identification method based on a dynamic vision sensor is characterized by comprising the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set by using the underwater multi-target group event image and the underwater multi-target group RGB image, and dividing a training set and a verification set according to the proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
2. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the underwater multi-target group event image is obtained by characterizing the underwater multi-target group events as underwater multi-target group event images using a time-space voxel characterization method.
3. The dynamic vision sensor-based underwater multi-target group identification method of claim 1, wherein the target detection model comprises SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8.
4. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the multi-target group identification model comprises an event image input end, an RGB image input end, an adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters a backbone network through the input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through the input end for self-adaptive image enhancement; the output then enters a backbone network to obtain multi-scale image features, which are input into the feature level mode fusion module;
the feature level mode fusion module performs feature level mode fusion on the two groups of input features, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
5. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, the feature extraction network is used for predicting adaptive adjustment parameters, and the adaptive adjustment parameters are used for adjusting enhancement effects of filters in the RGB image enhancement module;
the RGB image enhancement module is used for carrying out self-adaptive picture enhancement.
6. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 5, wherein the feature extraction network comprises four feature extraction modules with the same structure and a classification module which are connected in sequence;
each feature extraction module comprises, in sequence, a 3×3 convolution layer, an activation layer and a maximum pooling layer, and the feature extraction modules are used to fully extract RGB image feature information; the classification module comprises, in sequence, a global average pooling layer, an activation layer and a 1×1 convolution layer, and is used to integrate the feature information extracted by the feature extraction modules to obtain the adaptive adjustment parameters.
7. The dynamic vision sensor-based underwater multi-target group identification method of claim 5, wherein the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter and a feature enhancement module;
after input, the RGB image enters the white balance filter; the white balance filter's output is fed to the sharpening filter and the pixel-level filter respectively, and after the sharpening and filtering treatments the two results are input together into the feature enhancement module and output after feature enhancement;
the feature enhancement module comprises, in sequence, a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
8. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the feature level modal fusion module comprises two multi-scale feature fusion modules, a modal fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two feature fusion modules are input into the mode fusion module, whose output feature maps are input into the joint output module, where they are spliced along the channel dimension by a Concat function and then passed through a 1×1 convolution layer to extract features before being output.
9. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after receiving the multi-scale event features, the multi-scale feature fusion module computes

$F_E = E_0 + E_1 + \cdots + E_n$;

obtaining the feature map $F_E$ output after fusing the multi-scale event features, wherein $E_0$ represents shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $E_1$ to $E_n$ represent deep feature information at different scales;

after receiving the multi-scale image features, the multi-scale feature fusion module computes

$F_I = I_0 + I_1 + \cdots + I_n$;

obtaining the feature map $F_I$ output after fusing the multi-scale image features, wherein $I_0$ represents shallow feature information obtained from the front-layer feature extraction network in the backbone network, and $I_1$ to $I_n$ represent deep feature information at different scales.
10. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after receiving the feature map $F_E$ and the feature map $F_I$, the mode fusion module computes

$\hat{F}_E = F_E \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

$\hat{F}_I = F_I \cdot \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{SiLU}(\mathrm{Conv}(\mathrm{GAP}(F)))))$;

obtaining the feature map $\hat{F}_E$ derived by integrating $F_I$ and $F_E$ and the feature map $\hat{F}_I$ derived by integrating $F_E$ and $F_I$, wherein $F = F_E + F_I$, $\mathrm{GAP}(\cdot)$ represents the global average pooling operation, $\mathrm{Conv}$ represents a convolution operation, and $\mathrm{Sigmoid}$ and $\mathrm{SiLU}$ represent activation functions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410128788.8A CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410128788.8A CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117671472A (en) | 2024-03-08
CN117671472B (en) | 2024-05-14
Family
ID=90079183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410128788.8A Active CN117671472B (en) | 2024-01-31 | 2024-01-31 | Underwater multi-target group identification method based on dynamic visual sensor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117671472B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627504A (en) * | 2021-08-02 | 2021-11-09 | 南京邮电大学 | Multi-mode multi-scale feature fusion target detection method based on generation of countermeasure network |
CN114202475A (en) * | 2021-11-24 | 2022-03-18 | 北京理工大学 | Adaptive image enhancement method and system |
CN114332559A (en) * | 2021-12-17 | 2022-04-12 | 安徽理工大学 | RGB-D significance target detection method based on self-adaptive cross-modal fusion mechanism and depth attention network |
CN114926514A (en) * | 2022-05-13 | 2022-08-19 | 北京交通大学 | Registration method and device of event image and RGB image |
CN116797946A (en) * | 2023-05-11 | 2023-09-22 | 南京航空航天大学 | Cross-modal fusion target detection method based on unmanned aerial vehicle |
CN117132759A (en) * | 2023-08-02 | 2023-11-28 | 上海无线电设备研究所 | Saliency target detection method based on multiband visual image perception and fusion |
Non-Patent Citations (1)
Title |
---|
YIMIN ZHANG et al.: "ESarDet: An Efficient SAR Ship Detection Method Based on Context Information and Large Effective Receptive Field", Remote Sensing, 9 June 2023 (2023-06-09)
Also Published As
Publication number | Publication date |
---|---|
CN117671472B (en) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020259118A1 (en) | Method and device for image processing, method and device for training object detection model | |
CN108764071B (en) | Real face detection method and device based on infrared and visible light images | |
WO2021022983A1 (en) | Image processing method and apparatus, electronic device and computer-readable storage medium | |
WO2021164234A1 (en) | Image processing method and image processing device | |
CN105740775A (en) | Three-dimensional face living body recognition method and device | |
CN112424795B (en) | Face anti-counterfeiting method, processor chip and electronic equipment | |
CN110276831B (en) | Method and device for constructing three-dimensional model, equipment and computer-readable storage medium | |
CN113591592B (en) | Overwater target identification method and device, terminal equipment and storage medium | |
CN110827375B (en) | Infrared image true color coloring method and system based on low-light-level image | |
CN112861987A (en) | Target detection method under dark light environment | |
CN113763261B (en) | Real-time detection method for far small target under sea fog weather condition | |
Liu et al. | SETR-YOLOv5n: A lightweight low-light lane curvature detection method based on fractional-order fusion model | |
CN115035010A (en) | Underwater image enhancement method based on convolutional network guided model mapping | |
CN116682000B (en) | Underwater frogman target detection method based on event camera | |
CN110688926B (en) | Subject detection method and apparatus, electronic device, and computer-readable storage medium | |
CN117671472B (en) | Underwater multi-target group identification method based on dynamic visual sensor | |
CN116385293A (en) | Foggy-day self-adaptive target detection method based on convolutional neural network | |
JP5278307B2 (en) | Image processing apparatus and method, and program | |
CN114332682B (en) | Marine panorama defogging target identification method | |
CN103400381B (en) | A kind of Method for Underwater Target Tracking based on optical imagery | |
CN113920455B (en) | Night video coloring method based on deep neural network | |
Li et al. | Multi-scale fusion framework via retinex and transmittance optimization for underwater image enhancement | |
CN113537397A (en) | Target detection and image definition joint learning method based on multi-scale feature fusion | |
CN112017128A (en) | Image self-adaptive defogging method | |
CN117218033B (en) | Underwater image restoration method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |