CN117671472A - Underwater multi-target group identification method based on dynamic visual sensor - Google Patents

Underwater multi-target group identification method based on dynamic visual sensor

Info

Publication number
CN117671472A
Authority
CN
China
Prior art keywords
feature
target group
underwater
module
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410128788.8A
Other languages
Chinese (zh)
Other versions
CN117671472B (en)
Inventor
姜宇
王跃航
赵明浩
齐红
魏枫林
王凯
张永霁
郭千仞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202410128788.8A
Publication of CN117671472A
Application granted
Publication of CN117671472B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/36: Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443: Local feature extraction by matching or filtering
    • G06V 10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451: Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/52: Scale-space analysis, e.g. wavelet analysis
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/764: Recognition using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Recognition using neural networks
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/05: Underwater scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Nonlinear Science (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Processing (AREA)

Abstract

An underwater multi-target group identification method based on a dynamic vision sensor. The method comprises the following steps: S1, collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor; S2, constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set according to a set proportion; S3, constructing a multi-target group identification model based on a target detection model, with a self-adaptive image enhancement module embedded in front of the backbone network of the target detection model and a feature level mode fusion module embedded between the backbone network and the neck network; S4, inputting the training set data into the multi-target group identification model constructed in step S3 for training to obtain model parameters meeting the requirements, and verifying the effect on the verification set; S5, performing underwater multi-target group identification with the trained multi-target group identification model.

Description

Underwater multi-target group identification method based on dynamic visual sensor
Technical Field
The invention belongs to the technical field of underwater machine vision target detection, and particularly relates to the technical field of underwater multi-target group identification based on a dynamic vision sensor.
Background
With the development of deep-sea resources and the growing demand for marine environmental protection, higher requirements are placed on real-time monitoring and efficient operation in underwater environments. Against this background, developing techniques that can accurately detect and identify multiple targets in complex underwater environments has become particularly important.
The development of underwater multi-target group identification technology has a profound effect on advancing marine technology. As global climate change and marine pollution worsen, scientists increasingly rely on accurate underwater data to monitor the health of marine ecosystems, evaluate the impact of human activity on marine biodiversity, and formulate corresponding protective measures. In addition, accurate underwater target detection is essential for archaeologists exploring sunken ship sites, biologists studying seafloor biological communities, engineers maintaining submarine infrastructure, and the like.
Currently, underwater multi-target group identification and detection faces the following challenges:
Complex underwater environment: lighting conditions underwater are usually poor, and strong scattering and absorption lead to low image quality and poor contrast, making target detection difficult. In addition, interference factors such as sediment and plankton may be present in the underwater environment.
Dynamically changing scenes: natural factors such as water flow and waves, together with the motion of the targets themselves, cause dynamic changes in underwater scenes, which poses challenges for stable detection and tracking of targets.
Multi-sensor data fusion: to improve detection accuracy, it is often necessary to fuse data from different sensors such as sonar, optical cameras, and thermal imagers, and the development of data fusion techniques remains a challenge.
The dynamic vision sensor is a new type of vision sensor with several unique advantages over traditional cameras, which make it well suited to environments with dynamic content and poor lighting, such as underwater scenes. Instead of capturing frames at fixed time intervals, a dynamic vision sensor detects changes in pixel brightness and outputs an event whenever the brightness change at a pixel exceeds a preset threshold.
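As a concrete illustration of this operating principle, the sketch below generates events from two successive brightness frames under a simple per-pixel log-intensity threshold model; the threshold value and the (x, y, t, polarity) output format are illustrative assumptions, not details taken from this disclosure.

```python
import numpy as np

def brightness_changes_to_events(prev_frame, curr_frame, t, threshold=0.2):
    """Emit (x, y, t, polarity) events where the log-brightness change exceeds a threshold.

    prev_frame, curr_frame: 2-D arrays of pixel intensities in [0, 1].
    threshold: illustrative contrast threshold; real sensors expose this as a bias setting.
    """
    eps = 1e-6
    delta = np.log(curr_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(delta) >= threshold)
    polarity = np.sign(delta[ys, xs]).astype(np.int8)  # +1 brighter, -1 darker
    return [(int(x), int(y), t, int(p)) for x, y, p in zip(xs, ys, polarity)]
```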
The advantages of the dynamic vision sensor in underwater multi-target group identification stem from its unique operating principle and characteristics, which allow it to provide clear, real-time visual information in extremely challenging underwater environments. Its high dynamic range enables it to capture clear images under highly reflective or dark underwater conditions without failing due to the overexposure or underexposure problems common to conventional cameras. Furthermore, the pixel-independent response mechanism of the dynamic vision sensor avoids motion blur even in fast-moving scenes, which is critical for accurately detecting the dynamics of underwater creatures or robots.
In summary, the dynamic vision sensor has clear advantages for underwater multi-target group identification, especially in underwater applications that must handle dynamic scenes under high real-time requirements and limited energy budgets. However, processing event data and fusing it with RGB data requires specialized techniques, and this field is still developing.
Disclosure of Invention
The invention provides an underwater multi-target group identification method based on a dynamic vision sensor, aiming to solve the problems of low light, motion blur and the like caused by complex underwater environments in existing underwater target identification, as well as the problem of insufficient fusion of underwater event data and RGB data.
The method comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set according to a set proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
Further, the underwater multi-target group event image is obtained as follows: the underwater multi-target group events are characterized as an underwater multi-target group event image by using a time-space voxel characterization method.
Further, the target detection model includes SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7, or YOLOv8.
Further, the multi-target group identification model comprises an event image input end, an RGB image input end, a self-adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters one backbone network through its input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through its input end for self-adaptive image enhancement, and the enhanced output enters the other backbone network to obtain multi-scale image features, which are then input into the feature level mode fusion module;
the two groups of input features are fused in the feature level mode fusion module at the feature level, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
Further, the self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module;
the RGB image enhancement module is used to perform self-adaptive image enhancement.
Further, the feature extraction network comprises four feature extraction modules with the same structure and a classification module, connected in sequence;
each feature extraction module sequentially comprises a 3×3 convolution layer, an activation layer and a maximum pooling layer and is used to fully extract RGB image feature information; the classification module sequentially comprises a global average pooling layer, an activation layer and a 1×1 convolution layer and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
Further, the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter and a feature enhancement module;
after input, the RGB image first passes through the white balance filter; the white-balanced output is fed separately into the sharpening filter and the pixel-level filter, and the sharpened and filtered results are both input into the feature enhancement module, which outputs the image after feature enhancement;
the feature enhancement module sequentially comprises a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
Further, the feature level mode fusion module comprises two multi-scale feature fusion modules, a mode fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two multi-scale feature fusion modules are input into the mode fusion module, and the feature maps generated by the mode fusion module are input into the joint output module, where they are spliced along the channel dimension by a Concat function and passed through a 1×1 convolution layer to extract features before output.
Further, after the multi-scale feature fusion module receives the multi-scale event features, it computes
F_E = E_0 + E_1 + ... + E_n,
obtaining the feature map F_E output after fusing the multi-scale event features, wherein E_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and E_1 to E_n represent deep feature information at different scales;
after the multi-scale feature fusion module receives the multi-scale image features, it computes
F_I = I_0 + I_1 + ... + I_n,
obtaining the feature map F_I output after fusing the multi-scale image features, wherein I_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and I_1 to I_n represent deep feature information at different scales.
Further, after the mode fusion module receives the feature map F_E and the feature map F_I, it obtains a feature map F_EI that integrates F_I into F_E and a feature map F_IE that integrates F_E into F_I, where each fused feature map is computed as the sum of two branches built from F_E and F_I using a global average pooling operation GAP, a convolution operation Conv, a Sigmoid activation function and a SiLU activation function.
The method has the beneficial effects that:
(1) To cope with the low-illumination and motion-blur problems commonly found in underwater environments, the method adopts a self-adaptive image enhancement strategy specially optimized for RGB images of underwater multi-target groups. Through fine pixel-level adjustment, the module enhances key features in the image, thereby improving the visibility and recognizability of targets.
(2) To further improve image quality under the insufficient light and visual disturbance of complex underwater environments, event data are introduced as a supplementary information source. To integrate the two data types effectively, a feature level mode fusion module is developed that deeply mines and combines the features in the RGB images and the event data, compensating for information that a single modality may lose and thus markedly improving the accuracy of target detection and the adaptability of the system. This bimodal information fusion overcomes the influence of extreme underwater environments on image quality and strengthens the stability and robustness of the target detection algorithm in complex underwater scenes. A target detection system adopting this method can achieve more efficient and accurate multi-target group identification under changeable underwater conditions, significantly improving the performance of underwater detection and monitoring tasks.
Drawings
FIG. 1 is a schematic diagram of a multi-objective group identification model in an embodiment of the present invention;
FIG. 2 is a block diagram of an adaptive image enhancement module in an embodiment of the present invention;
fig. 3 is a block diagram of a feature level modality fusion module in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings; the embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of the invention.
The embodiment provides an underwater multi-target group identification method based on a dynamic vision sensor, which comprises the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set according to a set proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
The underwater multi-target group event image is obtained as follows: the underwater multi-target group events are characterized as underwater multi-target group event images by using a time-space voxel characterization method, and the underwater multi-target group event images correspond one-to-one with the underwater multi-target group RGB images.
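As an illustration of the time-space voxel characterization step, the following is a minimal sketch that accumulates an event stream into a voxel grid with a fixed number of time bins; the bin count, the sensor resolution and the bilinear weighting along the time axis are illustrative assumptions, not specifics from the patent.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins=5, height=260, width=346):
    """Accumulate events (x, y, t, polarity) into a (num_bins, H, W) voxel grid.

    Polarity-signed contributions are spread over the two nearest time bins
    (bilinear weighting in time), a common spatio-temporal voxel representation.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    xs, ys, ts, ps = (np.asarray(v) for v in zip(*events))
    t0, t1 = ts.min(), ts.max()
    # Normalize timestamps to [0, num_bins - 1].
    tn = (ts - t0) / max(t1 - t0, 1e-9) * (num_bins - 1)
    lower = np.floor(tn).astype(int)
    upper = np.clip(lower + 1, 0, num_bins - 1)
    w_upper = tn - lower
    np.add.at(grid, (lower, ys, xs), ps * (1.0 - w_upper))
    np.add.at(grid, (upper, ys, xs), ps * w_upper)
    return grid
```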
The target detection model may be SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7, or YOLOv8; all of these models contain three parts: a backbone network, a neck network, and a head network.
As shown in fig. 1, the multi-target group identification model includes an event image input end, an RGB image input end, a self-adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network, and an output end;
the underwater multi-target group event image enters one backbone network through its input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through its input end for self-adaptive image enhancement, and the enhanced output enters the other backbone network to obtain multi-scale image features, which are then input into the feature level mode fusion module;
the two groups of input features are fused in the feature level mode fusion module at the feature level, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
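To make the data flow described above concrete, the following is a minimal PyTorch-style sketch of the dual-branch forward pass; the class and argument names (MultiTargetGroupRecognizer, enhance, fusion, and so on) are placeholders assumed for illustration, and the sub-modules are taken as given nn.Module objects rather than implementations from the patent.

```python
import torch
import torch.nn as nn

class MultiTargetGroupRecognizer(nn.Module):
    """Dual-branch detector: event branch + enhanced RGB branch, fused at feature level."""

    def __init__(self, enhance, event_backbone, rgb_backbone, fusion, neck, head):
        super().__init__()
        self.enhance = enhance            # self-adaptive image enhancement module
        self.event_backbone = event_backbone
        self.rgb_backbone = rgb_backbone
        self.fusion = fusion              # feature level mode fusion module
        self.neck = neck
        self.head = head

    def forward(self, event_image, rgb_image):
        # Event image -> backbone -> multi-scale event features.
        event_feats = self.event_backbone(event_image)
        # RGB image -> self-adaptive enhancement -> backbone -> multi-scale image features.
        rgb_feats = self.rgb_backbone(self.enhance(rgb_image))
        # Feature level mode fusion of the two groups of features.
        fused = self.fusion(event_feats, rgb_feats)
        # Neck and head produce the target recognition result.
        return self.head(self.neck(fused))
```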
The self-adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, wherein the feature extraction network is used to predict self-adaptive adjustment parameters, and the self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module; the RGB image enhancement module is used to perform self-adaptive image enhancement.
As shown in fig. 2, the feature extraction network includes four feature extraction modules with the same structure and a classification module, connected in sequence;
each feature extraction module sequentially comprises a 3×3 convolution layer, an activation layer and a maximum pooling layer and is used to fully extract RGB image feature information; the classification module sequentially comprises a global average pooling layer, an activation layer and a 1×1 convolution layer and is used to integrate the feature information extracted by the feature extraction modules to obtain the self-adaptive adjustment parameters.
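The parameter-prediction network can be sketched as follows, following the stated layout (four convolution-activation-pooling blocks, then global average pooling, an activation and a 1×1 convolution); the channel width, the number of predicted parameters and the choice of SiLU as the activation are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

class ParameterPredictor(nn.Module):
    """Predicts adaptive adjustment parameters that control the RGB enhancement filters."""

    def __init__(self, num_params=8, width=32):
        super().__init__()
        blocks = []
        in_ch = 3
        for _ in range(4):  # four identical feature extraction modules
            blocks += [nn.Conv2d(in_ch, width, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2)]
            in_ch = width
        self.features = nn.Sequential(*blocks)
        self.classify = nn.Sequential(       # classification module
            nn.AdaptiveAvgPool2d(1),          # global average pooling
            nn.SiLU(),
            nn.Conv2d(width, num_params, 1),  # 1x1 conv -> adaptive parameters
        )

    def forward(self, rgb):
        params = self.classify(self.features(rgb))
        return params.flatten(1)              # (batch, num_params)
```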
The self-adaptive adjustment parameters are used to adjust the enhancement effect of each filter in the RGB image enhancement module and include the parameters controlling the white balance filter, the sharpening filter and the pixel-level filter. The filters are applied as follows:
I_wb is the output feature map enhanced by the white balance filter, obtained by scaling the R, G and B channels of the input image with the predicted white balance parameters;
I_sh is the output feature map enhanced by the sharpening filter, obtained from I_wb and its Gaussian-filtered version Gau(I_wb), weighted by the predicted sharpening parameter;
I_px is the output feature map enhanced by the pixel-level filter, obtained by adjusting I_wb pixel by pixel according to the predicted pixel-level parameters;
I_fuse = Conv(SiLU(Conv(Concat[I_sh : I_px])));
where R, G and B denote the three channels of the input image, Conv denotes a convolution operation, SiLU denotes an activation function, Gau denotes the Gaussian filtering operation, I_fuse is the output feature map integrating I_sh and I_px, and Concat denotes splicing feature maps along the channel dimension.
As shown in fig. 2, the RGB image enhancement module includes a white balance filter, a sharpening filter, a pixel-level filter, and a feature enhancement module;
after input, the RGB image first passes through the white balance filter; the white-balanced output is fed separately into the sharpening filter and the pixel-level filter, and the sharpened and filtered results are both input into the feature enhancement module, which outputs the image after feature enhancement;
the feature enhancement module sequentially comprises a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
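The following is a minimal sketch of differentiable versions of the three filters and the feature enhancement step, driven by the predicted parameters; the specific parameterization (per-channel white-balance gains, unsharp-mask sharpening against a Gaussian blur, and a gamma-style pixel-level adjustment) is assumed for illustration and is not the patent's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGBEnhancement(nn.Module):
    """White balance -> (sharpen, pixel-level filter) -> feature enhancement."""

    def __init__(self):
        super().__init__()
        self.enhance = nn.Sequential(        # feature enhancement module
            nn.Conv2d(6, 16, 3, padding=1),  # after channel-wise Concat of the two branches
            nn.SiLU(),
            nn.Conv2d(16, 3, 1),
        )
        # Fixed 3x3 Gaussian kernel used by the sharpening (unsharp-mask) branch.
        k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
        self.register_buffer("gauss", k.expand(3, 1, 3, 3).clone())

    def forward(self, rgb, params):
        # params: (batch, 8) -> 3 white-balance gains, 1 sharpen strength,
        # 1 pixel-level term (remaining entries unused in this sketch).
        wb = params[:, 0:3].view(-1, 3, 1, 1)
        sharpen = params[:, 3].view(-1, 1, 1, 1)
        gamma = params[:, 4].view(-1, 1, 1, 1)

        i_wb = rgb * torch.sigmoid(wb) * 2.0                   # white balance filter
        blur = F.conv2d(i_wb, self.gauss, padding=1, groups=3)
        i_sh = i_wb + sharpen * (i_wb - blur)                   # sharpening filter
        i_px = i_wb.clamp(1e-4, 1.0) ** torch.sigmoid(gamma)    # pixel-level (gamma-style) filter
        return self.enhance(torch.cat([i_sh, i_px], dim=1))     # feature enhancement module
```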
As shown in fig. 3, the feature level mode fusion module includes two multi-scale feature fusion modules, a mode fusion module, and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two multi-scale feature fusion modules are input into the mode fusion module, and the feature maps generated by the mode fusion module are input into the joint output module, where they are spliced along the channel dimension by a Concat function and passed through a 1×1 convolution layer to extract features before output.
After the multi-scale feature fusion module receives the multi-scale event features, it computes
F_E = E_0 + E_1 + ... + E_n,
obtaining the feature map F_E output after fusing the multi-scale event features, wherein E_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and E_1 to E_n represent deep feature information at different scales;
after the multi-scale feature fusion module receives the multi-scale image features, it computes
F_I = I_0 + I_1 + ... + I_n,
obtaining the feature map F_I output after fusing the multi-scale image features, wherein I_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and I_1 to I_n represent deep feature information at different scales.
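A minimal sketch of the multi-scale feature fusion step follows, assuming the shallow and deep feature maps are first projected to a common channel width and resized to a common resolution so that the element-wise sum F = X_0 + X_1 + ... + X_n is well defined; the 1×1 projections and the interpolation are assumptions added for that purpose, not details stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuses one shallow feature map with several deep feature maps into a single map."""

    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        # One 1x1 projection per input scale so that channel widths match before summation.
        self.proj = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)

    def forward(self, feats):
        # feats[0] is the shallow map; feats[1:] are deep maps at different scales.
        target = feats[0].shape[-2:]
        fused = 0
        for f, proj in zip(feats, self.proj):
            f = proj(f)
            if f.shape[-2:] != target:
                f = F.interpolate(f, size=target, mode="nearest")
            fused = fused + f  # F = X_0 + X_1 + ... + X_n
        return fused
```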
After the mode fusion module receives the feature map F_E and the feature map F_I, it obtains a feature map F_EI that integrates F_I into F_E and a feature map F_IE that integrates F_E into F_I, where each fused feature map is computed as the sum of two branches built from F_E and F_I using a global average pooling operation GAP, a convolution operation Conv, a Sigmoid activation function and a SiLU activation function.
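The mode fusion described above can be sketched as follows under one plausible reading, assuming a channel-attention style exchange in which global average pooling, a convolution and a Sigmoid applied to one modality produce weights for the other modality, followed by a SiLU-activated convolution and a residual addition; these specifics are assumptions rather than the patent's stated equations.

```python
import torch
import torch.nn as nn

class ModeFusion(nn.Module):
    """Cross-modal fusion of event features F_E and image features F_I."""

    def __init__(self, channels):
        super().__init__()
        self.gate_e = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_i = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.mix_e = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU())
        self.mix_i = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU())

    def forward(self, f_e, f_i):
        # F_EI integrates image information into the event branch; F_IE does the reverse.
        f_ei = self.mix_e(f_e * self.gate_i(f_i)) + f_e
        f_ie = self.mix_i(f_i * self.gate_e(f_e)) + f_i
        return f_ei, f_ie
```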
According to the above technical scheme, underwater multi-target group RGB images and events are acquired by the dynamic vision sensor; the self-adaptive image enhancement module performs self-adaptive enhancement on the RGB images, the event data are characterized as event images by the time-space voxel characterization method, and the multi-scale features of the RGB images and the event images are integrated by the feature level mode fusion module, thereby achieving accurate and efficient target identification.

Claims (10)

1. An underwater multi-target group identification method based on a dynamic vision sensor is characterized by comprising the following steps:
s1, data acquisition:
collecting underwater multi-target group RGB images and underwater multi-target group events by using a dynamic vision sensor;
s2, data set division:
constructing a data set from the underwater multi-target group event images and the underwater multi-target group RGB images, and dividing it into a training set and a verification set according to a set proportion;
s3, constructing a multi-target group identification model:
the multi-target group identification model is based on a target detection model, a self-adaptive image enhancement module is embedded in front of a backbone network of the target detection model, and a feature level mode fusion module is embedded between the backbone network of the target detection model and a neck network;
s4, training a multi-target group identification model: inputting the data of the training set into the multi-target group identification model in the step S4 for training so as to obtain model parameters meeting the requirements, and verifying the effect through the verification set;
s5, carrying out underwater multi-target group identification through the trained multi-target group identification model.
2. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the underwater multi-target group event image is obtained as follows: the underwater multi-target group events are characterized as an underwater multi-target group event image by using a time-space voxel characterization method.
3. The dynamic vision sensor-based underwater multi-target population identification method of claim 1, wherein the target detection model comprises SSD, EfficientDet, RetinaNet, YOLOv5, YOLOv6, YOLOv7 or YOLOv8.
4. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 1, wherein the multi-target group identification model comprises an event image input end, an RGB image input end, an adaptive image enhancement module, two backbone networks, a feature level mode fusion module, a neck network, a head network and an output end;
the underwater multi-target group event image enters one backbone network through its input end to obtain multi-scale event features, which are then input into the feature level mode fusion module;
the underwater multi-target group RGB image enters the self-adaptive image enhancement module through its input end for self-adaptive image enhancement, and the enhanced output enters the other backbone network to obtain multi-scale image features, which are then input into the feature level mode fusion module;
the two groups of input features are fused in the feature level mode fusion module at the feature level, and the fused feature maps are input into the neck network and the head network in sequence to obtain the target recognition result, which is output through the output end.
5. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the adaptive image enhancement module comprises a feature extraction network and an RGB image enhancement module, the feature extraction network is used for predicting adaptive adjustment parameters, and the adaptive adjustment parameters are used for adjusting enhancement effects of filters in the RGB image enhancement module;
the RGB image enhancement module is used for carrying out self-adaptive picture enhancement.
6. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 5, wherein the feature extraction network comprises four feature extraction modules with the same structure and a classification module which are connected in sequence;
each feature extraction module sequentially comprises a 3×3 convolution layer, an activation layer and a maximum pooling layer and is used to fully extract RGB image feature information; the classification module sequentially comprises a global average pooling layer, an activation layer and a 1×1 convolution layer and is used to integrate the feature information extracted by the feature extraction modules to obtain the adaptive adjustment parameters.
7. The dynamic vision sensor-based underwater multi-target population identification method of claim 5, wherein the RGB image enhancement module comprises a white balance filter, a sharpening filter, a pixel-level filter, and a feature enhancement module;
after input, the RGB image first passes through the white balance filter; the white-balanced output is fed separately into the sharpening filter and the pixel-level filter, and the sharpened and filtered results are both input into the feature enhancement module, which outputs the image after feature enhancement;
the feature enhancement module sequentially comprises a feature fusion layer, a 3×3 convolution layer, an activation layer and a 1×1 convolution layer.
8. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 4, wherein the feature level modal fusion module comprises two multi-scale feature fusion modules, a modal fusion module and a joint output module;
the multi-scale event features are input into one multi-scale feature fusion module and the multi-scale image features into the other; the feature maps generated by the two multi-scale feature fusion modules are input into the mode fusion module, and the feature maps generated by the mode fusion module are input into the joint output module, where they are spliced along the channel dimension by a Concat function and passed through a 1×1 convolution layer to extract features before output.
9. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after the multi-scale feature fusion module receives the multi-scale event features, it computes
F_E = E_0 + E_1 + ... + E_n,
obtaining the feature map F_E output after fusing the multi-scale event features, wherein E_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and E_1 to E_n represent deep feature information at different scales;
after the multi-scale feature fusion module receives the multi-scale image features, it computes
F_I = I_0 + I_1 + ... + I_n,
obtaining the feature map F_I output after fusing the multi-scale image features, wherein I_0 represents the shallow feature information obtained from the front-layer feature extraction network in the backbone network, and I_1 to I_n represent deep feature information at different scales.
10. The method for identifying the underwater multi-target group based on the dynamic vision sensor according to claim 8, wherein after the mode fusion module receives the feature map F_E and the feature map F_I, it obtains a feature map F_EI that integrates F_I into F_E and a feature map F_IE that integrates F_E into F_I, where each fused feature map is computed as the sum of two branches built from F_E and F_I using a global average pooling operation GAP, a convolution operation Conv, a Sigmoid activation function and a SiLU activation function.
CN202410128788.8A 2024-01-31 2024-01-31 Underwater multi-target group identification method based on dynamic visual sensor Active CN117671472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410128788.8A CN117671472B (en) 2024-01-31 2024-01-31 Underwater multi-target group identification method based on dynamic visual sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410128788.8A CN117671472B (en) 2024-01-31 2024-01-31 Underwater multi-target group identification method based on dynamic visual sensor

Publications (2)

Publication Number Publication Date
CN117671472A true CN117671472A (en) 2024-03-08
CN117671472B CN117671472B (en) 2024-05-14

Family

ID=90079183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410128788.8A Active CN117671472B (en) 2024-01-31 2024-01-31 Underwater multi-target group identification method based on dynamic visual sensor

Country Status (1)

Country Link
CN (1) CN117671472B (en)

Citations (6)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627504A (en) * 2021-08-02 2021-11-09 南京邮电大学 Multi-mode multi-scale feature fusion target detection method based on generation of countermeasure network
CN114202475A (en) * 2021-11-24 2022-03-18 北京理工大学 Adaptive image enhancement method and system
CN114332559A (en) * 2021-12-17 2022-04-12 安徽理工大学 RGB-D significance target detection method based on self-adaptive cross-modal fusion mechanism and depth attention network
CN114926514A (en) * 2022-05-13 2022-08-19 北京交通大学 Registration method and device of event image and RGB image
CN116797946A (en) * 2023-05-11 2023-09-22 南京航空航天大学 Cross-modal fusion target detection method based on unmanned aerial vehicle
CN117132759A (en) * 2023-08-02 2023-11-28 上海无线电设备研究所 Saliency target detection method based on multiband visual image perception and fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIMIN ZHANG et al., "ESarDet: An Efficient SAR Ship Detection Method Based on Context Information and Large Effective Receptive Field", Remote Sensing, 9 June 2023.

Also Published As

Publication number Publication date
CN117671472B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
WO2020259118A1 (en) Method and device for image processing, method and device for training object detection model
CN108764071B (en) Real face detection method and device based on infrared and visible light images
WO2021022983A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
WO2021164234A1 (en) Image processing method and image processing device
CN105740775A (en) Three-dimensional face living body recognition method and device
CN112424795B (en) Face anti-counterfeiting method, processor chip and electronic equipment
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN113591592B (en) Overwater target identification method and device, terminal equipment and storage medium
CN110827375B (en) Infrared image true color coloring method and system based on low-light-level image
CN112861987A (en) Target detection method under dark light environment
CN113763261B (en) Real-time detection method for far small target under sea fog weather condition
Liu et al. SETR-YOLOv5n: A lightweight low-light lane curvature detection method based on fractional-order fusion model
CN115035010A (en) Underwater image enhancement method based on convolutional network guided model mapping
CN116682000B (en) Underwater frogman target detection method based on event camera
CN110688926B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN117671472B (en) Underwater multi-target group identification method based on dynamic visual sensor
CN116385293A (en) Foggy-day self-adaptive target detection method based on convolutional neural network
JP5278307B2 (en) Image processing apparatus and method, and program
CN114332682B (en) Marine panorama defogging target identification method
CN103400381B (en) A kind of Method for Underwater Target Tracking based on optical imagery
CN113920455B (en) Night video coloring method based on deep neural network
Li et al. Multi-scale fusion framework via retinex and transmittance optimization for underwater image enhancement
CN113537397A (en) Target detection and image definition joint learning method based on multi-scale feature fusion
CN112017128A (en) Image self-adaptive defogging method
CN117218033B (en) Underwater image restoration method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant