CN117557795A - Underwater target semantic segmentation method and system based on multi-source data fusion - Google Patents


Info

Publication number
CN117557795A
CN117557795A (application CN202410035082.7A)
Authority
CN
China
Prior art keywords
module
semantic segmentation
cbr
representing
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410035082.7A
Other languages
Chinese (zh)
Other versions
CN117557795B (en)
Inventor
姜宇
郭千仞
魏枫林
赵明浩
齐红
王凯
王跃航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202410035082.7A
Publication of CN117557795A
Application granted
Publication of CN117557795B
Legal status: Active

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/811: Fusion of classification results, e.g. classifiers operating on different input data, such as multi-modal recognition
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/05: Underwater scenes
    • G06V 2201/07: Target detection
    • Y02A 90/30: Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an underwater target semantic segmentation method based on multi-source data fusion, and belongs to the technical field of underwater machine vision semantic segmentation. Step 1, acquiring a data set constructed from underwater target event images and RGB images, and dividing it into a training set and a verification set; step 2, designing a cross-modal attention module and a cross-channel attention module; step 3, embedding the cross-modal attention module and the cross-channel attention module into a designed multi-source data fusion module; step 4, embedding the multi-source data fusion module into the constructed semantic segmentation model, and training and verifying the semantic segmentation model; and step 5, performing semantic segmentation on the underwater target by using the semantic segmentation model of step 4. An event camera is used to acquire an underwater target event sequence and RGB images, and the event and RGB information is fused efficiently and fully, providing rich feature information for underwater target semantic segmentation.

Description

Underwater target semantic segmentation method and system based on multi-source data fusion
Technical Field
The invention belongs to the technical field of underwater machine vision semantic segmentation, and particularly relates to an underwater target semantic segmentation method and system based on multi-source data fusion.
Background
In recent years, research on underwater target semantic segmentation methods has had a profound and extensive effect on many fields. Progress in this technology brings new dimensions to ocean science, enabling scientists to explore marine ecosystems, geological structures and the influence of climate change on the ocean more accurately. In the exploration and management of underwater resources, the accuracy of semantic segmentation is directly related to the sustainable development of those resources. In addition, underwater robots are increasingly applied in ocean science, inspection and maintenance, and semantic segmentation gives them a more intelligent ability to perceive and understand the underwater environment, improving the autonomy and accuracy of their tasks. In the field of underwater communication and navigation, semantic segmentation helps underwater equipment adapt better to complex environments, improving communication reliability and navigation accuracy. The technology also shows important application prospects in underwater cultural heritage protection, emergency search and rescue, and other areas. Therefore, research on underwater target semantic segmentation not only promotes the development of science, but also provides innovative solutions for practical applications and powerful support for a deeper understanding and utilization of the underwater environment.
Underwater target semantic segmentation is a key task in the field of computer vision and faces various challenges. First, the complexity of underwater illumination conditions leads to shadows, scattering and color distortion in the image, which in turn affect the stability and accuracy of the algorithm. Second, the water contains a wide variety of objects and organisms whose shapes and textures differ greatly, which makes recognition and classification difficult for the semantic segmentation model. In addition, the motion blur of underwater objects seriously interferes with the accuracy of the underwater semantic segmentation task. Taken together, these problems hamper the further development and practical application of underwater target semantic segmentation techniques. Thus, solving these challenges is critical to improving the performance and practicality of underwater target semantic segmentation.
For underwater targets, the application of event cameras brings a series of advantages to the semantic segmentation task. Compared with a traditional frame camera, an event camera observes the event response of each pixel and can therefore capture events at high frequency without being affected by illumination conditions or background complexity.
First, the high sensitivity and fast response of event cameras make them well suited to complex underwater environments, especially under poor illumination and limited visibility. An event camera captures the events occurring in a scene in real time and acquires the texture and shape information of the target efficiently and accurately, without relying on the inter-frame differences of a traditional camera, so it can better cope with the challenges posed by complex underwater conditions such as unstable illumination. Second, event cameras are more sensitive to small objects and respond faster to motion, which is particularly critical for underwater targets. Fast-moving organisms or objects may be present under water; a traditional camera may lose detail due to motion blur, while an event camera captures each event and provides more accurate feature information for semantic segmentation. In addition, event cameras have lower power consumption and high bandwidth, which is particularly important for resource-limited underwater equipment such as submersibles. They can collect data more efficiently and provide richer input for a semantic segmentation algorithm while reducing demands on equipment energy and storage resources.
Therefore, the event camera has better adaptability in the underwater target semantic segmentation, can overcome various limitations of the traditional camera in the underwater environment, provides more reliable and real-time perception information, and provides powerful support for successful implementation of underwater tasks.
Disclosure of Invention
The invention provides an underwater target semantic segmentation method based on multi-source data fusion, which uses an event camera to acquire an underwater target event sequence and RGB (red, green and blue) images and performs efficient and thorough fusion of the event and RGB feature information, so as to provide rich feature information for underwater target semantic segmentation.
The invention also provides an underwater target semantic segmentation system based on the multi-source data fusion, which is used for realizing an underwater target semantic segmentation method based on the multi-source data fusion.
The invention is realized by the following technical scheme:
an underwater target semantic segmentation method based on multi-source data fusion comprises the following steps:
step 1, acquiring a data set constructed from underwater target event images and RGB images, and dividing it into a training set and a verification set at a ratio of 8:2;
step 2, designing a cross-modal attention module and a cross-channel attention module;
Step 3, embedding the cross-modal attention module and the cross-channel attention module into a designed multi-source data fusion module;
step 4, embedding the multi-source data fusion module into the constructed semantic segmentation model, training the semantic segmentation model by using the training set in the step 1, and verifying the trained semantic segmentation model by using the verification set;
and 5, performing semantic segmentation on the underwater target by using the semantic segmentation model in the step 4.
Further, the step 1 specifically includes collecting an underwater target event sequence and RGB images by using an event camera; and characterizing the acquired underwater target event sequence as an event image by adopting a fixed time interval method.
Further, the cross-modal attention module designed in the step 2 is specifically composed of a dual-branch architecture, wherein one branch is a CBR module composed of convolution, batch normalization and ReLU activation functions. The module generates attention feature map information in the modal dimension in a serialized manner, then multiplies the modal attention feature map with the original input feature map and performs adaptive modal feature screening to generate the final feature map.
Further, the cross-channel attention module designed in the step 2 is specifically composed of a three-branch architecture, wherein one branch is formed by one CBR module, another branch is formed by two CBR modules connected in series, and both branches serve as residual edges that are finally added to the cross-channel attention mechanism output. The module generates attention feature map information in the channel dimension in a serialized manner, then multiplies the channel attention feature map with the original input feature map and performs adaptive channel feature screening to generate the final feature map.
Further, the multi-source data fusion module in the step 3 specifically includes two CBR branches, a cross-modal attention module branch and a cross-channel attention module branch;
the CBR branch is made up of two CBR modules in series,
the cross-modal attention module branch is formed by a cross-modal attention module in series with a CBR module,
the cross-channel attention module branch is formed by connecting a cross-channel attention module and a CBR module in series;
the output of one of the CBR branches is summed with the output of the cross-modal attention module,
the output of one CBR branch is added to the output of the cross-channel attention module;
the feature maps obtained after the two branch additions are spliced according to the channel dimension and then enter a CBR module to obtain a 3 × 640 × 640 fused feature map, so that the final multi-source data feature fusion is realized.
Further, for the operation of the cross-modal fusion attention module, the calculation formula is as follows:
For the operation of the cross-channel fusion attention module, the calculation formula is as follows:
wherein X represents the input feature map; Conv(·) represents a convolution operation; DWConv(·) represents a depth-separable convolution operation; CBR(·) represents the CBR module operation; Norm(·) represents the layer normalization operation; R(·) represents the morphological transformation (reshaping) of a feature vector; Q_m, K_m and V_m represent the query vector, key vector and value vector based on the modal dimension; Q_c, K_c and V_c represent the query vector, key vector and value vector based on the channel dimension; K_m^T and K_c^T represent the transposes of the modal and channel key vectors; F_m represents the feature vector derived from modal attention; F_c represents the feature vector derived from channel attention; and α is a learnable parameter.
Further, for the operation of the multi-source data fusion module, the calculation formula is as follows:
wherein F_e represents the event feature map; F_rgb represents the RGB feature map; X_1 and X_2 represent the feature maps obtained by splicing F_e and F_rgb according to the channel dimension; CBR_1(·) represents the first CBR feature branch operation; CBR_2(·) represents the second CBR feature branch operation; CMFA(·) represents the modal interaction attention branch operation; CCFA(·) represents the channel interaction attention branch operation; F_1 represents the feature map obtained by adding the output feature map of the first CBR feature branch and the output feature map of the modal interaction attention branch; F_2 represents the feature map obtained by adding the output feature map of the second CBR feature branch and the output feature map of the channel interaction attention branch; F_12 represents the feature map obtained by splicing F_1 and F_2 according to the channel dimension; CBR(·) represents the CBR module operation; and F_out represents the output feature map of the CBR module.
An underwater target semantic segmentation system based on multi-source data fusion, comprising
The construction unit is used for acquiring a data set constructed from underwater target event images and RGB images and dividing it into a training set and a verification set at a ratio of 8:2;
the semantic segmentation model unit is used for embedding the designed cross-modal attention module and the cross-channel attention module into the designed multi-source data fusion module; embedding the multisource data fusion module into the constructed semantic segmentation model, training the semantic segmentation model by using a training set, and verifying the trained semantic segmentation model by using a verification set;
and the application module performs semantic segmentation on the underwater target by using the semantic segmentation model unit.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the underwater target semantic segmentation method based on multi-source data fusion as described above when executing the computer program.
A computer readable storage medium having stored therein a computer program which when executed by a processor implements the above-described underwater target semantic segmentation method based on multi-source data fusion.
The beneficial effects of the invention are as follows:
the invention uses an event camera to acquire an underwater target event sequence and RGB images. A cross-modal attention module is designed to interact the underwater target event image and RGB image features. A cross-channel attention module is designed to interact the channel features formed by splicing the underwater target event image and RGB image features. A multi-source data fusion module is designed to fully fuse the underwater target event image and RGB image features. The method can effectively mitigate a series of negative influences on segmentation in the underwater environment, such as target motion blur, uneven underwater illumination, and blurred morphological and texture feature information of underwater targets. The underwater target semantic segmentation method based on multi-source data fusion can therefore realize efficient semantic segmentation of underwater targets.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a block diagram of a cross-modal attention module of the present invention.
FIG. 3 is a block diagram of a cross-channel attention module of the present invention.
Fig. 4 is a block diagram of a multi-source data fusion module of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The following description, taken in conjunction with FIGS. 1-4 of the present application, clearly and fully describes the embodiments of the present application; the described embodiments are only some, but not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art without undue burden from the present disclosure fall within the scope of the present disclosure.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Example 1
1-4, the invention provides an underwater target semantic segmentation method based on multi-source data fusion, which comprises the following steps:
step 1, acquiring a data set constructed from underwater target event images and RGB images, and dividing it into a training set and a verification set at a ratio of 8:2;
step 2, designing a cross-modal attention module and a cross-channel attention module;
step 3, embedding the cross-modal attention module and the cross-channel attention module into a designed multi-source data fusion module;
step 4, embedding the multi-source data fusion module into the constructed semantic segmentation model, training the semantic segmentation model by using the training set in the step 1, and verifying the trained semantic segmentation model by using the verification set; training a semantic segmentation model by using the data set constructed in the step 1, and verifying a network training effect by using a verification set, so that the model can fully extract the characteristic information of the underwater target, thereby carrying out fine semantic segmentation on the underwater target;
And 5, performing semantic segmentation on the underwater target by using the semantic segmentation model in the step 4. The method comprises the steps of obtaining an underwater target RGB image and an underwater target event sequence by using an event camera, representing the underwater target event sequence into an underwater target event image by using a fixed time interval method, and extracting characteristic information and performing semantic segmentation on the underwater target by using a trained model.
Further, the step 1 specifically includes collecting an underwater target event sequence and RGB images by using an event camera; characterizing the collected underwater target event sequence as an event image by adopting a fixed time interval method;
the event data is converted into an image by using a fixed time interval method. Specifically, the frame reconstruction is set to 20 milliseconds so that the detection frequency reaches 50 frames per second. In each time interval, according to the pixel position of an event in the event sequence, a white pixel is used for representing the event with increased polarity on a pixel point generated by the corresponding polarity, and a black pixel is used for representing the event with reduced polarity. The background color of the overall image is set to gray, and the finally generated event image has the same size as the RGB image.
Further, the cross-modal attention module designed in the step 2 is specifically composed of a dual-branch architecture and is used to interact the modal features of the underwater target event image and the RGB image. One of the branches is a CBR (Convolution + Batch Normalization + ReLU) module, which consists of convolution, batch normalization and ReLU activation functions, is used to fully extract the spatial features of the data, and finally serves as a residual edge added to the cross-modal attention mechanism output. The cross-modal attention module generates attention feature map information in the modal dimension in a serialized manner, then multiplies the modal attention feature map with the original input feature map and performs adaptive modal feature screening to generate the final feature map. Modeling of global modal features is thus achieved, enabling finer-grained extraction of modal feature information.
Furthermore, the step 2 designs a cross-channel attention module, specifically, a three-branch architecture, for interacting the channel dimension characteristics of the underwater target event image and the RGB image. One branch consists of one CBR module, the other branch consists of two CBR modules which are connected in series and are used for fully extracting the target space characteristic information, and the two branches are simultaneously used as residual edges and finally added with the cross-channel attention mechanism output. The cross-channel attention module is designed, attention feature map information can be generated in the channel dimension in a serialized mode, and then the channel attention feature map is multiplied with the original input feature map and subjected to self-adaptive channel feature screening to generate a final feature map. Modeling global channel characteristics is achieved, and therefore finer-granularity channel characteristic information extraction is conducted.
Furthermore, the multi-source data fusion module in the step 3 is specifically designed to be based on the underwater target RGB image and the underwater target event image, and the event image and the RGB image are simultaneously input into the data fusion network. The multi-source data fusion module comprises two CBR branches, a cross-modal attention module branch and a cross-channel attention module branch;
Wherein: the CBR branch is made up of two CBR modules in series,
the cross-modal attention module branch is formed by a cross-modal attention module in series with a CBR module,
the cross-channel attention module branch is formed by connecting a cross-channel attention module and a CBR module in series;
the output of one of the CBR branches is summed with the output of the cross-modal attention module,
the output of one CBR branch is added to the output of the cross-channel attention module;
the feature graphs after the two branches are added are spliced according to the channel dimension and then enter a CBR module to obtain a fusion feature graph of 3 x 640 x 640, so that final multi-source data feature fusion is realized.
Further, for the operation of the cross-modal fusion attention module, the calculation formula is as follows:
For the operation of the cross-channel fusion attention module, the calculation formula is as follows:
wherein X represents the input feature map; Conv(·) represents a convolution operation; DWConv(·) represents a depth-separable convolution operation; CBR(·) represents the CBR module operation; Norm(·) represents the layer normalization operation; R(·) represents the morphological transformation (reshaping) of a feature vector; Q_m, K_m and V_m represent the query vector, key vector and value vector based on the modal dimension; Q_c, K_c and V_c represent the query vector, key vector and value vector based on the channel dimension; K_m^T and K_c^T represent the transposes of the modal and channel key vectors; F_m represents the feature vector derived from modal attention; F_c represents the feature vector derived from channel attention; and α is a learnable parameter.
Further, for the operation of the multi-source data fusion module, the calculation formula is as follows:
wherein F_e represents the event feature map; F_rgb represents the RGB feature map; X_1 and X_2 represent the feature maps obtained by splicing F_e and F_rgb according to the channel dimension; CBR_1(·) represents the first CBR feature branch operation; CBR_2(·) represents the second CBR feature branch operation; CMFA(·) represents the modal interaction attention branch operation; CCFA(·) represents the channel interaction attention branch operation; F_1 represents the feature map obtained by adding the output feature map of the first CBR feature branch and the output feature map of the modal interaction attention branch; F_2 represents the feature map obtained by adding the output feature map of the second CBR feature branch and the output feature map of the channel interaction attention branch; F_12 represents the feature map obtained by splicing F_1 and F_2 according to the channel dimension; CBR(·) represents the CBR module operation; and F_out represents the output feature map of the CBR module.
The semantic segmentation model constructed in the step 4 is specifically as follows: any existing semantic segmentation model may be selected as the base model, and the feature fusion network is embedded before it. Embedding the multi-source data fusion module gives the semantic segmentation model an input feature map with richer feature information, so that the model realizes finer semantic segmentation.
The semantic segmentation model trained and verified in the step 4 is applied as follows: an event camera is used to obtain an underwater target RGB image and an underwater target event sequence, the event sequence is characterized as an underwater target event image by the fixed time interval method, and the trained model is used to extract feature information and perform semantic segmentation on the underwater target.
Specifically, fig. 1 is a schematic flow chart of an underwater target semantic segmentation method based on multi-source data fusion, in this embodiment, a shoal event sequence and RGB images are obtained through an event camera, a cross-modal attention module is designed for fully interacting characteristic information of the shoal RGB images and the shoal event images, a cross-channel attention module is designed for fully extracting texture characteristic information of the shoal event images, and a multi-source data fusion module is designed for carrying out characteristic fusion on the shoal RGB image characteristic information and the shoal event image characteristic information. The semantic segmentation model finally acquires rich fish swarm characteristic information, and accurate and efficient semantic segmentation of the fish swarm target is realized.
As shown in fig. 1, the method of this embodiment specifically includes the following steps:
and step 1, acquiring a fish shoal event sequence and RGB images by using an event camera.
In this embodiment, 1000 fish shoal RGB images and the corresponding fish shoal event sequences were acquired using the event camera.
And 2, applying a fixed time interval method to the fish shoal event sequence acquired in step 1, setting the frame reconstruction interval to a fixed length of 20 ms so that the detection frequency reaches 50 frames per second, thereby characterizing the event data as images. In each time interval, according to the pixel position of each event in the event sequence, a white pixel is used to represent an event of increasing polarity at the corresponding pixel location, and a black pixel is used to represent an event of decreasing polarity. The background color of the overall image is set to gray, and the finally generated event image has the same size as the RGB image.
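The following is a minimal Python sketch of this fixed-time-interval characterization, assuming the event stream is given as an (N, 4) array of (x, y, t, p) tuples with timestamps in microseconds and polarity in {+1, -1}; the function name and array layout are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def events_to_frames(events, height, width, interval_us=20_000):
    """Characterize an event stream as gray-background frames at fixed time intervals.

    `events` is assumed to be an (N, 4) array of (x, y, t, p) with t in microseconds
    and polarity p in {+1, -1}; 20 ms windows give 50 frames per second.
    """
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    frames = []
    for start in np.arange(t0, t1, interval_us):
        frame = np.full((height, width, 3), 128, dtype=np.uint8)  # gray background
        win = events[(events[:, 2] >= start) & (events[:, 2] < start + interval_us)]
        for x, y, _, p in win:
            # white pixel for an increasing-polarity event, black for a decreasing one
            frame[int(y), int(x)] = 255 if p > 0 else 0
        frames.append(frame)
    return frames  # each frame has the same spatial size as the RGB image
```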
And 3, designing a cross-mode attention module for fully interacting the fish-shoal event image and RGB image characteristics.
Specifically, one of the branches is a CBR (Convolution + Batch Normalization + ReLU) module, which consists of convolution, batch normalization and ReLU activation functions, is used to fully extract the spatial features of the data, and finally serves as a residual edge added to the cross-modal attention mechanism output. The cross-modal attention module generates attention feature map information in the modal dimension in a serialized manner, then multiplies the modal attention feature map with the original input feature map and performs adaptive modal feature screening to generate the final feature map. Modeling of global modal features is thus achieved, enabling finer-grained extraction of modal feature information.
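As an illustration of this dual-branch design, here is a minimal PyTorch sketch. The exact attention formula in the patent is given only as an image, so standard scaled-dot-product attention with a sigmoid gate is assumed for the adaptive modal feature screening step; the class names, channel grouping and learnable weight alpha are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBR(nn.Module):
    """Convolution + Batch Normalization + ReLU block, as used throughout the patent."""
    def __init__(self, in_ch, out_ch, k=3, s=1, p=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, s, p, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class CrossModalAttention(nn.Module):
    """Dual-branch sketch: a CBR residual branch plus a branch that forms query/key/value
    tensors over the modal dimension, builds a modal attention map, and multiplies it with
    the original input for adaptive modal feature screening. Softmax attention is assumed;
    the patent itself only names layer normalization and a learnable parameter."""

    def __init__(self, channels, num_modes=2):
        super().__init__()
        assert channels % num_modes == 0
        self.m, self.cm = num_modes, channels // num_modes
        self.cbr = CBR(channels, channels)                            # residual CBR branch
        self.dw = nn.Conv2d(channels, channels, 3, 1, 1,
                            groups=channels, bias=False)              # depth-wise part of DWConv
        self.qkv = nn.Conv2d(channels, channels * 3, 1, bias=False)   # point-wise Q/K/V projection
        self.alpha = nn.Parameter(torch.zeros(1))                     # learnable fusion weight

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.dw(x)).chunk(3, dim=1)
        # reshape so attention runs across the modal groups at every spatial position
        def to_tokens(t):
            return t.view(b, self.m, self.cm, h * w).permute(0, 3, 1, 2)  # (B, HW, M, Cm)
        q, k, v = map(to_tokens, (q, k, v))
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.cm ** 0.5, dim=-1)  # (B, HW, M, M)
        out = (attn @ v).permute(0, 2, 3, 1).reshape(b, c, h, w)
        out = x * torch.sigmoid(out)              # adaptive modal feature screening
        return self.cbr(x) + self.alpha * out     # residual CBR branch + attention branch
```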
And 4, designing a cross-channel attention module for fully extracting image features of the shoal event.
Specifically, the module is composed of a three-branch architecture and is used for interacting the channel dimension characteristics of the fish-shoal event image and the RGB image. One branch consists of one CBR module, and the other branch consists of two CBR modules which are connected in series and are used for fully extracting target space characteristic information, and the two branches are simultaneously used as residual edges and finally added with the cross-channel attention mechanism output. The cross-channel attention module is designed, attention feature map information can be generated in the channel dimension in a serialized mode, and then the channel attention feature map is multiplied with the original input feature map and subjected to self-adaptive channel feature screening to generate a final feature map. Modeling global channel characteristics is achieved, and therefore finer-granularity channel characteristic information extraction is conducted.
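A corresponding sketch of the three-branch cross-channel design, reusing the CBR class and imports from the previous block; attention across channel tokens is an assumed realization of the channel attention described above, not the patent's exact formula.

```python
class CrossChannelAttention(nn.Module):
    """Three-branch sketch: a single-CBR branch and a two-CBR branch act as residual edges,
    while a channel attention branch builds query/key/value tensors over the channel
    dimension and gates the input for adaptive channel feature screening."""

    def __init__(self, channels):
        super().__init__()
        self.cbr1 = CBR(channels, channels)                       # residual edge 1
        self.cbr2 = nn.Sequential(CBR(channels, channels),
                                  CBR(channels, channels))        # residual edge 2: two CBRs in series
        self.dw = nn.Conv2d(channels, channels, 3, 1, 1, groups=channels, bias=False)
        self.qkv = nn.Conv2d(channels, channels * 3, 1, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.dw(x)).chunk(3, dim=1)
        q, k, v = [t.flatten(2) for t in (q, k, v)]               # (B, C, HW): one token per channel
        attn = torch.softmax(q @ k.transpose(-1, -2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        out = (attn @ v).view(b, c, h, w)
        out = x * torch.sigmoid(out)                              # adaptive channel feature screening
        return self.cbr1(x) + self.cbr2(x) + out
```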
And 5, designing a fish group RGB image and event image feature fusion network, and inputting the event image and the RGB image into the network at the same time.
Specifically, the event image and the RGB image are simultaneously input into the data fusion network. The feature fusion network is composed of two CBR branches, a cross-modal attention module branch and a cross-channel attention module branch, wherein: each CBR branch is formed by two CBR modules connected in series, the cross-modal attention module branch is formed by a cross-modal attention module connected in series with a CBR module, and the cross-channel attention module branch is formed by a cross-channel attention module connected in series with a CBR module. The output of one CBR branch is added to the output of the cross-modal attention module, and the output of the other CBR branch is added to the output of the cross-channel attention module. The feature maps obtained after the two additions are spliced according to the channel dimension and then enter a CBR module to obtain a 3 × 640 × 640 fused feature map, so that the final multi-source data feature fusion is realized.
Further, for the operation of the cross-modal fusion attention module, the calculation formula is as follows:
For the operation of the cross-channel fusion attention module, the calculation formula is as follows:
wherein X represents the input feature map; Conv(·) represents a convolution operation; DWConv(·) represents a depth-separable convolution operation; CBR(·) represents the CBR module operation; Norm(·) represents the layer normalization operation; R(·) represents the morphological transformation (reshaping) of a feature vector; Q_m, K_m and V_m represent the query vector, key vector and value vector based on the modal dimension; Q_c, K_c and V_c represent the query vector, key vector and value vector based on the channel dimension; K_m^T and K_c^T represent the transposes of the modal and channel key vectors; F_m represents the feature vector derived from modal attention; F_c represents the feature vector derived from channel attention; and α is a learnable parameter.
For the operation of the multi-source data fusion module, the calculation formula is as follows:
wherein F_e represents the event feature map; F_rgb represents the RGB feature map; X_1 and X_2 represent the feature maps obtained by splicing F_e and F_rgb according to the channel dimension; CBR_1(·) represents the first CBR feature branch operation; CBR_2(·) represents the second CBR feature branch operation; CMFA(·) represents the modal interaction attention branch operation; CCFA(·) represents the channel interaction attention branch operation; F_1 represents the feature map obtained by adding the output feature map of the first CBR feature branch and the output feature map of the modal interaction attention branch; F_2 represents the feature map obtained by adding the output feature map of the second CBR feature branch and the output feature map of the channel interaction attention branch; F_12 represents the feature map obtained by splicing F_1 and F_2 according to the channel dimension; CBR(·) represents the CBR module operation; and F_out represents the output feature map of the CBR module.
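Putting the pieces together, a hedged sketch of the fusion network described in this step, built from the CBR, CrossModalAttention and CrossChannelAttention classes sketched above; the channel widths and 3-channel output are chosen only to match the 3 × 640 × 640 fused feature map mentioned in the text, and everything else is an assumption.

```python
class MultiSourceFusion(nn.Module):
    """The event and RGB feature maps are spliced along the channel dimension and fed to
    two CBR branches, a cross-modal attention branch and a cross-channel attention branch;
    the two pairwise sums are concatenated and reduced by a final CBR to a 3-channel map."""

    def __init__(self, in_ch=6, mid_ch=6, out_ch=3):
        super().__init__()
        self.cbr_a = nn.Sequential(CBR(in_ch, mid_ch), CBR(mid_ch, mid_ch))   # CBR branch 1
        self.cbr_b = nn.Sequential(CBR(in_ch, mid_ch), CBR(mid_ch, mid_ch))   # CBR branch 2
        self.cmfa = nn.Sequential(CrossModalAttention(in_ch), CBR(in_ch, mid_ch))
        self.ccfa = nn.Sequential(CrossChannelAttention(in_ch), CBR(in_ch, mid_ch))
        self.out_cbr = CBR(mid_ch * 2, out_ch)

    def forward(self, event_img, rgb_img):
        x = torch.cat([event_img, rgb_img], dim=1)       # splice along the channel dimension
        f1 = self.cbr_a(x) + self.cmfa(x)                # CBR branch 1 + cross-modal attention branch
        f2 = self.cbr_b(x) + self.ccfa(x)                # CBR branch 2 + cross-channel attention branch
        return self.out_cbr(torch.cat([f1, f2], dim=1))  # fused 3-channel feature map


# usage: MultiSourceFusion()(torch.rand(1, 3, 640, 640), torch.rand(1, 3, 640, 640))
# returns a tensor of shape (1, 3, 640, 640)
```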
And 6, embedding the feature fusion network into a semantic segmentation model such as deeplabv3plus, unet, segformer or panet, where the feature fusion network is added before the feature extraction network of the semantic segmentation model. The multi-source data fusion module can fully integrate the event information and the RGB information, so that the semantic segmentation model obtains feature maps with richer information, thereby enhancing the semantic segmentation effect of the model.
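To illustrate how the fused map can feed an off-the-shelf segmentation model, the sketch below prepends the fusion module to torchvision's DeepLabv3 (torchvision 0.13 or later assumed); the patent leaves the base segmentation model open, so this wrapper is only one possible arrangement.

```python
import torchvision

class FusionSegmenter(nn.Module):
    """Wraps a per-pixel segmentation network behind the fusion module: the fused
    3-channel map simply replaces the usual RGB input of the base model."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.fusion = MultiSourceFusion()
        self.seg = torchvision.models.segmentation.deeplabv3_resnet50(
            weights=None, num_classes=num_classes)

    def forward(self, event_img, rgb_img):
        fused = self.fusion(event_img, rgb_img)   # richer input feature map
        return self.seg(fused)["out"]             # (B, num_classes, H, W) logits
```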
And 7, training a semantic segmentation model by using the data set constructed in the step 2, so that the model can fully extract the characteristics of the fish shoal, and therefore, the fish shoal is accurately subjected to semantic segmentation.
Specifically, preprocessing operations are performed on the divided fish shoal data set, including cropping to the same size, randomly adding noise, random rotation and random scaling, to enhance the robustness of the model. The semantic segmentation model is trained on the training set, the model with the best training index is saved, and the training effect of the model is verified on the verification set.
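A condensed sketch of the training and verification routine described above, with the 8:2 split and best-checkpoint saving; the batch size, learning rate, cross-entropy loss and the assumption that the dataset already applies the cropping, noise, rotation and scaling augmentations are illustrative choices, not specified by the patent.

```python
from torch.utils.data import DataLoader, random_split

def train(model, dataset, num_epochs=100, device="cuda"):
    """Split 8:2, train with cross-entropy, and keep the best validation checkpoint.
    `dataset` is assumed to yield (event_img, rgb_img, mask) triples, already augmented."""
    n_train = int(0.8 * len(dataset))
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    best_loss = float("inf")
    model.to(device)
    for epoch in range(num_epochs):
        model.train()
        for event_img, rgb_img, mask in train_loader:
            event_img, rgb_img, mask = event_img.to(device), rgb_img.to(device), mask.to(device)
            loss = criterion(model(event_img, rgb_img), mask)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for event_img, rgb_img, mask in val_loader:
                event_img, rgb_img, mask = event_img.to(device), rgb_img.to(device), mask.to(device)
                val_loss += criterion(model(event_img, rgb_img), mask).item()
        if val_loss < best_loss:                      # save the best-performing checkpoint
            best_loss = val_loss
            torch.save(model.state_dict(), "best_fusion_segmenter.pth")
```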
And 8, carrying out semantic segmentation on the fish shoals by using the model trained in the step 7.
Specifically, a shoal event sequence acquired by an event camera is characterized into a shoal event image by using a time interval method, and then a shoal RGB image and the event image are put into a trained semantic segmentation model for segmentation, so that a shoal segmentation map is obtained.
Through the technical scheme in the application, the fine segmentation effect on the fish swarm target is realized.
According to the technical scheme of this embodiment, an event camera is used to acquire a fish shoal event sequence and RGB images. A cross-modal attention module is designed to interact the fish shoal event image and RGB image features. A cross-channel attention module is designed to interact the channel features formed by splicing the fish shoal event image and RGB image features. A multi-source data fusion module is designed to fully fuse the fish shoal event image and RGB image features. The method can effectively mitigate a series of negative influences on segmentation in the underwater environment, such as fish shoal motion blur, uneven underwater illumination, and blurred morphological and texture feature information. The fish shoal semantic segmentation method based on multi-source data fusion can integrate multi-source feature data to achieve fine fish shoal semantic segmentation.
Example two
The invention also provides an underwater target semantic segmentation system based on multi-source data fusion, which comprises a construction unit, a semantic segmentation model unit and an application module.
The construction unit is used for acquiring a data set constructed from underwater target event images and RGB images and dividing it into a training set and a verification set at a ratio of 8:2;
the semantic segmentation model unit is used for embedding the designed cross-modal attention module and cross-channel attention module into the designed multi-source data fusion module; embedding the multisource data fusion module into the constructed semantic segmentation model, training the semantic segmentation model by using a training set, and verifying the trained semantic segmentation model by using a verification set; training a semantic segmentation model by using the data set constructed in the step 1, and verifying a network training effect by using a verification set, so that the model can fully extract the characteristic information of the underwater target, thereby carrying out fine semantic segmentation on the underwater target;
the application module performs semantic segmentation on the underwater target by using a semantic segmentation model unit.
Further, an underwater target event sequence and RGB images are acquired by using an event camera; characterizing the collected underwater target event sequence as an event image by adopting a fixed time interval method;
The event data is converted into an image by using a fixed time interval method. Specifically, the frame reconstruction interval is set to 20 milliseconds so that the detection frequency reaches 50 frames per second. In each time interval, according to the pixel position of each event in the event sequence, a white pixel is used to represent an event of increasing polarity at the corresponding pixel location, and a black pixel is used to represent an event of decreasing polarity. The background color of the overall image is set to gray, and the finally generated event image has the same size as the RGB image.
Further, the cross-modal attention module is specifically composed of a dual-branch architecture and is used to interact the modal features of the underwater target event image and the RGB image. One of the branches is a CBR (Convolution + Batch Normalization + ReLU) module, which consists of convolution, batch normalization and ReLU activation functions, is used to fully extract the spatial features of the data, and finally serves as a residual edge added to the cross-modal attention mechanism output. The cross-modal attention module generates attention feature map information in the modal dimension in a serialized manner, then multiplies the modal attention feature map with the original input feature map and performs adaptive modal feature screening to generate the final feature map. Modeling of global modal features is thus achieved, enabling finer-grained extraction of modal feature information.
Furthermore, the cross-channel attention module is specifically designed to be composed of a three-branch architecture and used for interacting the channel dimension characteristics of the underwater target event image and the RGB image. One branch consists of one CBR module, the other branch consists of two CBR modules which are connected in series and are used for fully extracting the target space characteristic information, and the two branches are simultaneously used as residual edges and finally added with the cross-channel attention mechanism output. The cross-channel attention module is designed, attention feature map information can be generated in the channel dimension in a serialized mode, and then the channel attention feature map is multiplied with the original input feature map and subjected to self-adaptive channel feature screening to generate a final feature map. Modeling global channel characteristics is achieved, and therefore finer-granularity channel characteristic information extraction is conducted.
Further, the multi-source data fusion module is specifically designed based on underwater target RGB images and underwater target event images, and the event images and the RGB images are simultaneously input into a data fusion network. The multi-source data fusion module comprises two CBR branches, a cross-modal attention module branch and a cross-channel attention module branch;
Wherein: the CBR branch is made up of two CBR modules in series,
the cross-modal attention module branch is formed by a cross-modal attention module in series with a CBR module,
the cross-channel attention module branch is formed by connecting a cross-channel attention module and a CBR module in series;
the output of one of the CBR branches is summed with the output of the cross-modal attention module,
the output of one CBR branch is added to the output of the cross-channel attention module;
the feature maps obtained after the two branch additions are spliced according to the channel dimension and then enter a CBR module to obtain a 3 × 640 × 640 fused feature map, so that the final multi-source data feature fusion is realized.
Further, for the operation of the cross-modal fusion attention module, the calculation formula is as follows:
For the operation of the cross-channel fusion attention module, the calculation formula is as follows:
wherein X represents the input feature map; Conv(·) represents a convolution operation; DWConv(·) represents a depth-separable convolution operation; CBR(·) represents the CBR module operation; Norm(·) represents the layer normalization operation; R(·) represents the morphological transformation (reshaping) of a feature vector; Q_m, K_m and V_m represent the query vector, key vector and value vector based on the modal dimension; Q_c, K_c and V_c represent the query vector, key vector and value vector based on the channel dimension; K_m^T and K_c^T represent the transposes of the modal and channel key vectors; F_m represents the feature vector derived from modal attention; F_c represents the feature vector derived from channel attention; and α is a learnable parameter.
Further, for the operation of the multi-source data fusion module, the calculation formula is as follows:
wherein F_e represents the event feature map; F_rgb represents the RGB feature map; X_1 and X_2 represent the feature maps obtained by splicing F_e and F_rgb according to the channel dimension; CBR_1(·) represents the first CBR feature branch operation; CBR_2(·) represents the second CBR feature branch operation; CMFA(·) represents the modal interaction attention branch operation; CCFA(·) represents the channel interaction attention branch operation; F_1 represents the feature map obtained by adding the output feature map of the first CBR feature branch and the output feature map of the modal interaction attention branch; F_2 represents the feature map obtained by adding the output feature map of the second CBR feature branch and the output feature map of the channel interaction attention branch; F_12 represents the feature map obtained by splicing F_1 and F_2 according to the channel dimension; CBR(·) represents the CBR module operation; and F_out represents the output feature map of the CBR module.
Specifically, the acquisition module obtains the fish shoal event sequence and RGB images captured by the event camera. A cross-modal attention module is designed to interact the fish shoal event image and RGB image features. A cross-channel attention module is designed to interact the channel features formed by splicing the fish shoal event image and RGB image features. A multi-source data fusion module is designed to fully fuse the fish shoal event image and RGB image features. The system can effectively mitigate a series of negative influences on segmentation in the underwater environment, such as fish shoal motion blur, uneven underwater illumination, and blurred morphological and texture feature information. The fish shoal semantic segmentation system based on multi-source data fusion can integrate multi-source feature data to achieve fine fish shoal semantic segmentation.
Example III
The embodiment of the invention provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the memory is used for storing the software program and a module, and the processor executes various functional applications and data processing by running the software program and the module stored in the memory. The memory and the processor are connected by a bus. In particular, the processor implements any of the steps of the above-described embodiment by running the above-described computer program stored in the memory.
It should be appreciated that in embodiments of the present invention, the processor may be a central processing unit (Central Processing Unit, CPU), which may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read-only memory, flash memory, and random access memory, and provides instructions and data to the processor. Some or all of the memory may also include non-volatile random access memory.
From the above, the electronic device provided by the embodiment of the invention can implement the underwater target semantic segmentation method based on multi-source data fusion according to the first embodiment by running a computer program, fusing event and RGB information to obtain richer feature representations and thereby realizing finer semantic segmentation of underwater targets.
It should be appreciated that the above-described integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by instructing related hardware by a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program may implement the steps of each of the method embodiments described above when executed by a processor. The computer program comprises computer program code, and the computer program code can be in a source code form, an object code form, an executable file or some intermediate form and the like. The computer readable medium may include: any entity or device capable of carrying the computer program code described above, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, and so forth. The content of the computer readable storage medium can be appropriately increased or decreased according to the requirements of the legislation and the patent practice in the jurisdiction.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be distributed to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, may each exist alone physically, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not used to limit the protection scope of the present invention. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not described again here.
It should be noted that the method and its details provided in the foregoing embodiments may be combined into the apparatus and device provided in the embodiments; they may be cross-referenced with each other and are not described again in detail.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of modules or elements described above is merely a logical functional division, and may be implemented in other ways, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included within the scope of the present invention.

Claims (10)

1. An underwater target semantic segmentation method based on multi-source data fusion, characterized by comprising the following steps:
step 1, acquiring a data set constructed from underwater target event images and RGB images, and dividing it into a training set and a verification set in an 8:2 ratio;
step 2, designing a cross-modal attention module and a cross-channel attention module;
step 3, embedding the cross-modal attention module and the cross-channel attention module into a designed multi-source data fusion module;
step 4, embedding the multi-source data fusion module into the constructed semantic segmentation model, training the semantic segmentation model with the training set from step 1, and verifying the trained semantic segmentation model with the verification set;
step 5, performing semantic segmentation on the underwater target by using the semantic segmentation model trained in step 4.
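A minimal PyTorch-style sketch of the five-step pipeline in claim 1; the UnderwaterEventRGBDataset and SegmentationModel classes, the batch size, the learning rate and the epoch count are illustrative assumptions and not part of the claimed method.

import torch
from torch.utils.data import DataLoader, random_split

# Hypothetical dataset yielding (event_image, rgb_image, mask) triples (step 1).
dataset = UnderwaterEventRGBDataset(root="data/underwater")      # assumed helper class
n_train = int(0.8 * len(dataset))                                # 8:2 train/verification split
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=8)

model = SegmentationModel()        # assumed model with the fusion module embedded (steps 2-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):           # step 4: train on the training set, then verify
    model.train()
    for event_img, rgb_img, mask in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(event_img, rgb_img), mask)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        for event_img, rgb_img, mask in val_loader:
            pred = model(event_img, rgb_img).argmax(dim=1)       # step 5: per-pixel class labels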
2. The underwater target semantic segmentation method based on multi-source data fusion according to claim 1, wherein step 1 specifically comprises: collecting an underwater target event sequence and RGB images with an event camera; and representing the collected underwater target event sequence as event images by a fixed-time-interval method.
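A minimal sketch of the fixed-time-interval representation in claim 2, assuming the event sequence arrives as (timestamp, x, y, polarity) tuples; the bin width and the signed accumulation scheme are illustrative assumptions.

import numpy as np

def events_to_images(events, height, width, interval_s=0.05):
    # events: array of shape (N, 4) with columns (t, x, y, polarity in {-1, +1}).
    # Returns one (height, width) event image per fixed time interval.
    t0, t1 = events[:, 0].min(), events[:, 0].max()
    images, start = [], t0
    while start < t1:
        in_window = (events[:, 0] >= start) & (events[:, 0] < start + interval_s)
        frame = np.zeros((height, width), dtype=np.float32)
        for _, x, y, p in events[in_window]:
            frame[int(y), int(x)] += p        # accumulate signed polarities per pixel
        images.append(frame)
        start += interval_s
    return images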
3. The underwater target semantic segmentation method based on multi-source data fusion according to claim 1, wherein in step 2 the designed cross-modal attention module is composed of a dual-branch architecture, one branch being a CBR module composed of a convolution, batch normalization and a ReLU activation function; the module sequentially generates attention feature-map information in the modal dimension, then multiplies the modal attention feature map with the original input feature map and performs adaptive modal feature screening to generate the final feature map.
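A minimal PyTorch sketch of a CBR block and a dual-branch cross-modal attention module in the spirit of claim 3; the kernel size and the sigmoid used to turn the CBR output into an attention map are assumptions where the claim leaves details open.

import torch.nn as nn

class CBR(nn.Module):
    # Convolution + Batch Normalization + ReLU, as named in the claims.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class CrossModalAttention(nn.Module):
    # Dual-branch module: a CBR branch produces a modal attention map that
    # reweights the original input (adaptive modal feature screening).
    def __init__(self, channels):
        super().__init__()
        self.cbr = CBR(channels, channels)
        self.to_attn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())  # assumed gating
    def forward(self, x):
        attn = self.to_attn(self.cbr(x))      # attention map along the modal dimension
        return x * attn                       # multiply with the original input feature map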
4. The underwater target semantic segmentation method based on multi-source data fusion according to claim 3, wherein in step 2 the designed cross-channel attention module specifically comprises a three-branch architecture: one branch comprises a CBR module, another branch comprises two CBR modules connected in series, and these two branches simultaneously serve as residual edges that are finally added to the output of the cross-channel attention mechanism; the module sequentially generates attention feature-map information in the channel dimension, then multiplies the channel attention feature map with the original input feature map and performs adaptive channel feature screening to generate the final feature map.
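A minimal PyTorch sketch of the three-branch cross-channel attention module of claim 4, reusing the CBR block from the sketch after claim 3; the squeeze-and-excitation style channel gate is an assumption, since the claim only states that attention is generated along the channel dimension.

import torch.nn as nn

class CrossChannelAttention(nn.Module):
    # Three-branch module: a single-CBR branch and a two-CBR branch act as
    # residual edges added to the channel-attention output.
    def __init__(self, channels):
        super().__init__()
        self.branch1 = CBR(channels, channels)
        self.branch2 = nn.Sequential(CBR(channels, channels), CBR(channels, channels))
        self.channel_gate = nn.Sequential(      # assumed form of the channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, x):
        attn_out = x * self.channel_gate(x)                    # adaptive channel feature screening
        return self.branch1(x) + self.branch2(x) + attn_out    # residual edges + attention output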
5. The underwater target semantic segmentation method based on multi-source data fusion according to claim 4, wherein the multi-source data fusion module in step 3 specifically comprises two CBR branches, a cross-modal attention module branch and a cross-channel attention module branch;
each CBR branch is made up of two CBR modules connected in series,
the cross-modal attention module branch is formed by a cross-modal attention module connected in series with a CBR module,
the cross-channel attention module branch is formed by a cross-channel attention module connected in series with a CBR module;
the output of one CBR branch is added to the output of the cross-modal attention module branch,
the output of the other CBR branch is added to the output of the cross-channel attention module branch;
the feature maps resulting from the two additions are concatenated along the channel dimension and then pass through a CBR module to obtain a fused feature map of size 3×640, thereby realizing the final multi-source data feature fusion.
6. The underwater target semantic segmentation method based on multi-source data fusion according to claim 5, wherein the operation of the cross-modal fusion attention module and the operation of the cross-channel fusion attention module are each defined by a calculation formula (given as figures in the original filing) over the following quantities:
the input feature map; a convolution operation Conv(·); a depthwise-separable convolution operation DWConv(·); the CBR module operation CBR(·); a layer normalization operation Norm(·); a reshaping operation R(·) applied to feature vectors; a query vector, a modal key vector and a modal value vector based on the modal dimension; a query vector, a channel key vector and a channel value vector based on the channel dimension; the transpose of the modal key vector; the feature vector obtained from the modal attention; the feature vector obtained from the channel attention; and a learnable parameter.
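The equations of claim 6 are not reproduced in this text; the LaTeX block below is a hedged reconstruction that assumes a standard scaled-dot-product cross-attention with a residual connection, layer normalization and a learnable scale, built only from the quantities listed above. The symbol names Q_m, K_m, V_m, Q_c, K_c, V_c, Y_m, Y_c, \lambda_m and \lambda_c, as well as the softmax and scaling form, are assumptions; the patent's exact formulas may differ.

% Assumed form of the cross-modal (Y_m) and cross-channel (Y_c) fusion attention.
Y_m = \mathrm{Norm}\!\left( X + \lambda_m \, R\!\left( \mathrm{softmax}\!\left( \tfrac{Q_m K_m^{\top}}{\sqrt{d}} \right) V_m \right) \right)
Y_c = \mathrm{Norm}\!\left( X + \lambda_c \, R\!\left( \mathrm{softmax}\!\left( \tfrac{Q_c K_c^{\top}}{\sqrt{d}} \right) V_c \right) \right)

Here, presumably, the query, key and value vectors are obtained from the input feature map X through the Conv and DWConv projections followed by the reshaping R, in line with the symbol list of claim 6.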
7. The underwater target semantic segmentation method based on multi-source data fusion according to claim 5, wherein the operation of the multi-source data fusion module is defined by a calculation formula (given as figures in the original filing) over the following quantities:
the event feature map; the RGB feature map; the feature map obtained by concatenating the event feature map and the RGB feature map along the channel dimension; a first CBR feature branch operation; a second CBR feature branch operation; a modal interaction attention branch operation CMFA; a channel interaction attention branch operation CCFA; the feature map obtained by adding the output feature map of the first CBR feature branch operation to the output feature map of the modal interaction attention branch operation; the feature map obtained by adding the output feature map of the second CBR feature branch operation to the output feature map of the channel interaction attention branch operation; the feature map obtained by concatenating these two sums along the channel dimension; the CBR module operation CBR(·); and the output feature map of that CBR module operation.
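A minimal PyTorch sketch of the four-branch fusion described in claims 5 and 7, reusing the CBR, CrossModalAttention and CrossChannelAttention sketches given after claims 3 and 4; the per-modality channel count and the output channel count are illustrative, and the input concatenation of the event and RGB feature maps follows the symbol definitions of claim 7.

import torch
import torch.nn as nn

class MultiSourceFusion(nn.Module):
    # Two CBR branches plus cross-modal and cross-channel attention branches,
    # pairwise summed, concatenated along channels and fused by a final CBR.
    def __init__(self, ch):                  # ch = channels per modality (assumed)
        super().__init__()
        c = 2 * ch                           # channels after concatenating both modalities
        self.cbr_branch1 = nn.Sequential(CBR(c, c), CBR(c, c))
        self.cbr_branch2 = nn.Sequential(CBR(c, c), CBR(c, c))
        self.cmfa_branch = nn.Sequential(CrossModalAttention(c), CBR(c, c))
        self.ccfa_branch = nn.Sequential(CrossChannelAttention(c), CBR(c, c))
        self.fuse = CBR(2 * c, ch)           # concat of the two sums, fused back to ch channels
    def forward(self, event_feat, rgb_feat):
        x = torch.cat([event_feat, rgb_feat], dim=1)     # concatenate the two modalities
        a = self.cbr_branch1(x) + self.cmfa_branch(x)    # CBR branch + cross-modal branch
        b = self.cbr_branch2(x) + self.ccfa_branch(x)    # CBR branch + cross-channel branch
        return self.fuse(torch.cat([a, b], dim=1))       # channel-wise concat, then final CBR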
8. An underwater target semantic segmentation system based on multi-source data fusion, characterized by comprising:
a construction unit, configured to acquire a data set constructed from underwater target event images and RGB images and divide it into a training set and a verification set in an 8:2 ratio;
a semantic segmentation model unit, configured to embed the designed cross-modal attention module and cross-channel attention module into the designed multi-source data fusion module, embed the multi-source data fusion module into the constructed semantic segmentation model, train the semantic segmentation model with the training set, and verify the trained semantic segmentation model with the verification set;
and an application module, configured to perform semantic segmentation on the underwater target by using the semantic segmentation model unit.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-7 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-7.
CN202410035082.7A 2024-01-10 2024-01-10 Underwater target semantic segmentation method and system based on multi-source data fusion Active CN117557795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410035082.7A CN117557795B (en) 2024-01-10 2024-01-10 Underwater target semantic segmentation method and system based on multi-source data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410035082.7A CN117557795B (en) 2024-01-10 2024-01-10 Underwater target semantic segmentation method and system based on multi-source data fusion

Publications (2)

Publication Number Publication Date
CN117557795A true CN117557795A (en) 2024-02-13
CN117557795B CN117557795B (en) 2024-03-29

Family

ID=89823493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410035082.7A Active CN117557795B (en) 2024-01-10 2024-01-10 Underwater target semantic segmentation method and system based on multi-source data fusion

Country Status (1)

Country Link
CN (1) CN117557795B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160311A (en) * 2020-01-02 2020-05-15 西北工业大学 Yellow river ice semantic segmentation method based on multi-attention machine system double-flow fusion network
CN113065578A (en) * 2021-03-10 2021-07-02 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN116028846A (en) * 2022-12-20 2023-04-28 北京信息科技大学 Multi-mode emotion analysis method integrating multi-feature and attention mechanisms
CN116091765A (en) * 2022-12-29 2023-05-09 清华大学 RGB-T image semantic segmentation method and device
CN116597144A (en) * 2023-05-26 2023-08-15 大连理工大学 Image semantic segmentation method based on event camera
CN116682000A (en) * 2023-07-28 2023-09-01 吉林大学 Underwater frogman target detection method based on event camera
CN116958736A (en) * 2023-06-14 2023-10-27 安徽理工大学 RGB-D significance target detection method based on cross-modal edge guidance
CN117173394A (en) * 2023-08-07 2023-12-05 山东大学 Weak supervision salient object detection method and system for unmanned aerial vehicle video data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAMING ZHANG et al.: "CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation With Transformers", IEEE, vol. 24, no. 12, 31 December 2023 (2023-12-31), pages 14679-14694, XP011954524, DOI: 10.1109/TITS.2023.3300537 *
YU JIANG et al.: "Nighttime traffic object detection via adaptively integrating event and frame domains", ELSEVIER, 10 October 2023 (2023-10-10), pages 1-12 *

Also Published As

Publication number Publication date
CN117557795B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
He et al. Hybrid first and second order attention Unet for building segmentation in remote sensing images
Yeh et al. Lightweight deep neural network for joint learning of underwater object detection and color conversion
EP3757890A1 (en) Method and device for image processing, method and device for training object detection model
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN116309781B (en) Cross-modal fusion-based underwater visual target ranging method and device
CN115719445A (en) Seafood identification method based on deep learning and raspberry type 4B module
CN115577768A (en) Semi-supervised model training method and device
CN117557795B (en) Underwater target semantic segmentation method and system based on multi-source data fusion
CN116205905B (en) Power distribution network construction safety and quality image detection method and system based on mobile terminal
CN115527159B (en) Counting system and method based on inter-modal scale attention aggregation features
CN116109682A (en) Image registration method based on image diffusion characteristics
CN112418111A (en) Offshore culture area remote sensing monitoring method and device, equipment and storage medium
Zhang et al. Image Segmentation Based on Visual Attention Mechanism.
Niu et al. Real-time recognition and location of indoor objects
Cao et al. FOD detection using a multi-channel information fusion method
Choi et al. Efficient bokeh effect rendering using generative adversarial network
Yin et al. Headdress Detection Based on Saliency Map for Thangka Portrait Image.
Zhang et al. An Improved Yolov3 Object Detection Algorithm for UAV Aerial Images
CN111382773A (en) Image matching method based on nine-grid principle for monitoring inside of pipeline
CN111582067B (en) Facial expression recognition method, system, storage medium, computer program and terminal
CN117218033B (en) Underwater image restoration method, device, equipment and medium
CN114926734B (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN112906679B (en) Pedestrian re-identification method, system and related equipment based on human shape semantic segmentation
Balasundaram et al. Zero-DCE++ Inspired Object Detection in Less Illuminated Environment Using Improved YOLOv5.
CN115115509A (en) Image generation method, image generation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant