CN116895012A - Underwater image abnormal target identification method, system and equipment


Info

Publication number
CN116895012A
CN116895012A
Authority
CN
China
Prior art keywords
image
feature
generate
abnormal
underwater
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310904099.7A
Other languages
Chinese (zh)
Inventor
冯炎炯
谢顺添
陈先
罗文博
张清文
林斌
梁席源
陈俊东
黎如欣
司徒宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Yangjiang Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Yangjiang Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Yangjiang Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Guangdong Power Grid Co Ltd
Priority to CN202310904099.7A priority Critical patent/CN116895012A/en
Publication of CN116895012A publication Critical patent/CN116895012A/en
Pending legal-status Critical Current


Classifications

    • G06V 20/05 Underwater scenes
    • G06N 3/0455 Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/764 Recognition using classification, e.g. of video objects
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 Recognition using neural networks
    • G06V 2201/07 Target detection
    • Y02A 90/30 Assessment of water resources (indirect contribution to climate change adaptation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a system and equipment for identifying abnormal targets in an underwater image. The method comprises: performing abnormal target detection on an original underwater image within the object bounding boxes of a feature fusion image by means of a preset abnormal object detection module, generating a preliminary detection result; and performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, generating an abnormal target detection result. This solves the technical problem that the prior art lacks context information during enhancement and therefore struggles to accurately detect abnormal targets in underwater images. The invention integrates image enhancement and object detection, fuses the original image, the enhanced image and the encoder features, adapts to features at different scales, makes full use of the context information and depth information of predicted objects, reduces the influence of noise and scattering on the detection result, reduces false positive detections, and improves the reliability of the detection result.

Description

Underwater image abnormal target identification method, system and equipment
Technical Field
The invention relates to the technical field of recognition of abnormal targets of underwater images, in particular to a method, a system and equipment for recognizing abnormal targets of underwater images.
Background
Underwater image enhancement and abnormal object detection are important technologies in fields such as underwater robotics, submarine cable inspection and marine organism research. In an underwater environment, owing to the absorption and scattering characteristics of the water body, images are often affected by blurring, color distortion and similar problems, which reduces the accuracy of abnormal object detection. Conventional image enhancement methods and single network architectures cannot achieve the desired detection performance in these complex environments.
Therefore, in the prior art it is common to perform image enhancement with a generative adversarial network (GAN) and object detection with a separate convolutional neural network (CNN). This approach has limitations: useful information may be lost during enhancement, and the resulting lack of context information leads to low detection accuracy, making it difficult to accurately detect abnormal targets in underwater images.
Disclosure of Invention
The invention provides a method, a system and equipment for identifying an abnormal target in an underwater image, which solve the technical problems that existing approaches may lose useful information during enhancement and, lacking context information, achieve low detection accuracy and struggle to accurately detect abnormal targets in underwater images.
The invention provides a method for identifying an abnormal target of an underwater image, which comprises the following steps:
responding to a received underwater image abnormal target identification request, and acquiring an original underwater image corresponding to the underwater image abnormal target identification request;
performing enhancement processing on the original underwater image to generate an enhanced underwater image;
carrying out convolution processing on the original underwater image and the enhanced underwater image respectively, and carrying out feature fusion to generate a feature fusion image;
performing abnormal target detection on the original underwater image within the object bounding boxes of the feature fusion image by adopting a preset abnormal object detection module to generate a preliminary detection result;
and performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, generating an abnormal target detection result.
Optionally, the step of performing enhancement processing on the original underwater image to generate an enhanced underwater image includes:
inputting the original underwater image into a preset initial image enhancement module;
extracting the characteristics of the original underwater image through a discriminator of the initial image enhancement module to generate an updated underwater image;
training the initial image enhancement module by adopting the updated underwater image to generate a target image enhancement module;
and extracting the characteristics of the original underwater image through a generator of the target image enhancement module to generate an enhanced underwater image.
Optionally, the discriminator comprises a plurality of convolution layers, a downsampling layer, an activation layer, a normalization layer, a pooling layer and a first fully-connected layer; the step of extracting the characteristics of the original underwater image by the discriminator of the initial image enhancement module to generate an updated underwater image comprises the following steps:
extracting features of the original underwater image through a plurality of convolution layers of a discriminator of the initial image enhancement module, generating a first feature image and inputting the first feature image into the downsampling layer;
extracting the features of the first feature image through the downsampling layer, generating a second feature image and inputting the second feature image into the activation layer;
performing nonlinear conversion processing on the second characteristic image through the activation layer to generate a third characteristic image and inputting the third characteristic image into the normalization layer;
normalizing the third characteristic image through the normalization layer to generate a fourth characteristic image and inputting the fourth characteristic image into the pooling layer;
carrying out pooling treatment on the fourth characteristic image through the pooling layer to generate a fifth characteristic image and inputting the fifth characteristic image into the first fully-connected layer;
and connecting the features of the fifth feature image through the first fully-connected layer to generate an updated underwater image.
Optionally, the generator comprises an encoder and a decoder; the step of extracting the characteristics of the original underwater image by the generator of the target image enhancement module to generate an enhanced underwater image comprises the following steps:
extracting the characteristics of the original underwater image through a convolution layer corresponding to an encoder of the target image enhancement module, generating a first enhancement characteristic image and inputting the first enhancement characteristic image into an activation layer of the encoder;
performing nonlinear conversion processing on the first enhancement feature image through an activation layer of the encoder, generating a second enhancement feature image and inputting the second enhancement feature image into a normalization layer of the encoder;
normalizing the second enhancement feature image through a normalization layer of the encoder to generate a third enhancement feature image, and inputting it into the skip connection and into a convolution layer of the decoder, respectively;
convolving the third enhanced feature image by a convolution layer of the decoder to generate a fourth enhanced feature image and inputting the fourth enhanced feature image into a transpose convolution layer of the decoder;
upsampling the fourth enhanced feature image through a transposed convolutional layer of the decoder to generate a fifth enhanced feature image and inputting the fifth enhanced feature image into an activation layer of the decoder;
performing nonlinear conversion processing on the fifth enhanced feature image through an activation layer of the decoder to generate a sixth enhanced feature image and inputting the sixth enhanced feature image into a normalization layer of the decoder;
normalizing the sixth enhanced feature image through a normalization layer of the decoder to generate a seventh enhanced feature image and inputting the seventh enhanced feature image into the skip connection;
and splicing the second enhanced feature image and the seventh enhanced feature image through the skip connection to generate an enhanced underwater image.
Optionally, the step of performing convolution processing on the original underwater image and the enhanced underwater image, and performing feature fusion to generate a feature fusion image includes:
carrying out convolution processing on the original underwater image and the enhanced underwater image respectively through adaptive dilation convolution modules, respectively generating a first convolution characteristic image and a second convolution characteristic image, and inputting them into the dynamic characteristic fusion module;
and carrying out feature fusion on the first convolution feature image and the second convolution feature image through the dynamic feature fusion module to generate a feature fusion image.
Optionally, the step of performing feature fusion on the first convolution feature image and the second convolution feature image by the dynamic feature fusion module to generate a feature fusion image includes:
respectively carrying out a mean pooling operation on the first convolution feature of the first convolution feature image, the second convolution feature of the second convolution feature image and the third convolution feature (the feature extracted by the encoder of the target image enhancement module) through the multipath feature channels of the dynamic feature fusion module, respectively generating a first pooling feature, a second pooling feature and a third pooling feature;
splicing the first pooling feature, the second pooling feature and the third pooling feature, generating an initial feature vector, inputting the initial feature vector into the second fully-connected layer for connection, generating an updated feature vector, and inputting the updated feature vector into a softmax layer;
performing a softmax operation on the updated feature vector through the softmax layer to generate an update vector;
multiplying the values of the update vector into the feature channels of the features corresponding to the update vector to generate a plurality of target vectors;
and splicing all the target vectors to generate a feature fusion image.
Optionally, the step of detecting the abnormal target of the original underwater image within the object bounding boxes of the feature fusion image by using a preset abnormal object detection module to generate a preliminary detection result includes:
inputting the feature fusion image into a preset region proposal network to generate a plurality of object bounding boxes;
predicting the objectness of the feature fusion image within each object bounding box through the region proposal network and scoring it to generate an objectness score;
judging whether the objectness score is greater than or equal to a score threshold;
if yes, extracting the object bounding box corresponding to the objectness score and the object bounding box coordinates corresponding to that bounding box;
optimizing the object bounding box corresponding to the objectness score by adopting a re-optimization module to generate an optimized object bounding box;
detecting the feature fusion image by adopting the optimized object bounding box to generate a region detection result;
inputting the original underwater image into the abnormal object detection module;
extracting the characteristics of the original underwater image through a characteristic extractor of the abnormal object detection module to generate a first abnormal object characteristic image;
splicing the first abnormal object characteristic image and the characteristic fusion image to generate a second abnormal object characteristic image and inputting the second abnormal object characteristic image into a convolution network;
extracting the characteristics of the second abnormal object characteristic image through the convolution network to generate an attention detection result;
multiplying the attention detection result by the confidence channel corresponding to the region detection result to generate a target abnormal object characteristic image;
and performing abnormal object detection on the target abnormal object characteristic image according to the optimized object bounding box on that image, generating a preliminary detection result.
Optionally, the step of performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model to generate an abnormal target detection result includes:
detecting depth information of an abnormal target from the feature fusion image by a regression method;
constructing the preset spatial distribution model based on the depth information of the abnormal target and the image pixel data within the optimized object bounding box on the feature fusion image;
analyzing a plurality of spatial distributions of abnormal targets in the preliminary detection results;
judging whether the spatial distribution is larger than or equal to a distribution threshold value of the preset spatial distribution model;
if not, determining the spatial distribution as false positive, and removing the spatial distribution;
if yes, determining the spatial distribution of the abnormal target, and combining a plurality of spatial distributions to generate an abnormal target detection result.
The invention provides an underwater image abnormal target recognition system, which comprises:
the original underwater image module is used for responding to the received underwater image abnormal target identification request and acquiring an original underwater image corresponding to the underwater image abnormal target identification request;
the enhanced underwater image module is used for enhancing the original underwater image to generate an enhanced underwater image;
the characteristic fusion image module is used for respectively carrying out convolution processing on the original underwater image and the enhanced underwater image and carrying out characteristic fusion to generate a characteristic fusion image;
the preliminary detection result module is used for detecting an abnormal target of the original underwater image within the object bounding boxes of the feature fusion image by adopting a preset abnormal object detection module, so as to generate a preliminary detection result;
and the abnormal target detection result module is used for carrying out false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, to generate an abnormal target detection result.
An electronic device according to a third aspect of the present invention includes a memory and a processor, where the memory stores a computer program, and the computer program when executed by the processor causes the processor to execute the steps of the method for identifying an abnormal target of an underwater image as described in any one of the above.
From the above technical scheme, the invention has the following advantages:
According to the method, in response to a received underwater image abnormal target identification request, an original underwater image corresponding to the request is acquired; enhancement processing is performed on the original underwater image to generate an enhanced underwater image; convolution processing is carried out on the original underwater image and the enhanced underwater image respectively, and feature fusion is performed to generate a feature fusion image; abnormal target detection is performed on the original underwater image within the object bounding boxes of the feature fusion image by a preset abnormal object detection module to generate a preliminary detection result; and false positive suppression processing is performed on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, generating an abnormal target detection result. This solves the technical problems that the prior art may lose useful information during enhancement and, lacking context information, achieves low detection accuracy and struggles to accurately detect abnormal targets in underwater images.
The invention integrates image enhancement and object detection, fuses the original image, the enhanced image and the encoder features, adapts to features at different scales, makes full use of the context information and depth information of predicted objects, reduces the influence of noise and scattering on the detection result, reduces false positive detections, and improves the reliability of the detection result.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating steps of a method for identifying an abnormal target in an underwater image according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for identifying an abnormal target in an underwater image according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the module operation flow used in an underwater image abnormal target recognition method according to the second embodiment of the present invention;
FIG. 4 is a block diagram of a system for identifying an abnormal target in an underwater image according to a third embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a method, a system and equipment for identifying an abnormal target in an underwater image, which are used to solve the technical problems that existing approaches may lose useful information during enhancement and, lacking context information, achieve low detection accuracy and struggle to accurately detect abnormal targets in underwater images.
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for identifying an abnormal object of an underwater image according to an embodiment of the present invention.
The invention provides a method for identifying an abnormal target of an underwater image, which comprises the following steps:
step 101, responding to a received underwater image abnormal target identification request, and acquiring an original underwater image corresponding to the underwater image abnormal target identification request.
The underwater image abnormal target recognition request refers to a request for recognizing an abnormal target in an underwater image captured by an underwater robot.
In the implementation, when an abnormal target identification request of the underwater image is received, an original underwater image shot by the underwater robot is acquired, so that the subsequent image enhancement or other image processing is facilitated.
And 102, performing enhancement processing on the original underwater image to generate an enhanced underwater image.
In a specific implementation, the enhancement processing is performed on the original underwater image, including but not limited to extracting features of the original underwater image, and performing activation processing, normalization processing and the like on the image after the features are extracted, so as to obtain the enhanced underwater image.
And 103, respectively carrying out convolution processing on the original underwater image and the enhanced underwater image, and carrying out feature fusion to generate a feature fusion image.
The feature fusion image refers to the image obtained after the features of the two convolved images are fused.
In a specific implementation, two adaptive dilation convolution modules are adopted to respectively carry out convolution processing on the original underwater image and the enhanced image, generating two updated images, and a dynamic feature fusion module is adopted to carry out feature fusion on the two updated images to obtain a feature fusion image.
And 104, detecting an abnormal target of the original underwater image within the object bounding boxes of the feature fusion image by adopting a preset abnormal object detection module, generating a preliminary detection result.
It should be noted that the preset abnormal object detection module refers to a context-aware abnormal object detection module; specifically, an attention mechanism guided by predefined underwater object templates focuses on areas of the submarine cable background that are more likely to contain an abnormal object. The templates represent the expected appearance of marine life, debris and cable damage, and abnormal targets are detected within the refined region proposals using the fused features of the dynamic feature fusion module and the attention-guided context information.
In a specific implementation, the depth information extracted from the feature fusion image and the image pixel data within the object bounding boxes are used to construct the preset abnormal object detection module, so that the module detects abnormal targets of the original underwater image within the object bounding boxes, obtaining a preliminary detection result.
And 105, performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, generating an abnormal target detection result.
The abnormal target detection result refers to a detection result obtained after detecting an abnormal target of an underwater image, and the detection result includes depth information of the detected abnormal object and the surrounding underwater environment.
False positive suppression processing refers to applying a predefined spatial distribution model customized for the underwater environment to reduce false positives in underwater images, which helps to improve the overall accuracy of abnormal object detection.
In a specific implementation, for each bounding box (object bounding box) of the preliminary detection result, the image pixel data within the bounding box and the depth information of the abnormal target in the feature fusion image are cropped, and after normalization the distribution probability of the abnormal target under the spatial distribution model is obtained. The distribution probability corresponding to the spatial distribution of each abnormal target in the preliminary detection result is extracted; when this probability is smaller than the distribution probability threshold of the abnormal target in the spatial distribution model, the detection is determined to be a false positive and its spatial distribution is removed, and the detection results corresponding to the remaining spatial distributions are determined as the abnormal target detection result.
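For illustration only, a minimal sketch of how such a false positive suppression pass might be implemented, assuming the spatial distribution model exposes a hypothetical probability lookup (`spatial_model.probability`) and each detection carries its bounding box and cropped depth; none of these names come from the patent:

```python
def suppress_false_positives(detections, spatial_model, prob_threshold):
    """Keep only detections whose spatial-distribution probability meets the
    model's threshold; the rest are treated as false positives (sketch)."""
    kept = []
    for det in detections:
        # det is assumed to be a dict with 'bbox' (x1, y1, x2, y2) and
        # 'depth' (depth information cropped from the feature fusion image)
        prob = spatial_model.probability(det['bbox'], det['depth'])  # hypothetical API
        if prob >= prob_threshold:
            kept.append(det)  # plausible abnormal target: keep
        # otherwise: determined to be a false positive and removed
    return kept
```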
Referring to FIG. 2 and FIG. 3, FIG. 2 is a flowchart illustrating steps of a method for identifying an abnormal object of an underwater image according to a second embodiment of the present invention.
The invention provides a method for identifying an abnormal target of an underwater image, which comprises the following steps:
step 201, responding to a received underwater image abnormal target identification request, and acquiring an original underwater image corresponding to the underwater image abnormal target identification request.
In the embodiment of the present invention, the implementation process of step 201 is similar to that of step 101, and will not be repeated here.
Step 202, inputting an original underwater image into a preset initial image enhancement module.
It should be noted that the preset initial image enhancement module includes a discriminator and a generator, where the generator adopts an encoder-decoder architecture with skip connections between the corresponding layers.
In specific implementation, the original underwater image is input into an initial image enhancement module for image enhancement processing.
And 203, extracting the characteristics of the original underwater image through a discriminator of the initial image enhancement module to generate an updated underwater image.
Optionally, the discriminator comprises a plurality of convolution layers, a downsampling layer, an activation layer, a normalization layer, a pooling layer and a first fully-connected layer; step 203 comprises the following steps S11-S16:
S11, extracting features of the original underwater image through a plurality of convolution layers of the discriminator of the initial image enhancement module, generating a first feature image and inputting the first feature image into the downsampling layer;
S12, extracting the features of the first feature image through the downsampling layer, generating a second feature image and inputting the second feature image into the activation layer;
S13, performing nonlinear conversion processing on the second characteristic image through the activation layer, generating a third characteristic image and inputting the third characteristic image into the normalization layer;
S14, carrying out normalization processing on the third characteristic image through the normalization layer, generating a fourth characteristic image and inputting the fourth characteristic image into the pooling layer;
S15, carrying out pooling treatment on the fourth characteristic image through the pooling layer, generating a fifth characteristic image and inputting the fifth characteristic image into the first fully-connected layer;
S16, connecting the features of the fifth feature image through the first fully-connected layer to generate an updated underwater image.
It should be noted that the discriminator is a convolutional neural network that outputs a single confidence value through multiple convolution layers, downsampling layers (pooling layers or convolution layers with stride greater than 1), activation and normalization layers, and finally a pooling layer (reducing the spatial size to 1×1), one or more fully-connected layers (reducing the feature channels from C to 1) and a sigmoid layer. It is used to distinguish real images from generated images during network training, guiding the generator to produce enhanced images whose quality and appearance are similar to clear images. The first fully-connected layer in the step above and the second fully-connected layer in step S42 below are both fully-connected layers.
In the specific implementation, in the step, the first characteristic image refers to a characteristic image obtained after the original underwater image is subjected to convolution treatment by a plurality of convolution layers of the discriminator; the second characteristic image refers to a characteristic image obtained after the first characteristic image is subjected to downsampling treatment by a downsampling layer; the third characteristic image refers to a characteristic image obtained after the second characteristic image is subjected to nonlinear conversion treatment by the activation layer; the fourth characteristic image refers to a characteristic image obtained after the normalization processing of the third characteristic image by the normalization layer; the fifth characteristic image refers to a characteristic image obtained after the fourth characteristic image is subjected to pooling treatment by the pooling layer; updating the underwater image refers to the feature image obtained after the fifth feature image is subjected to connection processing through the full connection layer.
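For illustration, a minimal PyTorch-style sketch of a discriminator matching this description (strided downsampling convolutions, an activation and normalization layer after each convolution, global pooling to 1×1, a fully-connected layer from C channels to 1, and a sigmoid); the layer widths and the LeakyReLU choice are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch: convolutions with strided downsampling, activation +
    normalization, global pooling to 1x1, FC layer C -> 1, sigmoid."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, 1, 1), nn.LeakyReLU(0.2), nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, 3, 2, 1), nn.LeakyReLU(0.2), nn.BatchNorm2d(128),   # downsampling
            nn.Conv2d(128, 256, 3, 2, 1), nn.LeakyReLU(0.2), nn.BatchNorm2d(256),  # downsampling
            nn.AdaptiveAvgPool2d(1),  # reduce the spatial size to 1x1
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(256, 1), nn.Sigmoid())  # C -> 1 confidence

    def forward(self, x):
        return self.head(self.features(x))  # single confidence value per image
```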
And 204, training the initial image enhancement module by adopting the updated underwater image to generate a target image enhancement module.
It should be noted that after training with the discriminator, the module guides the generator to produce enhanced images whose quality and appearance are similar to clear images; the module obtained after this training is the target image enhancement module.
In a specific implementation, the initial image enhancement module is trained using the updated underwater images, and the training loss comprises an adversarial loss $L_g$, a content loss $L_c$ and a downstream task loss $L_f$. Denoting the generator by $G$, the discriminator by $D$, an original underwater image by $x_0$ and the ground-truth enhanced image by $x_{gt}$, the formulas are:
$L_g = -\log(D(x_{gt})) - \log(1 - D(G(x_0)))$
$L_c = \lVert G(x_0) - x_{gt} \rVert$
Specifically, the downstream task loss is obtained by back-propagating the loss of the detection result through the whole neural network framework. Early in the overall training stage, the image enhancement module is trained independently on the degraded original underwater images and the enhanced underwater images; in the middle and later stages it participates in the joint training of the whole framework to optimize the output features of the encoder. Specifically, referring to FIG. 3, the target detection loss of the abnormal object detection module and the loss of the depth estimation module in FIG. 3 are back-propagated; the formula is:
$L_f = BP(L_{det} : \{\Theta_{RPN}, \Theta_{REF}\}) + \alpha \, BP(L_{dep} : \Theta_{DE})$
where $L_{det}$ is the target detection loss, comprising the two-part loss of the region proposal network and the boundary re-optimization; $\Theta_{RPN}$ are the parameters of the region proposal network; $\Theta_{REF}$ are the parameters of the boundary re-optimization (refinement) network; $L_{dep}$ is the depth estimation loss, typically a 2-norm or another loss function between the predicted depth and the true depth (specifically, the distance of each point in the depth-value image to the camera, i.e. the D in RGBD, not the underwater depth); $\Theta_{DE}$ are the parameters of the depth estimation network; and $BP$ denotes back-propagation, i.e. gradients propagated conventionally by the chain rule.
Early in network training, the image enhancement module is trained alone, i.e. using only the adversarial loss $L_g$ and the content loss $L_c$; the formula is:
$L_{enh\_early} = L_g + \lambda L_c$
where $L_{enh\_early}$ is the loss function of the target image enhancement module in the early training stage, $L_g$ is the adversarial loss and $L_c$ is the content loss.
In the middle and later stages the module participates in the joint training; the formula is:
$L_{enh} = L_g + \lambda_1 L_c + \lambda_2 L_f$
where $L_{enh}$ is the loss function of the target image enhancement module in the middle and later stages of training, $L_g$ is the adversarial loss, $L_c$ is the content loss and $L_f$ is the downstream task loss.
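For illustration, a short PyTorch-style sketch of the early-stage loss $L_{enh\_early} = L_g + \lambda L_c$ above, assuming `G` and `D` are the generator and discriminator modules and assuming an L1 norm for the content loss (the text leaves the norm unspecified):

```python
import torch

def early_stage_loss(G, D, x0, x_gt, lam=1.0):
    """L_enh_early = L_g + lam * L_c, following the formulas above.
    lam is an illustrative weight; eps guards the logarithms."""
    eps = 1e-8
    x_enh = G(x0)
    # adversarial loss: L_g = -log(D(x_gt)) - log(1 - D(G(x0)))
    L_g = -torch.log(D(x_gt) + eps).mean() - torch.log(1 - D(x_enh) + eps).mean()
    # content loss: L_c = ||G(x0) - x_gt|| (norm choice assumed to be L1)
    L_c = (x_enh - x_gt).abs().mean()
    return L_g + lam * L_c
```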
And 205, extracting the characteristics of the original underwater image through a generator of the target image enhancement module to generate an enhanced underwater image.
Optionally, the generator comprises an encoder and a decoder; step 205 includes the following steps S21-S28:
S21, extracting features of the original underwater image through the convolution layer corresponding to the encoder of the target image enhancement module, generating a first enhancement feature image and inputting the first enhancement feature image into an activation layer of the encoder;
S22, performing nonlinear conversion processing on the first enhancement feature image through the activation layer of the encoder, generating a second enhancement feature image and inputting the second enhancement feature image into a normalization layer of the encoder;
S23, carrying out normalization processing on the second enhancement feature image through the normalization layer of the encoder to generate a third enhancement feature image, and inputting it into the skip connection and into a convolution layer of the decoder, respectively;
S24, convolving the third enhancement feature image through the convolution layer of the decoder to generate a fourth enhancement feature image and inputting the fourth enhancement feature image into a transposed convolution layer of the decoder;
S25, performing upsampling processing on the fourth enhancement feature image through the transposed convolution layer of the decoder to generate a fifth enhancement feature image, and inputting the fifth enhancement feature image into an activation layer of the decoder;
S26, performing nonlinear conversion processing on the fifth enhancement feature image through the activation layer of the decoder, generating a sixth enhancement feature image and inputting the sixth enhancement feature image into a normalization layer of the decoder;
S27, carrying out normalization processing on the sixth enhanced feature image through the normalization layer of the decoder, generating a seventh enhanced feature image and inputting the seventh enhanced feature image into the skip connection;
S28, splicing the second enhanced feature image and the seventh enhanced feature image through the skip connection to generate an enhanced underwater image.
It should be noted that the encoder consists of a series of convolution layers with stride 1 or greater. The first layer may use, but is not limited to, a 7×7 convolution kernel with an output channel count of, for example, 128 or 256; subsequent layers use 1×1 or 3×3 kernels. With stride 1 the spatial size of the features is unchanged; with stride 2 or higher (denoted $s$) the spatial size is reduced to $1/s$, which is called downsampling. Each time the features are downsampled, the number of channels is enlarged to (without limitation) $s$ or $2s$ times that of the previous features; when no downsampling occurs, the number of feature channels stays unchanged. Each convolution layer is followed by an activation layer, which may employ, but is not limited to, ReLU, LeakyReLU or similar activation functions. Each activation layer is followed by a normalization layer, including but not limited to batch normalization, layer normalization, group normalization or instance normalization. Denoting a convolution layer by $C$, an activation layer by $R$ and a normalization layer by $B$, for a feature $x_i$ ($x_0$ being the original underwater image):
$x_{i+1} = B_{i+1}(R_{i+1}(C_{i+1}(x_i)))$
in a specific implementation, the decoder is composed of a series of convolution layers, a transposed convolution layer and an activation and normalization layer, wherein the step sizes of the convolution layers are all 1, the up-sampling rate of the transposed convolution layer corresponds to the mirror image of the down-sampling layer in the encoder, for example, if the encoder comprises 3 down-sampling convolution layers, the down-sampling rates are respectively 1/4,1/2 and 1/2, the decoder comprises 3 transposed convolution layers, and the up-sampling rates are respectively 2,2 and 4. Each convolutional layer or transposed convolutional layer in the decoder is also followed by an active layer and a normalization layer.
The skip connection takes the intermediate features output by a convolution layer in the encoder to the corresponding scale of the decoder and splices the feature channels (cf. U-Net).
In a specific implementation, the first enhanced feature image refers to the feature image generated after the original underwater image undergoes convolutional feature extraction by the corresponding convolution layer of the encoder; the second enhancement feature image refers to the feature image generated after the first enhancement feature image undergoes nonlinear conversion by the activation layer of the encoder; the third enhanced feature image refers to the feature image generated after the second enhanced feature image is normalized by the normalization layer of the encoder; the fourth enhanced feature image refers to the feature image generated after the third enhanced feature image is convolved by the convolution layer of the decoder; the fifth enhanced feature image refers to the feature image generated after the fourth enhanced feature image is upsampled by the transposed convolution layer of the decoder; the sixth enhanced feature image refers to the feature image generated after the fifth enhanced feature image undergoes nonlinear conversion by the activation layer of the decoder; the seventh enhanced feature image refers to the feature image generated after the sixth enhanced feature image is normalized by the normalization layer of the decoder; and the enhanced underwater image refers to the underwater image generated after the second enhanced feature image and the seventh enhanced feature image are spliced through the skip connection.
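For illustration, a minimal PyTorch-style encoder-decoder sketch following the pattern above (conv, activation, normalization blocks; one downsampling stage mirrored by one transposed convolution; a skip connection that splices feature channels); the two-stage depth and the channel widths are assumptions made for brevity:

```python
import torch
import torch.nn as nn

def cbr(cin, cout, stride=1):
    # conv -> activation -> normalization, i.e. x_{i+1} = B(R(C(x_i)))
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU(), nn.BatchNorm2d(cout))

class Generator(nn.Module):
    """Two-stage encoder-decoder with a single skip connection (sketch)."""
    def __init__(self):
        super().__init__()
        self.enc1 = cbr(3, 64)               # feature later taken by the skip connection
        self.enc2 = cbr(64, 128, stride=2)   # downsampling encoder stage (rate 1/2)
        self.dec_conv = cbr(128, 128)        # decoder convolution, stride 1
        self.up = nn.Sequential(             # transposed conv mirrors the encoder downsampling
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(), nn.BatchNorm2d(64))
        self.out = nn.Conv2d(64 + 64, 3, 3, padding=1)  # after splicing feature channels

    def forward(self, x):
        e1 = self.enc1(x)
        d = self.up(self.dec_conv(self.enc2(e1)))
        d = torch.cat([e1, d], dim=1)        # skip connection: splice encoder and decoder features
        return self.out(d)
```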
And 206, respectively carrying out convolution processing on the original underwater image and the enhanced underwater image, and carrying out feature fusion to generate a feature fusion image.
Optionally, step 206 includes the following steps S31-S32:
s31, respectively carrying out convolution processing on an original underwater image and an enhanced underwater image through a self-adaptive expansion convolution module, respectively generating a first convolution characteristic image and a second convolution characteristic image, and inputting the first convolution characteristic image and the second convolution characteristic image into a dynamic characteristic fusion module;
s32, performing feature fusion on the first convolution feature image and the second convolution feature image through a dynamic feature fusion module to generate a feature fusion image.
In the above steps, features are extracted from the original underwater image and the generated enhanced underwater image using adaptive dilation convolution modules with different dilation rates. Since the size of an object of interest in the image varies greatly with factors such as the size of the object itself and the distance and viewing angle of the camera, a variable receptive field is required to improve the object detection effect. The adaptive dilation convolution module expands the receptive field of the convolution without increasing the number of parameters, enabling the network to capture context information at different scales. However, manually set dilation rates are difficult to adapt to the complex variation found in real, large-scale data.
Thus, in an embodiment of the present invention, a given convolution layer is described as having $C$ convolution kernels, comprising $c_0$ ordinary convolution kernels and $c_1$ variable-dilation-rate convolution kernels. Taking one network training step as an example, the dilation rates of the $c_1$ dilated convolution kernels from the previous step are used for forward propagation, and the loss function is computed.
The dilation rates of part of the $c_1$ dilated convolution kernels are then adjusted several times in heuristic or random directions, the loss function is computed each time, the setting with the minimum loss is recorded as the optimization direction, and a better dilation-rate setting is searched for. As training of the whole framework proceeds, once the loss function falls below a set threshold, the dilation-rate optimization is stopped and the configuration is solidified; subsequent training continues with the solidified configuration, which is also used for target detection at inference time. The adaptive dilation convolution module helps capture more meaningful features in the abnormal target detection task.
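For illustration, one search step of the heuristic dilation-rate adjustment described above might look like the following sketch; `get_dilation_rates`/`set_dilation_rates` are hypothetical hooks on the module, and the batch fields are likewise assumptions, none taken from the patent:

```python
import random

def dilation_rate_search_step(model, loss_fn, batch, candidate_rates, trials=4):
    """Try a few randomly perturbed dilation-rate settings for the variable
    kernels and keep the one with minimum loss (sketch of one search step)."""
    best_rates = model.get_dilation_rates()          # hypothetical accessor
    best_loss = loss_fn(model(batch.images), batch.targets).item()
    for _ in range(trials):
        # adjust part of the variable-dilation kernels in a random direction
        rates = [random.choice(candidate_rates) for _ in best_rates]
        model.set_dilation_rates(rates)              # hypothetical mutator
        loss = loss_fn(model(batch.images), batch.targets).item()
        if loss < best_loss:                         # record the minimum-loss setting
            best_loss, best_rates = loss, rates
    model.set_dilation_rates(best_rates)             # solidify the best configuration so far
    return best_loss
```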
In a specific implementation, when the two adaptive dilation convolution modules respectively perform feature extraction on the original underwater image and the enhanced underwater image, a first convolution feature $F_{orig}$ and a second convolution feature $F_{enh}$ are generated; together with the feature $F_{enc}$ extracted by the encoder portion of the generator in the target image enhancement module, they are input into the dynamic feature fusion module for feature fusion processing, generating a feature fusion image.
Optionally, step S32 includes the following steps S41-S45:
s41, carrying out mean pooling operation on a first convolution feature of a first convolution feature image, a second convolution feature of a second convolution feature image and a third convolution feature through a multipath feature channel of a dynamic feature fusion module, and respectively generating a first pooling feature, a second pooling feature and a third pooling feature;
s42, splicing the first pooling feature, the second pooling feature and the third pooling feature, generating an initial feature vector, inputting the initial feature vector into a second full-connection layer for connection, generating an updated feature vector, and inputting the updated feature vector into a softmax layer;
s43, performing softmax operation on the updated feature vector through a softmax layer to generate an updated vector;
s44, multiplying the numerical value of the update vector by the feature channel number of the feature corresponding to the update vector to generate a plurality of target vectors;
and S45, splicing all the target vectors to generate a feature fusion image.
It should be noted that three features are obtained from the encoder of the generator of the target image enhancement module and the adaptive expansion convolution module as input of the dynamic feature fusion module, and the features are extracted at different levels of the two networks to capture multi-scale information. A series of attention-based gating mechanisms are employed to selectively combine feature maps from three paths. These gating mechanisms trade off the importance of features from each source based on their relevance to the task.
In a specific implementation, the first convolution feature $F_{orig}$ and the second convolution feature $F_{enh}$, generated from the original underwater image and the enhanced underwater image respectively, and the feature $F_{enc}$ extracted by the encoder portion of the generator in the target image enhancement module are input into the left, middle and right feature channels of the dynamic feature fusion module. Let the channel counts of the three feature channels be $C_{orig}$, $C_{enh}$ and $C_{enc}$. Average pooling over the spatial dimensions is applied to $F_{orig}$, $F_{enh}$ and $F_{enc}$ to generate the first, second and third pooled features, reducing the spatial size to 1×1; these are then spliced into a vector of length $C_{orig}+C_{enh}+C_{enc}$ (the initial feature vector), which is input into one or more fully-connected layers to obtain a vector of length $n$ (the updated feature vector). A softmax operation on this vector yields the vector $A$ (the update vector), which represents $n$ gating mechanisms, where $n = 1 + n_D$, $n_D$ being the number of distinct dilation rates in the adaptive dilation convolution module (for example, if the module comprises 32 kernels with dilation rate 1 (equivalent to ordinary kernels), 32 with rate 2, 16 with rate 4 and 16 with rate 8, then $n_D = 4$, i.e. $\mathrm{num}(\{1,2,4,8\})$), and the 1 corresponds to $F_{enc}$. After the vector $A$ is obtained, its values are multiplied into the corresponding feature channels to generate a plurality of target vectors, which are spliced along the channel dimension, namely:
$F_{merge} = \mathrm{concatenate}(\{A[0]\cdot F_{enc},\ A[DR{=}1]\cdot F_{orig}[DR{=}1],\ A[DR{=}1]\cdot F_{enh}[DR{=}1],\ A[DR{=}2]\cdot F_{orig}[DR{=}2],\ A[DR{=}2]\cdot F_{enh}[DR{=}2],\ \dots\})$
where $F_{merge}$ is the fused feature, $DR$ is the dilation rate, $[\,]$ denotes an indexing operation by the condition inside the brackets, and concatenate denotes a splicing operation over the feature channels.
In a specific implementation, the output of the dynamic feature fusion module is a group of fused features containing the information of the enhanced image and the original image as well as the information of the encoder of the target image enhancement module; with the gating mechanism added, features from different receptive fields are fused, so that the subsequent network obtains adaptively superior fused features with adaptive attention to context information at different scales.
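For illustration, a simplified PyTorch-style sketch of the gating described above, reduced to one gate per path ($n = 3$) rather than one gate per dilation rate; the single fully-connected layer and this simplification are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFeatureFusion(nn.Module):
    """Gate F_enc, F_orig and F_enh with softmax weights computed from their
    spatially average-pooled, spliced channel descriptors (sketch)."""
    def __init__(self, c_enc, c_orig, c_enh, n=3):
        super().__init__()
        self.fc = nn.Linear(c_enc + c_orig + c_enh, n)  # one or more FC layers

    def forward(self, f_enc, f_orig, f_enh):
        feats = (f_enc, f_orig, f_enh)
        # average pooling over the spatial dimensions, then splice to one vector
        pooled = torch.cat([F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feats], dim=1)
        a = F.softmax(self.fc(pooled), dim=1)            # gating vector A, length n
        gated = [a[:, i, None, None, None] * f for i, f in enumerate(feats)]
        return torch.cat(gated, dim=1)                   # splice along the channel dimension
```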
And 207, detecting an abnormal target of the original underwater image in an object boundary frame of the feature fusion image by adopting a preset abnormal object detection module, and generating a preliminary detection result.
Optionally, step 207 includes the following steps S51-S512:
s51, inputting the feature fusion image into a preset area suggestion network to generate a plurality of object boundary boxes;
S52, predicting the objectivity of the feature fusion image in the object boundary frame through the regional suggestion network, and grading the objectivity of the feature fusion image to generate an objectivity grade;
s53, judging whether the object score is larger than or equal to a score threshold value;
s54, if yes, extracting an object boundary frame corresponding to the object property score and object boundary frame coordinates corresponding to the object boundary frame;
s55, optimizing an object boundary frame corresponding to the object property score by adopting a re-optimization module to generate an optimized object boundary frame;
s56, detecting the feature fusion image by adopting an optimized object boundary box to generate a region detection result;
s57, inputting the original underwater image into an abnormal object detection module;
s58, extracting features of an original underwater image through a feature extractor of the abnormal object detection module to generate a first abnormal object feature image;
s59, splicing the first abnormal object feature image and the feature fusion image to generate a second abnormal object feature image and inputting the second abnormal object feature image into a convolution network;
s510, extracting the features of the second abnormal object feature image through a convolution network to generate an attention detection result;
s511, multiplying the attention detection result by a confidence coefficient channel corresponding to the region detection result to generate a target abnormal object feature image;
S512, detecting the abnormal object of the target abnormal object feature image according to the optimized object boundary box on the target abnormal object feature image, and generating a preliminary detection result.
The preset region suggestion network (region proposal network) generates region suggestions by sliding a small network over the feature fusion image from the dynamic feature fusion module, predicting an objectness score and bounding box coordinates in each region. The re-optimization module then refines the bounding box coordinates of the region suggestions using the fused features from the dynamic feature fusion module. Improving the localization of detected objects helps the network focus on the most relevant areas in the abnormal object detection task.
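A minimal sketch of such a region proposal head, under the assumption of a standard anchor-based design (the anchor count and channel widths are illustrative and not taken from the patent):

import torch
import torch.nn as nn

class RegionProposalHead(nn.Module):
    # Slides a small conv network over the fused feature map and predicts,
    # at every location, an objectness score and box coordinates per anchor.
    def __init__(self, in_ch, num_anchors=9, mid_ch=256):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, 3, padding=1)  # sliding window
        self.obj = nn.Conv2d(mid_ch, num_anchors, 1)        # objectness scores
        self.box = nn.Conv2d(mid_ch, num_anchors * 4, 1)    # box coordinates

    def forward(self, f_merge, score_thresh=0.5):
        h = torch.relu(self.conv(f_merge))
        scores = torch.sigmoid(self.obj(h))   # objectness in [0, 1]
        boxes = self.box(h)
        keep = scores >= score_thresh         # score-threshold filtering (S53/S54)
        return scores, boxes, keep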
In a specific implementation, the feature fusion image is input into the region suggestion network, which predicts abnormal targets on the fused features and divides object boundary frames around them; the network then scores the objectness of the feature fusion image within each object boundary frame, generating an objectness score. When the objectness score is larger than or equal to a preset score threshold value, the object boundary frame corresponding to that score is extracted.
In a specific implementation, the re-optimization module can be adopted to refine the object boundary frame, generating an optimized object boundary frame, which is then applied to detect the feature fusion image and generate a region detection result.
In a specific implementation, a group of original underwater images containing common underwater abnormal targets is set as I_tplt, with image size h×w and image count N. I_tplt, of size N×h×w, is input into a feature extractor E_t0 in the form of a convolutional neural network, whose downsampling ratio is controlled so that its output spatial size matches that of F_merge, yielding F_tplt (the first abnormal object feature image). F_merge (the feature fusion image) and F_tplt are cascaded (i.e. spliced along the feature channel) to obtain the second abnormal object feature image, which is then input into another convolutional neural network E_t1 to obtain an attention result A_T in heat-map form. A_T is multiplied into the confidence channel of the region detection result to obtain the target abnormal object feature image. (The output of the region suggestion and re-optimization module, like existing object detection outputs, is of size h′×w′×(1+4+C), where h′×w′ is the size of the original underwater image after downsampling through the modules, 1 is the confidence, 4 is the optimized object boundary box, and C is the object class classification result; A_T is multiplied into the "1" channel.)
In a specific implementation, abnormal object detection is carried out on the information inside the boundary frames on the target abnormal object feature image, which is obtained by multiplying A_T into the confidence channel of the region detection result of the region suggestion and re-optimization module, thereby obtaining the preliminary detection result.
In particular, the introduction of common template attention can further improve the accuracy and generalization capability of the network in terms of abnormal object detection.
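A hedged sketch of this common-template attention, with illustrative network sizes (E_t0 and E_t1 below stand in for the unspecified extractors, and template features are mean-pooled over the N template images):

import torch
import torch.nn as nn

class TemplateAttention(nn.Module):
    def __init__(self, tplt_ch, merge_ch):
        super().__init__()
        self.e_t0 = nn.Sequential(  # extractor whose downsampling is assumed to match F_merge (x4)
            nn.Conv2d(3, tplt_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(tplt_ch, tplt_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.e_t1 = nn.Sequential(  # fused features -> heat-map attention A_T
            nn.Conv2d(tplt_ch + merge_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid(),
        )

    def forward(self, templates, f_merge, detections):
        # templates: (N, 3, h, w); detections: (B, 1 + 4 + C, h', w')
        f_tplt = self.e_t0(templates).mean(0, keepdim=True)   # F_tplt, pooled over N
        f_tplt = f_tplt.expand(f_merge.size(0), -1, -1, -1)
        a_t = self.e_t1(torch.cat([f_tplt, f_merge], dim=1))  # A_T
        out = detections.clone()
        out[:, :1] = out[:, :1] * a_t   # multiply A_T into the confidence ("1") channel
        return out

The spatial sizes of f_tplt, f_merge and the detection output are assumed to match; in practice the stride of E_t0 is tuned so that they do, as described above.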
And step 208, performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, and generating an abnormal target detection result.
Optionally, step 208 includes the following steps S61-S66:
S61, detecting depth information of the abnormal target from the feature fusion image through a regression method;

S62, constructing a preset spatial distribution model based on the depth information of the abnormal target and the image pixel data in the optimized object boundary frames on the feature fusion image;

S63, analyzing a plurality of spatial distributions of abnormal targets in the preliminary detection result;

S64, judging whether each spatial distribution is larger than or equal to a distribution threshold value of the preset spatial distribution model;

S65, if not, determining the spatial distribution as a false positive and removing it;

S66, if yes, determining the spatial distribution as that of an abnormal target, and combining the plurality of spatial distributions to generate an abnormal target detection result.
In the above steps, the depth of the detected abnormal object is estimated using the fused features from the dynamic feature fusion module. Depth estimation adopts a regression-based method: a neural network similar to a fully convolutional network (FCN) outputs the depth at each point, and the object depth is obtained by sampling within the predicted abnormal target bounding box. The estimated depth information provides additional context to the rejection-sampling-based false positive suppression module, helping it better distinguish true positives from false positives. At the same time, the depth estimation can provide a loss function for optimizing the network parameters of the preceding modules. In addition, the depth of the detected abnormal target and of the surrounding underwater environment is itself information of interest and can give on-site users more situational awareness.
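A minimal sketch of this regression-based depth branch under those assumptions (layer widths are illustrative; the box read-out simply averages the predicted depth inside the box):

import torch
import torch.nn as nn

class DepthHead(nn.Module):
    # FCN-style head: dense per-pixel depth from the fused features.
    def __init__(self, in_ch):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, f_merge):
        return self.head(f_merge)

def box_depth(depth_map, box):
    # box = (x0, y0, x1, y1) in feature-map coordinates; the object depth is
    # sampled as the mean predicted depth inside the predicted bounding box.
    x0, y0, x1, y1 = box
    return depth_map[..., y0:y1, x0:x1].mean()

Training this head with, for example, an L1 loss against reference depth would also provide the auxiliary loss mentioned above for optimizing the preceding modules.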
In a specific implementation, the preset spatial distribution is set as follows: the RGB and depth data inside abnormal target instances are cropped by the optimized object boundary boxes, and the preset spatial distribution model is built in a normalized RGBD space to obtain a probability distribution (i.e. the spatial distribution).
A bounding box of a predicted abnormal target (i.e. a spatial distribution) in the preliminary detection result is then analyzed: the RGB and depth data inside the box are cropped and, after normalization, the probability of the box is sampled under the distribution; when the probability is smaller than the set threshold (the distribution threshold), the box is recognized as a false positive. The spatial distributions corresponding to false positives are removed, the spatial distribution of the abnormal target is thereby determined, and the abnormal target detection result is generated by combining the remaining spatial distributions.
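Since the form of the probability distribution is not fixed above, the sketch below assumes a single multivariate Gaussian fitted in the normalized RGBD space; detections whose density falls below the distribution threshold are rejected as false positives:

import numpy as np

def fit_rgbd_model(samples):
    # samples: (M, 4) normalized R, G, B, depth values cropped from
    # optimized bounding boxes of confirmed abnormal targets.
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(4)  # regularized
    return mean, cov

def rgbd_density(x, mean, cov):
    # Multivariate Gaussian density in the 4-D RGBD space.
    d = x - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** 4 * np.linalg.det(cov))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def suppress_false_positives(detections, mean, cov, thresh):
    # Each detection carries its cropped, normalized RGBD pixels; keep it
    # only if its mean RGBD vector is probable under the fitted model.
    return [det for det in detections
            if rgbd_density(det["rgbd"].mean(axis=0), mean, cov) >= thresh]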
In summary, the method comprises: responding to a received underwater image abnormal target identification request and acquiring the original underwater image corresponding to the request; performing enhancement processing on the original underwater image to generate an enhanced underwater image; carrying out convolution processing on the original underwater image and the enhanced underwater image respectively and carrying out feature fusion to generate a feature fusion image; carrying out abnormal target detection on the original underwater image within the object boundary frames of the feature fusion image by adopting a preset abnormal object detection module to generate a preliminary detection result; and performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image to generate an abnormal target detection result. The method thereby addresses the technical problems that the prior art may lose useful information during enhancement, lacks context information, has low detection precision, and has difficulty accurately detecting abnormal targets in underwater images.
The invention integrates image enhancement and object detection, fuses the original image, the enhanced image and the encoder characteristics, can adapt to the characteristics of different scales, fully utilizes the context information and the depth information of the predicted object, reduces the influence of noise and scattering on the detection result, reduces false positive detection and improves the reliability of the detection result.
Referring to fig. 4, fig. 4 is a block diagram illustrating a system for identifying an abnormal object of an underwater image according to a third embodiment of the present invention.
The invention provides an underwater image abnormal target recognition system, which comprises:
the original underwater image module 401 is configured to respond to a received underwater image abnormal target identification request, and obtain an original underwater image corresponding to the underwater image abnormal target identification request;
an enhanced underwater image module 402, configured to perform enhancement processing on an original underwater image, and generate an enhanced underwater image;
the feature fusion image module 403 is configured to perform convolution processing on the original underwater image and the enhanced underwater image, and perform feature fusion to generate a feature fusion image;
the preliminary detection result module 404 is configured to perform abnormal target detection on the original underwater image in the object bounding box of the feature fusion image by using a preset abnormal object detection module, so as to generate a preliminary detection result;
The abnormal target detection result module 405 is configured to perform false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on depth information of an abnormal target in the feature fusion image, and generate an abnormal target detection result.
Optionally, the enhanced underwater image module 402 includes:
the initial image enhancement sub-module is used for inputting the original underwater image into the preset initial image enhancement module;
the updating underwater image sub-module is used for extracting the characteristics of the original underwater image through the discriminator of the initial image enhancement module to generate an updating underwater image;
the target image enhancement sub-module is used for training the initial image enhancement module by adopting the updated underwater image to generate a target image enhancement module;
and the enhanced underwater image sub-module is used for extracting the characteristics of the original underwater image through a generator of the target image enhancing module to generate an enhanced underwater image.
Optionally, the discriminator comprises a plurality of convolution layers, a downsampling layer, an activation layer, a normalization layer, a pooling layer and a first fully-connected layer; the updating underwater image sub-module includes:
the downsampling layer sub-module is used for extracting the characteristics of the original underwater image through a plurality of convolution layers of the discriminator of the initial image enhancement module, generating a first characteristic image and inputting the first characteristic image into the downsampling layer;
The activation submodule is used for extracting the features of the first feature image through the downsampling layer, generating a second feature image and inputting the second feature image into the activation layer;
the input normalization layer sub-module is used for carrying out nonlinear conversion processing on the second characteristic image through the activation layer, generating a third characteristic image and inputting the third characteristic image into the normalization layer;
the pooling layer sub-module is used for carrying out normalization processing on the third characteristic image through the normalization layer, generating a fourth characteristic image and inputting the fourth characteristic image into the pooling layer;
the first full-connection layer sub-module is used for carrying out pooling treatment on the fourth characteristic image through the pooling layer, generating a fifth characteristic image and inputting the fifth characteristic image into the first full-connection layer;
and the generation updating underwater image sub-module is used for connecting the features of the fifth feature image through the first full-connection layer to generate an updating underwater image.
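For illustration only, the discriminator pipeline above maps onto a PyTorch stack like the following; the channel counts and the LeakyReLU slope are assumptions, not taken from the patent:

import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),             # convolution layers
    nn.Conv2d(32, 64, 3, padding=1),
    nn.Conv2d(64, 64, 3, stride=2, padding=1),  # downsampling layer
    nn.LeakyReLU(0.2),                          # activation layer
    nn.BatchNorm2d(64),                         # normalization layer
    nn.AdaptiveAvgPool2d(1),                    # pooling layer
    nn.Flatten(),
    nn.Linear(64, 1),                           # first fully connected layer
)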
Optionally, the generator comprises an encoder and a decoder; the enhanced underwater image submodule includes:
the first encoder submodule is used for extracting the characteristics of the original underwater image through the convolution layer corresponding to the encoder of the target image enhancement module, generating a first enhancement characteristic image and inputting the first enhancement characteristic image into the activation layer of the encoder;
the second encoder submodule is used for carrying out nonlinear conversion processing on the first enhancement feature image through an activation layer of the encoder, generating a second enhancement feature image and inputting the second enhancement feature image into a normalization layer of the encoder;
The convolution layer sub-module is used for carrying out normalization processing on the second enhancement feature image through a normalization layer of the encoder to generate a third enhancement feature image and respectively inputting the third enhancement feature image into a skip layer connection and a convolution layer of the decoder;
the device convolution layer submodule is used for convoluting the third enhancement characteristic image through a convolution layer of the decoder, generating a fourth enhancement characteristic image and inputting the fourth enhancement characteristic image into a transposed convolution layer of the decoder;
the activation layer sub-module is used for carrying out up-sampling processing on the fourth enhancement feature image through a transposed convolution layer of the decoder, generating a fifth enhancement feature image and inputting the fifth enhancement feature image into an activation layer of the decoder;
the normalization layer sub-module is used for carrying out nonlinear conversion processing on the fifth enhancement feature image through an activation layer of the decoder, generating a sixth enhancement feature image and inputting the sixth enhancement feature image into a normalization layer of the decoder;
the layer jump connection sub-module is used for carrying out normalization processing on the sixth enhanced feature image through a normalization layer of the decoder, generating a seventh enhanced feature image and inputting the seventh enhanced feature image into layer jump connection;
the enhanced underwater image generation sub-module is used for splicing the second enhanced feature image and the seventh enhanced feature image through layer jump connection to generate an enhanced underwater image.
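A minimal sketch of the generator's encoder-decoder path with one layer-jump (skip) connection, in the order described above (the third feature feeds both the skip path and the decoder, and the splice combines the second and seventh features); depth and channel counts are illustrative assumptions:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_conv = nn.Conv2d(3, 64, 3, padding=1)    # encoder convolution
        self.enc_act = nn.ReLU(inplace=True)              # encoder activation
        self.enc_norm = nn.BatchNorm2d(64)                # encoder normalization
        self.dec_conv = nn.Conv2d(64, 128, 3, stride=2, padding=1)         # decoder convolution
        self.dec_up = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)  # transposed conv (upsampling)
        self.dec_act = nn.ReLU(inplace=True)              # decoder activation
        self.dec_norm = nn.BatchNorm2d(64)                # decoder normalization
        self.out = nn.Conv2d(128, 3, 3, padding=1)        # after the skip splice

    def forward(self, x):
        e2 = self.enc_act(self.enc_conv(x))       # second enhanced feature image
        e3 = self.enc_norm(e2)                    # third: to skip path and decoder
        d7 = self.dec_norm(self.dec_act(self.dec_up(self.dec_conv(e3))))  # fourth..seventh
        return self.out(torch.cat([e2, d7], dim=1))  # layer-jump splice -> enhanced image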
Optionally, the feature fusion image module 403 includes:
the dynamic fusion sub-module is used for respectively carrying out convolution processing on the original underwater image and the enhanced underwater image through the self-adaptive expansion convolution module, respectively generating a first convolution characteristic image and a second convolution characteristic image, and inputting the first convolution characteristic image and the second convolution characteristic image into the dynamic characteristic fusion module;
and the feature fusion image sub-module is used for carrying out feature fusion on the first convolution feature image and the second convolution feature image through the dynamic feature fusion module to generate a feature fusion image.
Optionally, the feature fusion image submodule includes:
the third pooling feature sub-module is used for respectively carrying out average pooling operation on the first convolution feature of the first convolution feature image, the second convolution feature of the second convolution feature image and the third convolution feature through the multipath feature channels of the dynamic feature fusion module, and respectively generating a first pooling feature, a second pooling feature and a third pooling feature;
the softmax layer sub-module is used for splicing the first pooling feature, the second pooling feature and the third pooling feature, generating an initial feature vector and inputting the initial feature vector into the second full-connection layer for connection, generating an updated feature vector and inputting the updated feature vector into the softmax layer;
The update vector submodule is used for performing softmax operation on the update feature vector through the softmax layer to generate an update vector;
the target vector sub-module is used for multiplying each numerical value of the update vector into the feature channels of the feature corresponding to that value, generating a plurality of target vectors;
and the feature fusion image sub-module is used for splicing all the target vectors to generate a feature fusion image.
Optionally, the preliminary detection result module 404 includes:
the object boundary frame sub-module is used for inputting the feature fusion image into a preset area suggestion network to generate a plurality of object boundary frames;
the objectness score sub-module is used for predicting the objectness of the feature fusion image in the object boundary frame through the region suggestion network, and scoring the objectness of the feature fusion image to generate an objectness score;

the score threshold sub-module is used for judging whether the objectness score is larger than or equal to a score threshold value;

the object boundary frame coordinate sub-module is used for, if yes, extracting the object boundary frame corresponding to the objectness score and the object boundary frame coordinates corresponding to the object boundary frame;

the optimized object boundary frame sub-module is used for optimizing the object boundary frame corresponding to the objectness score by adopting the re-optimization module to generate an optimized object boundary frame;
The region detection result submodule is used for detecting the feature fusion image by adopting the optimized object boundary frame to generate a region detection result;
the abnormal object detection module sub-module is used for inputting the original underwater image into the abnormal object detection module;
the first abnormal object feature image sub-module is used for extracting features of the original underwater image through a feature extractor of the abnormal object detection module to generate a first abnormal object feature image;
the convolution network sub-module is used for splicing the characteristic image of the first abnormal object with the characteristic fusion image, generating a characteristic image of the second abnormal object and inputting the characteristic image of the second abnormal object into the convolution network;
the attention detection result submodule is used for extracting the characteristics of the second abnormal object characteristic image through the convolution network and generating an attention detection result;
the target abnormal object feature image sub-module is used for multiplying the attention detection result by a confidence coefficient channel corresponding to the region detection result to generate a target abnormal object feature image;
the preliminary detection result submodule is used for detecting the abnormal object of the target abnormal object feature image according to the optimized object boundary box on the target abnormal object feature image, and a preliminary detection result is generated.
Optionally, the abnormal target detection result module 405 includes:
the depth information sub-module is used for detecting the depth information of the abnormal target from the feature fusion image through a regression method;
the construction submodule is used for constructing a preset spatial distribution model based on depth information of an abnormal target and image pixel data in an optimized object boundary frame on the feature fusion image;
an analysis sub-module for analyzing a plurality of spatial distributions of the abnormal target in the preliminary detection result;
the distribution threshold submodule is used for judging whether the spatial distribution is larger than or equal to a distribution threshold of a preset spatial distribution model;
the space distribution removing submodule is used for determining the space distribution as false positive and removing the space distribution if not;
and the abnormal target detection result submodule is used for determining the spatial distribution of the abnormal target if yes, and generating an abnormal target detection result by combining a plurality of spatial distributions.
The fourth embodiment of the invention also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program; the computer program, when executed by the processor, causes the processor to perform the steps of the underwater image abnormal target recognition method of any of the embodiments described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The method for identifying the abnormal target of the underwater image is characterized by comprising the following steps of:
responding to a received underwater image abnormal target identification request, and acquiring an original underwater image corresponding to the underwater image abnormal target identification request;
performing enhancement processing on the original underwater image to generate an enhanced underwater image;
carrying out convolution processing on the original underwater image and the enhanced underwater image respectively, and carrying out feature fusion to generate a feature fusion image;
performing abnormal target detection on the original underwater image in an object boundary frame of the feature fusion image by adopting a preset abnormal object detection module to generate a preliminary detection result;
and performing false positive suppression processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image, and generating an abnormal target detection result.
2. The method for identifying an abnormal object in an underwater image according to claim 1, wherein said step of performing enhancement processing on said original underwater image to generate an enhanced underwater image comprises:
inputting the original underwater image into a preset initial image enhancement module;
extracting the characteristics of the original underwater image through a discriminator of the initial image enhancement module to generate an updated underwater image;
training the initial image enhancement module by adopting the updated underwater image to generate a target image enhancement module;
and extracting the characteristics of the original underwater image through a generator of the target image enhancement module to generate an enhanced underwater image.
3. The method of claim 2, wherein the discriminator comprises a plurality of convolution layers, a downsampling layer, an activation layer, a normalization layer, a pooling layer and a first full-connection layer; the step of extracting the characteristics of the original underwater image by the discriminator of the initial image enhancement module to generate an updated underwater image comprises the following steps:
Extracting features of the original underwater image through a plurality of convolution layers of a discriminator of the initial image enhancement module, generating a first feature image and inputting the first feature image into the downsampling layer;
extracting the features of the first feature image through the downsampling layer, generating a second feature image and inputting the second feature image into the activation layer;
performing nonlinear conversion processing on the second characteristic image through the activation layer to generate a third characteristic image and inputting the third characteristic image into the normalization layer;
normalizing the third characteristic image through the normalization layer to generate a fourth characteristic image and inputting the fourth characteristic image into the pooling layer;
carrying out pooling treatment on the fourth characteristic image through the pooling layer to generate a fifth characteristic image and inputting the fifth characteristic image into the first full-connection layer;
and connecting the features of the fifth feature image through the first full connection layer to generate an updated underwater image.
4. The underwater image anomaly target recognition method of claim 2, wherein the generator comprises an encoder and a decoder; the step of extracting the characteristics of the original underwater image by the generator of the target image enhancement module to generate an enhanced underwater image comprises the following steps:
Extracting the characteristics of the original underwater image through a convolution layer corresponding to an encoder of the target image enhancement module, generating a first enhancement characteristic image and inputting the first enhancement characteristic image into an activation layer of the encoder;
performing nonlinear conversion processing on the first enhancement feature image through an activation layer of the encoder, generating a second enhancement feature image and inputting the second enhancement feature image into a normalization layer of the encoder;
normalizing the second enhancement feature image through a normalization layer of the encoder to generate a third enhancement feature image and respectively inputting the third enhancement feature image into a skip layer connection and a convolution layer of the decoder;
convolving the third enhanced feature image by a convolution layer of the decoder to generate a fourth enhanced feature image and inputting the fourth enhanced feature image into a transpose convolution layer of the decoder;
upsampling the fourth enhanced feature image through a transposed convolutional layer of the decoder to generate a fifth enhanced feature image and inputting the fifth enhanced feature image into an active layer of the decoder;
performing nonlinear conversion processing on the fifth enhanced feature image through an activation layer of the decoder to generate a sixth enhanced feature image and inputting the sixth enhanced feature image into a normalization layer of the decoder;
normalizing the sixth enhanced feature image through a normalization layer of the decoder to generate a seventh enhanced feature image and inputting the seventh enhanced feature image into the layer jump connection;
And splicing the second enhanced feature image and the seventh enhanced feature image through the layer jump connection to generate an enhanced underwater image.
5. The method for identifying an abnormal object of an underwater image according to claim 1, wherein the step of performing convolution processing on the original underwater image and the enhanced underwater image, respectively, and performing feature fusion, and generating a feature fusion image, comprises:
carrying out convolution processing on the original underwater image and the enhanced underwater image respectively through a self-adaptive expansion convolution module, generating a first convolution characteristic image and a second convolution characteristic image respectively, and inputting the first convolution characteristic image and the second convolution characteristic image into a dynamic characteristic fusion module;
and carrying out feature fusion on the first convolution feature image and the second convolution feature image through the dynamic feature fusion module to generate a feature fusion image.
6. The method for identifying an abnormal target of an underwater image according to claim 5, wherein the step of performing feature fusion on the first convolution feature image and the second convolution feature image by the dynamic feature fusion module to generate a feature fusion image comprises:
respectively carrying out mean pooling operation on the first convolution feature of the first convolution feature image, the second convolution feature of the second convolution feature image and the third convolution feature through a multipath feature channel of the dynamic feature fusion module, and respectively generating a first pooling feature, a second pooling feature and a third pooling feature;
Splicing the first pooling feature, the second pooling feature and the third pooling feature, generating an initial feature vector, inputting the initial feature vector into a second full-connection layer for connection, generating an updated feature vector, and inputting the updated feature vector into a softmax layer;
performing softmax operation on the updated feature vector through the softmax layer to generate an updated vector;
multiplying each numerical value of the update vector into the feature channels of the feature corresponding to that value to generate a plurality of target vectors;
and splicing all the target vectors to generate a feature fusion image.
7. The method for identifying an abnormal target of an underwater image according to claim 1, wherein the step of detecting the abnormal target of the original underwater image in the object bounding box of the feature fusion image by using a preset abnormal object detection module, and generating a preliminary detection result, comprises:
inputting the feature fusion image into a preset area suggestion network to generate a plurality of object boundary boxes;
predicting the objectness of the feature fusion image in the object boundary box through the area suggestion network, and scoring the objectness of the feature fusion image to generate an objectness score;

judging whether the objectness score is larger than or equal to a score threshold value;

if yes, extracting an object boundary frame corresponding to the objectness score and object boundary frame coordinates corresponding to the object boundary frame;

optimizing the object boundary frame corresponding to the objectness score by adopting a re-optimization module to generate an optimized object boundary frame;
detecting the feature fusion image by adopting the optimized object boundary box to generate a region detection result;
inputting the original underwater image into the abnormal object detection module;
extracting the characteristics of the original underwater image through a characteristic extractor of the abnormal object detection module to generate a first abnormal object characteristic image;
splicing the first abnormal object characteristic image and the characteristic fusion image to generate a second abnormal object characteristic image and inputting the second abnormal object characteristic image into a convolution network;
extracting the characteristics of the second abnormal object characteristic image through the convolution network to generate an attention detection result;
multiplying the attention detection result by a confidence coefficient channel corresponding to the area detection result to generate a target abnormal object characteristic image;
and detecting the abnormal object of the target abnormal object characteristic image according to the optimized object boundary box on the target abnormal object characteristic image, and generating a preliminary detection result.
8. The method for identifying an abnormal target of an underwater image according to claim 7, wherein the step of performing false positive suppression processing on the preliminary detection result by a preset spatial distribution model to generate an abnormal target detection result comprises:
detecting depth information of an abnormal target from the feature fusion image by a regression method;
constructing a preset spatial distribution model based on the depth information of the abnormal target and image pixel data in an optimized object boundary frame on the feature fusion image;
analyzing a plurality of spatial distributions of abnormal targets in the preliminary detection results;
judging whether the spatial distribution is larger than or equal to a distribution threshold value of the preset spatial distribution model;
if not, determining the spatial distribution as false positive, and removing the spatial distribution;
if yes, determining the spatial distribution of the abnormal target, and combining a plurality of spatial distributions to generate an abnormal target detection result.
9. An underwater image anomaly target recognition system, comprising:
the original underwater image module is used for responding to the received underwater image abnormal target identification request and acquiring an original underwater image corresponding to the underwater image abnormal target identification request;
The enhanced underwater image module is used for enhancing the original underwater image to generate an enhanced underwater image;
the characteristic fusion image module is used for respectively carrying out convolution processing on the original underwater image and the enhanced underwater image and carrying out characteristic fusion to generate a characteristic fusion image;
the primary detection result module is used for detecting an abnormal target of the original underwater image in the object boundary frame of the characteristic fusion image by adopting a preset abnormal object detection module, so as to generate a primary detection result;
and the abnormal target detection result module is used for carrying out false positive inhibition processing on the preliminary detection result through a preset spatial distribution model based on the depth information of the abnormal target in the feature fusion image to generate an abnormal target detection result.
10. An electronic device comprising a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method for identifying an abnormal object of an underwater image as claimed in any of claims 1 to 8.
CN202310904099.7A 2023-07-21 2023-07-21 Underwater image abnormal target identification method, system and equipment Pending CN116895012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310904099.7A CN116895012A (en) 2023-07-21 2023-07-21 Underwater image abnormal target identification method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310904099.7A CN116895012A (en) 2023-07-21 2023-07-21 Underwater image abnormal target identification method, system and equipment

Publications (1)

Publication Number Publication Date
CN116895012A (en) 2023-10-17

Family

ID=88311955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310904099.7A Pending CN116895012A (en) 2023-07-21 2023-07-21 Underwater image abnormal target identification method, system and equipment

Country Status (1)

Country Link
CN (1) CN116895012A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611983A (en) * 2023-11-17 2024-02-27 河南大学 Underwater target detection method and system based on hidden communication technology and deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination