CN117809168B - Method and device for detecting inherent attribute characteristics based on underwater target


Info

Publication number
CN117809168B
CN117809168B, CN202410025836.0A, CN202410025836A
Authority
CN
China
Prior art keywords
optical image
image
optical
inherent
underwater
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410025836.0A
Other languages
Chinese (zh)
Other versions
CN117809168A (en)
Inventor
张晓伟
王帅
肖龙斌
董文涛
高硕
崔伟
林媛媛
张士太
张雪鑫
孔紫宁
陈桐
栾新瑞
詹争光
李震宇
董玉才
肖涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 15 Research Institute
Original Assignee
CETC 15 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 15 Research Institute filed Critical CETC 15 Research Institute
Priority to CN202410025836.0A priority Critical patent/CN117809168B/en
Publication of CN117809168A publication Critical patent/CN117809168A/en
Application granted granted Critical
Publication of CN117809168B publication Critical patent/CN117809168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/40 Extraction of image or video features
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G06V 10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Recognition or understanding using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/05 Underwater scenes
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting inherent attribute characteristics based on an underwater target, wherein the method comprises the following steps: preprocessing images relating to the underwater target; constructing a training sample set; obtaining an acoustic feature map and an optical feature map relating to the underwater target; inputting the fused optical image features into a multi-scale underwater target recognition model to be trained, and obtaining a first-stage multi-scale underwater target recognition model after several rounds of training; fusing the inherent spectral features with channel attention added and the inherent optical features with spatial attention added to obtain inherent fusion features; training the first-stage multi-scale underwater target recognition model based on the inherent fusion features to obtain a trained multi-scale underwater target recognition model; and inputting an underwater image to be detected into the trained multi-scale underwater target recognition model to obtain detection and recognition results. The method improves the accuracy and stability of underwater target detection.

Description

Method and device for detecting inherent attribute characteristics based on underwater target
Technical Field
The invention relates to the field of image processing, in particular to a method and a device for detecting inherent attribute characteristics based on an underwater target.
Background
In underwater target identification, accurate and effective recognition of underwater targets is of great significance in fields such as marine survey and submarine resource development. The key to underwater target identification is the efficient detection and classification of targets in an underwater environment. For example, the prior art discloses an underwater target recognition method based on machine learning, a target recognition system oriented to small-sample underwater images, an underwater target recognition method based on YOLOv, an underwater target recognition method based on multi-feature fusion, an underwater target recognition method based on multi-modal fusion, and the like.
The scheme of the underwater target recognition technology can be mainly divided into two major categories of a deep learning algorithm and a traditional vision algorithm:
(1) Deep learning algorithms: deep learning algorithms have achieved significant results in underwater target recognition. They mainly comprise convolutional neural networks (CNN), recurrent neural networks (RNN) and the like, together with target detection algorithms based on Faster R-CNN and YOLO. CNNs are suited to image recognition and classification, and can learn the characteristics of underwater targets and accurately classify and recognize them. RNNs are suited to processing time-series data of underwater targets, such as sonar signals.
(2) Traditional vision algorithms: traditional vision algorithms are based primarily on computer vision and image processing techniques. Common methods include edge detection, texture feature extraction, template matching and the like. Edge detection locates and identifies underwater objects by detecting edge information in the image, for example with the Canny edge detection algorithm. Texture feature extraction algorithms identify objects by extracting feature information such as the texture, shape and color of the underwater object and describing it with texture descriptors. Template matching locates and identifies targets by matching a predefined target template against the underwater image.
However, the current algorithm mainly has the following disadvantages:
(1) Deep learning detection and classification algorithms are sensitive to data quality and noise, require a large amount of annotated data for training, have long training times and high computational resource requirements, and their models are complex and difficult to interpret and debug. Their generalization across different detection platforms is also weak.
(2) Edge detection algorithms are sensitive to illumination changes and noise, and for underwater environments with large illumination changes and noise interference, the accuracy of edge detection may be reduced. And the edge detection algorithm can only provide boundary information of the target, but can not provide characteristic information such as specific shape, texture and the like of the target.
(3) Texture feature extraction algorithms typically require manual design of feature extraction methods, which may lead to subjectivity of feature extraction, and different algorithms and parameter choices may lead to different results.
(4) The template matching algorithm is sensitive to deformation and occlusion of the target; in complex underwater environments, target deformation and occlusion can lead to inaccurate matching results. Moreover, template matching algorithms typically require an initial position of the target to be provided; if the initial position is inaccurate, matching may fail or the target may be mislocalized.
Disclosure of Invention
The invention provides a method and a device for detecting inherent attribute characteristics based on an underwater target, which can solve the following technical problems: deep learning algorithms are sensitive to data quality and noise; edge detection algorithms are strongly affected by illumination changes and noise interference; manually designed feature extraction methods introduce subjectivity; and deformation and occlusion of underwater targets cause template matching algorithms to produce inaccurate detection results.
In the above embodiments of the method of the present invention, a method for detecting an intrinsic attribute feature of an underwater target includes:
step S1: acquiring an image related to a target, and preprocessing the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises the steps of carrying out random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image; defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
Step S2: constructing a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
Step S3: inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; performing feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; inputting the fused optical image characteristics into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after a plurality of rounds of training, and obtaining weight information of each parameter in the first-stage multi-scale underwater target recognition model;
step S4: extracting spectral features from the first acoustic image to obtain inherent spectral features; respectively extracting texture, shape and color characteristics of the first optical image and the second optical image of the water surface optical image to obtain inherent optical characteristics; extracting texture, shape and color characteristics of the underwater optical image and the second optical image to obtain inherent optical characteristics; increasing channel attention to the inherent frequency spectrum characteristics to obtain inherent frequency spectrum characteristics with added channel attention; increasing the spatial attention to the inherent optical characteristics to obtain the inherent optical characteristics with the added spatial attention; carrying out feature fusion on the inherent spectral features of the added channel attention and the inherent optical features of the added spatial attention to obtain inherent fusion features;
step S5: training the first-stage multi-scale underwater target recognition model based on the inherent fusion characteristics to obtain a trained multi-scale underwater target recognition model;
Step S6: and acquiring an underwater image to be detected, inputting the underwater image to be detected into the trained multi-scale underwater target recognition model, and obtaining the detection and recognition results of the underwater target.
Optionally, removing the water surface reflection area by using a trained reflection area detection model on the water surface optical image to obtain a first optical image, including:
Step S311: acquiring a plurality of pieces of labeling information comprising training water surface optical images and reflection areas of the training water surface optical images to form a first training data set;
Step S312: training the reflection area detection model based on the first training data set, wherein the reflection area detection model is a Faster R-CNN network model, to obtain the trained reflection area detection model;
step S313: and inputting the water surface optical image into a trained reflection area detection model to obtain a reflection area of the water surface optical image, removing the reflection area from the water surface optical image, adding pixel values to the part corresponding to the reflection area in the water surface optical image, and fusing the area added with the pixel values with surrounding pixels of the area to obtain a first optical image.
Optionally, the defogging, brightness adjustment and color restoration are performed to obtain a second optical image, which includes:
step S321: acquiring a plurality of training optical images under different underwater environments, different targets and different illumination conditions to form a second training data set;
Step S322: defogging by using a dark channel prior defogging algorithm;
step S323: then adjusting brightness of the defogged image by using a histogram equalization method;
step S324: and performing color restoration on the image with the adjusted brightness by using a color adjustment algorithm to obtain a second optical image.
Optionally, inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; based on the acoustic feature map and the optical feature map, performing feature fusion to obtain fused optical image features, including:
Step S31: inputting the first acoustic image into a trained foreground region segmentation network model, wherein the foreground region segmentation network model is a network model comprising an encoder and a decoder, predicting the probability that each pixel in the first acoustic image belongs to an underwater target foreground by using the foreground region segmentation network model, and dividing a foreground region corresponding to the first acoustic image as an acoustic feature map;
Step S32: taking the first optical image and/or the second optical image as input and inputting it into the target depth estimation network model, wherein the target depth estimation network model is a network model comprising an encoder and a decoder, predicting the probability of the underwater target depth corresponding to each pixel of the input image by the target depth estimation network model, and taking the obtained underwater target depth estimation feature map as an optical feature map;
step S33: and fusing the acoustic feature map and the optical feature map in a dimension feature weighting mode to obtain fused optical image features.
Optionally, the extracting texture, shape, and color features to obtain inherent optical features includes:
step S411: a sliding window with a fixed size is arranged for sliding on an optical image with an inherent optical characteristic to be acquired, wherein the optical image with the inherent optical characteristic to be acquired is a first optical image and/or a second optical image;
Step S412: acquiring pixel values of all pixel points in a subarea corresponding to the sliding window on the optical image with the inherent optical characteristics to be acquired; taking the pixel value of the central point of the sub-region as a central pixel value, comparing the central pixel value with the pixel values of 8 pixel points revolving around the central point, generating 8 binary digits according to the comparison result, and combining the 8 binary digits into an LBP code;
The LBP code is generated according to the following formula:

LBP(p) = Σ_{k=0}^{7} s(g(n_k) - g(p)) · 2^k

wherein LBP(p) represents the LBP code of the center pixel p, g(n_k) represents the gray value of the k-th surrounding pixel, g(p) represents the gray value of the center pixel p, s(·) is a step function equal to 1 when its argument is greater than or equal to 0 and equal to 0 otherwise, and k indexes the displacement between the surrounding pixel position and the center pixel position, ranging from 0 to 7;
Generating a characteristic vector of the sliding window at the current position based on the LBP code, wherein the LBP code is used as a part of the characteristic vector of the sliding window at the current position;
Step S413: and combining all feature vectors obtained by sliding the sliding window on the optical image with the inherent optical features to be acquired into the inherent optical features.
In the above embodiments of the method of the present invention, an apparatus for detecting an intrinsic attribute feature of an underwater target includes:
A preprocessing module: configured to acquire an image related to a target and preprocess the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises randomly rotating the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image, and defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
A training sample generation module: configured to construct a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
A feature extraction module: configured to input the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; take the first optical image and/or the second optical image as input to a target depth estimation network model to obtain an optical feature map; perform feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; and input the fused optical image features into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after several rounds of training and obtaining weight information of each parameter in the first-stage multi-scale underwater target recognition model;
A feature fusion module: configured to extract spectral features from the first acoustic image to obtain inherent spectral features; respectively extract texture, shape and color features of the first optical image and the second optical image of the water surface optical image to obtain inherent optical features; extract texture, shape and color features of the underwater optical image and the second optical image to obtain inherent optical features; add channel attention to the inherent spectral features to obtain inherent spectral features with added channel attention; add spatial attention to the inherent optical features to obtain inherent optical features with added spatial attention; and perform feature fusion on the inherent spectral features with added channel attention and the inherent optical features with added spatial attention to obtain inherent fusion features;
A training module: configured to train the first-stage multi-scale underwater target recognition model based on the inherent fusion features to obtain a trained multi-scale underwater target recognition model;
An identification module: configured to acquire an underwater image to be detected, input the underwater image to be detected into the trained multi-scale underwater target recognition model, and obtain the detection and recognition results of the underwater target.
In the above-described method embodiments of the present invention, a computer-readable storage medium having stored therein a plurality of instructions for loading and executing by a processor the method as described above.
In the above embodiments of the method of the present invention, an electronic device includes:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
Wherein the plurality of instructions are for storage by the memory and loading and executing by the processor the method as described above.
The invention provides underwater target detection and identification based on an attention mechanism and inherent attribute characteristics of targets. The method can adaptively pay attention to the area related to the target in the underwater image by introducing an attention mechanism, and improves the robustness of the deep learning algorithm to noise and data quality. Meanwhile, the inherent attribute characteristics of the target, such as shape, texture and the like, are utilized to extract and classify the characteristics by combining a deep learning algorithm, so that the accuracy and the robustness of underwater target detection and identification are improved.
Specifically, the method firstly uses a deep learning algorithm to extract and classify the characteristics of the underwater image to obtain a preliminary target detection result. Then, through introducing the attention mechanism, the attention weight of the model is adaptively adjusted according to the inherent attribute characteristics of the target, more attention points are concentrated in the target area, and the accuracy of target detection is improved. And then, extracting and classifying the features again by utilizing the inherent attribute features of the target, such as shape, texture and the like and combining a deep learning algorithm, so as to further improve the accuracy and the robustness of target identification.
By comprehensively utilizing the attention mechanism and the inherent attribute characteristics of the target, the invention can overcome the sensitivity of deep learning to data quality and noise and improve the accuracy and robustness of underwater target detection and identification. Meanwhile, the method can also provide more target characteristic information and alleviate the problems of target deformation and occlusion. Therefore, the invention brings a new solution to the field of underwater target detection and identification, and has important application value and popularization prospect.
The invention has the following advantages:
1. the invention introduces an inherent characteristic extraction fusion module, and can fuse the underwater target characteristics fused by the multi-source image data with the underwater target characteristics based on the LOFAR spectrogram and the local binary pattern algorithm. Compared with the traditional single feature extraction method, the multichannel fusion module can fully utilize the advantages of different features to extract more comprehensive and rich underwater target features, so that the accuracy and stability of underwater target detection are improved.
2. The method and the device can fully utilize the similarity of the multi-source data in the structural and spatial dimensions, and evaluate the importance of each data source by calculating its structural aggregation degree and spatial aggregation degree. By analyzing the structural-dimension similarity and spatial-dimension similarity, the information in the multi-source data is used effectively to extract the important features of the multi-modal image. The attention mechanism then adjusts the importance: a channel attention mechanism and a spatial attention mechanism adjust the importance of each data source on the feature channels and in space, so that the features of the multi-source data are fused better.
3. The invention adopts a two-stage training deep learning method, and adopts a deep learning method, such as a Convolutional Neural Network (CNN), to extract the characteristic representation of the underwater target. The deep learning method has strong characteristic learning and representing capability, and can better capture the information of the shape, texture, structure and the like of the underwater target. Compared with the traditional underwater target detection method, the method based on deep learning can improve the accuracy, the robustness and the instantaneity of underwater target detection.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing embodiments of the present invention in more detail with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of embodiments of the invention, are incorporated in and constitute a part of this specification, illustrate the invention together with its embodiments, and do not constitute a limitation of the invention. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flow chart of a method for detecting an inherent attribute feature of an underwater target according to the present invention;
FIG. 2 is a schematic diagram of a method architecture for detecting based on inherent attribute features of an underwater target according to the present invention;
FIG. 3 is a schematic diagram of a flow for detecting and removing a water surface reflection area of an image according to the present invention;
FIG. 4 is a flow chart of the invention for defogging, brightness adjustment and color restoration of an image;
FIG. 5 is a flow chart of feature fusion according to the present invention;
FIG. 6 is a schematic flow chart of extracting features of an underwater target by adopting a local binary pattern algorithm;
FIG. 7 is a schematic flow chart of extracting characteristics of an underwater target based on a LOFAR spectrogram algorithm;
FIG. 8 is a flow chart of feature fusion based on attention mechanism weights according to the present invention;
FIG. 9 is a schematic flow chart of two-stage training based on weight sharing multi-scale underwater target recognition according to the present invention;
FIG. 10 is a schematic view of the structure of the device for detecting the inherent attribute characteristics of the underwater target according to the present invention;
Fig. 11 is a schematic structural diagram of an electronic device for detecting based on inherent attribute characteristics of an underwater target according to the present invention.
Detailed Description
Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise. It will be appreciated by those skilled in the art that the terms "first", "second", S1, S2, etc. in the embodiments of the present invention are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them. It should also be understood that in embodiments of the present invention, "plurality" may refer to two or more, and "at least one" may refer to one, two or more. It should also be appreciated that any component, data, or structure referred to in an embodiment of the invention may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in the present invention merely describes an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. In the present invention, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship. It should also be understood that the description of the embodiments of the present invention emphasizes the differences between the embodiments; the same or similar features may be referred to each other and, for brevity, will not be described in detail.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations with electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flow chart of a method for detecting an intrinsic property characteristic of an underwater target according to an exemplary embodiment of the present invention. As shown in fig. 1, the method comprises the following steps:
step S1: acquiring an image related to a target, and preprocessing the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises the steps of carrying out random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image; defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
Step S2: constructing a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
Step S3: inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; performing feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; inputting the fused optical image characteristics into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after a plurality of rounds of training, and obtaining weight information of each parameter in the first-stage multi-scale underwater target recognition model;
step S4: extracting spectral features from the first acoustic image to obtain inherent spectral features; respectively extracting texture, shape and color characteristics of the first optical image and the second optical image of the water surface optical image to obtain inherent optical characteristics; extracting texture, shape and color characteristics of the underwater optical image and the second optical image to obtain inherent optical characteristics; increasing channel attention to the inherent frequency spectrum characteristics to obtain inherent frequency spectrum characteristics with added channel attention; increasing the spatial attention to the inherent optical characteristics to obtain the inherent optical characteristics with the added spatial attention; carrying out feature fusion on the inherent spectral features of the added channel attention and the inherent optical features of the added spatial attention to obtain inherent fusion features;
step S5: training the first-stage multi-scale underwater target recognition model based on the inherent fusion characteristics to obtain a trained multi-scale underwater target recognition model;
Step S6: and acquiring an underwater image to be detected, inputting the underwater image to be detected into the trained multi-scale underwater target recognition model, and obtaining the detection and recognition results of the underwater target.
The technical concept of the invention is to detect and identify the underwater target based on the attention mechanism and the inherent attribute characteristics of the underwater target. The invention can fully utilize the attention mechanism and the inherent attribute characteristics of the target, and improve the accuracy and the robustness of underwater target detection and identification. Meanwhile, the method can also solve the problem of sensitivity of deep learning to data quality and noise, provide more target characteristic information and solve the problems of target deformation and occlusion. Therefore, the invention has important application value and popularization prospect.
As shown in fig. 2, an embodiment of a method for detecting based on inherent property characteristics of an underwater target is provided.
(1) First, underwater image data is input, and the optical and acoustic images are preprocessed. The preprocessing step comprises denoising and enhancement processing, such as removing water surface target reflection areas, defogging, brightness adjustment, color restoration and the like, so as to improve the accuracy of the subsequent algorithm.
(2) The preprocessed acoustic and optical images are respectively input into the GAN deep learning network to expand the dataset. A large amount of unlabeled data is generated over the GAN network, providing a data set for the following unsupervised learning.
(3) And respectively inputting the preprocessed acoustic image and the preprocessed optical image into a foreground region segmentation network and a target depth estimation network for feature extraction and classification, and obtaining a feature map of corresponding data through the two networks. And then, heterogeneous information fusion is carried out by a dimension characteristic weighting method. The acoustic and optical feature maps are fused to extract more comprehensive and accurate feature information.
(4) And extracting the inherent attribute feature graphs of the acoustic image and the optical image respectively by using a LOFAR spectrogram and a local binary pattern algorithm. These feature maps include spectral, spectral line feature extraction, texture, shape, color, geometric feature extraction. Next, attention-based mechanisms are introduced to fuse the inherent attribute features of the target, which includes two sub-modules, feature fusion and attention weight calculation.
(5) And (3) inputting the feature map obtained by fusing the heterogeneous information in the step (3) into a multi-scale underwater target recognition model. And obtaining a network model of the first stage through multiple rounds of supervision training, and extracting weight parameters of the network.
(6) And finally, applying the feature map based on the attention mechanism weight fusion in the step (4) to a multi-scale deep learning network on the basis of the weight parameters obtained in the step (5), and obtaining a network model of a second stage through a large amount of unsupervised training. And inputting the acquired data into the model to obtain the detection and identification results of the underwater target.
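By way of illustration only, the two-stage training flow of steps (5) and (6) can be sketched in Python as follows. The stub network, tensor shapes and the supervised loss used in both stages are assumptions made purely to keep the sketch short and runnable; they do not reproduce the actual multi-scale underwater target recognition network, and the second stage is described above as largely unsupervised.

```python
# Illustrative sketch of the two-stage training flow (assumed shapes, stub network).
import torch
import torch.nn as nn

class MultiScaleRecognizer(nn.Module):
    """Stand-in for the multi-scale underwater target recognition model."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x).flatten(1))

def train_stage(model, features, labels, epochs=5, lr=1e-3):
    # A plain supervised loop is used for both stages to keep the sketch short;
    # the second stage is described above as largely unsupervised.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        opt.step()
    return model

# Stage 1: train on the fused optical image features from step (3).
fused_optical = torch.randn(8, 16, 64, 64)      # dummy fused feature maps
labels = torch.randint(0, 3, (8,))              # dummy target classes
stage1 = train_stage(MultiScaleRecognizer(16, 3), fused_optical, labels)

# Extract the stage-1 weight parameters and reuse them for stage 2 (step (6)).
stage2 = MultiScaleRecognizer(16, 3)
stage2.load_state_dict(stage1.state_dict())

# Stage 2: continue training on the attention-fused inherent features from step (4).
inherent_fused = torch.randn(8, 16, 64, 64)     # dummy inherent fusion features
stage2 = train_stage(stage2, inherent_fused, labels)
```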
The step S1: acquiring an image related to a target, and preprocessing the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises the steps of carrying out random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image; defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and recovering color to obtain a second optical image, wherein:
as shown in fig. 3, the removing the water surface reflection area by using the trained reflection area detection model on the water surface optical image to obtain a first optical image includes:
Step S311: acquiring a plurality of pieces of labeling information comprising training water surface optical images and reflection areas of the training water surface optical images to form a first training data set;
Step S312: training the reflection area detection model based on the first training data set, wherein the reflection area detection model is a Faster R-CNN network model, to obtain the trained reflection area detection model;
Step S313: the water surface optical image is input into the trained reflection area detection model to obtain the reflection area of the water surface optical image, and the reflection area is removed from the water surface optical image by utilizing adaptive histogram equalization (AHE); pixel values are added to the part of the water surface optical image corresponding to the reflection area, and the area with the added pixel values is fused with its surrounding pixels; adaptive Retinex enhancement is then used to improve the brightness, contrast and detail of the image, and wavelet transformation removes noise from the image and signals and enhances the contrast of the image, thereby obtaining the first optical image.
In this embodiment, the labeling may be manual labeling or automatic labeling using other algorithms. It is possible to determine which regions are considered to be reflective regions by setting a threshold. Through the steps, the reflection area detection model can effectively detect and remove the reflection area in the underwater image, and the accuracy of water surface target detection is improved.
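As an illustrative sketch of steps S312-S313 only, the following Python code detects reflection regions with the standard torchvision Faster R-CNN detector and fills them from surrounding pixels by inpainting; the checkpoint path, the two-class setup, the score threshold and the use of OpenCV inpainting as a stand-in for the pixel-filling and fusion step are assumptions, not the definitive implementation.

```python
# Illustrative sketch: detect water-surface reflection regions with Faster R-CNN
# and fill them from surrounding pixels. The checkpoint path "reflection_frcnn.pth"
# (background + reflection classes) and the score threshold are assumptions.
import cv2
import numpy as np
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
model.load_state_dict(torch.load("reflection_frcnn.pth", map_location="cpu"))
model.eval()

def remove_reflections(bgr: np.ndarray, score_thr: float = 0.5) -> np.ndarray:
    """Return the 'first optical image' with detected reflection areas filled in."""
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    tensor = torch.from_numpy(rgb).permute(2, 0, 1)            # (3, H, W)
    with torch.no_grad():
        pred = model([tensor])[0]

    mask = np.zeros(bgr.shape[:2], dtype=np.uint8)
    for box, score in zip(pred["boxes"], pred["scores"]):
        if score >= score_thr:
            x1, y1, x2, y2 = [max(0, int(v)) for v in box.tolist()]
            mask[y1:y2, x1:x2] = 255                           # mark the reflection area

    # Stand-in for the pixel-value addition and neighbourhood fusion of step S313:
    # fill the removed area from surrounding pixels by inpainting.
    return cv2.inpaint(bgr, mask, 5, cv2.INPAINT_TELEA)
```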
As shown in fig. 4, the defogging, brightness adjustment and color restoration to obtain a second optical image includes:
step S321: acquiring a plurality of training optical images under different underwater environments, different targets and different illumination conditions to form a second training data set;
Step S322: defogging by using a dark channel prior defogging algorithm;
step S323: then adjusting brightness of the defogged image by using a histogram equalization method;
step S324: and performing color restoration on the image with the adjusted brightness by using a color adjustment algorithm to obtain a second optical image.
In this embodiment, the fog is removed by estimating the dark channel (the local minimum over the color channels) of the image, so that the image becomes clearer. The brightness, i.e. the lightness and darkness of the underwater image, is adjusted so that the brightness and contrast of the image are more balanced; brightness adjustment is realized by methods such as histogram equalization, making the image clearer. Since underwater images often shift toward blue, a color correction algorithm is used to adjust the color of the image to more closely approximate the real scene; correcting the offset and gain of the color channels enables color restoration.
Through the steps, the underwater region data enhancement can reduce the atomization effect, adjust the brightness and contrast of the image, restore the true color of the image, and generate more diversified training samples, so that the accuracy and the robustness of underwater target detection are improved.
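A minimal Python sketch of steps S322-S324 is given below, assuming a dark channel prior defogging step, luminance-channel histogram equalization for brightness adjustment and a grey-world correction for color restoration; the patch size, haze weight and the specific color adjustment algorithm are illustrative assumptions.

```python
# Illustrative sketch of steps S322-S324: dark channel prior defogging,
# luminance histogram equalization and grey-world colour restoration.
# Patch size, haze weight (omega) and the grey-world correction are assumptions.
import cv2
import numpy as np

def dark_channel_defog(bgr: np.ndarray, patch: int = 15, omega: float = 0.95) -> np.ndarray:
    img = bgr.astype(np.float32) / 255.0
    kernel = np.ones((patch, patch), np.uint8)
    dark = cv2.erode(img.min(axis=2), kernel)                    # dark channel
    # Atmospheric light: mean colour of the brightest 0.1% dark-channel pixels.
    n = max(1, int(dark.size * 0.001))
    rows, cols = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    atmos = img[rows, cols].mean(axis=0)
    # Transmission estimate and scene radiance recovery.
    trans = 1.0 - omega * cv2.erode((img / atmos).min(axis=2), kernel)
    trans = np.clip(trans, 0.1, 1.0)[..., None]
    radiance = (img - atmos) / trans + atmos
    return np.clip(radiance * 255, 0, 255).astype(np.uint8)

def adjust_brightness(bgr: np.ndarray) -> np.ndarray:
    # Histogram equalization applied to the luminance channel only.
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[..., 0] = cv2.equalizeHist(ycrcb[..., 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

def restore_color(bgr: np.ndarray) -> np.ndarray:
    # Grey-world correction of per-channel offset/gain (underwater images shift to blue).
    img = bgr.astype(np.float32)
    gains = img.mean() / (img.reshape(-1, 3).mean(axis=0) + 1e-6)
    return np.clip(img * gains, 0, 255).astype(np.uint8)

def second_optical_image(bgr: np.ndarray) -> np.ndarray:
    return restore_color(adjust_brightness(dark_channel_defog(bgr)))
```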
The step S3: inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; performing feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; inputting the fused optical image characteristics into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after a plurality of rounds of training, and obtaining weight information of each parameter in the first-stage multi-scale underwater target recognition model, wherein:
As shown in fig. 5, inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; based on the acoustic feature map and the optical feature map, performing feature fusion to obtain fused optical image features, including:
Step S31: inputting the first acoustic image into a trained foreground region segmentation network model, wherein the foreground region segmentation network model is a network model comprising an encoder and a decoder, predicting the probability that each pixel in the first acoustic image belongs to an underwater target foreground by using the foreground region segmentation network model, and dividing a foreground region corresponding to the first acoustic image as an acoustic feature map;
Step S32: taking the first optical image and/or the second optical image as input and inputting it into the target depth estimation network model, wherein the target depth estimation network model is a network model comprising an encoder and a decoder, predicting the probability of the underwater target depth corresponding to each pixel of the input image by the target depth estimation network model, and taking the obtained underwater target depth estimation feature map as an optical feature map;
This step realizes estimation of the underwater target depth.
Step S33: and fusing the acoustic feature map and the optical feature map in a dimension feature weighting mode to obtain fused optical image features.
The step fuses features of the visible light image and the acoustic image data. Heterogeneous information fusion is carried out by a dimension feature weighting method, features of different data sources are fused, more comprehensive and accurate target features are obtained, and the feature extraction effect of the underwater micro target structure is improved.
In this embodiment, on the basis of the fusion of the features, a method for target depth regression is studied to achieve accurate feature extraction of the spatial target structure. And through learning the depth information of the target, the depth fusion of the data features of different sensors is completed by means of the heterogeneous feature fusion module, so that the depth estimation of the underwater target is realized.
The foreground region segmentation network model and the target depth estimation network model perform end-to-end training through a back propagation algorithm to minimize segmentation errors and depth estimation errors. By jointly researching the foreground segmentation and the depth estimation of the underwater target, the foreground region of the underwater target can be accurately extracted, accurate depth information can be obtained, and the feature map extraction of the underwater target can be realized.
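By way of illustration, the dimension-feature-weighted fusion of step S33 might be realized as in the following sketch, where each feature channel of the acoustic and optical maps receives a learnable weight before the maps are combined; the channel counts, the bilinear resampling and the concatenation-based fusion are assumptions for demonstration only.

```python
# Illustrative sketch of step S33: fuse the acoustic and optical feature maps by
# weighting each feature dimension (channel) with a learnable coefficient.
# Channel counts, bilinear resampling and channel concatenation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DimensionWeightedFusion(nn.Module):
    def __init__(self, acoustic_ch: int, optical_ch: int):
        super().__init__()
        self.w_acoustic = nn.Parameter(torch.ones(acoustic_ch))
        self.w_optical = nn.Parameter(torch.ones(optical_ch))

    def forward(self, acoustic: torch.Tensor, optical: torch.Tensor) -> torch.Tensor:
        # Resample the acoustic map to the optical resolution before fusion.
        acoustic = F.interpolate(acoustic, size=optical.shape[-2:],
                                 mode="bilinear", align_corners=False)
        acoustic = acoustic * self.w_acoustic.view(1, -1, 1, 1)
        optical = optical * self.w_optical.view(1, -1, 1, 1)
        return torch.cat([acoustic, optical], dim=1)     # fused optical image features

fusion = DimensionWeightedFusion(acoustic_ch=1, optical_ch=1)
acoustic_map = torch.rand(2, 1, 32, 32)    # foreground probability map (step S31)
optical_map = torch.rand(2, 1, 64, 64)     # depth estimation feature map (step S32)
fused = fusion(acoustic_map, optical_map)  # shape (2, 2, 64, 64)
```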
The step S4: extracting spectral features from the first acoustic image to obtain inherent spectral features; respectively extracting texture, shape and color characteristics of the first optical image and the second optical image of the water surface optical image to obtain inherent optical characteristics; extracting texture, shape and color characteristics of the underwater optical image and the second optical image to obtain inherent optical characteristics; increasing channel attention to the inherent frequency spectrum characteristics to obtain inherent frequency spectrum characteristics with added channel attention; increasing the spatial attention to the inherent optical characteristics to obtain the inherent optical characteristics with the added spatial attention; and carrying out feature fusion on the inherent spectral features of the added channel attention and the inherent optical features of the added spatial attention to obtain inherent fusion features, wherein:
as shown in fig. 6, the extraction of texture, shape and color features, to obtain inherent optical features, includes:
step S411: a sliding window with a fixed size is arranged for sliding on an optical image with an inherent optical characteristic to be acquired, wherein the optical image with the inherent optical characteristic to be acquired is a first optical image and/or a second optical image;
Step S412: acquiring pixel values of all pixel points in a subarea corresponding to the sliding window on the optical image with the inherent optical characteristics to be acquired; taking the pixel value of the central point of the sub-region as a central pixel value, comparing the central pixel value with the pixel values of 8 pixel points revolving around the central point, generating 8 binary digits according to the comparison result, and combining the 8 binary digits into an LBP code;
The LBP code is generated according to the following formula:

LBP(p) = Σ_{k=0}^{7} s(g(n_k) - g(p)) · 2^k

wherein LBP(p) represents the LBP code of the center pixel p, g(n_k) represents the gray value of the k-th surrounding pixel, g(p) represents the gray value of the center pixel p, s(·) is a step function equal to 1 when its argument is greater than or equal to 0 and equal to 0 otherwise, and k indexes the displacement between the surrounding pixel position and the center pixel position, ranging from 0 to 7;
Generating a characteristic vector of the sliding window at the current position based on the LBP code, wherein the LBP code is used as a part of the characteristic vector of the sliding window at the current position;
Step S413: and combining all feature vectors obtained by sliding the sliding window on the optical image with the inherent optical features to be acquired into the inherent optical features.
In this embodiment, input: an underwater target image.
Preprocessing: the image is subjected to preprocessing operations such as denoising and enhancement to improve image quality.
Window setting and moving: a window of fixed size is provided for sliding over the image. Starting from the upper left corner of the image, the window is moved to each position of the image in turn.
Pixel acquisition within a window: at each window position, pixel values within the window are acquired.
Center pixel value calculation: and comparing the central pixel value with the surrounding pixel values in the window, and generating a binary code according to the comparison result.
LBP code generation: for each pixel within the window, a binary bit is generated based on the comparison of the central pixel value and the surrounding pixel values, and these binary bits are combined into an LBP code.
The formula of the LBP algorithm is as follows:

LBP(p) = Σ_{k=0}^{7} s(g(n_k) - g(p)) · 2^k

wherein LBP(p) represents the LBP code of the center pixel p, g(n_k) represents the gray value of the k-th surrounding pixel, g(p) represents the gray value of the center pixel p, s(·) is a step function equal to 1 when its argument is greater than or equal to 0 and equal to 0 otherwise, and k indexes the displacement between the surrounding pixel position and the center pixel position, ranging from 0 to 7.
Feature vector generation: for each window position, a feature vector is generated and LBP is encoded as an element of the feature vector.
Window movement termination condition: when the window moves to the lower right corner of the image, the movement is stopped.
And (3) outputting: all feature vectors of the image are obtained and used for subsequent target recognition and classification.
The local binary pattern algorithm has the advantages of simple calculation, high calculation efficiency and the like, and is suitable for processing and analyzing underwater images. Through a local binary pattern algorithm, the texture features of the underwater target can be better captured and described. The local binary pattern algorithm extracts texture features by calculating the gray level difference between the pixel point and the neighboring pixel points, thereby revealing the texture structure of the target. By performing local binary pattern coding on each pixel point and combining the coding results into a feature vector, the texture features of the target can be effectively described.
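For illustration, the sliding-window LBP extraction described above can be sketched in plain NumPy as follows; the window size, stride and the use of a per-window LBP histogram as the feature vector are assumptions chosen only to keep the example short.

```python
# Illustrative sketch of the sliding-window LBP extraction (steps S411-S413).
# Window size, stride and the per-window LBP histogram feature are assumptions.
import numpy as np

# Offsets of the 8 surrounding pixels, k = 0..7, relative to the centre pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(gray: np.ndarray, y: int, x: int) -> int:
    """LBP(p) = sum_k s(g(n_k) - g(p)) * 2^k, with s(v) = 1 if v >= 0 else 0."""
    center = int(gray[y, x])
    code = 0
    for k, (dy, dx) in enumerate(OFFSETS):
        if int(gray[y + dy, x + dx]) - center >= 0:
            code |= 1 << k
    return code

def lbp_features(gray: np.ndarray, window: int = 16, stride: int = 16) -> np.ndarray:
    """Slide a fixed-size window over the image and build one 256-bin LBP histogram
    per window position; the concatenation forms the inherent optical feature."""
    h, w = gray.shape
    feats = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            hist = np.zeros(256, dtype=np.float32)
            for y in range(top + 1, top + window - 1):
                for x in range(left + 1, left + window - 1):
                    hist[lbp_code(gray, y, x)] += 1
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)

gray = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in optical image
feature_vector = lbp_features(gray)                       # inherent optical feature
```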
In this embodiment, as shown in fig. 7, spectral features are extracted from the first acoustic image to obtain the inherent spectral features, where:
Input: underwater sonar signal data.
Framing: the sonar signal is divided into a plurality of overlapping time windows.
LOFAR spectrogram calculation: a Short Time Fourier Transform (STFT) is applied to each time window, energy in a specific frequency range is calculated, and the energy is converted into pixel values of the image.
The calculation method of the LOFAR spectrogram is as follows: first, an input signal is subjected to framing processing, and the signal is divided into a plurality of overlapping time windows. A Short Time Fourier Transform (STFT) is applied to each time window to convert the time domain signal to a frequency domain signal. In the frequency domain, the energy of each frequency is calculated from a specific frequency range. The frequency energy within each time window is converted into pixel values of the image according to a certain color mapping rule. And splicing the spectrogram images of all the time windows together to obtain a complete LOFAR spectrogram.
Extracting target features: the spectral characteristics of the target, including frequency distribution, frequency variation and other information, are extracted from the LOFAR spectrogram.
In this embodiment, by calculating the LOFAR spectrogram, the spectral features of the underwater target can be extracted for target classification and identification. The LOFAR spectrogram method is suitable for underwater acoustic image processing tasks such as underwater target detection, target tracking and target identification. In practical applications, a suitable interval calculation method and window size may be selected according to the specific application and task requirements.
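A minimal sketch of this LOFAR spectrogram computation is given below, assuming a Hanning analysis window, a logarithmic energy scale and a simple linear gray-scale mapping; the band limits, window length and hop size are illustrative parameters rather than values fixed by this embodiment.

```python
import numpy as np

def lofar_spectrogram(signal, fs, win_len=1024, hop=512, f_lo=10.0, f_hi=5000.0):
    """Frame the signal into overlapping windows, apply a short-time Fourier
    transform, keep the energy inside [f_lo, f_hi] and map it to 8-bit pixels."""
    window = np.hanning(win_len)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    columns = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window       # framing + windowing
        spectrum = np.fft.rfft(frame)                         # STFT of one frame
        energy = np.abs(spectrum[band]) ** 2                  # energy per frequency bin
        columns.append(10.0 * np.log10(energy + 1e-12))       # log scale for display
    spec = np.stack(columns, axis=1)                          # frequency x time image
    spec -= spec.min()                                        # simple gray-scale mapping
    return (255.0 * spec / (spec.max() + 1e-12)).astype(np.uint8)

# Example: img = lofar_spectrogram(np.random.randn(20000 * 10), fs=20000)
```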
In this embodiment, as shown in fig. 2 and 8, a feature fusion module based on attention mechanism weight is used to perform feature fusion on multi-source data to obtain a multi-mode image. The flow of the module is as follows:
Input: multisource data, including structural dimensional similarity and spatial dimensional similarity.
Structural aggregation degree calculation: the structural aggregation degree of each source data is calculated by analyzing the structural dimension similarity. The structural aggregation degree represents the degree of similarity of the data in the structural dimension.
Channel attention mechanism: the degree of structure aggregation is input into a channel attention mechanism, and the attention weight of each source data on the characteristic channel is calculated. The channel attention mechanism adjusts the importance of each source data on the feature channel according to the degree of structure aggregation.
Spatial aggregation degree calculation: the spatial aggregation degree of each source data is calculated by analyzing the spatial dimension similarity. The spatial aggregation degree represents the degree of similarity of the data in the spatial dimension.
Spatial attention mechanism: the spatial aggregation degree is input into a spatial attention mechanism, and the attention weight of each source data on the space is calculated. The spatial attention mechanism adjusts the spatial importance of each source data according to the spatial aggregation degree.
Feature fusion: and carrying out weighted fusion on each source data according to the channel attention weight and the space attention weight to obtain a final multi-mode image.
Through the above steps, the feature fusion module based on attention mechanism weights can effectively fuse the features of the multi-source data and extract the important features of the multi-modal image, thereby realizing comprehensive analysis and processing of the multi-modal data in a logically consistent flow from similarity analysis to weighted fusion.
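The weighted fusion step can be sketched as follows, assuming the structural and spatial aggregation degrees have already been computed for every source and that a softmax over the sources is used to turn them into channel and spatial attention weights; the normalization choice and array shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weighted_fusion(sources, struct_agg, spatial_agg):
    """Fuse feature maps from several sources.

    sources     : list of arrays, each of shape (C, H, W)
    struct_agg  : array (n_sources, C)    -- structural aggregation degree per channel
    spatial_agg : array (n_sources, H, W) -- spatial aggregation degree per location
    """
    channel_w = softmax(struct_agg, axis=0)    # channel attention weights over sources
    spatial_w = softmax(spatial_agg, axis=0)   # spatial attention weights over sources
    fused = np.zeros_like(sources[0], dtype=np.float64)
    for i, feat in enumerate(sources):
        cw = channel_w[i][:, None, None]       # broadcast over H, W
        sw = spatial_w[i][None, :, :]          # broadcast over channels
        fused += cw * sw * feat                # weighted fusion of this source
    return fused
```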
The invention provides an embodiment of a method for detecting based on inherent attribute characteristics of an underwater target.
The underwater target detection part based on multi-scale and attention mechanisms adopts a YOLO v5-based network structure and introduces several improvements to meet the requirements of underwater multi-scale target detection. The specific implementation is as follows:
Stage judging module: as shown in fig. 9, this module determines which training stage the network is in; the first stage takes labeled data as input for supervised learning, and the second stage takes unlabeled data as input for unsupervised learning.
Feature extraction module: a convolutional neural network (CNN) is used as the basic network structure to capture local and global features of the underwater target through convolution and pooling operations.
Multi-scale prediction model: a YOLO v5-based network structure is designed, and target detection is carried out on feature maps at different levels through a pyramid structure to adapt to targets of different scales. A multi-scale change model is introduced that gradually adjusts the size of the network's receptive field to detect targets of different scales.
Two-stage training network: the first stage takes the multi-source heterogeneous feature map as input for supervised learning training; the second stage takes the multi-source heterogeneous features produced by the attention-mechanism weight module as input and uses the model weights of the first stage as initial weights for unsupervised learning training.
In summary, the multi-scale and attention-mechanism-based underwater target detection method extracts features at different scales through multi-scale convolution operations and the multi-scale change model, while the attention mechanism enhances the accuracy and robustness of the feature representation. The method can effectively cope with the complexity of the underwater environment and changes in target scale, improving the accuracy and stability of underwater target detection.
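A minimal PyTorch-style skeleton of the two-stage training procedure is sketched below; the optimizer, learning rates, epoch counts and the form of the unsupervised objective are assumptions for illustration, since the embodiment does not prescribe them.

```python
import torch
from torch import nn

def two_stage_training(model: nn.Module, labeled_loader, unlabeled_loader,
                       supervised_loss, unsupervised_loss, epochs=(50, 20)):
    """Stage 1: supervised training on labeled multi-source feature maps.
    Stage 2: reuse the stage-1 weights as initialization and continue training
    on unlabeled, attention-weighted fused features."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs[0]):                       # stage 1: supervised learning
        for feats, targets in labeled_loader:
            opt.zero_grad()
            supervised_loss(model(feats), targets).backward()
            opt.step()
    stage1_weights = {k: v.clone() for k, v in model.state_dict().items()}

    model.load_state_dict(stage1_weights)            # stage 2 starts from stage-1 weights
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs[1]):                       # stage 2: unsupervised learning
        for feats in unlabeled_loader:
            opt.zero_grad()
            unsupervised_loss(model(feats)).backward()
            opt.step()
    return model
```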
The invention introduces an inherent feature extraction and fusion module that fuses the underwater target features obtained by fusing multi-source image data with the underwater target features derived from the LOFAR spectrogram and the local binary pattern algorithm. Compared with traditional single-feature extraction methods, this multi-channel fusion module can fully exploit the advantages of different features to extract more comprehensive and richer underwater target features, thereby improving the accuracy and stability of underwater target detection.
The method and the device make full use of the structural and spatial similarity of the multi-source data, evaluating the importance of each source by calculating its structural aggregation degree and spatial aggregation degree. By analyzing the structural-dimension and spatial-dimension similarity, the information in the multi-source data is effectively exploited to extract the important features of the multi-modal image. A channel attention mechanism and a spatial attention mechanism adjust the importance of each source on the feature channels and in space, so that the features of the multi-source data are fused more effectively.
The present invention employs a deep learning method, such as a convolutional neural network (CNN), to extract a feature representation of the underwater target. Deep learning has strong feature learning and representation capability and can better capture the shape, texture, structure and other information of the underwater target. Compared with traditional underwater target detection methods, the deep-learning-based method can improve the accuracy, robustness and real-time performance of underwater target detection.
Exemplary apparatus
Fig. 10 is a schematic structural diagram of an apparatus for detecting an intrinsic property characteristic of an underwater target according to an exemplary embodiment of the present invention. As shown in fig. 10, the present embodiment includes:
Preprocessing module: configured to acquire an image related to a target and preprocess the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises performing random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image, and defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
training sample generation module: configured to construct a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
Feature extraction module: configured to input the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; take the first optical image and/or the second optical image as input to a target depth estimation network model to obtain an optical feature map; perform feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; and input the fused optical image features into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after several rounds of training, together with the weight information of each parameter in the first-stage multi-scale underwater target recognition model;
Feature fusion module: configured to extract spectral features from the first acoustic image to obtain inherent spectral features; extract texture, shape and color features from the first optical image and the second optical image of the water surface optical image, respectively, to obtain inherent optical features; extract texture, shape and color features from the underwater optical image and the second optical image to obtain inherent optical features; add channel attention to the inherent spectral features to obtain the inherent spectral features with added channel attention; add spatial attention to the inherent optical features to obtain the inherent optical features with added spatial attention; and perform feature fusion on the inherent spectral features with added channel attention and the inherent optical features with added spatial attention to obtain inherent fusion features;
Training module: configured to train the first-stage multi-scale underwater target recognition model based on the inherent fusion features to obtain a trained multi-scale underwater target recognition model;
Identification module: configured to acquire an underwater image to be detected and input it into the trained multi-scale underwater target recognition model to obtain the detection and recognition results of the underwater target.
Exemplary electronic device
Fig. 11 is a schematic structural diagram of an electronic device 110 provided in an exemplary embodiment of the present invention. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent of them, which may communicate with the first device and the second device to receive acquired input signals from them. As shown in fig. 11, the electronic device includes one or more processors 111 and a memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and may be executed by the processor 111 to implement the method of detecting based on the inherent attribute characteristics of an underwater target of the various embodiments of the disclosure described above and/or other desired functions. In one example, the electronic device may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input device 113 may also include, for example, a keyboard, a mouse, and the like.
The output device 114 can output various information to the outside. The output device 114 may include, for example, a display, speakers, a printer, and a communication network and remote output apparatus connected thereto, etc.
Of course, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 11 for simplicity, components such as buses, input/output interfaces, and the like being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of detecting an inherent property feature of an underwater target according to the various embodiments of the present disclosure described in the "exemplary method" section of this specification.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the above section "exemplary method" of the present disclosure.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are mutually referred to. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words meaning "including but not limited to" and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (8)

1. A method of detecting based on inherent property characteristics of an underwater target, the method comprising the steps of:
step S1: acquiring an image related to a target, and preprocessing the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises the steps of carrying out random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image; defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
Step S2: constructing a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
Step S3: inputting the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; performing feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; inputting the fused optical image characteristics into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after a plurality of rounds of training, and obtaining weight information of each parameter in the first-stage multi-scale underwater target recognition model;
step S4: extracting spectral features from the first acoustic image to obtain inherent spectral features; respectively extracting texture, shape and color characteristics of the first optical image and the second optical image of the water surface optical image to obtain inherent optical characteristics; extracting texture, shape and color characteristics of the underwater optical image and the second optical image to obtain inherent optical characteristics; increasing channel attention to the inherent frequency spectrum characteristics to obtain inherent frequency spectrum characteristics with added channel attention; increasing the spatial attention to the inherent optical characteristics to obtain the inherent optical characteristics with the added spatial attention; carrying out feature fusion on the inherent spectral features of the added channel attention and the inherent optical features of the added spatial attention to obtain inherent fusion features;
step S5: training the first-stage multi-scale underwater target recognition model based on the inherent fusion characteristics to obtain a trained multi-scale underwater target recognition model;
Step S6: and acquiring an underwater image to be detected, inputting the underwater image to be detected into the trained multi-scale underwater target recognition model, and obtaining the detection and recognition results of the underwater target.
2. The method of claim 1, wherein removing the surface reflection area from the surface optical image using a trained reflection area detection model to obtain a first optical image comprises:
Step S311: acquiring a plurality of pieces of labeling information comprising training water surface optical images and reflection areas of the training water surface optical images to form a first training data set;
Step S312: training the reflection area detection model based on the first training data set, wherein the reflection area detection model is a Faster R-CNN network model, so as to obtain a trained reflection area detection model;
step S313: and inputting the water surface optical image into a trained reflection area detection model to obtain a reflection area of the water surface optical image, removing the reflection area from the water surface optical image, adding pixel values to the part corresponding to the reflection area in the water surface optical image, and fusing the area added with the pixel values with surrounding pixels of the area to obtain a first optical image.
3. The method of claim 1, wherein the defogging, brightness adjustment, color reduction to obtain a second optical image comprises:
step S321: acquiring a plurality of training optical images under different underwater environments, different targets and different illumination conditions to form a second training data set;
Step S322: defogging by using a dark channel prior defogging algorithm;
step S323: then adjusting brightness of the defogged image by using a histogram equalization method;
step S324: and performing color restoration on the image with the adjusted brightness by using a color adjustment algorithm to obtain a second optical image.
4. The method of claim 1, wherein the inputting the first acoustic image into a trained foreground region segmentation network model results in an acoustic signature; taking the first optical image and/or the second optical image as input, and inputting a target depth estimation network model to obtain an optical characteristic diagram; based on the acoustic feature map and the optical feature map, performing feature fusion to obtain fused optical image features, including:
Step S31: inputting the first acoustic image into a trained foreground region segmentation network model, wherein the foreground region segmentation network model is a network model comprising an encoder and a decoder, predicting the probability that each pixel in the first acoustic image belongs to an underwater target foreground by using the foreground region segmentation network model, and dividing a foreground region corresponding to the first acoustic image as an acoustic feature map;
Step S32: inputting the first optical image and/or the second optical image as input, inputting a target depth estimation network model, wherein the target depth estimation network model is a network model comprising an encoder and a decoder, predicting the probability of the underwater target depth corresponding to each pixel of the input image by the target depth estimation network model, and taking the obtained underwater target depth estimation feature map as an optical feature map;
step S33: and fusing the acoustic feature map and the optical feature map in a dimension feature weighting mode to obtain fused optical image features.
5. The method of claim 4, wherein extracting texture, shape, color features to obtain intrinsic optical features comprises:
step S411: a sliding window with a fixed size is arranged for sliding on an optical image with an inherent optical characteristic to be acquired, wherein the optical image with the inherent optical characteristic to be acquired is a first optical image and/or a second optical image;
Step S412: acquiring pixel values of all pixel points in a subarea corresponding to the sliding window on the optical image with the inherent optical characteristics to be acquired; taking the pixel value of the central point of the sub-region as a central pixel value, comparing the central pixel value with the pixel values of 8 pixel points revolving around the central point, generating 8 binary digits according to the comparison result, and combining the 8 binary digits into an LBP code;
The LBP code is generated by the following formula:
LBP(p) = Σ_{k=0}^{7} s(g(n_k) − g(p)) × 2^k
Wherein LBP(p) represents the LBP code of the center pixel p, g(n_k) represents the gray value of the k-th surrounding pixel, g(p) represents the gray value of the center pixel p, s(·) is a step function equal to 1 when its argument is greater than or equal to 0 and equal to 0 otherwise, and k indexes the displacement of the surrounding pixel relative to the center pixel, ranging from 0 to 7;
Generating a characteristic vector of the sliding window at the current position based on the LBP code, wherein the LBP code is used as a part of the characteristic vector of the sliding window at the current position;
Step S413: and combining all feature vectors obtained by sliding the sliding window on the optical image with the inherent optical features to be acquired into the inherent optical features.
6. An apparatus for detecting an inherent property characteristic of an underwater target, comprising:
a preprocessing module, configured to acquire an image related to a target and preprocess the image related to the target, wherein the image related to the target comprises an acoustic image and an optical image, and the optical image is a water surface optical image or an underwater optical image; the preprocessing comprises performing random rotation on the acoustic image to obtain a first acoustic image; if the optical image is a water surface optical image, removing a water surface reflection area from the water surface optical image by using a trained reflection area detection model to obtain a first optical image, and defogging the first optical image, adjusting brightness and restoring color to obtain a second optical image; if the optical image is an underwater optical image, defogging the optical image, adjusting brightness and restoring color to obtain a second optical image;
training sample generation module: configured to construct a training sample set based on the first acoustic image, the first optical image, and/or the second optical image;
a feature extraction module, configured to input the first acoustic image into a trained foreground region segmentation network model to obtain an acoustic feature map; take the first optical image and/or the second optical image as input to a target depth estimation network model to obtain an optical feature map; perform feature fusion based on the acoustic feature map and the optical feature map to obtain fused optical image features; and input the fused optical image features into a multi-scale underwater target recognition model to be trained, obtaining a first-stage multi-scale underwater target recognition model after several rounds of training, together with the weight information of each parameter in the first-stage multi-scale underwater target recognition model;
a feature fusion module, configured to extract spectral features from the first acoustic image to obtain inherent spectral features; extract texture, shape and color features from the first optical image and the second optical image of the water surface optical image, respectively, to obtain inherent optical features; extract texture, shape and color features from the underwater optical image and the second optical image to obtain inherent optical features; add channel attention to the inherent spectral features to obtain the inherent spectral features with added channel attention; add spatial attention to the inherent optical features to obtain the inherent optical features with added spatial attention; and perform feature fusion on the inherent spectral features with added channel attention and the inherent optical features with added spatial attention to obtain inherent fusion features;
a training module, configured to train the first-stage multi-scale underwater target recognition model based on the inherent fusion features to obtain a trained multi-scale underwater target recognition model;
and an identification module, configured to acquire an underwater image to be detected and input it into the trained multi-scale underwater target recognition model to obtain the detection and recognition results of the underwater target.
7. A computer-readable storage medium having stored therein a plurality of instructions; the plurality of instructions for loading and executing the method of any of claims 1-5 by a processor.
8. An electronic device, the electronic device comprising:
a processor for executing a plurality of instructions;
a memory for storing a plurality of instructions;
Wherein the plurality of instructions are for storage by the memory and loading and executing by the processor the method of any of claims 1-5.