CN117173562A - SAR image ship identification method based on latent layer diffusion model technology - Google Patents
- Publication number
- CN117173562A (application number CN202311065024.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- channel
- module
- layer
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application provides a SAR image ship identification method based on latent layer diffusion model technology. The method is used for SAR ship recognition tasks: the limited-sample data set is first expanded by an image generation module; features are then extracted by a T2T module and Transformer Layers augmented with an SE attention mechanism module, which strengthens the information fusion between adjacent features and highlights key features. Efficient and accurate SAR image ship identification is thereby realized.
Description
Technical Field
The application belongs to the technical field of target recognition of Synthetic Aperture Radar (SAR), and particularly relates to an SAR image ship recognition method based on a latent layer diffusion model technology.
Background
In recent years, deep learning methods have been successfully applied to target recognition in SAR images. However, identifying vessels in SAR images still presents significant challenges. First, due to the particular characteristics of SAR images, traditional algorithms cannot model important local structures such as edges and lines between adjacent pixels, so sample training efficiency is low. Second, when the SAR ship identification data set is insufficient, the training samples provide very limited features for the backbone network, making it impossible for the model to achieve high classification accuracy. Therefore, how to efficiently and accurately identify SAR ship targets under limited training samples is a problem that needs to be studied and solved.
Disclosure of Invention
The application aims to solve the problem of insufficient SAR ship sample data, and provides a SAR image ship identification method based on latent layer diffusion model technology. The method uses a deep learning network to efficiently and accurately identify SAR ship targets and output the corresponding category information.
The application is realized by the following technical scheme: a SAR image ship identification method based on latent layer diffusion model technology, comprising the following steps:
step 1: after the current limited sample data is processed by an image generation module, generating a picture of a corresponding category according to text information or semantic description; dividing the enhanced data set into a training set, a verification set and a test set according to the proportion;
step 2: extracting features of an input image, selecting a T2T-ViT model as a feature extraction network, converting the input image into feature vectors after image segmentation and T2T module processing, and fusing adjacent feature information;
step 3: an SE attention mechanism module is added after the multi-head attention, the weight of each channel is calculated according to the input feature diagram, and the attention degree of important features is increased;
step 4: and (3) performing multi-task regression on the characteristics by using the classification head, and giving a weight coefficient to adapt to the scene of SAR ship identification, so as to finally obtain an identification result.
Further, in step 1, the data set used is the OpenSARShip 2.0 ship data set, and 320 sample pictures covering three ship classes, cargo ship (Cargo), fishing vessel (Fishing) and tugboat (Tug), are selected.
Further, in step 1, for the image generation module based on the latent layer diffusion model, the image is converted from pixel space to latent space by the encoder; after undergoing the noise-adding and U-Net denoising processes, it is converted back from latent space to pixel space by the decoder, and the overall training objective simplifies to:

L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \left\| \epsilon - \epsilon_\theta(z_t, t) \right\|_2^2 \right]

where the neural backbone \epsilon_\theta(z_t, t) is a time-conditional U-Net, and z_t is generated by adding noise to the input latent feature vector z.
Further, so that the image generation module can generate corresponding pictures on demand, the basic U-Net backbone of the image generation module is enhanced with a cross-attention mechanism, converting it into a more flexible conditional image generator. To preprocess a condition y from various modalities, a domain-specific encoder \tau_\theta is introduced; this encoder projects y to an intermediate representation \tau_\theta(y), which is then mapped into the intermediate layers of the U-Net through a cross-attention layer implementing

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( QK^T / \sqrt{d} \right) \cdot V, \quad Q = W_Q^{(i)} \varphi_i(z_t), \quad K = W_K^{(i)} \tau_\theta(y), \quad V = W_V^{(i)} \tau_\theta(y)

where \varphi_i(z_t) denotes an intermediate representation of the U-Net realizing \epsilon_\theta, and W_Q^{(i)}, W_K^{(i)}, W_V^{(i)} are learnable projection matrices;
Based on such image-condition pairs, the final conditional learning objective is:

L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \left\| \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \right\|_2^2 \right]

where \tau_\theta and \epsilon_\theta are jointly optimized through this formula.
Further, in step 2, the image is divided into n image blocks by the T2T-ViT model; feature extraction and fusion are then carried out on the image blocks by the T2T module, generating a certain number of tokens, after which the class token and the position embedding are combined with them for subsequent processing.
Further, each T2T module has two steps, re-structurization (reshape) and soft split; for an input image I_i, it is converted into tokens by soft split:
T i+1 =SS(I i )
Then T_{i+1}' is generated by a Transformer encoder:
T i+1 ′=MLP(MSA(T i+1 ))
where MSA is a layer-normalized multi-head self-attention operation and MLP is a layer-normalized multi-layer perceptron, as in a standard Transformer; these tokens are then reshaped in the spatial dimension into the image I_{i+1}:
I i+1 =Reshape(T i+1 ′)
where Reshape denotes recombining T_{i+1}' \in \mathbb{R}^{l \times c} into I_{i+1} \in \mathbb{R}^{h \times w \times c}, l is the length of T_{i+1}', h, w, c are the height, width and channel respectively, and l = h \times w.
Further, in step 3, the SE attention mechanism module is divided into two parts, squeeze and excitation, and channel weights are reconstructed by modeling the relationships between channels;
extrusion using global average pooling F sq (.) generating channel level information for the purpose of compressing global space information into a channel descriptor vector Z E R C The method comprises the steps of carrying out a first treatment on the surface of the The channel descriptor vector Z is considered a set of local features, where each element represents the global features of each channel of U; formally representing z= [ Z ] 1 ,z 2 ,…,z c ]It is obtained by compressing the feature map u= [ U ] 1 ,u 2 ,…,u c ]And generated; the C-th element using the spatial dimension H x W, Z of U is calculated as follows:
After compressing the information, excitation is implemented to fully capture the relationships between the channels of U, each element of the descriptor vector Z representing a global feature of the corresponding channel of U; accordingly, two fully connected layers are established, regarded as a mapping function F_{ex}(\cdot) that parameterizes the nonlinear relationship of each element of Z, and the result is then activated by a sigmoid function to obtain the channel weights of U; the excitation equation is expressed as:

S = F_{ex}(Z, V) = \sigma(g(Z, V)) = \sigma(V_2\, \delta(V_1 Z))

where \sigma is the sigmoid function; \delta is the rectified linear unit (ReLU) activation function; V_1 \in \mathbb{R}^{C/R \times C} and V_2 \in \mathbb{R}^{C \times C/R} are the weight matrices of the fully connected layers; R is the dimensionality-reduction ratio of the fully connected layers; S \in \mathbb{R}^C, with values falling between 0 and 1, represents the model's degree of attention to each channel of the feature map U;
The final output of the SE attention mechanism module, obtained by rescaling U with the activations S, is:

x_c' = F_{scale}(u_c, s_c) = s_c \cdot u_c

where X' = [x_1', x_2', \ldots, x_C'], and F_{scale}(u_c, s_c) denotes channel-wise multiplication between the scalar s_c and the feature map u_c \in \mathbb{R}^{H \times W}; the output X' of the SE attention mechanism module is thus U with its channel weights readjusted.
Further, in step 4, a cross entropy loss function and label smoothing are used to prevent over-fitting; the cross entropy loss function formula is as follows:

L = -\sum_{i=1}^{K} y_i \log(x_i)

where K is the number of classes, x_i is the model output after softmax, and y_i indicates whether i is the corresponding category label, expressed by the following formula:

y_i = \begin{cases} 1, & i = \text{target class} \\ 0, & \text{otherwise} \end{cases}

Then a label smoothing method is used, which changes the probability distribution to:

y_i = \begin{cases} 1 - \varepsilon, & i = \text{target class} \\ \varepsilon/(K-1), & \text{otherwise} \end{cases}

where \varepsilon is a small constant.
The application provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the SAR image ship recognition method based on the latent layer diffusion model technology when executing the computer program.
The application provides a computer readable storage medium for storing computer instructions which when executed by a processor realize the steps of the SAR image ship identification method based on the latent layer diffusion model technology.
Compared with the prior art, the application has the beneficial effects that:
the application provides a SAR image ship recognition method based on a latent layer diffusion model technology, which is used for SAR ship recognition tasks. Efficient and accurate SAR image ship identification is realized.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a SAR image ship identification method based on a latent layer diffusion model technology.
Fig. 2 is a network architecture diagram. Wherein, (a) is an overall network structure diagram, (b) is a T2T module, and (c) is an SE attention mechanism module.
Fig. 3 is a schematic diagram of the data generated by the image generation module in an embodiment, where (a) shows the remote sensing Cargo image generation result, (b) shows the remote sensing Fishing image generation result, and (c) shows the remote sensing Tug image generation result.
Fig. 4 is a schematic diagram of a SAR ship identification result in an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application provides a T2T-ViT model improved with a latent layer diffusion model for a SAR image ship identification method. To solve the problem of insufficient SAR ship sample data, an image generation module following the idea of the latent layer diffusion model is added to the T2T-ViT pipeline; it can be trained on the currently limited SAR ship sample data and then generate SAR ship pictures of the corresponding categories from input text information or semantic descriptions, compensating for the lack of training samples. Meanwhile, an SE attention mechanism (Squeeze-and-Excitation) is added into the T2T-ViT model; it computes the weight of each channel from the input feature map and, through this weighting, makes the network pay more attention to features that contribute strongly to tasks such as classification, improving the performance of the model. Moreover, thanks to global pooling, the SE attention mechanism can introduce more information without adding too many parameters, thus avoiding the over-fitting problem.
With reference to fig. 1-4, the application provides a method for identifying a ship from an SAR image based on a latent layer diffusion model technology, which comprises the following steps:
step 1: and processing the current limited sample data by an image generation module, and generating pictures of corresponding categories according to the text information or the semantic description. The enhanced data set is proportionally divided into a training set, a verification set and a test set.
Step 2: and carrying out feature extraction on the input image, selecting a T2T-ViT model as a feature extraction network, converting the input image into feature vectors after image segmentation and T2T module processing, and fusing adjacent feature information.
Step 3: the traditional Transformer Layer is improved, an SE attention mechanism module is added after the multi-head attention, the weight of each channel is calculated according to the input feature map, and the attention degree of important features is increased.
Step 4: and (3) performing multi-task regression on the characteristics by using the classification head, and giving a weight coefficient to adapt to the scene of SAR ship identification, so as to finally obtain an identification result.
Optionally, the data set used in step 1 is the OpenSARShip 2.0 ship data set; considering the problems of class imbalance and excessive invalid samples, the data set is processed by the image generation module to expand it. The data set is then divided into a training set, a validation set and a test set. Finally, the training parameters are set.
Optionally, in step 2, the image is divided into n image blocks through a T2T-ViT model, and then feature extraction and fusion are performed on the image blocks through a T2T module, so as to generate a certain number of tokens. And then combines the class token with the position code position embedding for subsequent processing.
Optionally, step 3 processes the acquired token by Transformer Layer, adds an SE attention mechanism module after multi-head attention, and makes the network pay more attention to the features that have important contributions to tasks such as recognition by a weighted manner.
Optionally, step 4 uses a cross entropy loss function for the identification task, modified with label smoothing to prevent over-fitting.
Examples
The application aims to solve the problem of SAR image ship identification. And (3) efficiently and accurately identifying the SAR ship targets by using the deep learning network and outputting corresponding category information. In order to achieve the object, the embodiment of the application provides a SAR image ship identification method based on a latent layer diffusion model technology, the basic flow of which is shown in figure 1, comprising the following steps:
step 1: and processing the current limited sample data by an image generation module, and generating pictures of corresponding categories according to the text information or the semantic description. The enhanced data set is proportionally divided into a training set, a verification set and a test set.
Step 2: and carrying out feature extraction on the input image, selecting a T2T-ViT model as a feature extraction network, converting the input image into feature vectors after image segmentation and T2T module processing, and fusing adjacent feature information.
Step 3: the traditional Transformer Layer is improved, an SE attention mechanism module is added after the multi-head attention, the weight of each channel is calculated according to the input feature map, and the attention degree of important features is increased.
Step 4: and (3) performing multi-task regression on the characteristics by using the classification head, and giving a weight coefficient to adapt to the scene of SAR ship identification, so as to finally obtain an identification result.
The data set used in step 1 is the OpenSARShip 2.0 ship data set; 320 sample pictures covering three ship classes, cargo ship (Cargo), fishing vessel (Fishing) and tugboat (Tug), are selected. After the existing samples are processed by the image generation module, corresponding text information is input, such as SARCargo, SARFishing, SARTanker, so that sample data are generated and each ship class is expanded by 400 samples. Next, the data set is divided: the training set accounts for 80% of the total number of images and the test set for 20% (both randomly generated), and a part of the training set is randomly selected as the validation set. During training, the input image size is fixed at 224×244, the training batch size is 8, and the number of training iterations is 500.
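The split described above can be sketched as follows; the validation fraction and the random seed are illustrative assumptions, since the text only fixes the 80/20 train/test ratio:

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split samples 80/20 into train/test, then carve a
    validation subset out of the training portion.

    val_frac is the fraction of the training set used for validation
    (an illustrative assumption; the text does not fix this value).
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    train, test = shuffled[:n_train], shuffled[n_train:]
    n_val = int(len(train) * val_frac)
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```

The same split function would be applied to the enhanced (generated plus original) sample list before training.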
For the image generation module based on the latent layer diffusion model, the image is converted from pixel space to latent space by the encoder; after undergoing the noise-adding and U-Net denoising processes, it is converted back from latent space to pixel space by the decoder. The overall training objective can be simplified to:

L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \left\| \epsilon - \epsilon_\theta(z_t, t) \right\|_2^2 \right]
where the neural backbone \epsilon_\theta(z_t, t) is a time-conditional U-Net, and z_t is generated by adding noise to the input latent feature vector z. So that the image generation module can generate corresponding pictures on demand, the basic U-Net backbone is enhanced with a cross-attention mechanism, converting it into a more flexible conditional image generator. To preprocess a condition y from various modalities (such as text information), a domain-specific encoder \tau_\theta is introduced; this encoder projects y to an intermediate representation \tau_\theta(y), which is then mapped into the intermediate layers of the U-Net through a cross-attention layer implementing

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( QK^T / \sqrt{d} \right) \cdot V, \quad Q = W_Q^{(i)} \varphi_i(z_t), \quad K = W_K^{(i)} \tau_\theta(y), \quad V = W_V^{(i)} \tau_\theta(y)

where \varphi_i(z_t) denotes an intermediate representation of the U-Net realizing \epsilon_\theta, and W_Q^{(i)}, W_K^{(i)}, W_V^{(i)} are learnable projection matrices.
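The cross-attention computation above can be sketched minimally as follows; the toy projection matrices stand in for the learned W_Q, W_K, W_V, and in the real module the queries come from U-Net features while keys and values come from the condition encoding \tau_\theta(y):

```python
import numpy as np

def cross_attention(phi_z, tau_y, WQ, WK, WV):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, with queries from
    U-Net features phi_z and keys/values from the condition encoding tau_y.
    WQ, WK, WV play the role of the learnable projection matrices."""
    Q = phi_z @ WQ
    K = tau_y @ WK
    V = tau_y @ WV
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Each row of the attention weight matrix sums to 1, so the output mixes the condition values into every spatial position of the U-Net feature map.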
Based on such image-condition pairs, the final conditional learning objective is specifically:

L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \left\| \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \right\|_2^2 \right]

where \tau_\theta and \epsilon_\theta are jointly optimized through this formula.
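One Monte-Carlo term of this objective can be sketched as below; eps_theta and tau_theta are hypothetical toy callables standing in for the U-Net and the condition encoder, and the noising step omits the diffusion noise schedule for brevity:

```python
import numpy as np

def ldm_loss_term(z, t, y, eps_theta, tau_theta, rng):
    """|| eps - eps_theta(z_t, t, tau_theta(y)) ||_2^2 for one sampled
    noise eps. z is the latent produced by the encoder; z_t is the
    noised latent (here simply z + eps, ignoring the schedule)."""
    eps = rng.standard_normal(z.shape)
    z_t = z + eps
    pred = eps_theta(z_t, t, tau_theta(y))
    return float(np.sum((eps - pred) ** 2))
```

Gradient descent on the average of such terms over sampled (z, y, t) is what jointly updates eps_theta and tau_theta.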
And step 2, dividing an original image into image blocks through a T2T-ViT model, and extracting image features in a T2T module process.
Each T2T module has two steps, re-structurization (reshape) and soft split. For an input image I_i, it is converted into tokens by soft split:
T i+1 =SS(I i )
Then T_{i+1}' is generated by a standard Transformer encoder:
T i+1 ′=MLP(MSA(T i+1 ))
where MSA is a layer-normalized multi-head self-attention operation and MLP is a layer-normalized multi-layer perceptron, as in a standard Transformer. These tokens are then reshaped in the spatial dimension into the image I_{i+1}:
I i+1 =Reshape(T i+1 ′)
where Reshape denotes recombining T_{i+1}' \in \mathbb{R}^{l \times c} into I_{i+1} \in \mathbb{R}^{h \times w \times c}, l is the length of T_{i+1}', h, w, c are the height, width and channel respectively, and l = h \times w.
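The soft split and reshape steps can be sketched as follows; kernel size, stride and padding are illustrative assumptions (the T2T-ViT defaults differ per stage), and overlapping patches are what fuses adjacent feature information:

```python
import numpy as np

def soft_split(I, k=3, s=2, p=1):
    """Soft split: unfold overlapping k x k patches with stride s and zero
    padding p, so neighboring tokens share pixels and local structure is
    fused. I has shape (H, W, C); returns tokens (l, k*k*C) and (h, w)."""
    H, W, C = I.shape
    Ip = np.pad(I, ((p, p), (p, p), (0, 0)))
    h = (H + 2 * p - k) // s + 1
    w = (W + 2 * p - k) // s + 1
    tokens = np.stack([
        Ip[i * s:i * s + k, j * s:j * s + k, :].reshape(-1)
        for i in range(h) for j in range(w)
    ])
    return tokens, (h, w)

def reshape_tokens(T, h, w):
    """Re-structurization: tokens (l, c) with l = h*w back to image (h, w, c)."""
    l, c = T.shape
    assert l == h * w
    return T.reshape(h, w, c)
```

Applying soft_split again to the reshaped output iterates the T2T process, progressively shrinking the token count.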
Step 3 improves on the traditional Transformer Layer by adding an SE attention mechanism module after the multi-head attention. The SE attention mechanism module is divided into two parts, squeeze and excitation, and channel weights are reconstructed by modeling the relationships between channels, so that features in the relevant channel regions become more pronounced.
The squeeze step uses global average pooling F_{sq}(\cdot) to generate channel-level information, compressing the global spatial information into a channel descriptor vector Z \in \mathbb{R}^C. Z is regarded as a set of local-feature statistics, where each element represents the global feature of one channel of U. Formally, Z = [z_1, z_2, \ldots, z_C] is generated by compressing the feature map U = [u_1, u_2, \ldots, u_C] over the spatial dimensions H \times W of U. The c-th element of Z is calculated as:

z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)
After compressing the information, excitation is implemented to fully capture the relationships between the channels of U. Each element of the descriptor vector Z represents a global feature of the corresponding channel of U. Accordingly, two fully connected layers are established, regarded as a mapping function F_{ex}(\cdot) that parameterizes the nonlinear relationship of each element of Z. The result is then activated by a sigmoid function to obtain the channel weights of U. The excitation equation is expressed as:

S = F_{ex}(Z, V) = \sigma(g(Z, V)) = \sigma(V_2\, \delta(V_1 Z))

where \sigma is the sigmoid function; \delta is the rectified linear unit (ReLU) activation function; V_1 \in \mathbb{R}^{C/R \times C} and V_2 \in \mathbb{R}^{C \times C/R} are the weight matrices of the fully connected layers; R is the dimensionality-reduction ratio of the fully connected layers, with a recommended value of 16. Thus S \in \mathbb{R}^C, with values falling between 0 and 1, represents the model's degree of attention to each channel of the feature map U.
The final output of the SE attention mechanism module, obtained by rescaling U with the activations S, is:

x_c' = F_{scale}(u_c, s_c) = s_c \cdot u_c

where X' = [x_1', x_2', \ldots, x_C'], and F_{scale}(u_c, s_c) denotes channel-wise multiplication between the scalar s_c and the feature map u_c \in \mathbb{R}^{H \times W}. The output X' of the SE attention mechanism module is thus U with its channel weights readjusted. In the course of task learning, the weights of task-relevant channels are increased, improving the expressive capability of the features.
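The squeeze, excitation and rescale steps above can be sketched minimally in numpy; the toy weight matrices V1 and V2 stand in for the two learned fully connected layers:

```python
import numpy as np

def se_block(U, V1, V2):
    """Squeeze-and-Excitation over a feature map U of shape (C, H, W).
    V1: (C//R, C) reduction weights; V2: (C, C//R) expansion weights."""
    # Squeeze: global average pooling -> channel descriptor z in R^C
    z = U.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving per-channel weights in (0, 1)
    relu = lambda x: np.maximum(x, 0.0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    s = sigmoid(V2 @ relu(V1 @ z))
    # Rescale: x_c' = s_c * u_c, channel-wise multiplication
    return U * s[:, None, None]
```

With all-zero weights the sigmoid outputs 0.5 for every channel, i.e. a uniform, uninformative attention map; training is what pushes the weights toward task-relevant channels.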
Step 4 employs a cross entropy loss function and label smoothing to prevent over-fitting problems.
The cross entropy loss function formula is as follows:

L = -\sum_{i=1}^{K} y_i \log(x_i)

where K is the number of classes, x_i is the model output after softmax, and y_i indicates whether i is the corresponding category label, which can be expressed by the following formula:

y_i = \begin{cases} 1, & i = \text{target class} \\ 0, & \text{otherwise} \end{cases}
the processing can lead to neglect of the relation between the real tag and other tags, and the model is easy to influence when the classification and identification problems of SAR ship data sets with high sample similarity and high data noise are processed. The label smoothing method is used later, so that the probability distribution is changed, resulting in:
where ε is an infinitesimal constant, which makes the probability optimal targets in softmax loss no longer 1 and 0, which avoids over-fitting to some extent and also mitigates the effects of false labels.
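A minimal sketch of the smoothed cross entropy under the convention above, with \varepsilon distributed uniformly over the K−1 non-target classes (one common smoothing variant):

```python
import numpy as np

def smoothed_cross_entropy(probs, target, eps=0.1):
    """Cross entropy of softmax outputs `probs` against a label-smoothed
    target: 1 - eps for the true class, eps/(K-1) for every other class."""
    K = len(probs)
    y = np.full(K, eps / (K - 1))
    y[target] = 1.0 - eps
    return float(-np.sum(y * np.log(probs)))
```

Setting eps=0 recovers the plain cross entropy, and any eps>0 penalizes fully confident predictions, which is the over-fitting guard described above.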
The application provides a SAR image ship recognition method based on a latent layer diffusion model technology, which is used for SAR ship recognition tasks. Efficient and accurate SAR image ship identification is realized.
The application provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the SAR image ship recognition method based on the latent layer diffusion model technology when executing the computer program.
The application provides a computer readable storage medium for storing computer instructions which when executed by a processor realize the steps of the SAR image ship identification method based on the latent layer diffusion model technology.
The memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous DRAM (SLDRAM), and direct memory bus RAM (DRRAM). It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in a processor or instructions in software form. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The SAR image ship identification method based on the latent layer diffusion model technology is described in detail, and specific examples are applied to illustrate the principle and the implementation mode of the method, and the description of the examples is only used for helping to understand the method and the core idea of the method; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.
Claims (10)
1. A SAR image ship identification method based on latent layer diffusion model technology, characterized in that the method comprises the following steps:
step 1: process the currently limited sample data with an image generation module to generate pictures of the corresponding categories from text information or semantic descriptions; divide the enhanced data set into a training set, a validation set and a test set according to a set proportion;
step 2: extract features from the input image, using a T2T-ViT model as the feature extraction network; after image segmentation and T2T module processing, the input image is converted into feature vectors, and adjacent feature information is fused;
step 3: add an SE attention mechanism module after the multi-head attention, which calculates the weight of each channel from the input feature map and increases the attention paid to important features;
step 4: perform multi-task regression on the features with the classification head, assigning weight coefficients to suit the SAR ship identification scenario, to finally obtain the identification result.
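Step 1's train/validation/test split can be sketched as follows. This is a minimal illustration only: the function name `split_dataset` and the 70/15/15 proportion are assumptions, since the claim merely requires splitting "according to a set proportion".

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and split samples into train/validation/test subsets.
    The 70/15/15 proportion is an illustrative assumption, not a value
    fixed by the patent."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)          # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# 320 samples, as in the OpenSARShip 2.0 selection of claim 2.
train, val, test = split_dataset(list(range(320)))
```

With 320 samples this yields 224/48/48 training/validation/test pictures.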
2. The method according to claim 1, characterized in that: in step 1, the data set used is the OpenSARShip 2.0 ship data set, from which 320 sample pictures of three ship categories are selected: cargo ship (Cargo), fishing vessel (Fishing) and tugboat (Tug).
3. The method according to claim 1, characterized in that: in step 1, the image generation module is built on a latent layer diffusion model: the image is converted from pixel space to latent space by an encoder, subjected to a noising process and U-Net denoising, and then converted back from latent space to pixel space by a decoder; the overall training objective is simplified to:

L_LDM = 𝔼_{ℰ(x), ε∼N(0,1), t}[ ‖ε − ε_θ(z_t, t)‖₂² ]
wherein the time-conditional denoising backbone ε_θ(z_t, t) is a U-Net, and z_t is generated by adding noise to the input latent feature vector z.
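One training step of claim 3's simplified objective can be sketched as follows. The linear noise schedule and the stand-in noise predictor are illustrative assumptions, not the patent's actual U-Net or schedule:

```python
import numpy as np

def ldm_training_loss(z, t, noise_predictor, rng, T=1000):
    """One step of the simplified latent-diffusion objective
    L = E||eps - eps_theta(z_t, t)||_2^2: noise the latent z into z_t,
    predict the noise, and return the mean squared error."""
    eps = rng.standard_normal(z.shape)        # target noise epsilon
    alpha_bar = 1.0 - t / T                   # toy linear schedule (assumption)
    z_t = np.sqrt(alpha_bar) * z + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = noise_predictor(z_t, t)         # stands in for the U-Net eps_theta
    return float(np.mean((eps - eps_hat) ** 2))

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8, 8))            # dummy latent feature map
# A predictor that always outputs zero, so the loss is roughly E[eps^2].
loss = ldm_training_loss(z, t=500,
                         noise_predictor=lambda z_t, t: np.zeros_like(z_t),
                         rng=rng)
```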
4. A method according to claim 3, characterized in that: in order that the image generation module can generate the required pictures on demand, the underlying U-Net backbone of the image generation module is enhanced with a cross attention mechanism, turning it into a more flexible conditional image generator; to preprocess the condition y from various modalities, a domain-specific encoder τ_θ is introduced, which projects y to an intermediate representation τ_θ(y); this representation is then mapped into the intermediate layers of the U-Net through a cross-attention layer implementing:

Attention(Q, K, V) = softmax(Q·Kᵀ/√d)·V

with Q = W_Q^(i)·φ_i(z_t), K = W_K^(i)·τ_θ(y), V = W_V^(i)·τ_θ(y);
wherein φ_i(z_t) denotes a (flattened) intermediate representation of the U-Net implementing ε_θ; W_Q^(i), W_K^(i) and W_V^(i) are learnable projection matrices;
based on image and condition pairs, the final conditional objective to be learned is:

L_LDM = 𝔼_{ℰ(x), y, ε∼N(0,1), t}[ ‖ε − ε_θ(z_t, t, τ_θ(y))‖₂² ]
wherein τ_θ and ε_θ are jointly optimized through the above formula.
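Claim 4's cross-attention conditioning can be sketched numerically as follows. The tensor sizes are illustrative, with `phi` standing in for the flattened U-Net representation φ_i(z_t) and `tau_y` for the condition encoding τ_θ(y):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(phi, tau_y, W_Q, W_K, W_V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, with queries from
    the U-Net's intermediate representation and keys/values from the
    condition encoding, as in claim 4."""
    Q = phi @ W_Q                 # (N, d) queries
    K = tau_y @ W_K               # (M, d) keys
    V = tau_y @ W_V               # (M, d) values
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # (N, M) attention weights
    return A @ V                  # (N, d) conditioned features

rng = np.random.default_rng(0)
phi = rng.standard_normal((16, 32))     # flattened U-Net features (assumed sizes)
tau_y = rng.standard_normal((8, 24))    # encoded text/semantic condition
out = cross_attention(phi, tau_y,
                      rng.standard_normal((32, 20)),
                      rng.standard_normal((24, 20)),
                      rng.standard_normal((24, 20)))
```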
5. The method according to claim 4, wherein: in step 2, the image is divided into n image blocks by the T2T-ViT model; the T2T module performs feature extraction and fusion on the image blocks to generate a certain number of tokens, which are combined with a class token and position embeddings for subsequent processing.
6. The method according to claim 5, wherein: each T2T module comprises two steps, re-structurization and soft split; an input image I_i is first converted into tokens by soft split:
T_{i+1} = SS(I_i)
a transformer encoder then generates T_{i+1}′:
T_{i+1}′ = MLP(MSA(T_{i+1}))
wherein MSA is the multi-head self-attention operation with layer normalization and MLP is the layer-normalized multi-layer perceptron of a standard transformer; the tokens T_{i+1}′ are then reshaped along the spatial dimension into an image I_{i+1}:
I_{i+1} = Reshape(T_{i+1}′)
wherein Reshape denotes recombining T_{i+1}′ ∈ R^(l×c) into I_{i+1} ∈ R^(h×w×c), where l is the length of T_{i+1}′, h, w and c are the height, width and number of channels respectively, and l = h × w.
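Claim 6's soft split and Reshape operations can be sketched as follows. The patch size k = 3 and stride 2 are illustrative choices, not values fixed by the patent; overlapping patches are what lets adjacent tokens share local structure:

```python
import numpy as np

def soft_split(image, k=3, stride=2):
    """Soft split (SS): unfold overlapping k x k patches with the given
    stride, flattening each patch into one token, so that adjacent
    tokens share pixels and local structure is preserved."""
    h, w, c = image.shape
    tokens = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            tokens.append(image[i:i + k, j:j + k, :].reshape(-1))
    return np.stack(tokens)                 # (l, k*k*c) tokens

def restructurize(tokens, h, w):
    """Reshape: view l = h*w tokens of length c back as an h x w x c image."""
    l, c = tokens.shape
    assert l == h * w
    return tokens.reshape(h, w, c)

img = np.arange(8 * 8 * 1, dtype=float).reshape(8, 8, 1)  # dummy 8x8 image
T = soft_split(img)                  # 3x3 grid of overlapping 3x3 patches
I_next = restructurize(T, 3, 3)      # tokens viewed as the next-stage image
```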
7. The method according to claim 6, wherein: in step 3, the SE attention mechanism module is divided into two parts, squeeze and excitation, which reconstruct the channel weights by modeling the relations between channels;
extrusion using global average pooling F sq (.) generating channel level information for the purpose of compressing global space information into a channel descriptor vector Z E R C The method comprises the steps of carrying out a first treatment on the surface of the The channel descriptor vector Z is considered a set of local features, where each element represents the global features of each channel of U; formally representing z= [ Z ] 1 ,z 2 ,…,z c ]It is obtained by compressing the feature map u= [ U ] 1 ,u 2 ,…,u c ]And generated; the C-th element using the spatial dimension H x W, Z of U is calculated as follows:
after compressing the information, implementing excitation to fully capture the relationship between the channels of U, each element of the descriptor vector Z representing a global feature of the corresponding channel of U; thus, two fully connected layers are established, which are regarded as mapping functions F ex Parameterizing the nonlinear relationship of each element of Z, and then activating the parameters by an S-shaped activation function to obtain channel weights at the U-pixel level; the excitation equation is expressed as:
S = F_ex(Z, W) = σ(g(Z, V)) = σ(V_2·δ(V_1·Z))
wherein σ is the sigmoid function; δ is the rectified linear unit (ReLU) activation function; V_1 ∈ R^(C/r×C) and V_2 ∈ R^(C×C/r) are the weight matrices of the two fully connected layers; C/r is the reduced dimension of the first fully connected layer, with r the reduction ratio; S ∈ R^C takes values between 0 and 1, representing the model's degree of attention to each channel of the feature map U;
the final output of the SE attention mechanism module, obtained by rescaling U with the activations S, is:
x_c′ = F_scale(u_c, s_c) = u_c·s_c
wherein X′ = [x_1′, x_2′, …, x_C′] and F_scale(u_c, s_c) denotes channel-wise multiplication between the scalar s_c and the feature map u_c ∈ R^(H×W); the output X′ of the SE attention mechanism module is thus U with its channel weights readjusted.
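Claim 7's squeeze, excitation and rescaling can be sketched end to end as follows. The tensor sizes and random weights are illustrative, with V1 and V2 playing the roles of the weight matrices V_1 ∈ R^(C/r×C) and V_2 ∈ R^(C×C/r):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(U, V1, V2):
    """Squeeze-and-excitation: global-average-pool each channel of
    U (H, W, C) to the descriptor z, pass z through two fully connected
    layers (ReLU then sigmoid) to get per-channel weights s in (0, 1),
    and rescale U channel-wise."""
    z = U.mean(axis=(0, 1))                       # squeeze: (C,)
    s = sigmoid(V2 @ np.maximum(V1 @ z, 0.0))     # excitation: (C,)
    return U * s                                  # rescale each channel

rng = np.random.default_rng(0)
C, r = 8, 2                                       # channels and reduction ratio
U = rng.standard_normal((4, 4, C))                # dummy feature map
V1 = rng.standard_normal((C // r, C))             # (C/r, C) reduction layer
V2 = rng.standard_normal((C, C // r))             # (C, C/r) expansion layer
X = se_block(U, V1, V2)
```

Because the sigmoid keeps every weight s_c strictly between 0 and 1, each channel of the output is a damped copy of the corresponding channel of U.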
8. The method according to claim 7, wherein: in step 4, a cross entropy loss function and label smoothing are used to prevent overfitting;
the cross entropy loss function is:

L = −Σ_{i=1}^{K} y_i·log(x_i)

wherein x_i is the model output after softmax, and y_i indicates whether i is the corresponding category label:

y_i = 1 if i is the target category, and y_i = 0 otherwise;

label smoothing is then applied, changing this probability distribution to:

y_i = 1 − ε if i is the target category, and y_i = ε/(K − 1) otherwise;

where ε is a small constant and K is the number of categories.
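Claim 8's label-smoothed cross entropy can be sketched as follows; ε = 0.1 is a common illustrative value, not one fixed by the patent:

```python
import numpy as np

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross entropy with label smoothing: the one-hot target becomes
    1 - eps on the true class and eps/(K - 1) on each other class,
    where K is the number of categories."""
    K = logits.shape[0]
    x = np.exp(logits - logits.max())
    x = x / x.sum()                         # softmax probabilities x_i
    y = np.full(K, eps / (K - 1))           # smoothed off-target mass
    y[target] = 1.0 - eps                   # smoothed target mass
    return float(-(y * np.log(x)).sum())    # L = -sum_i y_i log x_i

loss = smoothed_cross_entropy(np.array([2.0, 0.5, 0.1]), target=0)
```

In this example the prediction already favors the true class, so the smoothed loss is larger than the plain cross entropy (eps = 0), reflecting the regularizing pull toward a less confident distribution.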
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1-8 when executing the computer program.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311065024.0A CN117173562A (en) | 2023-08-23 | 2023-08-23 | SAR image ship identification method based on latent layer diffusion model technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117173562A true CN117173562A (en) | 2023-12-05 |
Family
ID=88944004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311065024.0A Pending CN117173562A (en) | 2023-08-23 | 2023-08-23 | SAR image ship identification method based on latent layer diffusion model technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117173562A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113435288A (en) * | 2021-06-21 | 2021-09-24 | 南京航空航天大学 | SAR image ship target identification method based on MFF-MA module |
CN113537243A (en) * | 2021-07-23 | 2021-10-22 | 广东工业大学 | Image classification method based on SE module and self-attention mechanism network |
CN115272685A (en) * | 2022-06-21 | 2022-11-01 | 北京科技大学 | Small sample SAR ship target identification method and device |
US20220415027A1 (en) * | 2021-06-29 | 2022-12-29 | Shandong Jianzhu University | Method for re-recognizing object image based on multi-feature information capture and correlation analysis |
CN115841629A (en) * | 2022-12-12 | 2023-03-24 | 中国人民武装警察部队海警学院 | SAR image ship detection method based on convolutional neural network |
Non-Patent Citations (2)
Title |
---|
LI YUAN 等: "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet", ARXIV, 30 November 2021 (2021-11-30) * |
LI JIAQI; JIANG ZHENGJIE; YAO LIBO; JIAN TAO: "A Ship Target Fusion Recognition Algorithm Based on Deep Learning", SHIP ELECTRONIC ENGINEERING, no. 09, 20 September 2020 (2020-09-20) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||