CN114565856A - Target identification method based on multiple fusion deep neural networks - Google Patents

Target identification method based on multiple fusion deep neural networks

Info

Publication number
CN114565856A
CN114565856A (application CN202210178011.3A)
Authority
CN
China
Prior art keywords
network
fusion
convolution
layer
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210178011.3A
Other languages
Chinese (zh)
Inventor
白雪茹 (Bai Xueru)
毛宇航 (Mao Yuhang)
周雪宁 (Zhou Xuening)
刘潇丹 (Liu Xiaodan)
周峰 (Zhou Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210178011.3A
Publication of CN114565856A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target identification method based on a multiple fusion deep neural network, which mainly addresses the low accuracy of space target identification in the prior art. The implementation steps are: (1) acquire a training sample set and a test sample set; (2) construct a target recognition model based on a multiple fusion deep neural network; (3) iteratively train the target recognition model; (4) obtain the identification results for the target ISAR image and optical image. By performing multi-modal fusion recognition on the ISAR image and the optical image of a target, the invention extracts both channel-attention-based fusion features and bilinear pooling fusion features, so that the network captures richer target feature information and pays more attention to the different contributions and importance of the features during feature splicing, thereby improving the recognition accuracy for space targets.

Description

Target identification method based on multiple fusion deep neural networks
Technical Field
The invention belongs to the technical field of image processing and relates to a target identification method, in particular to a target identification method that fuses ISAR (inverse synthetic aperture radar) images and optical images with a multiple fusion deep neural network, which can be used to effectively identify space targets.
Background
The task of target identification is to distinguish targets according to the differing characteristics that targets of different classes exhibit in the observed information. For radar targets, traditional identification methods extract effective features from raw data and perform class decisions with manually designed feature extractors, which requires considerable time and expert knowledge; in many tasks it is also difficult to judge whether the extracted features are effective, which makes radar target identification considerably harder. In recent years, data-driven deep learning methods for radar target identification have developed rapidly: they avoid the complex feature design and selection process and learn effective features automatically from data. However, existing deep learning methods for radar target identification are mainly single-modal, that is, feature extraction and class decision use data of a single modality only, so the available information may be insufficient and the requirement of accurate target identification may not be met. Deep learning multi-modal fusion recognition exploits the complementary information of several modalities and removes redundant information to learn better feature representations, thereby effectively improving the recognition performance of the model. Existing deep-learning-based multi-modal fusion recognition methods mainly use convolutional neural networks for feature extraction, but the fusion scheme is simple, the different contributions and importance of the features are not considered, and the feature space is not enriched by extracting several kinds of features, so there is still room to improve the recognition accuracy.
The published paper "radio-Based Human goal Recognition Using Dual-Channel Deep computational Neural Network" (IEEE Transactions on Geoscience and Remote Sensing,2019) of Xueruu Bai, Ye Hui, Li Wang, Feng Zhou proposes a Human body posture Recognition method Based on a two-Channel Convolutional Neural Network. Firstly, constructing a short-window long-channel module and a long-window long-channel module, and constructing a two-channel convolutional neural network based on the short-window long-channel module and the long-window long-channel module; the short-window long-time-frequency graph and the long-window long-time-frequency graph obtained by respectively processing short-window long STFT and long-window long STFT by using radar echo signals are used as training samples, wherein the characteristics of four limbs of a human body in the short-window long-time-frequency graph are obvious, and the characteristics of the trunk of the long-window long-time-frequency graph are obvious; training the two-channel convolutional neural network by using the training sample; and inputting the test samples into the trained dual-channel convolutional neural network to complete recognition. According to the method, the two-channel convolution neural network is constructed, the short-window long-time-frequency graph and the long-window long-time-frequency graph extracted from the radar echo signal are subjected to fusion recognition, the network can simultaneously utilize the four limbs information and the trunk information of a human body contained in the radar echo signal, the complementary information of two modal data is effectively utilized, and therefore the model recognition performance is effectively improved. However, the method has the disadvantages that only convolution characteristics are extracted from two modal data, and only simple characteristic splicing is carried out on the two modal convolution characteristics, so that the problems that characteristic information is not rich enough, different contributions to the characteristics and importance attention is not enough are caused, and the accuracy of target identification is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a target identification method that fuses ISAR (inverse synthetic aperture radar) images and optical images with a multiple fusion deep neural network, in order to solve the technical problems that, in prior-art multi-modal fusion recognition of a target, the feature information is not rich enough, the different contributions and importance of the features receive insufficient attention, and the target is therefore difficult to identify accurately.
The technical idea of the invention is as follows: a multiple fusion deep neural network is constructed, and the trained network directly performs fusion recognition on the ISAR image and the optical image of a target, which addresses the problems in the prior art that feature information is not rich enough, the details of feature splicing receive insufficient attention, and the recognition accuracy is low. The channel attention fusion network performs fine-grained convolution feature fusion on the ISAR image and the optical image of the target, avoiding the prior-art difficulty of attending to feature details on individual channels; meanwhile, the bilinear pooling fusion network further extracts bilinear pooling features from the convolution features and fuses them, realizing bilinear pooling feature extraction of the target. The bilinear pooling fusion features and the channel-attention-based fusion features are then additively fused by the addition fusion network, which increases the richness of the target features and overcomes the prior-art difficulty of accurately recognizing a target by fusing its ISAR and optical images.
In order to realize the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring a training sample set R and a test sample set E:
obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, normalizing each ISAR image A_s and each optical image B_s, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle;
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, wherein the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, and wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel;
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network which are arranged in parallel, and a feature fusion layer connected with the output ends of the two bilinear pooling feature extraction networks;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion depth network:
(3a) initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1;
(3b) taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
(3c2) the additive fusion network performs additive fusion on each bilinear pooling fusion feature obtained by the bilinear pooling fusion network and the fusion feature based on the channel attention obtained by the corresponding channel attention fusion network to obtain a prediction label y of each target in the training sample set R;
(3d) using the cross-entropy loss function, compute the loss value L_t of H_t from the predicted label y of each target and its corresponding real label y*, compute the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t, and then update the weight parameter ω_t by gradient descent through ω_{t+1} = ω_t - η·∂L_t/∂ω_t;
(3e) judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the predicted labels of all targets contained in E; the target category corresponding to each predicted label is the recognition result.
Compared with the prior art, the invention has the following advantages:
Firstly, the target recognition model constructed by the invention contains a multiple fusion deep neural network. During model training and when obtaining recognition results, the bilinear pooling fusion network in the multiple fusion deep neural network extracts bilinear pooling fusion features of the target while the channel attention fusion network extracts convolution fusion features, so the model extracts two kinds of fusion features. This increases the richness of the information captured by feature extraction, overcomes the prior-art problem that neural-network-based recognition extracts only a single kind of target feature and the feature information is therefore not rich enough, and effectively improves the accuracy of target identification.
Secondly, during model training and when obtaining recognition results, the bilinear pooling fusion network strengthens important features through adaptive weights when fusing the bilinear pooling features of the target; and when the channel attention fusion network fuses the target convolution features, channel attention makes the model focus on the most informative channel features and suppress unimportant ones. Therefore, when the bilinear pooling fusion network and the channel attention fusion network perform feature fusion, the prior-art problem that the different contributions and importance of features receive insufficient attention during plain feature concatenation is overcome, better fusion features are extracted from the target, and the accuracy of target identification is further improved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a structure of an object recognition model constructed by the present invention;
fig. 3 is a schematic structural diagram of an additive fusion network constructed by the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
Referring to fig. 1, the present invention includes the following steps.
(1) Acquiring a training sample set R and a test sample set E:
Obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, each image containing one target; normalizing each ISAR image A_s and each optical image B_s to alleviate gradient explosion in the network, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle; in this embodiment, K = 4, S = 858, and M = 468;
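For illustration, the Python sketch below shows one way the sample sets could be prepared; the patent does not specify the normalization formula or the split procedure, so min-max scaling and a random split are assumptions here.

```python
import numpy as np

def normalize(img):
    """Normalize one image; the patent only states that each ISAR and optical
    image is normalized, so min-max scaling to [0, 1] is assumed here."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def split_samples(isar_imgs, opt_imgs, labels, m=468, seed=0):
    """Split the S paired samples into M training and S - M test samples
    (the embodiment uses S = 858 and M = 468); a random split is assumed."""
    idx = np.random.RandomState(seed).permutation(len(labels))
    pick = lambda ids: ([normalize(isar_imgs[i]) for i in ids],
                        [normalize(opt_imgs[i]) for i in ids],
                        [labels[i] for i in ids])
    return pick(idx[:m]), pick(idx[m:])
```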
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
Constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, whose structure is shown in fig. 2; the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel; the first convolution feature extraction network and the second convolution feature extraction network both comprise six network layers, each network layer comprises a convolution layer, a batch normalization layer, a ReLU activation layer and a maximum value pooling layer which are sequentially cascaded, and specific parameters are set as follows: the convolution kernel size of the convolution layer in the first network layer is 5 multiplied by 5 pixels, the convolution kernel number is 32, the channel number of the batch normalization layer is 32, and the window size of the maximum pooling layer is 2 multiplied by 2 pixels; the convolution kernel size of the convolution layers in the second network layer and the third network layer is 5 x 5 pixels, the number of the convolution kernels is 64, the number of channels of the batch normalization layer is 64, and the window size of the maximum pooling layer is 2 x 2 pixels; the convolution kernel size of the convolution layer in the fourth to sixth network layers is 3 × 3 pixels, the number of convolution kernels is 128, the number of channels of the batch normalization layer is 128, and the window size of the maximum pooling layer is 2 × 2 pixels;
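A minimal sketch of one such convolution feature extraction branch is given below, using the Keras API; the patent's experiments use TensorFlow 1.5, but the exact code is not part of the source, so the framework, padding and input size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_feature_extraction_network():
    """One convolution feature extraction branch: six network layers, each a
    cascade of convolution -> batch normalization -> ReLU -> 2x2 max pooling,
    with the kernel sizes and counts listed above."""
    cfg = [(5, 32), (5, 64), (5, 64), (3, 128), (3, 128), (3, 128)]
    net = tf.keras.Sequential()
    for kernel, filters in cfg:
        net.add(layers.Conv2D(filters, kernel, padding="same"))  # padding is an assumption
        net.add(layers.BatchNormalization())
        net.add(layers.ReLU())
        net.add(layers.MaxPooling2D(pool_size=2))
    return net

# Two branches in parallel: one for ISAR images, one for optical images.
isar_branch = conv_feature_extraction_network()
optical_branch = conv_feature_extraction_network()
```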
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network, a second bilinear pooling feature extraction network and a feature fusion layer, wherein the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network are arranged in parallel, and the feature fusion layer is connected with the output ends of the two bilinear pooling feature extraction networks; the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network both comprise a bilinear pooling layer and a feature normalization layer which are connected in sequence;
the channel attention fusion network comprises a characteristic channel splicing layer and a channel attention unit which are connected in sequence; the channel attention unit comprises a global average pooling layer and a global maximum pooling layer which are arranged in parallel, and a first convolution layer, a ReLU active layer, a second convolution layer and a Sigmoid active layer which are sequentially cascaded with the output of the global average pooling layer and the output of the global maximum pooling layer, wherein the output dimensionalities of the global average pooling layer and the output dimensionality of the global maximum pooling layer are both 1 x 1, the first convolution layer comprises 16 convolution kernels, the size of each convolution kernel is 1 x 1, the second convolution layer comprises 256 convolution kernels, and the size of each convolution kernel is 1 x 1;
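The sketch below illustrates how such a channel attention fusion network could be written with the Keras API; the text does not spell out how the two pooled branches are merged before the Sigmoid, so the standard add-then-Sigmoid form of channel attention is assumed.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ChannelAttentionFusion(tf.keras.layers.Layer):
    """Concatenate the two 128-channel convolution features along the channel
    axis (giving 256 channels), then reweight the channels with an attention
    mask built from global average pooling and global max pooling followed by
    a 16-kernel 1x1 convolution, ReLU, a 256-kernel 1x1 convolution and a
    Sigmoid, as described above."""

    def __init__(self):
        super().__init__()
        self.conv1 = layers.Conv2D(16, 1, activation="relu")  # first 1x1 conv + ReLU
        self.conv2 = layers.Conv2D(256, 1)                    # second 1x1 conv

    def call(self, isar_feat, optical_feat):
        x = tf.concat([isar_feat, optical_feat], axis=-1)     # feature channel splicing
        avg = tf.reduce_mean(x, axis=[1, 2], keepdims=True)   # global average pooling, 1x1
        mx = tf.reduce_max(x, axis=[1, 2], keepdims=True)     # global max pooling, 1x1
        # Adding the two pooled branches before the Sigmoid is an assumption.
        att = tf.sigmoid(self.conv2(self.conv1(avg)) + self.conv2(self.conv1(mx)))
        return x * att                                        # channel-attention fused feature
```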
The structure of the addition fusion network is shown in fig. 3; it comprises a first fully-connected network and a second fully-connected network arranged in parallel, and an addition fusion layer and a Softmax layer sequentially cascaded with the outputs of the two fully-connected networks; the first and second fully-connected networks each comprise two fully-connected layers, where the number of neurons in the first fully-connected layer is 128 and the number of neurons in the second fully-connected layer is 4;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion deep network:
(3a) Initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1; in this embodiment, T = 20;
(3b) Taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on each extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
A bilinear pooling layer in the first bilinear pooling feature extraction network performs bilinear pooling on the convolution feature f of each ISAR image, producing x, and the feature normalization layer then normalizes x, producing z:

x = Σ_{l∈L} vec( f(l) f^T(l) )

z = sgn(x)·√|x| / || sgn(x)·√|x| ||_2

where f denotes the input convolution feature of the bilinear pooling layer, f(l) its D-dimensional value at spatial position l ∈ L, D the number of channels of the input convolution feature, Σ the summation operation, vec the flattening of a matrix into a vector, f^T(l) the transpose of f(l), sgn the sign operation, and ||·||_2 the 2-norm;

The feature fusion layer performs feature fusion on the extracted bilinear pooling feature F_BI of each ISAR image and the bilinear pooling feature F_BO of its corresponding optical image, with output F_B:

F_B = concat( W_BI ⊙ F_BI , W_BO ⊙ F_BO )

where W_BI and W_BO denote learnable weights, ⊙ denotes element-wise (co-located element) multiplication, and concat(·) denotes concatenation of features along the channel dimension;
By extracting bilinear pooling features from the ISAR and optical convolution features first and fusing them afterwards, the bilinear pooling fusion network avoids the multiplied computation, and the resulting model over-fitting, that would occur if the convolution features were fused first and bilinear pooling were then applied to the fused feature. At the same time, when fusing features, the bilinear pooling fusion network attends to the channel-wise details of the bilinear pooling features and highlights important features through adaptive weights, so the bilinear pooling fusion feature of the target is extracted more effectively. The channel attention fusion network fuses the convolution features of the target ISAR image and optical image on the basis of channel attention, so the model focuses on the most informative channel features, suppresses unimportant ones, and summarizes the convolution features better. Extracting both the convolution fusion feature and the bilinear pooling feature of the target effectively alleviates the problem of insufficient feature extraction in deep learning.
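As a concrete illustration of the formulas above, the NumPy sketch below computes the bilinear pooling feature of one convolution feature map and the weighted channel-wise concatenation; the learnable weights are passed in as plain arrays purely for illustration.

```python
import numpy as np

def bilinear_pooling(conv_feat):
    """Bilinear pooling of one convolution feature map f with shape (H, W, D):
    x = sum over l of vec(f(l) f(l)^T), followed by a signed square root and
    2-norm normalization, matching the formulas above."""
    d = conv_feat.shape[-1]
    f = conv_feat.reshape(-1, d)                     # one D-dim vector f(l) per position l
    x = (f[:, :, None] * f[:, None, :]).sum(axis=0)  # sum_l f(l) f(l)^T, shape (D, D)
    x = x.reshape(-1)                                # vec(.)
    z = np.sign(x) * np.sqrt(np.abs(x))              # signed square root
    return z / (np.linalg.norm(z) + 1e-12)           # 2-norm normalization

def bilinear_fusion(f_bi, f_bo, w_bi, w_bo):
    """Feature fusion layer: F_B = concat(W_BI * F_BI, W_BO * F_BO); the
    learnable weights are plain arrays here for illustration."""
    return np.concatenate([w_bi * f_bi, w_bo * f_bo])
```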
(3c2) The addition fusion network additively fuses each bilinear pooling fusion feature obtained by the bilinear pooling fusion network with the corresponding channel-attention-based fusion feature obtained by the channel attention fusion network to obtain the prediction label y of each target in the training sample set R. The addition fusion layer fuses the output F_B′ of the first fully-connected network and the output F_C′ of the second fully-connected network, with output F:

F = W_B ⊙ F_B′ + W_C ⊙ F_C′

where W_B and W_C denote learnable weights and ⊙ denotes element-wise (co-located element) multiplication.
Through the addition fusion network, the bilinear pooling fusion feature extracted by the bilinear pooling fusion network and the channel-attention-based fusion feature extracted by the channel attention fusion network are fused; this fusion of the two networks increases the richness of the target features and thus improves the accuracy with which the deep neural network identifies the target from its ISAR and optical images.
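A possible Keras realization of the addition fusion network described above (two parallel fully-connected networks, weighted element-wise addition and Softmax) is sketched below; the activation between the two dense layers and the exact shape of the learnable weights W_B and W_C are not specified in the source and are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

class AdditionFusionNetwork(tf.keras.layers.Layer):
    """Two parallel fully-connected networks (128 -> 4 neurons), a weighted
    element-wise addition of their outputs, and a Softmax layer."""

    def __init__(self, num_classes=4):
        super().__init__()
        def fc_net():  # ReLU between the two dense layers is an assumption
            return tf.keras.Sequential([layers.Flatten(),
                                        layers.Dense(128, activation="relu"),
                                        layers.Dense(num_classes)])
        self.fc_bilinear = fc_net()   # fed with the bilinear pooling fusion feature
        self.fc_channel = fc_net()    # fed with the channel-attention fusion feature
        # W_B and W_C modelled as per-element weight vectors (an assumption).
        self.w_b = self.add_weight(name="w_b", shape=(num_classes,), initializer="ones")
        self.w_c = self.add_weight(name="w_c", shape=(num_classes,), initializer="ones")

    def call(self, bilinear_fused, channel_fused):
        f_b = self.fc_bilinear(bilinear_fused)  # output of the first fully-connected network
        f_c = self.fc_channel(channel_fused)    # output of the second fully-connected network
        return tf.nn.softmax(self.w_b * f_b + self.w_c * f_c)  # weighted addition, then Softmax
```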
(3d) Using the cross-entropy loss function, the loss value L_t of H_t is computed from the predicted label y of each target and its corresponding real label y*, the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t is computed, and the weight parameter ω_t is then updated by gradient descent. The loss and the update are computed as:

L_t = -(1/M) Σ_{m=1}^{M} y*_m · ln(y_m)

ω_{t+1} = ω_t - η · ∂L_t/∂ω_t

where M denotes the number of training samples, y*_m denotes the real label corresponding to the mth sample in the training sample set, y_m denotes the prediction of model H_t for the mth sample, ln denotes the natural (base-e) logarithm, ω_{t+1} denotes the updated value of ω_t, η denotes the learning rate for L_t, and ∂ denotes the partial derivative operation.
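The loss and update formulas above can be checked with a small NumPy sketch; the gradient is assumed to come from back-propagation and is passed in directly here.

```python
import numpy as np

def cross_entropy_loss(y_true_onehot, y_pred):
    """L_t = -(1/M) * sum_m y*_m . ln(y_m); both arguments have shape (M, K)."""
    m = y_true_onehot.shape[0]
    return -np.sum(y_true_onehot * np.log(y_pred + 1e-12)) / m

def gradient_descent_update(w_t, grad, eta=1e-3):
    """w_{t+1} = w_t - eta * dL_t/dw_t; grad stands for the back-propagated
    partial derivative of the loss with respect to the weight parameter."""
    return w_t - eta * grad
```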
(3e) Judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
Taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the prediction labels of all targets contained in E; the target category corresponding to each prediction label is the recognition result.
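A hedged sketch of this prediction step is shown below; `model` is a hypothetical callable mapping the paired test images to the Softmax output of the addition fusion network.

```python
import numpy as np

def recognize(model, isar_test, optical_test):
    """Forward-propagate the paired test samples through the trained model H*
    and take the arg-max class of the Softmax output as the recognition result.
    `model` is assumed to return class probabilities of shape (E, K)."""
    probs = np.asarray(model(isar_test, optical_test))
    return np.argmax(probs, axis=1)
```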
The technical effects of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions and contents.
The hardware platform of the simulation experiment is as follows: the processor is an Intel Xeon Silver 4114 CPU with a main frequency of 2.20 GHz, the memory is 128 GB, and the graphics card is an NVIDIA GeForce GTX 1080 Ti; the software platform is the Windows 10 operating system with Matlab 2018, Python 3.6 and TensorFlow 1.5.
The data set used in the simulation experiment consists of ISAR images of 4 types of satellites, acquired by radar with a 16 GHz center frequency, 2 GHz signal bandwidth and 6° accumulation angle, and optical images of the same 4 satellite types rendered in 3ds Max at the same ground observation viewing angles. The four satellite targets are Calipso, Cloudsat, Jason-3 and OCO2. 468 ISAR images and their corresponding optical images are selected as the training sample set, and 390 ISAR images and their corresponding optical images as the test sample set.
The invention and the existing multi-mode feature fusion method are adopted to respectively perform fusion recognition on the ISAR image and the optical image of the 4 types of satellite targets, so as to obtain the recognition accuracy.
2. And (5) analyzing a simulation result.
The method is applied to perform fusion recognition on the ISAR images and the optical images of the 4-class satellite targets, firstly, a training sample set is used for training the multi-fusion deep neural network to obtain a trained target recognition model based on the multi-fusion deep neural network, and then, a test sample set is used for testing the trained multi-fusion deep neural network.
The accuracy of target identification is calculated by the following formula:

c = (1/E*) · Σ_{e=1}^{E*} h(y*_e, y_e) × 100%

where c denotes the recognition accuracy on the test sample set, E* denotes the total number of samples in the test sample set, h(·) denotes the class discrimination function, y*_e denotes the real class label of the eth test sample, and y_e denotes the output of the multiple fusion deep network for the eth test sample; h(y*_e, y_e) equals 1 when y*_e and y_e are equal, and 0 otherwise.

With E* = 390, the recognition accuracy of the invention is calculated to be 90.0%.
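A short NumPy sketch of the accuracy computation defined above:

```python
import numpy as np

def recognition_accuracy(y_true, y_pred):
    """c = (1/E*) * sum_e h(y*_e, y_e), with h = 1 for a correct prediction
    and 0 otherwise; y_true and y_pred are arrays of class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With E* = 390 test samples, an accuracy of 90.0% corresponds to 351 correctly recognized samples (351/390 = 0.90).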
The ISAR images and optical images of the 4 types of satellite targets are also identified with an existing multi-modal fusion recognition method, taken from the paper "Multimodal 2D+3D Facial Expression Recognition With Deep Fusion Convolutional Neural Network" by Huibin Li, Jian Sun et al. In the simulation experiment, the multi-modal fusion recognition network is trained with the training sample set to obtain a trained network, which is then tested with the test sample set.
The accuracy of target identification is calculated with the same formula:

c = (1/E*) · Σ_{e=1}^{E*} h(y*_e, y_e) × 100%

where c denotes the recognition accuracy on the test sample set, E* denotes the total number of samples in the test sample set, h(·) denotes the class discrimination function, y*_e denotes the real class label of the eth test sample, and y_e denotes the output of the multi-modal fusion recognition network for the eth test sample; h(y*_e, y_e) equals 1 when y*_e and y_e are equal, and 0 otherwise.

With E* = 390, the recognition accuracy of the multi-modal fusion recognition method is calculated to be 86.67%.
In conclusion, compared with the existing method, the target identification method based on the fusion of the ISAR image and the optical image of the multi-fusion deep neural network provided by the invention can effectively improve the identification accuracy of the space target.

Claims (5)

1. A target identification method for fusing an ISAR image and an optical image based on a multi-fusion deep neural network is characterized by comprising the following steps:
(1) acquiring a training sample set R and a test sample set E:
obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, normalizing each ISAR image A_s and each optical image B_s, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle;
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, wherein the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, and wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel;
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network which are arranged in parallel, and a feature fusion layer connected with the output ends of the two bilinear pooling feature extraction networks;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion depth network:
(3a) initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1;
(3b) taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
(3c2) the additive fusion network performs additive fusion on each bilinear pooling fusion feature obtained by the bilinear pooling fusion network and the fusion feature based on the channel attention obtained by the corresponding channel attention fusion network to obtain a prediction label y of each target in the training sample set R;
(3d) using the cross-entropy loss function, compute the loss value L_t of H_t from the predicted label y of each target and its corresponding real label y*, compute the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t, and then update the weight parameter ω_t by gradient descent through ω_{t+1} = ω_t - η·∂L_t/∂ω_t;
(3e) judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the prediction labels of all targets contained in E; the target category corresponding to each prediction label is the recognition result.
2. The method for target identification based on fusion of an ISAR image and an optical image with a multiple fusion deep neural network as claimed in claim 1, wherein the target recognition model H constructed in step (2) is configured as follows:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which both comprise six network layers, each network layer comprises a convolution layer, a batch normalization layer, a ReLU activation layer and a maximum value pooling layer which are sequentially cascaded, and specific parameters are set as follows: the convolution kernel size of the convolution layer in the first network layer is 5 multiplied by 5 pixels, the convolution kernel number is 32, the channel number of the batch normalization layer is 32, and the window size of the maximum pooling layer is 2 multiplied by 2 pixels; the convolution kernel size of the convolution layers in the second network layer and the third network layer is 5 x 5 pixels, the number of the convolution kernels is 64, the number of channels of the batch normalization layer is 64, and the window size of the maximum pooling layer is 2 x 2 pixels; the convolution kernel size of the convolution layer in the fourth to sixth network layers is 3 × 3 pixels, the number of convolution kernels is 128, the number of channels of the batch normalization layer is 128, and the window size of the maximum pooling layer is 2 × 2 pixels;
the system comprises a bilinear pooling fusion network, a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network, wherein the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network both comprise a bilinear pooling layer and a feature normalization layer which are connected in sequence;
the channel attention fusion network comprises a characteristic channel splicing layer and a channel attention unit which are connected in sequence; the channel attention unit comprises a global average pooling layer and a global maximum pooling layer which are arranged in parallel, and a first convolution layer, a ReLU active layer, a second convolution layer and a Sigmoid active layer which are sequentially cascaded with the output of the global average pooling layer and the output of the global maximum pooling layer, wherein the output dimensionalities of the global average pooling layer and the output dimensionality of the global maximum pooling layer are both 1 x 1, the first convolution layer comprises 16 convolution kernels, the size of each convolution kernel is 1 x 1, the second convolution layer comprises 256 convolution kernels, and the size of each convolution kernel is 1 x 1;
The addition fusion network comprises a first fully-connected network and a second fully-connected network arranged in parallel, and an addition fusion layer and a Softmax layer sequentially cascaded with the outputs of the two fully-connected networks; the first and second fully-connected networks each comprise two fully-connected layers, where the number of neurons in the first fully-connected layer is 128 and the number of neurons in the second fully-connected layer is 4.
3. The method for identifying an ISAR image and optical image fused target based on a multi-fusion deep neural network as claimed in claim 2, wherein the first bilinear pooling feature extraction network in the bilinear pooling network in step (3c1) performs bilinear pooling feature extraction on each ISAR image convolution feature, and the feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image, wherein:
The first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on the convolution feature of each ISAR image: first, the bilinear pooling layer in the first bilinear pooling feature extraction network performs bilinear pooling on the convolution feature f of each ISAR image, producing x, and the feature normalization layer then normalizes x, producing z:

x = Σ_{l∈L} vec( f(l) f^T(l) )

z = sgn(x)·√|x| / || sgn(x)·√|x| ||_2

where f denotes the input convolution feature of the bilinear pooling layer, f(l) its D-dimensional value at spatial position l ∈ L, D the number of channels of the input convolution feature, Σ the summation operation, vec the flattening of a matrix into a vector, f^T(l) the transpose of f(l), sgn the sign operation, and ||·||_2 the 2-norm;

The feature fusion layer performs feature fusion on the extracted bilinear pooling feature F_BI of each ISAR image and the bilinear pooling feature F_BO of its corresponding optical image, with output F_B:

F_B = concat( W_BI ⊙ F_BI , W_BO ⊙ F_BO )

where W_BI and W_BO denote learnable weights, ⊙ denotes element-wise (co-located element) multiplication, and concat(·) denotes concatenation of features along the channel dimension.
4. The method for target identification based on fusion of an ISAR image and an optical image with a multiple fusion deep neural network as claimed in claim 2, wherein in the addition fusion network of step (3c2) the addition fusion layer additively fuses the output F_B′ of the first fully-connected network and the output F_C′ of the second fully-connected network, with output F:

F = W_B ⊙ F_B′ + W_C ⊙ F_C′

where W_B and W_C denote learnable weights and ⊙ denotes element-wise (co-located element) multiplication.
5. The method for fusion recognition of an ISAR image and an optical image based on a multiple fusion deep network as claimed in claim 1, wherein in step (3d) the loss value L_t of H_t is calculated and the weight parameter ω_t is updated with the following formulas:

L_t = -(1/M) Σ_{m=1}^{M} y*_m · ln(y_m)

ω_{t+1} = ω_t - η · ∂L_t/∂ω_t

where M denotes the number of training samples, y*_m denotes the real label corresponding to the mth sample in the training sample set, y_m denotes the prediction of model H_t for the mth sample, ln denotes the natural (base-e) logarithm, ω_{t+1} denotes the updated value of ω_t, η denotes the learning rate for L_t, and ∂ denotes the partial derivative operation.
CN202210178011.3A 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks Pending CN114565856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210178011.3A CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210178011.3A CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Publications (1)

Publication Number Publication Date
CN114565856A true CN114565856A (en) 2022-05-31

Family

ID=81715467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210178011.3A Pending CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Country Status (1)

Country Link
CN (1) CN114565856A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN115019174A (en) * 2022-06-10 2022-09-06 西安电子科技大学 Up-sampling remote sensing image target identification method based on pixel recombination and attention
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN117528233A (en) * 2023-09-28 2024-02-06 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019174A (en) * 2022-06-10 2022-09-06 西安电子科技大学 Up-sampling remote sensing image target identification method based on pixel recombination and attention
CN115019174B (en) * 2022-06-10 2023-06-16 西安电子科技大学 Up-sampling remote sensing image target recognition method based on pixel recombination and attention
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116452936B (en) * 2023-04-22 2023-09-29 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN117528233A (en) * 2023-09-28 2024-02-06 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method
CN117528233B (en) * 2023-09-28 2024-05-17 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method

Similar Documents

Publication Publication Date Title
CN108229468B (en) Vehicle appearance feature recognition and vehicle retrieval method and device, storage medium and electronic equipment
EP3971772B1 (en) Model training method and apparatus, and terminal and storage medium
CN114565856A (en) Target identification method based on multiple fusion deep neural networks
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
CN111507378A (en) Method and apparatus for training image processing model
CN111274869B (en) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN105740894B (en) Semantic annotation method for hyperspectral remote sensing image
CN108846426B (en) Polarization SAR classification method based on deep bidirectional LSTM twin network
CN112446476A (en) Neural network model compression method, device, storage medium and chip
Wang et al. Few-shot SAR automatic target recognition based on Conv-BiLSTM prototypical network
CN112288011B (en) Image matching method based on self-attention deep neural network
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN111368972B (en) Convolutional layer quantization method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN113705769A (en) Neural network training method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN114821164A (en) Hyperspectral image classification method based on twin network
CN113822125B (en) Processing method and device of lip language recognition model, computer equipment and storage medium
CN111814685A (en) Hyperspectral image classification method based on double-branch convolution self-encoder
CN113592060A (en) Neural network optimization method and device
CN113537462A (en) Data processing method, neural network quantization method and related device
CN110705600A (en) Cross-correlation entropy based multi-depth learning model fusion method, terminal device and readable storage medium
CN111695673A (en) Method for training neural network predictor, image processing method and device
CN114926693A (en) SAR image small sample identification method and device based on weighted distance
CN111797970A (en) Method and apparatus for training neural network

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination