CN114565856A - Target identification method based on multiple fusion deep neural networks - Google Patents

Target identification method based on multiple fusion deep neural networks

Info

Publication number
CN114565856A
CN114565856A (application CN202210178011.3A)
Authority
CN
China
Prior art keywords
network
fusion
convolution
layer
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210178011.3A
Other languages
Chinese (zh)
Inventor
白雪茹 (Bai Xueru)
毛宇航 (Mao Yuhang)
周雪宁 (Zhou Xuening)
刘潇丹 (Liu Xiaodan)
周峰 (Zhou Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210178011.3A
Publication of CN114565856A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target identification method based on a multiple fusion deep neural network, which mainly addresses the low accuracy of space target identification in the prior art. The implementation steps are: (1) acquire a training sample set and a test sample set; (2) construct a target recognition model based on a multiple fusion deep neural network; (3) iteratively train the target recognition model; (4) obtain the identification results for the target ISAR image and optical image. By performing multi-modal fusion recognition on the ISAR image and the optical image of a target, the invention extracts both channel-attention-based fusion features and bilinear pooling fusion features, so that the network captures richer target feature information and pays more attention to the different contributions and importance of the features during feature splicing, thereby improving the recognition accuracy for space targets.

Description

Target identification method based on multiple fusion deep neural networks
Technical Field
The invention belongs to the technical field of image processing and relates to a target identification method, in particular to a target identification method that fuses ISAR (inverse synthetic aperture radar) images and optical images with a multiple fusion deep neural network, which can be used to effectively identify space targets.
Background
The task of target identification is to distinguish targets according to the differing characteristics that targets of different classes exhibit in the observed information. For radar targets, traditional identification methods extract effective features from raw data and perform class decisions with manually designed feature extractors, which requires considerable time and expert knowledge; in many tasks it is also difficult to judge whether the extracted features are effective, which makes radar target identification considerably harder. In recent years, data-driven deep learning methods for radar target identification have developed rapidly: they avoid the complex feature design and selection process and learn effective features automatically from data. However, existing deep learning methods for radar target identification are mainly single-modal, that is, feature extraction and class decision use data of a single modality only, so the available information may be insufficient and the requirement of accurate target identification may not be met. Deep learning multi-modal fusion recognition exploits the complementary information of several modalities and removes redundant information to learn better feature representations, thereby effectively improving the recognition performance of the model. Existing deep-learning-based multi-modal fusion recognition methods mainly use convolutional neural networks for feature extraction, but the fusion scheme is simple, the different contributions and importance of the features are not considered, and the feature space is not enriched by extracting several kinds of features, so there is still room to improve the recognition accuracy.
The published paper "radio-Based Human goal Recognition Using Dual-Channel Deep computational Neural Network" (IEEE Transactions on Geoscience and Remote Sensing,2019) of Xueruu Bai, Ye Hui, Li Wang, Feng Zhou proposes a Human body posture Recognition method Based on a two-Channel Convolutional Neural Network. Firstly, constructing a short-window long-channel module and a long-window long-channel module, and constructing a two-channel convolutional neural network based on the short-window long-channel module and the long-window long-channel module; the short-window long-time-frequency graph and the long-window long-time-frequency graph obtained by respectively processing short-window long STFT and long-window long STFT by using radar echo signals are used as training samples, wherein the characteristics of four limbs of a human body in the short-window long-time-frequency graph are obvious, and the characteristics of the trunk of the long-window long-time-frequency graph are obvious; training the two-channel convolutional neural network by using the training sample; and inputting the test samples into the trained dual-channel convolutional neural network to complete recognition. According to the method, the two-channel convolution neural network is constructed, the short-window long-time-frequency graph and the long-window long-time-frequency graph extracted from the radar echo signal are subjected to fusion recognition, the network can simultaneously utilize the four limbs information and the trunk information of a human body contained in the radar echo signal, the complementary information of two modal data is effectively utilized, and therefore the model recognition performance is effectively improved. However, the method has the disadvantages that only convolution characteristics are extracted from two modal data, and only simple characteristic splicing is carried out on the two modal convolution characteristics, so that the problems that characteristic information is not rich enough, different contributions to the characteristics and importance attention is not enough are caused, and the accuracy of target identification is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a target identification method that fuses ISAR (inverse synthetic aperture radar) images and optical images with a multiple fusion deep neural network, in order to solve the technical problems that, in prior-art multi-modal fusion recognition of a target, the feature information is not rich enough, the different contributions and importance of the features receive insufficient attention, and the target is therefore difficult to identify accurately.
The technical idea of the invention is as follows: a multiple fusion deep neural network is constructed, and the trained network directly performs fusion recognition on the ISAR image and the optical image of a target, which addresses the problems in the prior art that feature information is not rich enough, the details of feature splicing receive insufficient attention, and the recognition accuracy is low. The channel attention fusion network performs fine-grained convolution feature fusion on the ISAR image and the optical image of the target, avoiding the prior-art difficulty of attending to feature details on individual channels; meanwhile, the bilinear pooling fusion network further extracts bilinear pooling features from the convolution features and fuses them, realizing bilinear pooling feature extraction of the target. The bilinear pooling fusion features and the channel-attention-based fusion features are then additively fused by the addition fusion network, which increases the richness of the target features and overcomes the prior-art difficulty of accurately recognizing a target by fusing its ISAR and optical images.
In order to realize the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring a training sample set R and a test sample set E:
obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, normalizing each ISAR image A_s and each optical image B_s, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle;
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, wherein the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, and wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel;
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network which are arranged in parallel, and a feature fusion layer connected with the output ends of the two bilinear pooling feature extraction networks;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion depth network:
(3a) initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1;
(3b) taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
(3c2) the additive fusion network performs additive fusion on each bilinear pooling fusion feature obtained by the bilinear pooling fusion network and the fusion feature based on the channel attention obtained by the corresponding channel attention fusion network to obtain a prediction label y of each target in the training sample set R;
(3d) using the cross-entropy loss function, compute the loss value L_t of H_t from the predicted label y of each target and its corresponding real label y*, compute the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t, and then update the weight parameter ω_t by gradient descent through ω_{t+1} = ω_t - η·∂L_t/∂ω_t;
(3e) judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the predicted labels of all targets contained in E; the target category corresponding to each predicted label is the recognition result.
Compared with the prior art, the invention has the following advantages:
Firstly, the target recognition model constructed by the invention contains a multiple fusion deep neural network. During model training and when obtaining recognition results, the bilinear pooling fusion network in the multiple fusion deep neural network extracts bilinear pooling fusion features of the target while the channel attention fusion network extracts convolution fusion features, so the model extracts two kinds of fusion features. This increases the richness of the information captured by feature extraction, overcomes the prior-art problem that neural-network-based recognition extracts only a single kind of target feature and the feature information is therefore not rich enough, and effectively improves the accuracy of target identification.
Secondly, during model training and when obtaining recognition results, the bilinear pooling fusion network strengthens important features through adaptive weights when fusing the bilinear pooling features of the target; and when the channel attention fusion network fuses the target convolution features, channel attention makes the model focus on the most informative channel features and suppress unimportant ones. Therefore, when the bilinear pooling fusion network and the channel attention fusion network perform feature fusion, the prior-art problem that the different contributions and importance of features receive insufficient attention during plain feature concatenation is overcome, better fusion features are extracted from the target, and the accuracy of target identification is further improved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a structure of an object recognition model constructed by the present invention;
fig. 3 is a schematic structural diagram of an additive fusion network constructed by the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
Referring to fig. 1, the present invention includes the following steps.
(1) Acquiring a training sample set R and a test sample set E:
Obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, each image containing one target; normalizing each ISAR image A_s and each optical image B_s to alleviate gradient explosion in the network, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle; in this embodiment, K = 4, S = 858, and M = 468;
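For illustration, the Python sketch below shows one way the sample sets could be prepared; the patent does not specify the normalization formula or the split procedure, so min-max scaling and a random split are assumptions here.

```python
import numpy as np

def normalize(img):
    """Normalize one image; the patent only states that each ISAR and optical
    image is normalized, so min-max scaling to [0, 1] is assumed here."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def split_samples(isar_imgs, opt_imgs, labels, m=468, seed=0):
    """Split the S paired samples into M training and S - M test samples
    (the embodiment uses S = 858 and M = 468); a random split is assumed."""
    idx = np.random.RandomState(seed).permutation(len(labels))
    pick = lambda ids: ([normalize(isar_imgs[i]) for i in ids],
                        [normalize(opt_imgs[i]) for i in ids],
                        [labels[i] for i in ids])
    return pick(idx[:m]), pick(idx[m:])
```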
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
Constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, whose structure is shown in fig. 2; the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel; the first convolution feature extraction network and the second convolution feature extraction network both comprise six network layers, each network layer comprises a convolution layer, a batch normalization layer, a ReLU activation layer and a maximum value pooling layer which are sequentially cascaded, and specific parameters are set as follows: the convolution kernel size of the convolution layer in the first network layer is 5 multiplied by 5 pixels, the convolution kernel number is 32, the channel number of the batch normalization layer is 32, and the window size of the maximum pooling layer is 2 multiplied by 2 pixels; the convolution kernel size of the convolution layers in the second network layer and the third network layer is 5 x 5 pixels, the number of the convolution kernels is 64, the number of channels of the batch normalization layer is 64, and the window size of the maximum pooling layer is 2 x 2 pixels; the convolution kernel size of the convolution layer in the fourth to sixth network layers is 3 × 3 pixels, the number of convolution kernels is 128, the number of channels of the batch normalization layer is 128, and the window size of the maximum pooling layer is 2 × 2 pixels;
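A minimal sketch of one such convolution feature extraction branch is given below, using the Keras API; the patent's experiments use TensorFlow 1.5, but the exact code is not part of the source, so the framework, padding and input size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_feature_extraction_network():
    """One convolution feature extraction branch: six network layers, each a
    cascade of convolution -> batch normalization -> ReLU -> 2x2 max pooling,
    with the kernel sizes and counts listed above."""
    cfg = [(5, 32), (5, 64), (5, 64), (3, 128), (3, 128), (3, 128)]
    net = tf.keras.Sequential()
    for kernel, filters in cfg:
        net.add(layers.Conv2D(filters, kernel, padding="same"))  # padding is an assumption
        net.add(layers.BatchNormalization())
        net.add(layers.ReLU())
        net.add(layers.MaxPooling2D(pool_size=2))
    return net

# Two branches in parallel: one for ISAR images, one for optical images.
isar_branch = conv_feature_extraction_network()
optical_branch = conv_feature_extraction_network()
```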
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network, a second bilinear pooling feature extraction network and a feature fusion layer, wherein the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network are arranged in parallel, and the feature fusion layer is connected with the output ends of the two bilinear pooling feature extraction networks; the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network both comprise a bilinear pooling layer and a feature normalization layer which are connected in sequence;
the channel attention fusion network comprises a characteristic channel splicing layer and a channel attention unit which are connected in sequence; the channel attention unit comprises a global average pooling layer and a global maximum pooling layer which are arranged in parallel, and a first convolution layer, a ReLU active layer, a second convolution layer and a Sigmoid active layer which are sequentially cascaded with the output of the global average pooling layer and the output of the global maximum pooling layer, wherein the output dimensionalities of the global average pooling layer and the output dimensionality of the global maximum pooling layer are both 1 x 1, the first convolution layer comprises 16 convolution kernels, the size of each convolution kernel is 1 x 1, the second convolution layer comprises 256 convolution kernels, and the size of each convolution kernel is 1 x 1;
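The sketch below illustrates how such a channel attention fusion network could be written with the Keras API; the text does not spell out how the two pooled branches are merged before the Sigmoid, so the standard add-then-Sigmoid form of channel attention is assumed.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ChannelAttentionFusion(tf.keras.layers.Layer):
    """Concatenate the two 128-channel convolution features along the channel
    axis (giving 256 channels), then reweight the channels with an attention
    mask built from global average pooling and global max pooling followed by
    a 16-kernel 1x1 convolution, ReLU, a 256-kernel 1x1 convolution and a
    Sigmoid, as described above."""

    def __init__(self):
        super().__init__()
        self.conv1 = layers.Conv2D(16, 1, activation="relu")  # first 1x1 conv + ReLU
        self.conv2 = layers.Conv2D(256, 1)                    # second 1x1 conv

    def call(self, isar_feat, optical_feat):
        x = tf.concat([isar_feat, optical_feat], axis=-1)     # feature channel splicing
        avg = tf.reduce_mean(x, axis=[1, 2], keepdims=True)   # global average pooling, 1x1
        mx = tf.reduce_max(x, axis=[1, 2], keepdims=True)     # global max pooling, 1x1
        # Adding the two pooled branches before the Sigmoid is an assumption.
        att = tf.sigmoid(self.conv2(self.conv1(avg)) + self.conv2(self.conv1(mx)))
        return x * att                                        # channel-attention fused feature
```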
The structure of the addition fusion network is shown in fig. 3; it comprises a first fully-connected network and a second fully-connected network arranged in parallel, and an addition fusion layer and a Softmax layer sequentially cascaded with the outputs of the two fully-connected networks; the first and second fully-connected networks each comprise two fully-connected layers, where the number of neurons in the first fully-connected layer is 128 and the number of neurons in the second fully-connected layer is 4;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion deep network:
(3a) Initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1; in this embodiment, T = 20;
(3b) Taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on each extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
A bilinear pooling layer in the first bilinear pooling feature extraction network performs bilinear pooling on the convolution feature f of each ISAR image, producing x, and the feature normalization layer then normalizes x, producing z:

x = Σ_{l∈L} vec( f(l) f^T(l) )

z = sgn(x)·√|x| / || sgn(x)·√|x| ||_2

where f denotes the input convolution feature of the bilinear pooling layer, f(l) its D-dimensional value at spatial position l ∈ L, D the number of channels of the input convolution feature, Σ the summation operation, vec the flattening of a matrix into a vector, f^T(l) the transpose of f(l), sgn the sign operation, and ||·||_2 the 2-norm;

The feature fusion layer performs feature fusion on the extracted bilinear pooling feature F_BI of each ISAR image and the bilinear pooling feature F_BO of its corresponding optical image, with output F_B:

F_B = concat( W_BI ⊙ F_BI , W_BO ⊙ F_BO )

where W_BI and W_BO denote learnable weights, ⊙ denotes element-wise (co-located element) multiplication, and concat(·) denotes concatenation of features along the channel dimension;
By extracting bilinear pooling features from the ISAR and optical convolution features first and fusing them afterwards, the bilinear pooling fusion network avoids the multiplied computation, and the resulting model over-fitting, that would occur if the convolution features were fused first and bilinear pooling were then applied to the fused feature. At the same time, when fusing features, the bilinear pooling fusion network attends to the channel-wise details of the bilinear pooling features and highlights important features through adaptive weights, so the bilinear pooling fusion feature of the target is extracted more effectively. The channel attention fusion network fuses the convolution features of the target ISAR image and optical image on the basis of channel attention, so the model focuses on the most informative channel features, suppresses unimportant ones, and summarizes the convolution features better. Extracting both the convolution fusion feature and the bilinear pooling feature of the target effectively alleviates the problem of insufficient feature extraction in deep learning.
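As a concrete illustration of the formulas above, the NumPy sketch below computes the bilinear pooling feature of one convolution feature map and the weighted channel-wise concatenation; the learnable weights are passed in as plain arrays purely for illustration.

```python
import numpy as np

def bilinear_pooling(conv_feat):
    """Bilinear pooling of one convolution feature map f with shape (H, W, D):
    x = sum over l of vec(f(l) f(l)^T), followed by a signed square root and
    2-norm normalization, matching the formulas above."""
    d = conv_feat.shape[-1]
    f = conv_feat.reshape(-1, d)                     # one D-dim vector f(l) per position l
    x = (f[:, :, None] * f[:, None, :]).sum(axis=0)  # sum_l f(l) f(l)^T, shape (D, D)
    x = x.reshape(-1)                                # vec(.)
    z = np.sign(x) * np.sqrt(np.abs(x))              # signed square root
    return z / (np.linalg.norm(z) + 1e-12)           # 2-norm normalization

def bilinear_fusion(f_bi, f_bo, w_bi, w_bo):
    """Feature fusion layer: F_B = concat(W_BI * F_BI, W_BO * F_BO); the
    learnable weights are plain arrays here for illustration."""
    return np.concatenate([w_bi * f_bi, w_bo * f_bo])
```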
(3c2) The addition fusion network additively fuses each bilinear pooling fusion feature obtained by the bilinear pooling fusion network with the corresponding channel-attention-based fusion feature obtained by the channel attention fusion network to obtain the prediction label y of each target in the training sample set R. The addition fusion layer fuses the output F_B′ of the first fully-connected network and the output F_C′ of the second fully-connected network, with output F:

F = W_B ⊙ F_B′ + W_C ⊙ F_C′

where W_B and W_C denote learnable weights and ⊙ denotes element-wise (co-located element) multiplication.
Through the addition fusion network, the bilinear pooling fusion feature extracted by the bilinear pooling fusion network and the channel-attention-based fusion feature extracted by the channel attention fusion network are fused; this fusion of the two networks increases the richness of the target features and thus improves the accuracy with which the deep neural network identifies the target from its ISAR and optical images.
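A possible Keras realization of the addition fusion network described above (two parallel fully-connected networks, weighted element-wise addition and Softmax) is sketched below; the activation between the two dense layers and the exact shape of the learnable weights W_B and W_C are not specified in the source and are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

class AdditionFusionNetwork(tf.keras.layers.Layer):
    """Two parallel fully-connected networks (128 -> 4 neurons), a weighted
    element-wise addition of their outputs, and a Softmax layer."""

    def __init__(self, num_classes=4):
        super().__init__()
        def fc_net():  # ReLU between the two dense layers is an assumption
            return tf.keras.Sequential([layers.Flatten(),
                                        layers.Dense(128, activation="relu"),
                                        layers.Dense(num_classes)])
        self.fc_bilinear = fc_net()   # fed with the bilinear pooling fusion feature
        self.fc_channel = fc_net()    # fed with the channel-attention fusion feature
        # W_B and W_C modelled as per-element weight vectors (an assumption).
        self.w_b = self.add_weight(name="w_b", shape=(num_classes,), initializer="ones")
        self.w_c = self.add_weight(name="w_c", shape=(num_classes,), initializer="ones")

    def call(self, bilinear_fused, channel_fused):
        f_b = self.fc_bilinear(bilinear_fused)  # output of the first fully-connected network
        f_c = self.fc_channel(channel_fused)    # output of the second fully-connected network
        return tf.nn.softmax(self.w_b * f_b + self.w_c * f_c)  # weighted addition, then Softmax
```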
(3d) Using the cross-entropy loss function, the loss value L_t of H_t is computed from the predicted label y of each target and its corresponding real label y*, the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t is computed, and the weight parameter ω_t is then updated by gradient descent. The loss and the update are computed as:

L_t = -(1/M) Σ_{m=1}^{M} y*_m · ln(y_m)

ω_{t+1} = ω_t - η · ∂L_t/∂ω_t

where M denotes the number of training samples, y*_m denotes the real label corresponding to the mth sample in the training sample set, y_m denotes the prediction of model H_t for the mth sample, ln denotes the natural (base-e) logarithm, ω_{t+1} denotes the updated value of ω_t, η denotes the learning rate for L_t, and ∂ denotes the partial derivative operation.
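The loss and update formulas above can be checked with a small NumPy sketch; the gradient is assumed to come from back-propagation and is passed in directly here.

```python
import numpy as np

def cross_entropy_loss(y_true_onehot, y_pred):
    """L_t = -(1/M) * sum_m y*_m . ln(y_m); both arguments have shape (M, K)."""
    m = y_true_onehot.shape[0]
    return -np.sum(y_true_onehot * np.log(y_pred + 1e-12)) / m

def gradient_descent_update(w_t, grad, eta=1e-3):
    """w_{t+1} = w_t - eta * dL_t/dw_t; grad stands for the back-propagated
    partial derivative of the loss with respect to the weight parameter."""
    return w_t - eta * grad
```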
(3e) Judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
Taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the prediction labels of all targets contained in E; the target category corresponding to each prediction label is the recognition result.
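A hedged sketch of this prediction step is shown below; `model` is a hypothetical callable mapping the paired test images to the Softmax output of the addition fusion network.

```python
import numpy as np

def recognize(model, isar_test, optical_test):
    """Forward-propagate the paired test samples through the trained model H*
    and take the arg-max class of the Softmax output as the recognition result.
    `model` is assumed to return class probabilities of shape (E, K)."""
    probs = np.asarray(model(isar_test, optical_test))
    return np.argmax(probs, axis=1)
```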
The technical effects of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions and contents.
The hardware platform of the simulation experiment is as follows: the processor is an Intel Xeon Silver 4114 CPU with a main frequency of 2.20 GHz, the memory is 128 GB, and the graphics card is an NVIDIA GeForce GTX 1080 Ti; the software platform is the Windows 10 operating system with Matlab 2018, Python 3.6 and TensorFlow 1.5.
The data set used in the simulation experiment consists of ISAR images of 4 types of satellites, acquired by radar with a 16 GHz center frequency, 2 GHz signal bandwidth and 6° accumulation angle, and optical images of the same 4 satellite types rendered in 3ds Max at the same ground observation viewing angles. The four satellite targets are Calipso, Cloudsat, Jason-3 and OCO2. 468 ISAR images and their corresponding optical images are selected as the training sample set, and 390 ISAR images and their corresponding optical images as the test sample set.
The invention and the existing multi-mode feature fusion method are adopted to respectively perform fusion recognition on the ISAR image and the optical image of the 4 types of satellite targets, so as to obtain the recognition accuracy.
2. And (5) analyzing a simulation result.
The method is applied to perform fusion recognition on the ISAR images and the optical images of the 4-class satellite targets, firstly, a training sample set is used for training the multi-fusion deep neural network to obtain a trained target recognition model based on the multi-fusion deep neural network, and then, a test sample set is used for testing the trained multi-fusion deep neural network.
The accuracy of target identification is calculated by the following formula:

c = (1/E*) · Σ_{e=1}^{E*} h(y*_e, y_e) × 100%

where c denotes the recognition accuracy on the test sample set, E* denotes the total number of samples in the test sample set, h(·) denotes the class discrimination function, y*_e denotes the real class label of the eth test sample, and y_e denotes the output of the multiple fusion deep network for the eth test sample; h(y*_e, y_e) equals 1 when y*_e and y_e are equal, and 0 otherwise.

With E* = 390, the recognition accuracy of the invention is calculated to be 90.0%.
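A short NumPy sketch of the accuracy computation defined above:

```python
import numpy as np

def recognition_accuracy(y_true, y_pred):
    """c = (1/E*) * sum_e h(y*_e, y_e), with h = 1 for a correct prediction
    and 0 otherwise; y_true and y_pred are arrays of class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With E* = 390 test samples, an accuracy of 90.0% corresponds to 351 correctly recognized samples (351/390 = 0.90).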
The ISAR images and optical images of the 4 types of satellite targets are also identified with an existing multi-modal fusion recognition method, taken from the paper "Multimodal 2D+3D Facial Expression Recognition With Deep Fusion Convolutional Neural Network" by Huibin Li, Jian Sun et al. In the simulation experiment, the multi-modal fusion recognition network is trained with the training sample set to obtain a trained network, which is then tested with the test sample set.
The accuracy of target identification is calculated with the same formula:

c = (1/E*) · Σ_{e=1}^{E*} h(y*_e, y_e) × 100%

where c denotes the recognition accuracy on the test sample set, E* denotes the total number of samples in the test sample set, h(·) denotes the class discrimination function, y*_e denotes the real class label of the eth test sample, and y_e denotes the output of the multi-modal fusion recognition network for the eth test sample; h(y*_e, y_e) equals 1 when y*_e and y_e are equal, and 0 otherwise.

With E* = 390, the recognition accuracy of the multi-modal fusion recognition method is calculated to be 86.67%.
In conclusion, compared with the existing method, the target identification method based on the fusion of the ISAR image and the optical image of the multi-fusion deep neural network provided by the invention can effectively improve the identification accuracy of the space target.

Claims (5)

1. A target identification method for fusing an ISAR image and an optical image based on a multi-fusion deep neural network is characterized by comprising the following steps:
(1) acquiring a training sample set R and a test sample set E:
obtaining S ISAR images A = {A_s | 1 ≤ s ≤ S} and S optical images B = {B_s | 1 ≤ s ≤ S} containing K target categories, normalizing each ISAR image A_s and each optical image B_s, and labeling the target contained in each normalized ISAR image and optical image; taking M normalized ISAR images, optical images and their corresponding labels as the training sample set R, and the remaining normalized ISAR images, optical images and their corresponding labels as the test sample set E, where K ≥ 3, S ≥ 800, M < S, and A_s, B_s respectively denote the sth ISAR image and optical image of the same target observed at the same ground observation viewing angle;
(2) constructing a target recognition model H based on a multi-fusion deep neural network:
constructing a target recognition model H comprising a convolution feature extraction network and a multiple fusion deep neural network, wherein the multiple fusion deep neural network comprises a bilinear pooling fusion network, a channel attention fusion network and an addition fusion network, and wherein:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which are arranged in parallel;
the bilinear pooling fusion network comprises a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network which are arranged in parallel, and a feature fusion layer connected with the output ends of the two bilinear pooling feature extraction networks;
the output of the first convolution feature extraction network is connected with the input ends of the first bilinear pooling feature extraction network and the channel attention fusion network, and the output of the second convolution feature extraction network is connected with the input ends of the second bilinear pooling feature extraction network and the channel attention fusion network; the output ends of the bilinear pooling fusion network and the channel attention fusion network are connected with the input end of the addition fusion network;
(3) carrying out iterative training on a target recognition model H based on the multiple fusion depth network:
(3a) initialize the iteration number t and the maximum iteration number T ≥ 20; the target identification model of the tth iteration is H_t with weight parameter ω_t; let t = 1;
(3b) taking the training sample set R as the input of the target recognition model H_t based on the multiple fusion deep network: the first convolution feature extraction network in the convolution feature extraction network performs convolution feature extraction on each ISAR image training sample to obtain the ISAR image convolution feature set corresponding to R, and the second convolution feature extraction network performs convolution feature extraction on each optical image training sample to obtain the optical image convolution feature set corresponding to R;
(3c) the multiple fusion depth neural network performs multiple fusion on each ISAR image convolution characteristic and the corresponding optical image convolution characteristic:
(3c1) a first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on each ISAR image convolution feature, a second bilinear pooling feature extraction network performs bilinear pooling feature extraction on each optical image convolution feature, and a feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image; meanwhile, the channel attention fusion network performs channel attention-based feature fusion on each ISAR image convolution feature and the corresponding optical image convolution feature;
(3c2) the additive fusion network performs additive fusion on each bilinear pooling fusion feature obtained by the bilinear pooling fusion network and the fusion feature based on the channel attention obtained by the corresponding channel attention fusion network to obtain a prediction label y of each target in the training sample set R;
(3d) using the cross-entropy loss function, compute the loss value L_t of H_t from the predicted label y of each target and its corresponding real label y*, compute the partial derivative ∂L_t/∂ω_t of L_t with respect to the weight parameter ω_t, and then update the weight parameter ω_t by gradient descent through ω_{t+1} = ω_t - η·∂L_t/∂ω_t;
(3e) judging whether t ≥ T: if so, the trained target recognition model H* is obtained; otherwise, let t = t + 1 and return to step (3b);
(4) acquiring the identification results of the target ISAR image and the optical image:
taking the test sample set E as the input of the trained target recognition model H* and propagating it forward yields the prediction labels of all targets contained in E; the target category corresponding to each prediction label is the recognition result.
2. The method for target identification based on fusion of an ISAR image and an optical image with a multiple fusion deep neural network as claimed in claim 1, wherein the target recognition model H constructed in step (2) is configured as follows:
the convolution feature extraction network comprises a first convolution feature extraction network and a second convolution feature extraction network which both comprise six network layers, each network layer comprises a convolution layer, a batch normalization layer, a ReLU activation layer and a maximum value pooling layer which are sequentially cascaded, and specific parameters are set as follows: the convolution kernel size of the convolution layer in the first network layer is 5 multiplied by 5 pixels, the convolution kernel number is 32, the channel number of the batch normalization layer is 32, and the window size of the maximum pooling layer is 2 multiplied by 2 pixels; the convolution kernel size of the convolution layers in the second network layer and the third network layer is 5 x 5 pixels, the number of the convolution kernels is 64, the number of channels of the batch normalization layer is 64, and the window size of the maximum pooling layer is 2 x 2 pixels; the convolution kernel size of the convolution layer in the fourth to sixth network layers is 3 × 3 pixels, the number of convolution kernels is 128, the number of channels of the batch normalization layer is 128, and the window size of the maximum pooling layer is 2 × 2 pixels;
the system comprises a bilinear pooling fusion network, a first bilinear pooling feature extraction network and a second bilinear pooling feature extraction network, wherein the first bilinear pooling feature extraction network and the second bilinear pooling feature extraction network both comprise a bilinear pooling layer and a feature normalization layer which are connected in sequence;
the channel attention fusion network comprises a characteristic channel splicing layer and a channel attention unit which are connected in sequence; the channel attention unit comprises a global average pooling layer and a global maximum pooling layer which are arranged in parallel, and a first convolution layer, a ReLU active layer, a second convolution layer and a Sigmoid active layer which are sequentially cascaded with the output of the global average pooling layer and the output of the global maximum pooling layer, wherein the output dimensionalities of the global average pooling layer and the output dimensionality of the global maximum pooling layer are both 1 x 1, the first convolution layer comprises 16 convolution kernels, the size of each convolution kernel is 1 x 1, the second convolution layer comprises 256 convolution kernels, and the size of each convolution kernel is 1 x 1;
The addition fusion network comprises a first fully-connected network and a second fully-connected network arranged in parallel, and an addition fusion layer and a Softmax layer sequentially cascaded with the outputs of the two fully-connected networks; the first and second fully-connected networks each comprise two fully-connected layers, where the number of neurons in the first fully-connected layer is 128 and the number of neurons in the second fully-connected layer is 4.
3. The method for identifying an ISAR image and optical image fused target based on a multi-fusion deep neural network as claimed in claim 2, wherein the first bilinear pooling feature extraction network in the bilinear pooling network in step (3c1) performs bilinear pooling feature extraction on each ISAR image convolution feature, and the feature fusion layer performs feature fusion on the extracted bilinear pooling feature of each ISAR image and the bilinear pooling feature of the corresponding optical image, wherein:
The first bilinear pooling feature extraction network in the bilinear pooling fusion network performs bilinear pooling feature extraction on the convolution feature of each ISAR image: first, the bilinear pooling layer in the first bilinear pooling feature extraction network performs bilinear pooling on the convolution feature f of each ISAR image, producing x, and the feature normalization layer then normalizes x, producing z:

x = Σ_{l∈L} vec( f(l) f^T(l) )

z = sgn(x)·√|x| / || sgn(x)·√|x| ||_2

where f denotes the input convolution feature of the bilinear pooling layer, f(l) its D-dimensional value at spatial position l ∈ L, D the number of channels of the input convolution feature, Σ the summation operation, vec the flattening of a matrix into a vector, f^T(l) the transpose of f(l), sgn the sign operation, and ||·||_2 the 2-norm;

The feature fusion layer performs feature fusion on the extracted bilinear pooling feature F_BI of each ISAR image and the bilinear pooling feature F_BO of its corresponding optical image, with output F_B:

F_B = concat( W_BI ⊙ F_BI , W_BO ⊙ F_BO )

where W_BI and W_BO denote learnable weights, ⊙ denotes element-wise (co-located element) multiplication, and concat(·) denotes concatenation of features along the channel dimension.
4. The method for target identification based on fusion of an ISAR image and an optical image with a multiple fusion deep neural network as claimed in claim 2, wherein in the addition fusion network of step (3c2) the addition fusion layer additively fuses the output F_B′ of the first fully-connected network and the output F_C′ of the second fully-connected network, with output F:

F = W_B ⊙ F_B′ + W_C ⊙ F_C′

where W_B and W_C denote learnable weights and ⊙ denotes element-wise (co-located element) multiplication.
5. The method for fusion recognition of an ISAR image and an optical image based on a multiple fusion deep network as claimed in claim 1, wherein in step (3d) the loss value L_t of H_t is calculated and the weight parameter ω_t is updated with the following formulas:

L_t = -(1/M) Σ_{m=1}^{M} y*_m · ln(y_m)

ω_{t+1} = ω_t - η · ∂L_t/∂ω_t

where M denotes the number of training samples, y*_m denotes the real label corresponding to the mth sample in the training sample set, y_m denotes the prediction of model H_t for the mth sample, ln denotes the natural (base-e) logarithm, ω_{t+1} denotes the updated value of ω_t, η denotes the learning rate for L_t, and ∂ denotes the partial derivative operation.
CN202210178011.3A 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks Pending CN114565856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210178011.3A CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210178011.3A CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Publications (1)

Publication Number Publication Date
CN114565856A true CN114565856A (en) 2022-05-31

Family

ID=81715467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210178011.3A Pending CN114565856A (en) 2022-02-25 2022-02-25 Target identification method based on multiple fusion deep neural networks

Country Status (1)

Country Link
CN (1) CN114565856A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN115019174A (en) * 2022-06-10 2022-09-06 西安电子科技大学 Up-sampling remote sensing image target identification method based on pixel recombination and attention
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN117528233A (en) * 2023-09-28 2024-02-06 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019174A (en) * 2022-06-10 2022-09-06 西安电子科技大学 Up-sampling remote sensing image target identification method based on pixel recombination and attention
CN115019174B (en) * 2022-06-10 2023-06-16 西安电子科技大学 Up-sampling remote sensing image target recognition method based on pixel recombination and attention
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116452936B (en) * 2023-04-22 2023-09-29 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN117528233A (en) * 2023-09-28 2024-02-06 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method
CN117528233B (en) * 2023-09-28 2024-05-17 哈尔滨航天恒星数据系统科技有限公司 Zoom multiple identification and target re-identification data set manufacturing method

Similar Documents

Publication Publication Date Title
CN108229468B (en) Vehicle appearance feature recognition and vehicle retrieval method and device, storage medium and electronic equipment
EP3971772B1 (en) Model training method and apparatus, and terminal and storage medium
CN114565856A (en) Target identification method based on multiple fusion deep neural networks
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
CN111507378A (en) Method and apparatus for training image processing model
CN111274869B (en) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN105740894B (en) Semantic annotation method for hyperspectral remote sensing image
CN108846426B (en) Polarization SAR classification method based on deep bidirectional LSTM twin network
CN112446476A (en) Neural network model compression method, device, storage medium and chip
Wang et al. Few-shot SAR automatic target recognition based on Conv-BiLSTM prototypical network
CN112288011B (en) Image matching method based on self-attention deep neural network
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN111368972B (en) Convolutional layer quantization method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN113705769A (en) Neural network training method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN114821164A (en) Hyperspectral image classification method based on twin network
CN113822125B (en) Processing method and device of lip language recognition model, computer equipment and storage medium
CN111814685A (en) Hyperspectral image classification method based on double-branch convolution self-encoder
CN113592060A (en) Neural network optimization method and device
CN113537462A (en) Data processing method, neural network quantization method and related device
CN110705600A (en) Cross-correlation entropy based multi-depth learning model fusion method, terminal device and readable storage medium
CN111695673A (en) Method for training neural network predictor, image processing method and device
CN114926693A (en) SAR image small sample identification method and device based on weighted distance
CN111797970A (en) Method and apparatus for training neural network

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination