CN113537120B - Complex convolution neural network target identification method based on complex coordinate attention - Google Patents

Complex convolution neural network target identification method based on complex coordinate attention

Info

Publication number: CN113537120B
Authority: CN (China)
Prior art keywords: complex, output, module, input, attention
Legal status: Active
Application number: CN202110858271.0A
Other languages: Chinese (zh)
Other versions: CN113537120A
Inventor
张袁鹏
解岩
张雷
陈一畅
姚汉英
李槟槟
范亚
朱振波
余方利
汤子跃
Current Assignee: Air Force Early Warning Academy
Original Assignee: Air Force Early Warning Academy
Application filed by Air Force Early Warning Academy
Priority to CN202110858271.0A
Publication of CN113537120A
Application granted
Publication of CN113537120B

Classifications

    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/24 Classification techniques
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06F2218/02 Preprocessing (pattern recognition specially adapted for signal processing)
    • G06F2218/12 Classification; Matching (pattern recognition specially adapted for signal processing)
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a complex convolutional neural network target identification method based on complex coordinate attention, and relates to the field of target identification. The convolutional neural network comprises an input layer, N basic units, a classification unit, and an output layer; a processing unit maps complex numbers to corresponding real numbers through a modulus operation for classification and identification. The N basic units comprise first to N-th basic units, each comprising a first complex convolution module, a first complex batch normalization module, a first complex activation module, and a first complex pooling module; one of the N basic units further comprises a complex coordinate attention module, which comprises a complex coordinate attention embedding unit and a complex coordinate attention generating unit. The invention realizes high-precision identification of similar space cone targets.

Description

Complex convolution neural network target identification method based on complex coordinate attention
Technical Field
The invention relates to the field of target identification, in particular to a target identification method of a complex convolutional neural network based on complex coordinate attention.
Background
During penetration, a ballistic missile releases decoys that closely resemble the warhead, so the warhead and the decoys must be distinguished during the midcourse phase in order to reduce interception cost. The warhead and the decoys can be regarded as similar space cone targets: their shapes and motion forms are identical, and only their motion parameters differ slightly. Identification of similar space cone targets therefore plays an important role in space resource utilization, space surveillance, and military applications.
In recent years, studies applying convolutional neural networks (CNN) to space target identification have been increasing, based on the idea of extracting micro-motion features in the image domain and then using the extracted features for identification. Li et al., based on CNN, investigated the identification of space targets with different shapes and different precession frequencies using a multi-mode fusion approach: one-dimensional range profiles and time-frequency spectrograms of the targets in the S band and X band are generated with an ideal point-scattering model, and the multi-mode data are then used as CNN input to identify three targets (cone, small cone, and cylinder). Bai et al., Xu et al., and Han et al. all take the time-frequency spectrogram of the target as the network input, effectively converting the problem of identifying targets by micro-motion features into an image recognition problem. Bai et al. designed a CNN with a depth of three layers: time-frequency spectrograms of three micro-motion forms (spin, precession, and nutation) are generated with an ideal point-scattering model, suitably cropped, and used as input to the designed CNN to identify the three common micro-motion forms. Xu et al. designed a CNN with a depth of six layers: echo signals of four micro-motion forms (spin, rolling, precession, and nutation) are generated with a scattering-point model, time-frequency spectrograms spanning several micro-motion periods are obtained through the Wigner-Ville distribution (WVD), and the spectrograms are used as CNN input to identify the four micro-motion forms. Han et al. designed a deep learning network consisting of one-dimensional parallel structures (1-D parallel structures) and Long Short-Term Memory (LSTM) layers: echo data of five targets with different structural parameters and different micro-motion forms are simulated by electromagnetic computation, time-frequency analysis with the Short-Time Fourier Transform (STFT) yields spectrograms of several micro-motion periods, and these are sent to the designed network to identify the five targets. Wang et al., based on electromagnetic computation data, obtained range-slow-time images covering more than one precession period for three targets with different geometries but the same micro-motion form (cone, cone-cylinder, and cone-cylinder-skirt) and sent them to a designed CNN to realize target identification.
As the prior art listed above shows, the main approach is to keep the processing from the "echo data domain" to the "image domain" as preprocessing and to replace the extraction of micro-motion features from the "image domain" with a deep convolutional neural network. This has the following problems: (1) preprocessing such as time-frequency analysis or range-slow-time imaging is required, which costs considerable signal-processing time; (2) the target must be observed continuously for a long time to obtain a complete periodic image; (3) these methods only address targets with different shapes or different micro-motion forms and do not achieve identification of similar space cone targets.
Disclosure of Invention
In order to solve the three problems of current image-domain CNN-based space cone target identification, the invention combines the advantages of CV-CNN and the attention mechanism, introduces real-domain coordinate attention into the complex domain, and constructs a convolutional neural network based on a complex coordinate attention module together with a target identification method. The aim is to operate directly on radar echo complex data as input, make full use of amplitude and phase information, and realize high-precision identification of similar space cone targets that have the same geometry and micro-motion form and only slightly different micro-motion parameters.
To achieve the above object, the present invention provides a convolutional neural network based on a complex coordinate attention module, the convolutional neural network comprising:
an input layer, N basic units, a classification unit, and an output layer;
the processing unit is used for mapping complex numbers into corresponding real numbers through a modulus operation and performing classification and identification; the N basic units comprise first to N-th basic units, the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, the input of the N-th basic unit is the output of the (N-1)-th basic unit, N is an integer greater than 1, the output of the N-th basic unit is the input of the processing unit, the output of the processing unit is the input of the classifier, and the classifier is connected with the output layer; each of the N basic units comprises: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; one of the N basic units further comprises a complex coordinate attention module; the complex coordinate attention module comprises: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding a first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, and generating, in that channel, first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is to: splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimensionality reduction on the feature information splicing result of the channel to obtain feature information after dimensionality reduction, and activating the feature information after dimensionality reduction to obtain a first complex output feature map of the channel; splitting the first complex output profile into a first tensor and a second tensor along a spatial dimension; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, and obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output characteristic maps of all the channels, and the fourth tensor is the set of the third complex output characteristic maps of all the channels;
expressing each element in the third tensor and the fourth tensor in a polar coordinate form, constraining the amplitude of the polar coordinate by using a constraint function, respectively obtaining a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions, expanding the fourth complex output feature map and the fifth complex output feature map to generate attention weight distribution in the horizontal and vertical spatial directions, and applying the attention weight distribution to a complex input feature map of the complex coordinate attention module to obtain a complex output feature map of the complex coordinate attention module;
wherein, the complex input characteristic diagram and the complex output characteristic diagram are both complex characteristic diagrams.
When the convolutional neural network based on the complex coordinate attention module is used for target identification, no preprocessing such as time-frequency analysis or range-slow-time imaging is needed, so no extra signal-processing time is required and efficiency is higher; the target also does not need to be observed continuously for a long time to obtain a complete periodic image, which further improves efficiency; and the network realizes identification of similar space cone targets.
Preferably, in the basic unit not including the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
Preferably, in the basic unit including the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex coordinate attention module, the output of the complex coordinate attention module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
Preferably, the classification unit includes:
the second complex convolution module, the second complex batch normalization module, the second complex activation module, the third complex convolution module and the classifier; the output of the second complex convolution module is the input of the second complex batch normalization module, the output of the second complex batch normalization module is the input of the second complex activation module, the output of the second complex activation module is the input of the third complex convolution module, and the output of the third complex convolution module is the input of the classifier.
Preferably, the convolutional neural network includes first to sixth basic units.
Preferably, the sixth base unit comprises said complex coordinate attention module.
Preferably, an optimizer is arranged in the convolutional neural network and used for updating the network weight and the bias term.
Preferably, the numbers of convolution kernels of the first to sixth basic units are 64, 128, 256 and 256, respectively, the sizes of the convolution kernels are all 1 × 3, the sizes of the sampling windows of the first complex pooling modules are all 1 × 2, the sliding stride of the convolutions is 1, and the padding is 1.
Preferably, the complex input feature map is a complex input feature map of the spatial target identification signal, and the complex output feature map is a complex output feature map of the spatial target identification signal.
On the one hand, the complex coordinate attention module (CV-CA) uses a complex convolutional neural network to obtain the amplitude and phase features of the signal through associated learning of the real and imaginary parts of complex numbers; on the other hand, through complex coordinate attention it attends to spatial information and channel information in the horizontal and vertical directions simultaneously, better models long-range dependencies in the feature information, and enhances the feature representation capability of the target object.
In channel attention, global pooling is usually adopted to encode global spatial information, but it compresses the global spatial information into a single channel descriptor, making it difficult to preserve position information, which is particularly important for capturing spatial structure. Therefore, in the coordinate attention module, the operation of decomposing global pooling into two one-dimensional feature encodings is extended to the complex domain: the complex feature map X of each channel is encoded along the horizontal and vertical directions respectively (the direction-related encodings, abbreviated as the horizontal and vertical directions), generating direction-related complex feature maps so that features in the two spatial directions are aggregated separately.
The complex coordinate attention embedding unit outputs accurate spatial position information aggregated under a global receptive field. Based on the encoding result of the complex coordinate attention embedding unit, the complex coordinate attention module applies a second transformation, called the complex coordinate attention generating unit. This transformation comprises three parts: (1) direction-related feature information aggregation; (2) direction-related complex feature map splitting; (3) automatic complex coordinate attention allocation.
Preferably, in the present invention, X is the complex input feature map of the complex coordinate attention module, $X = [x_1, x_2, \ldots, x_C] \in \mathbb{C}^{C \times W \times H}$, where $x_c$ is the complex input feature map of the c-th channel, $\mathbb{C}$ denotes the complex space, C is the number of channels of the input feature maps, W is the width of each input feature map, and H is the height of each input feature map; Y is the complex output feature map of the complex coordinate attention module, $Y = [y_1, y_2, \ldots, y_C]$, where $y_c$ is the complex output feature map of the c-th channel, c is an integer with $1 \le c \le C$, and X and Y have the same dimensions.

The output of the p-th channel of the complex input feature map X after encoding along the horizontal direction is $z_p^h(h)$, and the output after encoding along the vertical direction is $z_p^w(w)$, where:

$$z_p^h(h) = \frac{1}{W}\sum_{0 \le j < W}\operatorname{Re}\big(x_p(h,j)\big) + \mathrm{j}\,\frac{1}{W}\sum_{0 \le j < W}\operatorname{Im}\big(x_p(h,j)\big)$$

$$z_p^w(w) = \frac{1}{H}\sum_{0 \le i < H}\operatorname{Re}\big(x_p(i,w)\big) + \mathrm{j}\,\frac{1}{H}\sum_{0 \le i < H}\operatorname{Im}\big(x_p(i,w)\big)$$

where $\mathrm{j}$ denotes the imaginary unit, $\operatorname{Re}(\cdot)$ denotes the real part of a complex number, $\operatorname{Im}(\cdot)$ denotes the imaginary part, h is the pixel index in the horizontal direction of the input feature map, $x_p(h,j)$ is the value in row h and column j of the p-th channel of the complex input feature map, i is the pixel index in the vertical direction, and $x_p(i,w)$ is the value in row i and column w of the p-th channel of the complex input feature map.
Preferably, in the invention, $z^h$ and $z^w$ are concatenated to obtain the feature-information concatenation result $M \in \mathbb{C}^{C \times 1 \times (W+H)}$, and each tensor in M is written as $m_q = [(z_q^h)^T, z_q^w]$, where $[\cdot,\cdot]$ denotes the concatenation operation and T denotes transposition.

The complex coordinate attention generating unit performs feature dimension reduction on the concatenation result with 1 × 1 complex convolution kernels; the feature dimension reduction lowers the number of parameters while realizing cross-channel information interaction and integration. Let $U = [u^1, u^2, \ldots, u^{C/r}]$ be the 1 × 1 complex convolution kernels shared by the convolutional layer, where $u^k$ denotes the k-th complex convolution kernel, $k = 1, 2, \ldots, C/r$, $u_C^k$ denotes the C-th 1 × 1 complex kernel in $u^k$, and $u_q^k$ denotes the q-th kernel in $u^k$, $q = 1, 2, \ldots, C$; r denotes a scaling coefficient controlling the number of channels of the convolution output feature map, s denotes the stride of the convolution operation, and the k-th feature map of the convolution output is $v_k(i,j)$, where:

$$v_k(i,j) = \sum_{q=1}^{C} u_q^k\, m_q(i \cdot s,\, j \cdot s)$$

$$f_k(i,j) = \sigma\big(v_k(i,j)\big)$$

where $m_q$ is the q-th tensor in M, $m_q \in \mathbb{C}^{1 \times (W+H)}$, $m_q(i \cdot s, j \cdot s)$ is the value in row $i \cdot s$ and column $j \cdot s$ of the q-th tensor after feature-information concatenation, $v_k(i,j)$ denotes the unactivated complex output feature map of the k-th channel, $f_k$ denotes the complex output feature map of the k-th channel, the set of complex feature maps of all channels is written $f = [f_1, f_2, \ldots, f_{C/r}] \in \mathbb{C}^{(C/r) \times 1 \times (W+H)}$, $f_{C/r}$ is the complex output feature map of the (C/r)-th channel, and $\sigma(\cdot)$ denotes the complex activation function; the complex activation function is the CReLU:

$$\sigma(z) = \mathrm{CReLU}(z) = \mathrm{ReLU}\big(\operatorname{Re}(z)\big) + \mathrm{j}\,\mathrm{ReLU}\big(\operatorname{Im}(z)\big)$$

where z is a complex variable.
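As a concrete illustration, here is a minimal sketch of the CReLU above for a PyTorch complex tensor; the function name is illustrative and not code from the patent:

```python
import torch
import torch.nn.functional as F

def crelu(z: torch.Tensor) -> torch.Tensor:
    """CReLU(z) = ReLU(Re(z)) + j*ReLU(Im(z)) for a complex-dtype tensor z."""
    return torch.complex(F.relu(z.real), F.relu(z.imag))

# example: crelu(torch.tensor([1 - 2j, -3 + 4j])) -> tensor([1.+0.j, 0.+4.j])
```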
Preferably, in the invention, the set f of complex feature maps of all channels is split along the spatial dimension into the first tensor $f^h \in \mathbb{C}^{(C/r) \times 1 \times H}$ and the second tensor $f^w \in \mathbb{C}^{(C/r) \times 1 \times W}$:

$$f^h = [f_1^h, f_2^h, \ldots, f_{C/r}^h]$$

$$f^w = [f_1^w, f_2^w, \ldots, f_{C/r}^w]$$

where $f_k^h$ is the complex output feature map of the k-th channel in the horizontal direction, $f_k^w$ is the complex output feature map of the k-th channel in the vertical direction, and $f_{C/r}^h$ and $f_{C/r}^w$ are the complex output feature maps of the (C/r)-th channel in the horizontal and vertical directions, respectively.
Preferably, the present invention uses 1 × 1 complex convolution kernels to restore $f^h$ and $f^w$ to the same dimensions as X, obtaining $v^h$ and $v^w$. Let $w_o^{h,l}$ denote the o-th 1 × 1 kernel ($o = 1, 2, \ldots, C/r$) of the l-th complex convolution kernel of the convolution in the horizontal direction, and $w_o^{w,l}$ the corresponding kernel of the convolution in the vertical direction ($l = 1, 2, \ldots, C$); then:

$$v_l^h = \sum_{o=1}^{C/r} w_o^{h,l}\, f_o^h$$

$$v_l^w = \sum_{o=1}^{C/r} w_o^{w,l}\, f_o^w$$

where $f_o^h$ and $f_o^w$ are the complex output feature maps of the o-th channel in the horizontal and vertical directions, $v_l^h$ is the second complex output feature map of the l-th channel in the horizontal direction, $v^h = [v_1^h, \ldots, v_C^h] \in \mathbb{C}^{C \times 1 \times H}$ is the set of the second complex output feature maps of all channels, $v_l^w$ is the third complex output feature map of the l-th channel in the vertical direction, and $v^w = [v_1^w, \ldots, v_C^w] \in \mathbb{C}^{C \times 1 \times W}$ is the set of the third complex output feature maps of all channels.
Preferably, in the present invention, each element of $v^h$ and $v^w$ is expressed in polar form, and the magnitude of the polar form is constrained with a Sigmoid function, specifically:

$$v_l^h = \left|v_l^h\right| e^{\mathrm{j}\theta_l^h}, \qquad v_l^w = \left|v_l^w\right| e^{\mathrm{j}\theta_l^w}$$

$$g_l^h = \mathrm{Sig}\!\left(\left|v_l^h\right|\right) e^{\mathrm{j}\theta_l^h}, \qquad g_l^w = \mathrm{Sig}\!\left(\left|v_l^w\right|\right) e^{\mathrm{j}\theta_l^w}$$

where $g_l^h$ and $g_l^w$ are the results of constraining the magnitudes of the complex output feature maps of the l-th channel in the horizontal and vertical directions with the Sigmoid function, $\theta_l^h$ and $\theta_l^w$ are the phases of the complex output feature maps of the l-th channel in the horizontal and vertical directions, $|v_l^h|$ and $|v_l^w|$ are the corresponding magnitudes, and $\mathrm{Sig}(\cdot)$ denotes the Sigmoid function; the Sigmoid-constrained results are written $g^h = [g_1^h, \ldots, g_C^h]$ and $g^w = [g_1^w, \ldots, g_C^w]$. In the rectangular coordinate system:

$$g_l^h = \mathrm{Sig}\!\left(\left|v_l^h\right|\right)\left(\cos\theta_l^h + \mathrm{j}\sin\theta_l^h\right)$$

$$g_l^w = \mathrm{Sig}\!\left(\left|v_l^w\right|\right)\left(\cos\theta_l^w + \mathrm{j}\sin\theta_l^w\right)$$
Preferably, in the present invention, $g^h$ and $g^w$ are expanded to generate the attention weight distributions in the horizontal and vertical spatial directions, and these are applied to the complex input feature map of the complex coordinate attention module to obtain the complex output feature map $y_l(i,j)$ of the complex coordinate attention module:

$$y_l(i,j) = x_l(i,j) \times g_l^h(i) \times g_l^w(j)$$

where $x_l(i,j)$ is the value in row i and column j of the complex input feature map of the l-th channel.
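A short sketch of this weighting step with PyTorch complex tensors and broadcasting; the shapes and names below are illustrative assumptions (C channels, H rows, W columns, with H = 1 as in the narrow-band case described later):

```python
import torch

C, H, W = 4, 1, 16                                  # H = 1 matches the narrow-band case
x   = torch.randn(C, H, W, dtype=torch.complex64)   # complex input feature map x_l(i, j)
g_h = torch.randn(C, H, 1, dtype=torch.complex64)   # horizontal weights g_l^h(i)
g_w = torch.randn(C, 1, W, dtype=torch.complex64)   # vertical weights g_l^w(j)

y = x * g_h * g_w        # broadcasting expands g_h along columns and g_w along rows
print(y.shape)           # torch.Size([4, 1, 16])
```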
Both the input and the output of the complex coordinate attention module are feature information in complex form, so the module can process complex feature information.

The complex coordinate attention module obtains the amplitude and phase features of the signal through associated learning of the real and imaginary parts of complex numbers by a complex convolutional neural network.

In the invention, the complex coordinate attention module attends to spatial information and channel information in the horizontal and vertical directions simultaneously through complex coordinate attention, thereby better modeling long-range dependencies in the feature information and enhancing the feature representation capability of the target object.
The invention also provides a target identification method, which comprises the following steps:
obtaining a target signal;
inputting the target signal into the complex coordinate attention module-based convolutional neural network;
and the convolutional neural network outputs a target recognition result.
Preferably, the target signal is radar echo data.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
the invention combines the advantages of CV-CNN and the attention mechanism, introduces real-domain coordinate attention into the complex domain, and constructs a convolutional neural network and a target identification method based on a complex coordinate attention module; radar echo complex data can be operated on directly as input, amplitude and phase information is fully utilized, and high-precision identification is realized for similar space cone targets with the same geometry and micro-motion form and only slightly different micro-motion parameters.
When the convolutional neural network based on the complex coordinate attention module is used for target identification, no preprocessing such as time-frequency analysis or range-slow-time imaging is needed, so no extra signal-processing time is required and efficiency is higher; the target also does not need to be observed continuously for a long time to obtain a complete periodic image, which further improves efficiency.
The convolutional neural network based on the complex coordinate attention module can realize high-precision identification on similar space cone targets with the same micromotion form and only slightly different micromotion parameters.
According to the end-to-end similar space cone target identification method, the radar echo complex data are input and the identification result is output, so that echo signal preprocessing and phase information loss are avoided, and the time required by identification is remarkably shortened.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic diagram of a convolutional neural network based on a complex coordinate attention module;
FIG. 2 is a schematic diagram of the complex coordinate attention module;
fig. 3 is a flowchart illustrating a complex input feature map processing method.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
It should be understood that the terms "a" and "an" indicate that the number of an element may be one in one embodiment and plural in another embodiment; these terms should not be interpreted as limiting the number.
Example one
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the convolutional neural network based on the complex coordinate attention module. The convolutional neural network comprises:
an input layer, N basic units, a classification unit, and an output layer;
the processing unit is used for mapping complex numbers into corresponding real numbers through a modulus operation and performing classification and identification; the N basic units comprise first to N-th basic units, the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, the input of the N-th basic unit is the output of the (N-1)-th basic unit, N is an integer greater than 1, the output of the N-th basic unit is the input of the processing unit, the output of the processing unit is the input of the classifier, and the classifier is connected with the output layer; each of the N basic units comprises: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; one of the N basic units further comprises a complex coordinate attention module; the complex coordinate attention module comprises: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding a first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, and generating first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is to: splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimension reduction on the feature information splicing result of the channel to obtain feature information after dimension reduction, and activating the feature information after dimension reduction to obtain a first complex output feature map of the channel; splitting the first complex output profile into a first tensor and a second tensor along a spatial dimension; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map to obtain a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, the third tensor being the set of the second complex output eigenmaps for all channels, the fourth tensor being the set of the third complex output eigenmaps for all channels;
expressing each element in the third tensor and the fourth tensor in a polar coordinate form, constraining the amplitude of the polar coordinate by using a constraint function, respectively obtaining a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions, expanding the fourth complex output feature map and the fifth complex output feature map to generate attention weight distribution in the horizontal and vertical spatial directions, and applying the attention weight distribution to a complex input feature map of the complex coordinate attention module to obtain a complex output feature map of the complex coordinate attention module;
wherein, the complex input characteristic diagram and the complex output characteristic diagram are both complex characteristic diagrams.
The invention combines the advantages of CV-CNN and the attention mechanism, introduces real-domain coordinate attention into the complex domain, and constructs a complex attention network. An attention mechanism can capture long-range dependencies through a global information search, automatically focus on important information through weighted assignment, and ignore unimportant redundant information, which is useful for identifying similar space cone targets within a short observation time. The attention mechanism has gone through the development stages of spatial attention, channel attention, and spatial-channel attention. That information in both the spatial and channel dimensions improves network recognition capability was demonstrated again in the recently proposed Coordinate Attention (CA) module, which improves model performance by embedding spatial position information into channel attention.
A complex-valued convolutional neural network (CV-CNN) can process complex echo data directly and make full use of amplitude and phase information, avoiding echo preprocessing and reducing recognition time.
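For concreteness, a minimal sketch of one complex-valued convolution layer in the usual CV-CNN style, where the complex product (A + jB)(x + jy) = (Ax - By) + j(Ay + Bx) is realized with two real convolutions; the class name and interface are illustrative assumptions, not code from the patent:

```python
import torch.nn as nn

class ComplexConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv_re = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)  # A
        self.conv_im = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)  # B

    def forward(self, x_re, x_im):
        # real part: A*x - B*y ; imaginary part: A*y + B*x
        y_re = self.conv_re(x_re) - self.conv_im(x_im)
        y_im = self.conv_re(x_im) + self.conv_im(x_re)
        return y_re, y_im
```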
The CV-CANet is built from the CV-CA module. CV-CANet is an end-to-end complex convolutional neural network whose architecture is shown in FIG. 1. Each basic unit of the network consists of four basic modules: complex convolution, complex batch normalization, complex activation, and complex pooling; the CV-CA module is embedded in the sixth unit. The numbers of convolution kernels of the first to sixth layers are 64, 128, 256 and 256, respectively; every convolution kernel has size 1 × 3, the sampling window of each pooling layer has size 1 × 2, the sliding stride of all convolutional layers is 1, and the padding is 1. The last two layers of the network replace the traditional fully connected layers with full convolution to reduce the model parameters. The output of the final full-convolution layer is complex, while the class label of the target is real; the complex outputs are therefore mapped to corresponding real numbers through a modulus operation and then sent to a Softmax classifier for classification and recognition. The loss function is the cross-entropy loss. Adaptive moment estimation (Adam) serves as the optimizer for updating the network weights and bias terms.
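The stacking described above can be sketched as follows. `ComplexConv2d` is the layer from the previous listing, while `ComplexBatchNorm2d`, `CReLU` (as a module on real/imaginary pairs), `ComplexMaxPool2d`, and `CVCA` stand for the complex batch-normalization, activation, pooling, and coordinate-attention modules and are assumed to exist with matching interfaces; the mapping of the quoted kernel counts "64, 128, 256, 256" onto six units is likewise an assumption:

```python
class BasicUnit(nn.Module):
    """One CV-CANet basic unit: complex conv -> complex BN -> (CV-CA) -> CReLU -> pool."""
    def __init__(self, in_ch, out_ch, with_ca=False):
        super().__init__()
        self.conv = ComplexConv2d(in_ch, out_ch, (1, 3), stride=1, padding=(0, 1))
        self.bn   = ComplexBatchNorm2d(out_ch)          # assumed complex BN module
        self.ca   = CVCA(out_ch) if with_ca else None   # assumed CV-CA module
        self.act  = CReLU()
        self.pool = ComplexMaxPool2d((1, 2))            # assumed complex pooling

    def forward(self, x_re, x_im):
        x_re, x_im = self.bn(*self.conv(x_re, x_im))
        if self.ca is not None:            # CV-CA sits between BN and activation
            x_re, x_im = self.ca(x_re, x_im)
        x_re, x_im = self.act(x_re, x_im)
        return self.pool(x_re, x_im)

# assumed channel progression over the six units; only the values 64/128/256
# and the CV-CA placement in unit 6 are stated in the text
channels = [64, 128, 128, 256, 256, 256]
units, in_ch = [], 1
for idx, out_ch in enumerate(channels):
    units.append(BasicUnit(in_ch, out_ch, with_ca=(idx == 5)))
    in_ch = out_ch

def backbone_forward(x_re, x_im):
    # thread the (real, imaginary) pair through the six basic units
    for unit in units:
        x_re, x_im = unit(x_re, x_im)
    return x_re, x_im
```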
The invention does not limit the number of layers of the complex convolutional neural network, the number of basic units, or which basic unit the CV-CA module is embedded in; these can be adjusted flexibly according to actual requirements.
The CV-CANet-based end-to-end identification method for similar space cone targets obtains the identification result directly from the input radar echo data, thereby avoiding complex echo-signal preprocessing and phase-information loss. To process radar echo complex signals directly, the invention proposes the CV-CA module and builds CV-CANet upon it. The invention introduces the coordinate attention mechanism into the complex domain and derives and establishes the basic structures of direction-related complex feature information aggregation, direction-related complex feature map splitting, and automatic complex coordinate attention allocation. Effective identification is achieved for similar space cone targets that have the same micro-motion form and only slightly different micro-motion parameters.
The method is usually carried out with an observation time of no more than half a micro-motion period: in practice a radar cannot observe a target for a long time, the data may be noisy, or data may be missing, so it is desirable to identify the target well from less data while ensuring real-time performance.
Example two
Referring to FIG. 2, FIG. 2 is a schematic diagram of the composition of the complex coordinate attention module. In this embodiment, the complex coordinate attention module includes: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding a first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, and generating first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is to: splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimensionality reduction on the feature information splicing result of the channel to obtain feature information after dimensionality reduction, and activating the feature information after dimensionality reduction to obtain a first complex output feature map of the channel; splitting the first complex output profile into a first tensor and a second tensor along a spatial dimension; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, and obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output characteristic maps of all the channels, and the fourth tensor is the set of the third complex output characteristic maps of all the channels;
expressing each element in the third tensor and the fourth tensor in a polar coordinate form, constraining the amplitude of the polar coordinate by using a constraint function, respectively obtaining a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions, expanding the fourth complex output feature map and the fifth complex output feature map to generate attention weight distribution in the horizontal and vertical spatial directions, and applying the attention weight distribution to a complex input feature map of the complex coordinate attention module to obtain a complex output feature map of the complex coordinate attention module;
wherein, the complex input characteristic diagram and the complex output characteristic diagram are both complex characteristic diagrams.
Prior-art CV-CNNs that simply separate the real and imaginary parts of complex numbers, or that use real convolution kernels, do not exploit the advantages of complex convolution kernels. Therefore, following the rules of complex arithmetic, the invention carries out a detailed formula derivation and constructs a complex coordinate attention (CV-CA) module from the complex network basic unit and the real-valued coordinate attention (RV-CA) module.
The CV-CA module proposed by the present invention includes a Complex Coordinate Attention Information Embedding (CVCIE) unit and a Complex Coordinate Attention Generation (CVCAG) unit.
In practical applications, the input of the CV-CA module may be any feature information in complex form. The invention is described using a radar echo signal as an example, but the input feature information is not limited to radar echo signals. For narrow-band radar echo data H = 1 is used; in practical applications the value of H may be determined according to the actual situation and is not specifically limited by the invention.
The echo signal measured by the radar can be expressed as:

$$S_t(n) = S_{th}(n) + \nu(n)$$

where $S_{th}(n)$ is the theoretical radar echo signal, $\nu(n)$ denotes independent and identically distributed Gaussian white noise generated by the radar receiver, and n denotes the pulse index.
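As a toy illustration of this signal model, the sketch below adds i.i.d. complex Gaussian receiver noise to a synthetic echo at a chosen SNR; the placeholder waveform and the SNR value are assumptions, not the patent's scattering model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(512)                                   # pulse index n
s_th = np.exp(1j * 2 * np.pi * 0.01 * n)             # placeholder theoretical echo S_th(n)

snr_db = 10.0                                        # assumed signal-to-noise ratio
noise_power = np.mean(np.abs(s_th) ** 2) / 10 ** (snr_db / 10)
v = np.sqrt(noise_power / 2) * (rng.standard_normal(n.size)
                                + 1j * rng.standard_normal(n.size))
s_t = s_th + v                                       # measured echo S_t(n)
```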
Let $X = [x_1, x_2, \ldots, x_C] \in \mathbb{C}^{C \times W \times H}$ be the complex input feature map, where $x_p$ denotes the complex input feature map of the p-th channel, and let $Y = [y_1, y_2, \ldots, y_C]$ be the complex output feature map, where $y_p$ is the complex output feature map of the p-th channel and Y has the same dimensions as X.
Global pooling is usually used in channel attention to encode global spatial information, but it compresses the global spatial information into a single channel descriptor, making it difficult to preserve position information, which is particularly important for capturing spatial structure. Therefore, the coordinate-attention operation of decomposing global pooling into two one-dimensional feature encodings is extended to the complex domain: the complex feature map of each channel of X is encoded along the horizontal and vertical directions respectively (the direction-related encodings, abbreviated as the horizontal and vertical directions), generating direction-related complex feature maps so that features in the two spatial directions are aggregated separately. This operation is described mathematically as:

$$z_p^h(h) = \frac{1}{W}\sum_{0 \le j < W}\operatorname{Re}\big(x_p(h,j)\big) + \mathrm{j}\,\frac{1}{W}\sum_{0 \le j < W}\operatorname{Im}\big(x_p(h,j)\big) \tag{1}$$

$$z_p^w(w) = \frac{1}{H}\sum_{0 \le i < H}\operatorname{Re}\big(x_p(i,w)\big) + \mathrm{j}\,\frac{1}{H}\sum_{0 \le i < H}\operatorname{Im}\big(x_p(i,w)\big) \tag{2}$$

where $\mathrm{j}$ denotes the imaginary unit, $\operatorname{Re}(\cdot)$ the real part of a complex number, $\operatorname{Im}(\cdot)$ the imaginary part, h is the pixel index in the horizontal direction of the input feature map, $x_p(h,j)$ is the value in row h and column j of the p-th channel of the complex input feature map, i is the pixel index in the vertical direction, and $x_p(i,w)$ is the value in row i and column w of the p-th channel. Transforming the complex feature maps of all channels of X yields two complex tensors $z^h = [z_1^h, \ldots, z_C^h] \in \mathbb{C}^{C \times H \times 1}$ and $z^w = [z_1^w, \ldots, z_C^w] \in \mathbb{C}^{C \times 1 \times W}$.
The CVCIE above outputs accurate spatial position information aggregated under the global receptive field. Based on the CVCIE encoding result, the CV-CA module applies a second transformation, called the CVCAG. The CVCAG transformation comprises three steps: (1) direction-related feature information aggregation; (2) direction-related complex feature map splitting; (3) automatic complex coordinate attention allocation.
(1) Direction-related complex feature information aggregation
Complex concatenation. The results of equations (1) and (2) are concatenated. Let $M \in \mathbb{C}^{C \times 1 \times (W+H)}$ be the concatenated result; each tensor in M is expressed as:

$$m_q = \left[(z_q^h)^T,\; z_q^w\right] \tag{3}$$

where $[\cdot,\cdot]$ denotes the concatenation operation and T denotes transposition.
Feature dimension reduction. A 1 × 1 complex convolution kernel is used to reduce the feature channel dimension, lowering the parameter count while realizing cross-channel information interaction and integration. Let $U = [u^1, u^2, \ldots, u^{C/r}]$ be the 1 × 1 complex convolution kernels shared by this layer, where $u^k$ denotes the k-th complex convolution kernel, $k = 1, 2, \ldots, C/r$; $u_C^k$ denotes the C-th 1 × 1 complex kernel in $u^k$ and $u_q^k$ the q-th, $q = 1, 2, \ldots, C$; r denotes a scaling coefficient controlling the number of channels of the convolution output feature map (r = 18 in the present invention; r may take other values in practical applications, and the embodiments of the invention are not specifically limited); s denotes the stride of the convolution operation; and the k-th feature map of the convolution output is $v_k(i,j)$, where:

$$v_k(i,j) = \sum_{q=1}^{C} u_q^k\, m_q(i \cdot s,\, j \cdot s) \tag{4}$$

$$f_k(i,j) = \sigma\big(v_k(i,j)\big) \tag{5}$$

where $m_q$ is the q-th tensor in M, $m_q \in \mathbb{C}^{1 \times (W+H)}$, $m_q(i \cdot s, j \cdot s)$ is the value in row $i \cdot s$ and column $j \cdot s$ of the q-th tensor after feature-information concatenation, $v_k(i,j)$ denotes the unactivated complex output feature map of the k-th channel, $f_k$ denotes the complex output feature map of the k-th channel, and the set of complex feature maps of all channels is written $f = [f_1, \ldots, f_{C/r}] \in \mathbb{C}^{(C/r) \times 1 \times (W+H)}$, with $f_{C/r}$ the complex output feature map of the (C/r)-th channel; $\sigma(\cdot)$ denotes the complex activation function, the CReLU:

$$\sigma(z) = \mathrm{CReLU}(z) = \mathrm{ReLU}\big(\operatorname{Re}(z)\big) + \mathrm{j}\,\mathrm{ReLU}\big(\operatorname{Im}(z)\big) \tag{6}$$

where z is a complex variable.
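Under the same assumptions, a sketch of this aggregation step; since a 1 × 1 convolution with stride 1 is a per-position linear map across channels, the kernel bank is written here as a complex matrix product:

```python
import torch
import torch.nn.functional as F

def aggregate(z_h: torch.Tensor, z_w: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """z_h: (C, H, 1), z_w: (C, 1, W) complex; u: (C//r, C) complex 1x1 kernel bank."""
    m = torch.cat([z_h.squeeze(2), z_w.squeeze(1)], dim=1)  # eq. (3): (C, H + W)
    v = u @ m                                               # eq. (4): (C//r, H + W)
    return torch.complex(F.relu(v.real), F.relu(v.imag))    # eq. (5)-(6): CReLU
```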
(2) Direction-related complex feature map splitting
Complex feature map splitting. f is split along the spatial dimension into two independent tensors $f^h \in \mathbb{C}^{(C/r) \times 1 \times H}$ and $f^w \in \mathbb{C}^{(C/r) \times 1 \times W}$, namely:

$$f^h = [f_1^h, \ldots, f_{C/r}^h] \tag{7}$$

$$f^w = [f_1^w, \ldots, f_{C/r}^w] \tag{8}$$

where $f_k^h$ and $f_k^w$ are the complex output feature maps of the k-th channel in the horizontal and vertical directions, respectively.
and (5) feature dimension increasing. Using a 1X 1 rewinding kernel to f h And f w Reverting to the same dimension as the input signature X. Setting the rewinding product kernel of 1 x 1 in the convolution operation in the horizontal direction as
Figure GDA0004042011540000149
Wherein
Figure GDA00040420115400001410
Represents the ith (l =1,2, \8230;, C) rewinding and accumulating nucleus, and/or the combination thereof>
Figure GDA00040420115400001411
Represents->
Figure GDA00040420115400001412
The (o =1, 2., C/r) 1 × 1 rewinding core. In the same way, is based on>
Figure GDA00040420115400001413
Is a rewinding multiplication kernel of 1 × 1 in the case of a convolution operation in the vertical direction, wherein->
Figure GDA00040420115400001415
Indicates the lth>
Figure GDA00040420115400001422
Complex convolution kernel->
Figure GDA00040420115400001417
Represents->
Figure GDA00040420115400001418
The qth (l =1, 2.., C) rewinding core of 1 × 1, then:
using a 1 x 1 rewinding kernel to wrap said f h And f is w Restoring to the same dimension as the X to obtain
Figure GDA00040420115400001419
And
Figure GDA00040420115400001420
wherein:
Figure GDA00040420115400001421
Figure GDA0004042011540000151
wherein the content of the first and second substances,
Figure GDA0004042011540000152
a complex output characteristic diagram for the ith channel in the horizontal direction>
Figure GDA0004042011540000153
Is a complex output characteristic diagram for the ith channel in the vertical direction>
Figure GDA0004042011540000154
Is->
Figure GDA0004042011540000155
O =1,2,. C/r,. X.r,. For the (o) th 1 x 1 rewinding nucleus in (1)>
Figure GDA00040420115400001527
For a complex output characteristic diagram in the horizontal direction for the mth channel>
Figure GDA0004042011540000157
Is->
Figure GDA0004042011540000158
The (o) th 1 x 1 rewinding and accumulating kernel of (4), (v), and (v)>
Figure GDA0004042011540000159
For a complex output characteristic map in the vertical direction for the ith channel>
Figure GDA00040420115400001510
Second complex output profile, v, representing the ith channel in the horizontal direction h For the set of the second complex output characteristic maps of all channels, a->
Figure GDA00040420115400001511
Third complex output characteristic diagram, v, representing the ith channel in the vertical direction w For the set of the third complex output profiles of all channels,
Figure GDA00040420115400001512
(3) Automatic complex coordinate attention allocation
Direction-related complex attention weight coefficients are calculated. Each element (a complex value) of the complex feature tensors $v^h$ and $v^w$ is written in polar form, and the magnitude of the polar form is then constrained with a Sigmoid function, limiting it to the range 0 to 1, namely:

$$v_l^h = \left|v_l^h\right| e^{\mathrm{j}\theta_l^h} \tag{11}$$

$$v_l^w = \left|v_l^w\right| e^{\mathrm{j}\theta_l^w} \tag{12}$$

$$g_l^h = \mathrm{Sig}\!\left(\left|v_l^h\right|\right) e^{\mathrm{j}\theta_l^h} \tag{13}$$

$$g_l^w = \mathrm{Sig}\!\left(\left|v_l^w\right|\right) e^{\mathrm{j}\theta_l^w} \tag{14}$$

where $g_l^h$ and $g_l^w$ are the results of constraining the magnitudes of the complex output feature maps of the l-th channel in the horizontal and vertical directions with the Sigmoid function, $\theta_l^h$ and $\theta_l^w$ are the phases of the complex output feature maps of the l-th channel in the horizontal and vertical directions, $|v_l^h|$ and $|v_l^w|$ are the corresponding magnitudes, and $\mathrm{Sig}(\cdot)$ denotes the Sigmoid function, used to convert the final magnitude into a value between 0 and 1; the Sigmoid-constrained results are written $g^h = [g_1^h, \ldots, g_C^h]$ and $g^w = [g_1^w, \ldots, g_C^w]$. Constraining only the magnitude of the polar form to the range 0 to 1 does not affect the phase; that is, the phase information is preserved.
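A one-function sketch of this constraint (torch.polar rebuilds a complex tensor from magnitude and phase); the function name is an assumption:

```python
import torch

def constrain_magnitude(v: torch.Tensor) -> torch.Tensor:
    """g = Sig(|v|) * exp(j*theta), keeping the phase theta = arg(v) untouched."""
    return torch.polar(torch.sigmoid(v.abs()), v.angle())
```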
Since the original feature maps are in the rectangular coordinate system, the expressions of equations (13) and (14) are converted to the rectangular coordinate system:

$$g_l^h = \mathrm{Sig}\!\left(\left|v_l^h\right|\right)\left(\cos\theta_l^h + \mathrm{j}\sin\theta_l^h\right) \tag{15}$$

$$g_l^w = \mathrm{Sig}\!\left(\left|v_l^w\right|\right)\left(\cos\theta_l^w + \mathrm{j}\sin\theta_l^w\right) \tag{16}$$
Complex coordinate attention is automatically assigned. The outputs $g^h$ and $g^w$ for the horizontal and vertical spatial directions are expanded to generate the attention weight distribution in each spatial direction, which is applied to the complex input feature map to realize automatic allocation of complex coordinate attention. The output of the complex coordinate attention module is obtained as:

$$y_l(i,j) = x_l(i,j) \times g_l^h(i) \times g_l^w(j) \tag{17}$$

where $x_l(i,j)$ is the value in row i and column j of the complex input feature map of the l-th channel.
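Putting the pieces together, an end-to-end sketch of the CV-CA forward pass, reusing cvcie, aggregate, and constrain_magnitude from the earlier sketches; the weight shapes are assumptions chosen to be consistent with equations (1) to (17):

```python
def cv_ca_forward(x, u_reduce, w_h, w_w):
    """x: (C, H, W) complex; u_reduce: (C//r, C); w_h, w_w: (C, C//r), all complex."""
    C, H, W = x.shape
    z_h, z_w = cvcie(x)                          # directional encoding, eq. (1)-(2)
    f = aggregate(z_h, z_w, u_reduce)            # concat + reduce + CReLU, eq. (3)-(6)
    f_h, f_w = f[:, :H], f[:, H:]                # split along the spatial axis, eq. (7)-(8)
    g_h = constrain_magnitude(w_h @ f_h).reshape(C, H, 1)   # eq. (9), (13)
    g_w = constrain_magnitude(w_w @ f_w).reshape(C, 1, W)   # eq. (10), (14)
    return x * g_h * g_w                         # attention allocation, eq. (17)
```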
From the CV-CA construction process above, on the one hand, CV-CA uses a complex convolutional neural network to obtain the amplitude and phase features of a target signal, such as a radar echo, through associated learning of the real and imaginary parts of complex numbers; on the other hand, through complex coordinate attention it attends to spatial information and channel information in the horizontal and vertical directions simultaneously, better models long-range dependencies in the feature information, and enhances the feature representation capability of the target object.
The CV-CA module provided by the invention comprises two parts, wherein the first part is complex coordinate attention embedding, and the second part is complex coordinate attention generation. The physical significance of each part is explained in detail below.
The first part is complex coordinate attention embedding. In the field of computer vision, the position information in a feature map has an important influence on capturing spatial structural features. Since the targets to be distinguished by the invention are very similar space cone targets, the invention considers that spatial structure information is beneficial to their discrimination and identification. Therefore, in order for the proposed complex coordinate attention module to retain position information and further use it to capture long-range spatial dependencies, the invention decomposes the global pooling in the CNN into a pooling operation in the horizontal direction and a pooling operation in the vertical direction.
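A minimal sketch of this decomposition (average pooling and the shapes below are illustrative assumptions):

import numpy as np

# x: a (C, H, W) complex feature map; C channels, H rows, W columns.
C, H, W = 4, 16, 16
rng = np.random.default_rng(1)
x = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))

# Global pooling would collapse all positions: x.mean(axis=(1, 2)) -> (C,).
# Decomposed directional pooling keeps one spatial axis each:
z_h = x.mean(axis=2)  # pool along the width  -> (C, H): one value per row
z_w = x.mean(axis=1)  # pool along the height -> (C, W): one value per column
print(z_h.shape, z_w.shape)  # (4, 16) (4, 16)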
The second part is complex coordinate attention generation, which is done in three sub-steps. For this section, the general design principle of the present invention has three points: 1) The modules should be as simple and light-weight as possible. 2) The module should make full use of the spatial location information obtained in the first part. 3) The module should take into account the interrelationship between the channels in order to take advantage of the channel attention.
Direction-related feature information aggregation. The first part has obtained spatial position information in both the horizontal and vertical directions. Under the principle that the designed module should be as simple as possible with as few parameters as possible, the invention first splices (concatenates) the spatial position information of the horizontal and vertical directions, so as to retain the information of both directions simultaneously. Then a 1 × 1 convolution kernel is used to convolve the splicing result for dimension reduction. With this design, the feature information among the channels is taken into account while the number of parameters is reduced; a sketch covering this sub-step together with the next one is given after the following paragraph.
The direction-related complex feature information is split. The weight in the horizontal direction and the weight in the vertical direction should be applied to the horizontal and vertical directions of the input feature map, respectively, and the number of channels of the weights should be consistent with the number of channels of the input feature map. The result of the first sub-step, which already fuses the spatial position information with the channel information, is therefore split into a horizontal-direction part and a vertical-direction part, and each part is then raised back to the original channel dimension with its own 1 × 1 convolution kernel.
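The following sketch illustrates both sub-steps under simplifying assumptions: the reduction ratio r, random complex matrices standing in for learned 1 × 1 complex convolution kernels, and the omission of batch normalization and activation are all illustrative choices. On the pooled features, a 1 × 1 convolution is simply a complex channel-mixing matrix multiplication.

import numpy as np

C, H, W, r = 8, 16, 16, 4
rng = np.random.default_rng(2)
z_h = rng.standard_normal((C, H)) + 1j * rng.standard_normal((C, H))  # horizontal encoding
z_w = rng.standard_normal((C, W)) + 1j * rng.standard_normal((C, W))  # vertical encoding

# Sub-step 1: splice the two directions along the spatial axis, then reduce
# the channel number C -> C/r with a shared 1x1 complex convolution.
z = np.concatenate([z_h, z_w], axis=1)                                # (C, H + W)
W_reduce = rng.standard_normal((C // r, C)) + 1j * rng.standard_normal((C // r, C))
f = W_reduce @ z                                                      # (C/r, H + W)

# Sub-step 2: split back into the two directions and raise each part to the
# original channel number with its own 1x1 complex convolution.
f_h, f_w = f[:, :H], f[:, H:]                                         # (C/r, H), (C/r, W)
W_h = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
W_w = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
v_h, v_w = W_h @ f_h, W_w @ f_w                                       # (C, H), (C, W)
print(v_h.shape, v_w.shape)  # (8, 16) (8, 16)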
Complex attention is automatically assigned. After the above operation steps, complex weights that take both spatial position information and channel information into account are obtained. On the one hand, the phase information of the complex weights is to be retained; on the other hand, the magnitude of the weights is limited to the interval 0-1. Finally, the attention weights are applied to each element and each channel of the input feature map, realizing the complex coordinate attention proposed by the invention. In this way each channel is weighted so that important channels receive attention, spatial information is also taken into account, and regions that facilitate target recognition are emphasized.
In addition, the CV-CA module in this embodiment obtains strong feature recognition capability with only a small increase in the number of parameters while maintaining the operating efficiency of the model, thereby improving the recognition capability of the model and reducing the probability of target misjudgment.
EXAMPLE III
A third embodiment of the present invention provides a complex input feature map processing method; referring to fig. 3, which is a schematic flow chart of the complex input feature map processing method, the method includes:
obtaining a complex input characteristic diagram to be processed;
encoding, for each channel, the first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, and generating, within the channel, first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information encoded along the vertical direction;
for each channel, splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimensionality reduction on the feature information splicing result of the channel to obtain feature information after dimensionality reduction, and activating the feature information after dimensionality reduction to obtain a first complex output feature map of the channel; splitting the first complex output feature map into a first tensor and a second tensor along a spatial dimension; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map to obtain a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output characteristic maps of all the channels, and the fourth tensor is the set of the third complex output characteristic maps of all the channels;
expressing each element in the third tensor and the fourth tensor in a polar coordinate form, constraining the amplitude of the polar coordinate by using a constraint function, respectively obtaining a fourth complex output feature diagram and a fifth complex output feature diagram in the horizontal and vertical spatial directions, expanding the fourth complex output feature diagram and the fifth complex output feature diagram to generate attention weight distribution in the horizontal and vertical spatial directions, and applying the attention weight distribution to the to-be-processed complex input feature diagram to obtain a processed complex output feature diagram;
wherein, the complex input characteristic diagram and the complex output characteristic diagram are both complex characteristic diagrams.
The method can be used for processing a complex input feature map: the amplitude and phase characteristics of the signal are obtained through the complex convolutional neural network by joint learning of the complex real and imaginary parts, the complex coordinate attention attends to the spatial information and channel information in the horizontal and vertical directions, long-range dependencies in the feature information are better modeled, and the feature characterization capability for the target object is enhanced.
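Putting the steps of this embodiment together, one illustrative end-to-end forward pass might look as follows. This is a sketch under the same assumptions as the sketches above (random matrices in place of learned 1 × 1 complex convolutions, an assumed reduction ratio r, normalization and activation omitted), not the claimed implementation:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def process_complex_feature_map(x, W_reduce, W_h, W_w):
    """Illustrative forward pass: directional encoding -> splicing and
    channel reduction -> splitting and channel expansion -> amplitude-
    constrained complex attention applied to the input feature map."""
    C, H, W = x.shape
    z = np.concatenate([x.mean(axis=2), x.mean(axis=1)], axis=1)  # (C, H+W)
    f = W_reduce @ z                                              # (C/r, H+W)
    v_h, v_w = W_h @ f[:, :H], W_w @ f[:, H:]                     # (C, H), (C, W)
    g_h = sigmoid(np.abs(v_h)) * np.exp(1j * np.angle(v_h))      # phase kept
    g_w = sigmoid(np.abs(v_w)) * np.exp(1j * np.angle(v_w))
    return x * g_h[:, :, None] * g_w[:, None, :]

rng = np.random.default_rng(3)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
W_reduce = rng.standard_normal((C // r, C)) + 1j * rng.standard_normal((C // r, C))
W_h = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
W_w = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
print(process_complex_feature_map(x, W_reduce, W_h, W_w).shape)  # (8, 16, 16)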
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for identifying a target by a complex convolutional neural network based on complex coordinate attention, the method comprising:
obtaining complex radar echo data of the space cone target;
inputting the complex radar echo data into a complex convolution neural network based on a complex coordinate attention module;
the complex convolutional neural network outputs the identification result of the space cone target, identifying whether the space cone target is a warhead or a decoy;
the complex convolutional neural network includes:
an input layer, N basic units, a classification unit of the space cone target, and an output layer;
the input layer is used for inputting the complex radar echo data;
the N basic units comprise first to Nth basic units, the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, ..., the input of the Nth basic unit is the output of the (N-1)th basic unit, N is an integer larger than 1, and the output of the Nth basic unit is the input of the classification unit of the space cone target; each of the N basic units includes: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; wherein one of the N basic units further comprises a complex coordinate attention module; the complex coordinate attention module includes: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding a first complex input feature map of the space cone target of the channel along the horizontal direction and the vertical direction respectively, and generating, within the channel, first output feature information of the first complex input feature map of the space cone target encoded along the horizontal direction and second output feature information encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is used for: splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimensionality reduction on the feature information splicing result of the channel to obtain feature information after dimensionality reduction, and activating the feature information after dimensionality reduction to obtain a first complex output feature map of the channel; splitting the first complex output feature map along a spatial dimension into a first tensor along a horizontal direction and a second tensor along a vertical direction; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map to obtain a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output characteristic maps of all the channels, and the fourth tensor is the set of the third complex output characteristic maps of all the channels; expressing each element in the third tensor and the fourth tensor in a polar coordinate form, constraining the amplitude of the polar coordinate by using a constraint function, respectively obtaining a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions, expanding the fourth complex output feature map and the fifth complex output feature map to generate attention weight distribution in the horizontal and vertical spatial directions, and applying the attention weight distribution to a complex input feature map of the complex coordinate attention module to obtain a complex output feature map of the complex coordinate attention module;
wherein, the complex input characteristic diagram and the complex output characteristic diagram are both complex characteristic diagrams;
the classification unit of the space cone target is connected with the output layer; and the classification unit of the space cone target is used for mapping the complex output data of the Nth basic unit into corresponding real numbers through a modulus operation and performing classification and identification of the space cone target.
2. The method of claim 1, wherein in the basic unit not including the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
3. The method of claim 1, wherein in the basic unit comprising the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex coordinate attention module, the output of the complex coordinate attention module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
4. The method of claim 1, wherein the classification unit comprises:
the second complex convolution module, the second complex batch normalization module, the second complex activation module, the third complex convolution module and the classifier; the output of the second complex convolution module is the input of the second complex batch normalization module, the output of the second complex batch normalization module is the input of the second complex activation module, the output of the second complex activation module is the input of the third complex convolution module, and the output of the third complex convolution module is the input of the classifier.
5. The complex coordinate attention-based object recognition method of a complex convolutional neural network as claimed in claim 1, wherein the complex convolutional neural network comprises first to sixth basic units.
6. The complex coordinate attention-based object recognition method of a complex convolutional neural network as claimed in claim 5, wherein a sixth basic unit comprises the complex coordinate attention module.
7. The complex coordinate attention-based target identification method of the complex convolutional neural network as claimed in claim 1, wherein an optimizer is provided in the complex convolutional neural network for updating the network weight and the bias term.
8. The method of claim 1, wherein the complex input feature map is a complex input feature map of complex radar echo data of the spatial cone target, and the complex output feature map is a complex output feature map of complex radar echo data of the spatial cone target.
CN202110858271.0A 2021-07-28 2021-07-28 Complex convolution neural network target identification method based on complex coordinate attention Active CN113537120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110858271.0A CN113537120B (en) 2021-07-28 2021-07-28 Complex convolution neural network target identification method based on complex coordinate attention

Publications (2)

Publication Number Publication Date
CN113537120A CN113537120A (en) 2021-10-22
CN113537120B true CN113537120B (en) 2023-04-07

Family

ID=78121256

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972280B (en) * 2022-06-07 2023-11-17 重庆大学 Fine coordinate attention module and application thereof in surface defect detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9373059B1 (en) * 2014-05-05 2016-06-21 Atomwise Inc. Systems and methods for applying a convolutional network to spatial data
CN111340186A (en) * 2020-02-17 2020-06-26 之江实验室 Compressed representation learning method based on tensor decomposition
CN112329538A (en) * 2020-10-10 2021-02-05 杭州电子科技大学 Target classification method based on microwave vision
CN112965062A (en) * 2021-02-09 2021-06-15 西安电子科技大学 Radar range profile target identification method based on LSTM-DAM network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yaxin Li et al. Multi-mode Fusion and Classification Method for Space Targets Based on Convolutional Neural Network. 2019 6th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), 2020, full text. *
周剑. Target classification method for complex scenes. China Master's Theses Full-text Database, Information Science and Technology, 2021, full text. *
王光光. PolSAR classification and video behavior recognition based on deep learning. China Master's Theses Full-text Database, Information Science and Technology, 2021, full text. *

Similar Documents

Publication Publication Date Title
CN110119703B (en) Human body action recognition method fusing attention mechanism and spatio-temporal graph convolutional neural network in security scene
CN108509910B (en) Deep learning gesture recognition method based on FMCW radar signals
CN112364779A (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN108846323A (en) A kind of convolutional neural networks optimization method towards Underwater Targets Recognition
CN111784560A (en) SAR and optical image bidirectional translation method for generating countermeasure network based on cascade residual errors
CN113283298B (en) Real-time behavior identification method based on time attention mechanism and double-current network
Alnujaim et al. Generative adversarial networks to augment micro-Doppler signatures for the classification of human activity
CN113537120B (en) Complex convolution neural network target identification method based on complex coordinate attention
Qu et al. Human activity recognition based on WRGAN-GP-synthesized micro-Doppler spectrograms
Kamal et al. Generative adversarial learning for improved data efficiency in underwater target classification
Li et al. Supervised domain adaptation for few-shot radar-based human activity recognition
CN112884062B (en) Motor imagery classification method and system based on CNN classification model and generated countermeasure network
CN113569735B (en) Complex input feature graph processing method and system based on complex coordinate attention module
CN113435276A (en) Underwater sound target identification method based on antagonistic residual error network
CN116794608A (en) Radar active interference identification method based on improved MobileViT network
CN114550047B (en) Behavior rate guided video behavior recognition method
CN110599556A (en) Method for converting time sequence into image based on improved recursive graph
Ibrahim et al. Auto-encoder based deep learning for surface electromyography signal processing
Qiao et al. Gesture-ProxylessNAS: A lightweight network for mid-air gesture recognition based on UWB radar
CN114966587A (en) Radar target identification method and system based on convolutional neural network fusion characteristics
Bose et al. Fine-Grained Independent Approach for Workout Classification Using Integrated Metric Transfer Learning
CN114528918A (en) Hyperspectral image classification method and system based on two-dimensional convolution sum LSTM
El-Bana et al. Evaluating the Potential of Wavelet Pooling on Improving the Data Efficiency of Light-Weight CNNs
CN113421281A (en) Pedestrian micromotion part separation method based on segmentation theory
CN112241001A (en) Radar human body action recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant