CN116452960A - Multi-mode fusion military cross-domain combat target detection method
Info
- Publication number
- CN116452960A (application number CN202310425308.XA)
- Authority
- CN
- China
- Prior art keywords
- sound
- target detection
- features
- domain
- visual
- Prior art date: 2023-04-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multi-mode fusion military cross-domain combat target detection method, which relates to the technical field of target detection and specifically comprises the following steps. S1: visual feature extraction, in which an image is input into a visual encoder and visual features are extracted. S2: sound feature extraction, in which a short-time Fourier transform is applied to the sound signal to obtain a spectrogram, and the spectrogram is input into a sound encoder to extract sound features. S3: feature fusion, in which the extracted visual features and sound features are input into a feature fusion module and fused effectively using an attention mechanism. In this multi-mode fusion military cross-domain combat target detection method, image information and sound signals in different domains are captured by different sensors in those domains, features are extracted from the captured image information and sound signals, an attention mechanism is used for feature fusion, and the fused features are used for target detection.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a multi-mode fusion military cross-domain combat target detection method.
Background
With the continuous progress of military technology, battlefield space keeps expanding: combat has extended from the traditional land, sea and air domains to outer space, cyberspace, the electromagnetic spectrum, the information environment and the cognitive domain, which receive ever greater emphasis. This has profoundly changed battlefield characteristics, rules and winning mechanisms; new combat modes keep emerging, and cross-domain combat has become a new mode of operations.
The main characteristic of cross-domain combat is that the boundaries between services and domains are broken down, and the joint combat capabilities of the air, sea, land, space, cyber and electromagnetic-spectrum domains are exploited to the greatest extent, so as to achieve synchronized cross-domain firepower and global maneuver and to seize advantages in the physical domain, the cognitive domain and the time dimension. The intelligent, cross-domain cooperative character of combat is becoming increasingly evident and is driving future operations toward cross-domain combat.
Therefore, using deep learning to fuse multi-modal image and sound information, so as to improve the robustness, generalization and effectiveness of target detection by military systems in different domains, is vital to developing the armed forces' combat capability in cross-domain cooperative intelligent operations; in recent years, deep-learning-based target detection algorithms have developed rapidly.
Target detection for cross-domain combat remains difficult: robust feature representations must be obtained from information in different domains to serve downstream target detection and accomplish military cross-domain cooperative intelligent combat.
Disclosure of Invention
The invention aims to provide a multi-mode fusion military cross-domain combat target detection method that extracts effective features from image information and sound signals captured by different sensors in different domains, fuses those features, and performs target detection on the result, thereby improving target detection performance.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
a multi-mode fusion military cross-domain combat target detection method specifically comprises the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
Further, the image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
Further, the step S1 of extracting visual features comprises the following steps (see the sketch after this list):
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
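A minimal PyTorch sketch of such a multi-branch block follows; the branch layout (1×1, 3×3 and 5×5 convolutions plus a pooling branch) and the channel widths are illustrative assumptions, since the method does not fix them:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Multi-branch convolution block: each branch uses a different kernel
    size, so features at several scales are captured simultaneously (S101)."""
    def __init__(self, in_ch: int, branch_ch: int = 32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=5, padding=2))
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # S102: splice the branch outputs in the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```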
Further, the step S2 of extracting sound features comprises the following steps (see the sketch after this list):
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
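As an illustration of S201-S202, the spectrogram can be computed with torch.stft; the FFT size, hop length and Hann window below are assumed values, not prescribed by the method:

```python
import torch

def sound_to_spectrogram(signal: torch.Tensor,
                         n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Short-time Fourier transform of a 1-D sound signal; returns the
    magnitude spectrogram as a (1, freq, time) tensor for the sound encoder."""
    window = torch.hann_window(n_fft)  # plays the role of w(t - τ)
    spec = torch.stft(signal, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    return spec.abs().unsqueeze(0)     # magnitude of S(ω, τ), plus a channel dim
```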
Further, the step S3 of feature fusion comprises the following steps (see the sketch after this list):
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
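A minimal sketch of the S301-S303 fusion module, assuming 2-D feature maps whose channel counts sum to `channels` after concatenation; the reduction ratio is an illustrative assumption:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Concatenate visual and sound features, derive channel attention
    weights via pool -> conv -> ReLU -> conv -> softmax, and reweight the
    concatenated features (S301-S303)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # adaptive pooling
            nn.Conv2d(channels, channels // reduction, 1),  # convolution
            nn.ReLU(inplace=True),                          # ReLU activation
            nn.Conv2d(channels // reduction, channels, 1))  # convolution

    def forward(self, vis: torch.Tensor, snd: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([vis, snd], dim=1)             # S301: channel splice
        weights = torch.softmax(self.mlp(fused), dim=1)  # S302: attention weights
        return fused * weights                           # S303: reweighting
```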
Further, the step S4 of network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
Further, in step S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy, as illustrated in the sketch below.
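The following sketch illustrates the S402 losses and an S403 update step, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the commented training step uses placeholder tensor names:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # classification and confidence losses

def iou_loss(bp: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
    """IoU regression loss between predicted boxes bp and ground-truth
    boxes bg, both (N, 4) tensors in (x1, y1, x2, y2) form."""
    lt = torch.max(bp[:, :2], bg[:, :2])        # intersection top-left
    rb = torch.min(bp[:, 2:], bg[:, 2:])        # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (bp[:, 2:] - bp[:, :2]).clamp(min=0).prod(dim=1)
    area_g = (bg[:, 2:] - bg[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_g - inter + 1e-7)
    return (1.0 - iou).mean()

# Hypothetical S402-S403 step (cls_logits, obj_logits, pred_boxes and the
# corresponding targets would come from the YOLOX head and the labels):
# loss = bce(cls_logits, cls_targets) + bce(obj_logits, obj_targets) \
#        + iou_loss(pred_boxes, gt_boxes)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```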
Further, the step S5 of target detection comprises the following steps (see the sketch after this list):
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
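Putting S501-S502 together, inference might be wired up as follows; the encoder, fusion and detector arguments stand for the modules sketched above, and sound_to_spectrogram is the earlier STFT helper:

```python
import torch

@torch.no_grad()
def detect(image, sound, visual_encoder, sound_encoder, fusion, detector):
    """Hypothetical inference pipeline for S501-S502."""
    vis = visual_encoder(image)                       # S501: visual features
    snd = sound_encoder(sound_to_spectrogram(sound))  # S501: sound features
    fused = fusion(vis, snd)                          # S501: attention fusion
    return detector(fused)                            # S502: inference result
```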
Compared with the prior art, the invention has the following beneficial effects:
The invention captures image information and sound signals in different domains through different sensors in those domains, extracts features from the captured image information and sound signals, fuses the features with an attention mechanism, and uses the fused features for target detection, which facilitates reconnaissance of the battlefield environment, increases the efficiency of battlefield situation analysis, and improves cross-domain combat capability.
By capturing and fusing information from different domains, calculating the detection loss, and extracting effective features for fusion, the invention continuously trains the neural network model and updates the network parameters, thereby improving the detection performance of the target detector and ensuring the accuracy of target detection.
Drawings
FIG. 1 is a schematic diagram of a specific flow of a multi-mode fusion military cross-domain combat target detection method;
FIG. 2 is a schematic diagram of a multi-modal fusion military cross-domain combat target detection method;
FIG. 3 is a diagram showing a sound signal and its short-time Fourier transform spectrogram in the multi-mode fusion military cross-domain combat target detection method;
FIG. 4 is a diagram showing the input image, the sound signal and the target detection inference results in the multi-mode fusion military cross-domain combat target detection method.
Detailed Description
The invention is further described below in connection with specific embodiments, so that the technical means, creative features, objectives and effects of the invention are easy to understand.
Referring to FIGS. 1-4, the invention discloses a multi-mode fusion military cross-domain combat target detection method, which specifically comprises the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
The image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
S1, extracting visual features, comprises the following steps:
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
S2, extracting sound features, comprises the following steps:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
S3, feature fusion, comprises the following steps:
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
The S4 network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
In S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy, as illustrated in the training-loop sketch below.
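For completeness, a hypothetical end-to-end training loop wiring the sketches above together is given below; the data loader, the detector's compute_loss method and all hyper-parameters are illustrative assumptions, not part of the disclosed method:

```python
import torch

def train(dataloader, visual_encoder, sound_encoder, fusion, detector,
          epochs: int = 10, lr: float = 1e-3):
    params = (list(visual_encoder.parameters()) + list(sound_encoder.parameters())
              + list(fusion.parameters()) + list(detector.parameters()))
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for image, sound, targets in dataloader:
            vis = visual_encoder(image)                       # S1
            snd = sound_encoder(sound_to_spectrogram(sound))  # S2
            preds = detector(fusion(vis, snd))                # S3 + S401
            loss = detector.compute_loss(preds, targets)      # S402 (assumed API)
            loss.backward()                                   # S403 back-propagation
            optimizer.step()                                  # update parameters
            optimizer.zero_grad()
```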
S5, target detection, comprises the following steps:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A multi-mode fusion military cross-domain combat target detection method, characterized by comprising the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
2. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein: the image information and the sound signals in the S1 and the S2 are captured by different sensors in different domains.
3. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S1 extraction of visual features comprises the steps of:
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
4. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S2 extraction of sound features comprises the steps of:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
5. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S3 feature fusion comprises the steps of:
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
6. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S4 network training comprises the steps of:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
7. The multi-modal fusion military cross-domain combat target detection method of claim 6, wherein in step S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy.
8. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S5 target detection comprises the steps of:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310425308.XA CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310425308.XA CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452960A true CN116452960A (en) | 2023-07-18 |
Family
ID=87127038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310425308.XA Pending CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452960A (en) |
- 2023-04-20: Application CN202310425308.XA filed in China; publication CN116452960A pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210383231A1 (en) * | 2020-08-20 | 2021-12-09 | Chang'an University | Target cross-domain detection and understanding method, system and equipment and storage medium |
CN115188066A (en) * | 2022-06-02 | 2022-10-14 | 广州大学 | Moving target detection system and method based on cooperative attention and multi-scale fusion |
CN115700808A (en) * | 2022-10-27 | 2023-02-07 | 东南大学 | Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images |
CN115631444A (en) * | 2022-10-31 | 2023-01-20 | 成都浩孚科技有限公司 | Unmanned aerial vehicle aerial image target detection algorithm |
Non-Patent Citations (1)
Title |
---|
韩锡辉 (Han Xihui), "基于视听觉注意机理的目标检测方法" ["Target Detection Method Based on the Audio-Visual Attention Mechanism"], 《万方数据知识服务平台》 (Wanfang Data Knowledge Service Platform), pages 43-44 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ren et al. | Adversarial examples: attacks and defenses in the physical world | |
CN111898504B (en) | Target tracking method and system based on twin circulating neural network | |
WO2023280065A1 (en) | Image reconstruction method and apparatus for cross-modal communication system | |
CN110796166B (en) | Attention mechanism-based multitask image processing method | |
Isa et al. | Optimizing the hyperparameter tuning of YOLOv5 for underwater detection | |
Teng et al. | Underwater target recognition methods based on the framework of deep learning: A survey | |
US20230260255A1 (en) | Three-dimensional object detection framework based on multi-source data knowledge transfer | |
CN107844743A (en) | A kind of image multi-subtitle automatic generation method based on multiple dimensioned layering residual error network | |
Bar et al. | The vulnerability of semantic segmentation networks to adversarial attacks in autonomous driving: Enhancing extensive environment sensing | |
CN114463677B (en) | Safety helmet wearing detection method based on global attention | |
CN112529065B (en) | Target detection method based on feature alignment and key point auxiliary excitation | |
CN116486243A (en) | DP-ViT-based sonar image target detection method | |
CN115830531A (en) | Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion | |
Li et al. | Spear and shield: Attack and detection for CNN-based high spatial resolution remote sensing images identification | |
Dhiyanesh et al. | Improved object detection in video surveillance using deep convolutional neural network learning | |
CN114566170A (en) | Lightweight voice spoofing detection algorithm based on class-one classification | |
Chen et al. | GFSNet: Generalization-friendly siamese network for thermal infrared object tracking | |
EP3832542A1 (en) | Device and method with sensor-specific image recognition | |
Lei et al. | Real-time Anomaly Target Detection and Recognition in Intelligent Surveillance Systems based on SLAM | |
Wei et al. | Lightweight multimodal feature graph convolutional network for dangerous driving behavior detection | |
Chu et al. | Illumination-guided transformer-based network for multispectral pedestrian detection | |
CN116452960A (en) | Multi-mode fusion military cross-domain combat target detection method | |
CN115830643A (en) | Light-weight pedestrian re-identification method for posture-guided alignment | |
Wang et al. | Simulation of human ear recognition sound direction based on convolutional neural network | |
CN115937993A (en) | Living body detection model training method, living body detection device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |