CN116452960A - Multi-mode fusion military cross-domain combat target detection method
Info
- Publication number
- CN116452960A (application number CN202310425308.XA)
- Authority
- CN
- China
- Prior art keywords
- sound
- target detection
- features
- domain
- visual
- Prior art date: 2023-04-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multi-mode fusion military cross-domain combat target detection method, which relates to the technical field of target detection and specifically comprises the following steps. S1: visual feature extraction, in which an image is input into a visual encoder and visual features are extracted. S2: sound feature extraction, in which a short-time Fourier transform is applied to the sound signal to obtain a spectrogram, and the spectrogram is input into a sound encoder to extract sound features. S3: feature fusion, in which the extracted visual features and sound features are input into a feature fusion module and fused effectively using an attention mechanism. In this multi-mode fusion military cross-domain combat target detection method, image information and sound signals in different domains are captured by different sensors in those domains, features are extracted from the captured image information and sound signals, an attention mechanism is used for feature fusion, and the fused features are used for target detection.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a multi-mode fusion military cross-domain combat target detection method.
Background
With the continuous progress of military technology, battlefield space keeps expanding: combat has extended from the traditional land, sea and air domains to outer space, cyberspace, the electromagnetic spectrum, the information environment and the cognitive domain, which receive ever greater emphasis. This has profoundly changed battlefield characteristics, rules and winning mechanisms; new combat modes keep emerging, and cross-domain combat has become a new mode of operations.
The main characteristic of cross-domain combat is that the boundaries between services and domains are broken down, and the joint combat capabilities of the air, sea, land, space, cyber and electromagnetic-spectrum domains are exploited to the greatest extent, so as to achieve synchronized cross-domain firepower and global maneuver and to seize advantages in the physical domain, the cognitive domain and the time dimension. The intelligent, cross-domain cooperative character of combat is becoming increasingly evident and is driving future operations toward cross-domain combat.
Therefore, using deep learning to fuse multi-modal image and sound information, so as to improve the robustness, generalization and effectiveness of target detection by military systems in different domains, is vital to developing the armed forces' combat capability in cross-domain cooperative intelligent operations; in recent years, deep-learning-based target detection algorithms have developed rapidly.
Target detection for cross-domain combat remains difficult: robust feature representations must be obtained from information in different domains to serve downstream target detection and accomplish military cross-domain cooperative intelligent combat.
Disclosure of Invention
The invention aims to provide a multi-mode fusion military cross-domain combat target detection method that extracts effective features from image information and sound signals captured by different sensors in different domains, fuses those features, and performs target detection on the result, thereby improving target detection performance.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
a multi-mode fusion military cross-domain combat target detection method specifically comprises the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
Further, the image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
Further, the step S1 of extracting visual features comprises the following steps (see the sketch after this list):
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
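A minimal PyTorch sketch of such a multi-branch block follows; the branch layout (1×1, 3×3 and 5×5 convolutions plus a pooling branch) and the channel widths are illustrative assumptions, since the method does not fix them:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Multi-branch convolution block: each branch uses a different kernel
    size, so features at several scales are captured simultaneously (S101)."""
    def __init__(self, in_ch: int, branch_ch: int = 32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=5, padding=2))
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # S102: splice the branch outputs in the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```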
Further, the step S2 of extracting sound features comprises the following steps (see the sketch after this list):
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
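As an illustration of S201-S202, the spectrogram can be computed with torch.stft; the FFT size, hop length and Hann window below are assumed values, not prescribed by the method:

```python
import torch

def sound_to_spectrogram(signal: torch.Tensor,
                         n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Short-time Fourier transform of a 1-D sound signal; returns the
    magnitude spectrogram as a (1, freq, time) tensor for the sound encoder."""
    window = torch.hann_window(n_fft)  # plays the role of w(t - τ)
    spec = torch.stft(signal, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    return spec.abs().unsqueeze(0)     # magnitude of S(ω, τ), plus a channel dim
```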
Further, the step S3 of feature fusion comprises the following steps (see the sketch after this list):
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
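A minimal sketch of the S301-S303 fusion module, assuming 2-D feature maps whose channel counts sum to `channels` after concatenation; the reduction ratio is an illustrative assumption:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Concatenate visual and sound features, derive channel attention
    weights via pool -> conv -> ReLU -> conv -> softmax, and reweight the
    concatenated features (S301-S303)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # adaptive pooling
            nn.Conv2d(channels, channels // reduction, 1),  # convolution
            nn.ReLU(inplace=True),                          # ReLU activation
            nn.Conv2d(channels // reduction, channels, 1))  # convolution

    def forward(self, vis: torch.Tensor, snd: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([vis, snd], dim=1)             # S301: channel splice
        weights = torch.softmax(self.mlp(fused), dim=1)  # S302: attention weights
        return fused * weights                           # S303: reweighting
```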
Further, the step S4 of network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
Further, in step S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy, as illustrated in the sketch below.
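The following sketch illustrates the S402 losses and an S403 update step, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the commented training step uses placeholder tensor names:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # classification and confidence losses

def iou_loss(bp: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
    """IoU regression loss between predicted boxes bp and ground-truth
    boxes bg, both (N, 4) tensors in (x1, y1, x2, y2) form."""
    lt = torch.max(bp[:, :2], bg[:, :2])        # intersection top-left
    rb = torch.min(bp[:, 2:], bg[:, 2:])        # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (bp[:, 2:] - bp[:, :2]).clamp(min=0).prod(dim=1)
    area_g = (bg[:, 2:] - bg[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_g - inter + 1e-7)
    return (1.0 - iou).mean()

# Hypothetical S402-S403 step (cls_logits, obj_logits, pred_boxes and the
# corresponding targets would come from the YOLOX head and the labels):
# loss = bce(cls_logits, cls_targets) + bce(obj_logits, obj_targets) \
#        + iou_loss(pred_boxes, gt_boxes)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```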
Further, the step S5 of target detection comprises the following steps (see the sketch after this list):
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
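Putting S501-S502 together, inference might be wired up as follows; the encoder, fusion and detector arguments stand for the modules sketched above, and sound_to_spectrogram is the earlier STFT helper:

```python
import torch

@torch.no_grad()
def detect(image, sound, visual_encoder, sound_encoder, fusion, detector):
    """Hypothetical inference pipeline for S501-S502."""
    vis = visual_encoder(image)                       # S501: visual features
    snd = sound_encoder(sound_to_spectrogram(sound))  # S501: sound features
    fused = fusion(vis, snd)                          # S501: attention fusion
    return detector(fused)                            # S502: inference result
```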
Compared with the prior art, the invention has the following beneficial effects:
The invention captures image information and sound signals in different domains through different sensors in those domains, extracts features from the captured image information and sound signals, fuses the features with an attention mechanism, and uses the fused features for target detection, which facilitates reconnaissance of the battlefield environment, increases the efficiency of battlefield situation analysis, and improves cross-domain combat capability.
By capturing and fusing information from different domains, calculating the detection loss, and extracting effective features for fusion, the invention continuously trains the neural network model and updates the network parameters, thereby improving the detection performance of the target detector and ensuring the accuracy of target detection.
Drawings
FIG. 1 is a schematic diagram of a specific flow of a multi-mode fusion military cross-domain combat target detection method;
FIG. 2 is a schematic diagram of a multi-modal fusion military cross-domain combat target detection method;
FIG. 3 is a diagram showing a sound signal and its short-time Fourier transform spectrogram in the multi-mode fusion military cross-domain combat target detection method;
FIG. 4 is a diagram showing the input image, the sound signal and the target detection inference results in the multi-mode fusion military cross-domain combat target detection method.
Detailed Description
The invention is further described below in connection with specific embodiments, so that the technical means, creative features, objectives and effects of the invention are easy to understand.
Referring to FIGS. 1-4, the invention discloses a multi-mode fusion military cross-domain combat target detection method, which specifically comprises the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
The image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
S1, extracting visual features, comprises the following steps:
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
S2, extracting sound features, comprises the following steps:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
S3, feature fusion, comprises the following steps:
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
The S4 network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
In S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy, as illustrated in the training-loop sketch below.
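For completeness, a hypothetical end-to-end training loop wiring the sketches above together is given below; the data loader, the detector's compute_loss method and all hyper-parameters are illustrative assumptions, not part of the disclosed method:

```python
import torch

def train(dataloader, visual_encoder, sound_encoder, fusion, detector,
          epochs: int = 10, lr: float = 1e-3):
    params = (list(visual_encoder.parameters()) + list(sound_encoder.parameters())
              + list(fusion.parameters()) + list(detector.parameters()))
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for image, sound, targets in dataloader:
            vis = visual_encoder(image)                       # S1
            snd = sound_encoder(sound_to_spectrogram(sound))  # S2
            preds = detector(fusion(vis, snd))                # S3 + S401
            loss = detector.compute_loss(preds, targets)      # S402 (assumed API)
            loss.backward()                                   # S403 back-propagation
            optimizer.step()                                  # update parameters
            optimizer.zero_grad()
```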
S5, target detection, comprises the following steps:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A multi-mode fusion military cross-domain combat target detection method, characterized by comprising the following steps:
S1: visual feature extraction, namely inputting the image into a visual encoder and extracting visual features;
S2: sound feature extraction, namely performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting sound features;
S3: feature fusion, namely inputting the extracted visual features and sound features into a feature fusion module and effectively fusing the features using an attention mechanism;
S4: network training, namely inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection, namely inputting the image and the sound to be detected into the trained network to obtain the target detection inference result.
2. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein: the image information and the sound signals in the S1 and the S2 are captured by different sensors in different domains.
3. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S1 extraction of visual features comprises the steps of:
S101: inputting an image into an Inception Block, wherein the Inception Block comprises a plurality of branches, each branch performs a convolution operation with a convolution kernel of a different size, and features at different scales are captured simultaneously;
S102: splicing the outputs of the branches in the channel dimension to obtain the visual features.
4. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S2 extraction of sound features comprises the steps of:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) · w(t − τ) · e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
5. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S3 feature fusion comprises the steps of:
S301: splicing the visual features and the sound features in the channel dimension;
S302: passing the spliced features through an MLP comprising an adaptive pooling operation, a convolution operation, a ReLU activation function and a second convolution operation, and obtaining attention weights through a softmax activation function;
S303: multiplying the initially spliced features by the attention weights to obtain the fused features.
6. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S4 network training comprises the steps of:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection prediction results;
S402: calculating the detection losses from the prediction results and the ground-truth labels, wherein the detection losses comprise a classification loss, a confidence loss and a regression loss; the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) · Σ_{i=1}^{n} [ y_i · log σ(x_i) + (1 − y_i) · log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category (as a logit) and the real category, σ denotes the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
7. The multi-modal fusion military cross-domain combat target detection method of claim 6, wherein in step S403, gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backwards to each layer of the neural network, and the weights of each layer are adjusted according to the loss value so as to increase the detection accuracy.
8. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S5 target detection comprises the steps of:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310425308.XA CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310425308.XA CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452960A true CN116452960A (en) | 2023-07-18 |
Family
ID=87127038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310425308.XA Pending CN116452960A (en) | 2023-04-20 | 2023-04-20 | Multi-mode fusion military cross-domain combat target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452960A (en) |
- 2023-04-20: Application CN202310425308.XA filed in China; publication CN116452960A pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210383231A1 (en) * | 2020-08-20 | 2021-12-09 | Chang'an University | Target cross-domain detection and understanding method, system and equipment and storage medium |
CN115188066A (en) * | 2022-06-02 | 2022-10-14 | 广州大学 | Moving target detection system and method based on cooperative attention and multi-scale fusion |
CN115700808A (en) * | 2022-10-27 | 2023-02-07 | 东南大学 | Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images |
CN115631444A (en) * | 2022-10-31 | 2023-01-20 | 成都浩孚科技有限公司 | Unmanned aerial vehicle aerial image target detection algorithm |
Non-Patent Citations (1)
Title |
---|
韩锡辉 (Han Xihui), "基于视听觉注意机理的目标检测方法" ["Target Detection Method Based on the Audio-Visual Attention Mechanism"], 《万方数据知识服务平台》 (Wanfang Data Knowledge Service Platform), pages 43-44 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ren et al. | Adversarial examples: attacks and defenses in the physical world | |
CN111898504B (en) | Target tracking method and system based on twin circulating neural network | |
WO2023280065A1 (en) | Image reconstruction method and apparatus for cross-modal communication system | |
CN110796166B (en) | Attention mechanism-based multitask image processing method | |
Isa et al. | Optimizing the hyperparameter tuning of YOLOv5 for underwater detection | |
Teng et al. | Underwater target recognition methods based on the framework of deep learning: A survey | |
US20230260255A1 (en) | Three-dimensional object detection framework based on multi-source data knowledge transfer | |
CN107844743A (en) | A kind of image multi-subtitle automatic generation method based on multiple dimensioned layering residual error network | |
Bar et al. | The vulnerability of semantic segmentation networks to adversarial attacks in autonomous driving: Enhancing extensive environment sensing | |
CN114463677B (en) | Safety helmet wearing detection method based on global attention | |
CN112529065B (en) | Target detection method based on feature alignment and key point auxiliary excitation | |
CN116486243A (en) | DP-ViT-based sonar image target detection method | |
CN115830531A (en) | Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion | |
Li et al. | Spear and shield: Attack and detection for CNN-based high spatial resolution remote sensing images identification | |
Dhiyanesh et al. | Improved object detection in video surveillance using deep convolutional neural network learning | |
CN114566170A (en) | Lightweight voice spoofing detection algorithm based on class-one classification | |
Chen et al. | GFSNet: Generalization-friendly siamese network for thermal infrared object tracking | |
EP3832542A1 (en) | Device and method with sensor-specific image recognition | |
Lei et al. | Real-time Anomaly Target Detection and Recognition in Intelligent Surveillance Systems based on SLAM | |
Wei et al. | Lightweight multimodal feature graph convolutional network for dangerous driving behavior detection | |
Chu et al. | Illumination-guided transformer-based network for multispectral pedestrian detection | |
CN116452960A (en) | Multi-mode fusion military cross-domain combat target detection method | |
CN115830643A (en) | Light-weight pedestrian re-identification method for posture-guided alignment | |
Wang et al. | Simulation of human ear recognition sound direction based on convolutional neural network | |
CN115937993A (en) | Living body detection model training method, living body detection device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |