CN116452960A - Multi-modal fusion military cross-domain combat target detection method - Google Patents

Multi-modal fusion military cross-domain combat target detection method

Info

Publication number
CN116452960A
Authority
CN
China
Prior art keywords
sound
target detection
features
domain
visual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310425308.XA
Other languages
Chinese (zh)
Inventor
魏明强 (Wei Mingqiang)
范溢华 (Fan Yihua)
燕雪峰 (Yan Xuefeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310425308.XA
Publication of CN116452960A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a multi-modal fusion military cross-domain combat target detection method, which relates to the technical field of target detection and comprises the following steps. S1: visual feature extraction: the image is input into a visual encoder, which extracts visual features. S2: sound feature extraction: a short-time Fourier transform is applied to the sound signal to obtain a spectrogram, which is input into a sound encoder to extract sound features. S3: feature fusion: the extracted visual and sound features are input into a feature fusion module and fused effectively using an attention mechanism. In this method, image information and sound signals are captured by different sensors in different domains, features are extracted from the captured data, an attention mechanism fuses the features, and the fused features are used for target detection.

Description

Multi-modal fusion military cross-domain combat target detection method
Technical Field
The invention relates to the technical field of target detection, in particular to a multi-modal fusion military cross-domain combat target detection method.
Background
With the continuous advance of military technology, battlefield space keeps expanding: warfare has spread from the traditional land, sea, and air domains to outer space, cyberspace, the electromagnetic spectrum, the information environment, and the cognitive domain, and these new domains receive ever greater emphasis. This has profoundly changed battlefield characteristics and winning mechanisms, new modes of operation keep emerging, and cross-domain operations have become a new mode of warfare.
The defining characteristic of cross-domain operations is that the boundaries between services and domains are broken down: joint combat capabilities across the air, sea, land, space, cyber, and electromagnetic-spectrum domains are exploited to the fullest to achieve synchronized cross-domain fires and global maneuver and to seize the advantage in the physical domain, the cognitive domain, and in time. The intelligent, cross-domain cooperative character of warfare is increasingly evident and is pushing future operations in the cross-domain direction.
Therefore, using deep learning to fuse multi-modal image and sound information, and thereby improve the robustness, generalization, and effectiveness of target detection for military systems in different domains, is of great importance for developing combat capability under cross-domain cooperative intelligent operations. In recent years, deep-learning-based target detection algorithms have developed rapidly.
Target detection for cross-domain combat nevertheless remains difficult: robust feature representations must be obtained from information in different domains so that downstream target detection can support military cross-domain cooperative intelligent operations.
Disclosure of Invention
The invention aims to provide a multi-modal fusion military cross-domain combat target detection method that extracts effective features from image information and sound signals captured by different sensors in different domains, fuses these features, and performs target detection on the fused features, thereby improving detection performance.
To solve the above problems, the invention adopts the following technical scheme:
A multi-modal fusion military cross-domain combat target detection method, comprising the following steps:
S1: visual feature extraction: inputting the image into a visual encoder and extracting the visual features;
S2: sound feature extraction: performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting the sound features;
S3: feature fusion: inputting the extracted visual features and sound features into a feature fusion module and fusing them effectively using an attention mechanism;
S4: network training: inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection: inputting the image and sound to be detected into the trained network to obtain the target detection inference result.
Further, the image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
Further, the S1 visual feature extraction comprises the following steps:
S101: inputting the image into an Inception Block that contains several parallel branches, each performing convolution with a convolution kernel of a different size, so that features at different scales are captured simultaneously;
S102: concatenating the outputs of the branches along the channel dimension to obtain the visual features.
Further, the S2 sound feature extraction comprises the following steps:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) w(t − τ) e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
Further, the S3 feature fusion comprises the following steps:
S301: concatenating the visual features and the sound features along the channel dimension;
S302: passing the concatenated features through an MLP consisting of an adaptive pooling operation, a convolution, a ReLU activation, and a second convolution, and obtaining the attention weights through a softmax activation;
S303: multiplying the initially concatenated features by the attention weights to obtain the fused features.
Further, the S4 network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection predictions;
S402: calculating the detection loss from the predictions and the ground-truth labels, the detection loss comprising a classification loss, a confidence loss, and a regression loss, wherein the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) Σ_{i=1}^{n} [ y_i log σ(x_i) + (1 − y_i) log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category and the real category, σ(·) is the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
Further, in S403 the gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backward through every layer of the neural network, and each layer's weights are adjusted according to the loss value so as to improve detection accuracy.
Further, the S5 target detection comprises the following steps:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
Compared with the prior art, the invention has the following beneficial effects:
The invention captures image information and sound signals with different sensors in different domains, extracts features from the captured data, fuses the features with an attention mechanism, and uses the fused features for target detection. This facilitates reconnaissance of the battlefield environment, increases the efficiency of battlefield situation analysis, and improves cross-domain combat capability.
By capturing and fusing information from different domains, calculating the detection loss, extracting effective features for fusion, continuously training the neural network model, and updating the network parameters, the invention improves the detection performance of the target detector and ensures the accuracy of target detection.
Drawings
FIG. 1 is a flow chart of the multi-modal fusion military cross-domain combat target detection method;
FIG. 2 is a schematic diagram of the multi-modal fusion military cross-domain combat target detection method;
FIG. 3 shows a sound signal and the spectrogram obtained from it by short-time Fourier transform in the multi-modal fusion military cross-domain combat target detection method;
FIG. 4 shows an input image, a sound signal, and the target detection inference result provided by the multi-modal fusion military cross-domain combat target detection method.
Detailed Description
The invention is further described below in connection with specific embodiments, so that the technical means, inventive features, objectives, and effects of the invention are easy to understand.
Referring to figs. 1-4, the invention discloses a multi-modal fusion military cross-domain combat target detection method, which comprises the following steps:
S1: visual feature extraction: inputting the image into a visual encoder and extracting the visual features;
S2: sound feature extraction: performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting the sound features;
S3: feature fusion: inputting the extracted visual features and sound features into a feature fusion module and fusing them effectively using an attention mechanism;
S4: network training: inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection: inputting the image and sound to be detected into the trained network to obtain the target detection inference result.
The image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
The S1 visual feature extraction comprises the following steps (a minimal sketch follows the list):
S101: inputting the image into an Inception Block that contains several parallel branches, each performing convolution with a convolution kernel of a different size, so that features at different scales are captured simultaneously;
S102: concatenating the outputs of the branches along the channel dimension to obtain the visual features.
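For illustration only, the multi-branch block of S101-S102 could be sketched in PyTorch as below; the branch count (three), kernel sizes (1, 3, 5), and channel widths are assumptions of this sketch, not values fixed by the description.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Multi-branch block: each branch convolves with a different kernel size."""
    def __init__(self, in_channels: int, branch_channels: int = 32):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, branch_channels, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, branch_channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, branch_channels, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # S102: concatenate the branch outputs along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
```

Padding is chosen so that all branches keep the same spatial size, which the channel-wise concatenation requires.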
The S2 sound feature extraction comprises the following steps:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) w(t − τ) e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features (a minimal sketch of S201-S202 follows).
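For illustration only, the S201 transform might be implemented with torch.stft as below; the FFT size, hop length, and Hann window are assumptions of this sketch and are not fixed by the description.

```python
import torch

def waveform_to_spectrogram(x: torch.Tensor, n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Turn a 1-D waveform into a (1, freq, time) magnitude spectrogram."""
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    # spec holds the complex coefficients S(omega, tau); the magnitude is
    # used as the 2-D "image" fed to the sound encoder.
    return spec.abs().unsqueeze(0)

# Hypothetical usage with the InceptionBlock sketched above:
# sound_features = InceptionBlock(in_channels=1)(waveform_to_spectrogram(wave).unsqueeze(0))
```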
The S3 feature fusion comprises the following steps (see the sketch after the list):
S301: concatenating the visual features and the sound features along the channel dimension;
S302: passing the concatenated features through an MLP consisting of an adaptive pooling operation, a convolution, a ReLU activation, and a second convolution, and obtaining the attention weights through a softmax activation;
S303: multiplying the initially concatenated features by the attention weights to obtain the fused features.
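A minimal sketch of the S301-S303 module, assuming average pooling for the adaptive pooling, 1x1 convolutions, a channel-reduction ratio of 4, and feature maps that share a spatial resolution; none of these choices is fixed by the description.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """S301-S303: concatenate, derive channel attention with an MLP, reweight."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # adaptive pooling
            nn.Conv2d(channels, channels // reduction, 1), # first convolution
            nn.ReLU(inplace=True),                         # ReLU activation
            nn.Conv2d(channels // reduction, channels, 1), # second convolution
            nn.Softmax(dim=1),                             # attention weights
        )

    def forward(self, visual: torch.Tensor, sound: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([visual, sound], dim=1)  # S301: channel-wise concat
        return fused * self.mlp(fused)             # S303: reweight by attention
```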
The S4 network training comprises the following steps:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection predictions;
S402: calculating the detection loss from the predictions and the ground-truth labels, the detection loss comprising a classification loss, a confidence loss, and a regression loss, wherein the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) Σ_{i=1}^{n} [ y_i log σ(x_i) + (1 − y_i) log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category and the real category, σ(·) is the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
In S403, the gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backward through every layer of the neural network, and each layer's weights are adjusted according to the loss value so as to improve detection accuracy. A minimal sketch of one training step follows.
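For illustration only, one S402-S403 training step might look as below. The box format (x1, y1, x2, y2), the tensor shapes, and the choice of optimizer are assumptions of this sketch; the YOLOX head that produces the predictions is elided.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # classification and confidence losses

def iou_loss(pred_boxes: torch.Tensor, gt_boxes: torch.Tensor) -> torch.Tensor:
    """Regression loss: 1 - |Bp n Bg| / |Bp u Bg| for (N, 4) boxes."""
    lt = torch.max(pred_boxes[:, :2], gt_boxes[:, :2])  # intersection top-left
    rb = torch.min(pred_boxes[:, 2:], gt_boxes[:, 2:])  # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred_boxes[:, 2:] - pred_boxes[:, :2]).clamp(min=0).prod(dim=1)
    area_g = (gt_boxes[:, 2:] - gt_boxes[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_g - inter + 1e-7)
    return (1.0 - iou).mean()

def train_step(optimizer, cls_logits, cls_targets, obj_logits, obj_targets,
               pred_boxes, gt_boxes) -> float:
    loss = (bce(cls_logits, cls_targets)       # classification loss
            + bce(obj_logits, obj_targets)     # confidence loss
            + iou_loss(pred_boxes, gt_boxes))  # regression loss
    optimizer.zero_grad()
    loss.backward()   # S403: propagate the loss backward through every layer
    optimizer.step()  # adjust the weights toward lower detection loss
    return loss.item()
```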
The S5 target detection comprises the following steps (an end-to-end sketch follows):
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
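A minimal end-to-end sketch of S501-S502, reusing the modules sketched above; detector stands in for the trained YOLOX head, whose interface the description does not specify, and the two feature maps are assumed to share a spatial resolution.

```python
import torch

@torch.no_grad()
def detect(image, waveform, visual_encoder, sound_encoder, fusion, detector):
    v = visual_encoder(image)                                          # S1 features
    s = sound_encoder(waveform_to_spectrogram(waveform).unsqueeze(0))  # S2 features
    return detector(fusion(v, s))                                      # S3 fusion + S5 inference
```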
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and description merely illustrate its principles, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A multi-modal fusion military cross-domain combat target detection method, characterized by comprising the following steps:
S1: visual feature extraction: inputting the image into a visual encoder and extracting the visual features;
S2: sound feature extraction: performing a short-time Fourier transform on the sound signal to obtain a spectrogram, inputting the spectrogram into a sound encoder, and extracting the sound features;
S3: feature fusion: inputting the extracted visual features and sound features into a feature fusion module and fusing them effectively using an attention mechanism;
S4: network training: inputting the fused features into a target detector to obtain detection results, calculating the detection loss from the detection results and the ground-truth labels, and training the network;
S5: target detection: inputting the image and sound to be detected into the trained network to obtain the target detection inference result.
2. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein the image information and the sound signals in S1 and S2 are captured by different sensors in different domains.
3. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein the S1 visual feature extraction comprises the following steps:
S101: inputting the image into an Inception Block that contains several parallel branches, each performing convolution with a convolution kernel of a different size, so that features at different scales are captured simultaneously;
S102: concatenating the outputs of the branches along the channel dimension to obtain the visual features.
4. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein the S2 sound feature extraction comprises the following steps:
S201: performing a short-time Fourier transform on the input sound signal to convert it into a spectrogram:

S(ω, τ) = ∫_{−∞}^{+∞} x(t) w(t − τ) e^{−jωt} dt

wherein S(ω, τ) is a two-dimensional matrix representing the transformed spectral result, ω represents angular frequency, t represents time, x(t) represents the original signal, w(t − τ) represents the window function, τ represents the center of the window function, and j represents the imaginary unit;
S202: inputting the spectrogram into an Inception Block to extract the sound features.
5. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein the S3 feature fusion comprises the following steps:
S301: concatenating the visual features and the sound features along the channel dimension;
S302: passing the concatenated features through an MLP consisting of an adaptive pooling operation, a convolution, a ReLU activation, and a second convolution, and obtaining the attention weights through a softmax activation;
S303: multiplying the initially concatenated features by the attention weights to obtain the fused features.
6. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein said S4 network training comprises the steps of:
S401: inputting the fused features into a YOLOX network for target detection to obtain target detection predictions;
S402: calculating the detection loss from the predictions and the ground-truth labels, the detection loss comprising a classification loss, a confidence loss, and a regression loss, wherein the classification and confidence losses use BCEWithLogitsLoss and the regression loss uses the IoU loss:

L_BCE = −(1/n) Σ_{i=1}^{n} [ y_i log σ(x_i) + (1 − y_i) log(1 − σ(x_i)) ]

L_IoU = 1 − |B_p ∩ B_g| / |B_p ∪ B_g|

wherein x_i and y_i respectively represent the predicted category and the real category, σ(·) is the sigmoid function, n represents the total number of categories, and B_p and B_g respectively represent the prediction box and the ground-truth box;
S403: performing gradient back-propagation, updating the network parameters, and training the network.
7. The multi-modal fusion military cross-domain combat target detection method of claim 6, wherein in S403 the gradient back-propagation adjusts the network parameters toward the minimum of the detection loss: the loss value is propagated backward through every layer of the neural network, and each layer's weights are adjusted according to the loss value so as to improve detection accuracy.
8. The multi-modal fusion military cross-domain combat target detection method of claim 1, wherein the S5 target detection comprises the following steps:
S501: passing the image and the sound to be detected through the visual encoder and the sound encoder respectively to obtain visual features and sound features, and fusing the features using the attention mechanism;
S502: passing the fused features through the target detector to obtain the inference result.
CN202310425308.XA 2023-04-20 2023-04-20 Multi-modal fusion military cross-domain combat target detection method Pending CN116452960A (en)

Priority Applications (1)

Application Number: CN202310425308.XA · Priority Date: 2023-04-20 · Filing Date: 2023-04-20 · Title: Multi-modal fusion military cross-domain combat target detection method (published as CN116452960A)

Applications Claiming Priority (1)

Application Number: CN202310425308.XA · Priority Date: 2023-04-20 · Filing Date: 2023-04-20 · Title: Multi-modal fusion military cross-domain combat target detection method (published as CN116452960A)

Publications (1)

Publication Number: CN116452960A · Publication Date: 2023-07-18

Family

ID: 87127038

Family Applications (1)

Application Number: CN202310425308.XA (Pending) · Title: Multi-modal fusion military cross-domain combat target detection method · Publication: CN116452960A (en)

Country Status (1)

CN: CN116452960A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210383231A1 (en) * 2020-08-20 2021-12-09 Chang'an University Target cross-domain detection and understanding method, system and equipment and storage medium
CN115188066A (en) * 2022-06-02 2022-10-14 广州大学 Moving target detection system and method based on cooperative attention and multi-scale fusion
CN115700808A (en) * 2022-10-27 2023-02-07 东南大学 Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images
CN115631444A (en) * 2022-10-31 2023-01-20 成都浩孚科技有限公司 Unmanned aerial vehicle aerial image target detection algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Han Xihui, "Target Detection Method Based on Audio-Visual Attention Mechanism", Wanfang Data Knowledge Service Platform, pages 43-44 *

Similar Documents

Publication Publication Date Title
Ren et al. Adversarial examples: attacks and defenses in the physical world
CN111898504B (en) Target tracking method and system based on twin circulating neural network
WO2023280065A1 (en) Image reconstruction method and apparatus for cross-modal communication system
CN110796166B (en) Attention mechanism-based multitask image processing method
Isa et al. Optimizing the hyperparameter tuning of YOLOv5 for underwater detection
Teng et al. Underwater target recognition methods based on the framework of deep learning: A survey
US20230260255A1 (en) Three-dimensional object detection framework based on multi-source data knowledge transfer
CN107844743A (en) A kind of image multi-subtitle automatic generation method based on multiple dimensioned layering residual error network
Bar et al. The vulnerability of semantic segmentation networks to adversarial attacks in autonomous driving: Enhancing extensive environment sensing
CN114463677B (en) Safety helmet wearing detection method based on global attention
CN112529065B (en) Target detection method based on feature alignment and key point auxiliary excitation
CN116486243A (en) DP-ViT-based sonar image target detection method
CN115830531A (en) Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion
Li et al. Spear and shield: Attack and detection for CNN-based high spatial resolution remote sensing images identification
Dhiyanesh et al. Improved object detection in video surveillance using deep convolutional neural network learning
CN114566170A (en) Lightweight voice spoofing detection algorithm based on class-one classification
Chen et al. GFSNet: Generalization-friendly siamese network for thermal infrared object tracking
EP3832542A1 (en) Device and method with sensor-specific image recognition
Lei et al. Real-time Anomaly Target Detection and Recognition in Intelligent Surveillance Systems based on SLAM
Wei et al. Lightweight multimodal feature graph convolutional network for dangerous driving behavior detection
Chu et al. Illumination-guided transformer-based network for multispectral pedestrian detection
CN116452960A (en) Multi-mode fusion military cross-domain combat target detection method
CN115830643A (en) Light-weight pedestrian re-identification method for posture-guided alignment
Wang et al. Simulation of human ear recognition sound direction based on convolutional neural network
CN115937993A (en) Living body detection model training method, living body detection device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination