CN116258934A - Feature enhancement-based infrared-visible light fusion method, system and readable storage medium - Google Patents

Feature enhancement-based infrared-visible light fusion method, system and readable storage medium

Info

Publication number
CN116258934A
Authority
CN
China
Prior art keywords
feature
rgb
infrared
visible light
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310267771.6A
Other languages
Chinese (zh)
Inventor
李智勇 (Li Zhiyong)
肖志强 (Xiao Zhiqiang)
付浩龙 (Fu Haolong)
刘函豪 (Liu Hanhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202310267771.6A priority Critical patent/CN116258934A/en
Publication of CN116258934A publication Critical patent/CN116258934A/en
Pending legal-status Critical Current


Classifications

    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/40: Extraction of image or video features
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an infrared-visible light fusion method based on feature enhancement, which adopts a YOLOv5 feature extraction network with a dual-stream backbone to extract deep features from the visible light and infrared images, and reduces the bias toward a single modality through symmetrical complementary masks. To address the differences between the visible light and infrared images, a cross feature enhancement module is added to the fusion module to improve the intra-modality feature representation, and a long-distance dependent fusion module is added so that the enhanced features are fused by correlating the position codes of the multi-modal features, which improves the joint utilization of the multi-modal images and the detection performance in complex scenes. The application also provides an infrared-visible light fusion system based on feature enhancement and a readable storage medium.

Description

Feature enhancement-based infrared-visible light fusion method, system and readable storage medium
Technical Field
The application belongs to the technical field of target detection, and particularly relates to an infrared-visible light fusion method and system based on feature enhancement and a readable storage medium.
Background
The target detection algorithm is widely applied to the fields of automatic driving, monitoring, remote sensing and the like. However, due to limitations of the visible light sensor, most target detection methods of visible light images cannot achieve satisfactory accuracy and are sensitive to severe environmental factors such as rain, fog, and weak light. In contrast, infrared sensors perform well in the harsh environments described above. However, the infrared sensor is greatly affected by temperature. In a high temperature environment with good illumination, visible light images have rich texture and color information, while infrared images have difficulty in effectively distinguishing foreground and background. Therefore, by fusing complementary information from the visible light and the infrared sensor, the accuracy, reliability and robustness of the detection algorithm can be further improved.
In the prior art, infrared-visible light multi-modal target detection methods fall mainly into traditional methods and deep learning methods. In traditional methods, features are extracted from the visible light and infrared images with a Histogram of Oriented Gradients (HOG), and the concatenated fusion features are fed into a Support Vector Machine (SVM) to obtain detection results; however, the feature extraction capability of manually designed operators is limited, and an optimal feature extraction result is difficult to obtain. For deep learning methods, owing to their strong representation learning capability, deep learning shows advantages in visible-infrared fusion target detection, and four fusion detection strategies have been designed based on YOLOv4: image transition fusion, early fusion, mid-term fusion and late fusion; however, methods using Convolutional Neural Networks (CNNs) are limited by the non-global receptive field of the convolution operator, so information is fused only within local regions. Although these methods perform better than single-modality detection methods, they often lack long-range dependence and do not take full advantage of the complementarity between modalities, resulting in unsatisfactory detection results.
Therefore, there is a need to provide an infrared-visible light fusion method, system and readable storage medium based on feature enhancement, so as to solve the above-mentioned problems in the background art.
Disclosure of Invention
The purpose of the application is to provide an infrared-visible light fusion method, an infrared-visible light fusion system and a readable storage medium based on feature enhancement, which adopt a YOLOv5 feature extraction network with a dual-stream backbone to extract deep features from the visible light and infrared images, and reduce the bias toward a single modality through symmetrical complementary masks; to address the difference between the visible light and infrared images, a cross feature enhancement module is added to the fusion module to improve the intra-modality feature representation, and a long-distance dependent fusion module is added so that the enhanced features are fused by correlating the position codes of the multi-modal features, which improves the joint utilization of the multi-modal images and the detection performance in complex scenes.
In order to solve the technical problems, the application is realized as follows:
an infrared-visible light fusion method based on feature enhancement comprises the following steps:
data acquisition: collecting a multi-target data set and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image;
feature extraction: constructing a dual-stream backbone feature extraction network, wherein the dual-stream backbone feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features so as to obtain visible light features and infrared features;
feature fusion: constructing a feature fusion network, wherein the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, and the cross feature enhancement module comprises a channel attention branch and a spatial attention branch arranged in series; the visible light features and the infrared features are sent into the channel attention branch for enhancement, the visible light features before enhancement are added to the infrared features enhanced by the channel attention branch to obtain a first feature, and the infrared features before enhancement are added to the visible light features enhanced by the channel attention branch to obtain a second feature; the first feature and the second feature are sent into the spatial attention branch for enhancement, the first feature is added to the second feature enhanced by the spatial attention branch to obtain the final output of the visible light features, and the second feature is added to the first feature enhanced by the spatial attention branch to obtain the final output of the infrared features; and the final outputs of the visible light features and the infrared features are sent to the long-distance dependent fusion module and fused through the correlation of position codes based on a Swin Transformer model.
Preferably, the preprocessing is performed as follows: the visible light image and the infrared image in the multi-target data set are processed with a data enhancement method that generates image masks from random regions, and the specific process is: the image is divided into a 10×10 checkerboard according to its size, and in each row two blocks are set to zero with a probability of 30% to form an image mask; the image mask is then divided by rows into two complementary masks, one of which is used as the mask for the visible light image and the other as the mask for the infrared image; the mask generation process is expressed as:
Generate_mask = RGB_mask ∪ IR_mask;
RGB_mask | IR_mask = 1;
where RGB_mask is the mask for the visible light image, IR_mask is the mask for the infrared image, and Generate_mask is the total mask.
Preferably, the dual-stream backbone feature extraction network is a YOLOv5 feature extraction network with a dual-stream backbone; the extracted visible light features are expressed as X_RGB ∈ R^(C×H×W) and the infrared features as X_IR ∈ R^(C×H×W), where R^(C×H×W) denotes a three-dimensional matrix, W is the width, H is the height, and C is the number of channels.
Preferably, the enhancement process of the channel attention branch is as follows: the input features are fully folded along one direction q while high resolution is maintained along the orthogonal direction v of direction q; the operation is expressed as:
W_RGBq = σ1(F1(X_RGB));  W_RGBv = σ2(F2(X_RGB));
W_IRq = σ1(F1(X_IR));  W_IRv = σ2(F2(X_IR));
where W_RGBq denotes the information of the visible light features in the q direction; W_RGBv denotes the information of the visible light features in the v direction; W_IRq denotes the information of the infrared features in the q direction; W_IRv denotes the information of the infrared features in the v direction; σ1 and σ2 denote tensor reshaping operators; and F1(·) and F2(·) denote 1×1 convolution operations.
W_RGBq and W_IRq, representing the weights of the visible light features and the infrared features respectively, are input into a Softmax function for classification, and the weight distributions of the visible light features and the infrared features are output; the calculation process is expressed as:
W_RGBk = Softmax(W_RGBq);  W_IRk = Softmax(W_IRq);
where W_RGBk denotes the weights of the visible light features and W_IRk denotes the weights of the infrared features.
The information W_RGBv is multiplied by the weight W_RGBk and the information W_IRv is multiplied by the weight W_IRk; a 1×1 convolution operation is then performed, the channel dimension is raised from C/2 back to C with a normalization step, and a Sigmoid function keeps all parameters in the range 0-1; the calculation process is expressed as:
W_RGBz = Sigmoid(σ3(F3(W_RGBv × W_RGBk)));
W_IRz = Sigmoid(σ3(F3(W_IRv × W_IRk)));
where W_RGBz denotes the information of the visible light features in the z direction; W_IRz denotes the information of the infrared features in the z direction; "×" denotes a matrix dot-product operation; F3(·) denotes a 1×1 convolution operation; and σ3 denotes a tensor reshaping operator.
Then X_RGB is multiplied by W_RGBz at the channel level to obtain the feature W_RGBln with reduced noise, and X_IR is multiplied by W_IRz at the channel level to obtain the feature W_IRln with reduced noise; the calculation process is expressed as:
W_RGBln = X_RGB ⊙ W_RGBz;  W_IRln = X_IR ⊙ W_IRz;
The feature W_IRln is added to X_RGB for recalibration enhancement to obtain the first feature A_RGBch, and the feature W_RGBln is added to X_IR for recalibration enhancement to obtain the second feature A_IRch; the calculation process is expressed as:
A_RGBch = W_IRln + X_RGB;  A_IRch = W_RGBln + X_IR.
preferably, the enhancement procedure of the spatial attention branch is as follows: the input features are fully folded in direction q while maintaining high resolution in direction v, the process of operation is expressed as:
A RGBq =σ 4 (F GP (F 4 (A RGBch )));A RGBv =σ 5 (F 5 (A RGBch ));
A IRq =σ 4 (F GP (F 4 (A IRch )));A IRv =σ 5 (F 5 (A IRch ));
in sigma 4 Sum sigma 5 All represent tensor remodelling operators; f (F) 4 (. Cndot.) and F 5 (. Cndot.) all represent 1X 1 convolution operations; f (F) GP (. Cndot.) represents a global pooling operator,
Figure BDA0004133556140000042
in A way RGBq And A IRq Weights respectively representing the first feature and the second feature are input into a Softmax function for classification, weight distribution of the first feature and the second feature is output, and the calculation process is represented as follows:
Figure BDA0004133556140000051
wherein A is RGBk Weights representing the first characteristic, A IRk Weights representing the second features;
will information A RGBv Multiplied by weight A RGBk Will information A IRv Multiplied by weight A IRk Then sequentially performing information completion, remodelling and Sigmoid functions:
A RGBz =Sigmoid(σ 6 (A RGBv ×A RFBk ));
A IRz =Sigmoid(σ 6 (A IRv ×A IRk ));
wherein A is RGBz ∈R 1×HW ;A IRz ∈R 1×HW A spatial gate representing the first feature and the second feature, respectively;
will A RGBch And A RGBz Multiplying by A iRch And A IRz Multiplication to obtain spatially enhanced features A of the first and second features, respectively RGBln And A IRln The calculation process is expressed as:
A RGBln =A RGBch ⊙A RGBz ;A IRln =A IRch ⊙A IRz
feature A IRln And A is a RGBch Adding to recalibrate enhancement to obtain final output X of visible light characteristics RGBout The method comprises the steps of carrying out a first treatment on the surface of the Feature A RGBln And A is a IRch Adding to recalibrate the enhancement to obtain the final output X of the infrared signature IRout The calculation process is expressed as:
X RGBout =A IRln +A RGBch ;X IRout =A RGBln +A IRch
preferably, the feature fusion process is as follows:
the feature map is alternately divided into M×M windows using a shifted-window partitioning method, and if the feature map size is smaller than M×M it is padded to M×M; the windows of the next module are then shifted by (M/2, M/2) pixels relative to the previous ones; with this computation, the calculation formulas are as follows:
F̂_1 = W-MSA(LN(F_i)) + F_i;
F_1 = MLP(LN(F̂_1)) + F̂_1;
F̂_2 = MW-MSA(LN(F_1)) + F_1;
F_o = MLP(LN(F̂_2)) + F̂_2;
where F_i denotes the joint input of the visible light and infrared features, F_i = {X_RGBout, X_IRout}; F_o is the output feature of the Transformer block; F̂_1, F_1 and F̂_2 are intermediate variables; LN denotes layer normalization; W-MSA and MW-MSA denote the window multi-head self-attention operation and the masked-window multi-head self-attention operation, respectively;
considering the local features after window partitioning, in the self-attention calculation an input visible light feature map F_RGB ∈ R^(8×8×C) and an infrared feature map F_IR ∈ R^(8×8×C) are given; each feature map is flattened and arranged as a matrix sequence to obtain the sentences I_RGB ∈ R^(64×C) and I_IR ∈ R^(64×C); the input sentence I ∈ R^(128×C) is then obtained by concatenating the sentences I_RGB and I_IR; the input sentence I is projected by three weight matrices to obtain the query Q, key K and value V:
Q = I·W_Q,  K = I·W_K,  V = I·W_V;
where W_Q ∈ R^(C×128), W_K ∈ R^(C×128) and W_V ∈ R^(C×128) are weight matrices.
The self-attention calculation process is as follows:
Attention(Q, K, V) = Softmax(QK^T/√d + FPRE)·V;
where d denotes the dimension of the query Q or the key K, and T denotes the matrix transposition operation; FPRE denotes the position coding of the visible light and infrared features and contains four types of position information: the visible light position information RPE_RGB, the infrared position information RPE_IR, the visible-to-infrared relative position information RPE_RGB-IR, and the infrared-to-visible relative position information RPE_IR-RGB, arranged according to the positions of the RGB and IR tokens in the joint sentence:
FPRE = [ RPE_RGB, RPE_RGB-IR ; RPE_IR-RGB, RPE_IR ];
the visible light feature X_RGBout and the infrared feature X_IRout output after deep interaction are obtained through the above operations, and the two are added to obtain the final fusion feature F_fusion:
F_fusion = X_RGBout + X_IRout.
The application also provides an infrared-visible light fusion system based on feature enhancement, comprising:
and a data acquisition module: the method comprises the steps of acquiring a multi-target data set, and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image.
And the feature extraction module is used for: the method is used for constructing a double-flow trunk feature extraction network, the double-flow trunk feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features, so that visible light features and infrared features are obtained.
And a feature fusion module: the method comprises the steps that a feature fusion network is constructed, the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, the cross feature enhancement module comprises a channel attention branch and a space attention branch which are arranged in series, the visible light features and the infrared features are sent into the channel attention branch to be enhanced, the visible light features before enhancement and the infrared features after enhancement through the channel attention branch are added to obtain a first feature, and the infrared features before enhancement and the visible light features after enhancement through the channel attention branch are added to obtain a second feature; the first feature and the second feature are sent to the space attention branch for enhancement, the first feature is added with the second feature enhanced by the space attention branch to obtain the final output of the visible light feature, and the second feature is added with the first feature enhanced by the space attention branch to obtain the final output of the infrared feature; and sending the final output of the visible light characteristic and the infrared characteristic to the long-distance dependent fusion module, and fusing by correlation of position codes based on a Swin transducer model.
The present application also provides a readable storage medium having one or more programs stored therein, the one or more programs being executable by one or more processors to implement the steps of the feature-enhanced infrared-visible fusion method described above.
The beneficial effects of this application lie in:
(1) Noise data is introduced in the data preprocessing process to force the network to learn complementary modality information, thereby reducing the network's bias toward a single modality;
(2) The cross feature enhancement module performs feature enhancement from both the channel and the spatial perspectives, including complementary information exchange between the two modalities, so that the multi-modal features are better fused, the difference between the two modalities is effectively overcome, the problem of severely insufficient visible information is alleviated, and the detection performance of the network is improved;
(3) The long-distance dependent fusion module focuses on deep interactive information enhancement: the features of the two modalities are synchronously partitioned by the shifted windows, and deep interactive information enhancement is performed on the fused features through the multi-head self-attention mechanism, which improves the adaptability of the fused features in complex illumination scenes and reduces missed detections and false detections by the detector.
Drawings
FIG. 1 shows a flow chart of the feature enhancement based infrared-visible fusion method provided herein;
FIG. 2 shows a schematic diagram of multi-objective dataset preprocessing;
FIG. 3 illustrates the architecture of a cross-feature enhancement module;
FIG. 4 illustrates the architecture of a long-range dependent fusion module;
FIG. 5 shows a flow chart of long distance attention fusion;
FIG. 6 is a schematic diagram showing the test results of the model of the present application in the first embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Referring to fig. 1-6 in combination, the present invention provides an infrared-visible light fusion method based on feature enhancement, comprising the following steps:
Data acquisition: acquiring a multi-target data set and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image.
The preprocessing is performed as follows: the visible light image and the infrared image in the multi-target data set are processed with a data enhancement method that generates image masks from random regions. The specific process is: the image is divided into a 10×10 checkerboard according to its size, and in each row two blocks are set to zero with a probability of 30% to generate an image mask; the image mask is then divided by rows into two complementary masks, one of which is used as the mask for the visible light image and the other as the mask for the infrared image; the mask generation process is expressed as:
Generate_mask = RGB_mask ∪ IR_mask;
RGB_mask | IR_mask = 1;
where RGB_mask is the mask for the visible light image, IR_mask is the mask for the infrared image, and Generate_mask is the total mask.
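A minimal sketch of this mask-generation step under one reading of the sampling rule (each grid row is masked with probability 30%, two random blocks are zeroed in a masked row, and zeroed rows alternate between the two modalities so the masks stay complementary); the function and parameter names are placeholders, not taken from the patent:

```python
import numpy as np

def generate_complementary_masks(height, width, grid=10, drop_per_row=2, p=0.3, rng=None):
    """Sketch of the random-region mask generation described above.

    The exact sampling rule is ambiguous in the text; here each grid row is
    masked with probability `p`, `drop_per_row` random blocks are zeroed in a
    masked row, and masked rows alternate between the RGB and IR masks so
    that a region removed from one image stays visible in the other
    (RGB_mask | IR_mask = 1)."""
    rng = np.random.default_rng() if rng is None else rng
    block_h, block_w = height // grid, width // grid
    rgb_mask = np.ones((height, width), dtype=np.float32)
    ir_mask = np.ones((height, width), dtype=np.float32)
    for row in range(grid):
        if rng.random() >= p:          # with probability 1 - p keep this row intact
            continue
        cols = rng.choice(grid, size=drop_per_row, replace=False)
        target = rgb_mask if row % 2 == 0 else ir_mask   # alternate modalities per row
        for col in cols:
            target[row * block_h:(row + 1) * block_h,
                   col * block_w:(col + 1) * block_w] = 0.0
    return rgb_mask, ir_mask

# usage sketch: masked_rgb = rgb_image * rgb_mask[..., None]
#               masked_ir  = ir_image  * ir_mask[..., None]
```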
Feature extraction: constructing a dual-stream backbone feature extraction network, wherein the dual-stream backbone feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features so as to obtain visible light features and infrared features.
The dual-stream backbone feature extraction network is obtained by redesigning the YOLOv5 feature extraction network into a dual-stream backbone. The extracted visible light features are expressed as X_RGB ∈ R^(C×H×W) and the infrared features as X_IR ∈ R^(C×H×W), where R^(C×H×W) denotes a three-dimensional matrix, W is the width, H is the height, and C is the number of channels.
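As an illustration of the dual-stream idea only, a minimal sketch follows; the three-layer convolutional stack stands in for the actual YOLOv5 CSPDarknet backbone, and the class name is a placeholder:

```python
import torch
import torch.nn as nn

class DualStreamBackbone(nn.Module):
    """Two branches with the same structure but separate weights: one for the
    visible light image and one for the infrared image (the IR input is assumed
    to be replicated to 3 channels if it is single-channel)."""

    def __init__(self, channels=256):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
                nn.Conv2d(128, channels, 3, stride=2, padding=1), nn.SiLU(),
            )
        self.rgb_branch = branch()   # visible-light stream
        self.ir_branch = branch()    # infrared stream

    def forward(self, rgb, ir):
        x_rgb = self.rgb_branch(rgb)   # X_RGB ∈ R^(C×H×W)
        x_ir = self.ir_branch(ir)      # X_IR  ∈ R^(C×H×W)
        return x_rgb, x_ir
```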
Feature fusion: constructing a feature fusion network, wherein the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, and the cross feature enhancement module comprises a channel attention branch and a spatial attention branch arranged in series; the visible light features and the infrared features are sent into the channel attention branch for enhancement, the visible light features before enhancement are added to the infrared features enhanced by the channel attention branch to obtain a first feature, and the infrared features before enhancement are added to the visible light features enhanced by the channel attention branch to obtain a second feature; the first feature and the second feature are sent into the spatial attention branch for enhancement, the first feature is added to the second feature enhanced by the spatial attention branch to obtain the final output of the visible light features, and the second feature is added to the first feature enhanced by the spatial attention branch to obtain the final output of the infrared features; and the final outputs of the visible light features and the infrared features are sent to the long-distance dependent fusion module and fused through the correlation of position codes based on a Swin Transformer model.
The enhancement process of the channel attention branch is as follows: the input features are fully folded along one direction q while high resolution is maintained along the orthogonal direction v of direction q; the operation is expressed as:
W_RGBq = σ1(F1(X_RGB));  W_RGBv = σ2(F2(X_RGB));
W_IRq = σ1(F1(X_IR));  W_IRv = σ2(F2(X_IR));
where W_RGBq denotes the information of the visible light features in the q direction; W_RGBv denotes the information of the visible light features in the v direction; W_IRq denotes the information of the infrared features in the q direction; W_IRv denotes the information of the infrared features in the v direction; σ1 and σ2 denote tensor reshaping operators; and F1(·) and F2(·) denote 1×1 convolution operations.
W_RGBq and W_IRq, representing the weights of the visible light features and the infrared features respectively, are input into a Softmax function for classification, and the weight distributions of the visible light features and the infrared features are output; the calculation process is expressed as:
W_RGBk = Softmax(W_RGBq);  W_IRk = Softmax(W_IRq);
where W_RGBk denotes the weights of the visible light features and W_IRk denotes the weights of the infrared features.
In the calculation of the weight distribution the information is severely compressed, so to maintain the information intensity the information W_RGBv is multiplied by the weight W_RGBk and the information W_IRv is multiplied by the weight W_IRk; a 1×1 convolution operation is then performed, the channel dimension is raised from C/2 back to C with a normalization step, and a Sigmoid function keeps all parameters in the range 0-1; the calculation process is expressed as:
W_RGBz = Sigmoid(σ3(F3(W_RGBv × W_RGBk)));
W_IRz = Sigmoid(σ3(F3(W_IRv × W_IRk)));
where W_RGBz denotes the information of the visible light features in the z direction; W_IRz denotes the information of the infrared features in the z direction; "×" denotes a matrix dot-product operation; F3(·) denotes a 1×1 convolution operation; and σ3 denotes a tensor reshaping operator.
Through this operation, the most informative appearance and geometric features within each modality are used to effectively suppress feature noise in the inter-modality representation.
Then X_RGB is multiplied by W_RGBz at the channel level to obtain the feature W_RGBln with reduced noise, and X_IR is multiplied by W_IRz at the channel level to obtain the feature W_IRln with reduced noise; the calculation process is expressed as:
W_RGBln = X_RGB ⊙ W_RGBz;  W_IRln = X_IR ⊙ W_IRz;
where ⊙ denotes the Hadamard product operation.
The feature W_IRln is added to X_RGB for recalibration enhancement to obtain the first feature A_RGBch, and the feature W_RGBln is added to X_IR for recalibration enhancement to obtain the second feature A_IRch; the calculation process is expressed as:
A_RGBch = W_IRln + X_RGB;  A_IRch = W_RGBln + X_IR.
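A minimal sketch of the channel attention branch under one reading of these formulas: F1 is assumed to map C channels to 1 and F2 to C/2, the "×" is treated as a matrix product, the normalization step is omitted, and the two modalities are assumed to share the 1×1 convolutions; none of these details are fixed by the text above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossChannelAttention(nn.Module):
    """Per-modality channel gate followed by the cross additions
    A_RGBch = W_IRln + X_RGB and A_IRch = W_RGBln + X_IR."""

    def __init__(self, channels):
        super().__init__()
        self.f1 = nn.Conv2d(channels, 1, 1)              # F1: q path (assumed C -> 1)
        self.f2 = nn.Conv2d(channels, channels // 2, 1)  # F2: v path (C -> C/2)
        self.f3 = nn.Conv2d(channels // 2, channels, 1)  # F3: restore C/2 -> C

    def channel_gate(self, x):
        b, c, h, w = x.shape
        q = self.f1(x).view(b, 1, h * w)                 # W_q : (B, 1, HW)
        v = self.f2(x).view(b, c // 2, h * w)            # W_v : (B, C/2, HW)
        k = F.softmax(q, dim=-1)                         # W_k : weight distribution
        z = torch.matmul(v, k.transpose(1, 2))           # (B, C/2, 1)
        z = self.f3(z.view(b, c // 2, 1, 1))             # back to C channels
        return torch.sigmoid(z)                          # W_z : channel gate in (0, 1)

    def forward(self, x_rgb, x_ir):
        w_rgb_ln = x_rgb * self.channel_gate(x_rgb)      # W_RGBln = X_RGB ⊙ W_RGBz
        w_ir_ln = x_ir * self.channel_gate(x_ir)         # W_IRln  = X_IR  ⊙ W_IRz
        a_rgb_ch = w_ir_ln + x_rgb                       # cross recalibration enhancement
        a_ir_ch = w_rgb_ln + x_ir
        return a_rgb_ch, a_ir_ch
```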
the enhancement process of the space attention branch is as follows: the input features are fully folded in direction q while maintaining high resolution in direction v, the process of operation is expressed as:
A RGBq =σ 4 (F GP (F 4 (A RGBch )));A RGBv =σ 5 (F 5 (A RGBch ));
A IRq =σ 4 (F GP (F 4 (A IRch )));A IRv =σ 5 (F 5 (A IRch ));
in sigma 4 Sum sigma 5 All represent tensor remodelling operators; f (F) 4 (. Cndot.) and F 5 (. Cndot.) all represent 1X 1 convolution operations; f (F) GP (. Cndot.) represents a global pooling operator,
Figure BDA0004133556140000111
A RGBq ∈R 1×C/2
A RGBv ∈R C/2×HW ;A IRq ∈R 1×C/2 ;A IRv ∈R C/2×HW
in A way RGBq And A IRq Weights respectively representing the first feature and the second feature are input into a Softmax function for classification, weight distribution of the first feature and the second feature is output, and the calculation process is represented as follows:
Figure BDA0004133556140000112
wherein A is RGBk Weights representing the first characteristic, A IRk And a weight representing the second feature.
In the calculation of the weight distribution, the information is severely compressed, and in order to maintain the information intensity, the information A is RGBv Multiplied by weight A RGBk Will information A IRv Multiplied by weight A IRk Then sequentially performing information completion, remodelling and Sigmoid functions:
A RGBz =Sigmoid(σ 6 (A RGBv ×A RGBk ));
A IRz =Sigmoid(σ 6 (A IRv ×A IRk ));
wherein A is RGBz ∈R 1×HW ;A IRz ∈R 1×HW A spatial gate representing the first feature and the second feature, respectively.
Then, by combining A RGBch And A RGBz Multiplying by A IRch And A IRz Multiplication to obtain spatially enhanced features A of the first and second features, respectively RGBln And A IRln The calculation process is expressed as:
A RGBln =A RGBch ⊙A RGBz ;A IRln =A IRch ⊙A IRz
wherein A is RGBln ∈R C×H×W ;A IRln ∈R C×H×W
Feature A IRln And A is a RGBch Adding to recalibrate enhancement to obtain final output X of visible light characteristics RGBout The method comprises the steps of carrying out a first treatment on the surface of the Feature A RGBln And A is a IRch Adding to recalibrate the enhancement to obtain the final output X of the infrared signature IRout The calculation process is expressed as:
X RGBout =A IRln +A RGBch ;X IRout =A RGBln +A IRch
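A matching sketch of the spatial attention branch and the final cross additions, under the same assumptions as the channel-branch sketch (shared 1×1 convolutions for both modalities, global average pooling for F_GP); the class name is a placeholder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSpatialAttention(nn.Module):
    """Per-modality spatial gate followed by the cross additions
    X_RGBout = A_IRln + A_RGBch and X_IRout = A_RGBln + A_IRch."""

    def __init__(self, channels):
        super().__init__()
        self.f4 = nn.Conv2d(channels, channels // 2, 1)  # F4: q path
        self.f5 = nn.Conv2d(channels, channels // 2, 1)  # F5: v path

    def spatial_gate(self, a):
        b, c, h, w = a.shape
        q = F.adaptive_avg_pool2d(self.f4(a), 1).view(b, 1, c // 2)  # A_q : (B, 1, C/2)
        v = self.f5(a).view(b, c // 2, h * w)                        # A_v : (B, C/2, HW)
        k = F.softmax(q, dim=-1)                                     # A_k : weights over channels
        z = torch.matmul(k, v).view(b, 1, h, w)                      # (B, 1, H, W)
        return torch.sigmoid(z)                                      # spatial gate A_z

    def forward(self, a_rgb_ch, a_ir_ch):
        a_rgb_ln = a_rgb_ch * self.spatial_gate(a_rgb_ch)   # A_RGBln = A_RGBch ⊙ A_RGBz
        a_ir_ln = a_ir_ch * self.spatial_gate(a_ir_ch)      # A_IRln  = A_IRch  ⊙ A_IRz
        x_rgb_out = a_ir_ln + a_rgb_ch                       # final visible-light output
        x_ir_out = a_rgb_ln + a_ir_ch                        # final infrared output
        return x_rgb_out, x_ir_out
```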
in order to better fuse the visible light and infrared characteristics, the long-distance dependent fusion module is based on a Swin transform model, and fusion of multi-mode complementary information is greatly improved through correlation fusion characteristics of position codes.
The characteristic fusion process comprises the following steps:
the feature map is alternately divided into m×m dimensions using a shift window division method, and if the element map size is smaller than m×m, it is padded to the size of m×m. The window of the next module will then be relatively shifted by (M/2 ) pixels. With this calculation method, the calculation formula is as follows:
Figure BDA0004133556140000121
Figure BDA0004133556140000122
Figure BDA0004133556140000123
Figure BDA0004133556140000124
wherein F is i The representation being a joint input of visible and infrared features, F i ={X RGBout ,X IRout };F o Is the output characteristic of the transducer block;
Figure BDA0004133556140000125
and->
Figure BDA0004133556140000126
Is an intermediate variable; W-MSA and MW-MSA are window multi-head self-attention operation and mask window multi-head self-attention operation, respectively.
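For illustration, a minimal sketch of the shifted-window partitioning that the formulas above rely on, assuming M = 8; padding and the attention mask for the shifted windows are omitted, and the function names are placeholders:

```python
import torch

def window_partition(x, m):
    """Split a (B, H, W, C) feature map into non-overlapping M x M windows,
    returning (num_windows * B, M*M, C) token sequences. Padding for H or W
    not divisible by M is omitted in this sketch."""
    b, h, w, c = x.shape
    x = x.view(b, h // m, m, w // m, m, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, c)

def shifted_windows(x, m):
    """The next block shifts the window grid by (M/2, M/2) pixels before
    partitioning, which is what couples neighbouring windows across blocks."""
    return window_partition(torch.roll(x, shifts=(-m // 2, -m // 2), dims=(1, 2)), m)

# usage sketch: tokens = window_partition(feat, 8)   # W-MSA windows (block l)
#               tokens = shifted_windows(feat, 8)    # MW-MSA windows (block l+1)
```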
Considering the local features after window partitioning, in the self-attention calculation an input visible light feature map F_RGB ∈ R^(8×8×C) and an infrared feature map F_IR ∈ R^(8×8×C) are given. Each feature map is then flattened and arranged as a matrix sequence to obtain the sentences I_RGB ∈ R^(64×C) and I_IR ∈ R^(64×C). The input sentence I ∈ R^(128×C) is then obtained by concatenating the sentences I_RGB and I_IR. Finally, the input sentence I is projected by three weight matrices to obtain the query Q, key K and value V:
Q = I·W_Q,  K = I·W_K,  V = I·W_V;
where W_Q ∈ R^(C×128), W_K ∈ R^(C×128) and W_V ∈ R^(C×128) are weight matrices.
The self-attention calculation process is as follows:
Attention(Q, K, V) = Softmax(QK^T/√d + FPRE)·V;
where d denotes the dimension of the query Q or the key K, and T denotes the matrix transposition operation; FPRE denotes the position coding of the visible light and infrared features and contains four types of position information: the visible light position information RPE_RGB, the infrared position information RPE_IR, the visible-to-infrared relative position information RPE_RGB-IR, and the infrared-to-visible relative position information RPE_IR-RGB, arranged according to the positions of the RGB and IR tokens in the joint sentence:
FPRE = [ RPE_RGB, RPE_RGB-IR ; RPE_IR-RGB, RPE_IR ];
Through the above operations, the visible light feature X_RGBout and the infrared feature X_IRout output after deep interaction are obtained, and the two are added to obtain the final fusion feature F_fusion:
F_fusion = X_RGBout + X_IRout.
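A minimal sketch of this joint self-attention step, assuming the four relative-position terms can be represented by a single learnable 128×128 bias table, and omitting the multi-head split and the projection back to the window grid; the class and parameter names are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointWindowAttention(nn.Module):
    """The 8x8 RGB and IR windows are flattened into one 128-token sentence,
    projected to Q, K, V and attended jointly; `fpre` stands in for the FPRE
    position coding built from RPE_RGB, RPE_IR, RPE_RGB-IR and RPE_IR-RGB."""

    def __init__(self, channels, window=8, dim=128):
        super().__init__()
        n = 2 * window * window                          # 64 RGB tokens + 64 IR tokens
        self.dim = dim
        self.w_q = nn.Linear(channels, dim, bias=False)  # W_Q ∈ R^(C×128)
        self.w_k = nn.Linear(channels, dim, bias=False)  # W_K ∈ R^(C×128)
        self.w_v = nn.Linear(channels, dim, bias=False)  # W_V ∈ R^(C×128)
        self.fpre = nn.Parameter(torch.zeros(n, n))      # joint position bias (assumption)

    def forward(self, f_rgb, f_ir):
        # f_rgb, f_ir: (B, 8, 8, C) window features from the two modalities
        b, c = f_rgb.shape[0], f_rgb.shape[-1]
        i_rgb = f_rgb.reshape(b, -1, c)                  # sentence I_RGB: (B, 64, C)
        i_ir = f_ir.reshape(b, -1, c)                    # sentence I_IR : (B, 64, C)
        i = torch.cat([i_rgb, i_ir], dim=1)              # joint sentence I: (B, 128, C)
        q, k, v = self.w_q(i), self.w_k(i), self.w_v(i)
        attn = q @ k.transpose(1, 2) / self.dim ** 0.5 + self.fpre
        out = F.softmax(attn, dim=-1) @ v                # deep interaction across modalities
        x_rgb_out, x_ir_out = out.split(out.shape[1] // 2, dim=1)
        return x_rgb_out + x_ir_out                      # F_fusion = X_RGBout + X_IRout
```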
The feature enhancement performed by the cross feature enhancement module essentially introduces noise data to force the network to learn complementary modality information, thereby reducing the network's bias toward a single modality; feature enhancement is performed from both the channel and the spatial perspectives, including complementary information exchange between the two modalities, so that the multi-modal features are better fused, the difference between the two modalities and the problem of severely insufficient visible information are effectively addressed, and the detection performance of the network is improved. The long-distance dependent fusion module focuses on deep interactive information enhancement: the features of the two modalities are synchronously partitioned by the shifted windows, and deep interactive information enhancement is performed on the fused features through the multi-head self-attention mechanism, which improves the adaptability of the fused features in complex illumination scenes and reduces missed detections and false detections by the detector.
The application also provides an infrared-visible light fusion system based on feature enhancement, comprising:
and a data acquisition module: the method comprises the steps of acquiring a multi-target data set, and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image.
And the feature extraction module is used for: the method is used for constructing a double-flow trunk feature extraction network, the double-flow trunk feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features, so that visible light features and infrared features are obtained.
And a feature fusion module: the method comprises the steps that a feature fusion network is constructed, the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, the cross feature enhancement module comprises a channel attention branch and a space attention branch which are arranged in series, the visible light features and the infrared features are sent into the channel attention branch to be enhanced, the visible light features before enhancement and the infrared features after enhancement through the channel attention branch are added to obtain a first feature, and the infrared features before enhancement and the visible light features after enhancement through the channel attention branch are added to obtain a second feature; the first feature and the second feature are sent to the space attention branch for enhancement, the first feature is added with the second feature enhanced by the space attention branch to obtain the final output of the visible light feature, and the second feature is added with the first feature enhanced by the space attention branch to obtain the final output of the infrared feature; and sending the final output of the visible light characteristic and the infrared characteristic to the long-distance dependent fusion module, and fusing by correlation of position codes based on a Swin transducer model.
The present application also provides a readable storage medium having one or more programs stored therein, the one or more programs being executable by one or more processors to implement the steps of the feature-enhanced infrared-visible fusion method described above.
Example 1
A model is built with the feature-enhancement-based infrared-visible light fusion method described above and trained on a desktop computer with a 1080Ti GPU, using an SGD optimizer with an initial learning rate of 0.001, a momentum of 0.937 and a weight decay of 0.0005. The overall performance of the proposed method was extensively tested on the VEDAI dataset, with the test results shown in the following table:
(Table 1: detection results of the proposed method and the comparison methods on the VEDAI dataset; reproduced as an image in the original publication.)
from table 1, it is seen that the proposed method achieves the best performance compared to other methods. Compared with the optimal single-mode detection algorithm, mAP indexes (average of average accuracy) are improved by 12.9 percent; for the multi-mode detection method, compared with the optimal method, mAP (average accuracy) is improved by 3.1%; compared with a basic detector, the method provided by the application remarkably reduces missed detection and false detection. The method improves target feature representation through the cross feature enhancement module, and greatly improves detection performance. In addition, the method carries out deep interaction between the information in the two modes, and has higher information fusion degree and better detection performance.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (8)

1. An infrared-visible light fusion method based on feature enhancement, characterized by comprising the following steps:
data acquisition: collecting a multi-target data set and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image;
feature extraction: constructing a dual-stream backbone feature extraction network, wherein the dual-stream backbone feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features so as to obtain visible light features and infrared features;
feature fusion: constructing a feature fusion network, wherein the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, and the cross feature enhancement module comprises a channel attention branch and a spatial attention branch arranged in series; the visible light features and the infrared features are sent into the channel attention branch for enhancement, the visible light features before enhancement are added to the infrared features enhanced by the channel attention branch to obtain a first feature, and the infrared features before enhancement are added to the visible light features enhanced by the channel attention branch to obtain a second feature; the first feature and the second feature are sent into the spatial attention branch for enhancement, the first feature is added to the second feature enhanced by the spatial attention branch to obtain the final output of the visible light features, and the second feature is added to the first feature enhanced by the spatial attention branch to obtain the final output of the infrared features; and the final outputs of the visible light features and the infrared features are sent to the long-distance dependent fusion module and fused through the correlation of position codes based on a Swin Transformer model.
2. The feature-enhancement-based infrared-visible light fusion method of claim 1, wherein the preprocessing is performed in the following manner: the visible light image and the infrared image in the multi-target data set are processed with a data enhancement method that generates image masks from random regions, and the specific process is: the image is divided into a 10×10 checkerboard according to its size, and in each row two blocks are set to zero with a probability of 30% to form an image mask; the image mask is then divided by rows into two complementary masks, one of which is used as the mask for the visible light image and the other as the mask for the infrared image; the mask generation process is expressed as:
Generate_mask = RGB_mask ∪ IR_mask;
RGB_mask | IR_mask = 1;
where RGB_mask is the mask for the visible light image, IR_mask is the mask for the infrared image, and Generate_mask is the total mask.
3. The feature-enhancement-based infrared-visible light fusion method of claim 2, wherein the dual-stream backbone feature extraction network is a YOLOv5 feature extraction network with a dual-stream backbone, the extracted visible light features are expressed as X_RGB ∈ R^(C×H×W), and the infrared features are expressed as X_IR ∈ R^(C×H×W), where R^(C×H×W) denotes a three-dimensional matrix, W is the width, H is the height, and C is the number of channels.
4. The feature-enhancement-based infrared-visible light fusion method of claim 3, wherein the enhancement process of the channel attention branch is: the input features are fully folded along one direction q while high resolution is maintained along the orthogonal direction v of direction q; the operation is expressed as:
W_RGBq = σ1(F1(X_RGB));  W_RGBv = σ2(F2(X_RGB));
W_IRq = σ1(F1(X_IR));  W_IRv = σ2(F2(X_IR));
where W_RGBq denotes the information of the visible light features in the q direction; W_RGBv denotes the information of the visible light features in the v direction; W_IRq denotes the information of the infrared features in the q direction; W_IRv denotes the information of the infrared features in the v direction; σ1 and σ2 denote tensor reshaping operators; and F1(·) and F2(·) denote 1×1 convolution operations;
W_RGBq and W_IRq, representing the weights of the visible light features and the infrared features respectively, are input into a Softmax function for classification, and the weight distributions of the visible light features and the infrared features are output; the calculation process is expressed as:
W_RGBk = Softmax(W_RGBq);  W_IRk = Softmax(W_IRq);
where W_RGBk denotes the weights of the visible light features and W_IRk denotes the weights of the infrared features;
the information W_RGBv is multiplied by the weight W_RGBk and the information W_IRv is multiplied by the weight W_IRk; a 1×1 convolution operation is then performed, the channel dimension is raised from C/2 back to C with a normalization step, and a Sigmoid function keeps all parameters in the range 0-1; the calculation process is expressed as:
W_RGBz = Sigmoid(σ3(F3(W_RGBv × W_RGBk)));
W_IRz = Sigmoid(σ3(F3(W_IRv × W_IRk)));
where W_RGBz denotes the information of the visible light features in the z direction; W_IRz denotes the information of the infrared features in the z direction; "×" denotes a matrix dot-product operation; F3(·) denotes a 1×1 convolution operation; and σ3 denotes a tensor reshaping operator;
then X_RGB is multiplied by W_RGBz at the channel level to obtain the feature W_RGBln with reduced noise, and X_IR is multiplied by W_IRz at the channel level to obtain the feature W_IRln with reduced noise; the calculation process is expressed as:
W_RGBln = X_RGB ⊙ W_RGBz;  W_IRln = X_IR ⊙ W_IRz;
where ⊙ denotes the Hadamard product operation;
the feature W_IRln is added to X_RGB for recalibration enhancement to obtain the first feature A_RGBch, and the feature W_RGBln is added to X_IR for recalibration enhancement to obtain the second feature A_IRch; the calculation process is expressed as:
A_RGBch = W_IRln + X_RGB;  A_IRch = W_RGBln + X_IR.
5. The feature-enhancement-based infrared-visible light fusion method of claim 4, wherein the enhancement process of the spatial attention branch is: the input features are fully folded along direction q while high resolution is maintained along direction v; the operation is expressed as:
A_RGBq = σ4(F_GP(F4(A_RGBch)));  A_RGBv = σ5(F5(A_RGBch));
A_IRq = σ4(F_GP(F4(A_IRch)));  A_IRv = σ5(F5(A_IRch));
where σ4 and σ5 denote tensor reshaping operators; F4(·) and F5(·) denote 1×1 convolution operations; and F_GP(·) denotes the global pooling operator, F_GP(X) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X(:, i, j);
A_RGBq and A_IRq, representing the weights of the first feature and the second feature respectively, are input into a Softmax function for classification, and the weight distributions of the first feature and the second feature are output; the calculation process is expressed as:
A_RGBk = Softmax(A_RGBq);  A_IRk = Softmax(A_IRq);
where A_RGBk denotes the weights of the first feature and A_IRk denotes the weights of the second feature;
the information A_RGBv is multiplied by the weight A_RGBk and the information A_IRv is multiplied by the weight A_IRk, and information completion, reshaping and a Sigmoid function are then applied in sequence:
A_RGBz = Sigmoid(σ6(A_RGBv × A_RGBk));
A_IRz = Sigmoid(σ6(A_IRv × A_IRk));
where A_RGBz ∈ R^(1×HW) and A_IRz ∈ R^(1×HW) denote the spatial gates of the first feature and the second feature, respectively;
A_RGBch is multiplied by A_RGBz, and A_IRch is multiplied by A_IRz, to obtain the spatially enhanced features A_RGBln and A_IRln of the first feature and the second feature, respectively; the calculation process is expressed as:
A_RGBln = A_RGBch ⊙ A_RGBz;  A_IRln = A_IRch ⊙ A_IRz;
where A_RGBln ∈ R^(C×H×W) and A_IRln ∈ R^(C×H×W);
the feature A_IRln is added to A_RGBch for recalibration enhancement to obtain the final output X_RGBout of the visible light features, and the feature A_RGBln is added to A_IRch for recalibration enhancement to obtain the final output X_IRout of the infrared features; the calculation process is expressed as:
X_RGBout = A_IRln + A_RGBch;  X_IRout = A_RGBln + A_IRch.
6. The feature-enhancement-based infrared-visible light fusion method of claim 5, wherein the feature fusion process is as follows:
the feature map is alternately divided into M×M windows using a shifted-window partitioning method, and if the feature map size is smaller than M×M it is padded to M×M; the windows of the next module are then shifted by (M/2, M/2) pixels relative to the previous ones; with this computation, the calculation formulas are as follows:
F̂_1 = W-MSA(LN(F_i)) + F_i;
F_1 = MLP(LN(F̂_1)) + F̂_1;
F̂_2 = MW-MSA(LN(F_1)) + F_1;
F_o = MLP(LN(F̂_2)) + F̂_2;
where F_i denotes the joint input of the visible light and infrared features, F_i = {X_RGBout, X_IRout}; F_o is the output feature of the Transformer block; F̂_1, F_1 and F̂_2 are intermediate variables; LN denotes layer normalization; W-MSA and MW-MSA denote the window multi-head self-attention operation and the masked-window multi-head self-attention operation, respectively;
considering the local features after window partitioning, in the self-attention calculation an input visible light feature map F_RGB ∈ R^(8×8×C) and an infrared feature map F_IR ∈ R^(8×8×C) are given; each feature map is flattened and arranged as a matrix sequence to obtain the sentences I_RGB ∈ R^(64×C) and I_IR ∈ R^(64×C); the input sentence I ∈ R^(128×C) is then obtained by concatenating the sentences I_RGB and I_IR; the input sentence I is projected by three weight matrices to obtain the query Q, key K and value V:
Q = I·W_Q,  K = I·W_K,  V = I·W_V;
where W_Q ∈ R^(C×128), W_K ∈ R^(C×128) and W_V ∈ R^(C×128) are weight matrices;
the self-attention calculation process is as follows:
Attention(Q, K, V) = Softmax(QK^T/√d + FPRE)·V;
where d denotes the dimension of the query Q or the key K, and T denotes the matrix transposition operation; FPRE denotes the position coding of the visible light and infrared features and contains four types of position information: the visible light position information RPE_RGB, the infrared position information RPE_IR, the visible-to-infrared relative position information RPE_RGB-IR, and the infrared-to-visible relative position information RPE_IR-RGB, arranged according to the positions of the RGB and IR tokens in the joint sentence:
FPRE = [ RPE_RGB, RPE_RGB-IR ; RPE_IR-RGB, RPE_IR ];
the visible light feature X_RGBout and the infrared feature X_IRout output after deep interaction are obtained through the above operations, and the two are added to obtain the final fusion feature F_fusion:
F_fusion = X_RGBout + X_IRout.
7. An infrared-visible light fusion system based on feature enhancement, comprising:
and a data acquisition module: the method comprises the steps of acquiring a multi-target data set, and preprocessing the multi-target data set, wherein the multi-target data set comprises a visible light image and an infrared image.
And the feature extraction module is used for: the method is used for constructing a double-flow trunk feature extraction network, the double-flow trunk feature extraction network comprises two branches with the same structure, and the visible light image and the infrared image are respectively sent into the two branches to extract deep features, so that visible light features and infrared features are obtained.
And a feature fusion module: the method comprises the steps that a feature fusion network is constructed, the feature fusion network comprises a cross feature enhancement module and a long-distance dependent fusion module, the cross feature enhancement module comprises a channel attention branch and a space attention branch which are arranged in series, the visible light features and the infrared features are sent into the channel attention branch to be enhanced, the visible light features before enhancement and the infrared features after enhancement through the channel attention branch are added to obtain a first feature, and the infrared features before enhancement and the visible light features after enhancement through the channel attention branch are added to obtain a second feature; the first feature and the second feature are sent to the space attention branch for enhancement, the first feature is added with the second feature enhanced by the space attention branch to obtain the final output of the visible light feature, and the second feature is added with the first feature enhanced by the space attention branch to obtain the final output of the infrared feature; and sending the final output of the visible light characteristic and the infrared characteristic to the long-distance dependent fusion module, and fusing by correlation of position codes based on a Swin transducer model.
8. A readable storage medium having one or more programs stored therein, the one or more programs being executable by one or more processors to implement the steps of the feature-enhancement-based infrared-visible light fusion method according to any one of claims 1 to 6.
CN202310267771.6A 2023-03-20 2023-03-20 Feature enhancement-based infrared-visible light fusion method, system and readable storage medium Pending CN116258934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310267771.6A CN116258934A (en) 2023-03-20 2023-03-20 Feature enhancement-based infrared-visible light fusion method, system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310267771.6A CN116258934A (en) 2023-03-20 2023-03-20 Feature enhancement-based infrared-visible light fusion method, system and readable storage medium

Publications (1)

Publication Number Publication Date
CN116258934A true CN116258934A (en) 2023-06-13

Family

ID=86684318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310267771.6A Pending CN116258934A (en) 2023-03-20 2023-03-20 Feature enhancement-based infrared-visible light fusion method, system and readable storage medium

Country Status (1)

Country Link
CN (1) CN116258934A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912649A (en) * 2023-09-14 2023-10-20 Wuhan University Infrared and visible light image fusion method and system based on relevant attention guidance
CN116912649B (en) * 2023-09-14 2023-11-28 Wuhan University Infrared and visible light image fusion method and system based on relevant attention guidance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination