CN114898189A - Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Info

Publication number
CN114898189A
Authority
CN
China
Prior art keywords
infrared
visible light
feature
scale
characteristic
Prior art date
Legal status
Pending
Application number
CN202210333408.5A
Other languages
Chinese (zh)
Inventor
秦翰林
罗国慧
延翔
欧洪璇
孙鹏
张昱赓
陈嘉欣
冯冬竹
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210333408.5A
Publication of CN114898189A

Classifications

    • G06F18/253 Pattern recognition; fusion techniques of extracted features
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Neural network architectures; combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Learning methods; backpropagation, e.g. using gradient descent

Abstract

The invention discloses an infrared and visible light fusion recognition system based on modal difference feature guidance, which comprises a dual-stream backbone network module, an illumination heat sensing module and a cascade region proposing module. The dual-stream backbone network module comprises an infrared feature extraction unit, a visible light feature extraction unit and a feature guiding unit: the infrared feature extraction unit acquires infrared features of the original infrared image at different scales, the visible light feature extraction unit acquires visible light features of the visible light image at different scales, and the feature guiding unit produces weighted infrared features and weighted visible light features. The illumination heat sensing module acquires reliability weights for the visible light features and the infrared features, and the cascade region proposing module obtains the recognition result of the target. By using inter-modal feature differences to guide complementary learning in the inter-level feature generation process, the invention strengthens the dependency between inter-modal feature representations and improves the generalization ability of the network system.

Description

Infrared and visible light fusion recognition system and method based on modal difference feature guidance
Technical Field
The invention belongs to the technical field of multispectral image recognition, and particularly relates to an infrared and visible light fusion detection and recognition method based on modal difference feature guidance.
Background
Infrared images capture clear contour information of a target, while visible light images present fine visual information of the target, including color, texture and structure. Under insufficient illumination, effectively exploiting the complementary information of the two bands is indispensable for target detection.
In early explorations in the field of infrared and visible light fusion target detection, many researchers focused on imaging hardware and calibration to obtain paired infrared-visible image sequences, and designed dual-band fusion detectors with machine learning methods to verify the effect on these datasets. Hwang S., Park J., Kim N., et al. built an infrared-visible paired imaging system composed of a color camera, a thermal imager, a beam splitter and a three-axis camera rig, and designed a dual-band target detector consisting of ACF (aggregated channel features of the visible light path), T (pixel intensity) and THOG (histogram of oriented gradients) features with an AdaBoost classifier; the classification result is obtained by extracting and fusing the ACF, infrared pixel intensity and THOG features.
With the rapid development of deep neural networks in visible light image detection, researchers have devoted themselves to applying them to dual-band feature fusion detection. To address the field-of-view and modality differences between infrared and visible light, Zhang L., Zhu X., Chen X., et al. designed a region-alignment CNN that processes weakly aligned data end to end: for a pair of input color and thermal images, after feature extraction, the input region features are matched through number prediction and regions of interest, the aligned image features are merged, and confidence fusion is then performed. This method relieves the difference between modalities, but it does not start from different feature layers and use the complementary inter-modal feature differences to guide the generation of the current features. Moreover, current methods do not establish a complete fusion guidance strategy during feature fusion, so the fusion detection accuracy is low.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an infrared and visible light fusion detection and identification method based on modal difference feature guidance. The technical problem to be solved by the invention is realized by the following technical scheme:
the invention provides an infrared visible light fusion recognition system based on modal difference characteristic guidance, which comprises a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module, wherein,
the double-current backbone network module comprises an infrared feature extraction unit, a visible light feature extraction unit and a feature guide unit, wherein the infrared feature extraction unit is used for acquiring infrared features of original infrared images in different scales, and the visible light feature extraction unit is used for acquiring visible light features of visible light images in the same scene as the infrared images in different scales; the feature guiding unit is used for fusing and weighting the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features so as to guide the formation of the infrared features and the visible light features of the next scale in the infrared feature extracting unit and the visible light feature extracting unit in a crossed manner;
the illumination heat sensing module is used for obtaining a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic according to the image heat value of the original infrared image and the image illumination value of the original visible light image;
the cascade region proposing module is used for obtaining the identification result of the target according to the infrared features with different scales, the visible light features with different scales, the credibility weight of the visible light features and the credibility weight of the infrared features.
In one embodiment of the present invention, the infrared feature extraction unit includes a plurality of cascaded infrared feature extraction convolution layers to obtain infrared features of different scales, the visible light feature extraction unit includes a plurality of cascaded visible light feature extraction convolution layers to obtain visible light features of different scales, and the feature guiding unit includes a plurality of cascaded modal difference guiding subunits, wherein,
the modal difference guiding subunit obtains weighted infrared features, weighted visible light features and infrared-visible light fusion vectors by using the infrared features and visible light features of the current scale and the infrared-visible light fusion vectors obtained by the previous modal difference guiding subunit, transmits the weighted infrared features to the visible light feature extraction unit of the next scale, and transmits the weighted visible light features to the infrared feature extraction unit of the next scale so as to guide the formation of the infrared features and visible light features of the next scale in the infrared feature extraction unit and the visible light feature extraction unit; and transmitting the infrared-visible light fusion vector to a next modal difference guiding subunit.
In an embodiment of the invention, the modal difference guidance subunit is specifically configured to:
fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous mode difference guide subunit, and performing characteristic separation through three parallel 3 × 3 convolution layers to obtain the visible light characteristic, the infrared characteristic and the visible light infrared fusion characteristic comprising more semantic information; and then, obtaining a weight value corresponding to the current visible light characteristic and the current infrared characteristic by using a Sigmoid activation function, thereby forming the visible light characteristic with the weight value and the infrared characteristic with the weight value.
In an embodiment of the present invention, the visible light feature extraction unit specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer, the infrared feature extraction unit specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer, and the feature guiding unit includes a first modal feature difference guiding subunit, a second modal feature difference guiding subunit, a third modal feature difference guiding subunit, and a fourth modal feature difference guiding subunit, wherein,
the first base layer is used for inputting an original visible light image and obtaining first-scale visible light features, the second base layer is used for inputting an original infrared image and obtaining first-scale infrared features, and the first modal feature difference guiding subunit is used for fusing the first-scale visible light features and the first-scale infrared features and obtaining first-scale infrared visible light fusion vectors, first weighted infrared features and first weighted visible light features;
the first dense fusion layer is used for obtaining a second scale visible light characteristic by utilizing the first weighted infrared characteristic and the first scale visible light characteristic; the fifth dense fusion layer is used for obtaining second-scale infrared features by utilizing the first weighted visible light features and the first-scale infrared features; the second modal feature difference guiding subunit is configured to fuse the second scale visible light feature, the second scale infrared feature, and the first scale infrared-visible light fusion vector to obtain a second scale infrared-visible light fusion vector, a second weighted infrared feature, and a second weighted visible light feature;
the second dense fusion layer is used for obtaining a third-scale visible light characteristic by utilizing the second weighted infrared characteristic and the second-scale visible light characteristic; the sixth dense fusion layer is used for obtaining a third-scale infrared feature by utilizing the second weighted visible light feature and the second-scale infrared feature; the third modal characteristic difference guiding subunit is configured to fuse the third scale visible light characteristic, the third scale infrared characteristic, and the second scale infrared-visible light fusion vector to obtain a third scale infrared-visible light fusion vector, a third weighted infrared characteristic, and a third weighted visible light characteristic;
the third dense fusion layer is used for obtaining a fourth scale visible light characteristic by utilizing the third weighted infrared characteristic and the third scale visible light characteristic; the seventh dense fusion layer is used for obtaining a fourth-scale infrared feature by using the third weighted visible light feature and the third-scale infrared feature; the fourth modal feature difference guiding subunit is configured to fuse the fourth scale visible light feature, the fourth scale infrared feature, and the third scale infrared-visible light fusion vector to obtain a fourth scale infrared-visible light fusion vector, a fourth weighted infrared feature, and a fourth weighted visible light feature;
the fourth dense fusion layer is used for obtaining a fifth scale visible light characteristic by utilizing the fourth weighted infrared characteristic and the fourth scale visible light characteristic; and the eighth dense fusion layer is used for obtaining a fifth-scale infrared feature by using the fourth weighted visible light feature and the fourth-scale infrared feature.
In one embodiment of the present invention, the first base layer and the second base layer are each composed of a 5 × 5 depth-wise convolution kernel with step size 1, a 1 × 1 convolution kernel with step size 1, and a 3 × 3 convolution kernel with step size 2; the first dense fusion layer, the second dense fusion layer, the third dense fusion layer, the fourth dense fusion layer, the fifth dense fusion layer, the sixth dense fusion layer, the seventh dense fusion layer and the eighth dense fusion layer are each composed of a ShuffleNetV2 basic convolution unit with step size 2.
In one embodiment of the present invention, the illumination heat sensing module includes an illumination heat sensing unit, an illumination sensing mechanism unit, and a heat sensing mechanism unit, wherein,
the illumination heat sensing unit is used for processing the input original visible light image and the original infrared image to obtain a minimum illumination value and a maximum illumination value of the target area and a minimum heat value and a maximum heat value of the target area;
the illumination sensing mechanism unit is used for acquiring a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area;
the heat sensing mechanism unit is used for obtaining a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area.
In one embodiment of the present invention, the illumination heat sensing unit includes two convolutional layers and three fully connected layers; a ReLU activation function and a 2 × 2 max pooling layer follow each convolutional layer to compress and extract features. The illumination intensity loss of the illumination heat sensing unit is:

L_i^w = -(w_d* · log w_d + w_n* · log w_n)

where w_d and w_n are the lowest illumination value and the highest illumination value of the target area acquired by the illumination heat sensing unit, and w_d* and w_n* are the true labels for daytime and nighttime.

The infrared intensity loss L_i^t of the illumination heat sensing unit is:

L_i^t = -(m_d* · log m_d + m_n* · log m_n)

where m_d and m_n are the lowest heat value and the highest heat value of the target area acquired by the illumination heat sensing unit, and m_d* and m_n* are the true labels for cold and hot.
In one embodiment of the invention, the cascaded region proposal module comprises a feature fusion unit, a classification regression unit and a parameter adjustment unit, wherein,
the feature fusion unit is used for fusing the visible light feature of the third scale, the visible light feature of the fourth scale, the visible light feature of the fifth scale, the infrared feature of the third scale, the infrared feature of the fourth scale and the infrared feature of the fifth scale to obtain a fused image;
the classification regression unit is used for carrying out probability prediction on each target type in the fused image to obtain a target class approximation degree estimation value, and calculating the intersection and parallel ratio of each target position prediction frame and a true value frame to obtain a target position deviation estimation value;
and the parameter adjusting unit adjusts the target class approximation estimate and the target position offset estimate through the heat and illumination sensing parameters output by the illumination heat sensing module, and obtains the class prediction result and position prediction result of the target in the image.
The invention provides an infrared visible light fusion identification method based on modal difference feature guidance, which comprises the following steps:
s1: constructing an infrared and visible light fusion recognition system based on modal difference feature guidance according to any one of the above embodiments;
S2: training the dual-stream backbone network module, the illumination heat sensing module and the cascade region proposing module in the infrared and visible light fusion recognition system to obtain a trained infrared and visible light fusion recognition system;
s3: and inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position in the picture.
In an embodiment of the present invention, the S2 includes:
forming a first training data set by using a large number of pictures with heat value labels and illumination value labels, and training the illumination heat sensing module to obtain a trained illumination heat sensing module;
the infrared and visible light fusion recognition system is constructed by utilizing an original double-current backbone network module, a trained illumination heat sensing module and an original cascade region proposing module, and the original double-current backbone network module and the original cascade region proposing module in the infrared and visible light fusion recognition system are trained by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels, so that the trained infrared and visible light fusion recognition system is obtained.
Compared with the prior art, the invention has the beneficial effects that:
1. The method adopts an inter-level feature generation mode in which inter-modal feature differences guide complementary learning: modal feature difference guiding units densely embedded between feature layers provide the reliability weight of each modal feature, and the visible light and infrared modal features are then deeply integrated. This avoids the false detections caused by simple fusion strategies of linear combination or concatenation, which retain erroneous information during fusion.
2. The invention fuses features from different layers and different scales, improving the detection of multi-scale targets: low-level features have higher resolution and contain more position and detail information, but carry weaker semantics and more noise; high-level features carry stronger semantic information, but have lower resolution and poorer perception of detail. Exploiting multi-level features therefore benefits the detection of targets of different sizes.
3. The invention adopts the infrared heat intensity and visible light illuminance sensing module to obtain the illumination value and heat intensity value of the current scene, adaptively adjusts the confidence scores and position offset scores of the infrared and visible light modal branches, and improves the generalization ability of the network to different scenes, weather conditions and targets.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a block diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a modal difference guiding subunit according to an embodiment of the present invention;
fig. 4 is a schematic working process diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following describes in detail an infrared-visible light fusion recognition system and method based on modal difference feature guidance according to the present invention with reference to the accompanying drawings and the detailed description.
The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, which is to be read in connection with the accompanying drawings. The technical means and effects of the present invention adopted to achieve the predetermined purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only and are not used for limiting the technical scheme of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of additional like elements in the article or device comprising the element.
Example one
Referring to fig. 1 and fig. 2, fig. 1 is a block diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention; fig. 2 is a schematic structural diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention. The infrared visible light fusion recognition system comprises a double-current backbone network module 1, an illumination heat sensing module 2 and a cascade region proposing module 3. The dual-current backbone network module 1 comprises an infrared feature extraction unit 11, a visible light feature extraction unit 12 and a feature guide unit 13, wherein the infrared feature extraction unit 11 is used for acquiring infrared features of different scales of an original infrared image, and the visible light feature extraction unit 12 is used for acquiring visible light features of different scales of a visible light image in the same scene as the original infrared image; the feature guiding unit 13 is configured to fuse and weight the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features, so as to cross-guide formation of the infrared features and the visible light features of the next scale in the infrared feature extracting unit 11 and the visible light feature extracting unit 12.
The illumination heat sensing module 2 is used for obtaining a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic according to the image heat value of the original infrared image and the image illumination value of the original visible light image; the cascade region proposing module 3 is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the reliability weight of the visible light features and the reliability weight of the infrared features.
Further, the infrared feature extraction unit 11 in the dual-stream backbone network module 1 includes a plurality of cascaded infrared feature extraction convolution layers to obtain infrared features of different scales, and the visible light feature extraction unit 12 includes a plurality of cascaded visible light feature extraction convolution layers to obtain visible light features of different scales. The feature guiding unit 13 includes a plurality of cascaded modal difference guiding subunits. Each modal difference guiding subunit obtains a weighted infrared feature, a weighted visible light feature and an infrared-visible light fusion vector by using the infrared feature and visible light feature of the current scale together with the infrared-visible light fusion vector obtained by the previous modal difference guiding subunit; it transmits the weighted infrared feature to the visible light feature extraction unit of the next scale and the weighted visible light feature to the infrared feature extraction unit of the next scale, so as to guide the formation of the infrared feature and visible light feature of the next scale in the infrared feature extraction unit and the visible light feature extraction unit, and transmits the infrared-visible light fusion vector to the next modal difference guiding subunit.
Further, referring to fig. 3, the modal difference guiding subunit is specifically configured to:
fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous mode difference guide subunit, and performing characteristic separation through three parallel 3 × 3 convolution layers to obtain the visible light characteristic, the infrared characteristic and the visible light infrared fusion characteristic comprising more semantic information; and then, obtaining a weight value corresponding to the current visible light characteristic and the current infrared characteristic by using a Sigmoid activation function, thereby forming a weighted visible light characteristic and a weighted infrared characteristic.
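To make the data flow of this subunit concrete, the following is a minimal PyTorch sketch. It assumes the three inputs share the same channel count and spatial size, that fusion is channel-wise concatenation, and that the Sigmoid weights multiply the current-scale features element-wise; class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class ModalDifferenceGuidance(nn.Module):
    """Sketch of one modal difference guiding subunit (assumed layout).

    Inputs: the current visible light feature, the current infrared feature,
    and the infrared-visible fusion feature from the previous subunit.
    Outputs: the weighted visible light feature, the weighted infrared
    feature, and a new fusion feature passed to the next subunit.
    """

    def __init__(self, channels: int):
        super().__init__()
        fused = channels * 3  # three inputs concatenated along channels
        # Three parallel 3x3 convolutions separate the fused tensor back
        # into visible, infrared and fusion branches with more semantics.
        self.sep_vis = nn.Conv2d(fused, channels, kernel_size=3, padding=1)
        self.sep_ir = nn.Conv2d(fused, channels, kernel_size=3, padding=1)
        self.sep_fuse = nn.Conv2d(fused, channels, kernel_size=3, padding=1)

    def forward(self, vis, ir, prev_fusion):
        x = torch.cat([vis, ir, prev_fusion], dim=1)  # fusion (assumed: concat)
        vis_sem = self.sep_vis(x)
        ir_sem = self.sep_ir(x)
        fusion = self.sep_fuse(x)                     # new infrared-visible fusion vector
        # Sigmoid turns each separated branch into weights for the
        # current-scale features, yielding the weighted features.
        vis_weighted = torch.sigmoid(vis_sem) * vis
        ir_weighted = torch.sigmoid(ir_sem) * ir
        return vis_weighted, ir_weighted, fusion
```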
Further, referring to fig. 2, the visible light feature extraction unit 12 of the present embodiment specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer; the infrared feature extraction unit 11 specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer; the feature guiding unit 13 includes a first modal feature difference guiding subunit, a second modal feature difference guiding subunit, a third modal feature difference guiding subunit, and a fourth modal feature difference guiding subunit, wherein,
the first base layer is used for inputting an original visible light image and obtaining first-scale visible light features, the second base layer is used for inputting an original infrared image and obtaining first-scale infrared features, and the first modal feature difference guiding subunit is used for fusing the first-scale visible light features and the first-scale infrared features and obtaining first-scale infrared visible light fusion vectors, first weighted infrared features and first weighted visible light features;
the first dense fusion layer is used for obtaining a second scale visible light characteristic by utilizing the first weighted infrared characteristic and the first scale visible light characteristic; the fifth dense fusion layer is used for obtaining second-scale infrared features by utilizing the first weighted visible light features and the first-scale infrared features; the second modal feature difference guiding subunit is configured to fuse the second scale visible light feature, the second scale infrared feature, and the first scale infrared-visible light fusion vector to obtain a second scale infrared-visible light fusion vector, a second weighted infrared feature, and a second weighted visible light feature;
the second dense fusion layer is used for obtaining a third-scale visible light characteristic by utilizing the second weighted infrared characteristic and the second-scale visible light characteristic; the sixth dense fusion layer is used for obtaining a third-scale infrared feature by utilizing the second weighted visible light feature and the second-scale infrared feature; the third modal characteristic difference guiding subunit is configured to fuse the third scale visible light characteristic, the third scale infrared characteristic, and the second scale infrared-visible light fusion vector to obtain a third scale infrared-visible light fusion vector, a third weighted infrared characteristic, and a third weighted visible light characteristic;
the third dense fusion layer is used for obtaining a fourth scale visible light characteristic by utilizing the third weighted infrared characteristic and the third scale visible light characteristic; the seventh dense fusion layer is used for obtaining a fourth-scale infrared feature by using the third weighted visible light feature and the third-scale infrared feature; the fourth modal feature difference guiding subunit is configured to fuse the fourth scale visible light feature, the fourth scale infrared feature, and the third scale infrared-visible light fusion vector to obtain a fourth scale infrared-visible light fusion vector, a fourth weighted infrared feature, and a fourth weighted visible light feature;
the fourth dense fusion layer is used for obtaining a fifth scale visible light characteristic by utilizing the fourth weighted infrared characteristic and the fourth scale visible light characteristic; and the eighth dense fusion layer is used for obtaining a fifth-scale infrared feature by using the fourth weighted visible light feature and the fourth-scale infrared feature.
Specifically, the infrared feature extraction unit and the visible light feature extraction unit in the dual-stream backbone network module of the present embodiment each include five convolution layers (i.e., one base layer and four dense fusion layers), with one modal difference feature guiding subunit embedded layer by layer. First, the original infrared image and the visible light image are passed through the convolution layers to extract infrared features and visible light features respectively; then the visible light features, the infrared features, and the infrared-visible light fusion feature obtained by fusing the infrared and visible light features of the previous layer are input into the current modal difference feature guiding subunit. Through cross guidance, the output weighted infrared feature guides the generation of the next-stage visible light feature, and the output weighted visible light feature guides the generation of the next-stage infrared feature.
Furthermore, the five convolution layers in the infrared feature extraction unit and the visible light feature extraction unit comprise one base layer and four dense fusion layers: the base layer is composed of a 5 × 5 depth-wise convolution kernel with step size 1, a 1 × 1 convolution kernel with step size 1 and a 3 × 3 convolution kernel with step size 2, and each dense fusion layer is composed of a ShuffleNetV2 basic convolution unit with step size 2. In this embodiment, the first base layer and the second base layer are each composed of a 5 × 5 depth-wise convolution kernel with step size 1, a 1 × 1 convolution kernel with step size 1, and a 3 × 3 convolution kernel with step size 2; the eight dense fusion layers are each composed of a ShuffleNetV2 basic convolution unit with step size 2. After an original visible light image of size H × W × C (H is the image height, W the image width and C the number of channels) is input into the visible light feature extraction unit, the hierarchical convolution process produces visible light feature maps of sizes H/2 × W/2 × 32, H/4 × W/4 × 64, H/8 × W/8 × 64, H/16 × W/16 × 128 and H/32 × W/32 × 256 step by step. Similarly, after an original infrared image of size H × W × C is input into the infrared feature extraction unit, the hierarchical convolution process produces infrared feature maps of the same five scales: H/2 × W/2 × 32, H/4 × W/4 × 64, H/8 × W/8 × 64, H/16 × W/16 × 128 and H/32 × W/32 × 256.
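For concreteness, a sketch of the base layer and of a stand-in for one dense fusion layer is given below; the channel numbers follow the scale progression listed above, but the normalization/activation placement and the internal structure of the ShuffleNetV2 unit are assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class BaseLayer(nn.Module):
    """Base layer: 5x5 depth-wise conv (stride 1), 1x1 conv (stride 1),
    3x3 conv (stride 2), per the composition stated in the text;
    BatchNorm/ReLU placement is an assumption."""

    def __init__(self, in_ch: int = 3, out_ch: int = 32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 5, stride=1, padding=2, groups=in_ch),  # 5x5 depth-wise
            nn.Conv2d(in_ch, out_ch, 1, stride=1),                          # 1x1 point-wise
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),               # 3x3, stride 2
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DenseFusionLayer(nn.Module):
    """Placeholder for a ShuffleNetV2 basic unit with stride 2: it halves
    the spatial size and changes the channel count, matching the
    progression H/2 -> H/4 -> H/8 -> H/16 -> H/32 above. The real
    ShuffleNetV2 unit (channel split and shuffle) is not reproduced here."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.down(x)

# Example shape check on a 512 x 640 visible image:
x = torch.randn(1, 3, 512, 640)
f1 = BaseLayer(3, 32)(x)           # -> (1, 32, 256, 320), i.e. H/2 x W/2 x 32
f2 = DenseFusionLayer(32, 64)(f1)  # -> (1, 64, 128, 160), i.e. H/4 x W/4 x 64
```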
Further, the input of each modal difference guiding subunit in the feature guiding unit 13 includes the visible light feature extracted by the current-stage visible light convolution layer, the infrared feature extracted by the current-stage infrared convolution layer, and the visible light-infrared fusion feature obtained by the previous-stage modal difference guiding subunit. The modal difference guiding subunit fuses the current visible light feature, the current infrared feature and the fusion feature output by the previous subunit, separates the three features again through three parallel 3 × 3 convolutions, and enlarges the receptive field to obtain higher-level visible light, infrared and visible light-infrared fusion features containing more semantic information (i.e., the correlation information between the target and the surrounding background). According to the contribution ratio of each modality (visible light or infrared) in the dual-stream backbone network, a Sigmoid activation function yields the weight corresponding to each modal feature, producing a weighted infrared feature and a weighted visible light feature. The output weighted infrared feature is then returned to the visible light feature stream for a splicing operation, and the output weighted visible light feature is returned to the infrared feature stream for splicing. Meanwhile, the output infrared-visible light fusion feature is input to the next-stage feature difference guiding subunit to participate in the generation of the next-stage features.
Specifically, before the feature guiding unit is added, the layer-by-layer feature generation process in the dual-stream backbone network module can be expressed as:

x_v^(l+1) = f(x_v^(l), l)

where x_v^(l) denotes the visible light feature of the l-th layer, x_v^(l+1) denotes the visible light feature of the (l+1)-th layer, and f(x, l) denotes the convolution, merging and activation operations of the l-th layer. According to the chain rule, the back-propagation of the loss function through the visible light stream can be expressed as:

∂L/∂x_v^(l) = (∂L/∂x_v^(l+1)) · (∂f(x_v^(l), l)/∂x_v^(l))

Accordingly, the back-propagation formula for the infrared features can be expressed as:

∂L/∂x_r^(l) = (∂L/∂x_r^(l+1)) · (∂f(x_r^(l), l)/∂x_r^(l))

where x_r^(l) denotes the infrared feature of the l-th layer, x_r^(l+1) denotes the infrared feature of the (l+1)-th layer, and f(x, l) denotes the convolution, merging and activation operations of the l-th layer.
Further, after the feature difference guiding subunits are densely inserted into the dual-stream backbone network module, the weighted visible light features are used to guide the generation of the next-stage infrared features, and the weighted infrared features to guide the next-stage visible light features, completing the cross guidance between modalities. The inter-level visible light feature generation mode can be expressed as:

x_v^(l+1) = f(τ(x_v^(l), x̃_r^(l)), l)

where x_v^(l) denotes the visible light feature of the l-th layer, x_v^(l+1) denotes the visible light feature of the (l+1)-th layer, x̃_r^(l) = ψ(x_r^(l), x_v^(l)) denotes the weighted feature obtained after the infrared and visible light features pass through the weight sensing module, τ(·) denotes guidance by feature addition or splicing, and ψ(·) denotes the weighting operation on the visible light and infrared features. The corresponding back-propagation obtained from the chain rule can be expressed as:

∂L/∂x_v^(l) = (∂L/∂x_v^(l+1)) · (∂f/∂τ) · (∂τ(x_v^(l), x̃_r^(l))/∂x_v^(l))

∂L/∂x̃_r^(l) = (∂L/∂x_v^(l+1)) · (∂f/∂τ) · (∂τ(x_v^(l), x̃_r^(l))/∂x̃_r^(l))

In the feature difference guiding subunit, the generation of the visible light feature involves the joint participation of the infrared weighted feature and the visible light feature, and likewise for the infrared feature. The cross difference guidance mechanism produced by the feature guiding unit deepens the connection between the multi-modal networks and establishes a deeper and more accurate modal correlation, thereby obtaining more discriminative enhanced features.
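Combining the guidance subunit with the dense fusion layers, one inter-level step of the cross guidance could be sketched as follows, reusing the ModalDifferenceGuidance sketch above. Here τ(·) is assumed to be channel concatenation, so each dense fusion layer must accept the doubled channel count; all names are illustrative.

```python
import torch

def next_level_features(vis_l, ir_l, prev_fusion, guidance, vis_dense, ir_dense):
    """One inter-level step: x_v^(l+1) = f(tau(x_v^(l), weighted IR), l),
    and symmetrically for the infrared stream.

    guidance  : a ModalDifferenceGuidance module for level l
    vis_dense : dense fusion layer producing the visible feature at l+1
    ir_dense  : dense fusion layer producing the infrared feature at l+1
    """
    vis_w, ir_w, fusion = guidance(vis_l, ir_l, prev_fusion)
    # Cross guidance: the weighted infrared feature guides the next visible
    # feature; the weighted visible feature guides the next infrared feature.
    vis_next = vis_dense(torch.cat([vis_l, ir_w], dim=1))
    ir_next = ir_dense(torch.cat([ir_l, vis_w], dim=1))
    return vis_next, ir_next, fusion
```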
Further, the illumination heat sensing module 2 of the present embodiment includes an illumination heat sensing unit 21, an illumination sensing mechanism unit 22, and a heat sensing mechanism unit 23, where the illumination heat sensing unit 21 is configured to process the input original visible light image and the input original infrared image to obtain a lowest illumination value and a highest illumination value of the target area, and a lowest heat value and a highest heat value of the target area; the illumination sensing mechanism unit 22 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area; the heat sensing mechanism unit 23 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area.
Specifically, referring to fig. 2, the present embodiment uses a tiny neural network as the illumination heat sensing unit to capture the illumination value of the visible light image and the heat intensity of the infrared image. To reduce computational complexity, the visible and infrared images are resized to 56 × 56 and fed into the illumination heat sensing module. The illumination heat sensing unit of this embodiment includes two convolution layers and three fully connected layers; a ReLU activation function and a 2 × 2 max pooling layer follow each convolution layer to compress and extract features, and the network is optimized through the cross-entropy loss functions of illumination and heat. The illumination intensity loss of the illumination heat sensing unit is:

L_i^w = -(w_d* · log w_d + w_n* · log w_n)

where w_d and w_n are the Softmax outputs of the fully connected layer, namely the lowest illumination value and the highest illumination value of the target area acquired by the illumination heat sensing unit, and w_d* and w_n* are the true labels for day and night.
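A minimal sketch of such a tiny sensing network and its illumination loss follows; the filter counts, hidden sizes and the choice of concatenating the two 56 × 56 inputs into one six-channel tensor are assumptions, not details given in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationHeatSensor(nn.Module):
    """Tiny sensing network: two conv layers, each followed by ReLU and
    2x2 max pooling, then fully connected layers; two Softmax heads output
    the illumination prediction (w_d, w_n) and the heat prediction.
    Channel and hidden sizes are illustrative only."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * 14 * 14, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 64), nn.ReLU(inplace=True),
        )
        self.illum_head = nn.Linear(64, 2)  # (w_d, w_n)
        self.heat_head = nn.Linear(64, 2)   # heat prediction (cold, hot)

    def forward(self, visible_56, infrared_56):
        x = torch.cat([visible_56, infrared_56], dim=1)  # assumed 3 + 3 channels
        x = self.features(x).flatten(1)
        x = self.fc(x)
        return F.softmax(self.illum_head(x), dim=1), F.softmax(self.heat_head(x), dim=1)

def sensing_loss(pred, label):
    """Two-class cross-entropy between a Softmax prediction (e.g. (w_d, w_n))
    and the true label index (0 or 1); with one-hot labels this equals the
    two-term cross-entropy written above. The heat loss is analogous."""
    return F.nll_loss(torch.log(pred + 1e-8), label)
```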
The illumination sensing mechanism unit 22 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area, and the calculation formula is as follows:
w_r = a_w · (w_d - w_n)/2 + r_w + 1/2,    w_t = 1 - w_r

where w_r denotes the reliability weight of the visible light features under the current illumination condition and w_t denotes the reliability weight of the infrared features under the current illumination condition. To adapt within the network, w_d and w_n are re-scaled in the illumination sensing mechanism unit 22 through the deviation w = (w_d - w_n)/2 of the prediction from 0.5, with |w| ∈ [0, 1]; a_w and r_w are two learnable parameters initialized to 1 and 0.
Further, the infrared intensity loss L_i^t of the illumination heat sensing unit is:

L_i^t = -(m_d* · log m_d + m_n* · log m_n)

where m_d and m_n are the Softmax outputs of the fully connected layer, namely the lowest heat value and the highest heat value of the target area acquired by the illumination heat sensing unit, and m_d* and m_n* are the true labels for cold and hot.
The heat sensing mechanism unit 23 is configured to obtain the visible light feature reliability weight and the infrared feature reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area, and the calculation formula is as follows:

m_r = a_m · (m_h - m_c)/2 + r_m + 1/2,    m_t = 1 - m_r

where m_r denotes the visible light feature reliability weight under the current heat condition and m_t denotes the infrared feature reliability weight under the current heat condition. In particular, to adapt within the network, the hot and cold predictions m_h and m_c are re-scaled in the heat sensing mechanism unit 23 through their deviation m = (m_h - m_c)/2 from 0.5, with |m| ∈ [0, 1]; a_m and r_m are two learnable parameters initialized to 1 and 0.
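As a sketch, the two re-weighting mechanisms could be implemented with four learnable scalars as below. The formulas follow the reconstructions given above and are an interpretation of the text, not a verbatim reproduction of the patent equations.

```python
import torch
import torch.nn as nn

class PerceptionReweight(nn.Module):
    """Maps the sensed day/night and hot/cold predictions to reliability
    weights for the visible and infrared branches; a_w, r_w, a_m, r_m are
    the learnable parameters initialized to 1 and 0 mentioned in the text."""

    def __init__(self):
        super().__init__()
        self.a_w = nn.Parameter(torch.tensor(1.0))
        self.r_w = nn.Parameter(torch.tensor(0.0))
        self.a_m = nn.Parameter(torch.tensor(1.0))
        self.r_m = nn.Parameter(torch.tensor(0.0))

    def forward(self, w_d, w_n, m_h, m_c):
        w_r = self.a_w * (w_d - w_n) / 2 + self.r_w + 0.5  # visible weight (illumination)
        w_t = 1.0 - w_r                                    # infrared weight (illumination)
        m_r = self.a_m * (m_h - m_c) / 2 + self.r_m + 0.5  # visible weight (heat), assumed sign
        m_t = 1.0 - m_r                                    # infrared weight (heat)
        return w_r, w_t, m_r, m_t
```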
Further, the cascade region proposing module 3 of this embodiment includes a feature fusion unit, a classification regression unit, and a parameter adjustment unit, where the feature fusion unit is configured to fuse a visible light feature of a third scale, a visible light feature of a fourth scale, a visible light feature of a fifth scale, an infrared feature of a third scale, an infrared feature of a fourth scale, and an infrared feature of a fifth scale, and obtain a fused image. The classification regression unit firstly carries out probability prediction on each possible type of the fused image through a softmax function of a full connection layer to obtain an estimated value of the target class approximation degree; then calculating the intersection ratio (IOU) of the target position prediction box and the truth value box to obtain a target position offset estimation value; the parameter adjusting unit adjusts the target category approximation degree estimation value and the target position deviation estimation value through the heat illumination perception parameters (namely, the visible light feature reliability weight and the infrared feature reliability weight under the current illumination condition, and the visible light feature reliability weight and the infrared feature reliability weight under the current heat condition) obtained by the illumination heat perception module, and when the final target category approximation degree estimation value and the IOU are both higher than 75%, the current category prediction and the position prediction are considered to be accurate, and the detection and identification result of the image is obtained.
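The decision logic of the cascade region proposing module can be illustrated with the small sketch below: compute the IoU of the predicted box against the ground-truth box, combine the two-stage scores, and accept a prediction only when both the final class score and the IoU exceed 0.75. The combination rule follows the formulas given after this paragraph; variable names are illustrative.

```python
import torch

def box_iou(pred, gt):
    """IoU between two boxes given as (x1, y1, x2, y2) tensors."""
    x1, y1 = torch.max(pred[0], gt[0]), torch.max(pred[1], gt[1])
    x2, y2 = torch.min(pred[2], gt[2]), torch.min(pred[3], gt[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_p + area_g - inter + 1e-8)

def cascade_decision(s0, s1, t0, t1, iou, thresh=0.75):
    """Second stage of the cascade: the final class score multiplies the two
    stage scores, the position offsets are summed, and the prediction is
    accepted only when both the final score and the IoU exceed the threshold.
    s0, s1 and iou are scalars; t0 and t1 are offset vectors."""
    s_final = s0 * s1
    t_final = t0 + t1
    accepted = s_final > thresh and float(iou) > thresh
    return s_final, t_final, accepted
```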
Specifically, since the target size varies over a large range, the multi-level features extracted by the dual-stream backbone network module are fused, and the target parameters are obtained in the cascade region proposal manner. The visible light features of the third, fourth and fifth scales and the infrared features of the third, fourth and fifth scales are fused to obtain the fused image; probability prediction is then performed for each possible type in the fused image through the softmax function of a fully connected layer to obtain the target class approximation estimate s_0; then the intersection-over-union (IoU) of the target position prediction box and the ground-truth box is calculated to obtain the target position offset estimate t_0.
Subsequently, the target class approximation estimate and the target position offset estimate are adjusted by the heat and illumination sensing parameters obtained by the illumination heat sensing module, namely the visible light and infrared feature reliability weights under the current illumination condition and under the current heat condition, to obtain the adjusted values s_1 and t_1. That is, the illumination heat parameters w_r, w_t, m_r and m_t re-weight the regression offsets t_r, t_t and confidence scores s_r, s_t predicted from the visible light and infrared features:

s_1 = (w_r + m_r)/2 · s_r + (w_t + m_t)/2 · s_t

t_1 = (w_r + m_r)/2 · t_r + (w_t + m_t)/2 · t_t

where s_r and t_r are the confidence scores and regression offsets predicted from the multi-scale visible light features obtained by the visible light feature extraction unit, and s_t and t_t are the confidence scores and regression offsets predicted from the multi-scale infrared features obtained by the infrared feature extraction unit.
The final confidence score s_final and regression offset t_final are calculated as follows:

s_final = s_0 · s_1

t_final = t_0 + t_1
the final confidence score is given by a two-stage score s 0 、s 1 Multiplication results, when both phase scores are high, are used to excite the final confidence score. And the target position parameter approaches the target boundary by adopting a summation method. At a classification loss of L cls The focus weight is added in the system to solve the problem of positive and negative imbalance. Classification loss L of the present embodiment cls Expressed as:
Figure BDA0003575838110000185
where α ═ 0.3 and γ ═ 2 are set. s i Is the confidence score of the object i, i.e. the final confidence score s mentioned above fina . The total loss being the loss of light
The overall training objective is the sum of the illumination loss L_i^w, the heat loss L_i^t, the classification loss L_cls and the regression loss L_reg, where the regression loss L_reg is the smooth L1 loss proposed in Faster R-CNN. Specifically, the regression loss is expressed as:

L_reg = Σ_i R(t_i - t_i*)

where t_i* denotes the ground-truth bounding box vector of sample i, t_i denotes the position estimate of target i, i.e., the regression offset t_final described above, and R denotes the smooth L1 function of Faster R-CNN.
For the detection network, the total loss is the sum of the classification loss L_cls and the regression loss L_reg; the total loss function L is as follows:

L = L_cls + L_reg
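Under the stated settings (α = 0.3, γ = 2, smooth L1 regression), the two detection losses could look as follows; the focal form used here is the standard one and is an interpretation of the formula images in the original.

```python
import torch
import torch.nn.functional as F

def classification_loss(scores, labels, alpha=0.3, gamma=2.0):
    """Focal-weighted classification loss over the final confidence scores.
    labels in {0, 1} mark negative/positive proposals; the focal term
    down-weights easy examples to counter the positive/negative imbalance."""
    p_t = torch.where(labels == 1, scores, 1.0 - scores)
    return (-alpha * (1.0 - p_t) ** gamma * torch.log(p_t + 1e-8)).mean()

def regression_loss(t_pred, t_gt):
    """Smooth L1 loss between predicted offsets and ground-truth bounding
    box vectors, as in Faster R-CNN."""
    return F.smooth_l1_loss(t_pred, t_gt, reduction="mean")

def total_loss(scores, labels, t_pred, t_gt):
    """Detection loss L = L_cls + L_reg."""
    return classification_loss(scores, labels) + regression_loss(t_pred, t_gt)
```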
it should be noted that, in an actual process, after the infrared-visible light fusion recognition system based on modal difference feature guidance is constructed, the system needs to be trained first to obtain a trained system, and then the target recognition can be performed on the image to be recognized.
In the training process, firstly, a first training data set is formed by a large number of pictures with heat value labels and illumination value labels, and the illumination heat sensing module is trained until the illumination intensity loss of the illumination heat sensing unit and the infrared intensity loss value of the illumination heat sensing unit meet preset requirements, so that the trained illumination heat sensing module can be obtained; and then constructing the infrared-visible light fusion recognition system by utilizing an original double-current backbone network module, a trained light heat perception module and an original cascade region proposing module, and training the original double-current backbone network module and the original cascade region proposing module in the infrared-visible light fusion recognition system by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels until the total loss function L meets the preset requirement, thereby obtaining the trained infrared-visible light fusion recognition system.
And then, inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position information in the picture.
The embodiment of the invention adopts an inter-level feature generation mode in which inter-modal feature differences guide complementary learning, which improves the dependency of inter-modal feature representations. Features from different layers and different scales are fused, improving the detection of multi-scale targets: low-level features have higher resolution and contain more position and detail information but carry weaker semantics and more noise, whereas high-level features carry stronger semantic information but have lower resolution and poorer perception of detail, so exploiting multi-level features benefits the detection of targets of different sizes. In addition, the infrared heat intensity and visible light illuminance sensing module weights the fused features, making the network better suited to different scenes, weather conditions and targets and improving its generalization ability.
Example two
On the basis of the foregoing embodiments, the present embodiment provides an infrared-visible light fusion identification method based on modal difference feature guidance, where the method includes:
s1: constructing the infrared and visible light fusion recognition system based on modal difference feature guidance according to the first embodiment.
The infrared and visible light fusion identification system of the embodiment comprises a double-current backbone network module 1, an illumination heat sensing module 2 and a cascade region proposing module 3. The dual-current backbone network module 1 comprises an infrared feature extraction unit 11, a visible light feature extraction unit 12 and a feature guide unit 13, wherein the infrared feature extraction unit 11 is used for acquiring infrared features of different scales of an original infrared image, and the visible light feature extraction unit 12 is used for acquiring visible light features of different scales of a visible light image in the same scene as the original infrared image; the feature guiding unit 13 is configured to perform cross-guide on the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features, so as to guide formation of the infrared features and the visible light features of the next scale in the infrared feature extraction unit 11 and the visible light feature extraction unit 12.
The illumination heat sensing module 2 is used for acquiring an infrared image heat value in an original infrared image and a visible light image illumination value in an original visible light image; the cascade region proposing module 3 is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the infrared image heat value and the visible light image illumination value.
S2: and training a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module in the infrared and visible light fusion recognition system to obtain the trained infrared and visible light fusion recognition system.
Further, the S2 includes:
forming a first training data set by using a large number of pictures with heat value labels and illumination value labels, and training the illumination heat sensing module to obtain a trained illumination heat sensing module;
the infrared and visible light fusion recognition system is constructed by utilizing an original double-current backbone network module, a trained illumination heat sensing module and an original cascade region proposing module, and the original double-current backbone network module and the original cascade region proposing module in the infrared and visible light fusion recognition system are trained by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels, so that the trained infrared and visible light fusion recognition system is obtained.
S3: and inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position in the picture.
It should be noted that, for the working principle and the data processing process of the infrared-visible light fusion recognition system, reference is made to the first embodiment, which is not described herein again.
A further embodiment of the present invention provides a storage medium in which a computer program is stored, the computer program being configured to execute the steps of the infrared-visible light fusion identification method based on modal difference feature guidance in the foregoing embodiments. Yet another aspect of the present invention provides an electronic device comprising a memory and a processor, where the memory stores a computer program and the processor, when calling the computer program in the memory, implements the steps of the method for infrared-visible light fusion recognition based on modal difference feature guidance according to the above embodiment. Specifically, an integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable an electronic device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. An infrared visible light fusion identification system based on modal difference feature guidance is characterized by comprising a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module, wherein,
the double-current backbone network module comprises an infrared feature extraction unit, a visible light feature extraction unit and a feature guide unit, wherein the infrared feature extraction unit is used for acquiring infrared features of original infrared images in different scales, and the visible light feature extraction unit is used for acquiring visible light features of visible light images in the same scene as the infrared images in different scales; the feature guiding unit is used for fusing and weighting the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features so as to guide the formation of the infrared features and the visible light features of the next scale in the infrared feature extracting unit and the visible light feature extracting unit in a crossed manner;
the illumination heat sensing module is used for obtaining a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic according to the image heat value of the original infrared image and the image illumination value of the original visible light image;
the cascade region proposing module is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the reliability weight of the visible light features and the reliability weight of the infrared features.
2. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the infrared feature extraction unit comprises a plurality of cascaded infrared feature extraction convolution layers so as to obtain infrared features of different scales, and the visible light feature extraction unit comprises a plurality of cascaded visible light feature extraction convolution layers so as to obtain visible light features of different scales;
the feature guiding unit comprises a plurality of cascaded modal difference guiding subunits, wherein the modal difference guiding subunit obtains weighted infrared features, weighted visible light features and infrared visible light fusion vectors by using infrared features and visible light features of a current scale and infrared visible light fusion vectors obtained by a previous modal difference guiding subunit, transmits the weighted infrared features to a visible light feature extraction unit of a next scale, transmits the weighted visible light features to an infrared feature extraction unit of a next scale, so as to guide the formation of the infrared features and visible light features of the next scale in the infrared feature extraction unit and the visible light feature extraction unit in a cross manner, and transmits the infrared visible light fusion vectors to the next modal difference guiding subunit.
3. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 2, wherein the modal difference guidance subunit is specifically configured to:
fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous modal difference guide subunit, performing characteristic separation through three parallel 3 x 3 convolutional layers to obtain a visible light characteristic, an infrared characteristic and a visible light infrared fusion characteristic that contain more semantic information, and then obtaining weight values corresponding to the current visible light characteristic and the current infrared characteristic by using a Sigmoid activation function so as to form a weighted visible light characteristic and a weighted infrared characteristic.
4. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the infrared feature extraction unit specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer, the visible light feature extraction unit specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer, and the feature guidance unit includes a first modal feature difference guidance subunit, a second modal feature difference guidance subunit, a third modal feature difference guidance subunit, and a fourth modal feature difference guidance subunit, wherein,
the first base layer is used for inputting an original visible light image and obtaining first-scale visible light features, the second base layer is used for inputting an original infrared image and obtaining first-scale infrared features, and the first modal feature difference guiding subunit is used for fusing the first-scale visible light features and the first-scale infrared features and obtaining first-scale infrared visible light fusion vectors, first weighted infrared features and first weighted visible light features;
the first dense fusion layer is used for obtaining a second scale visible light characteristic by utilizing the first weighted infrared characteristic and the first scale visible light characteristic; the fifth dense fusion layer is used for obtaining second-scale infrared features by utilizing the first weighted visible light features and the first-scale infrared features; the second modal feature difference guiding subunit is configured to fuse the second scale visible light feature, the second scale infrared feature, and the first scale infrared-visible light fusion vector to obtain a second scale infrared-visible light fusion vector, a second weighted infrared feature, and a second weighted visible light feature;
the second dense fusion layer is used for obtaining a third-scale visible light characteristic by utilizing the second weighted infrared characteristic and the second-scale visible light characteristic; the sixth dense fusion layer is used for obtaining a third-scale infrared feature by utilizing the second weighted visible light feature and the second-scale infrared feature; the third modal characteristic difference guiding subunit is configured to fuse the third scale visible light characteristic, the third scale infrared characteristic, and the second scale infrared-visible light fusion vector to obtain a third scale infrared-visible light fusion vector, a third weighted infrared characteristic, and a third weighted visible light characteristic;
the third dense fusion layer is used for obtaining a fourth scale visible light characteristic by utilizing the third weighted infrared characteristic and the third scale visible light characteristic; the seventh dense fusion layer is used for obtaining a fourth scale infrared characteristic by using the third weighted visible light characteristic and the third scale infrared characteristic; the fourth modal feature difference guiding subunit is configured to fuse the fourth scale visible light feature, the fourth scale infrared feature, and the third scale infrared-visible light fusion vector to obtain a fourth scale infrared-visible light fusion vector, a fourth weighted infrared feature, and a fourth weighted visible light feature;
the fourth dense fusion layer is used for obtaining a fifth scale visible light characteristic by utilizing the fourth weighted infrared characteristic and the fourth scale visible light characteristic; and the eighth dense fusion layer is used for obtaining a fifth-scale infrared feature by using the fourth weighted visible light feature and the fourth-scale infrared feature.
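Read as a data flow, the five scales recited in claim 4 reduce to a short loop: each modal difference guidance subunit consumes the current-scale features and the previous fusion vector, and its weighted outputs cross over to drive the next dense fusion layer of the opposite stream. The sketch below shows only this wiring; the zero initial fusion vector, the function signature and the base-layer/dense-fusion-layer callables are assumptions, and ModalDifferenceGuide refers to the illustrative class sketched earlier.

import torch

def dual_stream_forward(vis_img, ir_img, base_vis, base_ir, vis_layers, ir_layers, guides):
    # base layers produce the first-scale features of the two streams
    vis_feat = base_vis(vis_img)
    ir_feat = base_ir(ir_img)
    # the first guidance subunit has no previous fusion vector; a zero tensor
    # is an assumption made for this sketch
    fused = torch.zeros_like(vis_feat)
    vis_pyramid, ir_pyramid = [vis_feat], [ir_feat]
    for guide, vis_layer, ir_layer in zip(guides, vis_layers, ir_layers):
        # in practice the previous-scale fusion vector would be downsampled to
        # the current resolution before this call
        w_ir, w_vis, fused = guide(vis_feat, ir_feat, fused)
        # cross guidance: weighted infrared features steer the visible stream,
        # weighted visible features steer the infrared stream
        vis_feat = vis_layer(w_ir, vis_feat)
        ir_feat = ir_layer(w_vis, ir_feat)
        vis_pyramid.append(vis_feat)
        ir_pyramid.append(ir_feat)
    return vis_pyramid, ir_pyramid   # five scales per modality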
5. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 4, wherein the first base layer and the second base layer are each composed of a 5 x 5 convolution kernel with step size 1, a 1 x 1 convolution kernel with step size 1, and a 3 x 3 convolution kernel with step size 2; the first dense fusion layer, the second dense fusion layer, the third dense fusion layer, the fourth dense fusion layer, the fifth dense fusion layer, the sixth dense fusion layer, the seventh dense fusion layer and the eighth dense fusion layer are each composed of a ShuffleNetV2 base convolution kernel with a step size of 2.
6. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the illumination heat sensing module comprises an illumination heat sensing unit, an illumination sensing mechanism unit and a heat sensing mechanism unit, wherein,
the illumination heat sensing unit is used for acquiring a lowest illumination value and a highest illumination value of a target area and a lowest heat value and a highest heat value of the target area according to an input original visible light image and an input original infrared image;
the illumination sensing mechanism unit is used for acquiring a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area;
the heat sensing mechanism unit is used for obtaining a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area.
7. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 6, wherein the illumination heat sensing unit comprises two convolutional layers and three fully-connected layers, a ReLU activation function and a 2 x 2 max-pooling layer being employed between the convolutional layers and the fully-connected layers for feature compression and extraction, and the illumination intensity loss of the illumination heat sensing unit is:

[illumination intensity loss formula, reproduced in the publication only as image FDA0003575838100000041]

wherein w_d and w_n are the lowest illumination value and the highest illumination value of the target area acquired by the illumination heat sensing unit, and ŵ_d and ŵ_n represent the true labels of day and night; the infrared intensity loss of the illumination heat sensing unit is:

[infrared intensity loss formula, reproduced in the publication only as image FDA0003575838100000054]

wherein m_d and m_n are the lowest heat value and the highest heat value of the target area acquired by the illumination heat sensing unit, and m̂_d and m̂_n represent the true labels of cold and hot.
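Since the two loss formulas above survive only as embedded images, their exact form cannot be read from this text. A common choice for this kind of two-class day/night and cold/hot supervision is a cross-entropy, shown here purely as an assumed reconstruction and not as the patented formula, treating w_d, w_n and m_d, m_n as the predicted day/night and cold/hot values:

\[
L_I = -\left(\hat{w}_d \log w_d + \hat{w}_n \log w_n\right),
\qquad
L_T = -\left(\hat{m}_d \log m_d + \hat{m}_n \log m_n\right)
\]

where \(\hat{w}_d, \hat{w}_n\) and \(\hat{m}_d, \hat{m}_n\) are the one-hot day/night and cold/hot labels.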
8. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 4, wherein the cascade region proposing module includes a feature fusion unit, a classification regression unit, and a parameter adjustment unit, wherein,
the feature fusion unit is used for fusing the visible light feature of the third scale, the visible light feature of the fourth scale, the visible light feature of the fifth scale, the infrared feature of the third scale, the infrared feature of the fourth scale and the infrared feature of the fifth scale to obtain a fused image;
the classification regression unit is used for carrying out probability prediction on each target type in the fused image to obtain a target category approximation degree estimation value, and for calculating the intersection over union (IoU) between each target position prediction frame and the corresponding ground-truth frame to obtain a target position deviation estimation value;
and the parameter adjusting unit adjusts the target category approximation degree estimation value and the target position deviation estimation value by means of the illumination and heat sensing parameters output by the illumination heat sensing module, so as to obtain a category prediction result and a position prediction result for the target in the image.
9. An infrared visible light fusion identification method based on modal difference feature guidance is characterized by comprising the following steps:
S1: constructing the infrared-visible light fusion recognition system based on modal difference feature guidance according to any one of claims 1 to 8;
S2: training a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module in the infrared and visible light fusion recognition system to obtain the trained infrared and visible light fusion recognition system;
S3: inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position in the picture.
10. The method for infrared-visible light fusion recognition guided by modal difference features according to claim 9, wherein the S2 includes:
forming a first training data set by using a large number of pictures with heat value labels and illumination value labels, and training the illumination heat sensing module to obtain a trained illumination heat sensing module;
the infrared and visible light fusion recognition system is constructed by utilizing an original double-current backbone network module, a trained illumination heat sensing module and an original cascade region proposing module, and the original double-current backbone network module and the original cascade region proposing module in the infrared and visible light fusion recognition system are trained by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels, so that the trained infrared and visible light fusion recognition system is obtained.
CN202210333408.5A 2022-03-31 2022-03-31 Infrared and visible light fusion recognition system and method based on modal difference feature guidance Pending CN114898189A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210333408.5A CN114898189A (en) 2022-03-31 2022-03-31 Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210333408.5A CN114898189A (en) 2022-03-31 2022-03-31 Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Publications (1)

Publication Number Publication Date
CN114898189A true CN114898189A (en) 2022-08-12

Family

ID=82715941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210333408.5A Pending CN114898189A (en) 2022-03-31 2022-03-31 Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Country Status (1)

Country Link
CN (1) CN114898189A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410147A (en) * 2022-08-16 2022-11-29 北京航空航天大学 All-weather cross-modal self-adaptive fusion pedestrian target detection system and method
CN115410147B (en) * 2022-08-16 2024-07-02 北京航空航天大学 All-weather-oriented cross-mode self-adaptive pedestrian fusion target detection system and method
CN115527159A (en) * 2022-09-02 2022-12-27 燕山大学 Counting system and method based on cross-modal scale attention aggregation features
CN115527159B (en) * 2022-09-02 2024-01-12 燕山大学 Counting system and method based on inter-modal scale attention aggregation features
CN116206275A (en) * 2023-02-23 2023-06-02 南通探维光电科技有限公司 Knowledge distillation-based recognition model training method and device
CN116206275B (en) * 2023-02-23 2024-03-01 南通探维光电科技有限公司 Knowledge distillation-based recognition model training method and device

Similar Documents

Publication Publication Date Title
Jiang et al. Deep learning in object detection and recognition
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
Pomari et al. Image splicing detection through illumination inconsistencies and deep learning
Föckler et al. Phoneguide: museum guidance supported by on-device object recognition on mobile phones
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN110909605B (en) Cross-modal pedestrian re-identification method based on contrast correlation
CN114898189A (en) Infrared and visible light fusion recognition system and method based on modal difference feature guidance
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111652159B (en) Micro-expression recognition method and system based on multi-level feature combination
Cun et al. Defocus blur detection via depth distillation
CN112580480B (en) Hyperspectral remote sensing image classification method and device
US8170332B2 (en) Automatic red-eye object classification in digital images using a boosting-based framework
CN112560604A (en) Pedestrian re-identification method based on local feature relationship fusion
CN111539351A (en) Multi-task cascaded face frame selection comparison method
CN112115805A (en) Pedestrian re-identification method and system with bimodal hard-excavation ternary-center loss
CN113052170A (en) Small target license plate recognition method under unconstrained scene
CN114581456A (en) Multi-image segmentation model construction method, image detection method and device
Dai et al. GCD-YOLOv5: An armored target recognition algorithm in complex environments based on array lidar
Setta et al. Real-time facial recognition using SURF-FAST
Mu et al. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm
Wang et al. MSF²DN: Multi Scale Feature Fusion Dehazing Network with Dense Connection
Wang et al. SIHRNet: a fully convolutional network for single image highlight removal with a real-world dataset
Mokalla et al. On designing MWIR and visible band based deepface detection models
CN115546668A (en) Marine organism detection method and device and unmanned aerial vehicle
CN114445916A (en) Living body detection method, terminal device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination