CN114898189A

CN114898189A - Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Info

Publication number: CN114898189A
Application number: CN202210333408.5A
Authority: CN
Inventors: 秦翰林; 罗国慧; 延翔; 欧洪璇; 孙鹏; 张昱赓; 陈嘉欣; 冯冬竹
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2022-08-12

Abstract

The invention discloses an infrared visible light fusion recognition system based on modal difference feature guidance, which comprises a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module, wherein the double-current backbone network module comprises an infrared feature extraction unit, a visible light feature extraction unit and a feature guidance unit, the infrared feature extraction unit acquires infrared features of different scales of an original infrared image, and the visible light feature extraction unit acquires visible light features of different scales of a visible light image; the characteristic guide unit obtains weighted infrared characteristics and weighted visible light characteristics; the illumination heat sensing module acquires a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic; the cascade region proposing module is used for obtaining the identification result of the target. The invention adopts the inter-modal feature difference to guide the inter-level feature generation mode of supplementary learning, improves the dependency of inter-modal feature representation and improves the generalization capability of a network system.

Description

Infrared and visible light fusion recognition system and method based on modal difference feature guidance

Technical Field

The invention belongs to the technical field of multispectral image identification, and particularly relates to an infrared visible light fusion detection identification method based on modal difference feature guidance.

Background

The infrared image can capture clear outline information of the target, and the visible light image can present fine visual information of the target, including color, texture, structure and the like. Under the condition of insufficient illumination, the target detection is indispensable by effectively utilizing the complementary information of the two wave bands.

In the early exploration in the field of infrared-visible light fusion target detection, many scholars focus on studying imaging hardware and calibrating to obtain paired infrared-visible light image sequences, and design a dual-band fusion detector by using a machine learning detection method to verify the effect of a data set. Hwang S, Park J, Kim N, etc. has set up an infrared-visible light pairing imaging system composed of a color camera, a thermal imager, a beam splitter and a triaxial camera clamp, designed a dual-band target detector composed of an ACF (visible light path aggregation channel characteristic) + T (pixel intensity) + THOG (gradient direction histogram) characteristic and an Adaboost classifier, and then obtain a classification result by extracting and fusing the ACF, the infrared pixel intensity and the THOG characteristic.

With the rapid development of the deep neural network in the field of visible light image detection, researchers are dedicated to applying the deep neural network to dual-band feature fusion detection. Aiming at the problem that the two modes of infrared and visible light have field of view and mode difference, Zhang L, Zhu X, Chen X and the like design a new alignment area CNN network to process weak alignment data end to end. And for a group of input color images and heat images, after feature extraction, matching the input region features with the number prediction and the region of interest, merging the aligned image features, and then performing confidence fusion. In the method, the difference between the modes is relieved, but the generation of the current characteristics is guided by mutual complementation of the difference of the characteristics between the modes without starting from different characteristic layers. Meanwhile, in the feature fusion process, a complete fusion guidance strategy is not established in the current method, so that the fusion detection precision is low.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides an infrared and visible light fusion detection and identification method based on modal difference feature guidance. The technical problem to be solved by the invention is realized by the following technical scheme:

the invention provides an infrared visible light fusion recognition system based on modal difference characteristic guidance, which comprises a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module, wherein,

the double-current backbone network module comprises an infrared feature extraction unit, a visible light feature extraction unit and a feature guide unit, wherein the infrared feature extraction unit is used for acquiring infrared features of original infrared images in different scales, and the visible light feature extraction unit is used for acquiring visible light features of visible light images in the same scene as the infrared images in different scales; the feature guiding unit is used for fusing and weighting the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features so as to guide the formation of the infrared features and the visible light features of the next scale in the infrared feature extracting unit and the visible light feature extracting unit in a crossed manner;

the illumination heat sensing module is used for obtaining a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic according to the image heat value of the original infrared image and the image illumination value of the original visible light image;

the cascade region proposing module is used for obtaining the identification result of the target according to the infrared features with different scales, the visible light features with different scales, the credibility weight of the visible light features and the credibility weight of the infrared features.

In one embodiment of the present invention, the infrared feature extraction unit includes a plurality of cascaded infrared feature extraction convolution layers to obtain infrared light features of different scales, the visible light feature extraction unit includes a plurality of cascaded visible light feature extraction units to obtain visible light features of different scales, the feature guiding unit includes a plurality of cascaded mode difference guiding sub-units, wherein,

the modal difference guiding subunit obtains weighted infrared features, weighted visible light features and infrared-visible light fusion vectors by using the infrared features and visible light features of the current scale and the infrared-visible light fusion vectors obtained by the previous modal difference guiding subunit, transmits the weighted infrared features to the visible light feature extraction unit of the next scale, and transmits the weighted visible light features to the infrared feature extraction unit of the next scale so as to guide the formation of the infrared features and visible light features of the next scale in the infrared feature extraction unit and the visible light feature extraction unit; and transmitting the infrared-visible light fusion vector to a next modal difference guiding subunit.

In an embodiment of the invention, the modal difference guidance subunit is specifically configured to:

fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous mode difference guide subunit, and performing characteristic separation through three parallel 3 × 3 convolution layers to obtain the visible light characteristic, the infrared characteristic and the visible light infrared fusion characteristic comprising more semantic information; and then, obtaining a weight value corresponding to the current visible light characteristic and the current infrared characteristic by using a Sigmoid activation function, thereby forming the visible light characteristic with the weight value and the infrared characteristic with the weight value.

In an embodiment of the present invention, the infrared feature extraction unit specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer, the visible light feature extraction unit specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer, the feature guiding unit includes a first modal feature difference guiding subunit, a second modal feature difference guiding subunit, a third modal feature difference guiding subunit, and a fourth modal feature difference guiding subunit, wherein,

the first base layer is used for inputting an original visible light image and obtaining first-scale visible light features, the second base layer is used for inputting an original infrared image and obtaining first-scale infrared features, and the first modal feature difference guiding subunit is used for fusing the first-scale visible light features and the first-scale infrared features and obtaining first-scale infrared visible light fusion vectors, first weighted infrared features and first weighted visible light features;

the first dense fusion layer is used for obtaining a second scale visible light characteristic by utilizing the first weighted infrared characteristic and the first scale visible light characteristic; the fifth dense fusion layer is used for obtaining second-scale infrared features by utilizing the first weighted visible light features and the first-scale infrared features; the second modal feature difference guiding subunit is configured to fuse the second scale visible light feature, the second scale infrared feature, and the first scale infrared-visible light fusion vector to obtain a second scale infrared-visible light fusion vector, a second weighted infrared feature, and a second weighted visible light feature;

the second dense fusion layer is used for obtaining a third-scale visible light characteristic by utilizing the second weighted infrared characteristic and the second-scale visible light characteristic; the sixth dense fusion layer is used for obtaining a third-scale infrared feature by utilizing the second weighted visible light feature and the second-scale infrared feature; the third modal characteristic difference guiding subunit is configured to fuse the third scale visible light characteristic, the third scale infrared characteristic, and the second scale infrared-visible light fusion vector to obtain a third scale infrared-visible light fusion vector, a third weighted infrared characteristic, and a third weighted visible light characteristic;

the third dense fusion layer is used for obtaining a fourth scale visible light characteristic by utilizing the third weighted infrared characteristic and the third scale visible light characteristic; the seventh dense fusion layer is used for obtaining a fourth-scale infrared feature by using the third weighted visible light feature and the third-scale infrared feature; the fourth modal feature difference guiding subunit is configured to fuse the fourth scale visible light feature, the fourth scale infrared feature, and the third scale infrared-visible light fusion vector to obtain a fourth scale infrared-visible light fusion vector, a fourth weighted infrared feature, and a fourth weighted visible light feature;

the fourth dense fusion layer is used for obtaining a fifth scale visible light characteristic by utilizing the fourth weighted infrared characteristic and the fourth scale visible light characteristic; and the eighth dense fusion layer is used for obtaining a fifth-scale infrared feature by using the fourth weighted visible light feature and the fourth-scale infrared feature.

In one embodiment of the present invention, the first base layer and the second base layer are each composed of a 5 × 5 convolution kernel with step size 1, a 1 × 1 convolution kernel with step size 1, and a 3 × 3 convolution kernel with step size 2; the first dense fusion layer, the second dense fusion layer, the third dense fusion layer, the fourth dense fusion layer, the fifth dense fusion layer, the sixth dense fusion layer, the seventh dense fusion layer and the eighth dense fusion layer are all composed of a shullflentv 2 base convolution kernel with a step size of 2.

In one embodiment of the present invention, the illumination heat sensing module includes an illumination heat sensing unit, an illumination sensing mechanism unit, and a heat sensing mechanism unit, wherein,

the illumination heat sensing unit is used for processing the input original visible light image and the original infrared image to obtain a minimum illumination value and a maximum illumination value of the target area and a minimum heat value and a maximum heat value of the target area;

the illumination sensing mechanism unit is used for acquiring a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area;

the heat sensing mechanism unit is used for obtaining a visible light characteristic reliability weight and an infrared characteristic reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area.

In one embodiment of the present invention, the light heat sensing unit includes two convolutional layers and three fully-connected layers, a RELU activation function and a 2 × 2 max pooling layer are adopted after the convolutional layers to compress and extract features, and the loss of light intensity of the light heat sensing unit is:

wherein, w _d And w _n Is the lowest illumination value and the highest illumination value of the target area acquired by the illumination heat sensing unit,

and

a real label indicating daytime and nighttime,

loss of infrared intensity of the illumination heat sensing unit

Comprises the following steps:

wherein m is _d And m _n Is the lowest heat value and the highest heat value of the target area acquired by the illumination heat sensing unit,

and

a genuine label representing cold and hot.

In one embodiment of the invention, the cascaded region proposal module comprises a feature fusion unit, a classification regression unit and a parameter adjustment unit, wherein,

the feature fusion unit is used for fusing the visible light feature of the third scale, the visible light feature of the fourth scale, the visible light feature of the fifth scale, the infrared feature of the third scale, the infrared feature of the fourth scale and the infrared feature of the fifth scale to obtain a fused image;

the classification regression unit is used for carrying out probability prediction on each target type in the fused image to obtain a target class approximation degree estimation value, and calculating the intersection and parallel ratio of each target position prediction frame and a true value frame to obtain a target position deviation estimation value;

and the parameter adjusting unit adjusts the target category approximation degree estimation value and the target position deviation estimation value through the heat illumination sensing parameters output by the illumination heat sensing module, and obtains a category prediction structure and a position prediction result of the target in the image.

The invention provides an infrared visible light fusion identification method based on modal difference feature guidance, which comprises the following steps:

s1: constructing an infrared and visible light fusion recognition system based on modal difference feature guidance according to any one of the above embodiments;

s2: training a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module in the infrared and visible light fusion recognition system to obtain the trained infrared and visible light fusion recognition system;

s3: and inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position in the picture.

In an embodiment of the present invention, the S2 includes:

forming a first training data set by using a large number of pictures with heat value labels and illumination value labels, and training the illumination heat sensing module to obtain a trained illumination heat sensing module;

the infrared and visible light fusion recognition system is constructed by utilizing an original double-current backbone network module, a trained illumination heat sensing module and an original cascade region proposing module, and the original double-current backbone network module and the original cascade region proposing module in the infrared and visible light fusion recognition system are trained by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels, so that the trained infrared and visible light fusion recognition system is obtained.

Compared with the prior art, the invention has the beneficial effects that:

1. according to the method, an inter-modal feature difference guiding and supplementary learning inter-level feature generation mode is adopted, the reliability weight of each modal feature is obtained by densely embedding modal feature difference guiding units between feature layers, and then deep integration of visible light and infrared modal features is carried out, so that the problem of false detection caused by the fact that a simple fusion strategy of linear combination or splicing retains wrong information in the fusion process is solved.

2. The invention fuses different feature layers and different scale features, improves the detection performance of multi-scale targets, has higher resolution of low-level features, contains more position and detail information, has lower semantic property and more noise, has stronger semantic information of high-level features, but has low resolution and poorer detail perception capability, and is beneficial to the detection capability of targets with different sizes by utilizing the multi-level features.

3. According to the invention, the infrared heat intensity and visible light illuminance sensing module is adopted to obtain the illumination value and the heat intensity value in the current scene, the confidence coefficient scores and the position offset scores of the infrared modal branch and the visible light modal branch are adaptively adjusted, and the generalization capability of the network to different scenes, weathers and targets is improved.

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Drawings

Fig. 1 is a block diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a modal difference guiding subunit according to an embodiment of the present invention;

fig. 4 is a schematic working process diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention.

Detailed Description

To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following describes in detail an infrared-visible light fusion recognition system and method based on modal difference feature guidance according to the present invention with reference to the accompanying drawings and the detailed description.

The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, which is to be read in connection with the accompanying drawings. The technical means and effects of the present invention adopted to achieve the predetermined purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only and are not used for limiting the technical scheme of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of additional like elements in the article or device comprising the element.

Example one

Referring to fig. 1 and fig. 2, fig. 1 is a block diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention; fig. 2 is a schematic structural diagram of an infrared-visible light fusion recognition system based on modal difference feature guidance according to an embodiment of the present invention. The infrared visible light fusion recognition system comprises a double-current backbone network module 1, an illumination heat sensing module 2 and a cascade region proposing module 3. The dual-current backbone network module 1 comprises an infrared feature extraction unit 11, a visible light feature extraction unit 12 and a feature guide unit 13, wherein the infrared feature extraction unit 11 is used for acquiring infrared features of different scales of an original infrared image, and the visible light feature extraction unit 12 is used for acquiring visible light features of different scales of a visible light image in the same scene as the original infrared image; the feature guiding unit 13 is configured to fuse and weight the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features, so as to cross-guide formation of the infrared features and the visible light features of the next scale in the infrared feature extracting unit 11 and the visible light feature extracting unit 12.

The illumination heat sensing module 2 is used for obtaining a reliability weight of the visible light characteristic and a reliability weight of the infrared characteristic according to the image heat value of the original infrared image and the image illumination value of the original visible light image; the cascade region proposing module 3 is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the reliability weight of the visible light features and the reliability weight of the infrared features.

Further, the infrared feature extraction unit 11 in the dual-flow backbone network module 1 includes a plurality of cascaded infrared feature extraction convolution layers to obtain infrared features of different scales, and the visible light feature extraction unit 12 includes a plurality of cascaded visible light feature extraction units to obtain visible light features of different scales. The feature guiding unit 13 includes a plurality of cascaded modal difference guiding subunits, where the modal difference guiding subunit obtains a weighted infrared feature, a weighted visible light feature, and an infrared-visible light fusion vector by using the infrared feature and the visible light feature of the current scale and the infrared-visible light fusion vector obtained by the previous modal difference guiding subunit, transmits the weighted infrared feature to the visible light feature extraction unit of the next scale, and transmits the weighted visible light feature to the infrared feature extraction unit of the next scale, so as to guide the formation of the infrared feature and the visible light feature of the next scale in the infrared feature extraction unit and the visible light feature extraction unit; and transmitting the infrared-visible light fusion vector to a next modal difference guiding subunit.

Further, referring to fig. 3, the modal difference guiding subunit is specifically configured to:

fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous mode difference guide subunit, and performing characteristic separation through three parallel 3 × 3 convolution layers to obtain the visible light characteristic, the infrared characteristic and the visible light infrared fusion characteristic comprising more semantic information; and then, obtaining a weight value corresponding to the current visible light characteristic and the current infrared characteristic by using a Sigmoid activation function, thereby forming a weighted visible light characteristic and a weighted infrared characteristic.

Further, referring to fig. 2, the infrared feature extraction unit 11 of the present embodiment specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer, the visible light feature extraction unit 12 specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer, the feature guiding unit includes 13 a first modal feature difference guiding subunit, a second modal feature difference guiding subunit, a third modal feature difference guiding subunit, and a fourth modal feature difference guiding subunit, wherein,

Specifically, the infrared feature extraction unit and the visible light feature extraction unit in the dual-flow backbone network module of the present embodiment respectively include five convolution layers (i.e., one base layer and four dense fusion layers) and embed one modal difference feature guiding subunit layer by layer. Firstly, respectively extracting the characteristics of an original infrared image and a visible light image through a convolutional layer to obtain visible light characteristics and infrared characteristics, then inputting the visible light characteristics, the infrared characteristics and infrared visible light fusion characteristics obtained by fusing the infrared characteristics and the visible light characteristics of a previous layer into a current mode difference characteristic guide subunit, outputting weighted infrared characteristics to guide the generation of the visible light characteristics of the next stage through cross guidance, and outputting the weighted visible light characteristics to guide the generation of the infrared characteristics of the next stage.

Furthermore, five convolution layers in the infrared characteristic extraction unit and the visible light characteristic extraction unit comprise a base layer and four dense fusion layers, wherein the base layer is composed of a 5 × 5depth-wise convolution kernel with the step size of 1, a 1 × 1 convolution kernel with the step size of 1 and a 3 × 3 convolution kernel with the step size of 2, and the dense fusion layers are composed of a shullflentv 2 base convolution kernel with the step size of 2. In this embodiment, the first base layer and the second base layer are each composed of a 5 × 5 convolution kernel with a step size of 1, a 1 × 1 convolution kernel with a step size of 1, and a 3 × 3 convolution kernel with a step size of 2; the first dense fusion layer, the second dense fusion layer, the third dense fusion layer, the fourth dense fusion layer, the fifth dense fusion layer, the sixth dense fusion layer, the seventh dense fusion layer and the eighth dense fusion layer are all composed of a shullflentv 2 base convolution kernel with a step size of 2. After an original visible light image with the size of H multiplied by W multiplied by C (H is the height of the image, W is the width of the image, and C is the number of channels) is input into the visible light feature extraction unit, different-scale visible light feature maps with the sizes of H/2 multiplied by W/2 multiplied by 32, H/4 multiplied by W/4 multiplied by 64, H/8 multiplied by W/8 multiplied by 64, H/16 multiplied by W/16 multiplied by 128 and H/32 multiplied by W/32 multiplied by 256 are obtained step by step in the hierarchical convolution process, and similarly, an original infrared image with the size of H multiplied by W multiplied by C (H is the height of the image, W is the width of the image, and C is the number of channels) is input into the infrared feature extraction unit, and H/2 multiplied by W/2 multiplied by 32, H/4 multiplied by W/4 multiplied by 64, H/8 multiplied by W/8 multiplied by 64 are obtained step by step in the hierarchical convolution process, H/16 xW/16 x 128 and H/32 xW/32 x 256 infrared feature maps in different scales.

Further, the input of each modal difference guiding subunit in the feature guiding unit 13 includes the visible light feature extracted from the current-stage visible light convolutional layer, the infrared feature extracted from the current-stage infrared convolutional layer, and the visible light infrared fusion feature obtained in the previous-stage modal difference guiding subunit. The modal difference guiding subunit fuses the current visible light feature, the current infrared feature and the visible light infrared fusion feature output by the previous modal difference guiding subunit, separates the three features through three parallel 3 × 3 convolutions, increases the receptive field to obtain a higher-level visible light feature, an infrared feature and a visible light infrared fusion feature which comprise more semantic information (namely, obtain the correlation information between the target and the surrounding background), and obtains a weight corresponding to each modal feature by using a Sigmoid activation function according to the contribution ratio of each modal (visible light modality or infrared modality) in the double-flow backbone network, and obtains an option infrared feature and a weighted visible light feature. The output weighted infrared signature is then returned to the visible signature stream for a stitching operation. And returning the output weighted visible light characteristics to the infrared characteristic stream for splicing. Meanwhile, the output infrared visible light fusion characteristics are input to a next-stage characteristic difference guiding subunit to participate in next-stage characteristic generation.

Specifically, before the feature guiding unit is not added, the layer-by-layer feature generation process in the dual-flow backbone network module may be represented as:

wherein, the first and the second end of the pipe are connected with each other,

the characteristic of the l-th layer visible light is shown,

indicating the visible light characteristics of the l +1 st layer. f (x, l) represents the convolution merge and activation operation of the l-th layer. According to the chain rule, the visible light back propagation formula of the corresponding loss function can be expressed as:

accordingly, the formula for the back propagation of the infrared signature can be expressed as:

wherein the content of the first and second substances,

the infrared characteristics of the layer l are shown,

indicating the infrared characteristics of the l +1 layer. f (x, l) represents the convolution merge and activation operation of the l-th layer.

Further, after the feature difference guide subunits are densely added in the double-current backbone network module, the weighted visible light features are used for guiding the generation of the next-stage infrared features, and the cross guide among the modes is completed. The inter-level visible light feature generation pattern may be expressed as:

wherein the content of the first and second substances,

the characteristics of the visible light of the l layers are shown,

represents the visible light characteristics of the l +1 layer,

representing the weighting characteristics obtained after the infrared characteristics and the visible light characteristics pass through the weight sensing module, wherein tau (-) represents the guidance of characteristic addition or splicing, and the corresponding back propagation obtained according to the chain rule can be represented as:

wherein the content of the first and second substances,

and psi (-) represents the weighting operation of the infrared features of the visible light. In the feature difference guidance subunit, the generation of the visible light feature includes the co-participation of the infrared weighting feature and the visible light feature, as does the infrared weighting feature. The cross difference guide mechanism generated by the feature guide unit can deepen the connection among the multi-modal networks and establish deeper and more accurate modal correlation, thereby obtaining more distinctive enhanced features.

Further, the illumination heat sensing module 2 of the present embodiment includes an illumination heat sensing unit 21, an illumination sensing mechanism unit 22, and a heat sensing mechanism unit 23, where the illumination heat sensing unit 21 is configured to process the input original visible light image and the input original infrared image to obtain a lowest illumination value and a highest illumination value of the target area, and a lowest heat value and a highest heat value of the target area; the illumination sensing mechanism unit 22 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area; the heat sensing mechanism unit 23 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area.

Specifically, referring to fig. 2, the present embodiment utilizes a tiny neural network as an illumination heat sensing unit to capture the illumination value of the visible light image and the heat intensity of the infrared image. To reduce computational complexity, the visible and infrared images are resized to 56 x 56 and fed into the photothermal sensing module. The illumination heat sensing unit of the embodiment includes two convolution layers and three full-connected layers, and a RELU activation function and a 2 × 2 maximum pooling layer are adopted after the convolution layers to compress and extract features, and a network is optimized by optimizing a cross entropy loss function of illumination loss and heat. The loss of the illumination intensity of the illumination heat sensing unit is as follows:

wherein w _d And w _n Is Softmax output of a full connection layer, which is respectively the lowest illumination value and the highest illumination value of the target area acquired by the illumination heat sensing unit,

and

a genuine label indicating day and night.

The illumination sensing mechanism unit 22 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current illumination condition according to the lowest illumination value and the highest illumination value of the target area, and the calculation formula is as follows:

w _t ＝1-w _r

wherein, w _r Representing the confidence weight, w, of the visible light characteristics under the current lighting conditions _t And representing the infrared characteristic credibility weight under the current illumination condition. To adapt in the network, w is readjusted in the illumination awareness mechanism unit 22 _d And w _n Wherein w is a parameter set according to actual conditions, | w | ∈ [0,1 |)]Independent prediction of deviation from 0.5, a _w ，r _w Are two learnable parameters initialized with 1, 0.

Further, the infrared intensity loss Lt of the photothermal sensing unit _i Comprises the following steps:

wherein m is _d And m _n Is the Softmax output of the full connection layer, which is respectively the lowest heat value and the highest heat value of the target area acquired by the illumination heat sensing unit,

and

a genuine label representing cold and hot.

The heat sensing mechanism unit 23 is configured to obtain a visible light feature reliability weight and an infrared feature reliability weight under the current heat condition according to the lowest heat value and the highest heat value of the target area, and a calculation formula is as follows:

wherein m is _r Representing the visible light characteristic confidence weight m under the current heat condition _t And representing the infrared characteristic reliability weight under the current heat condition. In particular, to adapt in the network, m is readjusted in the heat-aware mechanism unit 23 _h And m _c Wherein m is a parameter set according to actual conditions, and m is equal to [0,1 ]]Independent prediction of deviation from 0.5, a _m ，r _m Are two learnable parameters initialized with 1, 0.

Further, the cascade region proposing module 3 of this embodiment includes a feature fusion unit, a classification regression unit, and a parameter adjustment unit, where the feature fusion unit is configured to fuse a visible light feature of a third scale, a visible light feature of a fourth scale, a visible light feature of a fifth scale, an infrared feature of a third scale, an infrared feature of a fourth scale, and an infrared feature of a fifth scale, and obtain a fused image. The classification regression unit firstly carries out probability prediction on each possible type of the fused image through a softmax function of a full connection layer to obtain an estimated value of the target class approximation degree; then calculating the intersection ratio (IOU) of the target position prediction box and the truth value box to obtain a target position offset estimation value; the parameter adjusting unit adjusts the target category approximation degree estimation value and the target position deviation estimation value through the heat illumination perception parameters (namely, the visible light feature reliability weight and the infrared feature reliability weight under the current illumination condition, and the visible light feature reliability weight and the infrared feature reliability weight under the current heat condition) obtained by the illumination heat perception module, and when the final target category approximation degree estimation value and the IOU are both higher than 75%, the current category prediction and the position prediction are considered to be accurate, and the detection and identification result of the image is obtained.

Specifically, as the target size variation range is large, the multi-level features extracted by the dual-flow backbone network module are fused, and the target parameters are obtained by adopting a mode suggested by a cascade region. Specifically, visible light characteristics of a third scale, visible light characteristics of a fourth scale, visible light characteristics of a fifth scale and infrared of a third scale are combinedFusing the characteristics, the infrared characteristics of the fourth scale and the infrared characteristics of the fifth scale to obtain fused images, and then performing probability prediction on each possible type of the fused images through a softmax function of a full connection layer to obtain a target class approximation degree estimation value s ₀ (ii) a Then calculating the intersection ratio (IOU) of the target position prediction box and the truth value box to obtain a target position deviation estimated value t ₀ 。

Subsequently, the target category approximation degree estimation value and the target position deviation estimation value are adjusted by using the heat illumination perception parameters obtained by the illumination heat perception module, namely, the visible light feature reliability weight and the infrared feature reliability weight under the current illumination condition, and the visible light feature reliability weight and the infrared feature reliability weight under the current heat condition, so as to obtain an adjusted value s ₁ And t ₁ I.e. from the light heat parameter w _r 、w _t 、m _r 、m _t Regression bias t predicted for visible and infrared features _r 、t _t And confidence score s _r 、s _t Re-weighting to obtain an adjusted value s ₁ And t ₁ . The specific calculation formula is as follows:

wherein s is _r And t _r The confidence coefficient scores and regression offsets of the visible light feature predictions of different scales obtained by using the visible light feature extraction unit are used; s _t And t _t The confidence scores and regression offsets of the infrared feature predictions of different scales obtained by the infrared feature extraction unit are used.

Final confidence score s _fina And regression offset t _fina The calculation formula of (a) is as follows:

the final confidence score is given by a two-stage score s ₀ 、s ₁ Multiplication results, when both phase scores are high, are used to excite the final confidence score. And the target position parameter approaches the target boundary by adopting a summation method. At a classification loss of L _cls The focus weight is added in the system to solve the problem of positive and negative imbalance. Classification loss L of the present embodiment _cls Expressed as:

where α ═ 0.3 and γ ═ 2 are set. s _i Is the confidence score of the object i, i.e. the final confidence score s mentioned above _fina . The total loss being the loss of light

Loss of heat

Loss of classification L _cls And regression loss L _reg Of the sum of (1), wherein the regression loss L _reg Is the smoothed L proposed by fast-RCNN ₁ And (4) loss. Specifically, the regression loss is expressed as:

wherein the content of the first and second substances,

the group treth bounding box vector, t, representing sample i _i An approximate position estimate representing the target i, i.e. the above-mentioned regression offset t _fina R denotes the smooth L1 function in fast-Rcnn.

The total loss is the classification loss L _cls And regression loss L _reg The total loss function L is as follows:

L＝L _cls +L _reg 。

it should be noted that, in an actual process, after the infrared-visible light fusion recognition system based on modal difference feature guidance is constructed, the system needs to be trained first to obtain a trained system, and then the target recognition can be performed on the image to be recognized.

In the training process, firstly, a first training data set is formed by a large number of pictures with heat value labels and illumination value labels, and the illumination heat sensing module is trained until the illumination intensity loss of the illumination heat sensing unit and the infrared intensity loss value of the illumination heat sensing unit meet preset requirements, so that the trained illumination heat sensing module can be obtained; and then constructing the infrared-visible light fusion recognition system by utilizing an original double-current backbone network module, a trained light heat perception module and an original cascade region proposing module, and training the original double-current backbone network module and the original cascade region proposing module in the infrared-visible light fusion recognition system by utilizing a second training data set formed by a large number of pictures with target position labels and target type labels until the total loss function L meets the preset requirement, thereby obtaining the trained infrared-visible light fusion recognition system.

And then, inputting the original infrared image and the original visible light image of the same scene into the trained infrared and visible light fusion recognition system to obtain the target type and the target position information in the picture.

According to the embodiment of the invention, an inter-modal characteristic difference is adopted to guide an inter-level characteristic generation mode of supplementary learning, so that the dependency of inter-modal characteristic representation is improved. And different feature layers and different scale features are fused, so that the detection performance of the multi-scale target is improved. The infrared heat intensity and visible light illuminance sensing module is adopted to weight the fusion characteristics, so that the network is more suitable for different scenes, weather and targets, and the generalization capability of the network is improved. The embodiment fuses different feature layers and different scale features, improves the detection performance of multi-scale targets, has higher resolution of low-level features, contains more positions and detail information, has lower semantic property and more noise, has stronger semantic information of high-level features, but has low resolution, has poorer perception capability of details, and is favorable for the detection capability of the targets with different sizes by utilizing the multi-level features.

Example two

On the basis of the foregoing embodiments, the present embodiment provides an infrared-visible light fusion identification method based on modal difference feature guidance, where the method includes:

s1: constructing the infrared and visible light fusion recognition system based on modal difference feature guidance according to the first embodiment.

The infrared and visible light fusion identification system of the embodiment comprises a double-current backbone network module 1, an illumination heat sensing module 2 and a cascade region proposing module 3. The dual-current backbone network module 1 comprises an infrared feature extraction unit 11, a visible light feature extraction unit 12 and a feature guide unit 13, wherein the infrared feature extraction unit 11 is used for acquiring infrared features of different scales of an original infrared image, and the visible light feature extraction unit 12 is used for acquiring visible light features of different scales of a visible light image in the same scene as the original infrared image; the feature guiding unit 13 is configured to perform cross-guide on the infrared features and the visible light features of the current scale to obtain weighted infrared features and weighted visible light features, so as to guide formation of the infrared features and the visible light features of the next scale in the infrared feature extraction unit 11 and the visible light feature extraction unit 12.

The illumination heat sensing module 2 is used for acquiring an infrared image heat value in an original infrared image and a visible light image illumination value in an original visible light image; the cascade region proposing module 3 is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the infrared image heat value and the visible light image illumination value.

S2: and training a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module in the infrared and visible light fusion recognition system to obtain the trained infrared and visible light fusion recognition system.

Further, the S2 includes:

It should be noted that, for the working principle and the data processing process of the infrared-visible light fusion recognition system, reference is made to the first embodiment, which is not described herein again.

A further embodiment of the present invention provides a storage medium, in which a computer program is stored, the computer program being configured to execute the steps of the infrared-visible light fusion identification guided based on modal difference features in the foregoing embodiments. Yet another aspect of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method for infrared-visible light fusion recognition based on modal difference feature guidance according to the above embodiment when calling the computer program in the memory. Specifically, the integrated module implemented in the form of a software functional module may be stored in a computer readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable an electronic device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. An infrared visible light fusion identification system based on modal difference feature guidance is characterized by comprising a double-current backbone network module, an illumination heat sensing module and a cascade region proposing module, wherein,

the cascade region proposing module is used for obtaining the identification result of the target according to the infrared features of different scales, the visible light features of different scales, the reliability weight of the visible light features and the reliability weight of the infrared features.

2. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the infrared feature extraction unit comprises a plurality of cascaded infrared feature extraction convolution layers to obtain infrared light features of different scales; the visible light feature extraction unit comprises a plurality of cascaded visible light feature extraction units so as to obtain visible light features with different scales;

the feature guiding unit comprises a plurality of cascaded modal difference guiding subunits, wherein the modal difference guiding subunit obtains weighted infrared features, weighted visible light features and infrared visible light fusion vectors by using infrared features and visible light features of a current scale and infrared visible light fusion vectors obtained by a previous modal difference guiding subunit, transmits the weighted infrared features to a visible light feature extraction unit of a next scale, transmits the weighted visible light features to an infrared feature extraction unit of a next scale, so as to guide the formation of the infrared features and visible light features of the next scale in the infrared feature extraction unit and the visible light feature extraction unit in a cross manner, and transmits the infrared visible light fusion vectors to the next modal difference guiding subunit.

3. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 2, wherein the modal difference guidance subunit is specifically configured to:

fusing the current visible light characteristic, the current infrared characteristic and the visible light infrared fusion characteristic output by the previous modal difference guide subunit, performing characteristic separation through three parallel 3 multiplied by 3 convolutional layers to obtain the visible light characteristic, the infrared characteristic and the visible light infrared fusion characteristic which comprise more semantic information, and then obtaining a weight value corresponding to the current visible light characteristic and the current infrared characteristic by utilizing a Sigmoid activation function so as to form a weighted visible light characteristic and a weighted infrared characteristic.

4. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the infrared feature extraction unit specifically includes a first base layer, a first dense fusion layer, a second dense fusion layer, a third dense fusion layer, and a fourth dense fusion layer, the visible light feature extraction unit specifically includes a second base layer, a fifth dense fusion layer, a sixth dense fusion layer, a seventh dense fusion layer, and an eighth dense fusion layer, and the feature guidance unit includes a first modal feature difference guidance subunit, a second modal feature difference guidance subunit, a third modal feature difference guidance subunit, and a fourth modal feature difference guidance subunit, wherein,

the third dense fusion layer is used for obtaining a fourth scale visible light characteristic by utilizing the third weighted infrared characteristic and the third scale visible light characteristic; the seventh dense fusion layer is used for obtaining a fourth scale infrared characteristic by using the third weighted visible light characteristic and the third scale infrared characteristic; the fourth modal feature difference guiding subunit is configured to fuse the fourth scale visible light feature, the fourth scale infrared feature, and the third scale infrared-visible light fusion vector to obtain a fourth scale infrared-visible light fusion vector, a fourth weighted infrared feature, and a fourth weighted visible light feature;

5. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 4, wherein the first base layer and the second base layer are each composed of a 5 x 5 convolution kernel with step size 1, a 1 x 1 convolution kernel with step size 1, and a 3 x 3 convolution kernel with step size 2; the first dense fusion layer, the second dense fusion layer, the third dense fusion layer, the fourth dense fusion layer, the fifth dense fusion layer, the sixth dense fusion layer, the seventh dense fusion layer and the eighth dense fusion layer are all composed of a shullflentv 2 base convolution kernel with a step size of 2.

6. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 1, wherein the illumination heat perception module comprises an illumination heat perception unit, an illumination perception mechanism unit and a heat perception mechanism unit, wherein,

the illumination heat sensing unit is used for acquiring a lowest illumination value and a highest illumination value of a target area and a lowest heat value and a highest heat value of the target area according to an input original visible light image and an input original infrared image;

7. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 6, wherein the photothermal sensing unit comprises two convolutional layers and three fully-connected layers, a RELU activation function and a 2 x 2 max pooling layer are employed between the convolutional layers and the fully-connected layers for feature compression and extraction, and the loss of illumination intensity of the photothermal sensing unit is:

and

represents whiteThe real label of the day and night,

loss of infrared intensity of the illumination heat sensing unit

Comprises the following steps:

and

a genuine label representing cold and hot.

8. The infrared-visible light fusion recognition system based on modal difference feature guidance according to claim 4, wherein the cascade region proposing module includes a feature fusion unit, a classification regression unit, and a parameter adjustment unit, wherein,

and the parameter adjusting unit adjusts the target category approximation degree estimated value and the target position deviation estimated value through the heat illumination sensing parameters output by the illumination heat sensing module, and obtains a category prediction structure and a position prediction result of the target in the image.

9. An infrared visible light fusion identification method based on modal difference feature guidance is characterized by comprising the following steps:

s1: constructing the infrared-visible light fusion recognition system based on modal difference feature guidance according to any one of claims 1 to 8;

10. The method for infrared-visible light fusion recognition guided by modal difference features according to claim 9, wherein the S2 includes: