CN117095158A - Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Info

Publication number
CN117095158A
Authority
CN
China
Prior art keywords
image
feature
scale
convolution
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311063505.8A
Other languages
Chinese (zh)
Other versions
CN117095158B (en)
Inventor
吴衡
郭梓杰
罗劭娟
陈梅云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202311063505.8A priority Critical patent/CN117095158B/en
Priority claimed from CN202311063505.8A external-priority patent/CN117095158B/en
Publication of CN117095158A publication Critical patent/CN117095158A/en
Application granted granted Critical
Publication of CN117095158B publication Critical patent/CN117095158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/52 - Scale-space analysis, e.g. wavelet analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

The application discloses a terahertz image dangerous article detection method based on multi-scale decomposition convolution, which comprises the following steps: acquiring an image of a target object to be detected by using terahertz imaging equipment; processing the image of the object to be detected to construct an input image data set; constructing a target detection network model, inputting the image of the target object to be detected into the target detection network model, generating feature layers of different sizes, carrying out multi-scale feature fusion on the features in the feature layers, extracting multi-scale features, and identifying hidden dangerous articles, wherein the target detection network model is obtained by training on the input image data set; outputting a hidden dangerous article detection result image comprising a dangerous article detection frame, a dangerous article class number and a predicted probability value if a dangerous article exists in the image of the object to be detected, and outputting an image consistent with the input if no dangerous article exists in the image of the object to be detected.

Description

Terahertz image dangerous article detection method based on multi-scale decomposition convolution
Technical Field
The application belongs to the technical field of terahertz detection, and particularly relates to a terahertz image dangerous article detection method based on multi-scale decomposition convolution.
Background
Terahertz imaging technology uses a terahertz radiation source to illuminate an object and captures the transmitted or reflected radiation for imaging. Owing to the frequency and wavelength characteristics of terahertz radiation and its harmlessness to the human body, terahertz imaging has great application potential in the biomedical and security fields, for example in detecting flammable and explosive substances, drugs, illicit guns and other dangerous articles. However, because of factors such as the hardware of the terahertz imaging system and interference from the external environment, terahertz images suffer from severe noise, low signal-to-noise ratio and contrast, and blurring. These problems greatly affect the accuracy of security detection, and traditional target detection systems, which cannot accurately recognize and locate dangerous articles in low-quality terahertz images, suffer from low detection precision and low recognition rate. With the development of deep learning, the detection task can be completed automatically by a deep learning model trained on a large set of samples. In addition, a target detection method based on multi-scale decomposition convolution can improve accuracy while reducing model complexity, making it suitable for deployment on terminal devices. Therefore, developing a target detection algorithm that can more accurately identify the types of dangerous articles in terahertz images with high detection accuracy is very helpful for the application and development of terahertz image detection technology.
Disclosure of Invention
In order to solve the above technical problems, the application provides a terahertz image dangerous article detection method based on multi-scale decomposition convolution. By means of a deep learning target detection algorithm and a network model optimization method, the detection accuracy and recognition rate of dangerous articles in terahertz images are improved. The method is expected to be widely applied to security detection in scenes such as subways, airports and border checkpoints.
In order to achieve the above purpose, the application provides a terahertz image dangerous article detection method based on multi-scale decomposition convolution, which comprises the following steps:
acquiring an image of a target object to be detected by using terahertz imaging equipment;
processing the image of the object to be detected to construct an input image data set;
constructing a target detection network model, inputting the image of the target object to be detected into the target detection network model, generating feature layers with different sizes, carrying out multi-scale feature fusion on features in the feature layers, extracting multi-scale features, and identifying hidden dangerous goods, wherein the target detection network model is obtained by training the input image data set;
outputting a hidden dangerous article detection result image comprising a dangerous article detection frame, a dangerous article class number and a predicted probability value if a dangerous article exists in the image of the object to be detected, and outputting an image consistent with the input if no dangerous article exists in the image of the object to be detected.
Optionally, processing the image of the object to be detected includes:
converting dangerous goods in the image of the object to be detected into tag data by using a rectangular frame, and obtaining an image containing the tag data;
and performing Mosaic data enhancement, random horizontal flipping and random scaling on the image containing the tag data to complete the construction of the input image data set.
Optionally, the object detection network model includes: a feature extraction backbone network, a feature fusion network and a feature detection network;
the feature extraction backbone network performs shallow feature extraction on the image of the object to be detected through convolution operation to obtain feature layers with different sizes;
the feature fusion network performs multi-scale feature fusion on the feature layers with different sizes, extracts multi-scale features and acquires a multi-scale feature map;
and the characteristic detection network predicts the multi-scale characteristic map and outputs a prediction result map.
Optionally, a self-adaptive multi-scale large-kernel decomposition convolution module and an attention mechanism BRA (bi-level routing attention) are added into the feature extraction backbone network; the self-adaptive multi-scale large-kernel decomposition convolution module and the parameter-free 3-D local attention SimAM are added into the feature fusion network;
the self-adaptive multi-scale large-kernel decomposition convolution module carries out multi-scale decomposition and self-adaptive fusion on the input feature images.
Optionally, the adaptive multi-scale large-kernel decomposition convolution module includes a depth convolution, a depth expansion convolution, and a point-by-point convolution.
Optionally, before the adaptive multi-scale large-kernel decomposition convolution module performs multi-scale decomposition and adaptive fusion on the input feature map, the method includes: and carrying out convolution and classification operation on the input feature images to obtain a plurality of feature images.
Optionally, the performing multi-scale decomposition and self-adaptive fusion on the input feature map by the self-adaptive multi-scale large-kernel decomposition convolution module includes:
extracting one of the feature maps as a first feature map;
respectively inputting the remaining feature maps of the plurality of feature maps into different self-adaptive multi-scale large-kernel decomposition convolution modules;
setting different large convolution kernels and different expansion rates in the different self-adaptive multi-scale large-kernel decomposition convolution modules, setting depth expansion convolution, depth convolution and point-by-point convolution in the different self-adaptive multi-scale large-kernel decomposition convolution modules, performing multi-scale decomposition on the remaining feature maps, and outputting a plurality of new feature maps;
connecting the plurality of new feature maps in the channel dimension by a cascading operation to obtain a second feature map;
performing feature fusion and channel-dimension reduction on the second feature map through a convolution operation to obtain a third feature map;
performing Softmax and channel-separation operations on the third feature map to obtain spatially adaptive weights;
carrying out weighted aggregation on the plurality of new feature maps and the spatially adaptive weights to obtain a fourth feature map;
and performing cascading and convolution operations on the fourth feature map and the first feature map to obtain a fifth feature map, thereby realizing self-adaptive fusion.
Optionally, the target parameter in the adaptive multi-scale large-kernel decomposition convolution module is a preset target value.
Optionally, the mathematical model for outputting the hidden dangerous goods detection result image is:
O = O(x, y) = D(F′, Θ̂)
wherein O is the hidden dangerous article detection result image, Θ̂ denotes the optimized parameters, F′ is the image group obtained after processing by the feature extraction backbone network and the feature fusion network, D(·) denotes the target detection function, Ψ denotes the parameters of the neural network, and (x, y) denotes the pixel coordinates of the output detection frame.
Optionally, the training process of the object detection network model through the input image dataset includes:
optimizing the Loss function Loss(Θ) by adopting the SGD optimizer:
L_b = L_CIoU + L_DFL
L_DFL = -((y_{i+1} - y)log(S_i) + (y - y_i)log(S_{i+1}))
wherein N is the number of detection layers, L_b is the bounding-box regression loss function, L_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss functions, L_CIoU and L_DFL are the bounding-box loss functions, IoU is the intersection-over-union, ρ is the distance between the center points of the predicted frame and the real frame, p and g are the center points of the predicted frame and the real frame respectively, c is the diagonal distance of the minimum circumscribed rectangle of the two frames, v is the parameter measuring the consistency of the aspect ratio, y_{i+1} is the nearest integer to the right of the true value, y_i is the nearest integer to the left of the true value, n is the number of samples, B_i is the target value, and S_i is the model output value.
The application has the following technical effects: the self-adaptive multi-scale large-kernel decomposition convolution module adopted by the application effectively enlarges the receptive field while reducing complexity, and improves the capability of identifying and extracting dangerous-article features in the image. In addition, attention mechanisms are added to the feature extraction backbone network and the feature fusion layer, so that global features can be exploited to improve the detection precision and recognition rate of dangerous articles. The method is conducive to the application and research of terahertz image dangerous article detection technology.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic diagram of a network model architecture of a target detection algorithm according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an adaptive multi-scale large-kernel decomposition convolution module architecture according to an embodiment of the present application;
fig. 3 is a schematic diagram of an attention mechanism BRA module architecture according to an embodiment of the present application;
fig. 4 is a flowchart of a terahertz image dangerous article detection method based on multi-scale decomposition convolution in an embodiment of the application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 4, in this embodiment, a method for detecting a terahertz image dangerous article based on multi-scale decomposition convolution is provided, including: acquiring an image of a target object to be detected by using terahertz imaging equipment;
processing the image of the object to be detected to construct an input image data set;
constructing a target detection network model, inputting the image of the target object to be detected into the target detection network model, generating feature layers with different sizes, carrying out multi-scale feature fusion on features in the feature layers, extracting multi-scale features, and identifying hidden dangerous goods, wherein the target detection network model is obtained by training the input image data set;
outputting a hidden dangerous article detection result image comprising a dangerous article detection frame, a dangerous article class number and a predicted probability value if a dangerous article exists in the image of the object to be detected, and outputting an image consistent with the input if no dangerous article exists in the image of the object to be detected.
Acquiring images of target objects to be detected by using terahertz imaging equipment, processing the images and constructing an input image data set (i.e., producing a terahertz image training data set of human bodies carrying hidden dangerous articles) comprises the following steps:
The terahertz imaging equipment is used to capture N = 3157 images of objects to be detected, each image being denoted S_i, i = 1, 2, …, 3157. The dangerous articles in the N = 3157 images are framed with rectangular boxes and converted into tag data to obtain images containing tag data; the images containing tag data are then preprocessed by Mosaic enhancement, random horizontal flipping, random scaling and the like to obtain preprocessed sub-images. After all N = 3157 images of the objects to be detected are preprocessed in the same manner, an input image set A containing N = 3157 images is obtained.
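As an illustration of this preprocessing step, the following is a minimal sketch of the described augmentations (Mosaic stitching, random left-right flipping and random scaling) using NumPy and OpenCV; the function names, the 640-pixel output size and the scale range are illustrative assumptions rather than the parameters used in the embodiment.

```python
import random
import numpy as np
import cv2  # assumed available for resizing

def random_flip_scale(img, boxes, scale_range=(0.5, 1.5)):
    """Randomly flip an image left-right and rescale it, keeping boxes (x1, y1, x2, y2) consistent."""
    h, w = img.shape[:2]
    if random.random() < 0.5:                        # random left-right flip
        img = img[:, ::-1].copy()
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
    s = random.uniform(*scale_range)                 # random size scaling
    img = cv2.resize(img, (int(w * s), int(h * s)))
    return img, boxes * s

def mosaic4(samples, out_size=640):
    """Mosaic enhancement: stitch four (image, boxes) samples into one training image."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    half = out_size // 2
    merged = []
    for k, (img, boxes) in enumerate(samples[:4]):
        h, w = img.shape[:2]
        ox, oy = (k % 2) * half, (k // 2) * half     # quadrant offset
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        b = boxes.astype(np.float32).copy()
        b[:, [0, 2]] = b[:, [0, 2]] * half / w + ox  # rescale and shift x coordinates
        b[:, [1, 3]] = b[:, [1, 3]] * half / h + oy  # rescale and shift y coordinates
        merged.append(b)
    return canvas, np.concatenate(merged, axis=0)
```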
The target detection network model is trained by the input image set.
Inputting the image of the object to be detected into the object detection network model, and detecting the image of the object to be detected (the object detection algorithm obtains the detection image of the hidden dangerous goods carried by the human body) comprises the following steps:
As shown in fig. 1, the target detection algorithm obtains the detection image of hidden dangerous articles carried by a human body through a deep learning neural network with multi-scale decomposition convolution and adaptive fusion. The image of the object to be detected I ∈ R^{3×640×640} is input into the target detection network model, and the hidden dangerous article detection image O ∈ R^{3×640×640} is output. The mathematical model can be expressed as follows:
O = O(x, y) = Φ(I, Ψ)
In the above formula, O(x, y) represents the dangerous article detection image, Φ(·) represents the neural network model of the target detection algorithm, Ψ is the parameters of the neural network, (x, y) represents the pixel coordinates of the output detection frame, and I is the image of the object to be detected. If a dangerous article exists in the detected image, the output detection frame is drawn on the detected image, and the class number of the dangerous article and its predicted probability value are marked on the detection frame. Otherwise, if no dangerous article exists in the detected image, the output image is identical to the input image.
The object detection network model includes:
in a feature extraction backbone network, shallow feature extraction is carried out through convolution operation to obtain a feature map with the size reduced by half in sequence, three feature layers with different sizes generated by three back-layer convolution operation in the backbone network are utilized to carry out multi-scale feature fusion at a feature fusion stage, so that the multi-scale feature extraction is facilitated, and the recognition rate of hidden dangerous goods is improved.
A self-adaptive multi-scale large-kernel decomposition convolution module AMDC is designed in the feature extraction process, so that the network enlarges its receptive field while reducing complexity, and the capability of identifying and extracting dangerous-article features in the image is improved. As shown in fig. 2, the designed adaptive multi-scale large-kernel decomposition convolution module is implemented as follows: given a C×H×W = 64×160×160 feature map f, the feature map f is first convolved and split to obtain four C/4×H×W = 16×160×160 feature maps f_1, f_2, f_3 and f_4, and three of them, f_1, f_2 and f_3, are respectively input into three large-kernel decomposition convolution modules. Each large-kernel decomposition convolution module comprises three convolutions, namely a depth-wise convolution, a depth-wise dilated convolution and a point-wise convolution, wherein the depth-wise dilated convolution is realized by setting the dilation rate of a depth-wise convolution. Assuming the dilation rate is d = 3, a conventional convolution with a K×K = 9×9 large kernel can be decomposed into a depth-wise convolution with a (2d-1)×(2d-1) = 5×5 kernel, a depth-wise dilated convolution with a K/d×K/d = 3×3 kernel, and a point-wise convolution with a 1×1 kernel. Finally, in the three large-kernel decomposition convolution modules, three large convolution kernels of different sizes, K_1 = 5, K_2 = 21 and K_3 = 45, and different dilation rates d_1 = 1, d_2 = 3 and d_3 are set to perform the decomposition, thereby realizing multi-scale decomposition. The mathematical model can be expressed as follows:
D_i = P_c(D_dc(D_c(f_i))), i = 1, 2, 3
In the above formula, P_c(·) denotes the point-wise convolution function, D_dc(·) denotes the depth-wise dilated convolution function, D_c(·) denotes the depth-wise convolution function, and D_i denotes the feature map output by the i-th large-kernel decomposition convolution module.
After the multi-scale large-kernel decomposition convolution modules output the three feature maps D_i ∈ R^{16×160×160}, i = 1, 2, 3, the three feature maps are first connected in the channel dimension by a cascading operation to obtain a feature map D ∈ R^{48×160×160}. Next, feature fusion and channel-dimension reduction are performed on the feature map D through a convolution operation with kernel size 3 to obtain a feature map D′ ∈ R^{C″×H×W}, and then Softmax and channel-separation operations are performed to obtain three spatially adaptive weights Q_i, i = 1, 2, 3. The three input feature maps D_i ∈ R^{16×160×160}, i = 1, 2, 3 are respectively weighted and aggregated with the three spatially adaptive weights Q_i, i = 1, 2, 3 to obtain an output feature map D″ ∈ R^{16×160×160}. Finally, the output feature map D″ and the feature map f_4 are subjected to cascading and convolution operations to obtain a feature map f′ ∈ R^{64×160×160}, realizing the self-adaptive fusion of the features. In addition, in the feature extraction backbone part, if the parameter shortcut of the self-adaptive multi-scale large-kernel decomposition convolution module AMDC is set to True, the feature map f is passed through a convolution operation with kernel size 1 and added to the feature map f′ to realize a residual connection. The mathematical model can be expressed as follows:
D = Concat(D_1, D_2, D_3)
[Q_1, Q_2, Q_3] = S_p(S_o(C(D)))
D″ = Q_1·D_1 + Q_2·D_2 + Q_3·D_3
f′ = C(Concat(D″, f_4))
f′ = f′ + C(f), if shortcut = True
In the above formulas, Concat(·) denotes the cascading (concatenation) operation, S_p(·) denotes the channel-separation operation, S_o(·) denotes the Softmax function, and C(·) denotes the convolution operation.
As shown in fig. 1, in order to improve the global representation capability and the detection accuracy for dangerous articles in terahertz images, the attention mechanism BRA shown in fig. 3 is designed in the feature extraction backbone network. By introducing the attention mechanism BRA into the feature extraction backbone network of the target detection network model, the network can exploit global features to improve detection accuracy. Given an input feature map I ∈ R^{20×20×256}, the feature map I is divided into S×S = 4×4 mutually non-overlapping regions, and the feature vectors therein are reshaped into I_r ∈ R^{16×25×256}; at the same time, tensors Q, K, V ∈ R^{16×25×256} are derived. The mean values Q_r, K_r ∈ R^{16×256} of Q and K in each region of the feature map are calculated, the region-to-region correlation between Q_r and K_r yields an adjacency matrix A_r, and the first k connection indexes of each region are retained to obtain an index matrix X_r. Finally, K and V are respectively aggregated with the index matrix X_r to obtain tensors K_g and V_g, and an attention operation is applied to the aggregated key-value pairs to obtain the feature map P. The mathematical model can be expressed as follows:
Q = I_r·W_q, K = I_r·W_k, V = I_r·W_v
A_r = Q_r·(K_r)^T
K_g = g(K, X_r), V_g = g(V, X_r)
P = Attention(Q, K_g, V_g)
In the above formulas, W_q, W_k, W_v ∈ R^{C×C} are the projection weights of the query, key and value respectively, g(·) denotes the gather operation, and Attention(·) denotes the attention operation.
In addition, in order to better distinguish the feature differences between dangerous articles and the background, the parameter-free 3-D local attention SimAM is introduced into the feature fusion layer, which carries deep semantic information, as shown in fig. 1. Introducing SimAM attention into the feature fusion module of the network model enhances the feature representation of the target detection model and improves the accuracy of target detection. The implementation process is as follows: given an input feature map Z ∈ R^{1×128×80×80}, the 3-D attention weight of the feature map is derived by computing the energy function EI(·) and activated with the sigmoid(·) function to obtain an activation weight; the input feature map is then multiplied by the activation weight to obtain the output attention weight Q_at. The mathematical model can be expressed as follows:
Q_at = Z × sigmoid(EI(Z))
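Since the patent does not spell out EI(·), the sketch below uses the standard closed-form energy of SimAM (squared deviation from the channel mean, normalized by the channel variance); the regularization coefficient lam = 1e-4 is an assumed default.

```python
import torch

def simam(z, lam=1e-4):
    """Parameter-free 3-D attention: Q_at = Z * sigmoid(EI(Z)), with EI(.) taken as the
    standard SimAM inverse energy; z has shape (B, C, H, W)."""
    n = z.shape[2] * z.shape[3] - 1
    d = (z - z.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation from the channel mean
    v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5                   # inverse energy of each pixel
    return z * torch.sigmoid(e_inv)
```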
after feature extraction backbone network extraction features and feature fusion are carried out on the detected object image, a feature map with n=3 sizes reduced by half in sequence is obtainedFeature map O' = [ O ] by Detect module 1 ′,O 2 ′,O 3 ′]And (5) predicting and outputting a prediction result graph O. The mathematical model can be expressed as follows:
O=O(x,y)=D(O′)
in the above formula, D (·) represents an objective detection function, and O' is an array including n input feature maps.
In the training process of the deep neural network, the loss function Loss(Θ) is optimized by the SGD optimizer. The process is expressed as follows:
L_b = L_CIoU + L_DFL
L_DFL = -((y_{i+1} - y)log(S_i) + (y - y_i)log(S_{i+1}))
wherein N is the number of detection layers, L_b is the bounding-box regression loss function, L_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss functions, L_CIoU and L_DFL are the bounding-box loss functions, IoU is the intersection-over-union, ρ is the distance between the center points of the predicted frame and the real frame, p and g are the center points of the predicted frame and the real frame respectively, c is the diagonal distance of the minimum circumscribed rectangle of the two frames, v is the parameter measuring the consistency of the aspect ratio, y_{i+1} is the nearest integer to the right of the true value, y_i is the nearest integer to the left of the true value, n is the number of samples, B_i is the target value, and S_i is the model output value.
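A sketch of the bounding-box part of this loss, L_b = L_CIoU + L_DFL: the CIoU term uses torchvision's complete_box_iou_loss (assumes torchvision >= 0.15), and the DFL term is the weighted cross-entropy over the two integer bins y_i and y_{i+1} that bracket the continuous target, matching the L_DFL formula above. The classification loss L_c, the weights α_1 and α_2, and the matching of predictions to ground truth are omitted; names are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss   # assumes torchvision >= 0.15

def dfl_loss(pred_dist, target):
    """L_DFL = -((y_{i+1} - y) log S_i + (y - y_i) log S_{i+1}).
    pred_dist: (n, reg_max + 1) logits over discrete bins; target: (n,) values in [0, reg_max)."""
    yl = target.floor().long()                       # y_i, nearest integer to the left
    yr = yl + 1                                      # y_{i+1}, nearest integer to the right
    wl = yr.float() - target                         # weight of the left bin
    wr = target - yl.float()                         # weight of the right bin
    return (F.cross_entropy(pred_dist, yl, reduction="none") * wl
            + F.cross_entropy(pred_dist, yr, reduction="none") * wr).mean()

def box_loss(pred_boxes, true_boxes, pred_dist, dist_target):
    """L_b = L_CIoU + L_DFL for matched predicted / ground-truth boxes in (x1, y1, x2, y2) form."""
    return complete_box_iou_loss(pred_boxes, true_boxes, reduction="mean") + dfl_loss(pred_dist, dist_target)
```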
After m = 300 rounds of training, the optimized parameters Θ̂ can be obtained.
For an image F of an object to be detected captured by the terahertz imaging equipment, an image group F′ is obtained after processing by the feature extraction backbone network and the feature fusion network; the image group F′ is then input into the feature detection network to obtain the dangerous article detection result, namely O = D(F′, Θ̂), yielding the hidden dangerous article detection result image containing the dangerous article detection frame, the dangerous article class number and the predicted probability value.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (10)

1. A terahertz image dangerous goods detection method based on multi-scale decomposition convolution is characterized by comprising the following steps:
acquiring an image of a target object to be detected by using terahertz imaging equipment;
processing the image of the object to be detected to construct an input image data set;
constructing a target detection network model, inputting the image of the target object to be detected into the target detection network model, generating feature layers with different sizes, carrying out multi-scale feature fusion on features in the feature layers, extracting multi-scale features, and identifying hidden dangerous goods, wherein the target detection network model is obtained by training the input image data set;
outputting a hidden dangerous article detection result image comprising a dangerous article detection frame, a dangerous article class number and a predicted probability value if a dangerous article exists in the image of the object to be detected, and outputting an image consistent with the input if no dangerous article exists in the image of the object to be detected.
2. The terahertz image dangerous article detection method based on multi-scale decomposition convolution according to claim 1, wherein processing the image of the object to be detected comprises the following steps:
converting dangerous goods in the image of the object to be detected into tag data by using a rectangular frame, and obtaining an image containing the tag data;
and performing Mosaic data enhancement, random horizontal flipping and random scaling on the image containing the tag data to complete the construction of the input image data set.
3. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 1, wherein the target detection network model includes: a feature extraction backbone network, a feature fusion network and a feature detection network;
the feature extraction backbone network performs shallow feature extraction on the image of the object to be detected through convolution operation to obtain feature layers with different sizes;
the feature fusion network performs multi-scale feature fusion on the feature layers with different sizes, extracts multi-scale features and acquires a multi-scale feature map;
and the characteristic detection network predicts the multi-scale characteristic map and outputs a prediction result map.
4. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 3, wherein a self-adaptive multi-scale large-kernel decomposition convolution module and an attention mechanism BRA are added into the feature extraction backbone network; the self-adaptive multi-scale large-kernel decomposition convolution module and the parameter-free 3-D local attention SimAM are added into the feature fusion network;
the self-adaptive multi-scale large-kernel decomposition convolution module carries out multi-scale decomposition and self-adaptive fusion on the input feature images.
5. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 4, wherein the self-adaptive multi-scale large-kernel decomposition convolution module comprises a depth convolution, a depth expansion convolution and a point-by-point convolution.
6. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 5, wherein before the self-adaptive multi-scale large-kernel decomposition convolution module performs multi-scale decomposition and self-adaptive fusion on the input feature map, the method comprises: and carrying out convolution and classification operation on the input feature images to obtain a plurality of feature images.
7. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 6, wherein the adaptive multi-scale large-kernel decomposition convolution module performs multi-scale decomposition and adaptive fusion on the input feature map, including:
extracting one of the feature maps as a first feature map;
respectively inputting the remaining feature maps of the plurality of feature maps into different self-adaptive multi-scale large-kernel decomposition convolution modules;
setting different large convolution kernels and different expansion rates in the different self-adaptive multi-scale large-kernel decomposition convolution modules, setting depth expansion convolution, depth convolution and point-by-point convolution in the different self-adaptive multi-scale large-kernel decomposition convolution modules, performing multi-scale decomposition on the remaining feature maps, and outputting a plurality of new feature maps;
connecting the plurality of new feature maps in the channel dimension by a cascading operation to obtain a second feature map;
performing feature fusion and channel-dimension reduction on the second feature map through a convolution operation to obtain a third feature map;
performing Softmax and channel-separation operations on the third feature map to obtain spatially adaptive weights;
carrying out weighted aggregation on the plurality of new feature maps and the spatially adaptive weights to obtain a fourth feature map;
and performing cascading and convolution operations on the fourth feature map and the first feature map to obtain a fifth feature map, thereby realizing self-adaptive fusion.
8. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 4, wherein the target parameter in the adaptive multi-scale large-kernel decomposition convolution module is a preset target value.
9. The terahertz image dangerous article detection method based on multi-scale decomposition convolution of claim 3, wherein a mathematical model for outputting the hidden dangerous article detection result image is:
O = O(x, y) = D(F′, Θ̂)
wherein O is the hidden dangerous article detection result image, Θ̂ denotes the optimized parameters, F′ is the image group obtained after processing by the feature extraction backbone network and the feature fusion network, D(·) denotes the target detection function, Ψ denotes the parameters of the neural network, and (x, y) denotes the pixel coordinates of the output detection frame.
10. The terahertz image dangerous article detection method based on multi-scale decomposition convolution according to claim 1, wherein the training process of the target detection network model through the input image dataset comprises the following steps:
optimizing the Loss function Loss(Θ) by adopting the SGD optimizer:
L_b = L_CIoU + L_DFL
L_DFL = -((y_{i+1} - y)log(S_i) + (y - y_i)log(S_{i+1}))
wherein N is the number of detection layers, L_b is the bounding-box regression loss function, L_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss functions, L_CIoU and L_DFL are the bounding-box loss functions, IoU is the intersection-over-union, ρ is the distance between the center points of the predicted frame and the real frame, p and g are the center points of the predicted frame and the real frame respectively, c is the diagonal distance of the minimum circumscribed rectangle of the two frames, v is the parameter measuring the consistency of the aspect ratio, y_{i+1} is the nearest integer to the right of the true value, y_i is the nearest integer to the left of the true value, n is the number of samples, B_i is the target value, and S_i is the model output value.
CN202311063505.8A 2023-08-23 Terahertz image dangerous article detection method based on multi-scale decomposition convolution Active CN117095158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311063505.8A CN117095158B (en) 2023-08-23 Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311063505.8A CN117095158B (en) 2023-08-23 Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Publications (2)

Publication Number Publication Date
CN117095158A true CN117095158A (en) 2023-11-21
CN117095158B CN117095158B (en) 2024-04-26

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200012940A1 (en) * 2017-03-17 2020-01-09 Portland State University Frame interpolation via adaptive convolution and adaptive separable convolution
CN110992324A (en) * 2019-11-26 2020-04-10 南京邮电大学 Intelligent dangerous goods detection method and system based on X-ray image
WO2022042470A1 (en) * 2020-08-31 2022-03-03 浙江商汤科技开发有限公司 Image decomposition method and related apparatus and device
CN115208847A (en) * 2021-03-25 2022-10-18 国际商业机器公司 Content analysis messaging routing
CN114581330A (en) * 2022-03-14 2022-06-03 广东工业大学 Terahertz image denoising method based on multi-scale mixed attention
US11631238B1 (en) * 2022-04-13 2023-04-18 Iangxi Electric Power Research Institute Of State Grid Method for recognizing distribution network equipment based on raspberry pi multi-scale feature fusion
CN114494891A (en) * 2022-04-15 2022-05-13 中国科学院微电子研究所 Dangerous article identification device and method based on multi-scale parallel detection
CN114862837A (en) * 2022-06-02 2022-08-05 西京学院 Human body security check image detection method and system based on improved YOLOv5s
CN115393796A (en) * 2022-08-26 2022-11-25 重庆邮电大学 X-ray security inspection image dangerous article detection method based on attention mechanism
CN115393719A (en) * 2022-08-29 2022-11-25 哈尔滨理工大学 Hyperspectral image classification method combining space spectral domain self-adaption and ensemble learning
CN115546223A (en) * 2022-12-05 2022-12-30 南京天创电子技术有限公司 Method and system for detecting loss of fastening bolt of equipment under train

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
C: "HSSNet: A End-to-End Network for Detecting Tiny Targets of Apple Leaf Diseases in Complex Backgrounds", 28 July 2023 (2023-07-28), pages 1 - 24 *
HENG WU et al.: "Multi-dimensional attention fusion network for terahertz image super-resolution", SSRN, 13 July 2023 (2023-07-13), pages 1 - 13 *
SONG HUAN et al.: "Research on terahertz image target detection fused with multi-scale attention" (in Chinese), 《小型微型计算机系统》 (Journal of Chinese Computer Systems), 31 March 2022 (2022-03-31), pages 1 - 5 *
HU ZHENG: "Research on ghost imaging algorithms based on deep learning" (in Chinese), 《中国优秀硕士学位论文电子期刊》, 15 February 2023 (2023-02-15)

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Frizzi et al. Convolutional neural network for video fire and smoke detection
US7813581B1 (en) Bayesian methods for noise reduction in image processing
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN109902715B (en) Infrared dim target detection method based on context aggregation network
CN109544563B (en) Passive millimeter wave image human body target segmentation method for security inspection of prohibited objects
Greenspan et al. Learning texture discrimination rules in a multiresolution system
CN110245675B (en) Dangerous object detection method based on millimeter wave image human body context information
Soni et al. Hybrid meta-heuristic algorithm based deep neural network for face recognition
CN112766223B (en) Hyperspectral image target detection method based on sample mining and background reconstruction
Hassan et al. Deep CMST framework for the autonomous recognition of heavily occluded and cluttered baggage items from multivendor security radiographs
Miao et al. Detection of mines and minelike targets using principal component and neural-network methods
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
Jiang et al. Symmetry detection algorithm to classify the tea grades using artificial intelligence
Koziarski et al. Marine snow removal using a fully convolutional 3d neural network combined with an adaptive median filter
Yao et al. Robust photon-efficient imaging using a pixel-wise residual shrinkage network
Raj et al. Object detection in live streaming video using deep learning approach
CN117115675A (en) Cross-time-phase light-weight spatial spectrum feature fusion hyperspectral change detection method, system, equipment and medium
CN117095158B (en) Terahertz image dangerous article detection method based on multi-scale decomposition convolution
Jangblad Object detection in infrared images using deep convolutional neural networks
CN117095158A (en) Terahertz image dangerous article detection method based on multi-scale decomposition convolution
Menaka et al. Classification of multispectral satellite images using sparse SVM classifier
Sara et al. MC-CDPNet: multi-channel correlated detail preserving network for X-Ray-based baggage screening
CN114758231A (en) Remote sensing image occlusion processing method and device based on supervised contrast learning
Patel et al. Depthwise convolution for compact object detector in nighttime images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant