CN115187842A

CN115187842A - Target detection method of passive terahertz security inspection image based on mode conversion

Info

Publication number: CN115187842A
Application number: CN202210823984.8A
Authority: CN
Inventors: 柏业超; 陈玥秀; 唐岚; 张兴敢
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2022-07-13
Filing date: 2022-07-13
Publication date: 2022-10-14

Abstract

The invention discloses a target detection method for passive terahertz security inspection images based on mode conversion, which comprises the following steps: fusing X-ray images of dangerous articles with passive terahertz human body images; performing modal style conversion on the unpaired data with a cycle-consistent generative adversarial network (CycleGAN) to generate passive terahertz security inspection images that are highly similar to real images; expanding the number and types of samples in the data set with the synthesized images; training an improved YOLOv5 target detection network on the data set so that it learns the characteristics of multiple types of dangerous articles; and finally detecting real passive terahertz security inspection images with the trained target detection network.

Description

Target detection method of passive terahertz security inspection image based on mode conversion

Technical Field

The invention relates to the technical field of target detection, in particular to a target detection method of a passive terahertz security inspection image based on mode conversion.

Background

Terahertz (THz) waves are electromagnetic waves with frequencies in the range of 0.1-10 THz (wavelengths of 3000-30 μm). They penetrate many materials well and exhibit substance-specific fingerprint spectra, so they are applied to imaging. The single-photon energy of a terahertz wave is low, only a few milli-electron volts, which is far lower than that of X-rays, and it does not damage material through ionization. Terahertz imaging is therefore safe for human body security inspection and can be applied to security inspection in large, densely crowded venues.
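The frequency, wavelength and photon energy figures quoted above are related by λ = c/f and E = hf. The following is a small sketch that evaluates both relations at 0.1, 1 and 10 THz; the function names are illustrative and not part of the invention.

```python
# Physical constants (exact SI values).
C = 299_792_458.0        # speed of light, m/s
H = 6.62607015e-34       # Planck constant, J*s
EV = 1.602176634e-19     # joules per electron volt


def wavelength_um(freq_hz: float) -> float:
    """Free-space wavelength in micrometres for a frequency in hertz."""
    return C / freq_hz * 1e6


def photon_energy_mev(freq_hz: float) -> float:
    """Single-photon energy in milli-electron volts for a frequency in hertz."""
    return H * freq_hz / EV * 1e3


for f_thz in (0.1, 1.0, 10.0):
    f = f_thz * 1e12
    print(f"{f_thz:>4} THz: {wavelength_um(f):8.1f} um, {photon_energy_mev(f):6.2f} meV")
```

Around 1 THz the photon energy is about 4 meV, which is the "few milli-electron volts" regime referred to in the text.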

Disclosure of Invention

The invention provides a target detection method of a passive terahertz security inspection image based on mode conversion, which can solve the problems of few samples and unbalanced samples in the target detection training process of the passive terahertz security inspection image in the prior art, and improve the detection precision and the generalization.

In order to achieve this purpose, the invention provides the following technical scheme: the target detection method of the passive terahertz security inspection image based on mode conversion comprises the following steps:

S1, performing image fusion on an acquired X-ray image of a dangerous article and a passive terahertz human body image;

S2, constructing a cycle-consistent generative adversarial network (CycleGAN), performing modal style conversion on unpaired data, and generating passive terahertz security inspection images;

S3, selecting and labeling the passive terahertz security inspection images generated from different kinds of articles to build a passive terahertz security inspection image data set;

S4, training an improved YOLOv5 target detection network with the passive terahertz security inspection image data set;

S5, detecting real passive terahertz security inspection images with the trained target detection network.

Preferably, in step S1, the X-ray image of the dangerous article is processed and converted into a grayscale picture; a mask of the processed image is established to extract the effective region containing the article; a random position following a two-dimensional normal distribution is generated according to the article position statistics of the real data set; the corresponding overlay region is extracted from the terahertz human body image; and the two images are fused according to the mask of the X-ray image:

$\mathrm{Img}_{add}[i,j] = \mathrm{Img}_1[i,j] \cdot \mathrm{mask}[i,j] + \mathrm{Img}_2[i,j] \cdot (1 - \mathrm{mask}[i,j])$;

wherein $\mathrm{Img}_{add}$ is the superimposed picture, mask is the mask of the X-ray image, and $\mathrm{Img}_1$ and $\mathrm{Img}_2$ respectively denote the regions to be fused extracted from the X-ray image and the passive terahertz human body image; $\mathrm{Img}_{add}$ then covers the corresponding region of the original terahertz human body picture.
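The mask-based fusion formula can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function names, the toy arrays and the paste position are hypothetical, and images are treated as single-channel arrays.

```python
import numpy as np


def fuse_patch(xray_patch, thz_patch, mask):
    """Img_add = Img1 * mask + Img2 * (1 - mask), applied element-wise."""
    xray_patch = xray_patch.astype(np.float32)
    thz_patch = thz_patch.astype(np.float32)
    mask = mask.astype(np.float32)
    return xray_patch * mask + thz_patch * (1.0 - mask)


def paste_object(thz_image, xray_patch, mask, top, left):
    """Return a copy of thz_image whose region at (top, left) is the fused patch."""
    h, w = xray_patch.shape
    out = thz_image.astype(np.float32).copy()
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = fuse_patch(xray_patch, region, mask)
    return out


# Toy data: a uniform "human body" image and a 2x2 object patch whose
# right column is white background (mask 0) and left column is the article.
thz = np.full((6, 6), 50.0)
obj = np.array([[10.0, 255.0], [20.0, 255.0]])
mask = np.array([[1, 0], [1, 0]])
fused = paste_object(thz, obj, mask, top=2, left=3)
```

Where the mask is 1 the article pixel replaces the background, and where it is 0 the terahertz pixel is kept unchanged.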

Preferably, the X-ray image of the dangerous article is processed as follows: the dangerous article images are annotated as a data set, the X-ray dangerous article image is then rotated to obtain object images at different angles, and the image is then scaled by a certain ratio.

Preferably, contrast enhancement is performed on the grayscale picture through histogram normalization, which computes the enhanced image from the image gray matrix I, the maximum gray level $I_{max}$ in I and the minimum gray level $I_{min}$ in I.
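A minimal sketch of histogram normalization, assuming the common min-max form that linearly stretches the gray range [I_min, I_max] to an output range. The output range of 0 to 255 and the function and parameter names are assumptions, not values stated in the text.

```python
import numpy as np


def histogram_normalize(img, out_min=0.0, out_max=255.0):
    """Linearly map the gray range [I_min, I_max] of img to [out_min, out_max]."""
    img = img.astype(np.float32)
    i_min, i_max = float(img.min()), float(img.max())
    if i_max == i_min:
        # A constant image has no contrast to stretch.
        return np.full_like(img, out_min)
    return (img - i_min) / (i_max - i_min) * (out_max - out_min) + out_min


gray = np.array([[100, 120], [140, 180]], dtype=np.uint8)
enhanced = histogram_normalize(gray)
```

In this toy example the darkest pixel maps to the bottom of the output range and the brightest to the top, which widens the contrast of low-contrast objects such as lighters.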

Preferably, in step S2, the cycle-consistent generative adversarial network comprises two generators and two discriminators, wherein the generated passive terahertz security inspection images are denoted as the X domain and the real terahertz pictures as the Y domain; the generator that converts X-domain pictures into Y-domain pictures is denoted $G_{XY}$, the generator that converts Y-domain pictures into X-domain pictures is denoted $G_{YX}$, the discriminator for X-domain pictures is $D_X$, and the discriminator for Y-domain pictures is $D_Y$; the network establishes the mappings X → Y and Y → X simultaneously.

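Preferably, the loss function of the cycle-consistent generative adversarial network during conversion is:

$L(G_{XY}, G_{YX}, D_X, D_Y) = L_{GAN}(G_{XY}, D_Y, X, Y) + L_{GAN}(G_{YX}, D_X, X, Y) + \lambda_{cyc} L_{cyc}(G_{XY}, G_{YX}) + \lambda_{idt} L_{idt}(G_{XY}, G_{YX})$

wherein $L_{GAN}(G_{XY}, D_Y, X, Y)$ is the adversarial loss function in the X → Y cycle, $L_{GAN}(G_{YX}, D_X, X, Y)$ is the adversarial loss function in the Y → X cycle, $\lambda_{cyc}$ and $\lambda_{idt}$ are proportionality coefficients, $L_{cyc}(G_{XY}, G_{YX})$ is the cycle-consistency loss function, and $L_{idt}(G_{XY}, G_{YX})$ is the identity loss;

the optimization target is:

$G_{XY}^{*}, G_{YX}^{*} = \arg\min_{G_{XY}, G_{YX}} \max_{D_X, D_Y} L(G_{XY}, G_{YX}, D_X, D_Y)$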
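The individual loss terms named in this passage are not written out here. As a sketch, assuming the standard CycleGAN formulation with a least-squares adversarial loss and L1 cycle-consistency and identity terms (the choices named in the detailed description), they could take the following form; the expectation notation and the sample symbols x and y are assumptions:

```latex
\begin{aligned}
L_{GAN}(G_{XY}, D_Y, X, Y) &= \mathbb{E}_{y \sim Y}\big[(D_Y(y) - 1)^2\big]
  + \mathbb{E}_{x \sim X}\big[D_Y(G_{XY}(x))^2\big] \\
L_{cyc}(G_{XY}, G_{YX}) &= \mathbb{E}_{x \sim X}\big[\lVert G_{YX}(G_{XY}(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim Y}\big[\lVert G_{XY}(G_{YX}(y)) - y \rVert_1\big] \\
L_{idt}(G_{XY}, G_{YX}) &= \mathbb{E}_{y \sim Y}\big[\lVert G_{XY}(y) - y \rVert_1\big]
  + \mathbb{E}_{x \sim X}\big[\lVert G_{YX}(x) - x \rVert_1\big]
\end{aligned}
```

The Y → X adversarial term follows by swapping the roles of the two domains. With a least-squares loss the discriminator is usually trained to minimize the expression above, while the generator minimizes $\mathbb{E}_{x \sim X}[(D_Y(G_{XY}(x)) - 1)^2]$.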

Preferably, the generator comprises an encoder, a converter and a decoder, wherein the encoder extracts feature vectors from the input image through convolution, normalization and activation operations, the converter converts the feature vectors of the source domain into feature vectors of the target domain, and the decoder restores low-level features from the feature vectors to generate an image; the discriminator extracts features from an image and judges whether they are close to the image features of a given domain.

Preferably, in step S3, the number and types of data sets are continuously expanded by the passive terahertz security inspection image generated in step S5.

Preferably, in step S4, a YOLOv5 target detection network is built and improved, wherein a convolutional block attention module (CBAM), which fuses channel attention and spatial attention, is added at the end of the backbone network of the YOLOv5 target detection network so that the network attends to key positions in the output feature map; and the feature extraction module adopts a bidirectional feature fusion method, namely the weighted bidirectional feature pyramid network (BiFPN), which realizes bidirectional fusion of features through bidirectional cross-scale connections and weighted feature fusion.
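To illustrate how an attention module of the CBAM type combines channel attention and spatial attention, the following NumPy sketch applies both to a single feature map. It is a simplified illustration, not the network of the invention: the random weights stand in for learned parameters, and the reduction ratio r, the 7 × 7 kernel size and all function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W). A shared two-layer MLP (w1, w2) is applied to the
    average-pooled and max-pooled channel descriptors; returns weights (C,)."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """Convolve the stacked channel-wise mean and max maps with a (2, k, k)
    kernel (zero padding); returns weights (H, W)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention on the refined map."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]


rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feat = rng.standard_normal((C, H, W))
out = cbam(feat,
           rng.standard_normal((C // r, C)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1,
           rng.standard_normal((2, 7, 7)) * 0.1)
```

Both attention maps lie in (0, 1), so the module rescales the feature map without changing its shape, emphasizing informative channels and positions.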

Compared with the prior art, the invention has the following beneficial effects. The acquired X-ray images of dangerous articles and the passive terahertz human body images are superimposed and fused by a mask operation, and a cycle-consistent generative adversarial network performs the mode conversion to generate synthetic terahertz security inspection images that conform to terahertz imaging characteristics and are close to real images. The network trains two generators simultaneously and establishes the X → Y and Y → X mappings at the same time, so it can be trained on a group of pictures with different contents and structures. This removes the limitation of existing methods, which struggle to constrain the generated image to remain consistent with the input image in content and structure and therefore require the source picture and the target picture to be structurally aligned, that is, require a paired data set. Based on the passive terahertz security inspection images generated from different types of articles, terahertz security inspection image data sets with rich categories and high definition are built, which effectively improves the detection precision and widens the range of detectable articles. The problems of few samples and unbalanced samples in the target detection training of passive terahertz security inspection images are thereby solved, the detection precision and generalization are improved, and the requirement of detecting multiple types of dangerous articles in actual security inspection scenes is met.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

In the drawings:

FIG. 1 is a flow chart of a method of object detection of the present invention;

FIG. 2 shows the statistics of the positions and sizes of the anchor boxes of dangerous articles in terahertz images according to the present invention;

FIG. 3 is a structural diagram of the cycle-consistent generative adversarial network of the present invention;

FIG. 4 shows a synthesized terahertz picture of the present invention and a real picture of the same class;

FIG. 5 is a structural diagram of the convolutional block attention module of the present invention;

FIG. 6 is a schematic diagram of the weighted bidirectional feature pyramid network of the present invention;

FIG. 7 is a schematic diagram of the target detection effect of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.

Example: as shown in FIG. 1, the target detection method of the passive terahertz security inspection image based on mode conversion includes the following steps:

S1, collecting X-ray security inspection images and classifying the pictures by article type; in a specific embodiment, X-ray images of knives, guns, scissors and lighters are collected, each with a white background and a size of 256 × 256;

according to whether dangerous articles are carried, the pictures are divided into two classes, dangerous and safe; the pictures carrying dangerous articles are annotated as a data set with the LabelImg software in PASCAL VOC format, containing the label and position information of the target articles, which are divided into pistol, large-sized pistol, stick-shaped tool and mobile phone; as shown in FIG. 2, the relative size and relative position of each anchor box with respect to the whole picture are counted;
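The anchor box statistics can be gathered directly from the PASCAL VOC annotation files. The sketch below parses one annotation with the Python standard library and returns each box's center and size relative to the picture. The sample XML, the function name, and the choice to measure the vertical coordinate from the bottom edge (matching the lower-left origin used later for the position distribution) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical annotation in PASCAL VOC layout.
VOC_EXAMPLE = """<annotation>
  <size><width>200</width><height>400</height><depth>1</depth></size>
  <object>
    <name>pistol</name>
    <bndbox><xmin>90</xmin><ymin>140</ymin><xmax>130</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""


def relative_boxes(xml_text):
    """Return (label, cx, cy, w, h) per object, all relative to the picture size.
    cy is measured from the bottom edge of the picture."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        cx = (xmin + xmax) / 2 / img_w
        cy = 1.0 - (ymin + ymax) / 2 / img_h   # flip: VOC rows count from the top
        rows.append((obj.findtext("name"), cx, cy,
                     (xmax - xmin) / img_w, (ymax - ymin) / img_h))
    return rows


stats = relative_boxes(VOC_EXAMPLE)
```

Collecting these tuples over the whole data set gives the empirical distributions of relative position and size from which the scaling ratio and the paste position can be chosen.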

rotating the X-ray dangerous article picture to obtain object pictures at different angles, and filling the missing background with white; scaling the picture by a ratio chosen with reference to the counted relative size of the anchor boxes and the actual size of the object, generally a random number in [0.3, 0.42]; converting the processed picture from an RGB picture into a grayscale picture; in consideration of the difference between X-ray imaging and terahertz imaging, performing contrast enhancement on pictures of objects such as lighters by histogram normalization, which computes the enhanced image from the image gray matrix I, the maximum gray level $I_{max}$ in I and the minimum gray level $I_{min}$ in I;

establishing a mask of the processed image and extracting the effective region containing the article, for example by setting the effective region to 1 and the ineffective region to 0; in a specific embodiment, regions with a gray value greater than 210 are set to 0; selecting a target coverage region in the terahertz human body image with reference to the statistics of the relative positions of the anchor box centers, wherein the anchor box center position is treated as following a two-dimensional normal distribution in a coordinate system whose origin is the lower left corner of the terahertz image;

in this example, $\mu_1 = 0.58$, $\mu_2 = 0.42$ and $\sigma_1 = \sigma_2$; a random point following this distribution is generated as the anchor box center, and the point is redrawn whenever the corresponding coverage region exceeds the picture range; the corresponding region of the background human body image is extracted, and the two images are fused according to the mask of the X-ray image: $\mathrm{Img}_{add}[i,j] = \mathrm{Img}_1[i,j] \cdot \mathrm{mask}[i,j] + \mathrm{Img}_2[i,j] \cdot (1 - \mathrm{mask}[i,j])$; wherein $\mathrm{Img}_{add}$ is the superimposed picture, mask is the mask of the X-ray image, and $\mathrm{Img}_1$ and $\mathrm{Img}_2$ respectively denote the regions to be fused extracted from the X-ray image and the passive terahertz human body image; $\mathrm{Img}_{add}$ then covers the corresponding region of the original terahertz human body picture;
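The mask construction and the redraw-until-valid position sampling can be sketched as follows. The gray threshold of 210 and the means 0.58 and 0.42 come from the text; the standard deviation value of 0.1, the independence of the two axes, and all function and parameter names are assumptions made only for this illustration.

```python
import numpy as np


def build_mask(gray, background_threshold=210):
    """1 where the article is, 0 where the gray value exceeds the threshold
    (the white background)."""
    return (gray <= background_threshold).astype(np.uint8)


def sample_top_left(img_h, img_w, patch_h, patch_w, rng,
                    mu=(0.58, 0.42), sigma=0.1, max_tries=1000):
    """Draw a relative anchor-box center from a normal distribution per axis
    (origin at the lower-left corner) and redraw until the patch lies fully
    inside the picture. sigma = 0.1 is an assumed value."""
    for _ in range(max_tries):
        cx = rng.normal(mu[0], sigma) * img_w
        cy_from_bottom = rng.normal(mu[1], sigma) * img_h
        cy = img_h - cy_from_bottom            # convert to a top-origin row index
        left = int(round(cx - patch_w / 2))
        top = int(round(cy - patch_h / 2))
        if 0 <= left and left + patch_w <= img_w and 0 <= top and top + patch_h <= img_h:
            return top, left
    raise RuntimeError("no valid position found")


rng = np.random.default_rng(0)
gray_obj = np.array([[255, 40], [100, 230]], dtype=np.uint8)
mask = build_mask(gray_obj)
top, left = sample_top_left(400, 200, 60, 40, rng)
```

The returned top-left corner can then be used to cut the matching region out of the terahertz image before applying the fusion formula.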

S2, constructing a cycle-consistent generative adversarial network and performing modal style conversion on unpaired data to generate passive terahertz security inspection images;

referring to FIG. 3, the cycle-consistent generative adversarial network includes two generators and two discriminators. An ordinary GAN only discriminates the picture produced by the generator from the target picture, so it is difficult to constrain the generated image to remain consistent with the input image in content and structure; the source picture and the target picture must therefore be structurally aligned, that is, a paired data set is required. The cycle-consistent generative adversarial network removes this limitation. The generator includes an encoder, a converter and a decoder. The encoder extracts feature vectors from the input image; the encoder used in this embodiment consists of three convolutional layers that perform convolution, normalization and activation operations, and its output is 256 × 64. The converter converts the feature vectors of the source domain into feature vectors of the target domain; this embodiment adopts 9 residual blocks, which mitigate vanishing gradients and allow the effective network depth to adapt, and the output of the residual blocks is 256 × 64. The decoder restores low-level features from the feature vectors; it consists of two deconvolution layers and one convolutional layer, and its output is 3 × 256. The activation function of the last convolutional layer in the generator is Tanh, and the remaining layers use ReLU. The discriminator extracts features from an image and judges whether they are close to the image features of a given domain; in this embodiment it is a five-layer convolutional network performing convolution, normalization and activation operations, with LeakyReLU as the activation function.
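The layer counts above can be checked with a small shape trace. The sketch below is a pure-Python calculation under assumed hyperparameters borrowed from the widely used ResNet-style CycleGAN generator (a 7 × 7 stride-1 convolution, two 3 × 3 stride-2 convolutions, 3 × 3 residual convolutions, and stride-2 transposed convolutions); none of these kernel sizes, strides or channel counts are stated in the text.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad, output_pad):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel + output_pad


def trace_generator(size=256):
    """Return (stage, channels, spatial size) after each stage of the generator."""
    shapes = []
    ch = 3
    # Encoder: three convolutions (assumed 7x7 stride 1, then two 3x3 stride 2).
    for out_ch, k, s, p in ((64, 7, 1, 3), (128, 3, 2, 1), (256, 3, 2, 1)):
        size, ch = conv_out(size, k, s, p), out_ch
        shapes.append(("enc", ch, size))
    # Converter: 9 residual blocks, each with two shape-preserving 3x3 convolutions.
    for _ in range(9):
        size = conv_out(conv_out(size, 3, 1, 1), 3, 1, 1)
    shapes.append(("res", ch, size))
    # Decoder: two transposed convolutions followed by one convolution.
    for out_ch in (128, 64):
        size, ch = deconv_out(size, 3, 2, 1, 1), out_ch
        shapes.append(("dec", ch, size))
    size, ch = conv_out(size, 7, 1, 3), 3
    shapes.append(("out", ch, size))
    return shapes


shapes = trace_generator(256)
```

Under these assumptions a 256 × 256 input leaves the encoder and the residual blocks with 256 channels at a spatial size of 64, and the decoder returns 3 channels at a spatial size of 256, which is one possible reading of the "256 × 64" and "3 × 256" figures in the text.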

In a specific embodiment, 1200 synthesized terahertz security inspection images are generated as in step S1 and taken as the source domain image set, denoted X; 1200 passive terahertz images containing dangerous articles are selected as the target domain image set, denoted Y; the generator that converts X-domain pictures into Y-domain pictures is denoted $G_{XY}$, the generator that converts Y-domain pictures into X-domain pictures is denoted $G_{YX}$, the discriminator for X-domain pictures is $D_X$, and the discriminator for Y-domain pictures is $D_Y$; the network establishes the mappings X → Y and Y → X simultaneously;

in the forward cycle, an X-domain picture x is input and passed through $G_{XY}$ to generate $G_{XY}(x)$; $G_{XY}(x)$ is input to the discriminator $D_Y$ for discrimination and the adversarial loss is calculated; $G_{XY}(x)$ is also input to the generator $G_{YX}$ to obtain $G_{YX}(G_{XY}(x))$; x and $G_{YX}(G_{XY}(x))$ are aligned in content distribution and are used to calculate the cycle-consistency loss, which constrains the content of the output picture to be the same as that of the input picture; in the reverse cycle, a Y-domain picture y is input and passed through the generator $G_{YX}$ to generate $G_{YX}(y)$; $G_{YX}(y)$ is input to the discriminator $D_X$ for discrimination and the adversarial loss is calculated; $G_{YX}(y)$ is input to the generator $G_{XY}$ to generate $G_{XY}(G_{YX}(y))$; y and $G_{XY}(G_{YX}(y))$ are aligned and their cycle-consistency loss is calculated; the training effect is shown in FIG. 4;
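The round trip described above can be illustrated with toy generators. In this sketch the two "generators" are simple affine functions standing in for trained networks (an assumption made purely for illustration): when the Y → X generator inverts the X → Y generator, the reconstruction matches the input and the cycle difference vanishes, whereas a mismatched generator leaves a residual.

```python
import numpy as np


def g_xy(x):
    """Toy stand-in for the X -> Y generator."""
    return 0.5 * x + 0.2


def g_yx(y):
    """Toy stand-in for the Y -> X generator (exact inverse of g_xy)."""
    return (y - 0.2) / 0.5


def g_yx_bad(y):
    """A mismatched Y -> X generator, for comparison."""
    return y


def cycle_l1(x, forward, backward):
    """Mean absolute difference between x and backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(4, 8, 8))
good = cycle_l1(x, g_xy, g_yx)       # near zero: content is preserved
bad = cycle_l1(x, g_xy, g_yx_bad)    # clearly positive: content is not recovered
```

The same measurement with the roles of the generators swapped gives the reverse-cycle term for Y-domain pictures.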

wherein the loss function is composed of the adversarial loss $L_{GAN}$, the cycle-consistency loss $L_{cyc}$ and the identity loss $L_{idt}$; the adversarial loss describes the quality of the discrimination result in one-way propagation, and the least-squares loss is used for it;

the cycle-consistency loss $L_{cyc}$ compares the input image with the image obtained after passing through the two generators and describes the consistency between the content of the generated image and that of the original image; it is the key point of the cycle-consistent generative adversarial network and ensures during training that the source domain picture and the synthesized picture are aligned in structure and similar in content; in this embodiment the L1 loss is used, which helps recover the low-frequency part of the image;

the identity loss $L_{idt}$ is added to describe the continuity of the image and to bring the generated image close to the input image;

the overall loss function is expressed as:

$L(G_{XY}, G_{YX}, D_X, D_Y) = L_{GAN}(G_{XY}, D_Y, X, Y) + L_{GAN}(G_{YX}, D_X, X, Y) + \lambda_{cyc} L_{cyc}(G_{XY}, G_{YX}) + \lambda_{idt} L_{idt}(G_{XY}, G_{YX})$

wherein $L_{GAN}(G_{XY}, D_Y, X, Y)$ is the adversarial loss function in the forward cycle, $L_{GAN}(G_{YX}, D_X, X, Y)$ is the adversarial loss function in the reverse cycle, and $\lambda_{cyc}$ and $\lambda_{idt}$ are proportionality coefficients.

The optimization target is:

$G_{XY}^{*}, G_{YX}^{*} = \arg\min_{G_{XY}, G_{YX}} \max_{D_X, D_Y} L(G_{XY}, G_{YX}, D_X, D_Y)$
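As a sketch of how the loss terms named in this passage could be evaluated, the NumPy code below combines a least-squares adversarial term with L1 cycle-consistency and identity terms. It assumes the standard CycleGAN forms of these losses; the weights lam_cyc = 10 and lam_idt = 5, the toy generators and discriminators, and all function names are assumptions and are not taken from the text.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(a - b)))


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss for a discriminator: real -> 1, fake -> 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake):
    """Least-squares loss for a generator: make fakes score 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


def generator_objective(x, y, g_xy, g_yx, d_x, d_y, lam_cyc=10.0, lam_idt=5.0):
    """Adversarial + weighted cycle-consistency + weighted identity loss."""
    fake_y, fake_x = g_xy(x), g_yx(y)
    adv = lsgan_generator_loss(d_y(fake_y)) + lsgan_generator_loss(d_x(fake_x))
    cyc = l1(g_yx(fake_y), x) + l1(g_xy(fake_x), y)
    idt = l1(g_xy(y), y) + l1(g_yx(x), x)
    return adv + lam_cyc * cyc + lam_idt * idt


# Toy components: identity generators and a discriminator that always says "real".
identity = lambda img: img
always_real = lambda img: np.ones(img.shape[0])
x = np.zeros((2, 4, 4))
y = np.ones((2, 4, 4))
loss = generator_objective(x, y, identity, identity, always_real, always_real)
```

With these toy components every term is zero, which gives a quick sanity check of the wiring; real networks would replace the lambdas.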

According to the loss calculation result, the parameters are optimized by gradient descent; in a specific embodiment the learning rate is initially 0.0002 and, after half of the training epochs, decreases linearly to 0; the generators and the discriminators all use the Adam optimizer, which has good convergence performance, with the two generators optimized jointly and the discriminators optimized separately, and with momentum parameters $\beta_1 = 0.5$, $\beta_2 = 0.999$.
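The learning-rate schedule described above (constant for the first half of training, then a linear decay to zero) can be sketched as a plain function. The total of 200 epochs used in the example and the function name are assumptions; only the initial value 0.0002 comes from the text.

```python
def learning_rate(epoch, total_epochs, base_lr=0.0002):
    """Constant base_lr for the first half of training, then linear decay to 0."""
    half = total_epochs // 2
    if epoch < half:
        return base_lr
    return base_lr * (total_epochs - epoch) / (total_epochs - half)


# Example with a hypothetical 200-epoch run.
schedule = [learning_rate(e, 200) for e in (0, 99, 100, 150, 200)]
```

In a deep-learning framework the same function, divided by base_lr, could serve as the multiplier passed to a lambda-style scheduler.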

S3, selecting and labeling the passive terahertz security inspection images generated from different kinds of articles to build a passive terahertz security inspection image data set;

in a specific embodiment, 1000 synthetic terahertz security inspection images are generated and divided into four classes (handguns, knives, sharp scissors and lighters); they are labeled with LabelImg, with the label and position information of the dangerous article in each picture, and built into a passive terahertz security inspection image data set.

S4, building and improving a YOLOv5 target detection network; YOLOv5 consists of an input end, a backbone, a neck and a prediction head; the input end applies data augmentation such as Mosaic, Cutout and Copy-Paste, and performs adaptive anchor box calculation and adaptive picture scaling on the input data; the backbone uses CSPDarknet and consists of three CSP modules and one SPP module; as shown in FIG. 5, a convolutional block attention module (CBAM) that fuses channel attention and spatial attention is added at the end of the backbone to attend to key positions in the output feature map; the standard neck adopts a feature pyramid FPN + PAN structure, in which the FPN works top-down, passing and fusing high-level feature information by up-sampling, and the PAN adds a bottom-up pyramid behind the FPN to fuse the features a second time and pass the localization features of the lower layers upward; this embodiment instead adopts a more efficient bidirectional feature fusion method, the weighted bidirectional feature pyramid network (BiFPN), which realizes bidirectional fusion of features through bidirectional cross-scale connections and weighted feature fusion; the BiFPN structure is shown in FIG. 7, and compared with the FPN + PAN structure it reduces the number of parameters and the computational cost;
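The weighted feature fusion used by BiFPN can be illustrated with the "fast normalized fusion" rule, in which each input feature map receives a non-negative learnable weight and the weighted sum is divided by the sum of the weights. The NumPy sketch below shows one top-down fusion node; the nearest-neighbor up-sampling, the example weights and the function names are assumptions, and the convolution that normally follows the fusion is omitted.

```python
import numpy as np


def weighted_fusion(features, weights, eps=1e-4):
    """Fast normalized fusion: sum_i(w_i * F_i) / (eps + sum_j w_j),
    with each weight clipped to be non-negative."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    stacked = np.stack(features)
    return np.tensordot(w, stacked, axes=1) / (eps + w.sum())


def upsample2x(feat):
    """Nearest-neighbor 2x up-sampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=-2).repeat(2, axis=-1)


# One top-down node: fuse a level-4 map with the up-sampled level-5 map.
p4 = np.ones((4, 8, 8))
p5 = np.full((4, 4, 4), 3.0)
p4_td = weighted_fusion([p4, upsample2x(p5)], weights=[1.0, 1.0])
```

Because the weights are normalized by their sum rather than by a softmax, the fusion stays cheap while still letting the network learn how much each scale contributes.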

S5, detecting real passive terahertz security inspection images with the trained target detection network; a test example is shown in FIG. 6, and high detection precision is obtained.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The target detection method of the passive terahertz security inspection image based on mode conversion is characterized by comprising the following steps of:

2. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 1, characterized in that: in step S1, the X-ray image of the dangerous article is processed and converted into a grayscale picture; a mask of the processed image is established to extract the effective region containing the article; a random position following a two-dimensional normal distribution is generated according to the article position statistics of the real data set; the corresponding overlay region is extracted from the terahertz human body image; and the two images are fused according to the mask of the X-ray image:

$\mathrm{Img}_{add}[i,j] = \mathrm{Img}_1[i,j] \cdot \mathrm{mask}[i,j] + \mathrm{Img}_2[i,j] \cdot (1 - \mathrm{mask}[i,j])$;

wherein $\mathrm{Img}_{add}$ is the superimposed picture, mask is the mask of the X-ray image, and $\mathrm{Img}_1$ and $\mathrm{Img}_2$ respectively denote the regions to be fused extracted from the X-ray image and the passive terahertz human body image; $\mathrm{Img}_{add}$ then covers the corresponding region of the original terahertz human body picture.

3. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 2, characterized in that: the X-ray image of the dangerous article is processed as follows: the dangerous article images are annotated as a data set, the X-ray dangerous article image is then rotated to obtain object images at different angles, and the image is then scaled by a certain ratio.

4. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 2 or 3, wherein: contrast enhancement is performed on the grayscale picture through histogram normalization.

5. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 1, characterized in that: in step S2, the cycle-consistent generative adversarial network includes two generators and two discriminators, wherein the generated passive terahertz security inspection images are denoted as the X domain and the real terahertz pictures as the Y domain; the generator that converts X-domain pictures into Y-domain pictures is denoted $G_{XY}$, the generator that converts Y-domain pictures into X-domain pictures is denoted $G_{YX}$, the discriminator for X-domain pictures is $D_X$, and the discriminator for Y-domain pictures is $D_Y$; the network establishes the mappings X → Y and Y → X simultaneously.

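6. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 5, characterized in that: the loss function of the cycle-consistent generative adversarial network during conversion is:

$L(G_{XY}, G_{YX}, D_X, D_Y) = L_{GAN}(G_{XY}, D_Y, X, Y) + L_{GAN}(G_{YX}, D_X, X, Y) + \lambda_{cyc} L_{cyc}(G_{XY}, G_{YX}) + \lambda_{idt} L_{idt}(G_{XY}, G_{YX})$

wherein $L_{GAN}(G_{XY}, D_Y, X, Y)$ is the adversarial loss function in the X → Y cycle, $L_{GAN}(G_{YX}, D_X, X, Y)$ is the adversarial loss function in the Y → X cycle, $\lambda_{cyc}$ and $\lambda_{idt}$ are proportionality coefficients, $L_{cyc}(G_{XY}, G_{YX})$ is the cycle-consistency loss function, and $L_{idt}(G_{XY}, G_{YX})$ is the identity loss;

the optimization target is:

$G_{XY}^{*}, G_{YX}^{*} = \arg\min_{G_{XY}, G_{YX}} \max_{D_X, D_Y} L(G_{XY}, G_{YX}, D_X, D_Y)$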

7. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 5, characterized in that: the generator comprises an encoder, a converter and a decoder, wherein the encoder extracts feature vectors from the input image through convolution, normalization and activation operations, the converter converts the feature vectors of the source domain into feature vectors of the target domain, and the decoder restores low-level features from the feature vectors to generate an image; the discriminator extracts features from an image and judges whether they are close to the image features of a given domain.

8. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 1, characterized in that: in step S3, the number and types of data sets are continuously expanded by the passive terahertz security inspection image generated in step S5.

9. The target detection method of the passive terahertz security inspection image based on mode conversion according to claim 1, characterized in that: in step S4, a YOLOv5 target detection network is built and improved, wherein a convolutional block attention module (CBAM), which fuses channel attention and spatial attention, is added at the end of the backbone network of the YOLOv5 target detection network so that the network attends to key positions in the output feature map; and the feature extraction module adopts a bidirectional feature fusion method, namely the weighted bidirectional feature pyramid network (BiFPN), which realizes bidirectional fusion of features through bidirectional cross-scale connections and weighted feature fusion.