CN110853019B - Method for detecting and identifying controlled cutter through security check - Google Patents

Method for detecting and identifying controlled cutter through security check

Info

Publication number
CN110853019B
CN110853019B (application CN201911104406.3A)
Authority
CN
China
Prior art keywords
layer
prediction
frame
characteristic diagram
follows
Prior art date
Legal status
Active
Application number
CN201911104406.3A
Other languages
Chinese (zh)
Other versions
CN110853019A (en
Inventor
张莉
郭瑞鸿
孙欢
杨莹
曹洋
孟俊熙
谭海燕
韩仪洒
Current Assignee
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority claimed from CN201911104406.3A
Publication of CN110853019A
Application granted
Publication of CN110853019B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10116 X-ray image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30136 Metal


Abstract

The invention discloses a method for detecting and identifying controlled cutters in security inspection, implemented in the following steps: step 1, normalize an image A to obtain an image B; step 2, process the image B with an SSD-ResNet101 model, a feature fusion method, pooling and the Relu function to obtain a feature map C; step 3, compute the default boxes of the feature map C using a scale calculation formula, then match the default boxes with the target boxes through a Jaccard Overlap matching strategy and model training to obtain prediction boxes; step 4, screen the final set of prediction boxes with a non-maximum suppression algorithm to obtain the detection result. The invention achieves high detection and identification precision for controlled cutters in X-ray images and strong real-time performance in security inspection.

Description

Method for detecting and identifying controlled cutter through security check
Technical Field
The invention belongs to the technical fields of image detection and identification and X-ray image dangerous goods identification, and particularly relates to a method for detecting and identifying controlled cutters in security inspection.
Background
Accurate automatic detection of controlled articles is a main research direction in the security inspection field. Existing deep-learning-based target detection algorithms suffer from false detection and missed detection of some small targets when detecting controlled cutters in security inspection images, and the overall detection effect is poor. In 2016, Akcay S. first applied a deep convolutional network to X-ray dangerous goods detection. In recent years, CNN-based target detection algorithms have mainly fallen into two types: single-stage detection and two-stage detection.
Two-stage detection mainly includes R-CNN, Fast R-CNN, R-FCN, FPN, etc. Compared with traditional methods, these detection algorithms improve detection precision, but their computation cost is large and their detection speed slower. Single-stage detection algorithms mainly include YOLO and SSD. YOLO improves detection speed, but its detection precision is low. Compared with other algorithms, the SSD algorithm performs well, balancing detection precision and speed, with strong robustness and mature application in small target detection, but it still has shortcomings. The DSSD algorithm proposed by Fu Cheng-Yang et al. in 2017 improves SSD and raises its detection precision, but its detection speed is slow.
Disclosure of Invention
The invention aims to provide a method for detecting and identifying controlled cutters in security inspection, with high detection and identification precision for controlled cutters in X-ray images and strong real-time performance.
The technical scheme adopted by the invention is that the method for detecting and identifying the controlled cutter in the security check is implemented according to the following steps:
step 1, carrying out normalization processing on an image A to obtain an image B;
step 2, processing the image B by an SSD-ResNet101 model, a feature fusion method, pooling and a Relu function to obtain a feature map C;
step 3, calculating default boxes of the feature map C using a scale calculation formula, and matching the default boxes with the target boxes through a Jaccard Overlap matching strategy and model training to obtain prediction boxes;
and 4, screening the final prediction box set by adopting a non-maximum suppression algorithm to obtain a detection result.
The invention is also characterized in that:
the specific process of step 1 is as follows:
the image A is normalized using the following formula:

y_i = γ · (x_i − μ) / √(σ² + ε) + β (1)

where y_i is the normalized result; x_i is the input image matrix; μ is the mean of x_i; σ² is the variance of x_i; ε is a small positive number, typically 0.0001; γ is a scale parameter; β is an offset parameter.
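The normalization step above follows the batch-normalization form. A minimal NumPy sketch, where the gamma, beta and epsilon defaults are illustrative assumptions:

```python
import numpy as np

def normalize(x, gamma=1.0, beta=0.0, eps=1e-4):
    """Normalize an input image matrix: subtract the mean, divide by the
    standard deviation (eps keeps the division stable), then apply the
    scale (gamma) and offset (beta) parameters."""
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

image_a = np.array([[0.0, 2.0], [4.0, 6.0]])
image_b = normalize(image_a)  # zero mean, approximately unit variance
```

With gamma = 1 and beta = 0 this reduces the input to zero mean and near-unit variance, which is the usual preprocessing before feeding the image into the network.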
The specific process of the step 2 is as follows:
inputting the image B into the SSD-ResNet101 network; the C1 layer performs a convolution on the image B with 512 convolution kernels of size 3×3 and outputs the C1 layer feature map as the input of the C2 layer; the C2 layer performs a convolution with a 1×1×1024 convolution kernel and outputs the C2 layer feature map as the input of the C3 layer; the C3 layer performs a convolution with 512 3×3 convolution kernels and outputs the C3 layer feature map; the C2 layer feature map is introduced into the C1 layer feature map and enhanced with the Relu function to obtain the C4 layer feature map; the C3 layer feature map is introduced into the C2 layer feature map and enhanced with the Relu function to obtain the C5 layer feature map; the C4 and C5 layer feature maps are fused, enhanced with the Relu function, and the C6 layer feature map is output as the input of the C7 layer;
the C7 layer performs a convolution with 512 convolution kernels of size 3×3 and outputs the C7 layer feature map as the input of the C8 layer; the C8 layer performs a convolution with 256 3×3 convolution kernels and outputs the C8 layer feature map as the input of the C9 layer; the C9 layer performs a convolution with 256 3×3 convolution kernels and outputs the C9 layer feature map as the input of the C10 layer; the C10 layer performs a convolution with 256 3×3 convolution kernels and outputs the C10 layer feature map as the input of the C11 layer; the C11 layer performs a convolution with 256 3×3 convolution kernels and outputs the C11 layer feature map; a pooling operation is then performed on the C11 layer, and the C12 layer feature map, namely the feature map C, is output.
The Relu function is as follows:
f(x)=max(0,x) (2)
where x is the input vector from the previous network layer, and f(x) is the output vector.
The specific process of step 3 is as follows:
the feature maps of layers C6 to C12 are selected from the feature map C, and the default boxes of these feature maps are calculated with the scale calculation formula; the default boxes are then matched with the target boxes using the Jaccard Overlap matching strategy to obtain prediction boxes.
The scale calculation formula is as follows:

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m] (3)

where s_min is the scale of the default box in the lowest prediction layer, set to s_min = 0.04; s_max is the scale of the default box in the topmost C12 layer, set to s_max = 0.49; m = 5.

Different aspect ratios are set for each default box, calculated as follows:

aspect ratio: a_r ∈ {1, 2, 3, 1/2, 1/3}

width: w_k^a = s_k · √(a_r)

height: h_k^a = s_k / √(a_r)

When the aspect ratio is 1, an additional default box is added whose scale is:

s'_k = √(s_k · s_(k+1)).
when the Jaccard Overlap strategy is used to match the target box with the default box, the default box is considered matched with the target box when the Jaccard Overlap exceeds the threshold of 0.5; the Jaccard Overlap expression is as follows:

J(A, B) = |A ∩ B| / |A ∪ B|

where A denotes the set of pixels within the default box and B denotes the set of pixels within the target box.
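The Jaccard Overlap reduces to a plain intersection-over-union computation on box areas; representing a box by its (x1, y1, x2, y2) corner coordinates is an assumption for illustration:

```python
def jaccard_overlap(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A default box is matched when its overlap with a target box exceeds 0.5
matched = jaccard_overlap((0, 0, 2, 2), (0, 1, 2, 3)) > 0.5
```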
Model training is used to reduce the total prediction error; the total prediction error is the weighted sum of the classification and detection errors, expressed as follows:

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))

where L is the total error; L_conf(x, c) is the classification error; L_loc(x, l, g) is the detection error; the weight α = 1; x is the matching flag between a prediction box and a target box, with x = 1 for a match and x = 0 for no match; c is the class prediction score; l denotes the prediction box coordinates; g denotes the target box coordinates; N is the number of matched default boxes, and if N = 0 then L = 0;
wherein L_conf(x, c) is calculated with the Softmax loss function, as follows:

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
wherein L_loc(x, l, g) is obtained by position-offset regression with the Smooth L1 function, calculated as follows:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w

ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h

ĝ_j^w = log(g_j^w / d_i^w)

ĝ_j^h = log(g_j^h / d_i^h)

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

where l denotes the prediction box, g denotes the target box, and d denotes the default box.
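The offset encoding and the Smooth L1 penalty used for the localization error can be sketched as follows; the (cx, cy, w, h) box representation matches the usual SSD superscripts, and the function names are illustrative:

```python
import math

def smooth_l1(x):
    # Quadratic near zero, linear beyond |x| = 1
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode_offsets(g, d):
    """Encode a target box g relative to a default box d, both given as
    (cx, cy, w, h), following the SSD offset parameterization."""
    return (
        (g[0] - d[0]) / d[2],   # center-x offset, scaled by default width
        (g[1] - d[1]) / d[3],   # center-y offset, scaled by default height
        math.log(g[2] / d[2]),  # log width ratio
        math.log(g[3] / d[3]),  # log height ratio
    )
```

A perfectly aligned pair encodes to all zeros, so the regression target shrinks to nothing as the default box approaches the ground truth.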
The process of non-maximum suppression is as follows:
all prediction boxes are sorted by score from high to low; the Jaccard Overlap between the highest-scoring box and each remaining box is computed, and any box whose overlap exceeds the threshold is discarded; the same test is then applied to the second-highest-scoring remaining box, then to the third-highest; and so on, until all prediction boxes have been traversed.
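The screening procedure above is standard greedy non-maximum suppression; a minimal self-contained sketch, where the (x1, y1, x2, y2) box format and the 0.5 default threshold are illustrative assumptions:

```python
def iou(a, b):
    # Jaccard Overlap of two axis-aligned boxes (x1, y1, x2, y2)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every box overlapping it beyond the threshold, and repeat."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```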
The invention has the beneficial effects that:
1) The mAP value of the invention on the VOC2007+2012 universal data set is higher than that of the original SSD algorithm;
2) The invention solves the problems of partial small target false detection and missing detection existing in the detection of the tube cutter in the security inspection picture by the current target detection algorithm based on deep learning, and improves the overall detection effect of the X-ray image dangerous goods.
Drawings
FIG. 1 is a comparison graph of the detection speed and detection precision of the algorithm of the method for detecting and identifying controlled cutters in security inspection of the invention against the SSD, FSSD and ESSD algorithms;
FIG. 2 is a diagram showing the experimental results of the method for detecting and identifying the controlled cutting tools in the security inspection of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a method for detecting and identifying a controlled cutter in security check, which is implemented according to the following steps:
step 1, carrying out normalization processing on an image A to obtain an image B;
the specific process of step 1 is as follows:
the image A is normalized using the following formula:

y_i = γ · (x_i − μ) / √(σ² + ε) + β (1)

where y_i is the normalized result; x_i is the input image matrix; μ is the mean of x_i; σ² is the variance of x_i; ε is a small positive number, typically 0.0001; γ is a scale parameter; β is an offset parameter.
Step 2, processing the image B by an SSD-ResNet101 model, a feature fusion method, pooling and a Relu function to obtain a feature map C;
the SSD-ResNet101 network structure adopts a residual error network ResNet101 to replace the original VGGNet as a basic network of the SSD; the model is mainly divided into two parts, wherein a basic network is a residual error ResNet101 network, res3b3 and Res5c are residual error structural layers, conv6_2, new layer 2-New layer6 are newly added auxiliary convolutional layers, and New layer7 is a pooling layer; conv:3 × 3 × 512 represents convolving the feature map using 512 convolution kernels of size 3 × 3;
the specific process of the step 2 is as follows:
inputting the image B into the SSD-ResNet101 network; the C1 layer performs a convolution on the image B with 512 convolution kernels of size 3×3 and outputs the C1 layer feature map as the input of the C2 layer; the C2 layer performs a convolution with a 1×1×1024 convolution kernel and outputs the C2 layer feature map as the input of the C3 layer; the C3 layer performs a convolution with 512 3×3 convolution kernels and outputs the C3 layer feature map; the C2 layer feature map is introduced into the C1 layer feature map and enhanced with the Relu function to obtain the C4 layer feature map; the C3 layer feature map is introduced into the C2 layer feature map and enhanced with the Relu function to obtain the C5 layer feature map; the C4 and C5 layer feature maps are fused, enhanced with the Relu function, and the C6 layer feature map is output as the input of the C7 layer;
the C7 layer performs a convolution with 512 convolution kernels of size 3×3 and outputs the C7 layer feature map as the input of the C8 layer; the C8 layer performs a convolution with 256 3×3 convolution kernels and outputs the C8 layer feature map as the input of the C9 layer; the C9 layer performs a convolution with 256 3×3 convolution kernels and outputs the C9 layer feature map as the input of the C10 layer; the C10 layer performs a convolution with 256 3×3 convolution kernels and outputs the C10 layer feature map as the input of the C11 layer; the C11 layer performs a convolution with 256 3×3 convolution kernels and outputs the C11 layer feature map; a pooling operation is then performed on the C11 layer, and the C12 layer feature map, namely the feature map C, is output;
wherein the Relu function is as follows:
f(x)=max(0,x) (2)
where x is the input vector from the previous network layer, and f(x) is the output vector.
Step 3, calculating default boxes of the feature map C using a scale calculation formula, and matching the default boxes with the target boxes through a Jaccard Overlap matching strategy and model training to obtain prediction boxes;
the specific process of step 3 is as follows:
the feature maps of layers C6 to C12 are selected from the feature map C, and the default boxes of these feature maps are calculated with the scale calculation formula; the default boxes are then matched with the target boxes using the Jaccard Overlap matching strategy to obtain prediction boxes.
The scale calculation formula is as follows:

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m] (3)

where s_min is the scale of the default box in the lowest prediction layer, set to s_min = 0.04; s_max is the scale of the default box in the topmost C12 layer, set to s_max = 0.49; m = 5.

Different aspect ratios are set for each default box, calculated as follows:

aspect ratio: a_r ∈ {1, 2, 3, 1/2, 1/3}

width: w_k^a = s_k · √(a_r)

height: h_k^a = s_k / √(a_r)

When the aspect ratio is 1, an additional default box is added whose scale is:

s'_k = √(s_k · s_(k+1)).
when the Jaccard Overlap strategy is used to match the target box with the default box, the default box is considered matched with the target box when the Jaccard Overlap exceeds the threshold of 0.5; the Jaccard Overlap expression is as follows:

J(A, B) = |A ∩ B| / |A ∪ B|

where A denotes the set of pixels within the default box and B denotes the set of pixels within the target box.
Model training is used to reduce the total prediction error; the total prediction error is the weighted sum of the classification and detection errors, expressed as follows:

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))

where L is the total error; L_conf(x, c) is the classification error; L_loc(x, l, g) is the detection error; the weight α = 1; x is the matching flag between a prediction box and a target box, with x = 1 for a match and x = 0 for no match; c is the class prediction score; l denotes the prediction box coordinates; g denotes the target box coordinates; N is the number of matched default boxes, and if N = 0 then L = 0;
wherein L_conf(x, c) is calculated with the Softmax loss function, as follows:

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
wherein L_loc(x, l, g) is obtained by position-offset regression with the Smooth L1 function, calculated as follows:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w

ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h

ĝ_j^w = log(g_j^w / d_i^w)

ĝ_j^h = log(g_j^h / d_i^h)

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

where l denotes the prediction box, g denotes the target box, and d denotes the default box.
Step 4, screening the final prediction box set by adopting a non-maximum suppression algorithm to obtain a detection result;
the process of non-maxima suppression is as follows:
sorting all the prediction frames from top to bottom according to the scores, judging the Jaccard Over lap of the prediction frame with the highest score and other prediction frames, if the score is larger than the threshold value, discarding other prediction frames, then judging the Jaccard Over lap of the prediction frame with the highest score and other prediction frames, and discarding the prediction frames larger than the threshold value; then judging a prediction box with the third highest score; and so on until all prediction boxes are traversed.
1. Simulation experiment and result analysis
1. Experimental Environment and parameter configuration
The experiments were carried out on a Linux PC. The specific hardware and software environments used for training the model and testing algorithm performance are shown in Table 1:

Table 1 experimental environment configuration
[Table 1 is reproduced as an image in the original document.]
The initial learning rate of the training model is set to 0.001, reduced to 0.0001 after 4000 iterations, and model training finishes after a further 4000 iterations. Model training was performed on the TensorFlow framework, with a test every 20000 iterations; momentum and weight decay rate were set to 0.9000 and 0.0005, respectively.
2. Comparative method analysis
To verify the performance of the improved algorithm, the SSD, FSSD and ESSD algorithms were selected, and five groups of tests of each algorithm's performance were run on the SDCK2019 controlled cutter data set. The test results are shown in Table 2:

Table 2 mAP of each algorithm on the SDCK2019 data set
[Table 2 is reproduced as an image in the original document.]
It can be seen that, compared with the classic SSD algorithm, the precision of the improved SSD detection algorithm is 1.6% higher on the SDCK2019 controlled cutter data set, and it outperforms the FSSD and ESSD algorithms.
3. Detecting speed contrast
To test the detection speed of the improved SSD algorithm, the FSSD and ESSD algorithms were selected and all were tested on the SDCK2019 controlled cutter data set on a GPU server equipped with two NVIDIA RTX 2080 Ti cards, and the frames per second of each algorithm were calculated. The comparison of detection speeds is shown in Table 3:

Table 3 detection speed of each algorithm on the SDCK2019 data set
[Table 3 is reproduced as an image in the original document.]
It can be seen that the detection speed of the improved SSD algorithm on the SDCK2019 controlled cutter data set is 12.5 Fps, comparable to the detection speed of the ESSD algorithm but lower than the detection speeds of the SSD and FSSD algorithms. The reason is that replacing the SSD base network VGGNet with ResNet101 deepens the network structure, and the more complex computation reduces the detection speed. However, current security check machines require a detection speed of 10 frames per second for controlled articles; as long as an algorithm's detection speed is not lower than 10 Fps, it meets the real-time requirement for application in a security check machine, so the improved SSD algorithm satisfies the actual security inspection requirement.
4. To compare the overall detection performance of the improved SSD, FSSD and ESSD algorithms on the SDCK2019 controlled cutter data set, a comparison graph of the detection speed and detection precision of each algorithm was drawn from Tables 2 and 3, shown in FIG. 1. In the figure, the horizontal and vertical coordinates represent the number of picture frames detected per second and the average detection precision (mAP), respectively. The speed-precision graph allows comparing the comprehensive detection performance of the algorithms on small cutter targets: the closer a point lies to the upper-right corner, the better the comprehensive performance. The improved SSD algorithm has the highest detection precision on small cutter targets; it alleviates, to a certain extent, the false detection and missed detection of some small targets by existing deep-learning-based target detection algorithms on controlled cutters in security inspection images, and improves the overall detection effect.
5. To verify whether the improved base network and the feature fusion of the invention are effective improvements on the traditional SSD, the two steps of the algorithm were tested separately on the SDCK2019 controlled cutter data set; the mAP of each step is shown in Table 4:

Table 4 improved algorithm step-by-step mAP test
[Table 4 is reproduced as an image in the original document.]
The improved SSD algorithm enhances the feature extraction capability of the detection model for small cutter targets. Feature fusion between the deep and shallow layers of the improved SSD algorithm enlarges the receptive field of the shallow feature maps and fuses the richer semantic information of the deep layers into the shallow layers, comprehensively improving the algorithm's detection capability for small controlled cutter targets in security inspection images; the algorithm has high precision and good robustness. The detection results are shown in FIG. 2.

Claims (9)

1. A method for detecting and identifying a controlled cutter in security inspection is characterized by comprising the following steps:
step 1, carrying out normalization processing on an image A to obtain an image B;
step 2, processing the image B by an SSD-ResNet101 model, a feature fusion method, pooling and a Relu function to obtain a feature map C;
step 3, calculating default boxes of the feature map C using a scale calculation formula; matching the default boxes with the target boxes through a Jaccard Overlap matching strategy and model training to obtain prediction boxes;
and 4, screening the final prediction box set by adopting a non-maximum suppression algorithm to obtain a detection result.
2. The method for safety inspection and identification of control tools according to claim 1, wherein the specific process of step 1 is as follows:
the image A is normalized using the following formula:

y_i = γ · (x_i − μ) / √(σ² + ε) + β (1)

where y_i is the normalized result; x_i is the input image matrix; μ is the mean of x_i; σ² is the variance of x_i; ε is a small positive number, typically 0.0001; γ is a scale parameter; β is an offset parameter.
3. The method for safety inspection and identification of controlled cutters as claimed in claim 2, wherein the step 2 comprises the following specific processes:
inputting the image B into the SSD-ResNet101 network; the C1 layer performs a convolution on the image B with 512 convolution kernels of size 3×3 and outputs the C1 layer feature map as the input of the C2 layer; the C2 layer performs a convolution with a 1×1×1024 convolution kernel and outputs the C2 layer feature map as the input of the C3 layer; the C3 layer performs a convolution with 512 3×3 convolution kernels and outputs the C3 layer feature map; the C2 layer feature map is introduced into the C1 layer feature map and enhanced with the Relu function to obtain the C4 layer feature map; the C3 layer feature map is introduced into the C2 layer feature map and enhanced with the Relu function to obtain the C5 layer feature map; the C4 and C5 layer feature maps are fused, enhanced with the Relu function, and the C6 layer feature map is output as the input of the C7 layer;
the C7 layer performs a convolution with 512 convolution kernels of size 3×3 and outputs the C7 layer feature map as the input of the C8 layer; the C8 layer performs a convolution with 256 3×3 convolution kernels and outputs the C8 layer feature map as the input of the C9 layer; the C9 layer performs a convolution with 256 3×3 convolution kernels and outputs the C9 layer feature map as the input of the C10 layer; the C10 layer performs a convolution with 256 3×3 convolution kernels and outputs the C10 layer feature map as the input of the C11 layer; the C11 layer performs a convolution with 256 3×3 convolution kernels and outputs the C11 layer feature map; a pooling operation is then performed on the C11 layer, and the C12 layer feature map, namely the feature map C, is output.
4. The method for safety inspection identifying a regulatory tool as in claim 3, wherein the Relu function is as follows:
f(x)=max(0,x) (2)
where x is the input vector from the previous network layer, and f(x) is the output vector.
5. The method for safety inspection and identification of control tools according to claim 3 or 4, wherein the specific process of step 3 is as follows:
the feature maps of layers C6 to C12 are selected from the feature map C, and the default boxes of these feature maps are calculated with the scale calculation formula; the default boxes are then matched with the target boxes using the Jaccard Overlap matching strategy to obtain prediction boxes.
6. The method for safety inspection and identification of a managed tool as claimed in claim 5, wherein the scale calculation formula is as follows:
Figure FDA0002270843060000031
in the formula, S min The scale of a default frame in the C1 layer is set as S min =0.04;S max Is the scale of a default frame in the topmost C12 layer and takes the value of S max =0.49;m=5;
wherein different aspect ratios a_r are set for each default box, and the calculation formulas are as follows:

aspect ratio: a_r ∈ {1, 2, 3, 1/2, 1/3}

width: w_k^a = S_k · √(a_r)

height: h_k^a = S_k / √(a_r)

when the aspect ratio is 1, an additional default box is added, with scale:

S'_k = √(S_k · S_{k+1})
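The scale and aspect-ratio rules above can be combined into one sketch of default-box generation for a layer index k. The values S_min = 0.04, S_max = 0.49, and m = 5 follow the claim; the function names are illustrative:

```python
import math

def box_scale(k, s_min=0.04, s_max=0.49, m=5):
    """Scale S_k: linear interpolation between s_min and s_max over m layers."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

def default_boxes(k, aspect_ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """(width, height) pairs for layer k: w = S_k*sqrt(a), h = S_k/sqrt(a),
    plus one extra square box of scale sqrt(S_k * S_{k+1}) for aspect ratio 1."""
    s_k = box_scale(k)
    boxes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    extra = math.sqrt(s_k * box_scale(k + 1))
    boxes.append((extra, extra))
    return boxes
```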
7. The method for detecting and identifying a controlled cutter through security check as claimed in claim 5, wherein, when the Jaccard Overlap matching strategy is adopted to match the target boxes with the default boxes, a default box is considered to match a target box when the Jaccard Overlap is greater than the threshold of 0.5; the Jaccard Overlap expression is as follows:
Jaccard Overlap = |A ∩ B| / |A ∪ B|
in the formula, A represents a set area formed by all pixels in a default frame; b denotes an aggregate area made up of all pixels within the target frame.
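For axis-aligned boxes given by corner coordinates, the Jaccard Overlap and the 0.5-threshold matching rule can be sketched as follows (`is_match` is an illustrative helper name):

```python
def jaccard_overlap(a, b):
    """|A intersect B| / |A union B| for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def is_match(default_box, target_box, threshold=0.5):
    """A default box matches a target box when the overlap exceeds 0.5."""
    return jaccard_overlap(default_box, target_box) > threshold
```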
8. The method for detecting and identifying a controlled cutter through security check as claimed in claim 5, wherein model training is used to reduce the total prediction error; the total prediction error is obtained by a weighted sum of the classification error and the detection error, and the expression is as follows:
L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))
where L is the total error; L_conf(x, c) is the classification error; L_loc(x, l, g) is the detection error; the weight α = 1; x is the matching flag between the prediction box and the target box, with x = 1 indicating a match and x = 0 indicating no match; c is the class prediction score; l is the pixel values of the prediction box; g is the pixel values of the target box; N is the number of matched default boxes, and if N = 0, then L = 0;
wherein L_conf(x, c) is calculated by the Softmax loss function, and the calculation formula is as follows:

L_conf(x, c) = −Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0),  with ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
wherein L_loc(x, l, g) is obtained by position-offset regression with the Smooth L1 function, and the calculation formulas are as follows:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w
ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h
ĝ_j^w = log(g_j^w / d_i^w)
ĝ_j^h = log(g_j^h / d_i^h)

where

smooth_L1(x) = 0.5 x², if |x| < 1; |x| − 0.5, otherwise
where l denotes the prediction box, g denotes the target box, and d denotes the default box.
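The error terms can be sketched numerically for a single matched pair. The sketch assumes the standard SSD-style definitions that the symbols l, g, and d suggest (center-size offsets with Smooth L1 regression and softmax cross-entropy classification); all function and key names are illustrative:

```python
import math

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 when |x| < 1, otherwise |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode_offsets(g, d):
    """Offsets of target box g relative to default box d (center-size form)."""
    return ((g["cx"] - d["cx"]) / d["w"],
            (g["cy"] - d["cy"]) / d["h"],
            math.log(g["w"] / d["w"]),
            math.log(g["h"] / d["h"]))

def loc_error(pred_offsets, g, d):
    """Smooth-L1 position regression error for one matched pair."""
    return sum(smooth_l1(p - t)
               for p, t in zip(pred_offsets, encode_offsets(g, d)))

def conf_error(scores, label):
    """Softmax cross-entropy on the raw class scores c."""
    total = sum(math.exp(s) for s in scores)
    return -math.log(math.exp(scores[label]) / total)

def total_error(conf, loc, n_matched, alpha=1.0):
    """L = (1/N) * (L_conf + alpha * L_loc), with L = 0 when N = 0."""
    return 0.0 if n_matched == 0 else (conf + alpha * loc) / n_matched
```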
9. The method for detecting and identifying a controlled cutter through security check as claimed in claim 7 or 8, wherein the process of non-maximum suppression is as follows:
sorting all the prediction boxes by score from high to low; computing the Jaccard Overlap between the highest-scoring prediction box and each remaining prediction box, and discarding every prediction box whose Jaccard Overlap exceeds the threshold; then repeating the judgment for the prediction box with the next highest score among those remaining, again discarding the prediction boxes above the threshold; and so on until all prediction boxes have been traversed.
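The traversal described above is greedy non-maximum suppression. A self-contained sketch follows, with the overlap helper repeated so the snippet runs on its own:

```python
def jaccard_overlap(a, b):
    """|A intersect B| / |A union B| for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, discard remaining boxes whose Jaccard
    Overlap with it exceeds the threshold, then repeat on the survivors
    until every prediction box has been traversed. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if jaccard_overlap(boxes[best], boxes[i]) <= threshold]
    return keep
```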
CN201911104406.3A 2019-11-13 2019-11-13 Method for detecting and identifying controlled cutter through security check Active CN110853019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911104406.3A CN110853019B (en) 2019-11-13 2019-11-13 Method for detecting and identifying controlled cutter through security check

Publications (2)

Publication Number Publication Date
CN110853019A CN110853019A (en) 2020-02-28
CN110853019B true CN110853019B (en) 2023-02-24

Family

ID=69601470

Country Status (1)

Country Link
CN (1) CN110853019B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549647B (en) * 2022-04-22 2022-08-12 成都飞机工业(集团)有限责任公司 Method for detecting placement orientation of HSK knife handle

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446617B (en) * 2018-03-09 2022-04-22 华南理工大学 Side face interference resistant rapid human face detection method
CN108960198A (en) * 2018-07-28 2018-12-07 天津大学 A kind of road traffic sign detection and recognition methods based on residual error SSD model
CN109506628A (en) * 2018-11-29 2019-03-22 东北大学 Object distance measuring method under a kind of truck environment based on deep learning

Similar Documents

Publication Publication Date Title
CN109711474B (en) Aluminum product surface defect detection algorithm based on deep learning
CN108898610A (en) A kind of object contour extraction method based on mask-RCNN
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN112200045B (en) Remote sensing image target detection model establishment method based on context enhancement and application
CN109272500B (en) Fabric classification method based on adaptive convolutional neural network
CN109671071B (en) Underground pipeline defect positioning and grade judging method based on deep learning
CN110992349A (en) Underground pipeline abnormity automatic positioning and identification method based on deep learning
CN112200225B (en) Steel rail damage B display image identification method based on deep convolution neural network
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN114926407A (en) Steel surface defect detection system based on deep learning
CN111127454A (en) Method and system for generating industrial defect sample based on deep learning
CN110992314A (en) Pavement defect detection method and device and storage medium
CN110853019B (en) Method for detecting and identifying controlled cutter through security check
CN107871315B (en) Video image motion detection method and device
CN113657423A (en) Target detection method suitable for small-volume parts and stacked parts and application thereof
CN116029979A (en) Cloth flaw visual detection method based on improved Yolov4
CN107247967A (en) A kind of vehicle window annual test mark detection method based on R CNN
CN111860265B (en) Multi-detection-frame loss balanced road scene understanding algorithm based on sample loss
Lim et al. An optimized lightweight model for real-time wood defects detection based on yolov4-tiny
Lee A Study on Fruit Quality Identification Using YOLO V2 Algorithm
CN116542962A (en) Improved Yolov5m model-based photovoltaic cell defect detection method
CN111179239A (en) Tire X-ray flaw detection method for performing re-ranking by using background features
CN116630989A (en) Visual fault detection method and system for intelligent ammeter, electronic equipment and storage medium
CN111368625B (en) Pedestrian target detection method based on cascade optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant