CN115115895A

CN115115895A - Explosive mobile phone X-ray image classification method based on attention mechanism

Info

Publication number: CN115115895A
Application number: CN202210896302.6A
Authority: CN
Inventors: 张小利; 赵立波; 周子倍; 杨飞扬; 于爽; 朱芮; 李雄飞
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2022-07-28
Filing date: 2022-07-28
Publication date: 2022-09-27

Abstract

The invention provides an explosive mobile phone X-ray image classification method based on an attention mechanism, which comprises the following steps: acquiring a data set of explosive mobile phone X-ray images; constructing a classification model, wherein the classification model comprises a position information attention module and a residual network, and the position information attention module is used for adaptive aggregation and reconstruction of the information in the explosive mobile phone X-ray images; and improving a loss function, training the classification model based on the improved loss function, and performing feature extraction on the images in the data set through the trained classification model to obtain a classification result. By introducing the position information attention module into a residual network and guiding network learning with a loss function based on a sample cost coefficient, the invention gives the classification model a strong capability to extract detailed features, so that mobile phones containing explosives can be accurately classified.

Description

Explosive mobile phone X-ray image classification method based on attention mechanism

Technical Field

The invention belongs to the technical field of small sample classification, and particularly relates to an explosive mobile phone X-ray image classification method based on an attention mechanism.

Background

In daily travel, security inspection provides an essential safety guarantee for public transport. For example, at airports and railway stations, passengers must place their baggage in an X-ray scanner so that it can be checked for contraband. Explosives detection is an important component of security inspection. Some terrorists pose a serious threat to public safety by concealing explosives in mobile phones to carry out bombings. Therefore, correctly identifying mobile phones that contain explosives is of great significance.

Security screening is currently one of the main application scenarios for X-ray image classification and contraband detection. The publicly available benchmark data sets include GDXray, SIXray, and OPIXray, of which GDXray and SIXray can be used for X-ray image classification. Unfortunately, all three data sets target image classification or detection of general X-ray contraband, and few data sets currently exist for the explosive mobile phone classification task. Because positive and negative samples are similar and the classes are imbalanced, existing convolutional neural networks (CNNs) tend to give up on predicting positive samples altogether, since classifying all samples as negative can already achieve a classification accuracy as high as 98.4%. In the explosive mobile phone classification problem, a CNN hardly focuses on the important positions or extracts informative features from the minority class. Furthermore, class imbalance makes it easy for the classifier to be dominated by majority-class samples, since these samples are easily classified during training. Some studies evaluate the classification difficulty of each sample so that the classifier can treat different samples differently, which is very important for improving the generalization and reliability of the classifier.

Disclosure of Invention

In order to solve the above technical problems, the invention provides an explosive mobile phone X-ray image classification method based on an attention mechanism, in which a position information attention module is introduced into a residual network and network learning is guided by a loss function based on a sample cost coefficient, so that the classification model has a strong capability to extract detailed features and mobile phones containing explosives can be accurately classified.

In order to achieve the purpose, the invention provides an explosive mobile phone X-ray image classification method based on an attention mechanism, which comprises the following steps:

acquiring a data set of the explosive mobile phone X-ray image;

constructing a classification model, wherein the classification model comprises a position information attention module and a residual network, and the position information attention module is used for adaptive aggregation and reconstruction of the information in the explosive mobile phone X-ray images;

and improving a loss function, training the classification model based on the improved loss function, and performing feature extraction on the images in the data set through the trained classification model to obtain a classification result.

Optionally, the location information attention module performs information adaptive aggregation and reconstruction, including two parts of block compression and pixel-by-pixel reconstruction;

the block compression includes: dividing an input feature map into a plurality of position blocks, and measuring the position blocks along a channel and a space dimension simultaneously to obtain a channel-space context description map;

the pixel-by-pixel reconstruction includes: expanding the channel-space context description map to the same size as the input feature map to obtain a hybrid attention map.

Optionally, the input feature map is obtained by performing feature extraction on the explosive mobile phone X-ray image through a basic convolution block in the classification model.

Optionally, the method of block compression is:

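$$
\begin{aligned}
\mathrm{compress}(f_k) &= \mathrm{conv}^{1\times 1}(\mathrm{AvgPool}(f_k)) + \mathrm{conv}^{1\times 1}(\mathrm{MaxPool}(f_k)) \\
&= W_1(\mathrm{ReLU}(W_0(\mathrm{AvgPool}(f_k)))) + W_1(\mathrm{ReLU}(W_0(\mathrm{MaxPool}(f_k))))
\end{aligned}
$$

wherein $k = 1, 2, \ldots, n^2$, $\mathrm{conv}^{1\times 1}$ represents a two-layer 1 × 1 convolution block, $W_0$ and $W_1$ represent the weights of the two-layer 1 × 1 convolution block, and $r$ represents the reduction rate.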
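The block compression formula can be illustrated with a minimal pure-Python sketch. Several details here are assumptions rather than statements of the method: AvgPool and MaxPool are taken to be global pooling over each position block, the 1 × 1 convolutions are taken to have no bias (so on a pooled vector they reduce to matrix-vector products), and `W0` is given shape (C/r) × C and `W1` shape C × (C/r). The function names and toy values are hypothetical.

```python
def avg_pool(block):
    """Global average pooling per channel over one C x h x w block (assumed)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in block]


def max_pool(block):
    """Global max pooling per channel over one C x h x w block (assumed)."""
    return [max(max(row) for row in ch) for ch in block]


def matvec(weights, vector):
    """Matrix-vector product; a bias-free 1x1 convolution on a pooled vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def conv1x1_pair(vector, W0, W1):
    """Two-layer 1x1 convolution block: W1(ReLU(W0 v))."""
    hidden = [max(0.0, h) for h in matvec(W0, vector)]
    return matvec(W1, hidden)


def compress(block, W0, W1):
    """compress(f_k) = W1(ReLU(W0 AvgPool(f_k))) + W1(ReLU(W0 MaxPool(f_k)))."""
    from_avg = conv1x1_pair(avg_pool(block), W0, W1)
    from_max = conv1x1_pair(max_pool(block), W0, W1)
    return [a + m for a, m in zip(from_avg, from_max)]


# Toy block with C = 2 channels and 2 x 2 pixels; reduction rate r = 2.
block = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.0, 0.0], [0.0, 8.0]]]
W0 = [[1.0, 1.0]]        # shape (C/r) x C = 1 x 2 (assumed layout)
W1 = [[1.0], [-1.0]]     # shape C x (C/r) = 2 x 1 (assumed layout)
descriptor = compress(block, W0, W1)   # -> [16.5, -16.5]
```

The same weights are shared by the average-pooled and max-pooled branches, which is what the repeated $W_0$ and $W_1$ in the formula indicate.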

Optionally, the pixel-by-pixel reconstruction method is:

wherein σ is the sigmoid function, concat is the concatenation operation, and expand is the expansion operation.

Optionally, improving the loss function comprises: taking a binary cross-entropy loss function as the basis and introducing a sample cost coefficient to obtain the improved loss function.

Optionally, the binary cross-entropy loss function is:

$$L_{CE}(p, y) = -y\log(p) - (1-y)\log(1-p)$$

where $y \in \{0,1\}$ is the true class, $p \in [0,1]$ is the predicted probability of the positive class, and $L_{CE}$ is the binary cross-entropy loss function.

Optionally, the modified loss function is:

wherein $y \in \{0,1\}$ is the true class, σ is the sigmoid function, t is the cumulative number of misclassifications of positive samples over the whole training process, z is the difference between the two fully connected layer outputs, and $L_{SC}$ is the improved loss function.

Optionally, acquiring the data set of the explosive cell phone X-ray image further includes: preprocessing the data set;

the preprocessing comprises: manually marking the region of the explosive in each explosive mobile phone image in the data set.

Compared with the prior art, the invention has the following advantages and technical effects:

the invention introduces a position information attention module into a residual network and guides network learning with a loss function based on a sample cost coefficient, so that the model has a strong capability to extract detailed features and accurately classifies mobile phones containing explosives.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

Fig. 1 is a schematic flowchart of the explosive mobile phone X-ray image classification method based on an attention mechanism according to embodiment 1 of the present invention;

Fig. 2 is a schematic structural diagram of the classification model according to embodiment 1 of the present invention;

Fig. 3 is a schematic structural diagram of the location information attention module according to embodiment 1 of the present invention;

Fig. 4 is a schematic diagram of the verification process of the classification method according to embodiment 2 of the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than here.

Example 1

As shown in fig. 1, the present embodiment provides an explosive cell phone X-ray image classification method based on attention mechanism, including:

acquiring a data set of the explosive mobile phone X-ray image;

Further, the position information attention module performs information self-adaptive aggregation and reconstruction, including two parts of block compression and pixel-by-pixel reconstruction;

the block compression includes: dividing an input feature map into a plurality of position blocks, and measuring the position blocks along the channel and spatial dimensions simultaneously to obtain a channel-space context description map; the input feature map is obtained by performing feature extraction on the explosive mobile phone X-ray image through a basic convolution block in the classification model.

In the present embodiment, the location information attention module is designed to carry out adaptive aggregation and reconstruction of information, namely block compression and pixel-by-pixel reconstruction. The block compression operation aggregates the spatial information of each block to indicate its criticality while re-weighting the importance of each channel. In block division, the input intermediate feature map F is partitioned in the spatial dimensions into n² position blocks based on position information. During this process, the channel dimension remains unchanged. n is a variable, and n² is the number of blocks into which the input feature map F is divided.
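The block division described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the feature map is held as a nested list of shape C × H × W, H and W are assumed to be divisible by n, and the blocks are returned in row-major order. The function name `split_into_blocks` and the toy values are hypothetical.

```python
def split_into_blocks(feature_map, n):
    """Split a C x H x W nested list into n*n spatial blocks.

    The channel dimension is left unchanged. Blocks are returned in
    row-major order; each block has shape C x (H/n) x (W/n).
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % n or width % n:
        raise ValueError("this sketch assumes H and W are divisible by n")
    bh, bw = height // n, width // n
    blocks = []
    for bi in range(n):          # block row index
        for bj in range(n):      # block column index
            block = [
                [row[bj * bw:(bj + 1) * bw]
                 for row in feature_map[c][bi * bh:(bi + 1) * bh]]
                for c in range(channels)
            ]
            blocks.append(block)
    return blocks


# Toy feature map: C = 2, H = W = 4; each value encodes (channel, row, col).
F = [[[100 * c + 10 * i + j for j in range(4)] for i in range(4)]
     for c in range(2)]
blocks = split_into_blocks(F, n=2)   # n^2 = 4 position blocks
```

Each of the n² returned blocks keeps all C channels, matching the statement that the channel dimension remains unchanged during division.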

Further, improving the loss function includes: taking the binary cross-entropy loss as the basis and introducing a sample cost to obtain the improved loss function.

As shown in fig. 1, in this embodiment, an explosive cell phone X-ray image classification method based on an attention mechanism includes the following specific implementation steps:

step one, acquiring an explosive mobile phone X-ray image dataset;

step two, designing an attention module, which is a unit that aggregates global spatial and channel information and can reinforce explosive information during network training and convergence;

specifically, features are first extracted from the input image by the basic convolution block to obtain an intermediate feature map F. The location information attention module divides F into an appropriate number of blocks according to spatial location and generates a channel-space context description map F' along the channel dimension. Then, the description map F' is expanded and reconstructed into an attention map F″. Finally, element-by-element multiplication is performed between the original feature map F and the attention map F″ to obtain a refined feature map.

The location information attention module performs information adaptive aggregation and reconstruction in two steps: block compression and pixel-by-pixel reconstruction.

The block compression can be expressed as:

$$
\begin{aligned}
\mathrm{compress}(f_k) &= \mathrm{conv}^{1\times 1}(\mathrm{AvgPool}(f_k)) + \mathrm{conv}^{1\times 1}(\mathrm{MaxPool}(f_k)) \\
&= W_1(\mathrm{ReLU}(W_0(\mathrm{AvgPool}(f_k)))) + W_1(\mathrm{ReLU}(W_0(\mathrm{MaxPool}(f_k))))
\end{aligned}
$$

wherein $W_0$ and $W_1$ represent the weights of the two-layer 1 × 1 convolution block, and $r$ represents the reduction rate. The ReLU activation function is applied between the two 1 × 1 convolution layers. After compression, a series of channel-space context description maps is obtained, which can be represented by the following formula:

the pixel-by-pixel reconstruction process is then performed, which can be represented by the following equation:

wherein σ represents the sigmoid function, concat represents the concatenation operation, and expand represents the expansion operation; the structure of the location information attention module is shown in Fig. 3;
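The reconstruction and refinement steps can be sketched as follows. Because the reconstruction equation itself is not reproduced in this text, the sketch rests on assumptions: the per-block descriptors are concatenated in row-major block order, "expand" is taken to be replication of each block's value over all pixels of that block, and the sigmoid is applied element-wise (with pure replication, applying the sigmoid before or after expansion gives the same result). The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reconstruct(descriptors, n, height, width):
    """Build a C x H x W attention map from n*n per-block C-vectors.

    descriptors: list of n*n vectors in row-major block order (the concat
    step). Each value is replicated over its block (assumed expand step)
    and passed through a sigmoid.
    """
    channels = len(descriptors[0])
    bh, bw = height // n, width // n
    return [
        [[sigmoid(descriptors[(i // bh) * n + (j // bw)][c])
          for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]


def refine(feature_map, attention):
    """Element-by-element product of the feature map and the attention map."""
    return [
        [[f * a for f, a in zip(f_row, a_row)]
         for f_row, a_row in zip(f_ch, a_ch)]
        for f_ch, a_ch in zip(feature_map, attention)
    ]


# Toy example: C = 1, H = W = 4, n = 2, one descriptor value per block.
descriptors = [[0.0], [2.0], [-2.0], [0.0]]
attention = reconstruct(descriptors, n=2, height=4, width=4)
F = [[[1.0] * 4 for _ in range(4)]]
refined = refine(F, attention)
```

Every pixel inside a block receives the same attention weight in this sketch, so the refined map scales whole position blocks up or down rather than individual pixels.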

step three, because explosive mobile phone X-ray image classification is a binary classification problem, the designed loss function is an improvement on the binary cross-entropy, which is given by:

$$L_{CE}(p, y) = -y\log(p) - (1-y)\log(1-p)$$

where $y \in \{0,1\}$ represents the true class and $p \in [0,1]$ represents the predicted probability of the positive class. In classical network models, the probability p is obtained by passing the fully connected layer outputs $(x_1, x_2)$ through a SoftMax function. The output is:

$$\mathrm{output}(x_i) = \frac{e^{x_i}}{e^{x_1} + e^{x_2}}, \quad i = 1, 2$$

where $\mathrm{output}(x_1) + \mathrm{output}(x_2) = 1$. For binary classification, $\mathrm{output}(x_1)$ equals the probability p and $\mathrm{output}(x_2)$ equals 1 − p. We denote $z = x_1 - x_2$. It is then easy to deduce that $\mathrm{output}(x_1) = \sigma(z)$ and $\mathrm{output}(x_2) = \sigma(-z)$. Thus, $L_{CE}$ can be expressed as:

$$L_{CE}(z, y) = -y\log(\sigma(z)) - (1-y)\log(\sigma(-z))$$

wherein σ represents the sigmoid function;
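The relationship between the two-way SoftMax output and the sigmoid of a logit difference can be checked numerically. The sketch below is only a sanity check with hypothetical logit values; z is taken with the sign for which the first SoftMax output equals σ(z), and the function names are illustrative.

```python
import math


def softmax2(x1, x2):
    """Numerically stable SoftMax over two logits."""
    m = max(x1, x2)
    e1, e2 = math.exp(x1 - m), math.exp(x2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ce_from_prob(p, y):
    """Binary cross-entropy written in terms of the positive probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def ce_from_logit_diff(z, y):
    """The same loss written in terms of z, using p = sigmoid(z)."""
    return -(y * math.log(sigmoid(z)) + (1 - y) * math.log(sigmoid(-z)))


x1, x2 = 1.2, -0.3          # hypothetical fully connected layer outputs
p, q = softmax2(x1, x2)     # p = output(x1), q = output(x2) = 1 - p
z = x1 - x2                 # sign chosen so that output(x1) == sigmoid(z)
```

Both forms of the loss agree for either label, which is why the cross-entropy can be rewritten purely in terms of z.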

step four, in order to emphasize important samples and suppress uninformative samples, a sample cost is introduced. The sample cost prevents majority-class samples from overwhelmingly dominating the gradient during training, thereby improving the convergence and generalization capability of the network. The sample cost can be represented by the following formula:

wherein σ represents the sigmoid function, t records the cumulative number of misclassifications of positive samples over the whole training process, and z represents the difference between the two fully connected layer outputs; the sample cost is then introduced into the cross-entropy loss function, and the cross-entropy loss based on the sample cost is expressed as:

wherein $y \in \{0,1\}$ represents the true class, σ represents the sigmoid function, t records the cumulative number of misclassifications of positive samples over the whole training process, and z represents the difference between the two fully connected layer outputs;
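The bookkeeping around the sample cost can be sketched as follows. The sample cost formula itself is not reproduced in this text, so the sketch deliberately does not attempt it: the cost is passed in by the caller. The remaining choices are assumptions as well: t is kept as a single global counter, a sample counts as predicted negative when σ(z) ≤ 0.5 (that is, z ≤ 0), and the cost enters the loss as a multiplicative coefficient. The names `PositiveErrorCounter` and `cost_weighted_ce` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class PositiveErrorCounter:
    """Keeps t, the running count of misclassified positive samples."""

    def __init__(self):
        self.t = 0

    def update(self, z_values, labels):
        """Add this batch's misclassified positives to t and return t.

        Assumption: a sample is predicted negative when z <= 0.
        """
        for z, y in zip(z_values, labels):
            if y == 1 and z <= 0:
                self.t += 1
        return self.t


def cost_weighted_ce(z, y, cost):
    """Cross-entropy in terms of z, scaled by a caller-supplied cost.

    Assumption: the cost acts as a multiplicative coefficient. How the
    cost is derived from sigma, t and z is not specified in this sketch.
    """
    ce = -(y * math.log(sigmoid(z)) + (1 - y) * math.log(sigmoid(-z)))
    return cost * ce


counter = PositiveErrorCounter()
t_after_first = counter.update([2.0, -1.0, -0.5, 3.0], [1, 1, 0, 0])
t_after_second = counter.update([-2.0, 0.7], [1, 1])
loss = cost_weighted_ce(0.0, 1, cost=2.0)
```

Keeping the counter separate from the loss makes it straightforward to substitute the actual cost expression once it is available.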

step five, replacing the loss function in the network with the new loss function that incorporates the sample cost coefficient, and then using it to guide network training. Meanwhile, features are extracted by the network model formed by combining the residual network with the position information attention module, thereby achieving accurate classification of explosive mobile phones. The specific structure of the network model is shown in Fig. 2.

Example 2

As shown in fig. 4, the present embodiment provides a verification test method for an explosive mobile phone X-ray image classification method based on an attention mechanism, including:

step one, selecting a data set. An explosive mobile phone X-ray (EMXray) image data set is selected.

step two, data preprocessing. Explosive mobile phone classification is a typical class imbalance problem, because mobile phones containing explosives are rarely encountered in daily security inspection. The difference between the X-ray images of a mobile phone before and after modification is not obvious and lies only in the positions of the explosive and the lead. Therefore, when facing class-imbalanced and very similar positive and negative samples, it is important to learn information-rich detailed features from the minority-class samples. The data preprocessing comprises the following: the image data set contains two kinds of pictures, explosive mobile phones and normal mobile phones, and the region of the explosive in each explosive mobile phone picture is manually marked.

step three, experimental setup. The experiment is carried out on a hardware platform running the Ubuntu operating system with an NVIDIA RTX 3090 graphics card. The PyTorch deep learning framework is adopted, the toolkits matplotlib, re, and pydicom are mainly used, and the final experiment is completed with PyCharm. The parameters in the experiment were set as follows:

Number of iterations: 100 epochs

Optimizer: SGD

Learning rate: 10^-2

Batch size: 10

Number of rounds: 50

A residual network is used in the experiment so that the model converges faster. The location information attention module is a unit that aggregates global spatial and channel information and can emphasize explosive information during network training and convergence. The attention module and the CNN structure are therefore combined so that they complement each other. Using the NVIDIA RTX 3090 GPU, the training time ranges from several hours to three days depending on the data set;

step four, evaluation indices. Recall and the F1 value are used as the main evaluation indices of model performance. Recall is the ratio of the number of correctly detected features of a given class to the number of all features of that class in the data set, and can be expressed as: number of correctly extracted items / number of items in the sample.

The comprehensive evaluation index F1 is the harmonic mean of precision and recall and is defined as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

wherein Precision is the ratio of the number of correctly detected features of a given class to the number of all detected features;
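The evaluation indices can be computed directly from predicted and true labels. The sketch below also shows why accuracy alone is misleading under class imbalance: on a hypothetical set with 16 positives among 1000 samples (chosen only so that the all-negative accuracy matches the 98.4% figure mentioned in the Background), predicting every sample as negative yields high accuracy but zero recall and zero F1. The data and function names are illustrative, not experimental results.

```python
def recall_precision_f1(y_true, y_pred):
    """Recall, precision and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced set: 16 positives, 984 negatives.
y_true = [1] * 16 + [0] * 984

# Classifier A: predicts everything as negative.
pred_all_negative = [0] * 1000

# Classifier B: finds 12 of the 16 positives and raises 4 false alarms.
pred_b = [1] * 12 + [0] * 4 + [1] * 4 + [0] * 980

acc_a = accuracy(y_true, pred_all_negative)
metrics_a = recall_precision_f1(y_true, pred_all_negative)
acc_b = accuracy(y_true, pred_b)
metrics_b = recall_precision_f1(y_true, pred_b)
```

Classifier B has only slightly higher accuracy than classifier A, yet its recall and F1 are far higher, which is why these two indices are preferred for this task.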

step five, evaluation of classification results. In the experiment, the position information attention module and the sample cost are applied to five popular baseline network models, and an ablation experiment is performed. The experimental results show that the location information attention module improves the recall and the F1 value of the baseline models by over 53% and over 27%, respectively, and that the sample cost improves the recall and the F1 value of the baseline models by over 42% and over 25%, respectively. The combination of the two achieves the best performance, improving the recall and the F1 value of the baseline models by more than 55% and 29%, respectively. Comparison and visualization of the classification performance against three other well-known attention modules also demonstrate that the location information attention module is better at capturing detailed feature information.

The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An explosive mobile phone X-ray image classification method based on an attention mechanism is characterized by comprising the following steps:

acquiring a data set of the explosive mobile phone X-ray image;

and improving a loss function, training the classification model based on the improved loss function, and extracting the features of the images in the data set through the trained classification model to obtain a classification result.

2. The method for explosive mobile phone X-ray image classification based on attention mechanism according to claim 1, characterized in that the position information attention module performs information adaptive aggregation and reconstruction, including two parts of block compression and pixel-by-pixel reconstruction;

3. The method according to claim 2, wherein the input feature map is obtained by performing feature extraction on the explosive mobile phone X-ray image through a basic convolution block in the classification model.

4. The method for classifying the X-ray image of the explosive mobile phone based on the attention mechanism according to claim 2, wherein the method for compressing the blocks is as follows:

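$$
\begin{aligned}
\mathrm{compress}(f_k) &= \mathrm{conv}^{1\times 1}(\mathrm{AvgPool}(f_k)) + \mathrm{conv}^{1\times 1}(\mathrm{MaxPool}(f_k)) \\
&= W_1(\mathrm{ReLU}(W_0(\mathrm{AvgPool}(f_k)))) + W_1(\mathrm{ReLU}(W_0(\mathrm{MaxPool}(f_k))))
\end{aligned}
$$

wherein $W_0$ and $W_1$ represent the weights of the two-layer 1 × 1 convolution block, and $r$ represents the reduction rate.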

5. The method for explosive mobile phone X-ray image classification based on attention mechanism as claimed in claim 2, wherein the pixel-by-pixel reconstruction method is as follows:

6. The attention mechanism-based explosive mobile phone X-ray image classification method according to claim 1, wherein improving the loss function comprises: taking a binary cross-entropy loss function as the basis and introducing a sample cost coefficient to obtain the improved loss function.

7. The method of classifying explosive cell phone X-ray images based on attention mechanism according to claim 6, wherein the binary cross-entropy loss function is:

$$L_{CE}(p, y) = -y\log(p) - (1-y)\log(1-p)$$

where $y \in \{0,1\}$ is the true class, $p \in [0,1]$ is the predicted probability of the positive class, and $L_{CE}$ is the cross-entropy loss function.

8. The attention mechanism-based explosive mobile phone X-ray image classification method according to claim 6, wherein the improved loss function is:

9. The method for explosive X-ray images based on attention mechanism of claim 1, wherein the step of obtaining the data set of the explosive X-ray images further comprises: preprocessing the data set;