CN117132875A - Dangerous goods identification method, dangerous goods identification device and storage medium - Google Patents
- Publication number
- CN117132875A (application number CN202211278338.4A)
- Authority
- CN
- China
- Prior art keywords
- target detection
- semantic segmentation
- result
- head
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a dangerous goods identification method, a dangerous goods identification device and a storage medium, relating to the field of artificial intelligence. The dangerous goods identification method comprises a data processing stage, a training stage and an inference stage. It exploits the strengths of target detection and fuses rich semantic information by using the semantic segmentation result to correct the target detection result. Meanwhile, channel attention and spatial attention mechanisms are introduced during feature extraction to mitigate the inter-domain difference problem, which reduces the false and missed recognition caused by downsampling in target detection and by inter-domain differences between X-ray machines, thereby improving the detection effect and the recognition accuracy.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a dangerous goods identification method, a dangerous goods identification device and a storage medium.
Background
In existing security inspection and recognition systems, the X-ray images generated by an X-ray machine are inspected manually. Although this removes the need to unpack and inspect luggage, it places high demands on the professional competence of the inspectors, who must also maintain good endurance and vigilance toward dangerous objects.
However, over long working periods, fatigue degrades an inspector's perception, and the risk of false and missed detections increases accordingly.
Disclosure of Invention
The invention provides a dangerous goods identification method, a dangerous goods identification device and a storage medium, aiming at the problem that existing security inspection systems carry a high risk of false and missed detection of dangerous goods.
The technical solution provided by the invention for the above technical problem is as follows:
In a first aspect, the invention provides a dangerous goods identification method applied to a security inspection system, the method comprising the following steps:
a data processing stage:
performing target detection labeling on the collected X-ray image to obtain a target detection label; performing semantic segmentation labeling on the X-ray image to obtain a semantic segmentation label;
training phase:
performing feature extraction on the X-ray image by using a backbone network to obtain a first feature map, wherein the convolution layers of the backbone network perform feature extraction by using a channel attention module and a spatial attention module;
performing feature coding on the first feature map and generating, by using a neck network, a second feature map with richer scale information;
the second feature map is respectively input into a target detection head and a semantic segmentation head to obtain a target prediction result and a semantic segmentation prediction result;
performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; performing second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label to optimize a semantic segmentation head;
an inference stage:
inputting an X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting an X-ray image to be identified into an optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of a target;
and performing intersection-over-union calculation on the semantic segmentation result and the target detection result, and retaining a region whose overlap is larger than a preset value as the final target detection region for output.
Preferably, the feature extraction by the convolution layer of the backbone network using the channel attention module includes:
carrying out average pooling operation on the features of the first feature map to obtain an average channel descriptor; performing maximum pooling operation on the features of the first feature map to obtain a maximum channel descriptor;
the average channel descriptor and the maximum channel descriptor are input into a shared network to generate a one-dimensional channel attention map.
Preferably, the channel attention calculation formula satisfies:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max))),
wherein W_1 and W_0 respectively represent the two convolution layers of the shared network, and σ represents the sigmoid function.
Preferably, the feature extraction performed by the convolution layers of the backbone network using the spatial attention module comprises:
carrying out average pooling operation on the features of the first feature map to obtain an average space descriptor; performing maximum pooling operation on the features of the first feature map to obtain a maximum space descriptor;
the average spatial descriptor and the maximum spatial descriptor are input into a shared network to generate a two-dimensional spatial attention graph.
Preferably, the spatial attention calculation formula satisfies:
M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F^s_avg; F^s_max])),
wherein f^{7×7} denotes a convolution operation with a 7×7 convolution kernel, and σ represents the sigmoid function.
Preferably, the performing of the first loss degree calculation on the target prediction result and the target detection label to optimize the target detection head includes:
performing position loss calculation by an L_location_loss function and class loss calculation by an L_classes_loss function, where the L_location_loss function satisfies the following formula:
L_location_loss = 1 - (I_0/(A_p + A_g - I_0) - (A_c - (A_p + A_g - I_0))/A_c),
wherein I_0 represents the intersection of the prediction box and the ground-truth label box; A_p + A_g - I_0 represents the union of the prediction box and the ground-truth label box; A_c represents the minimum bounding rectangle of the prediction box and the ground-truth label box;
the L_classes_loss function satisfies the following formula:
L_classes_loss = -(1/N) Σ_n [y_n·log(x_n) + (1 - y_n)·log(1 - x_n)],
wherein x_n represents the prediction result of the target detection head; y_n represents the ground-truth label; N represents the number of samples.
Preferably, the performing the second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label includes:
performing semantic segmentation loss calculation by an L_seg function, which satisfies the following formula:
L_seg = -α·y·(1 - p)^γ·log(p) - (1 - α)·(1 - y)·p^γ·log(1 - p),
wherein α is used to balance the weights of positive and negative samples; y is the ground truth; p ∈ [0,1] is the predicted probability of the foreground class; the modulating factors (1 - p)^γ and p^γ weight each sample so that hard samples receive higher weights.
Preferably, the performing of the intersection-over-union calculation on the semantic segmentation result and the target detection result, and the retaining of a region whose overlap is larger than a preset value as the final target detection region for output, satisfy the following formula:
I_out = x_n1, if IoU(x_n1, x_n2) > preset value, otherwise x_n1 is discarded,
wherein I_out represents the output result; x_n1 represents the prediction result of the target detection head; x_n2 represents the detection result of the semantic segmentation head.
In a second aspect, the present invention provides a dangerous goods identification device, the device comprising:
the data processing module is used for carrying out target detection labeling on the collected X-ray images to obtain target detection labels; performing semantic segmentation labeling on the X-ray image to obtain a semantic segmentation label;
training module for:
performing feature extraction on the X-ray image by using a backbone network to obtain a first feature map, wherein the convolution layers of the backbone network perform feature extraction by using a channel attention module and a spatial attention module;
performing feature coding on the first feature map and generating, by using a neck network, a second feature map with richer scale information;
the second feature map is respectively input into a target detection head and a semantic segmentation head to obtain a target prediction result and a semantic segmentation prediction result;
performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; performing second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label to optimize a semantic segmentation head;
an inference module for:
inputting an X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting an X-ray image to be identified into an optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of a target;
and performing intersection-over-union calculation on the semantic segmentation result and the target detection result, and retaining a region whose overlap is larger than a preset value as the final target detection region for output.
In a third aspect, the present invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the dangerous goods identification method described above.
The technical scheme provided by the embodiment of the invention has the beneficial effects that:
the dangerous goods identification method provided by the invention utilizes the advantages of target detection, and fuses rich semantic information to correct the target detection result by utilizing the semantic segmentation result. Meanwhile, a channel attention and a spatial attention mechanism are introduced in the feature extraction to optimize the inter-domain difference problem, so that the problems of false recognition and missing recognition caused by the object detection downsampling and the inter-domain difference of an X-ray machine can be reduced, the detection effect is improved, and the recognition precision is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a dangerous goods identification method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of functional modules of the dangerous goods identification device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of the dangerous goods identification method provided by an embodiment of the invention is shown. The dangerous goods identification method is mainly applied to a security inspection system: by processing and identifying the X-ray image, it determines whether the object currently under inspection contains dangerous goods and, if so, the type and position of the dangerous goods.
As shown in fig. 1, the dangerous goods identification method may include the following steps:
s101: in the data processing stage, in the stage, mainly, the collected X-ray image is subjected to target detection labeling to obtain a target detection label, and the X-ray image is subjected to semantic segmentation labeling to obtain a semantic segmentation label, specifically:
the X-ray image can be acquired first, and can be acquired and generated in real time by the X-ray machine or can be stored under a specified directory.
Then, the specific positions of the dangerous goods in the corresponding X-ray image are marked with a MASK according to the category information and position information of the dangerous goods, thereby obtaining the target detection label y1. Meanwhile, the X-ray image is calibrated with different pixel values, so that the categories of the dangerous goods and their specific positions in the corresponding X-ray image are segmented under different pixel values, thereby obtaining the semantic segmentation label y2.
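As an illustration of the pixel-value calibration described above, the following minimal numpy sketch builds a semantic segmentation label y2 in which each dangerous-goods category is encoded by a distinct pixel value. The class names, pixel values, and rectangular regions are hypothetical; the patent does not fix a particular encoding.

```python
import numpy as np

# Hypothetical mapping: each dangerous-goods class gets a distinct pixel value.
CLASS_PIXEL_VALUES = {"background": 0, "knife": 1, "gun": 2}

def make_segmentation_label(shape, regions):
    """Build a semantic segmentation label y2: pixels inside each annotated
    region take the pixel value assigned to that region's class.
    regions: list of (class_name, (row0, row1, col0, col1)) rectangles."""
    y2 = np.zeros(shape, dtype=np.uint8)  # background value everywhere
    for cls, (r0, r1, c0, c1) in regions:
        y2[r0:r1, c0:c1] = CLASS_PIXEL_VALUES[cls]
    return y2
```

For example, `make_segmentation_label((8, 8), [("knife", (2, 5, 2, 6))])` yields a mask whose knife region holds pixel value 1 and whose background holds 0.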
After that, the X-ray image and the corresponding target detection label y1 and semantic segmentation label y2 may be further subjected to enhancement processing, which may include cropping, affine transformation, rotation transformation, and the like; once the processing is complete, the processed X-ray image I, target detection label y1′, and semantic segmentation label y2′ are obtained.
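The enhancement processing above must transform the image and both labels consistently. The following numpy sketch illustrates this with a horizontal flip standing in for the cropping/affine/rotation transforms; the (x1, y1, x2, y2) box convention is an assumption.

```python
import numpy as np

def hflip_sample(image, boxes, mask):
    """Horizontally flip an (H, W) image, its segmentation mask, and its
    detection boxes (x1, y1, x2, y2) together, keeping labels consistent."""
    H, W = image.shape[:2]
    flipped_img = image[:, ::-1].copy()
    flipped_mask = mask[:, ::-1].copy()
    # A box's x-extent [x1, x2] maps to [W - x2, W - x1] under the flip.
    flipped_boxes = [(W - x2, y1, W - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped_img, flipped_boxes, flipped_mask
```

Applying the same geometric transform to I, y1, and y2 in one call is what keeps y1′ and y2′ aligned with the processed image I.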
S102: in the training stage, the YOLOV5 network is mainly used for extracting and training the characteristics of the X-ray image I, and specifically:
(1) The method comprises the steps of performing feature extraction on an X-ray image I by using a backbone network to obtain a first feature map I 0 The convolution layer of the backbone network performs feature extraction by using a channel attention module and a space attention module.
In the invention, because of the problem that the inter-domain offset is easily caused by the internal factors of the X-ray machine such as imaging principle, hardware parameters, machine aging and the like, a channel attention module and a space attention module are introduced in a convolution layer in a feature extraction network for reducing the inter-domain offset so as to optimize feature extraction.
In the channel attention module of the present embodiment, the feature extraction by the convolution layer of the backbone network using the channel attention module may include:
(1) An average pooling operation is performed on the features of the first feature map to obtain an average channel descriptor, denoted F_avg; a maximum pooling operation is performed on the features of the first feature map to obtain a maximum channel descriptor, denoted F_max.
(2) The average channel descriptor and the maximum channel descriptor are input into a shared network to generate a one-dimensional channel attention map M_c ∈ R^{C×1×1}.
Here, the shared network consists of a multi-layer perceptron, and the channel attention calculation formula satisfies:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max))),
wherein W_1 and W_0 respectively represent the two convolution layers of the shared network, and σ represents the sigmoid function.
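A minimal numpy sketch of the channel attention formula above, assuming the shared network is a two-layer MLP with a ReLU hidden layer (the reduction ratio and hidden activation are assumptions not fixed by the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """Channel attention: a shared MLP (W0, W1) is applied to the average- and
    max-pooled channel descriptors, the results are summed and passed through
    a sigmoid. F: (C, H, W); W0: (C // r, C); W1: (C, C // r)."""
    f_avg = F.mean(axis=(1, 2))                   # F_avg^c, shape (C,)
    f_max = F.max(axis=(1, 2))                    # F_max^c, shape (C,)
    mlp = lambda d: W1 @ np.maximum(W0 @ d, 0.0)  # shared MLP, ReLU hidden layer
    Mc = sigmoid(mlp(f_avg) + mlp(f_max))         # 1-D channel attention map, (C,)
    return Mc[:, None, None] * F                  # channel-wise reweighting
```

Because M_c lies in (0, 1), the module rescales each channel rather than replacing it.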
In the spatial attention module of the present embodiment, the feature extraction by the convolution layer of the backbone network using the spatial attention module may include:
(1) carrying out average pooling operation on the features of the first feature map to obtain an average space descriptor; performing maximum pooling operation on the features of the first feature map to obtain a maximum space descriptor;
(2) inputting the average spatial descriptor and the maximum spatial descriptor into a shared network to generate a two-dimensional spatial attention graph.
Here, the spatial attention calculation formula satisfies:
M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F^s_avg; F^s_max])),
wherein f^{7×7} denotes a convolution operation with a 7×7 convolution kernel.
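A corresponding numpy sketch of the spatial attention formula above, using a naive 7×7 'same'-padding convolution over the stacked average- and max-pooled spatial descriptors (the zero padding is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F, kernel):
    """Spatial attention: channel-wise average and max pooling are stacked
    into a 2-channel descriptor and convolved with a k x k kernel, then a
    sigmoid is applied. F: (C, H, W); kernel: (2, k, k), k = 7 in the text."""
    desc = np.stack([F.mean(axis=0), F.max(axis=0)])  # [F_avg^s; F_max^s], (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1], F.shape[2]
    Ms = np.empty((H, W))
    for i in range(H):                                # naive k x k convolution
        for j in range(W):
            Ms[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(Ms) * F                            # spatial reweighting
```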
In this embodiment, features are preferably extracted with the channel attention module first, and then with the spatial attention module.
(2) Feature coding is performed on the first feature map, and the neck network is used to generate a second feature map with richer scale information. That is, a feature encoder first encodes the first feature map I_0 obtained through the channel attention and spatial attention feature extraction, and the neck network then processes the encoded first feature map I_0 to obtain a second feature map I_1 with richer scale information.
(3) The second feature map is input into a target detection head (head_objdet) and a semantic segmentation head (head_segment) respectively to obtain a target prediction result, denoted x_n1, and a semantic segmentation prediction result, denoted x_n2.
(4) Performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; and carrying out second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label so as to optimize a semantic segmentation head.
In this step, the calculating the first loss degree between the target prediction result and the target detection tag to optimize the target detection head includes:
The position loss is calculated by an L_location_loss function, and the class loss is calculated by an L_classes_loss function, where the L_location_loss function satisfies the following formula:
L_location_loss = 1 - (I_0/(A_p + A_g - I_0) - (A_c - (A_p + A_g - I_0))/A_c),
wherein I_0 represents the intersection of the prediction box and the ground-truth label box; A_p + A_g - I_0 represents the union of the prediction box and the ground-truth label box; A_c represents the minimum bounding rectangle of the prediction box and the ground-truth label box;
The L_classes_loss function satisfies the following formula:
L_classes_loss = -(1/N) Σ_n [y_n·log(x_n) + (1 - y_n)·log(1 - x_n)],
wherein x_n represents the prediction result of the target detection head; y_n represents the ground-truth label; N represents the number of samples.
In addition, an L_objectness_loss function may be used to calculate the objectness (confidence) loss; it takes the same form as the class loss:
L_objectness_loss = L_classes_loss,
while the total detection loss over all samples satisfies:
L_obj = L_classes_loss + L_objectness_loss + L_location_loss.
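The detection losses above can be sketched as follows. The GIoU-style position loss follows the formula given; the binary cross-entropy form of the class and objectness losses is an assumption inferred from the symbol definitions (x_n, y_n, N):

```python
import numpy as np

def location_loss(pred, gt):
    """L_location = 1 - (I0/(Ap+Ag-I0) - (Ac-(Ap+Ag-I0))/Ac).
    Boxes are (x1, y1, x2, y2)."""
    area = lambda b: max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    I0 = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)      # intersection
    union = area(pred) + area(gt) - I0                  # A_p + A_g - I_0
    Ac = ((max(pred[2], gt[2]) - min(pred[0], gt[0]))   # minimum bounding rectangle
          * (max(pred[3], gt[3]) - min(pred[1], gt[1])))
    return 1.0 - (I0 / union - (Ac - union) / Ac)

def bce_loss(x, y):
    """Assumed binary cross-entropy form of L_classes_loss / L_objectness_loss."""
    x = np.clip(x, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(x) + (1 - y) * np.log(1 - x)))

def obj_loss(pred_box, gt_box, cls_pred, cls_gt, obj_pred, obj_gt):
    """L_obj = L_classes_loss + L_objectness_loss + L_location_loss."""
    return (bce_loss(cls_pred, cls_gt) + bce_loss(obj_pred, obj_gt)
            + location_loss(pred_box, gt_box))
```

For a prediction box identical to the ground-truth box, the position loss is 0, since IoU = 1 and the enclosing rectangle equals the union.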
In this step, the performing of the second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label includes:
performing semantic segmentation loss calculation by an L_seg function, which satisfies the following formula:
L_seg = -α·y·(1 - p)^γ·log(p) - (1 - α)·(1 - y)·p^γ·log(1 - p),
wherein α is used to balance the weights of positive and negative samples; y is the ground truth; p ∈ [0,1] is the predicted probability of the foreground class; the modulating factors (1 - p)^γ and p^γ weight each sample so that hard samples receive higher weights.
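A sketch of the semantic segmentation loss consistent with the description above (an α-balanced focal loss with modulating factors (1 - p)^γ and p^γ); the exact per-pixel formula is an assumption:

```python
import numpy as np

def focal_seg_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-pixel focal loss: -alpha*y*(1-p)**gamma*log(p)
    - (1-alpha)*(1-y)*p**gamma*log(1-p), averaged over pixels.
    Misclassified (hard) pixels get larger modulation factors, hence
    higher weight; alpha balances positive vs negative samples."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pos = -alpha * y * (1 - p) ** gamma * np.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * np.log(1 - p)
    return float(np.mean(pos + neg))
```

An easy foreground pixel (p close to 1) contributes far less than a hard one (p close to 0), which is exactly the down-weighting of well-classified samples the text describes.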
The total loss calculation satisfies the following formula:
L = L_location_loss + L_objectness_loss + L_classes_loss + L_seg.
s103: in the reasoning stage, in this stage, the X-ray image to be identified may be input into a training-completed network, specifically:
(1) Inputting the X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting the X-ray image to be identified into the optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of the target.
(2) The intersection over union of the semantic segmentation result and the target detection result is calculated, and a region whose overlap is larger than a preset value is retained as the final target detection region and output, satisfying the following formula:
I_out = x_n1, if IoU(x_n1, x_n2) > preset value, otherwise x_n1 is discarded,
wherein I_out represents the output result; x_n1 represents the prediction result of the target detection head; x_n2 represents the detection result of the semantic segmentation head. Here, the preset value is 0.5, i.e., regions whose overlap is larger than 0.5 are retained as the final target detection regions and output.
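The inference-stage filtering above can be sketched as follows, keeping a detection box only when its IoU with some semantic segmentation region exceeds the preset value of 0.5 (the rectangular representation of segmentation regions is a simplifying assumption):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def filter_detections(det_boxes, seg_regions, thresh=0.5):
    """Keep a detection result x_n1 only if it overlaps some semantic
    segmentation region x_n2 with IoU greater than the preset value."""
    return [d for d in det_boxes
            if any(iou(d, s) > thresh for s in seg_regions)]
```

This is the semantic-segmentation correction of the target detection result: boxes unsupported by the segmentation head are discarded.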
By comparison, the existing manual screening method suffers from long training times, high labor costs, and missed and false recognition of dangerous goods due to fatigue; existing deep-learning-based target detection and recognition methods typically require multiple downsampling steps during feature extraction and are prone to missed and false recognition because of inter-domain differences caused by internal factors of X-ray imaging. The dangerous goods identification method provided by the invention exploits the strengths of target detection and fuses rich semantic information by using the semantic segmentation result to correct the target detection result. Meanwhile, channel attention and spatial attention mechanisms are introduced during feature extraction to mitigate the inter-domain difference problem, which reduces the false and missed recognition caused by downsampling in target detection and by inter-domain differences between X-ray machines, thereby improving the detection effect and the recognition accuracy.
Referring to fig. 2, a functional module schematic diagram of the dangerous goods identification device according to an embodiment of the present invention is shown. The dangerous goods identification device 100 of the present embodiment may include a data processing module 11, a training module 12, and an inference module 13, wherein:
the data processing module 11 is used for carrying out target detection labeling on the collected X-ray images to obtain target detection labels; performing semantic segmentation labeling on the X-ray image to obtain a semantic segmentation label;
the training module 12 is configured to:
performing feature extraction on the X-ray image by using a backbone network to obtain a first feature map, wherein the convolution layers of the backbone network perform feature extraction by using a channel attention module and a spatial attention module;
performing feature coding on the first feature map and generating, by using a neck network, a second feature map with richer scale information;
the second feature map is respectively input into a target detection head and a semantic segmentation head to obtain a target prediction result and a semantic segmentation prediction result;
performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; performing second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label to optimize a semantic segmentation head;
the reasoning module 13 is configured to:
inputting an X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting an X-ray image to be identified into an optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of a target;
and performing intersection-over-union calculation on the semantic segmentation result and the target detection result, and retaining a region whose overlap is larger than a preset value as the final target detection region for output.
It should be understood that after the corresponding modules execute the corresponding functions, the effect achieved may be the same as the aforementioned dangerous goods identification method, so that the description thereof is omitted herein.
The invention provides a computer device, which comprises a processor, wherein the processor is used for realizing the steps in the dangerous goods identification method when executing a computer program stored in a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor; it is the control center of the computer device, connecting the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function, and the like, and the data storage area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
In addition, the invention also provides a storage medium, on which a computer program is stored, which when being executed by a processor, realizes the steps in the dangerous goods identification method.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to be included within its scope.
Claims (10)
1. A dangerous goods identification method applied to a security inspection system, characterized by comprising the following steps:
a data processing stage:
performing target detection labeling on the collected X-ray image to obtain a target detection label; performing semantic segmentation labeling on the X-ray image to obtain a semantic segmentation label;
training phase:
performing feature extraction on the X-ray image by using a backbone network to obtain a first feature map, wherein a convolution layer of the backbone network performs feature extraction by using a channel attention module and a spatial attention module;
performing feature encoding on the first feature map by using a neck network to generate a second feature map with richer scale information;
the second feature map is respectively input into a target detection head and a semantic segmentation head to obtain a target prediction result and a semantic segmentation prediction result;
performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; performing second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label to optimize a semantic segmentation head;
reasoning:
inputting an X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting an X-ray image to be identified into an optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of a target;
and performing intersection-over-union calculation on the semantic segmentation result and the target detection result, and retaining a region whose overlap is larger than a preset value as the final target detection region for output.
2. The method of claim 1, wherein the feature extraction by the convolution layer of the backbone network using a channel attention module comprises:
carrying out average pooling operation on the features of the first feature map to obtain an average channel descriptor; performing maximum pooling operation on the features of the first feature map to obtain a maximum channel descriptor;
the average channel descriptor and the maximum channel descriptor are input into a shared network to generate a one-dimensional channel attention map.
3. The dangerous goods recognition method according to claim 2, wherein the channel attention calculation formula satisfies:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
= σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max))),
wherein W_1 and W_0 represent the two convolution layers of the shared network, respectively, and σ represents the sigmoid function.
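The channel attention of claims 2 and 3 can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the shared network is modeled as a two-layer MLP with a ReLU between the layers (a common CBAM-style choice the claim does not spell out), and feature maps are plain nested lists.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w0, w1):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).

    feat: list of C channels, each an H x W list of lists
    w0:   first shared-MLP layer, a (C//r) x C matrix (r = reduction ratio)
    w1:   second shared-MLP layer, a C x (C//r) matrix
    Returns one attention weight in (0, 1) per channel.
    """
    # Average and maximum channel descriptors (global pooling over H x W).
    avg_desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    max_desc = [max(max(row) for row in ch) for ch in feat]

    def mlp(d):
        # Shared two-layer network W1(ReLU(W0(d))) applied to both descriptors.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, d))) for row in w0]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w1]

    return [sigmoid(a + m) for a, m in zip(mlp(avg_desc), mlp(max_desc))]

# Toy example: 2 channels, 2x2 spatial grid, reduction ratio r = 2.
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
w0 = [[0.5, 0.5]]        # 1 x 2
w1 = [[1.0], [-1.0]]     # 2 x 1
attn = channel_attention(feat, w0, w1)
print(attn)  # one weight in (0, 1) per channel
```

In a real network the resulting vector would be broadcast-multiplied over the feature map before the spatial attention step.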
4. The method of claim 1, wherein the feature extraction by the convolution layer of the backbone network using a spatial attention module comprises:
carrying out average pooling operation on the features of the first feature map to obtain an average space descriptor; performing maximum pooling operation on the features of the first feature map to obtain a maximum space descriptor;
the average spatial descriptor and the maximum spatial descriptor are input into a shared network to generate a two-dimensional spatial attention graph.
5. The method for identifying dangerous goods according to claim 4, wherein the spatial attention calculation formula satisfies:
M_s(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)]))
= σ(f^(7×7)([F^s_avg; F^s_max])),
wherein f^(7×7) represents a convolution operation with a 7×7 convolution kernel, and σ represents the sigmoid function.
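Likewise, the spatial attention of claims 4 and 5 can be sketched as below. The two pooled maps (average and maximum over channels) are stacked and convolved into a single-channel attention map; a 3×3 kernel with zero padding is used here to keep the toy example small, whereas claim 5 specifies a 7×7 kernel.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(feat, kernel, k=3):
    """M_s(F) = sigmoid(f([AvgPool(F); MaxPool(F)])) at each spatial position.

    feat:   list of C channels, each an H x W list of lists
    kernel: k x k grid of [w_avg, w_max] weight pairs for the two pooled maps
    Returns an H x W attention map with values in (0, 1).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Average and maximum spatial descriptors (pooling across the channel axis).
    avg_map = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:  # zero padding at the border
                        s += kernel[di + pad][dj + pad][0] * avg_map[ii][jj]
                        s += kernel[di + pad][dj + pad][1] * max_map[ii][jj]
            out[i][j] = sigmoid(s)
    return out

feat = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 2.0], [2.0, 0.0]]]
kernel = [[[0.1, 0.1]] * 3 for _ in range(3)]  # uniform 3x3 kernel over both maps
attn = spatial_attention(feat, kernel)
print(len(attn), len(attn[0]))
```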
6. The method of any one of claims 1 to 5, wherein the performing a first loss degree calculation on the target prediction result and the target detection tag to optimize a target detection head includes:
performing position loss calculation by the L_locationloss function and class loss calculation by the L_classesloss function, wherein the L_locationloss function satisfies the following formula:
L_locationloss = 1 − (I_0/(A_p + A_g − I_0) − (A_c − (A_p + A_g − I_0))/A_c),
wherein I_0 represents the intersection of the prediction box and the real label box; A_p + A_g − I_0 represents their union (the sum of the areas of the prediction box and the real label box minus their intersection); and A_c represents the minimum bounding rectangle of the prediction box and the real label box;
the L_classesloss function satisfies the following formula:
L_classesloss = −(1/N) Σ_{n=1..N} [y_n·log(x_n) + (1 − y_n)·log(1 − x_n)],
wherein x_n represents the prediction result of the target detection head; y_n represents the real label; N represents the sample size.
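A minimal sketch of the two losses in claim 6, assuming axis-aligned boxes in (x1, y1, x2, y2) form: the location loss follows the GIoU-style formula above, and the class loss is the binary cross-entropy implied by the variables x_n, y_n, and N.

```python
import math

def location_loss(pred, gt):
    """GIoU-style location loss: L = 1 - (IoU - (A_c - union) / A_c).

    pred, gt: axis-aligned boxes (x1, y1, x2, y2); A_c is the smallest
    rectangle enclosing both boxes.
    """
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)      # I_0
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(pred) + area(gt) - inter                  # A_p + A_g - I_0
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    a_c = (cx2 - cx1) * (cy2 - cy1)                        # A_c
    return 1.0 - (inter / union - (a_c - union) / a_c)

def classes_loss(preds, labels):
    """Binary cross-entropy over N samples (x_n: predicted prob., y_n: 0/1 label)."""
    n = len(preds)
    return -sum(y * math.log(x) + (1 - y) * math.log(1 - x)
                for x, y in zip(preds, labels)) / n

# Identical boxes give zero location loss; confident correct predictions
# give a lower class loss than uncertain ones.
print(location_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # 0.0
print(classes_loss([0.9, 0.1], [1, 0]))
```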
7. The method of any one of claims 1 to 5, wherein the performing a second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label includes:
performing semantic segmentation loss calculation by the L_seg function, which satisfies the following formula:
L_seg = −α·(1 − p)^r·log(p) when y = 1; L_seg = −(1 − α)·p^r·log(1 − p) when y = 0,
wherein α balances the weights of positive and negative samples, y is the ground truth, p ∈ [0, 1] is the predicted probability of the foreground class, and the modulation factors (1 − p)^r and p^r adjust the weight of each sample so that hard samples receive higher weights.
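The per-sample segmentation loss of claim 7 can be sketched as a focal-style loss; the default values alpha = 0.25 and r = 2 below are illustrative assumptions, not values fixed by the claim.

```python
import math

def focal_loss(p, y, alpha=0.25, r=2.0):
    """Focal-style segmentation loss for one pixel.

    p: predicted foreground probability in (0, 1); y: ground-truth label (0 or 1).
    alpha balances positive/negative samples; the factors (1 - p)^r and p^r
    down-weight easy samples so hard samples contribute more to the loss.
    """
    if y == 1:
        return -alpha * (1.0 - p) ** r * math.log(p)
    return -(1.0 - alpha) * p ** r * math.log(1.0 - p)

# A confidently correct pixel contributes far less than an uncertain one.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
print(easy < hard)  # True
```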
8. The dangerous goods identification method according to any one of claims 1 to 5, wherein the intersection-over-union calculation is performed on the semantic segmentation result and the target detection result, and a region whose overlap is larger than a preset value is retained as the final target detection region and output, satisfying the following formula:
I_out = (x_n1 ∩ x_n2)/(x_n1 ∪ x_n2),
wherein I_out represents the output result; x_n1 represents the prediction result of the target detection head; x_n2 represents the detection result of the semantic segmentation head.
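The claim-8 filtering step can be sketched as follows, assuming the segmentation regions have been reduced to bounding boxes so that the overlap reduces to a box IoU; the 0.5 threshold is an illustrative preset value.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(det_boxes, seg_boxes, threshold=0.5):
    """Keep detection-head boxes whose overlap with any segmentation region
    exceeds the preset threshold; the rest are rejected as likely false alarms."""
    return [d for d in det_boxes
            if any(box_iou(d, s) > threshold for s in seg_boxes)]

det = [(0, 0, 10, 10), (50, 50, 60, 60)]  # detection-head results
seg = [(1, 1, 10, 10)]                    # boxes of segmented target regions
print(filter_detections(det, seg))  # [(0, 0, 10, 10)]
```

Requiring the two heads to agree in this way is what suppresses single-head false positives in cluttered X-ray imagery.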
9. A dangerous goods identification device, characterized in that the device comprises:
the data processing module is used for carrying out target detection labeling on the collected X-ray images so as to obtain target detection labels; performing semantic segmentation labeling on the X-ray image to obtain a semantic segmentation label;
a training module, the training module being configured to:
perform feature extraction on the X-ray image by using a backbone network to obtain a first feature map, wherein a convolution layer of the backbone network performs feature extraction by using a channel attention module and a spatial attention module;
perform feature encoding on the first feature map by using a neck network to generate a second feature map with richer scale information;
the second feature map is respectively input into a target detection head and a semantic segmentation head to obtain a target prediction result and a semantic segmentation prediction result;
performing first loss degree calculation on the target prediction result and the target detection label to optimize a target detection head; performing second loss degree calculation on the semantic segmentation prediction result and the semantic segmentation label to optimize a semantic segmentation head;
an inference module, the inference module being configured to:
inputting an X-ray image to be identified into the optimized semantic segmentation head to obtain a semantic segmentation result, wherein the semantic segmentation result comprises target areas of different types of targets; inputting an X-ray image to be identified into an optimized target detection head to obtain a target detection result, wherein the target detection result comprises classification information and position information of a target;
and perform intersection-over-union calculation on the semantic segmentation result and the target detection result, and retain a region whose overlap is larger than a preset value as the final target detection region for output.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the dangerous goods identification method as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211278338.4A CN117132875A (en) | 2022-10-19 | 2022-10-19 | Dangerous goods identification method, dangerous goods identification device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117132875A | 2023-11-28 |
Family
ID=88855216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211278338.4A Pending CN117132875A (en) | 2022-10-19 | 2022-10-19 | Dangerous goods identification method, dangerous goods identification device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117132875A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||