CN114445363A - System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model - Google Patents


Info

Publication number
CN114445363A
Authority
CN
China
Prior art keywords
attention
image
dangerous goods
module
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210087402.4A
Other languages
Chinese (zh)
Inventor
常青青
李维姣
沈天明
刘伟豪
欧阳光
姚鸿达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Third Research Institute of the Ministry of Public Security
Original Assignee
Third Research Institute of the Ministry of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Third Research Institute of the Ministry of Public Security filed Critical Third Research Institute of the Ministry of Public Security
Priority to CN202210087402.4A priority Critical patent/CN114445363A/en
Publication of CN114445363A publication Critical patent/CN114445363A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10116 X-ray image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a system for realizing dangerous goods identification based on a multi-modal data attention model. The system comprises a feature extraction processing module for extracting image feature vectors from an input perspective image and backscatter image; an attention fusion processing module for performing attention fusion processing on the extracted image features; and a recognition output module for performing training processing of image classification, type detection, pixel segmentation and effective atomic number mapping on the obtained images according to the specific recognition task. The invention also relates to a corresponding method, device, processor and storage medium. By adopting the system, method, device, processor and storage medium for realizing dangerous goods identification based on the multi-modal data attention model, not only can dangerous goods with distinctive shape characteristics, such as guns, knives and lighters in luggage packages, be identified, but also the material composition of liquids in containers can be extracted, realizing the identification of dangerous goods such as flammable liquids without opening the luggage.

Description

System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model
Technical Field
The invention relates to the technical field of computer vision, in particular to the technical field of image recognition, and specifically relates to a system, a method, a device, a processor and a computer readable storage medium for realizing dangerous goods recognition based on a multi-mode data attention model.
Background
X-ray inspection equipment is among the most widely used equipment in the security inspection field. Traditional X-ray equipment mostly uses transmission (perspective) imaging to perform non-contact imaging of the luggage carried by passengers, and whether dangerous goods are present is checked quickly by reading the images. The recognizable categories are limited, the level of intelligence is low, and the process depends heavily on the image-reading skill of the operators, which affects both security inspection efficiency and the accuracy of dangerous goods identification. With the development of new technologies such as big data and deep learning, some X-ray security inspection equipment learns the shapes of objects in X-ray images by embedding a deep-learning-based image recognition module, and to a certain extent identifies dangerous goods with obvious shape characteristics such as guns and knives; however, this technology has no capability to distinguish materials and cannot identify liquid dangerous goods such as gasoline and alcohol in luggage. X-ray backscatter imaging is another detection imaging technology; it is sensitive to low-effective-atomic-number, high-density organic matter, can highlight dangerous goods such as flammable and explosive articles, and can further improve inspection accuracy.
In deep-learning-based dangerous goods identification, convolutional neural networks (CNNs) are widely used as feature extractors for X-ray images. For multi-modal data fusion, a multi-stream deep architecture can be built by feeding in the perspective and backscatter images of the same piece of luggage, but this approach has two disadvantages: first, the features of the different modalities are extracted independently, so important high-level features shared between the two modalities are missed; second, simply concatenating or combining the separately extracted features introduces redundant information, which makes the system prone to overfitting.
Disclosure of Invention
The present invention is directed to overcoming the above-mentioned disadvantages of the prior art, and providing a system, method, apparatus, processor and computer readable storage medium thereof for implementing dangerous goods identification based on multi-modal data attention model, which can implement cross-attention fusion feature extraction.
In order to achieve the above objects, the system, method, apparatus, processor and computer readable storage medium thereof for realizing identification of dangerous goods based on multi-modal data attention model of the present invention are as follows:
the system for realizing dangerous goods identification based on the multi-mode data attention model is mainly characterized by comprising the following components:
the characteristic extraction processing module is used for extracting image characteristic vectors from the input perspective image and the backscattering image through a basic neural network;
the attention fusion processing module is connected with the feature extraction processing module and is used for carrying out fusion processing on the self-attention and the cross-attention of each image feature acquired in a plurality of extraction stages to obtain a fusion image feature; and
and the recognition output module is connected with the attention fusion processing module and is used for performing training processing of image classification, type detection, pixel segmentation and effective atomic number mapping on the obtained images according to the specific recognition task.
Preferably, the feature extraction processing module specifically includes:
carrying out feature extraction processing on the high-energy image training sample T_H and the low-energy image training sample T_L of the X-ray perspective image, and on the backscatter image.
Preferably, the attention fusion processing module includes:
the self-attention processing unit is used for generating a space attention mask for the perspective image processed by the convolutional neural network model and generating a substance attention mask for the backscatter image;
the cross attention processing unit is connected with the self attention processing unit and is used for performing enhancement processing on the material characteristic by using the image characteristic;
and the modality fusion extraction unit is connected with the self-attention processing unit and the cross-attention processing unit and is used for performing output combination processing on the perspective image and the backscatter image so as to highlight important parts of two modalities.
Preferably, the identification output module includes:
the detection submodule is connected with the attention fusion processing module and used for detecting and identifying the interested object in the image and outputting a bounding box of the interested object;
the segmentation sub-module is connected with the attention fusion processing module and is used for identifying and processing the pixels in the image according to the corresponding object types; and
and the atomic number regression submodule is connected with the attention fusion processing module and is used for estimating the equivalent atomic number of the scanned object pixel by pixel.
The method for realizing the identification of the dangerous goods based on the multi-mode data attention model by utilizing the system is mainly characterized by comprising the following steps of:
(1) the basic neural network carries out feature extraction processing on the input original perspective image and the back scattering image;
(2) performing multi-mode information fusion processing based on an attention mechanism;
(3) and performing network training and reasoning processing of multi-task joint learning on the acquired images.
Preferably, the step (1) comprises the following steps:
(1.1) carrying out feature extraction on the input original perspective image and the input back scattering image through a basic neural network;
(1.2) inputting the perspective image and the backscatter image collected by the security inspection machine, each with height W1, width H1 and feature dimension D1, and outputting a first feature tensor of size W1 × H1 × D1.
Preferably, in a specific implementation the step (1.1) adopts a deep convolutional neural network composed of a plurality of convolution layers and pooling layers; the output of each convolution operation is batch-normalized, and residual connections between a plurality of different stages make gradient propagation smoother during training, so that the network is easier to train.
Preferably, the step (2) comprises:
(2.1) fusing the features of the perspective and backscatter images through a plurality of cascaded, interleaved self-attention and cross-attention layers to obtain fused multi-modal features;
(2.2) separately inputting each modal feature, with height W2, width H2 and feature dimension D2, and outputting a second feature tensor of size W2 × H2 × D2 after processing and fusion by a plurality of attention layers.
Preferably, the step (3) specifically includes the following steps:
(3.1) the detection submodule, modeled on the Region Proposal Network and the ROIAlign layer, detects the dangerous goods and outputs bounding box information for the categories of interest;
(3.2) the segmentation submodule realizes pixel-level segmentation and extracts the edge of the dangerous goods through a deconvolution layer and an up-sampling layer;
and (3.3) the effective atomic number regression module generates an effective atomic number map corresponding to the luggage package, which is used to assist in identifying dangerous goods contained in the luggage.
Preferably, on the manual labeling data set, the feature extraction processing module, the attention fusion module and the recognition output module are subjected to multi-task training according to a joint loss function of a plurality of final tasks through a labeled training set, so as to obtain better network parameters.
The device for realizing dangerous goods identification based on the multi-mode data attention model is mainly characterized by comprising the following components:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions that, when executed by the processor, perform the steps of the above-described method for multi-modal data attention model-based threat identification.
The processor for realizing dangerous goods identification based on the multi-modal data attention model is mainly characterized in that the processor is configured to execute computer executable instructions, and when the computer executable instructions are executed by the processor, the steps of the dangerous goods identification method based on the multi-modal data attention model are realized.
The computer-readable storage medium is mainly characterized by having a computer program stored thereon, wherein the computer program is executable by a processor to implement the steps of the above-mentioned method for identifying a hazardous material based on a multi-modal data attention model.
By adopting the system, method, device, processor and computer readable storage medium for realizing dangerous goods identification based on the multi-modal data attention model, an attention model over multi-modal data is built on X-ray dual-energy perspective and backscatter image data: a convolutional neural network (CNN) extracts the spatial features of the X-ray image, and material features are extracted and fused through a cross-attention mechanism. As a result, not only can dangerous goods with shape characteristics, such as guns, knives and lighters in luggage packages, be identified, but also the material composition of liquids in containers can be extracted, realizing the identification of dangerous goods such as flammable liquids without opening the luggage and markedly improving performance in visual reasoning tasks.
Drawings
Fig. 1 is a block diagram of a system for identifying dangerous goods based on a multi-modal data attention model according to the present invention.
FIG. 2 is a flow chart of a method of implementing multi-modal data attention model-based threat identification in accordance with the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.
Before describing in detail embodiments that are in accordance with the present invention, it should be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, the system for identifying dangerous goods based on multi-modal data attention model includes:
the characteristic extraction processing module is used for extracting image characteristic vectors from the input perspective image and the backscattering image through a basic neural network;
the attention fusion processing module is connected with the feature extraction processing module and is used for carrying out fusion processing on the self-attention and the cross-attention of each image feature acquired in a plurality of extraction stages to obtain a fusion image feature; and
and the recognition output module is connected with the attention fusion processing module and is used for performing training processing of image classification, type detection, pixel segmentation and effective atomic number mapping on the obtained images according to the specific recognition task.
As a preferred embodiment of the present invention, the feature extraction processing module specifically includes:
carrying out feature extraction processing on the high-energy image training sample T_H and the low-energy image training sample T_L of the X-ray perspective image, and on the backscatter image.
As a preferred embodiment of the present invention, the attention fusion processing module includes:
the self-attention processing unit is used for generating a space attention mask for the perspective image processed by the convolutional neural network model and generating a substance attention mask for the backscatter image;
the cross attention processing unit is connected with the self attention processing unit and is used for performing enhancement processing on the material characteristic by using the image characteristic;
and the modality fusion extraction unit is connected with the self-attention processing unit and the cross-attention processing unit and is used for performing output combination processing on the perspective image and the backscatter image so as to highlight important parts of two modalities.
As a preferred embodiment of the present invention, the identification output module includes:
the detection submodule is connected with the attention fusion processing module and used for detecting and identifying the interested object in the image and outputting a bounding box of the interested object;
the segmentation sub-module is connected with the attention fusion processing module and is used for identifying and processing the pixels in the image according to the corresponding object types; and
and the atomic number regression submodule is connected with the attention fusion processing module and is used for estimating the equivalent atomic number of the scanned object pixel by pixel.
The method for realizing the identification of the dangerous goods based on the multi-modal data attention model by using the system comprises the following steps:
(1) the basic neural network carries out feature extraction processing on the input original perspective image and the back scattering image;
(2) performing multi-mode information fusion processing based on an attention mechanism;
(3) and performing network training and reasoning processing of multi-task joint learning on the acquired images.
As a preferred embodiment of the present invention, the step (1) comprises the steps of:
(1.1) carrying out feature extraction on the input original perspective image and the input back scattering image through a basic neural network;
(1.2) inputting the perspective image and the backscatter image collected by the security inspection machine, each with height W1, width H1 and feature dimension D1, and outputting a first feature tensor of size W1 × H1 × D1.
As a preferred embodiment of the invention, in a specific implementation the step (1.1) adopts a deep convolutional neural network composed of a plurality of convolution layers and pooling layers; the output of each convolution operation is batch-normalized, and residual connections between a plurality of different stages make gradient propagation smoother during training, so that the network is easier to train.
As a preferred embodiment of the present invention, the step (2) comprises:
(2.1) fusing the features of the perspective and backscatter images through a plurality of cascaded, interleaved self-attention and cross-attention layers to obtain fused multi-modal features;
(2.2) separately inputting each modal feature, with height W2, width H2 and feature dimension D2, and outputting a second feature tensor of size W2 × H2 × D2 after processing and fusion by a plurality of attention layers.
As a preferred embodiment of the present invention, the step (3) specifically comprises the following steps:
(3.1) the detection submodule, modeled on the Region Proposal Network and the ROIAlign layer, detects the dangerous goods and outputs bounding box information for the categories of interest;
(3.2) the segmentation submodule realizes pixel-level segmentation and extracts the edge of the dangerous goods through a deconvolution layer and an up-sampling layer;
and (3.3) the effective atomic number regression module generates an effective atomic number map corresponding to the luggage package, which is used to assist in identifying dangerous goods contained in the luggage.
As a preferred embodiment of the invention, on a manually labeled data set, the feature extraction processing module, the attention fusion module and the recognition output module are subjected to multi-task training according to a joint loss function of a plurality of final tasks through a labeled training set so as to obtain better network parameters.
The device for realizing dangerous goods identification based on the multi-modal data attention model comprises:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions that, when executed by the processor, perform the steps of the above-described method for multi-modal data attention model-based threat identification.
The processor for identifying dangerous goods based on the multi-modal data attention model is configured to execute computer executable instructions, and when the computer executable instructions are executed by the processor, the steps of the method for identifying dangerous goods based on the multi-modal data attention model are realized.
The computer-readable storage medium has stored thereon a computer program which is executable by a processor to implement the steps of the above-described method for identifying a hazardous material based on a multimodal data attention model.
In practical application, the feature extraction processing module of the invention has the following main functions:
the module has the main functions of extracting characteristic opening vectors from perspective images and back scattering images respectively, inputting the characteristic opening vectors into original perspective images and back scattering images respectively, ensuring consistent resolution ratio and ensuring one-to-one correspondence relationship. To accommodate different input image resolutions, a full convolution neural network (FCN) is employed, consisting of several layers of convolution and pooling layers. The output of each convolution operation is standardized in batch, and residual connection among a plurality of different stages ensures that gradient transfer is smoother during training, so that the training is easier.
The weights of this part can usually be initialized directly from an existing pretrained network and fine-tuned on X-ray image training data. If enough labeled X-ray data is available, the weights can also be randomly initialized and the network trained from scratch.
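For illustration only, a minimal PyTorch-style sketch of such a fully convolutional backbone is given below. The stage widths, the use of max pooling, and the names ConvStage and FCNBackbone are assumptions made for this example, not details taken from the patent.

    import torch
    import torch.nn as nn

    class ConvStage(nn.Module):
        """One backbone stage: two convolutions with batch normalization, a residual shortcut, then pooling."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),          # batch-normalize each convolution output
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
            )
            self.shortcut = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 projection so the residual add matches channels
            self.pool = nn.MaxPool2d(2)

        def forward(self, x):
            # residual connection between stages keeps gradient propagation smooth during training
            return self.pool(torch.relu(self.body(x) + self.shortcut(x)))

    class FCNBackbone(nn.Module):
        """Fully convolutional feature extractor; one instance per modality."""
        def __init__(self, in_ch, dims=(32, 64, 128)):
            super().__init__()
            self.stages = nn.ModuleList()
            for d in dims:
                self.stages.append(ConvStage(in_ch, d))
                in_ch = d

        def forward(self, x):
            feats = []   # per-stage feature maps, later consumed by the attention fusion
            for stage in self.stages:
                x = stage(x)
                feats.append(x)
            return feats

    # one backbone per modality: perspective (assumed 2 channels for the high/low-energy images) and backscatter
    perspective_net, backscatter_net = FCNBackbone(in_ch=2), FCNBackbone(in_ch=1)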
In practical application, the multi-modal information fusion based on attention mechanism of the invention has the following main functions:
the part is the core of the technical scheme, and the information of the perspective and back scattering images is fused through the attention layers of a plurality of stages to obtain the fused characteristics. Similar to the neural message posting, a fused feature is obtained by stacking the self-attention (self-attention) and cross-attention (cross-attention) of multiple stages, specifically, the self-attention layer for the feature T
SA(T)=SA(QT,KT,VT)=softmax(QT×KT).dot(VT)
For a cross anchorage layer,
CA(T1,T2)=CT(QT2,KT1,VT1)=softmax(QT2×KT1).dot(VT1)
the complete computation also includes a residual connection, then the complete input and output relationships from the attention layer are:
SA(T)=MLP(BatchNorm(SA(T)+T
the input and output relationship of the mutual attention layer is as follows:
CT(T1,T2)=MLP(BatchNorm(CA(T1,T2)+T2
the input of the module is W multiplied by H multiplied by D tenor, the output is W multiplied by H multiplied by D tenor, and the dimension is guaranteed to be unchanged. However, in the process, the features of different positions of the image and the features of different modes are gradually integrated by mutual attention, so that the feature fusion is completed.
In practical application, the main functions of the multi-task joint learning of the invention are as follows:
the multi-task learning is proved to be capable of improving the effect of a single task by using the commonality between different tasks. And the computation complexity is reduced by sharing the computation by a plurality of tasks. In the invention, the multi-mode characteristics can be fused in the multi-task learning, the characteristics of different modes for different tasks are fully utilized, and the effect is improved. For this purpose, at this stage, different tasks are performed by means of multi-task learning via a plurality of different branches and are trained simultaneously.
The first branch can follow the Region Proposal Network and the ROIAlign layer to detect objects such as guns, knives and flammable liquids in the image and output information such as bounding boxes for the categories of interest.
The second branch simultaneously performs pixel-level segmentation, separating dangerous goods from the background, which facilitates visual display.
The third branch directly regresses the effective atomic number to generate an effective atomic number map of the luggage package, which is used to identify dangerous goods such as gasoline and explosives in the package.
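As a rough illustration (not the patented network), the segmentation and atomic-number branches could be sketched as below; the detection branch would reuse a standard Region Proposal Network plus ROIAlign pipeline and is omitted here for brevity. Channel counts, upsampling factors and the class number are assumed values.

    import torch.nn as nn

    class SegmentationHead(nn.Module):
        """Pixel-level segmentation branch: deconvolution plus up-sampling back toward input resolution."""
        def __init__(self, in_ch, num_classes):
            super().__init__()
            self.layers = nn.Sequential(
                nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2),      # deconvolution layer
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),   # up-sampling layer
                nn.Conv2d(in_ch // 2, num_classes, kernel_size=1),                   # per-pixel class logits
            )

        def forward(self, fused):
            return self.layers(fused)

    class AtomicNumberHead(nn.Module):
        """Effective atomic number branch: per-pixel regression producing a Z_eff map of the luggage."""
        def __init__(self, in_ch):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
                nn.Conv2d(in_ch, 1, kernel_size=1),                                  # one regressed value per pixel
            )

        def forward(self, fused):
            return self.layers(fused)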
Throughout the training process, the loss function is the sum of the losses of all branches, i.e. the object class loss, the mask loss and so on.
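A minimal sketch of such a joint loss, with hypothetical per-branch loss terms and illustrative weights, is shown below; the detection classification and box losses are assumed to be produced by the RPN/ROI heads.

    import torch.nn.functional as F

    def joint_loss(det_cls_loss, det_box_loss, seg_logits, seg_target, z_pred, z_target,
                   w_det=1.0, w_seg=1.0, w_z=1.0):
        """Sum of the per-branch losses used for multi-task training."""
        seg_loss = F.cross_entropy(seg_logits, seg_target)     # pixel-wise segmentation loss
        z_loss = F.smooth_l1_loss(z_pred, z_target)            # effective atomic number regression loss
        return w_det * (det_cls_loss + det_box_loss) + w_seg * seg_loss + w_z * z_loss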
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, and the program may be stored in a computer readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of terms "an embodiment," "some embodiments," "an example," "a specific example," or "an embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
By adopting the system, method, device, processor and computer readable storage medium for realizing dangerous goods identification based on the multi-modal data attention model, an attention model over multi-modal data is built on X-ray dual-energy perspective and backscatter image data: a convolutional neural network (CNN) extracts the spatial features of the X-ray image, and material features are extracted and fused through a cross-attention mechanism. As a result, not only can dangerous goods with shape characteristics, such as guns, knives and lighters in luggage packages, be identified, but also the material composition of liquids in containers can be extracted, realizing the identification of dangerous goods such as flammable liquids without opening the luggage and markedly improving performance in visual reasoning tasks.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (13)

1. A system for identifying hazardous materials based on a multi-modal data attention model, the system comprising:
the characteristic extraction processing module is used for extracting image characteristic vectors from the input perspective image and the backscattering image through a basic neural network;
the attention fusion processing module is connected with the feature extraction processing module and is used for carrying out fusion processing on the self-attention and the cross-attention of each image feature acquired in a plurality of extraction stages to obtain a fusion image feature; and
and the recognition output module is connected with the attention fusion processing module and is used for performing training processing of image classification, type detection, pixel segmentation and effective atomic number mapping on the obtained images according to the specific recognition task.
2. The system for realizing dangerous goods identification based on multi-modal data attention model according to claim 1, wherein the feature extraction processing module is specifically:
carrying out feature extraction processing on the high-energy image training sample T_H and the low-energy image training sample T_L of the X-ray perspective image, and on the backscatter image.
3. The system for performing threat identification based on multimodal data attention models of claim 2, wherein the attention fusion processing module comprises:
the self-attention processing unit is used for generating a space attention mask for the perspective image processed by the convolutional neural network model and generating a substance attention mask for the backscatter image;
the cross attention processing unit is connected with the self attention processing unit and is used for performing enhancement processing on the material characteristics by using the image characteristics;
and the modality fusion extraction unit is connected with the self-attention processing unit and the cross-attention processing unit and is used for performing output combination processing on the perspective image and the backscatter image so as to highlight important parts of two modalities.
4. The system for performing threat identification according to claim 3, wherein the identification output module comprises:
the detection sub-module is connected with the attention fusion processing module and is used for detecting and identifying the interested object in the image and outputting a bounding box of the interested object;
the segmentation sub-module is connected with the attention fusion processing module and is used for identifying and processing the pixels in the image according to the corresponding object types; and
and the atomic number regression submodule is connected with the attention fusion processing module and is used for estimating the equivalent atomic number of the scanned object pixel by pixel.
5. A method for performing a multi-modal data attention model based threat identification using the system of claims 1-4, the method comprising the steps of:
(1) the basic neural network carries out feature extraction processing on the input original perspective image and the back scattering image;
(2) performing multi-mode information fusion processing based on an attention mechanism;
(3) and performing network training and reasoning processing of multi-task joint learning on the acquired images.
6. The method for realizing the identification of the dangerous goods based on the multi-modal data attention model according to claim 5, wherein the step (1) comprises the following steps:
(1.1) carrying out feature extraction on the input original perspective image and the input back scattering image through a basic neural network;
(1.2) inputting the perspective image and the backscatter image collected by the security inspection machine, each with height W1, width H1 and feature dimension D1, and outputting a first feature tensor of size W1 × H1 × D1.
7. The method for realizing the identification of dangerous goods based on the multi-modal data attention model as claimed in claim 6, wherein the step (1.1) is implemented by using a deep convolutional neural network, which is composed of a plurality of layers of convolution and a pooling layer, the output of each convolution operation is standardized in batch, and the residual connection between a plurality of different stages ensures that the gradient transmission is smoother during training, thereby being easier to train.
8. The method for performing multi-modal data attention model-based threat identification as claimed in claim 6, wherein said step (2) comprises:
(2.1) fusing the characteristics of the perspective and backscatter images through a plurality of cascade-crossed self-attention layers and mutual-attention layers to obtain fused multi-modal characteristics;
(2.2) separately inputting each modal feature, with height W2, width H2 and feature dimension D2, and outputting a second feature tensor of size W2 × H2 × D2 after processing and fusion by a plurality of attention layers.
9. The method for realizing the identification of the dangerous goods based on the multi-modal data attention model according to claim 8, wherein the step (3) comprises the following steps:
(3.1) the detection submodule refers to the Region Proposal Network and the ROIAlign layer to detect the dangerous goods and output the bounding box information of the interested category;
(3.2) the segmentation submodule realizes pixel-level segmentation and extracts the edge of the dangerous goods through a deconvolution layer and an up-sampling layer;
and (3.3) generating an effective atomic number mapping map corresponding to the luggage package by the effective atomic number regression module, wherein the effective atomic number mapping map is used for assisting in identifying dangerous goods contained in the luggage package.
10. The method for realizing dangerous goods identification based on multi-modal data attention model according to claim 9, characterized in that on the manually labeled data set, the feature extraction processing module, the attention fusion module and the identification output module are multi-tasked trained according to the joint loss function of the final multiple tasks through the labeled training set to obtain better network parameters.
11. An apparatus for realizing dangerous goods identification based on a multi-modal data attention model, which is characterized in that the apparatus comprises:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions that, when executed by the processor, implement the steps of the method for realizing dangerous goods identification based on the multi-modal data attention model according to claim 10.
12. A processor for performing threat identification based on a multimodal data attention model, wherein the processor is configured to execute computer-executable instructions that, when executed by the processor, perform the steps of the method for threat identification based on a multimodal data attention model of claim 10.
13. A computer-readable storage medium, having stored thereon a computer program which is executable by a processor for performing the steps of the method for threat identification based on multimodal data attention model as claimed in claim 10.
CN202210087402.4A 2022-01-25 2022-01-25 System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model Pending CN114445363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087402.4A CN114445363A (en) 2022-01-25 2022-01-25 System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210087402.4A CN114445363A (en) 2022-01-25 2022-01-25 System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model

Publications (1)

Publication Number Publication Date
CN114445363A true CN114445363A (en) 2022-05-06

Family

ID=81369608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210087402.4A Pending CN114445363A (en) 2022-01-25 2022-01-25 System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model

Country Status (1)

Country Link
CN (1) CN114445363A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117347396A (en) * 2023-08-18 2024-01-05 北京声迅电子股份有限公司 XGBoost model-based substance type identification method
CN117347396B (en) * 2023-08-18 2024-05-03 北京声迅电子股份有限公司 Material type identification method based on XGBoost model

Similar Documents

Publication Publication Date Title
US20200125885A1 (en) Vehicle insurance image processing method, apparatus, server, and system
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
CN107909093B (en) Method and equipment for detecting articles
KR101930940B1 (en) Apparatus and method for analyzing image
CN108491848A (en) Image significance detection method based on depth information and device
CN110751090B (en) Three-dimensional point cloud labeling method and device and electronic equipment
Nordeng et al. DEBC detection with deep learning
CN114445363A (en) System, method, device, processor and storage medium for realizing dangerous goods identification based on multi-mode data attention model
CN116092096A (en) Method, system, device and medium for verifying the authenticity of a declared message
CN116645586A (en) Port container damage detection method and system based on improved YOLOv5
Andrews et al. Representation-learning for anomaly detection in complex x-ray cargo imagery
CN113095404B (en) X-ray contraband detection method based on front-back background convolution neural network
Zhao et al. Simultaneous material segmentation and 3D reconstruction in industrial scenarios
CN112884755B (en) Method and device for detecting contraband
Esmaeily et al. Building roof wireframe extraction from aerial images using a three-stream deep neural network
CN115546824B (en) Taboo picture identification method, apparatus and storage medium
CN116228637A (en) Electronic component defect identification method and device based on multi-task multi-size network
CN113312970A (en) Target object identification method, target object identification device, computer equipment and storage medium
CN116188361A (en) Deep learning-based aluminum profile surface defect classification method and device
Li et al. Deep Learning-based Model for Automatic Salt Rock Segmentation
CN116994024A (en) Method, device, equipment, medium and product for identifying parts in container image
CN114140612A (en) Method, device, equipment and storage medium for detecting hidden danger of power equipment
CN113191237A (en) Improved YOLOv 3-based fruit tree image small target detection method and device
Delgado et al. Methodology for generating synthetic labeled datasets for visual container inspection
CN111950475A (en) Yalhe histogram enhancement type target recognition algorithm based on yoloV3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination