WO2019057200A1

WO2019057200A1 - Inspection method and inspection device and computer readable medium

Info

Publication number: WO2019057200A1
Application number: PCT/CN2018/107317
Authority: WO
Inventors: 赵自然; 张; 李强; 刘耀红; 顾建平
Original assignee: 清华大学; 同方威视技术股份有限公司
Priority date: 2017-09-25
Filing date: 2018-09-25
Publication date: 2019-03-28
Also published as: CN109557114A; CN109557114B

Abstract

An inspection method, an inspection device, and a computer readable medium. An inspected object is scanned with X-rays to obtain an X-ray image of the inspected object; the X-ray image is processed using a convolutional neural network to obtain a class activity map of the inspected object; and, on the basis of the class activity map, it is determined whether the inspected object comprises a suspicious object. More accurate security inspection results can be obtained.

Description

Inspection method and inspection device and computer readable medium

Technical field

Embodiments of the present disclosure relate to security inspection, and more particularly to an inspection method and an inspection apparatus for detecting items entrained (concealed) in an article, and to a computer readable medium.

Background art

The emergence of containers has greatly improved the efficiency of cargo transportation. With the rapid development of the global economy, container transportation plays an important role in the modern transportation industry. Container cargo is easy to load, unload, and carry, but this also gives lawbreakers the opportunity to conceal contraband during transportation. In customs import and export in particular, bulk goods are numerous and it is impractical to open the containers one by one, so radiation imaging is needed to carry out rapid inspection.

However, existing inspection techniques rely on features such as image texture to inspect for entrained items. For example, an existing entrainment detection method calculates, for the transmission image of the object to be inspected, the difference in texture between a local region and its surroundings. After the image is traversed, other features are integrated, and regions that show a large difference and are not noise are judged to be suspected entrainment. Because of the wide variety of items, however, it is difficult for such methods to determine accurately whether a container holds entrained items.

Summary of the invention

In view of one or more problems in the prior art, an inspection method and inspection apparatus and a computer readable medium are proposed, which are capable of more accurately determining whether or not an entrained item is contained in a cargo such as a container.

In an aspect of the present disclosure, an inspection method is provided, comprising the steps of: scanning an object to be inspected with X-rays to obtain an X-ray image of the object to be inspected; processing the X-ray image of the object to be inspected using a convolutional neural network to obtain a class activity map of the object to be inspected; and determining whether the object to be inspected includes a suspicious object based on the class activity map.

According to some embodiments of the present disclosure, the convolutional neural network includes a plurality of paths corresponding to different scales, each path having at least one convolutional layer, a pooling layer after the at least one convolutional layer, and a full convolution layer, the full convolution layer being used to output a weight vector at the corresponding scale. The step of obtaining a class activity map of the object to be inspected includes: performing a weighted summation of the weight vector output by each path and the features of the convolutional layer before the last pooling layer in that path to obtain the class activity map at that scale; and merging the class activity maps of the plurality of scales to obtain the class activity map of the object to be inspected.

According to some embodiments of the present disclosure, a class activity map of the original scale and at least one class activity map of a smaller scale are obtained, the at least one class activity map of the smaller scale is upsampled to obtain an upsampled class activity map, and the class activity map of the original scale and the upsampled class activity map are merged to obtain the class activity map of the object to be inspected.

According to some embodiments of the present disclosure, determining, according to the class activity map, whether the object to be inspected includes a suspicious object comprises: obtaining a heat map based on the class activity map of the object to be inspected and the X-ray image; and using a threshold segmentation method to determine whether the heat map includes a suspicious object.

According to some embodiments of the present disclosure, the class activity map of the object to be inspected and the X-ray image are weighted and summed to obtain the heat map.

According to some embodiments of the present disclosure, the plurality of paths in the convolutional neural network share at least one convolutional layer and at least one pooling layer.

In another aspect of the present disclosure, an inspection apparatus is provided, comprising: a scanning device that scans an object to be inspected with X-rays to obtain an X-ray image; and a processor configured to: process the X-ray image of the object to be inspected with a convolutional neural network to obtain a class activity map of the object to be inspected; and determine whether the object to be inspected includes a suspicious object based on the class activity map.

According to some embodiments of the present disclosure, the convolutional neural network includes a plurality of paths corresponding to different scales, each path having at least one convolutional layer, a pooling layer after the at least one convolutional layer, and a full convolution layer, the full convolution layer being used to output a weight vector at the corresponding scale. The processor is configured to: perform a weighted summation of the weight vector output by each path and the features of the convolutional layer before the last pooling layer in that path to obtain the class activity map at that scale; and merge the class activity maps of the plurality of scales to obtain the class activity map of the object to be inspected.

According to some embodiments of the present disclosure, the processor is configured to: obtain a class activity map of the original scale and at least one class activity map of a smaller scale; upsample the at least one class activity map of the smaller scale to obtain an upsampled class activity map; and merge the class activity map of the original scale and the upsampled class activity map to obtain the class activity map of the object to be inspected.

According to some embodiments of the present disclosure, the processor is configured to: obtain a heat map based on the class activity map of the object to be inspected and the X-ray image; and determine whether the heat map includes a suspicious object by using a threshold segmentation method.

According to some embodiments of the present disclosure, the processor is configured to perform a weighted summation of the class activity map of the object under inspection and the X-ray image to obtain the heat map.

In still another aspect of the present disclosure, a computer readable medium is provided, storing a computer program that, when executed by a processor, implements the steps of: processing an X-ray image of an object under inspection using a convolutional neural network to obtain a class activity map of the object under inspection; and determining whether the object under inspection includes a suspicious object based on the class activity map.

With the solution of the above embodiments, the position of the suspicious item is judged based on the class activity map, and a more accurate security inspection result can be obtained.

DRAWINGS

For a better understanding of the present disclosure, the present disclosure will be described in detail in accordance with the following drawings:

FIG. 1 shows a schematic diagram of an inspection apparatus according to an embodiment of the present disclosure;

FIG. 2 is a diagram showing the internal structure of the computer for image processing in the embodiment shown in FIG. 1;

FIG. 3 is a schematic flowchart for explaining an inspection method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram depicting a convolutional neural network in an embodiment of the present disclosure;

FIG. 5 is a schematic diagram showing a convolutional neural network used in an inspection apparatus and an inspection method of an embodiment of the present disclosure;

FIG. 6 is a schematic diagram describing a process of calculating a class activity map in an inspection method of an embodiment of the present disclosure;

FIG. 7 shows an example of an inspection result obtained in an inspection method according to an embodiment of the present disclosure.

Detailed description

The embodiments of the present invention are described in detail below. It should be noted that the embodiments described herein are for illustrative purposes only and are not intended to limit the invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention may be practiced without these specific details. In other instances, well-known structures, materials, or methods are not specifically described in order to avoid obscuring the invention.

In view of the problems in the prior art, embodiments of the present disclosure propose an inspection technique that uses a convolutional neural network to process an X-ray image of an object to be inspected to obtain a class activity map, and then determines, based on the class activity map, whether the object being inspected contains suspicious items. Such an inspection technique can more accurately determine whether a suspicious item is included in the object to be inspected. For example, earlier entrainment detection methods find, for the current image, the difference in texture between a local region and its surroundings; after the image is traversed, other features are integrated, and regions that show a large difference and are not noise are determined to be suspected entrainment. The technique proposed by the present disclosure, by contrast, exploits the learning ability of a deep learning network: relying on a large number of training samples, it learns the feature structure inside the radiation image, combines it with the specific characteristics of shallow layers, and then determines the location of an abnormality through the class activity map. In particular, the techniques of the present disclosure use class activity maps generated by a convolutional neural network to characterize the locations in the image where anomalies may occur, and use a thresholding method to find locations where entrained items may be present.

FIG. 1 shows a schematic structural view of an inspection apparatus according to an embodiment of the present disclosure. The inspection apparatus 100 as shown in FIG. 1 includes an X-ray source 110, a detector module 130, a data acquisition device 150, a controller 140, a computing device 160, and the like. The source 110 includes one or more X-ray generators that can perform a single-energy transmission scan or a dual-energy transmission scan under the control of the controller 140.

As shown in FIG. 1, an inspected object 120, such as a container truck, moves through a scanning area between the source 110 and the detector 130. In some embodiments, the detector 130 and the data acquisition device 150 are, for example, detectors and data collectors having an integral modular structure, such as multiple rows of detectors, which detect the radiation transmitted through the object under inspection 120, obtain an analog signal, and convert the analog signal into a digital signal, thereby outputting an X-ray transmission image of the object under inspection 120. In the case of dual energy, for example, one row of detectors can be provided for high-energy rays and another row for low-energy rays, or the same row of detectors can be time-shared between high-energy and low-energy rays. The controller 140 controls the various parts of the entire system so that they work synchronously. The computing device 160, such as a computer, processes the data collected by the data acquisition device 150, processes the image data, and outputs the results. For example, the computing device 160 runs an image processing program that analyzes and learns from the scanned X-ray images and processes the X-ray image of the object to be inspected, for example by using a convolutional neural network, to obtain a class activity map of the object to be inspected 120. Based on the class activity map, it is then determined whether the object to be inspected includes a suspicious object.

According to this embodiment, the detector module 130 and the data acquisition device 150 are used to acquire transmission data of the object 120 under inspection. The data acquisition device 150 includes a data amplification and shaping circuit that operates in either (current) integration mode or pulse (count) mode. The data output cable of the data acquisition device 150 is coupled to the controller 140 and the computing device 160, and the acquired data is stored in the computing device 160 in accordance with a trigger command.

In some embodiments, the detector 130 includes a plurality of detection units that receive X-rays that penetrate the object under inspection 120. The data acquisition device 150 is coupled to the detector 130 to convert the signal generated by the detector 130 into probe data. The controller 140 is coupled to the radiation source 110 via a control line, to the detector 130 via another control line, and further to the data acquisition device 150. It controls one or more X-ray generators in the radiation source 110 to perform a single-energy scan or a dual-energy scan on the object to be inspected 120, so that X-rays are transmitted through the object to be inspected 120 as it moves. In addition, the controller 140 controls the detector 130 and the data acquisition device 150 to obtain the corresponding transmission data, such as single-energy transmission data or dual-energy transmission data. The computing device 160 obtains an image of the object under inspection 120 based on the transmission data, processes the image to obtain a class activity map of the object 120 to be inspected, and then determines whether the object to be inspected 120 includes a suspicious object based on the class activity map.

FIG. 2 shows a block diagram of the structure of the computing device shown in FIG. 1. As shown in FIG. 2, the computing device 160 includes a storage device 161, a read only memory (ROM) 162, a random access memory (RAM) 163, an input device 164, a processor 165, a display device 166, an interface unit 167, a bus 168, and the like.

The data collected by the data acquisition device 150 is stored in the storage device 161 via the interface unit 167 and the bus 168. Configuration information and programs of the computer data processor are stored in the read only memory (ROM) 162. The random access memory (RAM) 163 is used to temporarily store various data during the operation of the processor 165. In addition, a computer program for performing data processing is also stored in the storage device 161. The internal bus 168 connects the above-described storage device 161, read only memory 162, random access memory 163, input device 164, processor 165, display device 166, and interface unit 167.

After the user inputs an operation command through the input device 164, such as a keyboard and a mouse, the instruction code of the computer program instructs the processor 165 to execute the data processing algorithm. After the data processing result is obtained, it is displayed on the display device 166, such as an LCD display, or output directly in hard copy form, for example by printing.

For example, the source 110 can be a radioisotope (e.g., cobalt-60), a low energy X-ray machine, or a high energy X-ray accelerator.

For example, classified by material, the detector 130 may be a gas detector, a scintillator detector, or a solid-state detector; classified by array arrangement, it may be single-row, double-row, or multi-row, and it may be a single-layer detector or a double-layer high- and low-energy detector.

What has been described above is that the object to be inspected 120, such as a container truck, moves through the inspection area, but it will be appreciated by those skilled in the art that the object to be inspected 120 may be stationary while the source of radiation and the detector are moved to complete the scanning process.

FIG. 3 is a schematic flow chart for explaining an inspection method according to an embodiment of the present disclosure. As shown in FIG. 3, in step S310, the object to be inspected 120 is subjected to X-ray scanning using the inspection apparatus shown in FIG. 1, and an X-ray image of the object to be inspected 120 is obtained. Then, optionally, in step S320, the X-ray image of the object to be inspected is pre-processed, such as denoising or normalization. In step S330, the X-ray image is processed using a convolutional neural network to obtain a class activity map. The process of obtaining a class activity map from an X-ray image is described in detail below.

Deep learning is a branch of machine learning. It adds more hidden layers to the original shallow neural network to achieve a distributed representation of the input information. Unlike traditional shallow learning, deep learning emphasizes the depth of the model: the more hidden layers the model has, the stronger its learning ability. Deep learning also emphasizes the importance of feature learning, holding that features learned at different depths can make classification and prediction more accurate. Weak labeling (weakly supervised localization) is based mainly on weakly supervised convolutional neural networks: during training, only image-level labels are given as constraints, which lets the network automatically learn more in complex scenes. Weak labeling rests mainly on (1) the hierarchical structure of CNNs, which gives them a certain sensitivity to spatial position, and (2) effective end-to-end training.

FIG. 4 is a schematic diagram depicting a convolutional neural network in an embodiment of the present disclosure. As described above, in order to identify features in a transmission image, embodiments of the present disclosure propose to use a convolutional neural network (CNN) to identify features in an image. The convolutional neural network 400 in accordance with an embodiment of the present disclosure is described in detail below in conjunction with FIG. 4. The convolutional neural network 400 as shown in FIG. 4 may generally include a plurality of convolutional layers 420 and 440, which are generally collections of small neurons that partially overlap one another (also referred to, in a mathematical sense, as convolution kernels; the two terms are used interchangeably hereinafter unless otherwise stated). Moreover, in the context of the present disclosure, unless explicitly stated otherwise, for any two layers in the convolutional neural network 400, the layer closer to the input data (or input layer, such as input layer 410 of FIG. 4) is referred to as the layer "before" or "below", and the layer closer to the output data (or output layer, such as output layer 470 of FIG. 4) is referred to as the layer "behind" or "above". Moreover, during training, verification, and/or use, the direction from the input layer (e.g., input layer 410 of FIG. 4) to the output layer (e.g., output layer 470 of FIG. 4) is referred to as forward, and the direction from the output layer (e.g., output layer 470 of FIG. 4) to the input layer (e.g., input layer 410 of FIG. 4) is referred to as backward.

Taking the first convolutional layer 420 shown in FIG. 4 as an example, these small neurons can process various parts of the input image. The outputs of these small neurons are then combined into one output (referred to as a feature map, such as a square in the first convolutional layer 420) to obtain an output image that better represents certain features in the original image. At the same time, the partially overlapping arrangement between adjacent neurons also causes the convolutional neural network 400 to have a degree of translational tolerance for features in the original image. In other words, the convolutional neural network 400 can correctly identify the feature even if the feature in the original image changes its position in a translational manner within a certain tolerance. A detailed description of the convolutional layer will be given later and will not be discussed in detail herein.

The next layer is an optional pooling layer, the first pooling layer 430, which is mainly used to downsample the output data of the preceding convolutional layer 420 while preserving its features, reducing the amount of computation and preventing overfitting.

The next layer is again a convolutional layer: the second convolutional layer 440 can perform further feature extraction on the output data generated by the first convolutional layer 420 and downsampled by the pooling layer 430. Intuitively, the features it learns are more global than those learned by the first convolutional layer. Likewise, each subsequent convolutional layer learns features that are more global than those of the preceding convolutional layer.

The convolutional layer (eg, the first and second convolutional layers 420 and 440) is the core building block of the CNN (eg, convolutional neural network 400). The parameters of this layer consist of a collection of learnable convolution kernels (or simply convolution kernels), each with a small receptive field, but extending over the entire depth of the input data. In the forward process, each convolution kernel is convolved along the width and height of the input data, the dot product between the elements of the convolution kernel and the input data is computed, and a two-dimensional activation map of the convolution kernel is generated. As a result, the network is able to learn the convolution kernel that can be activated when a particular type of feature is seen at a spatial location of the input.

The activation maps of all convolution kernels are stacked in the depth direction to form the full output data of the convolutional layer. Thus, each element in the output data can be interpreted as an output of a convolution kernel that sees small regions in the input and shares parameters with other convolution kernels in the same activation map.

The depth of the output data is determined by the number of convolution kernels in the layer that are connected to the same region of the input data. For example, as shown in FIG. 4, the depth of the first convolutional layer 420 is 4, and the depth of the second convolutional layer 440 is 6. All of these convolution kernels learn to activate for different features in the input. For example, if the first convolutional layer 420 takes the original image as input, different convolution kernels along the depth dimension (i.e., different squares in FIG. 4) may activate when edges of various orientations, or gray-level blocks, appear in the input data.
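The forward pass of a convolutional layer described above can be sketched as follows. This is a minimal NumPy illustration, not the network of the disclosure: the function name `conv_layer_forward`, the 8x8 toy input, the 3x3 kernel size, and the absence of padding are all assumptions made for the example. Four kernels are used so that the output depth is 4, matching the depth given for the first convolutional layer 420.

```python
import numpy as np

def conv_layer_forward(x, kernels):
    """Stride-1 convolution without padding (hypothetical helper).

    x:       input data of shape (H, W, C)
    kernels: array of shape (K, kh, kw, C); each kernel spans the full input depth
    returns: stacked activation maps of shape (H - kh + 1, W - kw + 1, K)
    """
    H, W, C = x.shape
    K, kh, kw, kc = kernels.shape
    assert kc == C, "each kernel must extend over the entire input depth"
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for k in range(K):                      # one 2-D activation map per kernel
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                patch = x[i:i + kh, j:j + kw, :]
                # dot product between the kernel and the input region it covers
                out[i, j, k] = np.sum(patch * kernels[k])
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8, 1))               # toy single-channel input
kernels = rng.normal(size=(4, 3, 3, 1))     # 4 kernels -> output depth 4
activation = conv_layer_forward(image, kernels)
print(activation.shape)                     # (6, 6, 4)
```

Each slice `activation[:, :, k]` is the two-dimensional activation map of one kernel, and stacking the slices along the last axis gives the full output of the layer.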

The training process is a very important part of deep learning. To ensure that the network converges effectively, a stochastic gradient descent method can be used; for example, the Nesterov optimization algorithm can be used. In some embodiments, the learning rate can start at 0.01 and be decreased gradually until an optimal value is found. Moreover, in some embodiments, a Gaussian random process with a small variance can be used to initialize the weight values of the respective convolution kernels. In some embodiments, the image training set may consist of labeled item images, each labeled with the feature location in the image. Although FIG. 4 shows an example with two convolutional layers, two pooling layers, and fully connected layers, those skilled in the art will appreciate that more convolutional layers and pooling layers can be used to implement the convolutional neural network.
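The training recipe above (Nesterov-style gradient descent, a learning rate that starts at 0.01 and decreases, and small-variance Gaussian initialization) can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: the quadratic objective, the momentum value 0.9, the multiplicative decay 0.99, and the name `nesterov_sgd` are not taken from the disclosure, and the gradient here is exact rather than stochastic.

```python
import numpy as np

def nesterov_sgd(grad_fn, w0, lr=0.01, momentum=0.9, decay=0.99, steps=500):
    """Gradient descent with Nesterov momentum and a decaying learning rate."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w + momentum * v)   # gradient evaluated at the look-ahead point
        v = momentum * v - lr * g       # velocity update
        w = w + v                       # parameter update
        lr *= decay                     # gradually decrease the learning rate
    return w

# Toy objective ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([1.0, -2.0])
grad = lambda w: 2.0 * (w - target)

rng = np.random.default_rng(0)
w_init = rng.normal(scale=0.01, size=2)   # Gaussian initialization, small variance
w_final = nesterov_sgd(grad, w_init)
print(w_final)
```

In a real training run the gradient would come from backpropagation over mini-batches of X-ray images, and the decay schedule would be tuned on validation data.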

The class activation map (CAM), referred to in the embodiments of the present disclosure as the class activity map, refers to the discriminative region of the image for each class, as obtained by the convolutional neural network. FIG. 5 shows a schematic diagram of a convolutional neural network used in an inspection apparatus and an inspection method of an embodiment of the present disclosure.

As shown in FIG. 5, the convolutional neural network of the embodiment of the present disclosure may include a first convolutional layer 510, a first pooling layer 511, a second convolutional layer 512, a second pooling layer 513, a third convolutional layer 514, a third pooling layer 515, a fourth convolutional layer 516, a fourth pooling layer 517, a fifth convolutional layer 518, a fifth pooling layer 519, a sixth convolutional layer 520, a seventh convolutional layer 521, a full convolution layer 522, and a classification layer 523. As shown in FIG. 5, the network is composed mainly of convolutional layers. During forward propagation, multiple branches are taken from different network depths to obtain network features of different scales; each branch is a path. At the end of each path, the following operations are performed. First, global average pooling is applied: for example, the third pooling layer 515 and the fourth pooling layer 517 are each followed by a global average pooling layer, which in turn is followed by a full convolution layer (542 and 532, respectively). The features of the third pooling layer 515 and of the fourth pooling layer 517 thus serve as the inputs of the respective branches, and each branch is then connected to a classification layer (softmax layers 543 and 532, respectively) to obtain the corresponding classification result. Similarly, the full convolution layer 522 is connected after the seventh convolutional layer 521 and is then connected to the classification layer 523. To determine the importance of the classification features in the original image, it suffices, for each path, to apply the weights of the output layer (output from the classification layer) to the convolutional layer before the last pooling layer of that path and perform a weighted summation, which yields the class activity map of that path. The position of the entrained item is then analyzed based on the class activity map of at least one of the paths.

According to an embodiment of the present disclosure, the convolutional layers perform convolution operations on the image, and features of different scales are obtained as the depth of the neural network changes. For example, the convolution kernel has a length and a width of 3 and a stride of 1 in both directions, and traverses the image from left to right and top to bottom. At each position in the traversal, the kernel is multiplied element-wise with the image region of the same size, and the sum of the products is taken as the result at that position.

According to an embodiment of the present disclosure, after the convolutional layer, the pooling layer implements a downsampling process, which can reduce network parameters and increase the receptive field. For example, the pooling window has a length and a width of 3; the first three pooling layers have a stride of 2 in both directions, and the later pooling layers have a stride of 1 in both directions.
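The effect of the pooling window and stride on the feature map size can be sketched as follows. The text does not state the pooling type or the padding, so max pooling without padding is assumed here, and `max_pool2d` is a hypothetical helper written for this illustration.

```python
import numpy as np

def max_pool2d(x, window=3, stride=2):
    """Max pooling over a 2-D feature map, no padding (assumed variant)."""
    H, W = x.shape
    out_h = (H - window) // stride + 1
    out_w = (W - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + window, c:c + window].max()
    return out

feature = np.arange(81, dtype=float).reshape(9, 9)
down = max_pool2d(feature, window=3, stride=2)   # stride 2 roughly halves each side
same = max_pool2d(feature, window=3, stride=1)   # stride 1 keeps most of the resolution
print(down.shape, same.shape)                    # (4, 4) (7, 7)
```

With stride 2 the map shrinks from 9x9 to 4x4, while stride 1 only trims the border, which is why stride-1 pooling layers preserve spatial resolution for localization.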

In addition, according to an embodiment of the present disclosure, upsampling may be implemented by hole convolution (also known as atrous or dilated convolution), with a convolution kernel of length and width 3, a stride of 1, and hole values of 6, 12, and 18, respectively. The hole operation inserts zeros between every two adjacent values in the convolution kernel, the number of zeros being the hole value. After the zeros are inserted, the convolution kernel performs the convolution operation as usual.
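The zero-insertion step can be sketched as below. The helper `insert_holes` is hypothetical and follows the convention stated in the text, in which the number of zeros inserted between adjacent kernel values equals the hole value. Note that many deep learning libraries instead define a dilation rate r as inserting r - 1 zeros, so the effective kernel sizes would differ under that convention.

```python
import numpy as np

def insert_holes(kernel, zeros_between):
    """Insert `zeros_between` zeros between adjacent values of a 2-D kernel."""
    kh, kw = kernel.shape
    step = zeros_between + 1
    out = np.zeros(((kh - 1) * step + 1, (kw - 1) * step + 1), dtype=kernel.dtype)
    out[::step, ::step] = kernel          # original values land on a sparse grid
    return out

kernel = np.arange(1, 10, dtype=float).reshape(3, 3)
for hole in (6, 12, 18):
    dilated = insert_holes(kernel, hole)
    print(hole, dilated.shape)            # (15, 15), (27, 27), (39, 39)
```

The enlarged kernel still has only nine non-zero weights, so the receptive field grows without adding parameters.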

According to an embodiment of the present disclosure, a network model is obtained by training the network on X-ray images. A container image B is then fed into the network. Following the method above and with reference to FIG. 5 and FIG. 6, the steps are as follows:

(1) Data preparation: Because test images containing entrained items are limited in number, a large number of images containing entrained items are generated computationally for network training.

(2) Data pre-processing: The X-ray scanned images of goods in containers are pre-processed, including mean subtraction and cropping to size.

Since the radiation image is a single-channel grayscale image, and denoting the data set as A, the mean is removed according to the following equation:

$$A_{\mathrm{mean}} = A - \mathrm{mean}(A)$$

where mean() denotes taking the mean. The mean-subtracted images are then cropped to obtain training samples.
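The pre-processing of step (2) can be sketched as follows. The text does not say whether mean(A) is a single scalar over the data set or a per-pixel mean image, nor how the crop is positioned; the sketch assumes a scalar mean and a center crop, and the name `preprocess` and the toy image sizes are hypothetical.

```python
import numpy as np

def preprocess(images, crop_size):
    """Subtract the data set mean, then center-crop each image.

    images:    array of shape (N, H, W), single-channel grayscale
    crop_size: (crop_h, crop_w)
    """
    a = np.asarray(images, dtype=np.float64)
    a_mean = a - a.mean()                      # A_mean = A - mean(A), scalar mean assumed
    ch, cw = crop_size
    _, H, W = a_mean.shape
    top, left = (H - ch) // 2, (W - cw) // 2   # center crop (assumption)
    return a_mean[:, top:top + ch, left:left + cw]

rng = np.random.default_rng(0)
imgs = rng.random((5, 6, 6))                   # five toy 6x6 "radiation images"
samples = preprocess(imgs, (4, 4))
print(samples.shape)                           # (5, 4, 4)
```

The same mean computed on the training set would normally be reused when pre-processing images at test time.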

(3) Training network: In the training phase, a training method is adopted. As shown in FIG. 5, the network adopts a skip structure, with three branches appearing after pooling layer 3, after pooling layer 4, and after convolutional layer 7, respectively. The network borrows from fully connected networks to make comprehensive use of features at the shallow, middle, and deep levels to assist the network in training and prediction. Combining features of different scales greatly improves the accuracy of model prediction. Those skilled in the art will appreciate that more branches, for example four, or fewer branches, for example two, may be provided.

(4) Test phase: Feed the image under test into the network, extract the features of the convolutional layers online by a feedforward pass, and record them as scores.

(5) Extract the values of the fully connected layers at the end of the network in each path, and record them as weight vectors w1, w2, w3.

(6) Extract the features of the convolution layer in each path, denoted as CAM_conv1, CAM_conv2, CAM_conv3.

(7) Find the mean of each feature layer, i.e. $\mathrm{scores}(i) = \sum_j \mathrm{scores}(i, j)$, and sort the results in descending order. The feature layers whose values exceed a threshold are retained for subsequent operations.

(8) For each path, the features of the convolutional layer before the last pooling layer in the path are multiplied by the corresponding weights and summed to obtain the CAM map of that path, for example the CAM maps of three paths: w1*CAM_conv1, w2*CAM_conv2, and w3*CAM_conv3, as shown in FIG. 6.

Specifically, for the image to be classified, let $f_k(x, y)$ denote the activity of unit $k$ at spatial position $(x, y)$ on the convolution layer before the last pooling layer in one path. For unit $k$, the result of global average pooling, $F^k$, is expressed as

$$F^k = \sum_{x,y} f_k(x, y)$$

For a category $c$, the input to the classification function is

$$S_c = \sum_k w_k^c F^k$$

where $w_k^c$ is the weight of category $c$ on the corresponding unit $k$, indicating the importance of $F^k$ for category $c$. Finally, the classification output for category $c$ is

$$P_c = \frac{\exp(S_c)}{\sum_{c'} \exp(S_{c'})}$$

Since the bias term has little to no effect on the classification result, it is not considered here, and the input bias of the softmax layer is set to zero. Substituting $F^k = \sum_{x,y} f_k(x, y)$ into $S_c$ gives

$$S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y) = \sum_{x,y} \sum_k w_k^c f_k(x, y)$$

Defining $M_c(x, y) = \sum_k w_k^c f_k(x, y)$, we have $S_c = \sum_{x,y} M_c(x, y)$, so $M_c(x, y)$ directly indicates the importance of the activity at spatial position $(x, y)$ for category $c$.
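
The relation between the class score S_c and the map M_c can be checked numerically. The sketch below is an illustration under stated assumptions rather than the code of the disclosure: M_c is taken as the weighted sum of the feature maps described in step (8), the (K, H, W) array layout and the function name `class_activity_map` are hypothetical, and the feature values and weights are random toy data.

```python
import numpy as np

def class_activity_map(features, weights):
    """Compute M_c(x, y) as the sum over k of w_k^c * f_k(x, y).

    features: (K, H, W) activities f_k of the convolution layer before the
              last pooling layer of one path (assumed layout).
    weights:  (K,) weights w_k^c of category c.
    """
    # Contract the channel axis: (K,) with (K, H, W) gives (H, W).
    return np.tensordot(weights, features, axes=1)

rng = np.random.default_rng(0)
f = rng.random((4, 3, 5))       # 4 units on a 3x5 spatial grid (toy data)
w_c = rng.normal(size=4)        # toy weights for one category

cam = class_activity_map(f, w_c)

# Class score from the pooled features: S_c = sum_k w_k^c * F^k.
F = f.sum(axis=(1, 2))
s_c = float(w_c @ F)

# The same score is obtained by summing the map over all positions.
s_c_from_map = float(cam.sum())
```

Both routes give the same score up to floating-point rounding, which is why each entry of the map can be read as the contribution of one spatial position to the class score.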

(9) The class activity maps CAM1, CAM2, and CAM3 of the three paths are each interpolated (for example, upsampled) to the scale of the original image and then summed to obtain the final CAM map.
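
Step (9) can be sketched as follows. The source does not specify the interpolation method or the scales of the three paths, so nearest-neighbour interpolation and the toy map sizes (full, 1/2, and 1/4 scale) are assumptions made for illustration; the function names `upsample_nearest` and `fuse_cams` are hypothetical.

```python
import numpy as np

def upsample_nearest(cam, out_shape):
    """Resize a 2-D map to out_shape by nearest-neighbour interpolation."""
    h, w = cam.shape
    out_h, out_w = out_shape
    # For every output pixel, pick the source pixel it falls into.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return cam[rows[:, None], cols[None, :]]

def fuse_cams(cams, out_shape):
    """Bring each path's CAM to the original scale and sum the results."""
    return sum(upsample_nearest(c, out_shape) for c in cams)

cam1 = np.ones((8, 8))                        # already at the original scale
cam2 = np.arange(16.0).reshape(4, 4)          # assumed 1/2 scale
cam3 = np.array([[1.0, 2.0], [3.0, 4.0]])     # assumed 1/4 scale
final_cam = fuse_cams([cam1, cam2, cam3], (8, 8))
```

A smoother interpolation such as bilinear could be substituted in `upsample_nearest` without changing the fusion step.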

(10) Convert the CAM map into a heat map: multiply the CAM map and the original image by their respective scale factors and take the weighted sum as the final result.

Output=0.3×image+0.7×CAM

Here, image denotes the original image and CAM denotes the CAM map. Those skilled in the art will understand that other weighting coefficients can be used to weight the two and output the result.
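
The weighted sum of step (10) can be sketched as follows. The 0.3 and 0.7 coefficients come from the text; the min-max normalisation of both inputs to [0, 1] and the function name `blend` are assumptions introduced here, and no colour map is applied.

```python
import numpy as np

def blend(image, cam, w_image=0.3, w_cam=0.7):
    """Output = w_image * image + w_cam * CAM, with both inputs scaled to [0, 1]."""
    def to_unit(a):
        a = a.astype(np.float64)
        span = a.max() - a.min()
        # A constant map carries no contrast; map it to zeros.
        return (a - a.min()) / span if span > 0 else np.zeros_like(a)
    return w_image * to_unit(image) + w_cam * to_unit(cam)

image = np.array([[0, 255], [128, 64]], dtype=np.uint8)   # toy X-ray image
cam = np.array([[0.0, 2.0], [4.0, 1.0]])                  # toy CAM map, same size
output = blend(image, cam)
```

Normalising first keeps the two terms on a comparable scale, so the coefficients actually control the relative contribution of the image and the CAM map.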

(11) Threshold segmentation is used to extract the abnormal region in the image, and the region is marked with a box. Those skilled in the art will appreciate that the above process of converting a CAM map into a heat map may be omitted, or other weighting factors may be used to obtain an output result.
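
Step (11) can be sketched as follows. The source does not state the threshold or how multiple regions are handled, so the threshold value, the function name `suspicious_box`, and the choice to enclose all above-threshold pixels in a single box are simplifying assumptions.

```python
import numpy as np

def suspicious_box(heat, threshold):
    """Return (top, left, bottom, right) of the above-threshold region, or None.

    Simplification: all above-threshold pixels are enclosed in one box;
    separating several regions would need connected-component labelling.
    """
    mask = heat > threshold
    if not mask.any():
        return None                          # nothing exceeds the threshold
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing a hit
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing a hit
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

heat = np.zeros((6, 6))
heat[2:4, 1:5] = 0.9                         # a bright 2x4 patch
box = suspicious_box(heat, threshold=0.5)
no_box = suspicious_box(heat, threshold=0.95)
```

A `None` result corresponds to the case in which no suspicious object is indicated at the chosen threshold.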

Further, although the above embodiment sets a plurality of branches in one neural network, those skilled in the art will understand that the input image can instead be down-sampled to obtain images at a plurality of scales. The image at each scale is then processed by its own convolutional neural network. Finally, the processing results of the respective channels are upsampled and combined at the original image size.
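
The image-pyramid variant described above can be sketched as follows. This is only an outline under stated assumptions: each per-scale convolutional neural network is replaced by a hypothetical callable that returns a map of the same size as its input, down-sampling is done by 2x2 averaging, upsampling by pixel repetition, and the image sides are assumed to be divisible by 2 raised to (number of scales minus 1).

```python
import numpy as np

def downsample2(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_cam(image, cam_fns):
    """Run one CAM function per pyramid level and fuse at the original size.

    cam_fns: one callable per scale, standing in for a per-scale CNN; each
             takes an image and returns a map of the same shape (assumption).
    """
    fused = np.zeros(image.shape, dtype=np.float64)
    level = image.astype(np.float64)
    for i, cam_fn in enumerate(cam_fns):
        cam = cam_fn(level)
        factor = 2 ** i
        # Nearest-neighbour upsampling by pixel repetition.
        up = np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)
        fused += up
        level = downsample2(level)           # next, coarser pyramid level
    return fused

image = np.arange(16.0).reshape(4, 4)
identity = lambda x: x                       # placeholder for a real network
fused = multiscale_cam(image, [identity, identity])
```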

The present disclosure exploits the ability of the local features of convolutional neural networks to localize objects, and introduces multi-scale features of the convolutional neural network to describe the position information of different classes of objects. Upsampling keeps the classification map consistent in scale with the original image, so that the position information of an item is characterized more accurately and clearly.

The above detailed description has set forth numerous embodiments of the inspection method and inspection device by means of schematic diagrams, flowcharts, and/or examples. Where such schematic diagrams, flowcharts, and/or examples contain one or more functions and/or operations, those skilled in the art will appreciate that each function and/or operation in such a schematic diagram, flowchart, or example may be implemented individually and/or collectively by various structures, hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter of embodiments of the present invention may be implemented in an application specific integrated circuit (ASIC), field programmable gate array (FPGA), digital signal processor (DSP), or other integrated format. However, those skilled in the art will appreciate that some aspects of the embodiments disclosed herein may be equivalently implemented, in whole or in part, in an integrated circuit, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination of the above, and that those skilled in the art, in light of the present disclosure, will be capable of designing the circuitry and/or writing the software and/or firmware code. Moreover, those skilled in the art will recognize that the mechanisms of the subject matter described herein can be distributed as program products in a variety of forms, and that the exemplary embodiments of the subject matter of the present disclosure apply regardless of the particular type of signal-bearing medium actually used to perform the distribution. Examples of signal-bearing media include, but are not limited to: recordable media such as floppy disks, hard drives, compact disks (CDs), digital versatile disks (DVDs), digital tapes, and computer memories; and transmission-type media such as digital and/or analog communication media (e.g., fiber optic cables, waveguides, wired communication links, wireless communication links).

While the invention has been described with reference to exemplary embodiments, the present invention may be embodied in a variety of forms without departing from the spirit or scope of the invention. It is to be understood that the invention is not limited to the foregoing details. All changes and modifications that come within the scope of the claims or the equivalents thereof are intended to be covered by the appended claims.

Claims

An inspection method comprising the steps of:

Scanning the object to be inspected by X-rays to obtain an X-ray image of the object to be inspected;

Processing an X-ray image of the object to be inspected by using a convolutional neural network to obtain a class activity map of the object to be inspected;

Determining whether or not the object to be inspected includes a suspicious object based on the class activity map.
The inspection method according to claim 1, wherein said convolutional neural network comprises a plurality of paths corresponding to different scales, each path having at least one convolution layer, a pooling layer after said at least one convolution layer, and a full convolution layer, the full convolution layer being used to output a weight vector at the corresponding scale, and the step of obtaining a class activity map of the object to be inspected includes:

Weighting and summing the weight vector outputted by each path and the feature of the convolution layer before the last pooling layer in the path to obtain a class activity map at the scale;

Integrating the class activity maps of the plurality of scales to obtain a class activity map of the object to be inspected.
The inspection method according to claim 2, wherein a class activity map of the original scale and at least one class activity map of a smaller scale are obtained, the at least one class activity map of the smaller scale is upsampled to obtain an upsampled class activity map, and the class activity map of the original scale and the upsampled class activity map are combined to obtain a class activity map of the object to be inspected.
The inspection method according to claim 1, wherein the determining, based on the class activity map, whether the object to be inspected includes a suspicious object comprises:

Generating a heat map based on the class activity map of the object to be inspected and the X-ray image;

A method of threshold division is used to determine whether a suspicious object is included in the heat map.
The inspection method according to claim 4, wherein the heat map is obtained by weighting and summing the class activity map of the object to be inspected and the X-ray image.
The inspection method according to claim 2, wherein the plurality of paths in the convolutional neural network share at least one convolution layer and at least one pooling layer.
An inspection device comprising:

a scanning device that scans an object to be inspected with X-rays to obtain an X-ray image;

a processor configured to:

Processing an X-ray image of the object to be inspected by using a convolutional neural network to obtain a class activity map of the object to be inspected;

Determining whether or not the object to be inspected includes a suspicious object based on the class activity map.
The inspection apparatus according to claim 7, wherein said convolutional neural network includes a plurality of paths corresponding to different scales, each path having at least one convolution layer, a pooling layer after said at least one convolution layer, and a full convolution layer, the full convolution layer being used to output a weight vector at the corresponding scale, the processor being configured to:

Weighting and summing the weight vector outputted by each path and the feature of the convolution layer before the last pooling layer in the path to obtain a class activity map at the scale;

Integrating the class activity maps of the plurality of scales to obtain a class activity map of the object to be inspected.
The inspection apparatus of claim 8 wherein said processor is configured to:

Obtaining a class activity map of the original scale and at least one class activity map of the smaller scale,

Upsampling the class activity map at the at least one smaller scale to obtain an upsampled class activity map, and

A class activity map of the original scale and an upsampled class activity map are combined to obtain a class activity map of the object to be inspected.
The inspection apparatus of claim 7 wherein said processor is configured to:

Generating a heat map based on the class activity map of the object to be inspected and the X-ray image;

A method of threshold division is used to determine whether a suspicious object is included in the heat map.
The inspection apparatus according to claim 10, wherein said processor is configured to perform a weighted summation of a class activity map of said object to be inspected and said X-ray image to obtain said heat map.
The inspection apparatus according to claim 8, wherein the plurality of paths in the convolutional neural network share at least one convolution layer and at least one pooling layer.
A computer readable medium storing a computer program, the computer program being executed by a processor to implement the following steps:

Processing an X-ray image of the object to be inspected by using a convolutional neural network to obtain a class activity map of the object to be inspected;

Determining whether or not the object to be inspected includes a suspicious object based on the class activity map.