CN114708189A - Deep learning-based multi-energy X-ray image fusion method and device - Google Patents
- Publication number: CN114708189A
- Application number: CN202210172479.1A
- Authority
- CN
- China
- Prior art keywords
- energy
- image
- encoder
- deep learning
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0002 — Physics; Computing; Image data processing; Image analysis; Inspection of images, e.g. flaw detection
- G06F18/2415 — Physics; Computing; Electric digital data processing; Pattern recognition; Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Physics; Computing; Computing arrangements based on biological models; Neural networks; Combinations of networks
- G06N3/08 — Physics; Computing; Computing arrangements based on biological models; Neural networks; Learning methods
- G06T7/13 — Physics; Computing; Image data processing; Image analysis; Segmentation; Edge detection
- G06T2207/10116 — Indexing scheme for image analysis or image enhancement; Image acquisition modality; X-ray image
- G06T2207/20081 — Indexing scheme for image analysis or image enhancement; Special algorithmic details; Training; Learning
- G06T2207/20221 — Indexing scheme for image analysis or image enhancement; Image combination; Image fusion; Image merging
Abstract
The invention relates to a deep-learning-based multi-energy X-ray image fusion method and device. X-ray images of different workpieces, acquired at different energies, are collected as a training data set; the X-ray images of the training data set are input to the encoder, the encoder and decoder are trained, and the trained encoder and decoder are obtained once the training network is stable; a fusion strategy combining channel attention and fuzzy-entropy-based spatial attention is added between the trained encoder and decoder; X-ray images of different energies are input to the trained encoder to extract features, the feature maps are fused using the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention, and the fused feature maps are input to the trained decoder, which outputs the fusion result. By processing the images in this way, the method and device can effectively reflect workpiece information and improve detection accuracy.
Description
Technical Field
The invention relates to a method and a device for fusing multi-energy X-ray images based on deep learning.
Background
Critical parts with complex structures play an irreplaceable role in aerospace, national defense and industrial applications, so manufacturers must strictly control quality when producing such workpieces, and X-rays are widely used as a quality inspection tool. Digital X-ray imaging can support defect detection, internal structure analysis and related tasks, but it is limited by the structure and material of the workpiece: for an inspection object with a large thickness ratio, overexposure and underexposure coexist in a single-energy system, making it difficult to capture comprehensive structural information. Specifically, in an X-ray image, when the thicker region of the workpiece is well exposed, only a small portion of the X-rays is absorbed in the thinner region, which may be overexposed; conversely, when the thin portion of the workpiece is well exposed, insufficient intensity remains after the X-rays pass through the thick portion, which may appear underexposed. Image fusion can combine information from different sensors, or from the same sensor under different imaging conditions, thereby improving image resolution and clarity and allowing related image features to reinforce and complement one another. Fusing X-ray images acquired under irradiation at different energies can therefore resolve the coexistence of overexposure and underexposure in workpiece inspection. Two factors determine the final result of image fusion: first, how the information in the source images is detected and extracted; second, how a suitable rule is designed to fuse the extracted feature information.
In the prior art, the published and widely used image fusion methods fall into five categories: spatial-domain algorithms, transform-domain algorithms, sparse representation algorithms, deep learning algorithms, and composite algorithms. Spatial-domain algorithms process the image directly, taking the pixels of the original image as the basic unit; a typical example obtains a fusion weight map by weighted averaging. Such methods are fast, but they easily introduce noise and cannot adaptively determine the importance of each feature. Transform-domain fusion methods trade computational complexity for the preservation of relatively more detail. Block-based sparse representation algorithms fuse images through sparse coefficient features; the procedure is simple, but it is affected by the blocking strategy and block size and is often disturbed by irrelevant background information: the trained dictionary contains too many unnecessary features, and the fusion result suffers from spatial discontinuities and sometimes black borders, which hinders practical application. Deep-learning-based methods have been applied widely to infrared and visible image fusion, medical image fusion and multi-focus image fusion, showing good fusion quality and robustness, but the published literature on multi-energy X-ray image fusion with deep learning is limited. Most X-ray-based workpiece inspection methods acquire transillumination sub-images covering different thickness ranges by increasing the X-ray tube voltage and obtain the fusion result by weighted fusion of the sub-images; however, during weighted fusion the dynamic range of the image may exceed the display capability of the equipment, so the structural information of the workpiece cannot be displayed completely.
Disclosure of Invention
The invention aims to provide a deep-learning-based multi-energy X-ray image fusion method and device that can effectively reflect workpiece information and improve detection accuracy.
Based on the same inventive concept, the invention provides two independent technical solutions:
1. A deep-learning-based multi-energy X-ray image fusion method, comprising the following steps:
Step 1: collect X-ray images of different workpieces at different energies as a training data set;
Step 2: input the X-ray images of the training data set to the encoder, train the encoder and decoder, and obtain the trained encoder and decoder once the training network is stable;
Step 3: add a fusion strategy combining channel attention and fuzzy-entropy-based spatial attention between the trained encoder and decoder;
Step 4: input X-ray images of different energies to the trained encoder to extract features, fuse the feature maps using the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention, and input the fused feature maps to the trained decoder to output the fusion result.
Further, in step 2, the encoder consists of a main branch and an auxiliary branch. The main branch first uses a 1 × 1 convolutional layer to increase the number of feature channels of the input image, then uses 4 layers of multi-scale convolution blocks to extract the global features of the image; the auxiliary branch uses a trainable edge detection operator to enhance the edge information of the input image, obtaining feature maps with the same dimensions as those of the main branch. The only difference between the auxiliary branch and the main branch is that the trainable edge detection operator replaces the initial convolutional layer of the main branch.
Furthermore, in step 2, the features extracted by the main branch and the auxiliary branch of the encoder are added and then fed into the decoder; a convolutional layer at the end of the decoder reduces the number of image channels to a single channel and outputs the reconstructed image.
Further, in step 2, the edge detection operator adopted by the auxiliary branch is a Sobel operator, a Laplacian operator, a Canny operator or a LOG operator.
Further, in step 2, the main branch and the auxiliary branch are both equipped with dense bypass (skip) connections, which are used to improve feature reuse.
Further, in step 2, the multi-scale convolution block has four branches, and each convolution block is composed of a convolution layer, an activation function and a batch normalization layer.
Further, in step 2, the activation function of the convolution block adopts a Sigmoid function, a Relu function, a Tanh function, a Softmax function or a Leaky Relu function.
Further, in step 2, the composite loss function guiding network training is a weighted combination of an L1 loss function and an image-block-based consistency loss function; the image-block-based consistency loss computes the L1-norm loss between the local energy map of the input image and the local energy map of the output image, where each pixel of the local energy map is obtained by summing the squared differences between the neighboring pixels and the center pixel within an image block and then averaging.
Further, in step 3, the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention specifically fuses the feature maps with a channel attention module and a fuzzy-entropy-based spatial attention module, respectively, and takes the average of the outputs of the two fusion modules as the feature map fusion result.
2. A multi-energy X-ray image fusion device based on deep learning is used for executing the method.
The invention has the following beneficial effects:
The invention collects X-ray images of different workpieces at different energies as a training data set; inputs the X-ray images of the training data set to the encoder, trains the encoder and decoder, and obtains the trained encoder and decoder once the training network is stable; adds a fusion strategy combining channel attention and fuzzy-entropy-based spatial attention between the trained encoder and decoder; and inputs X-ray images of different energies to the trained encoder to extract features, fuses the feature maps using that fusion strategy, and inputs the fused feature maps to the trained decoder to output the fusion result. By exploiting the strong feature extraction capability of the neural network together with the feature fusion strategy combining channel attention and fuzzy-entropy-based spatial attention, the invention fuses the effective information in X-ray images of the workpiece at different energies well while keeping the algorithm complexity in check. The fused images have rich edge detail, and the proposed algorithm is robust across different fusion objects. The problems of overexposure and underexposure for workpieces with uneven profiles and complex internal structures are overcome: the internal information of complex components is displayed completely and clearly, which greatly improves the accuracy of detecting and measuring internal defects of the workpiece.
The encoder comprises a main branch and an auxiliary branch. The main branch first uses a 1 × 1 convolutional layer to increase the number of feature channels of the input image, then uses 4 layers of multi-scale convolution blocks to extract the global features of the image; the auxiliary branch uses a trainable edge detection operator to enhance the edge information of the input image, obtaining feature maps with the same dimensions as those of the main branch. Both branches are equipped with dense bypass connections, which improve feature reuse. Because the main branch extracts global image features with 4 layers of multi-scale convolution blocks while the auxiliary branch enhances the edge information of the image, effective extraction of image features is further ensured.
The fusion strategy combining channel attention and fuzzy-entropy-based spatial attention specifically fuses the feature maps with a channel attention module and a fuzzy-entropy-based spatial attention module, respectively, and takes the average of the outputs of the two fusion modules as the feature map fusion result. This fusion strategy further ensures that the effective information in the X-ray images of the workpiece at different energies is fused well.
Drawings
FIG. 1 is a schematic diagram of a deep learning-based multi-energy X-ray image fusion method of the present invention;
FIG. 2 is a schematic diagram of an encoder extraction feature of the present invention;
FIG. 3 is a schematic diagram of a multi-scale convolution block in the encoder of the present invention;
FIG. 4 is a schematic diagram of the fusion strategy of the present invention combining channel attention and spatial attention based on fuzzy entropy;
FIG. 5 is a schematic view of a channel attention module;
FIG. 6 is a schematic diagram of a spatial attention module based on fuzzy entropy.
Detailed Description
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
The first embodiment is as follows:
Deep-learning-based multi-energy X-ray image fusion method
As shown in fig. 1 and fig. 2, the method for fusing a multi-energy X-ray image based on deep learning of the present invention includes the following steps:
step 1: different energy X-ray patterns of different workpieces are acquired as a training data set.
The hyperparameters of the network are set, and the data set images are scaled to a size suitable for the network input; in this embodiment, the input images are scaled to 256 × 256.
Step 2: the X-ray images of the training data set are input to the encoder, the encoder and decoder are trained, and the trained encoder and decoder are obtained once the training network is stable.
The encoder consists of a main branch and an auxiliary branch. The main branch first uses a 1 × 1 convolutional layer to increase the number of feature channels of the input image, then uses 4 layers of multi-scale convolution blocks to extract the global features of the image; the auxiliary branch uses a trainable edge detection operator to enhance the edge information of the input image, obtaining feature maps with the same dimensions as those of the main branch. The features extracted by the main branch and the auxiliary branch are added and then fed into the decoder; a convolutional layer at the end of the decoder reduces the number of image channels to a single channel and outputs the reconstructed image. The decoder structure is a U-Net-style network nested at different depths, which realizes the fusion of feature maps with different semantics.
Both the main branch and the auxiliary branch are equipped with dense bypass connections, which improve feature reuse. The only difference between the auxiliary branch and the main branch is that a trainable edge detection operator replaces the initial convolutional layer of the main branch. The auxiliary branch adds the feature maps output by each layer of its multi-scale module to the feature maps of the corresponding layer of the main branch, so that edge information is enhanced in every layer of feature maps; each layer of edge-enhanced feature maps then enters the decoder to reconstruct the image.
To make full use of the concept of multi-scale fusion, the convolution blocks of the encoder are all Inception-style multi-scale feature extraction modules, so the overall network makes full and effective use of the multi-scale and edge information of the features. As shown in fig. 3, the multi-scale convolution block has four branches, each composed of a convolutional layer, an activation function and a batch normalization layer. The activation function of the convolution block can be a Sigmoid, ReLU, Tanh, Softmax or Leaky ReLU function. Here a × a is the scaled input image size; n is the number of convolution kernels of the learnable edge detection operator, and also the number of convolution kernels in each convolution block; w × w is the size of the LoG template; s × s is the convolution kernel size. n, w and s can all be adjusted according to the fusion effect. In addition, the network uses downsampling and upsampling to ensure that feature maps being added have the same size; downsampling uses max pooling and upsampling uses linear interpolation. Each branch uses a 1 × 1 convolution kernel to reduce the number of channels and thus the number of parameters. The first three branches have receptive fields of three different sizes, namely 1 × 1, 3 × 3 and 5 × 5, where the third branch replaces the 5 × 5 convolution kernel with two 3 × 3 convolutions, achieving the same receptive field while further reducing the number of parameters. The fourth branch has the same receptive field as the second branch and produces output of the same dimensions, but its output content differs, which increases the diversity of feature fusion. Finally, the features of different scales extracted on the four branches are accumulated as the output of the multi-scale module.
The multi-scale module increases the depth and width of the network and improves its generalization capability.
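The parameter saving from replacing one 5 × 5 convolution with two stacked 3 × 3 convolutions, as used in the third branch, can be checked with a short sketch (the channel count here is an illustrative value, not one fixed by the patent):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameter count of a single k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

c = 16  # illustrative channel count; the patent's n is configurable

# One 5x5 convolution versus two stacked 3x3 convolutions:
# both cover a 5x5 receptive field, but the stacked version is cheaper.
p_5x5 = conv_params(5, c, c, bias=False)           # 5*5*16*16 = 6400
p_two_3x3 = 2 * conv_params(3, c, c, bias=False)   # 2*3*3*16*16 = 4608

print(p_5x5, p_two_3x3)  # 6400 4608
```

The stacked form also inserts an extra nonlinearity between the two 3 × 3 layers, which is a common side benefit of this substitution.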
The edge detection operator adopted by the auxiliary branch may be a Sobel, Laplacian, Canny or LoG operator. In this embodiment a LoG (Laplacian of Gaussian) operator is used: the image is first smoothed with a Gaussian filter and then convolved with a Laplacian operator. A common LoG convolution template is 5 × 5. A learnable parameter α is defined, and each value in the LoG convolution template is multiplied by α; α is updated continuously as the network trains. Here, the LoG convolution template of size 5 × 5 is:
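For illustration, a widely used 5 × 5 LoG approximation together with the learnable scale α can be sketched as follows; the kernel values below are the standard textbook approximation and are an assumption, not necessarily the exact template disclosed in the patent:

```python
import numpy as np

# A common 5x5 LoG (Laplacian of Gaussian) approximation (assumed here;
# the patent's exact template is not reproduced in this text).
LOG_5x5 = np.array([
    [ 0,  0, -1,  0,  0],
    [ 0, -1, -2, -1,  0],
    [-1, -2, 16, -2, -1],
    [ 0, -1, -2, -1,  0],
    [ 0,  0, -1,  0,  0],
], dtype=np.float64)

def trainable_log(image, alpha=1.0):
    """Convolve `image` with alpha * LoG, zero-padded to keep the size.

    `alpha` stands in for the learnable scale described in the text; in a
    real network it would be updated by backpropagation.
    """
    kernel = alpha * LOG_5x5
    h, w = image.shape
    padded = np.pad(image, 2, mode="constant")
    out = np.zeros_like(image, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + 5, x:x + 5] * kernel)
    return out

# On a constant image the interior response is zero (the kernel sums to 0).
flat = np.ones((8, 8))
print(np.abs(trainable_log(flat)[2:-2, 2:-2]).max())  # 0.0
```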
and calculating a loss function, continuously training the network, improving the feature extraction capability of the coding and decoding network, and storing the model after the network is stable. Composite loss function (L) to guide network trainingCompound) From L1 toLoss function (L)l1) And a consistency loss function (L) based on the image blockslpatch) The formula is as follows:
the consistency loss function based on the image blocks is used for calculating the L1 norm value loss of the local energy graph of the input image and the local energy graph of the output image, and the calculation mode of pixel points in the local energy graph is that the sum of squares of difference values of adjacent pixels and intermediate pixels in one image block is calculated, and then the average is calculated.
As shown in fig. 1 and fig. 2, in the training phase the network contains only the encoder and decoder, without the fusion strategy. Assuming the input is a single-channel X-ray image of size a × a, the main branch of the encoder first uses one convolutional layer (containing n convolution kernels of size 1 × 1) to raise the dimension of the input image, generating n feature maps of size a × a. After convolution block E12 (the number of output channels of the multi-scale module is n), n new feature maps of size a × a are generated; after convolution block E22, n feature maps of size a/2 × a/2 are generated; and so on, until the last convolution block of the encoder branch outputs n feature maps of size a/8 × a/8.
As shown in fig. 1, the auxiliary branch of the encoder extracts the edge information of the image using its edge detection operator (containing n groups of w × w trainable edge detection operators, where w must be odd), generating n feature maps of size a × a; the remaining operations are the same as in the main branch, and the feature maps at corresponding depths of the two branches are added and output to the decoder. The decoder comprises decoder subnetworks of different depths: convolution block D31 outputs n feature maps of size a/4 × a/4; convolution blocks D21 and D22 output n feature maps of size a/2 × a/2; convolution blocks D11, D12 and D13 output n feature maps of size a × a; and the convolutional layer at the end of the decoder outputs an X-ray image of size a × a.
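The size progression through the encoder described above can be verified with a small bookkeeping sketch (n = 64 is an illustrative value; the patent leaves n configurable):

```python
def encoder_shapes(a=256, n=64, blocks=4):
    """Feature-map sizes along one encoder branch: the first multi-scale
    block keeps the a x a size, and each later block halves it (max
    pooling), down to a/8 x a/8 after the fourth block."""
    shapes = []
    size = a
    for i in range(blocks):
        if i > 0:          # E12 keeps the size; E22 onward halve it
            size //= 2
        shapes.append((n, size, size))
    return shapes

print(encoder_shapes())
# [(64, 256, 256), (64, 128, 128), (64, 64, 64), (64, 32, 32)]
```

With the 256 × 256 input of this embodiment, the deepest encoder output is 32 × 32, i.e. a/8 × a/8 as stated.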
The convolution kernel weights can be initialized by random initialization, Gaussian initialization, and the like. The bypass connection module in the network design uses skip connections to link pairs of convolutional layers separated by a certain interval; the size of the interval and the number of bypass connections can be chosen and adjusted according to the number of convolutional layers. The initial learning rate and the number of training epochs of the network model training module can be adjusted according to the convergence of the network (in this embodiment the learning rate is initialized to $10^{-4}$ and the total number of training epochs is set to 10). Network parameter updating in the network model training module is implemented as follows: during training, one or more of the learning rate, the number of convolution kernels, the convolution kernel size, the weights, the number of network layers, and so on, can be adjusted continuously according to the convergence of the network and the fusion result.
Step 3: a fusion strategy combining channel attention and fuzzy-entropy-based spatial attention is added between the trained encoder and decoder.
In the fusion stage, the specific fusion strategy is added at each layer (four layers in total) between the trained encoder and decoder to form the complete network. K X-ray images of different energies to be fused are input to the model; since the number of input images is K, the four-layer multi-scale Inception structure of the encoder outputs K groups of feature maps at each layer. The fusion strategy of each layer fuses the K groups of feature maps of the corresponding layer, and the fused feature maps are input to the decoder to obtain the fused image.
As shown in fig. 4, the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention specifically fuses the feature maps with a channel attention module and a fuzzy-entropy-based spatial attention module, respectively, and takes the average of the outputs of the two fusion modules as the feature map fusion result.
For the K groups of input feature maps $\phi_k^m \in \mathbb{R}^{W \times H \times C}$, where W, H and C denote the width, height and number of channels of a feature map, the fusion results of the channel attention module and the spatial attention module, denoted $\phi_{ca}^m$ and $\phi_{sa}^m$, are obtained separately; $\phi_f^m$ is the final fusion result of the K groups of feature maps of the m-th layer, as shown in equation (1):

$$\phi_f^m = \frac{1}{2}\left(\phi_{ca}^m + \phi_{sa}^m\right) \tag{1}$$
as shown in FIG. 5, the channel attention module uses the average pooling versus K sets of profilesCalculating respective channel attention coefficientsThe attention parameter of the current feature map is divided by the sum of all attention parameters for normalization to obtainAs shown in equation (2). Multiplying the characteristic diagram with the corresponding attention coefficient to form a new characteristic diagramAll new feature map accumulations as a result of a channel attention fusion strategyAs shown in equation (3).
Where m is equal to {1,2,3,4}, K is equal to {1,2, …, K } is the kth input image, K is the total number of fused images, c is the feature mapAnd attention factorThe c-th channel of (1).
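As an illustrative sketch (not the patent's code), the channel attention fusion described by equations (2) and (3) can be written in NumPy as follows:

```python
import numpy as np

def channel_attention_fuse(feature_maps):
    """Fuse K feature maps (each C x H x W): global average pooling gives
    per-channel coefficients, these are normalized across the K inputs,
    and the channel-weighted maps are summed."""
    coeffs = np.stack([f.mean(axis=(1, 2)) for f in feature_maps])  # K x C
    coeffs = coeffs / coeffs.sum(axis=0, keepdims=True)             # normalize over K
    fused = np.zeros_like(feature_maps[0])
    for f, c in zip(feature_maps, coeffs):
        fused += c[:, None, None] * f   # channel-wise weighting
    return fused

np.random.seed(0)
k1 = np.random.rand(8, 4, 4) + 0.1   # offset avoids all-zero channels
k2 = np.random.rand(8, 4, 4) + 0.1
fused = channel_attention_fuse([k1, k2])
print(fused.shape)  # (8, 4, 4)
```

Fusing two identical feature maps returns the map itself, since each normalized coefficient is then 0.5.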
As shown in FIG. 6, the fuzzy-entropy-based spatial attention module first computes, for the K groups of feature maps $\phi_k^m$, their respective fuzzy entropy maps $E_k^m$. Specifically, according to equation (4), a fuzzy entropy map is computed for each channel of each group of feature maps:

$$E_k^m(x, y, c) = -\frac{1}{9} \sum_{(i, j, c) \in N_{x,y,c}} \Big[ \mu_{x,y,c}(i, j) \ln \mu_{x,y,c}(i, j) + \big(1 - \mu_{x,y,c}(i, j)\big) \ln\big(1 - \mu_{x,y,c}(i, j)\big) \Big] \tag{4}$$

where $(x, y, c) \in \mathbb{R}^{W \times H \times C}$, $N_{x,y,c}$ is the 3 × 3 neighborhood centered at $(x, y)$ in the c-th channel feature map, $(i, j, c) \in N_{x,y,c}$, and $\mu_{x,y,c}(i, j)$ is the membership degree of the neighborhood pixel with $(x, y)$ as the center in the c-th channel feature map, computed as follows:
the fuzzy entropy diagram of the current characteristic diagram is divided by the sum of all fuzzy entropy diagrams for normalization to obtainAs shown in equation (6) and as the attention coefficient of the spatial attention module. Multiplying the characteristic diagram with the corresponding attention coefficient to form a new characteristic diagramAnd new K groupsAccumulating the characteristic graphs to obtain the result of the space attention fusion strategyAs shown in equation (7).
Where m ∈ {1,2,3,4} refers to the mth layer fusion policy, K ∈ {1,2, …, K } refers to the kth input image, and K is the total number of fused images.
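The fuzzy-entropy spatial attention of equations (6) and (7) can likewise be sketched in NumPy. The Gaussian membership function and the σ value below are assumptions, since the patent's own membership formula is not reproduced in this text:

```python
import numpy as np

def fuzzy_entropy_map(fm, sigma=0.25):
    """Per-pixel fuzzy entropy over a 3x3 neighborhood for each channel of
    a C x H x W feature map. The Gaussian membership function is an
    assumption standing in for the patent's membership formula."""
    c, h, w = fm.shape
    padded = np.pad(fm, ((0, 0), (1, 1), (1, 1)), mode="edge")
    entropy = np.zeros_like(fm, dtype=np.float64)
    eps = 1e-12
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            neigh = padded[:, 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            mu = np.exp(-((neigh - fm) ** 2) / (2 * sigma ** 2))
            mu = np.clip(mu, eps, 1 - eps)
            # Shannon fuzzy entropy contribution of this neighbor
            entropy += -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))
    return entropy / 9.0

def spatial_attention_fuse(feature_maps):
    """Normalize the fuzzy entropy maps across the K inputs and use them
    as per-pixel weights, then accumulate the weighted maps."""
    ents = [fuzzy_entropy_map(f) for f in feature_maps]
    total = np.sum(ents, axis=0) + 1e-12   # guard against division by zero
    return sum((e / total) * f for e, f in zip(ents, feature_maps))

np.random.seed(0)
k1 = np.random.rand(4, 6, 6)
k2 = np.random.rand(4, 6, 6)
print(spatial_attention_fuse([k1, k2]).shape)  # (4, 6, 6)
```

As with the channel module, fusing two identical maps returns the map itself, since the normalized entropy weights are then 0.5 everywhere.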
Step 4: X-ray images of different energies are input to the trained encoder to extract features, the feature maps are fused using the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention, and the fused feature maps are input to the trained decoder to output the fusion result.
Example two:
multi-energy X-ray image fusion device based on deep learning
The multi-energy X-ray image fusion device based on the deep learning is used for executing the multi-energy X-ray image fusion method based on the deep learning.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (10)
1. A multi-energy X-ray image fusion method based on deep learning is characterized by comprising the following steps:
step 1: collecting X-ray images of different workpieces at different energies as a training data set;
step 2: inputting the X-ray images of the training data set to the encoder, training the encoder and decoder, and obtaining the trained encoder and decoder once the training network is stable;
step 3: adding a fusion strategy combining channel attention and fuzzy-entropy-based spatial attention between the trained encoder and decoder;
step 4: inputting X-ray images of different energies to the trained encoder to extract features, fusing the feature maps using the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention, and inputting the fused feature maps to the trained decoder to output the fusion result.
2. The deep learning-based multi-energy X-ray image fusion method according to claim 1, characterized in that: in step 2, the encoder consists of a main branch and an auxiliary branch; the main branch first uses a 1 × 1 convolutional layer to increase the number of feature channels of the input image, then uses 4 layers of multi-scale convolution blocks to extract the global features of the image; the auxiliary branch uses a trainable edge detection operator to enhance the edge information of the input image, obtaining feature maps with the same dimensions as those of the main branch; and the only difference between the auxiliary branch and the main branch is that the trainable edge detection operator replaces the initial convolutional layer of the main branch.
3. The deep learning-based multi-energy X-ray image fusion method according to claim 2, characterized in that: in step 2, the features extracted by the main branch and the auxiliary branch of the encoder are summed and then passed to the decoder; the convolutional layer at the end of the decoder reduces the feature map to a single channel and outputs the reconstructed image.
4. The deep learning-based multi-energy X-ray image fusion method according to claim 2, characterized in that: in step 2, the edge detection operator adopted by the auxiliary branch is a Sobel, Laplacian, Canny, or LoG (Laplacian of Gaussian) operator.
5. The deep learning-based multi-energy X-ray image fusion method according to claim 2, characterized in that: in step 2, both the main branch and the auxiliary branch use dense bypass connections, which improve feature reuse.
6. The deep learning-based multi-energy X-ray image fusion method according to claim 2, characterized in that: in step 2, the multi-scale convolution block has four branches, and each convolution block is composed of a convolutional layer, an activation function, and a batch normalization layer.
7. The deep learning-based multi-energy X-ray image fusion method according to claim 6, characterized in that: in step 2, the activation function of the convolution block is a Sigmoid, ReLU, Tanh, Softmax, or Leaky ReLU function.
8. The deep learning-based multi-energy X-ray image fusion method according to claim 1, characterized in that: in step 2, the composite loss function that guides network training is a weighted combination of an L1 loss function and an image-patch-based consistency loss function; the patch-based consistency loss computes the L1-norm loss between the local energy map of the input image and that of the output image, where each pixel of the local energy map is obtained by taking the squared differences between the neighboring pixels and the center pixel within an image patch and then averaging them.
9. The deep learning-based multi-energy X-ray image fusion method according to claim 1, characterized in that: in step 3, the fusion strategy combining channel attention and fuzzy-entropy-based spatial attention specifically comprises: fusing the feature maps with a channel attention module and with a fuzzy-entropy-based spatial attention module respectively, and taking the average of the outputs of the two fusion modules as the feature-map fusion result.
10. A deep learning based multi-energy X-ray image fusion apparatus for performing the method of any one of claims 1 to 9.
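Purely for illustration (not part of the claimed method), the trainable edge-detection branch of claims 2 and 4 can be sketched as a convolution whose weights are initialized with Sobel kernels; the function names and the plain-NumPy "valid" convolution below are assumptions, and a real implementation would wrap these kernels in a learnable convolution layer:

```python
import numpy as np

# Fixed Sobel kernels used to *initialize* the trainable edge branch;
# during training these weights would be updated like any other filter.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_branch(img):
    """Edge-enhanced feature maps: gradient responses in x and y."""
    return conv2d_valid(img, SOBEL_X), conv2d_valid(img, SOBEL_Y)
```

On a vertical step edge, the x-kernel responds strongly at the boundary while the y-kernel is silent, which is exactly the edge information the auxiliary branch is meant to inject.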
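The patch-based consistency loss of claim 8 can be sketched as below. This is a minimal NumPy illustration under assumptions the claim leaves open (a 3×3 patch and edge padding), with hypothetical function names:

```python
import numpy as np

def local_energy_map(img, patch=3):
    """Each output pixel is the mean of the squared differences between
    the patch's neighboring pixels and its center pixel (claim 8)."""
    r = patch // 2
    padded = np.pad(img, r, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    n_neighbors = patch * patch - 1
    for i in range(h):
        for j in range(w):
            block = padded[i:i + patch, j:j + patch]
            center = block[r, r]
            out[i, j] = np.sum((block - center) ** 2) / n_neighbors
    return out

def consistency_loss(x, y, patch=3):
    """L1 distance between the local energy maps of input and output."""
    return np.mean(np.abs(local_energy_map(x, patch) -
                          local_energy_map(y, patch)))
```

A constant image has zero local energy everywhere, so the loss vanishes when input and reconstruction share the same local structure and grows as their local contrast diverges.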
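Claim 9's fusion rule (averaging a channel-attention fusion with a spatial-attention fusion) might look like the sketch below. For brevity, a simple L1-norm saliency map stands in for the fuzzy-entropy measure, and the softmax weighting is an assumption, so this is an illustrative approximation rather than the claimed algorithm:

```python
import numpy as np

def channel_attention_fuse(f1, f2):
    """Weight each source's channels by their mean activation (softmax)."""
    # f1, f2: (C, H, W) feature maps from the two energy levels.
    a1 = f1.mean(axis=(1, 2))               # per-channel activity, shape (C,)
    a2 = f2.mean(axis=(1, 2))
    e1, e2 = np.exp(a1), np.exp(a2)
    w1 = (e1 / (e1 + e2))[:, None, None]    # broadcast over H, W
    return w1 * f1 + (1 - w1) * f2

def spatial_attention_fuse(f1, f2):
    """Per-pixel weights from an L1-norm saliency map (a stand-in here
    for the fuzzy-entropy measure of claim 9)."""
    s1 = np.abs(f1).sum(axis=0)             # (H, W) saliency maps
    s2 = np.abs(f2).sum(axis=0)
    e1, e2 = np.exp(s1), np.exp(s2)
    w1 = (e1 / (e1 + e2))[None, :, :]       # broadcast over channels
    return w1 * f1 + (1 - w1) * f2

def fuse(f1, f2):
    """Average the two attention-fused results, as in claim 9."""
    return 0.5 * (channel_attention_fuse(f1, f2) +
                  spatial_attention_fuse(f1, f2))
```

Because both modules produce convex combinations of the two inputs, the averaged result stays within the range of the source feature maps, and identical inputs pass through unchanged.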
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210172479.1A CN114708189A (en) | 2022-02-24 | 2022-02-24 | Deep learning-based multi-energy X-ray image fusion method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210172479.1A CN114708189A (en) | 2022-02-24 | 2022-02-24 | Deep learning-based multi-energy X-ray image fusion method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114708189A true CN114708189A (en) | 2022-07-05 |
Family
ID=82167101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210172479.1A Pending CN114708189A (en) | 2022-02-24 | 2022-02-24 | Deep learning-based multi-energy X-ray image fusion method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114708189A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115171882A (en) * | 2022-07-07 | 2022-10-11 | 广东工业大学 | Intelligent medical auxiliary diagnosis method and system based on multi-prior embedded Y-type network |
CN117132592A (en) * | 2023-10-24 | 2023-11-28 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Industrial defect detection method based on entropy fusion |
CN117132592B (en) * | 2023-10-24 | 2024-01-26 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Industrial defect detection method based on entropy fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111612807B (en) | Small target image segmentation method based on scale and edge information | |
CN114708189A (en) | Deep learning-based multi-energy X-ray image fusion method and device | |
CN103440644B (en) | A kind of multi-scale image weak edge detection method based on minimum description length | |
Zhang et al. | An infrared and visible image fusion algorithm based on ResNet-152 | |
Jia et al. | Image denoising via sparse representation over grouped dictionaries with adaptive atom size | |
Zheng et al. | Effective image fusion rules of multi-scale image decomposition | |
Zhao et al. | Single image super-resolution based on deep learning features and dictionary model | |
Ma et al. | Local low-rank and sparse representation for hyperspectral image denoising | |
CN106339996A (en) | Image blind defuzzification method based on hyper-Laplacian prior | |
CN112669249A (en) | Infrared and visible light image fusion method combining improved NSCT (non-subsampled Contourlet transform) transformation and deep learning | |
Liu et al. | A super resolution algorithm based on attention mechanism and srgan network | |
CN115578262A (en) | Polarization image super-resolution reconstruction method based on AFAN model | |
CN109816781B (en) | Multi-view solid geometry method based on image detail and structure enhancement | |
Chen et al. | End-to-end single image enhancement based on a dual network cascade model | |
Zuluaga et al. | Blind microscopy image denoising with a deep residual and multiscale encoder/decoder network | |
CN111292308A (en) | Convolutional neural network-based infrared defect detection method for photovoltaic solar panel | |
Zhu et al. | Infrared and visible image fusion based on convolutional sparse representation and guided filtering | |
Kollem et al. | A General Regression Neural Network based Blurred Image Restoration | |
CN112200752B (en) | Multi-frame image deblurring system and method based on ER network | |
CN114066786A (en) | Infrared and visible light image fusion method based on sparsity and filter | |
Fan et al. | Single image super resolution method based on edge preservation | |
Li et al. | Image denoising via multi-scale gated fusion network | |
CN113034432B (en) | Product defect detection method, system, device and storage medium | |
CN103810677A (en) | Remote sensing image enhancement method based on super-empirical mode decomposition | |
Ma et al. | Noisy remote sensing image fusion based on JSR |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||