CN110189308B - Tumor detection method and device based on fusion of BM3D and dense convolution network - Google Patents


Info

Publication number
CN110189308B
CN110189308B (application CN201910415029.9A)
Authority
CN
China
Prior art keywords
network
block
bm3d
image
dense
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910415029.9A
Other languages
Chinese (zh)
Other versions
CN110189308A (en)
Inventor
刘慧
姜迪
郭强
张彩明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN201910415029.9A priority Critical patent/CN110189308B/en
Publication of CN110189308A publication Critical patent/CN110189308A/en
Application granted granted Critical
Publication of CN110189308B publication Critical patent/CN110189308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion


Abstract

The invention provides a tumor detection method and device based on the fusion of BM3D and a dense convolutional network. Similar blocks are marked; random discarding and fast marking are applied; a DenseNet is optimized and trained; and the result is reconstructed from the deeply trained data using the spatial information, structural information, and extracted feature information of the input image. The input spatial information is abstracted into one dimension, reducing the irreversible loss of initial features. A dense convolutional network fused with BM3D is constructed: a scaled exponential linear unit (SELU) activation function replaces the rectified linear unit (ReLU) activation function to activate the network, introducing parameters for the negative part, improving network optimization, and enhancing robustness; a max-pooling layer is added behind each dense block to abstract image features and extract tumor core information points. At the network end, the BM3D aggregation method reconstructs the features, combining gradient and spatial information to improve network performance. The accuracy of tumor detection is effectively improved.

Description

Tumor detection method and device based on fusion of BM3D and dense convolution network
Technical Field
The invention relates to the technical field of medical image processing, in particular to a tumor detection method and device based on fusion of BM3D and a dense convolution network.
Background
Medical images are produced by imaging techniques including computed tomography, magnetic resonance imaging, ultrasound, and positron emission tomography. Medical imaging can acquire two-dimensional or three-dimensional images of the corresponding part of the human body. In a two-dimensional image, the smallest unit carrying specific information is called a pixel; in a three-dimensional image it is called a voxel. Under certain conditions a three-dimensional image can be represented as a series of two-dimensional images, which greatly reduces computational complexity and memory requirements. However, although medical imaging technology has matured, many medical images have low resolution owing to the mutual constraints of imaging equipment, radiation hazards, and human physiological health. In tumor detection, for example, a low-resolution image cannot support the classification task because lung nodules and tumors look similar, and medical image processing techniques therefore came into being. Medical image processing, driven by the needs of pathology research, places higher demands on medical image quality and is an important step in the analysis of medical images. By secondary processing of the acquired image, the image becomes clearer and simpler, diagnostic efficiency improves, and the misdiagnosis rate drops. Medical image fusion, ultrasound imaging, and image reconstruction are several applications of medical image processing; at the underlying level, semantic segmentation is one of the important means of processing medical images.
Image semantic segmentation divides an image into regions with different meanings; the regions satisfy connectivity, do not intersect one another, and their union forms the whole image. Fast and effective separation of regions with different meanings is one goal of image segmentation. Threshold segmentation is the simplest pixel-level segmentation algorithm. Taking a grayscale image as an example, even when the gray levels of two objects are very close, the gray histogram of the image may show two distinct peaks; choosing the valley between the two peaks as the threshold segments the image well. Edge detection is another common segmentation algorithm: because gray values change sharply at object edges, the edge pixels can be located and the object thereby segmented. Region-based segmentation is also common; for example, region splitting and merging uses the quadtree principle to select several mutually disjoint initial regions for the image and then splits and merges regions according to a given uniformity criterion until a minimal set of regions remains. However, because most medical images have poor pixel resolution, the processing effect of these methods is often unsatisfactory. To handle image features effectively at low resolution, methods using fuzzy information have emerged. The fuzzy clustering algorithm, based on fuzzy set theory and clustering, applies the membership principle effectively to this problem. Medical image segmentation using wavelet transforms also works well: the wavelet decomposition of the image histogram makes objects with large gray-level jumps easy to detect, so thresholds of different scales can be used to segment the original image. In recent years, besides the traditional semantic segmentation methods, with the rise of the convolutional neural network (CNN), CNN-based segmentation algorithms have been widely applied to medical image segmentation.
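To make the threshold-segmentation idea above concrete, the following minimal Python sketch picks the valley between the two dominant peaks of a grayscale histogram as the threshold. It is an illustration only, not part of the patent; the smoothing width and the 32-level peak-separation margin are assumptions chosen for readability, and practical systems would more often use Otsu's method.

```python
import numpy as np

def valley_threshold(gray_img: np.ndarray) -> int:
    """Pick the valley between the two dominant histogram peaks as a threshold.

    Minimal illustration of histogram-valley thresholding; the smoothing and
    the 32-level peak separation are assumed values, not from the patent.
    """
    hist, _ = np.histogram(gray_img, bins=256, range=(0, 256))
    hist = np.convolve(hist, np.ones(5) / 5, mode="same")  # light smoothing
    p1 = int(np.argmax(hist))                              # strongest peak
    masked = hist.copy()
    masked[max(0, p1 - 32):min(256, p1 + 32)] = 0          # suppress near p1
    p2 = int(np.argmax(masked))                            # second peak
    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(hist[lo:hi + 1]))            # valley between peaks

# usage: binary_mask = gray_img > valley_threshold(gray_img)
```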
In the development of the CNN, it regained popularity after the AlexNet model proposed by Krizhevsky et al. in 2012 won the image-classification task of the ImageNet competition with an 11% margin over the runner-up. The model uses the rectified linear unit (ReLU) as its activation function, and its performance was further improved by dual-GPU parallel computation and local normalization. Notably, in CNN implementations the ReLU activation function is simple and works better than the common Sigmoid activation function. The Sigmoid function, also called the S-shaped growth curve, has an image like a reclining S, with an upper limit of 1 and a lower limit of 0, and maps variables to the (0,1) interval. The ReLU leaves positive values unchanged and turns all negative values into 0; practice shows that applying ReLU to a CNN fits training data and mines data features better than applying Sigmoid. After AlexNet, Karen Simonyan et al. proposed the VGG networks, which improve performance by continually deepening the network structure and use 1×1 convolutional layers to add linear transformations for feature extraction. GoogLeNet, proposed by Christian Szegedy et al., introduces two auxiliary losses to ease the gradient when the network is deep and uses multiple convolution kernels to add features as the width expands. The Highway Network puts its weight on how to extract features better, allowing information to pass through every layer of the network at high speed without obstruction, which effectively relieves the gradient problem. In 2015, the fully convolutional network (FCN) was proposed for object detection and semantic segmentation. FCN is an end-to-end, pixel-to-pixel CNN semantic segmentation model whose core idea is to construct a "fully convolutional" network: the network places no requirement on the input image size, and the final output has the same spatial size as the input. The network adds deconvolution layers that upsample the feature maps, restoring the spatial resolution of the data to that of the input. The network thus predicts every pixel of the input image while successfully preserving the spatial information of the original input. In addition, FCN introduces skip connections between downsampling and upsampling, combining semantic information and features from deep and shallow layers to mitigate pixel-level resolution loss, which helps restore fine-scale information from the downsampling layers during upsampling.
However, conventional CNNs and FCNs suffer more or less from information loss during transmission; after the input information or gradient information passes through many layers, the gradient is likely to vanish or explode. This problem is especially acute when the network is deep.
Disclosure of Invention
The invention provides a tumor detection method based on the fusion of BM3D and a dense convolution network, which is used for improving the accuracy of semantic detection and segmentation of an original image in a tumor detection task.
Therefore, the method comprises the following steps:
marking similar blocks;
step two, randomly discarding and rapidly marking;
step three, optimizing and training a DenseNet network;
and fourthly, reconstructing based on the deep training data by utilizing the spatial information and the structural information of the input image and the extracted characteristic information.
The invention also provides a device for realizing the tumor detection method based on the fusion of BM3D and a dense convolution network, which comprises the following steps:
a memory for storing a computer program and a lesion detection method based on the fusion of BM3D and a dense convolutional network;
a processor for executing the computer program and implementing a multi-memory stress testing system to implement the steps of the tumor detection method based on the fusion of BM3D and a dense convolutional network.
According to the technical scheme, the invention has the following advantages:
based on various advantages of the DenseNet, the DenseNet is applied to a tumor detection task, and an image denoising algorithm BM3D is fused to establish a DenseNet model for enhancing semantic segmentation of medical tumor images. The tumor detection model adopts the similar block grouping technology of BM3D and places the similar block grouping technology at the front end of DenseNet, reuses image characteristics and optimizes a network structure, and enhances network robustness. And a scalable exponential linear unit activation function (SELU) is adopted to replace a ReLU activation function in the constructed DenseNet network model to optimize a parameter network structure, so that the loss of narrow and small features is avoided by fully utilizing detailed features, and the capability of extracting the features from the network is improved. Meanwhile, a maximum pooling layer is added at the tail end of each layer of dense blocks, so that abstract characteristic information is deepened, and the detection effect is more accurate. And finally, at the end of the network, performing feature reconstruction by adopting a similarity aggregation method in BM3D, and fully mining the spatial structure relationship of image pixel levels.
The invention adopts the similar-block grouping technique of BM3D and places it at the front end of DenseNet; in the network architecture the SELU activation function replaces ReLU to optimize the characteristics of the negative part; the features learned by different layers are mapped and concatenated using the dense block structure in DenseNet, and max pooling is added for secondary feature extraction. Finally, at the end of the network, feature reconstruction adopts the similarity-aggregation method of BM3D, fully mining the pixel-level spatial structure of the image, and region fusion is performed by the BM3D aggregation method. The network of the invention shows advantages on several evaluation indexes, such as mean intersection-over-union and mean pixel accuracy, and the architecture has good robustness in medical image segmentation optimization.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description will be briefly introduced, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a tumor detection model based on fusion of BM3D and a dense convolutional network;
FIG. 2 is a DenseNet dense block diagram;
FIG. 3 is a schematic diagram of the comparison of three activation functions with a SELU;
FIG. 4 is a schematic diagram comparing ablation experiments a and b-h;
FIG. 5 is a comparison image of six advanced methods and the present method;
fig. 6 is a flowchart of a tumor detection method based on the fusion of BM3D and a dense convolutional network.
Detailed Description
In the present invention, as shown in figs. 1 to 6: for brain lesion segmentation, a 3D convolutional neural network framework called DeepMedic has been proposed that adopts a dual parallel network architecture to process high- and low-resolution images simultaneously. The dense convolutional network (DenseNet) proposes to solve network degradation using a dense block structure; it is built from dense blocks and pooling operations, and each layer takes the outputs of all previous layers as input. The structure of DenseNet mainly draws on the Highway Network, ResNet, and GoogLeNet, and improves final classification accuracy by deepening the network structure.
In a tumor detection task, the method improves the accuracy of semantic detection and segmentation of the original image. DenseNet classifies well on datasets such as ImageNet, and by verifying its structure, the combination of DenseNet and BM3D can be migrated to the tumor detection task. DenseNet focuses on features: it achieves better results through high feature utilization with fewer parameters; it directly connects all layers with a concatenation operation, each layer using the outputs of all preceding layers and passing its own output to all subsequent layers, performing iterative aggregation of the previous feature maps. This approach addresses the problem that, as network depth increases, input or gradient information passing through many layers may vanish or explode and eventually be lost at the end of the network. On this basis, the invention constructs a DenseNet model for detecting tumors and incorporates BM3D-related structure into it. The network structure is shown in FIG. 1 and contains similar-block marking, fast marking, random discarding, convolution operations, dense blocks, transition layers, feature fusion, and classification marking. The transition layer is a convolution operation plus a pooling operation.
Step one of the invention is to mark similar blocks;
specifically, the similar block marking operation in the framework of the invention is similar to the similar block grouping method in the BM3D, and a similar block set is found according to an Euclidean distance formula by adopting the Euclidean distance and is used for extracting corresponding two-dimensional and three-dimensional information of an input image so as to facilitate feature fusion reconstruction.
BM3D is to some extent an extension of the non-local means (NLM), whose main idea is non-local block matching. When querying similar blocks, BM3D first applies a hard-threshold linear transformation before computing the Euclidean distance, unlike NLM, which searches directly with the Euclidean distance; this reduces computational complexity. After finding the similar blocks, BM3D converts them to the transform domain, performs collaborative filtering for noise reduction, and weights the similar blocks in the aggregation step to obtain the final denoised blocks; this differs from NLM's direct averaging, which introduces similar-block noise.
The BM3D similar-block grouping operation first selects N_k reference blocks of size k × k in the original image. Considering algorithm complexity, there is no need to select a reference block at every pixel point; a reference block is usually taken every S_k pixels (S_k less than 10), which reduces the complexity to roughly 1/S_k^2 of the exhaustive algorithm. An n × n area is then selected around each reference block to search for similar blocks; all blocks in the area whose difference is smaller than a threshold are found, and the found similar blocks are integrated into a three-dimensional matrix. In addition, the reference block itself also needs to be integrated into the three-dimensional matrix. BM3D uses the Euclidean distance to judge similar blocks and preprocesses the block distance with a normalized two-dimensional linear transformation and a hard threshold, with the following formula:

d(Z_{x_R}, Z_x) = || γ'(T_2D(Z_{x_R})) - γ'(T_2D(Z_x)) ||_2^2 / N_1^2    (1)

where x ∈ X indexes the pixels of the image, Z_{x_R} is the target similar (reference) block, Z_x is a search block, N_1 is the selected block size, γ' is a hard-thresholding operation whose threshold is set to λ_2D·σ, and T_2D is the normalized two-dimensional linear transformation.
The similar-block set can then be found according to the Euclidean distance formula, as given by formula (2):

S_{x_R} = { x ∈ X : d(Z_{x_R}, Z_x) ≤ τ_match }    (2)

where X is the image, τ_match is a hyperparameter that determines whether blocks are similar to each other, and S_{x_R} is the set of blocks similar to Z_{x_R}.
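As a concrete reading of formulas (1) and (2), the following Python sketch (not from the patent) computes the hard-thresholded transform-domain distance between a reference block and the candidate blocks inside an n × n search window, and collects the similar-block set. The orthonormal 2-D DCT stands in for the normalized linear transform T_2D, and the parameter defaults (λ_2D = 2.7, σ, τ_match) are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn  # orthonormal 2-D DCT as the normalized transform

def block_distance(ref, cand, sigma, lam2d=2.7):
    """Hard-thresholded transform-domain distance, cf. formula (1)."""
    thr = lam2d * sigma
    t_ref = dctn(ref, norm="ortho")
    t_cand = dctn(cand, norm="ortho")
    t_ref = np.where(np.abs(t_ref) > thr, t_ref, 0.0)    # gamma' hard threshold
    t_cand = np.where(np.abs(t_cand) > thr, t_cand, 0.0)
    return np.sum((t_ref - t_cand) ** 2) / ref.shape[0] ** 2

def group_similar_blocks(img, ref_xy, k=8, n=32, sigma=25.0, tau=2500.0):
    """Collect the similar-block set S_{x_R} in an n x n window, cf. formula (2)."""
    ry, rx = ref_xy
    ref = img[ry:ry + k, rx:rx + k]
    matches = [(ry, rx)]                                  # reference block included
    y0, x0 = max(0, ry - n // 2), max(0, rx - n // 2)
    y1 = min(img.shape[0] - k, ry + n // 2)
    x1 = min(img.shape[1] - k, rx + n // 2)
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if (y, x) == (ry, rx):
                continue
            if block_distance(ref, img[y:y + k, x:x + k], sigma) <= tau:
                matches.append((y, x))
    # stack matched blocks into the three-dimensional matrix used downstream
    return np.stack([img[y:y + k, x:x + k] for (y, x) in matches])
```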
The similar-block marking operation of the invention differs from BM3D's similar-block grouping in that the blocks found by the reference block whose difference is below the threshold are stored, together with the reference block A, directly in a two-dimensional matrix and recorded as marking blocks (A, A1, A2, ...); these are two-dimensional spatial information in a one-dimensional format and constitute the basic information. The resulting two-dimensional matrices are then integrated, together with their spatial information, into a single three-dimensional matrix. To explain the difference in one sentence: the similar-block grouping operation of BM3D produces several three-dimensional blocks, whereas similar-block marking produces only one three-dimensional block.
With this similar-block analysis method, image features can be extracted in a more objective way to obtain the similar-block set, and the new data-access pattern improves the network's resistance to overfitting and its stability.
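The marking variant just described can be sketched as follows: each reference block A and its matches A1, A2, ... are flattened into the rows of one two-dimensional matrix, and all such matrices are stacked into a single three-dimensional matrix, in contrast to BM3D's several three-dimensional groups. The names, shapes, and padding strategy below are assumptions for illustration, not the patent's data layout.

```python
import numpy as np

def mark_blocks(groups):
    """Flatten each group (reference block A plus matches A1, A2, ...) into one
    row set of a 2-D matrix, then integrate all of them into a single 3-D matrix.

    `groups` is a list of (num_blocks, k, k) arrays, e.g. from a similar-block
    search; padding short groups with copies of A is an illustrative assumption.
    """
    max_m = max(g.shape[0] for g in groups)
    rows = []
    for g in groups:
        flat = g.reshape(g.shape[0], -1)                  # each block as a 1-D vector
        pad = np.repeat(flat[:1], max_m - flat.shape[0], axis=0)
        rows.append(np.concatenate([flat, pad], axis=0))
    return np.stack(rows)  # one 3-D matrix: (num_reference_blocks, max_m, k*k)
```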
Step two of the invention is random discarding and rapid marking;
specifically random dropping, by randomly closing N ports to the input, the fitting resistance and robustness of the network are increased. The random discarding method used in the invention is similar to artificially adding noise to the image and artificially deforming the image to some extent, but the random discarding method is safer than artificially intervening the noise added by the image. When in use, the randomly discarded parameter may be set to 1, so that the input matrix is not operated.
The fast marks in the framework fall into two types by function: label fast marks and similar-block fast marks. Label fast marking is a simplified form of similar-block marking: it marks only the reference block as the marking block and takes that block as input, which greatly improves efficiency, although it cannot guarantee the quality of the final result. Label fast marking can skip the similar-block marking and random discarding steps entirely; when the input data is very large or memory is severely limited, it quickly and efficiently produces an intuitive experimental result and can serve as a pre-training pre-iteration that aids data analysis. The second type can be regarded as a complementary operation to similar-block marking: it randomly selects marking blocks produced by similar-block marking, which helps reuse features. The similar-block fast mark differs from random discarding in that random discarding is a delete operation while the similar-block fast mark is an add operation. The similar-block fast mark can be disabled by setting its parameter value to 0 so that no extra blocks are input.
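A hedged sketch of the two step-two operations follows: random discarding with the parameter convention described above (1 = no discarding), and label fast marking, which keeps only the reference block. All function names and array layouts are illustrative assumptions.

```python
import numpy as np

def random_discard(marked, keep_param=1.0, rng=None):
    """Randomly close input 'ports' of the marking blocks.

    keep_param = 1 leaves the input matrix untouched, matching the convention
    described above; smaller values drop a random fraction of the non-reference
    blocks. `marked` has shape (R, M, k*k). Illustrative only.
    """
    if keep_param >= 1.0:
        return marked
    rng = rng or np.random.default_rng()
    ref, rest = marked[:, :1], marked[:, 1:]          # always keep reference block A
    keep = rng.random(rest.shape[1]) < keep_param
    return np.concatenate([ref, rest[:, keep]], axis=1)

def label_fast_mark(reference_blocks):
    """Label fast marking: each reference block alone becomes its marking block,
    skipping similar-block search and random discard (a pre-training shortcut)."""
    return reference_blocks[:, None, :]               # (R, 1, k*k)
```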
Step three, optimizing and training a DenseNet;
specifically, the method has a complete DenseNet structure, the activation function is replaced to SELU, each dense block is added with a maximum pooling layer, the feature reuse and connection mechanism of the dense blocks is fully utilized, and deep features are further abstracted and extracted. The advantage of DenseNet is that feature learning is deepened by using the height of the feature, the degradation problem is solved, the feature utilization rate is improved, and all layers in the dense block can directly receive the control information. The output dimension of each layer in the network has psi (growth rate parameter) feature maps, so the network depth is deepened, the number of feature maps is increased, and the two feature maps are linearly related.
In a CNN, if the network contains N layers, the number of connections is N. In DenseNet, however, there are N(N+1)/2 connections. The dense block structure of the network is shown in FIG. 2: x_0 is the input; the input of H_1 is x_0; the input of H_2 is [x_0, x_1], where x_1 is the output of H_1; the input of H_3 is [x_0, x_1, x_2], where x_2 is the output of H_2; and so on.
Let x_l be the output of the l-th (l > 1) layer of a dense block in DenseNet. To understand the structure of DenseNet more easily, consider the following three points:
first, in CNN, xlIs passed through the output x to the previous layer, i.e. layer l-1l-1Acting with non-linear variation HlOutput with the formula of xl=Hl(xl-1), (3)
Wherein the change is non-linear HlOften defined as convolution by ReLU activation function and random dropping of some trained connections.
The ReLU activation function is a piecewise linear function that turns all negative values into 0 and keeps the positive inputs unchanged. This one-sided suppression mines relevant features better, and, as a nonlinearity, the ReLU avoids the gradient-vanishing problem. The activation formula is as follows:

f(x) = max(0, x)    (4)
secondly, in ResNet, in order to simplify the training of deep networks, a residual block is introduced, allowing the gradient to flow directly to an earlier layer, and performing feature reuse, thereby realizing the addition of the output identifier mapping. The resulting output xlBecomes:
xl=Hl(xl-1)+xl-1, (5)
Third, DenseNet designs a denser connection model that iteratively connects all feature outputs in direct layer-to-layer connections. The output x_l of the l-th layer is:

x_l = H_l([x_{l-1}, x_{l-2}, ..., x_0])    (6)

where [...] denotes the concatenation operation, which reuses the features of the incoming connections. H_l is defined as a batch normalization (BN) layer, followed by a ReLU activation function, then a convolutional layer and a random-deactivation (dropout) layer. The batch normalization layer mainly addresses the problems of gradient vanishing and gradient explosion and has a forward-propagation and a backward-propagation structure. The random-deactivation layer relieves the redundancy of information by randomly closing neurons.
The advantage of this DenseNet connection mode is the ability to reuse features, and it allows all layers in the architecture to receive the supervision information directly. The output of each layer has ψ feature maps, so as the network gets deeper the number of feature maps grows, and the two are linearly related. To avoid data explosion and reduce the spatial dimension of the feature maps, the network end is optimized with a 1×1 convolution operation and a 2×2 pooling operation.
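The dense block and transition layer described above can be sketched in tf.keras (the patent's experiments use TensorFlow) as follows. What the text fixes is BN, SELU activation, convolution, and dropout as H_l, dense concatenation, a 1×1 convolution with 2×2 pooling, and the extra max-pooling layer appended to each dense block; the 3×3 kernel, the compression factor, and the use of average pooling in the transition layer are assumptions borrowed from common DenseNet practice.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_unit(x, growth_rate, drop_rate=0.2):
    """H_l: batch norm -> SELU -> 3x3 conv -> dropout (SELU replaces ReLU)."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation("selu")(y)
    y = layers.Conv2D(growth_rate, 3, padding="same")(y)
    return layers.Dropout(drop_rate)(y)

def dense_block(x, num_layers=4, growth_rate=12):
    """Dense block: each layer consumes the concatenation of all earlier outputs,
    cf. x_l = H_l([x_{l-1}, ..., x_0]); a max-pooling layer follows the block."""
    feats = [x]
    for _ in range(num_layers):
        inp = layers.Concatenate()(feats) if len(feats) > 1 else feats[0]
        feats.append(conv_unit(inp, growth_rate))
    out = layers.Concatenate()(feats)
    return layers.MaxPooling2D(2)(out)        # the added abstraction step

def transition_layer(x, compression=0.5):
    """Transition: BN -> 1x1 conv -> 2x2 pooling (pool type assumed average)."""
    y = layers.BatchNormalization()(x)
    y = layers.Conv2D(int(x.shape[-1] * compression), 1)(y)  # channels known at build
    return layers.AveragePooling2D(2)(y)
```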
When the similar-block marking generates the data input, the input first passes through a batch normalization operation, SELU activation, and a convolution operation to extract features. The SELU is given by:

SELU(x) = λ·x if x > 0, and λ·(α·e^x - α) if x ≤ 0    (7)
FIG. 3 shows the SELU activation function alongside the ReLU, PReLU, and ELU activation functions. ReLU effectively relieves the gradient-vanishing problem and gives the model sparse expressive ability, but neurons die when their input stays below 0. PReLU solves the dying-neuron problem by introducing parameters for the negative part, preserving negative-axis features. ELU reduces the gap between the normal gradient and the unit gradient, accelerating learning. All three activation functions are flat on the negative half-axis, which suppresses gradient explosion when the variance is too large; the negative half-axis of SELU additionally has a fixed point, so activations can grow when the variance is too small, preventing the gradient from vanishing. Under suitable parameters, the distribution of SELU activations is automatically normalized to zero mean and unit variance; this requires either taking the values of λ and α that satisfy formula (7) or initializing the weights according to the given parameters. Because the feature maps within a single dense block must keep the same size, transition layers are arranged between feature maps of different sizes for downsampling, guaranteeing the realizability of the algorithm. The transition layer consists of a batch normalization operation, a convolution operation, and a pooling operation. After the transition layer, the invention adds a max-pooling operation to extract the features a second time.
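For a concrete instance of formula (7): the self-normalizing property holds for the constants λ ≈ 1.0507 and α ≈ 1.6733 derived by Klambauer et al.; the patent states the zero-mean/unit-variance property but not these numeric values, so they are quoted here as an assumption.

```python
import numpy as np

LAMBDA = 1.0507009873554805  # lambda from Klambauer et al. (assumed, not in patent)
ALPHA = 1.6732632423543772   # alpha  from Klambauer et al. (assumed, not in patent)

def selu(x: np.ndarray) -> np.ndarray:
    """SELU(x) = lambda * (x if x > 0 else alpha*exp(x) - alpha), formula (7)."""
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(x))  # expm1(x) = e^x - 1

# With these constants, activations are driven toward 0 mean / unit variance:
z = np.random.randn(100_000)
print(selu(z).mean(), selu(z).var())  # approximately 0 and 1
```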
Thanks to the dense blocks, features extracted by shallow layers can still be used directly by deeper layers, and even the transition layers outside the dense blocks use the features of all layers in the preceding dense block. The network of the invention uses the SELU activation function and adds a max-pooling operation inside the dense blocks, which increases feature utilization within the module.
Step four reconstructs the image from the deeply trained data using the spatial information and structural information of the input image together with the extracted feature information.

Specifically, the deeply trained data is reconstructed using the spatial information, structural information, extracted feature information, and the like of the input image. The truth-value transfer process passes the spatial information and dimensional features of the original data, the marking information, and the spatial structure of the randomly discarded data to the classification mark. The fast truth-value transfer process passes only the spatial information and spatial structure of the original data to the classification mark for use in reconstruction; fast truth-value transfer can build a "rough" model, accelerating model training. The classification mark holds the transferred depth features together with the feature information obtained by finding and processing similar blocks from the input image; by collecting and connecting the classification marks, the corresponding pieces of information are put into one-to-one correspondence, easing subsequent processing. In the feature fusion method, the gray value of each pixel is updated by a weighted average of the values of the blocks covering that position, with weights depending on the number of non-zero coefficients retained by hard thresholding and on the noise intensity. The two-dimensional blocks are fused back to their original positions and the final image is computed:
y_hat(x) = [ Σ_{x_R ∈ X} Σ_{x_m ∈ S_{x_R}} w_{x_R} · Y_hat^{x_m}_{x_R}(x) ] / [ Σ_{x_R ∈ X} Σ_{x_m ∈ S_{x_R}} w_{x_R} · χ_{x_m}(x) ]    (8)

w_{x_R} = 1 / (σ^2 · N^har_{x_R}) if N^har_{x_R} ≥ 1, and 1 otherwise    (9)

where w_{x_R} is the weight, N^har_{x_R} is the number of non-zero coefficients remaining after the hard-thresholding operation, and χ_{x_m}: X → {0,1} is the characteristic (indicator) function of the block at x_m. With this model fusion, the network architecture combining BM3D and DenseNet can complete the tumor-recognition task of medical image semantic segmentation.
The many advantages of DenseNet have led to deep research and application even though it appeared only two years ago. In semantic segmentation, Jégou et al. improved the upsampling process and proposed the fully convolutional DenseNet, reusing features while avoiding feature explosion; it was trained on urban-scene benchmark datasets, applying DenseNet to semantic segmentation. In image classification, Huang et al. proposed the improved network CondenseNet, which, for practical efficiency, combines direct interconnection of dense blocks with deletion of unused connections to obtain a more efficient classification model. In image super-resolution, Zhang et al. combined ResNet and DenseNet, fusing the residual structure and the dense block structure into a proposed residual dense block and reconstructing high-resolution images from low-resolution originals.
Although research on DenseNet is developing rapidly, DenseNet is still young and many aspects remain to be perfected, especially in practical applications, where it must be applied step by step to exert its advantages.
For the tumor detection method based on the fusion of BM3D and the dense convolutional network, the invention adopts the mean intersection-over-union, the mean pixel accuracy, and the peak signal-to-noise ratio as evaluation indexes. The intersection-over-union (IoU) measures semantic segmentation accuracy by the ratio of the intersection to the union of two sets; in semantic segmentation, the two sets are the ground truth and the prediction, and the ratio can be rewritten as the number of true positives over the sum of true positives, false negatives, and false positives (the union).

The mean intersection-over-union (MIoU) computes IoU on each class and then averages. Because of its simplicity and strong representativeness, MIoU has become the most common metric for semantic segmentation. Its value is at most 1, and larger values indicate better results:
MIoU = (1/(k+1)) · Σ_{i=0}^{k} [ p_ii / ( Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji - p_ii ) ]    (10)

where p_ij denotes the number of pixels of true class i predicted as class j, over k+1 classes.
Mean pixel accuracy (MPA) is a simple enhancement of pixel accuracy. Pixel accuracy (PA) is one of the simplest measures of detection accuracy: the proportion of correctly labeled pixels among all pixels. The formula is:

PA = Σ_{i=0}^{k} p_ii / Σ_{i=0}^{k} Σ_{j=0}^{k} p_ij    (11)
the average pixel precision calculates the proportion of the number of correctly classified pixels in each class, and then averages all classes. Wherein, the value of the accuracy function of the average pixel is less than or equal to 1, and the effect is better when the numerical value is larger. The specific method comprises the following steps:
Figure GDA0002659278470000103
peak Signal to Noise Ratio (PSNR) is an objective standard for evaluating the similarity between an image and an original image, and in order to measure the quality of the processed image, the PSNR value is usually observed to measure whether a certain processing can meet an expected requirement. The PSNR is applied to the research of ablation experiments on similar mark blocks and the like, and is a logarithmic value of mean square error between an original image and a processed image relative to (2^ n-1) ^ 2.
The tumor detection method is verified on experimental images: slices of lung, brain, and other organs containing tumors or nodules, with image labels annotated manually. Because medical images are confidential and depend strongly on the accuracy of the equipment, acquiring relevant images is partly difficult; the final total is 1200 images, of which 1000 form the training set (600 of size 512×512 and 400 of size 128×128). In the experiments, the network structure is implemented with the TensorFlow framework. The experimental hardware is an Intel(R) Xeon(R) E5-2643 v4 @ 3.40 GHz CPU, an NVIDIA GeForce GTX 1080M GPU, and 256 GB of memory, with Ubuntu 14.04 as the operating system.
The purpose of the experiment is to complete tumor detection. The proposed model is based on DenseNet fused with BM3D, uses SELU as the activation function, and adds a max-pooling operation to the dense blocks. Similar-block grouping based on BM3D and the training of DenseNet both require block sides that are neither too large nor too small: BM3D favors small block parameters, so each training batch takes tiles in units of 8 as input; DenseNet, on the other hand, cannot learn features from very small image blocks, so the image block size is set here to 32×32, which captures the image features more completely while preserving data security. In the experiment, 150 images different from the training images are selected as the test set, and 50 images different from both the training and labeled images are selected as the validation set.
For the learning rate, the initial learning rate is set to 1e-3 and is reduced to 1e-4 halfway through training. The images used for training the network are augmented by flipping, region warping, and similar operations, which speeds up training and suppresses overfitting. The loss (dropout) rate of the model is set to 0.2, the weight decay to 1e-4, the Nesterov momentum to 0.9, and the number of iterations to 150. The experimental results are assessed through the mean intersection-over-union, pixel accuracy, mean pixel accuracy, and so on.
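A hedged tf.keras sketch of this training schedule follows: SGD with Nesterov momentum 0.9, learning rate 1e-3 dropped to 1e-4 halfway through the 150 iterations (treated here as epochs, an assumption), with the model and dataset objects left as placeholders; the weight decay of 1e-4 and dropout of 0.2 would live in the layer definitions.

```python
import tensorflow as tf

EPOCHS = 150

def lr_schedule(epoch):
    """Initial learning rate 1e-3, reduced to 1e-4 halfway through training."""
    return 1e-3 if epoch < EPOCHS // 2 else 1e-4

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)
callbacks = [tf.keras.callbacks.LearningRateScheduler(lr_schedule)]

# Placeholders: `model` is the BM3D + DenseNet network, `train_ds` the
# flipped/warped training pipeline described above.
# model.compile(optimizer=optimizer, loss="categorical_crossentropy")
# model.fit(train_ds, epochs=EPOCHS, callbacks=callbacks)
```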
In the study of similar-block marking, random discarding, label fast marking, and similar-block fast marking, the numbers of iterations are 25, 50, 75, 100, 125, 150, 175, 200, and 225 (for simplicity, the case of 125 iterations is taken as the example), and the similar-block marking is tested. The original state a is defined as: similar-block marking set to 1 (present), random discarding set to 1 (no discarding), label fast marking set to 0 (no input), and similar-block fast marking set to 0 (no input). In the state tables, T denotes that a module is present, F that it is absent, and 0 that the combination is not possible.
The fused image in the feature-fusion reconstruction process is tested through an ablation experiment on similar-block marking, random discarding, label fast marking, and similar-block fast marking. Each module present is tested with the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. Since exhaustive testing is cumbersome, the invention takes the following case as the example: if the random-discard module is present, its value is 0.7; if the label fast-marking module is present, its value is also 0.7; if the similar-block fast-marking module is present, 0.7 is likewise used as the test value. Unifying the module values helps standardize the ablation experiment. The peak signal-to-noise ratios of b-h against a are observed, as shown in Table 1:

Table 1. PSNR comparison of b-h with a (number of iterations equal to 125)

(The table values are rendered as an image in the original publication and are not reproduced here.)
The experimental results in Table 1 show that the PSNR of b, c, and d against a is better than that of e, f, g, and h against a, which indicates that the random discarding method is a prerequisite for the good performance of the network. The PSNR of b, c, d, e, f, and g against a is far larger than that of h, so the similar-block marking method is the core element of the network. From the PSNR of b, c, and d against a it can be seen that configuration c exceeds b and d, which indicates that feature reuse may sometimes cause overfitting and slightly degrade the training result; the same pattern is found in the PSNR of e, f, and g against a.
FIG. 4 shows the practical results of one image under the transformations a-h. Although the PSNR differs considerably, the overall images differ little: the overall structure is complete and the differences lie in the details. Judging from commonly selected image blocks, the detail-description ability of a, b, c, and d is clearly higher than that of e, f, g, and h, which demonstrates, from the viewpoint of image vision, the necessity of the random discarding method and similar-block grouping for good network optimization. In addition, although the PSNR of the image processed by method f is relatively low, its detail-restoration ability is not poor.
The network is compared with the image-processing results of six networks: SegNet, DeconvNet, dilated (hole) convolutions, RefineNet, PSPNet, and DeepLab v3. During model training the parameters are unified: the initial learning rate is set to 1e-3 and reduced to 1e-4 halfway through training; the training images are all augmented by flipping, region warping, and similar operations; the loss (dropout) rates of the models are all set to 0.2, the weight decay to 1e-4, the Nesterov momentum to 0.9, and the number of iterations to 150. The experimental hardware is the same: an Intel(R) Xeon(R) E5-2643 v4 @ 3.40 GHz CPU, an NVIDIA GeForce GTX 1080M GPU, and 256 GB of memory, with Ubuntu 14.04 as the operating system. The experimental results are examined by computing the corresponding MIoU and MPA.
SegNet is composed of an encoder, a decoder and a normalized exponential function classification layer, with the advantage of using the decoder to upsample its lower resolution input feature map. This approach eliminates the need for the network to learn upsampling, balancing performance and memory.
DeconvNet introduces deconvolution to solve pixel level prediction, improves the defects of the traditional CNN network part, and can identify fine objects.
Dilated (hole) convolution enlarges the receptive field of the convolution kernel while keeping the number of parameters unchanged, needs no pooling operation, and keeps the size of the output feature map unchanged.
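As a one-off illustration of this property (not tied to the patent's code), the snippet below shows a 3×3 kernel with dilation rate 2 covering a larger receptive field while the output feature map keeps the input's spatial size.

```python
import tensorflow as tf

# A 3x3 kernel with dilation_rate=2 covers a 5x5 receptive field with the same
# 9 parameters, and 'same' padding keeps the output size unchanged.
x = tf.random.normal([1, 64, 64, 16])
y = tf.keras.layers.Conv2D(16, 3, dilation_rate=2, padding="same")(x)
print(y.shape)  # (1, 64, 64, 16): spatial size unchanged, no pooling needed
```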
The RefineNet is a multi-stage refining network, can fuse rough high-level semantic features and fine low-level features, and effectively retains detailed information.
The pyramid pooling module provided by the PSPNet can aggregate context information of different areas, and the capability of acquiring global information is improved.
The deep lab v3 further studies the hole convolution, and designs the hole convolution cascade or the hole convolution parallel framework with different sampling rates.
The results of the method of the present invention comparing MIoU and MPA with 6 networks under substantially the same variable conditions are shown in table 3.
TABLE 3 semantic segmentation image MIoU and MPA comparison of different algorithms
(The table values are rendered as an image in the original publication and are not reproduced here.)
MIoU is one of the core indexes for evaluating semantic segmentation and determines accuracy to a great extent, while MPA forms a supplementary reference to MIoU. Table 3 shows that, among the six methods used for the comparison experiments on the medical image dataset of the invention, DeepLab v3 has the highest MIoU relative to its computed MPA value, followed by PSPNet, RefineNet, and the dilated convolutions; SegNet and DeconvNet show relative results different from the above four.
The method of the invention is optimal on the medical image dataset used, for both the MIoU and the MPA results: its MIoU is 0.4% higher than DeepLab v3 and its MPA 0.5% higher.
FIG. 5 shows the results of three images randomly selected from the validation set under the 7 network structures including the invention. The three original images are at the far left and the ground-truth images at the far right; the remaining columns are the compared methods, the last of which (to the left of the ground truth) is the method of the invention. Visually, the method of the invention is the closest to the ground truth. Across the three images, DeconvNet and DeepLab v3 are the most stable, with little image fluctuation, whereas RefineNet and PSPNet are in an unstable state, and the three images differ in how they are labeled. In PSPNet, the narrow red marker region of the first image lies at the lowermost edge, but in the second and third images it appears instead in the lungs. In RefineNet, the difference between the marker colors and the background is too small to recognize accurately. The method of the invention, with the three images as reference, is accurate and stable. Why some methods achieve high PSNR yet unsatisfactory visual quality is not yet understood; the specific reason is currently under test.
The invention improves the dense convolutional network by introducing a three-dimensional block-matching filtering module and proposes an algorithm combining three-dimensional block-matching filtering with a dense convolutional network for medical image segmentation and tumor detection. The tumor detection model adopts the similar-block grouping technique of BM3D and places it at the front end of DenseNet, replaces ReLU with the SELU activation function to optimize the characteristics of the negative part in the network architecture, concatenates the features learned by different layers using the dense block structure of DenseNet, and adds max pooling for secondary feature extraction. Finally, at the end of the network, feature reconstruction adopts the similarity-aggregation method of BM3D, fully mining the pixel-level spatial structure of the image, and region fusion is performed by the BM3D aggregation method. The network of the invention shows advantages on several evaluation indexes, such as mean intersection-over-union and mean pixel accuracy. Experimental results show that the network architecture has good robustness in medical image segmentation optimization.
The invention also provides a device for realizing the tumor detection method based on the fusion of BM3D and a dense convolution network, which comprises the following steps: a memory for storing a computer program and a lesion detection method based on the fusion of BM3D and a dense convolutional network; a processor for executing the computer program and implementing a multi-memory stress testing system to implement the steps of the tumor detection method based on the fusion of BM3D and a dense convolutional network.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Various features are described as modules, units or components that may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices or other hardware devices. In some cases, various features of an electronic circuit may be implemented as one or more integrated circuit devices, such as an integrated circuit chip or chipset.
The code or instructions may be software and/or firmware executed by processing circuitry including one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used in this disclosure, may refer to any of the foregoing structure or any other structure that is more suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described in this disclosure may be provided in software modules and hardware modules.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. An apparatus for implementing a tumor detection method based on fusion of BM3D and a dense convolutional network, comprising:
a memory for storing a computer program and a lesion detection method based on the fusion of BM3D and a dense convolutional network;
a processor for executing the computer program and implementing a multi-memory stress testing system to implement the steps of the tumor detection method as based on the fusion of BM3D and a dense convolutional network;
the tumor detection method based on the fusion of BM3D and a dense convolution network comprises the following steps:
marking similar blocks;
step two, randomly discarding and rapidly marking;
step three, optimizing and training a DenseNet network;
configuring a DenseNet structure, setting an activation function as SELU, adding a maximum pooling layer to each dense block, and extracting deep features by using a feature reuse and connection mechanism of the dense blocks;
based on the DenseNet structure, feature learning is deepened through high utilization of features, and all layers in the dense block receive the supervision information;
the output dimension of each layer in the network has psi feature maps, so that the depth of the network is deepened, and the number of feature maps is increased; psi is a growth rate parameter;
configuring the dense block structure of the network: x_0 is the input; the input of H_1 is x_0;
the input of H_2 is [x_0, x_1], wherein x_1 is the output of H_1;
the input of H_3 is [x_0, x_1, x_2], wherein x_2 is the output of H_2;
letting x_l be the output of the l-th layer of a dense block in DenseNet, l > 1;
configuring x_l based on the DenseNet structure: x_l is obtained by applying the nonlinear transformation H_l to the output x_{l-1} of the previous, (l-1)-th layer, with the formula x_l = H_l(x_{l-1})    (3)
wherein the nonlinear transformation H_l(x_{l-1}) is defined as convolution followed by a ReLU activation function and random discarding of some trained connections;
the ReLU activation function is a piecewise linear function that converts all negative values into 0 and keeps the positive inputs unchanged;
for a nonlinear function, the ReLU activation function is used to avoid the gradient-vanishing problem; the activation formula is as follows:
f(x) = max(0, x)    (4)
in ResNet, for training a deep network, a residual block is introduced, allowing the gradient to flow directly to earlier layers and reusing features, realizing the addition of an identity mapping to the output, and the resulting output x_l becomes:
x_l = H_l(x_{l-1}) + x_{l-1}    (5)
in DenseNet, all feature outputs are connected iteratively in direct layer-to-layer connections; the output x_l of the l-th layer is given by:
x_l = H_l([x_{l-1}, x_{l-2}, ..., x_0])    (6)
wherein [...] represents the concatenation operation, which reuses the features of the incoming connections; H_l([x_{l-1}, x_{l-2}, ..., x_0]) is defined as a batch normalization layer followed by a ReLU activation function, then a convolutional layer and a random-deactivation layer; the batch normalization layer is used to solve the problems of gradient vanishing and gradient explosion and has a forward-propagation and a backward-propagation structure; the random-deactivation layer solves the redundancy problem of information by randomly closing neurons;
when the similar-block marking generates the data input, the input first undergoes a batch normalization operation, SELU activation, and a convolution operation to extract features; the SELU formula is
SELU(x) = λ·x if x > 0, and λ·(α·e^x - α) if x ≤ 0    (7)
Under the preset parameters, the distribution of the SELU activation function can be automatically normalized to a mean value of 0 and a unit variance;
taking specific values of λ and α for satisfying a given formula (7) or initializing weights according to given parameters;
because the feature maps in the single dense block are required to keep the same size, a transition layer is arranged among different feature maps for down-sampling, and the realization of an algorithm is ensured;
the transition layer consists of batch standardization operation, convolution operation and pooling operation;
configuring maximum pooling operation after the transition layer, and performing second-degree extraction on the features;
and fourthly, reconstructing based on the deep training data by utilizing the spatial information and the structural information of the input image and the extracted characteristic information.
2. The apparatus for implementing tumor detection method based on fusion of BM3D and dense convolutional network as claimed in claim 1,
the first step further comprises the following steps:
selecting N_k reference blocks of size k × k in the original image;
a reference block is usually taken every S_k pixels, S_k less than 10, which reduces the complexity to roughly 1/S_k^2 of the original algorithm;
selecting an n × n area around the reference block to search for similar blocks, finding all blocks in the area whose difference is smaller than a threshold, and integrating the found similar blocks into a three-dimensional matrix;
integrating the reference block into the three-dimensional matrix; BM3D uses the Euclidean distance to judge similar blocks and preprocesses the block distance with a normalized two-dimensional linear transformation and a hard threshold, with the following formula:
d(Z_{x_R}, Z_x) = || γ'(T_2D(Z_{x_R})) - γ'(T_2D(Z_x)) ||_2^2 / N_1^2    (1)
wherein x ∈ X indexes the pixels of the image, Z_{x_R} is the target similar block, Z_x is a search block, N_1 is the selected block size, γ' is a hard-thresholding operation whose threshold is set to λ_2D·σ, and T_2D is the normalized two-dimensional linear transformation;
finding the similar-block set according to the Euclidean distance formula, the similar-block set being given by formula (2):
S_{x_R} = { x ∈ X : d(Z_{x_R}, Z_x) ≤ τ_match }    (2)
wherein X is the image, τ_match is a hyperparameter that determines whether blocks are similar to each other, and S_{x_R} is the set of blocks similar to Z_{x_R}.
3. The apparatus for realizing tumor detection method based on fusion of BM3D and dense convolutional network as claimed in claim 1 or 2,
the second step further comprises:
when the random-discard parameter is set to 1, no operation is performed on the input matrix;
fast tags are classified in the framework into label fast tags and similar-block fast tags;
random discarding is a deletion operation, while similar-block fast tagging is an addition operation; the similar-block fast tag sets the parameter value to 0;
the label fast tag is a simplified form of similar-block tagging: it marks the reference block as the tag block and takes the tag block as the input; label fast tagging directly skips the two operations of similar-block tagging and random discarding;
the similar-block fast tag generates tag blocks by randomly selecting similar blocks, as sketched below.
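One possible reading of the two fast-tag paths is sketched here; the function names and the random-selection details are assumptions layered on the claim text:

```python
import numpy as np

def label_fast_tag(ref_block):
    """Label fast tag: skip both similar-block tagging and random discard,
    and take the reference block itself as the tag block / network input."""
    return ref_block

def similar_block_fast_tag(block_group, rng=None):
    """Similar-block fast tag: generate the tag block by randomly selecting
    one block from the similar-block group (an addition-style operation)."""
    rng = np.random.default_rng() if rng is None else rng
    return block_group[rng.integers(len(block_group))]
```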
4. The apparatus for implementing the tumor detection method based on fusion of BM3D and a dense convolutional network as claimed in claim 1,
the fourth step further comprises:
reconstructing the deeply trained data by using the spatial information and structural information of the input image together with the extracted feature information;
based on a truth-value transfer process, the spatial information and dimensional characteristics of the original data, the label information, and the spatial structure of the randomly discarded data are transferred to the classification tags;
the fast truth-value transfer process transmits the spatial information and spatial structure of the original data to the classification tags for use and reconstruction; a rough model can be constructed with fast truth-value transfer, thereby accelerating model training;
the classification tags comprise the transferred depth features and the feature information obtained by searching the input image for similar blocks and processing them; through the collection and concatenation of the classification tags, the corresponding pieces of information are placed in one-to-one correspondence, facilitating subsequent processing;
and fusing the feature information.
5. The apparatus for implementing the tumor detection method based on fusion of BM3D and a dense convolutional network as claimed in claim 4,
the method for fusing the feature information comprises: updating the gray value of each pixel as a weighted average of the values of all blocks covering that position, where the weight depends on the number of zero coefficients and on the noise strength; the two-dimensional blocks are fused back to their original positions to compute the final image:
$$w_{x_R}^{ht} = \begin{cases} \dfrac{1}{\sigma^2\, N_{har}^{x_R}}, & \text{if } N_{har}^{x_R} \ge 1 \\ 1, & \text{otherwise} \end{cases}$$

$$\hat{y}(x) = \frac{\sum_{x_R \in X} \sum_{x_m \in S_{x_R}} w_{x_R}^{ht}\, \hat{Y}_{x_m}^{ht,x_R}(x)}{\sum_{x_R \in X} \sum_{x_m \in S_{x_R}} w_{x_R}^{ht}\, \chi_{x_m}(x)}, \quad \forall x \in X$$

where $w_{x_R}^{ht}$ is the weight, $N_{har}^{x_R}$ is the number of non-zero coefficients remaining after the hard-threshold operation of formula (6), and $\chi_{x_m}: X \rightarrow \{0,1\}$ is the indicator function of the block support;
the model fusion result is based on the network architecture combining BM3D and DenseNet to complete the medical-image semantic segmentation and tumor recognition task.
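The weighted aggregation of the two formulas above could be sketched as follows; the function name and data layout (per-group block stacks with their top-left coordinates) are assumptions:

```python
import numpy as np

def aggregate(shape, groups, positions, n_nonzero, sigma):
    """Weighted aggregation: every filtered block votes at its original
    position with weight w = 1/(σ²·N_har) when N_har ≥ 1 (else 1); each
    pixel estimate is the ratio of the weighted sum to the accumulated
    weights (the indicator χ: X → {0,1} marks which pixels a block covers)."""
    num = np.zeros(shape, dtype=np.float64)
    den = np.zeros(shape, dtype=np.float64)
    for blocks, coords, n_har in zip(groups, positions, n_nonzero):
        w = 1.0 / (sigma * sigma * n_har) if n_har >= 1 else 1.0
        for blk, (y, x) in zip(blocks, coords):
            k = blk.shape[0]
            num[y:y + k, x:x + k] += w * blk   # weighted block values
            den[y:y + k, x:x + k] += w         # weight times indicator χ
    return num / np.maximum(den, 1e-12)        # final fused image
```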
CN201910415029.9A 2019-05-17 2019-05-17 Tumor detection method and device based on fusion of BM3D and dense convolution network Active CN110189308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910415029.9A CN110189308B (en) 2019-05-17 2019-05-17 Tumor detection method and device based on fusion of BM3D and dense convolution network

Publications (2)

Publication Number Publication Date
CN110189308A CN110189308A (en) 2019-08-30
CN110189308B true CN110189308B (en) 2020-10-23

Family

ID=67716746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910415029.9A Active CN110189308B (en) 2019-05-17 2019-05-17 Tumor detection method and device based on fusion of BM3D and dense convolution network

Country Status (1)

Country Link
CN (1) CN110189308B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781793A (en) * 2019-10-21 2020-02-11 合肥成方信息技术有限公司 Artificial intelligence real-time image recognition method based on quadtree algorithm
CN110879980B (en) * 2019-11-13 2023-09-05 厦门大学 Nuclear magnetic resonance spectrum denoising method based on neural network algorithm
CN111341438B (en) * 2020-02-25 2023-04-28 中国科学技术大学 Image processing method, device, electronic equipment and medium
CN111428586B (en) * 2020-03-09 2023-05-16 同济大学 Three-dimensional human body posture estimation method based on feature fusion and sample enhancement
CN111476802B (en) * 2020-04-09 2022-10-11 山东财经大学 Medical image segmentation and tumor detection method, equipment and readable storage medium
CN111652840B (en) * 2020-04-22 2022-08-30 北京航空航天大学 Turbid screening and classifying device for X-ray chest X-ray image lung
CN111832621A (en) * 2020-06-11 2020-10-27 国家计算机网络与信息安全管理中心 Image classification method and system based on dense multipath convolutional network
CN112257800B (en) * 2020-10-30 2024-05-31 南京大学 Visual identification method based on deep convolutional neural network model-regeneration network
CN113313775B (en) * 2021-05-26 2024-03-15 浙江科技学院 Nonlinear optical encryption system attack method based on deep learning
CN115564652B (en) * 2022-09-30 2023-12-01 南京航空航天大学 Reconstruction method for super-resolution of image
CN115810016B (en) * 2023-02-13 2023-04-28 四川大学 Automatic identification method, system, storage medium and terminal for CXR (Lung infection) image
CN116309582B (en) * 2023-05-19 2023-08-11 之江实验室 Portable ultrasonic scanning image identification method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150187053A1 (en) * 2013-12-26 2015-07-02 Mediatek Inc. Method and Apparatus for Image Denoising with Three-Dimensional Block-Matching
CN107729946A (en) * 2017-10-26 2018-02-23 广东欧珀移动通信有限公司 Picture classification method, device, terminal and storage medium
CN109360152A (en) * 2018-10-15 2019-02-19 天津大学 3 d medical images super resolution ratio reconstruction method based on dense convolutional neural networks
CN109712111A (en) * 2018-11-22 2019-05-03 平安科技(深圳)有限公司 A kind of cutaneum carcinoma category identification method, system, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563381B (en) * 2017-09-12 2020-10-23 国家新闻出版广电总局广播科学研究院 Multi-feature fusion target detection method based on full convolution network
CN108765290A (en) * 2018-05-29 2018-11-06 天津大学 A kind of super resolution ratio reconstruction method based on improved dense convolutional neural networks
CN109544510B (en) * 2018-10-24 2021-10-26 广州大学 Three-dimensional lung nodule identification method based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering; Kostadin Dabov et al.; IEEE Transactions on Image Processing; 2007-08-31; pp. 2080-2095 *

Similar Documents

Publication Publication Date Title
CN110189308B (en) Tumor detection method and device based on fusion of BM3D and dense convolution network
Chen et al. Recent advances and clinical applications of deep learning in medical image analysis
Keetha et al. U-Det: A modified U-Net architecture with bidirectional feature network for lung nodule segmentation
CN112150428A (en) Medical image segmentation method based on deep learning
Han et al. Hybrid resampling and multi-feature fusion for automatic recognition of cavity imaging sign in lung CT
CN110533683B (en) Image omics analysis method fusing traditional features and depth features
Pawar et al. LungSeg-Net: Lung field segmentation using generative adversarial network
Yang et al. Deep convolutional neural networks for automatic segmentation of left ventricle cavity from cardiac magnetic resonance images
CN115496771A (en) Brain tumor segmentation method based on brain three-dimensional MRI image design
Ding et al. FTransCNN: Fusing Transformer and a CNN based on fuzzy logic for uncertain medical image segmentation
Niyaz et al. Advances in deep learning techniques for medical image analysis
Li et al. A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification
Yao et al. Pneumonia Detection Using an Improved Algorithm Based on Faster R‐CNN
He et al. Deep learning powers cancer diagnosis in digital pathology
CN112348800A (en) Dense neural network lung tumor image identification method fusing multi-scale features
Dodia et al. A novel receptive field‐regularized V‐net and nodule classification network for lung nodule detection
Xu et al. Mammographic mass segmentation using multichannel and multiscale fully convolutional networks
Liu et al. MESTrans: Multi-scale embedding spatial transformer for medical image segmentation
Jia et al. 3D global convolutional adversarial network for prostate MR volume segmentation
Li et al. A lightweight network for real-time smoke semantic segmentation based on dual paths
Shao et al. Application of U-Net and Optimized Clustering in Medical Image Segmentation: A Review.
Long et al. Cascaded hybrid residual U-Net for glioma segmentation
Tyagi et al. An amalgamation of vision transformer with convolutional neural network for automatic lung tumor segmentation
Zhao et al. Segmenting Brain Tissues from Chinese Visible Human Dataset by Deep‐Learned Features with Stacked Autoencoder
Sun et al. ASU-Net: U-shape adaptive scale network for mass segmentation in mammograms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant