CN113256536B

CN113256536B - Ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis

Info

Publication number: CN113256536B
Application number: CN202110682443.3A
Authority: CN
Inventors: 胡劲楠; 王俊彦
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-11-23
Anticipated expiration: 2041-06-18
Also published as: CN113256536A

Abstract

The invention discloses a wavelet-analysis-based deep learning method for reconstructing ultrahigh-dimensional data. It uses a high-dimensional, high-order discrete wavelet packet transform to expand high-dimensional data into different frequency-domain channels and combines several parallel neural networks to carry out the reconstruction task. The data are first preprocessed and then converted by the wavelet packet transform into wavelet packet coefficients of sub-domains in different frequency bands; the coefficients are fed into independent networks built and trained for the data, and the network outputs are passed through the inverse wavelet packet transform to reconstruct the original image. By exploiting the independence of the frequency domains of the high-dimensional data after the wavelet transform and using GPU memory in parallel, the invention accelerates neural network training and makes feasible deep learning tasks that were previously limited by hardware computing resources. The invention also extends to segmentation and generation tasks. For segmentation, the output of the U-net network is up-sampled by deconvolution to form a segmentation label at the original image resolution. For generation, the neural network of each channel is replaced with a GAN.

Description

Ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis

Technical Field

The invention relates to the technical field of image processing and neural networks, and in particular to an ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis.

Background

In image processing, the fast discrete wavelet transform applies a series of filters to expand image information into independent frequency-domain sub-bands, which are represented by wavelet coefficients.

The CNN is the basic neural network for image processing, and the convolutional layer is its core: a series of filters extracts detailed image information to generate feature maps. The pooling layer introduces invariance into the CNN and down-samples the features, which enlarges the receptive field of the next layer's convolution kernels so that the network can learn image features at different scales. The activation layer is generally a nonlinear function, which allows the network to better fit arbitrary functions and can relieve overfitting in very deep networks.

U-net is a common image semantic segmentation and reconstruction model and is composed of an encoder and a decoder.

The GAN architecture consists of a generator and a discriminator connected in cascade and is widely applied to image generation tasks. The core idea is to make the two networks compete with each other. The generator tries to reduce the discrimination accuracy of the discriminator, that is, it strives to generate images similar enough to the ground truth to pass for real; the discriminator tries to reject the images output by the generator, that is, it continuously raises its auditing standard. After training is completed, the output of the discriminator converges to a certain value and a dynamic balance is reached.

In the prior art, DTI (Diffusion Tensor Imaging) is a method for describing brain structure from multi-directional MRI scanning data. The scanning data of a single subject is typically several gigabytes (GB), deep learning often requires a very large sample size, and the data directly participating in training amounts to about 100 GB to 200 GB. The memory capacity of current deep learning chips is limited, however; for example, the NVIDIA Tesla V100 provides only 32 GB of memory, so a large amount of DTI data cannot be stored directly in the memory of the deep learning chip for training.

Disclosure of Invention

In order to overcome the defects of the prior art, namely that existing hardware can hardly support training a deep learning network on ultrahigh-dimensional data, and to train large-volume ultrahigh-dimensional data with limited computing resources, the invention adopts the following technical scheme:

As shown in Fig. 1, a parallel deep network model based on the wavelet transform comprises the following steps:

S1, data preprocessing;

S11, data labeling: labeling the data collected from each individual;

S12, data enhancement: expanding the database by translation, rotation, scaling and similar operations;

S13, data cleaning: filling missing values and checking for abnormal objects;

S14, data normalization: using a unified picture coding scheme.

S2, data frequency domain expansion;

S21, selecting a suitable wavelet basis function according to the task and the data;

S22, extracting the low-frequency and high-frequency information of each dimension of the image data using a multi-order high-dimensional wavelet packet transform, placing the information in different frequency-domain sub-bands, and determining the number of decomposition levels according to the memory of the graphics processor and the data scale;

The invention adopts a high-dimensional N-order wavelet packet transform. The transformed values, namely the wavelet basis coefficients used to fit the spatial information of the original image, can be regarded as the projections of the image information onto each wavelet basis. Decomposing an original image with a set of orthogonal wavelet bases amounts to dividing it into a high-frequency component and a low-frequency component with a set of filters. On this basis, the wavelet packet transform further decomposes both the high-frequency and the low-frequency sub-bands at each level, and an optimal decomposition path is calculated by maximizing the information entropy of the level-by-level decomposed signal, so the wavelet packet decomposition is also called an optimal sub-band tree structure. Taking a three-dimensional image input as an example, the smallest sub-blocks (wavelet packet decomposition coefficients) obtained by first-order wavelet packet decomposition are 1/8 of the original image size; because the decomposition path is optimized according to the maximum information entropy, sub-blocks that are not completely decomposed also remain, and their size is 1/4 of the original image size. The decomposition mode is shown in Fig. 2. The inverse wavelet packet transform reconstructs the original high-dimensional image without distortion.
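By way of non-limiting illustration, the decomposition and distortion-free reconstruction described above can be sketched on a one-dimensional signal. The sketch assumes the orthonormal Haar wavelet and a full two-level packet tree, neither of which is prescribed by the text; the entropy-based choice of decomposition path is not implemented, and all function names are hypothetical.

```python
import math

def haar_split(signal):
    """One analysis step: split an even-length signal into (low, high) bands."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split: rebuild each pair of samples from (low, high)."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def packet_decompose(signal, levels):
    """Full wavelet packet tree: split BOTH the low and high band at each level."""
    bands = [signal]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def packet_reconstruct(bands):
    """Merge sibling bands pairwise until a single signal remains."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = packet_decompose(signal, levels=2)      # 4 sub-bands of length 2
restored = packet_reconstruct(bands)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(len(bands), max_error)
```

In one dimension each level halves the band length; applied separably along three axes, one level yields 2 × 2 × 2 = 8 sub-blocks, each 1/8 of the original size, which matches the three-dimensional example in the text.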

S3, building a network;

S31, building an independent neural network for each frequency-domain channel according to the task: a U-net-based neural network for segmentation and reconstruction tasks, and a GAN network model for generation tasks, so that the neural networks can be trained in parallel in S4;

S32, building the U-net: the U-net consists of equal numbers of encoders and decoders. Each encoder comprises a convolutional layer and a down-sampling layer; the convolutional layer comprises a batch normalization layer, an activation layer and a convolution kernel, and the down-sampling layer comprises a batch normalization layer, an activation layer and a pooling layer, the pooling layer being used to down-sample the image features. Each decoder comprises a convolutional layer and an up-sampling layer; the convolutional layer comprises a batch normalization layer, an activation layer and a convolution kernel, and the up-sampling layer comprises a batch normalization layer, an activation layer and a sampler that recovers pixels through a deconvolution algorithm or an interpolation algorithm. The batch normalization layer normalizes and calibrates the data, and the activation function introduces a nonlinear factor so that the neural network can fit the data distribution. Encoders and decoders at the same level are linked by skip connections that concatenate multi-scale feature maps, which ensures the fusion of multi-scale feature maps and thereby improves reconstruction and segmentation precision;

The distinctive design of the U-net is the skip connection, which concatenates the features from before down-sampling onto the feature map of equivalent resolution being recovered, helping the decoder lose as little detail as possible while recovering pixels. The encoder comprises several convolution kernels that extract detail features within their receptive field to form feature maps, and its pooling layer introduces invariance and down-samples the image. The decoder also comprises several convolution kernels; after its feature maps are obtained, part of the feature maps cropped from the encoder at the same level are concatenated through the skip connection, so that as little detail as possible is lost when the up-sampler recovers pixels and the image features learned by both the encoder and the decoder are used. The up-sampler, which corresponds to the pooling layer of the encoder, recovers the pixels, usually with a deconvolution algorithm or an interpolation algorithm. Before each convolver and pooling layer, the U-net uses a batch normalization layer to normalize the data and to calibrate data distributions that may drift; the activation function is typically a rectified linear unit (ReLU), which introduces a nonlinear factor so that the network better fits the data distribution. The U-net network structure is shown in Fig. 3.

S33, building the generative adversarial network (GAN): the GAN architecture is formed by cascading a generator and a discriminator and is a convolutional neural network (CNN) architecture. The generator is an up-sampling convolutional network similar to the U-net decoder: its input is a random signal distribution and its output is high-dimensional image data of the same size as the target image. It mainly comprises a convolutional layer and an up-sampling layer; the convolutional layer comprises a batch normalization layer, an activation layer and a convolver, and the up-sampling layer comprises an up-sampler. The discriminator is a down-sampling convolutional network comprising a convolutional layer and a pooling layer; the convolutional layer comprises a convolver, a batch normalization layer and an activation layer. Its input is image data and its output is a one-dimensional probability ∈ [0, 1].

Before the GAN is trained, the network weights of the generator and the discriminator need to be initialized. First, the network parameters of the initial generator are initialized by transfer learning or randomly; then fake image data are generated from random seeds and mixed with real image data to train the discriminator and update its network weights. The database usually contains equal numbers of real and fake samples, and initialization is complete when the output of the discriminator is about 0.5. Next, random noise is used as the input of the generator, the generator and the discriminator are connected in series, and the weights of the generator network are updated with the back-propagated error. The performance of the generator and the discriminator rises alternately over repeated training iterations, and training continues until it stabilizes. The training process of the GAN is shown in Fig. 4.
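For reference, the competition between the two networks is commonly written as the minimax objective of the original GAN formulation. The text does not state a loss function, so the expression below is the standard form and not necessarily the one used by the inventors; G denotes the generator, D the discriminator, p_data the distribution of real images and p_z the distribution of the input noise.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

Under this objective, when the generated distribution matches the real one, the optimal discriminator outputs D(x) = 1/2 for every input, which is one way to read the dynamic balance mentioned in the Background.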

S4, network training;

S41, initializing the network parameters;

S42, during training, calculating a loss function from the labeled data (true values) and the network predictions, and updating the network weights with a back-propagation-based optimizer such as stochastic gradient descent, the momentum method (Momentum), adaptive gradient (AdaGrad) or adaptive moment estimation (Adam). Specifically, the loss is the sum of a Dice loss function and a cross-entropy loss function computed from the true and predicted values; the residual is propagated backward through the network layers from the prediction end to the input end, the residual of the previous layer is used to compute the partial derivative (gradient) of the loss with respect to each network parameter, and each network weight is updated by gradient descent to reduce the prediction error. One complete forward propagation and backward propagation constitutes one iteration, and the model is optimized over many iterations; the training process is shown in Fig. 5;

S43, hyper-parameter adjustment: overfitting is avoided by adjusting the batch size, the learning rate, the momentum, the weight decay, the regularization coefficient, the dropout rate and the like.
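As a hedged illustration of the loss in S42, the sum of a Dice loss and a cross-entropy loss can be sketched as follows. The sketch assumes binary labels and per-voxel predicted probabilities stored in flat lists; the soft Dice form, the smoothing constant eps and all function names are assumptions not given in the text.

```python
import math

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p) + sum(t))."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def cross_entropy_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)

def combined_loss(pred, target):
    """Sum of the two losses, as described in S42."""
    return dice_loss(pred, target) + cross_entropy_loss(pred, target)

target = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]   # close to the labels
bad = [0.2, 0.8, 0.3, 0.7]    # far from the labels
print(combined_loss(good, target), combined_loss(bad, target))
```

A prediction closer to the labels yields a smaller combined loss, which is the quantity an optimizer such as Adam would then minimize.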

S5, reconstructing data;

S51, after the neural networks of the frequency-domain channels at the same level are trained, performing the inverse wavelet packet transform on the output (predicted value) of each channel network to reconstruct the frequency-domain information of the level above;

S6, model evaluation;

S61, for the network corresponding to each frequency-domain channel, comparing the difference between the predicted values and the true values by cross validation, and adjusting the neural network structure and parameters of the frequency-domain channels that perform poorly;

S62, evaluation metrics of the model: the similarity coefficient (Dice coefficient), the degree of overlap (IoU), the confusion matrix and its derived parameters precision, sensitivity and accuracy, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), etc.
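Several of the metrics listed in S62 follow directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the toy labels are illustrative only, and the code assumes that at least one positive label and one positive prediction exist so that no denominator is zero.

```python
def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) for two equal-length lists of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def metrics(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),       # similarity coefficient
        "iou": tp / (tp + fp + fn),                # degree of overlap
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

truth = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
result = metrics(pred, truth)
print(result)
```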

The invention has the advantages and beneficial effects that:

By exploiting the independence of the frequency domains of a high-dimensional image after the wavelet transform, the invention provides a deep learning method for large-volume high-dimensional data. It uses GPU memory effectively, markedly accelerates neural network training, and makes feasible deep learning tasks that were previously limited by hardware computing resources. The method also extends to segmentation and generation tasks. For segmentation, the output of the U-net network is up-sampled by deconvolution to form a segmentation label at the original image resolution. For image generation, the neural network of each channel is replaced with a generative adversarial network (GAN).

Drawings

FIG. 1 is a schematic flow chart of a reconstruction model of an ultra-high dimensional neural network according to the present invention.

Fig. 2 is a schematic diagram of wavelet packet transformation in the present invention.

Fig. 3 is a schematic diagram of a U-net network structure in the present invention.

Fig. 4 is a schematic diagram of the GAN network training procedure in the present invention.

FIG. 5 is a schematic diagram of single-channel neural network training in the present invention.

FIG. 6 is a block diagram of an exemplary system for reconstructing DTI data according to the present invention.

Fig. 7 is a schematic diagram of a wavelet packet decomposition process according to the present invention.

FIG. 8 is a schematic diagram of the top-layer codec of the U-net with a first-order sub-block as input according to the present invention.

FIG. 9 is a schematic diagram of the top-layer codec of the U-net with a second-order sub-block as input according to the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

The invention provides a multi-frequency-domain parallel neural network. It proposes decomposing ultrahigh-dimensional data (such as video data, diffusion-weighted magnetic resonance data and functional magnetic resonance data) into different frequency sub-bands by the wavelet packet transform and building an independent neural network for each sub-domain, thereby completing tasks such as segmentation, generation and reconstruction of high-dimensional image data. The method can be applied to deep learning tasks with large-volume ultrahigh-dimensional input, such as medical image analysis and video analysis. Note: each chain of wavelet transform, neural network and inverse wavelet transform is referred to as a frequency-domain channel, or simply a channel.

Because the frequency-domain channels are largely independent of one another after the transform and the inverse transform itself is lossless, the original image can be recovered from the network outputs, and the error comes only from the prediction error of each neural network. In short, the invention provides a method for learning losslessly decomposed large-volume data in parallel in deep learning: ultrahigh-dimensional, large-volume data are decomposed into several channels, the neural network model in each channel can be trained independently, and the graphics processor computation and memory required for training are greatly reduced. Finally, the output of each network is passed through the inverse wavelet packet transform to reconstruct the frequency-domain information of the level above and restore the ultrahigh-dimensional data.
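The statement that the reconstruction error comes only from the per-channel prediction errors can be checked numerically in a simplified setting. The sketch below assumes an orthonormal Haar transform on a one-dimensional signal and replaces the two channel networks with stand-ins that add a small fixed error; these choices and all names are illustrative and do not come from the text.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_split(x):
    """Orthonormal Haar analysis: even-length list -> (low band, high band)."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out += [(a + d) * S, (a - d) * S]
    return out

def l2_distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
low, high = haar_split(signal)

# Stand-ins for two independently trained channel networks: each returns its
# input plus a small, channel-specific prediction error (an assumption).
low_pred = [v + 0.01 for v in low]
high_pred = [v - 0.02 for v in high]

restored = haar_merge(low_pred, high_pred)
coeff_error = math.sqrt(l2_distance(low, low_pred) ** 2
                        + l2_distance(high, high_pred) ** 2)
signal_error = l2_distance(signal, restored)
print(coeff_error, signal_error)
```

For an orthonormal transform the two error norms coincide, so the inverse transform neither adds to nor hides the prediction errors made in the individual channels.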

A deep learning framework that loses no information, decomposes the original data, and trains independently in each sub-frequency domain addresses, to a certain extent, the problem that deep learning is ill-suited to ultrahigh-dimensional, large-volume data. Fig. 6 shows the flow of steps for reconstructing DTI data under the technical framework provided by the invention.

Step 1, data preprocessing:

1-1) data enhancement: circularly translating the DTI data along the x, y and z axes by 2 to 10 voxels to obtain new data and expand the database;

1-2) data normalization: aligning the directions of the diffusion-sensitizing gradient (b-vectors) using the nearest-neighbor method, and extracting the corresponding scan images.
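The circular translation of step 1-1) can be sketched on nested lists as follows. This is a minimal illustration under stated assumptions: the volume is stored as nested Python lists indexed [x][y][z], the helper names are hypothetical, and how many shifted copies are actually generated per subject is not specified in the text.

```python
def circular_shift(volume, shift, axis):
    """Circularly shift a nested-list volume by `shift` positions along `axis`."""
    if axis == 0:
        k = shift % len(volume)
        # Elements pushed past the end wrap around to the front.
        return volume[-k:] + volume[:-k] if k else list(volume)
    return [circular_shift(sub, shift, axis - 1) for sub in volume]

def augment(volume, shifts=range(2, 11), axes=(0, 1, 2)):
    """One shifted copy per (axis, shift) pair: 2..10 voxels along x, y and z."""
    return [circular_shift(volume, s, a) for a in axes for s in shifts]

# Tiny 1 x 1 x 12 volume so that the wrap-around is easy to see.
volume = [[list(range(12))]]
shifted = circular_shift(volume, 2, axis=2)
copies = augment(volume)
print(shifted)
```

Because the recursion only descends as far as the chosen axis, a fourth (direction) axis nested inside z is carried along unchanged.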

Step 2, data frequency domain expansion:

2-1) selecting the Daubechies wavelet basis;

2-2) the database contains 290 DTI data sets. Each DTI data set has four dimensions (length, width, height and direction) and a size of 148 × 176 × 148 × 288. The wavelet packet transform selects an optimal decomposition path according to the maximum information entropy principle; after the second-order wavelet packet transform, a DTI object is decomposed into 1 first-order sub-block (first-order wavelet packet coefficients) of size 74 × 88 × 74 × 288 and 56 second-order sub-blocks (second-order wavelet packet coefficients) of size 37 × 44 × 37 × 288. The decomposition order is determined automatically by the wavelet packet decomposition algorithm. In particular, the space occupied by a single data sample is reduced from 3.92 GB to 0.49 GB and 61.2 MB respectively, so that, within the 24 GB memory of a graphics processor, about 64 or 128 data samples can be loaded simultaneously to train a deep neural network model with about 4 million parameters. The wavelet packet decomposition process for a DTI object is shown in Fig. 7.
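The sub-block sizes and counts quoted in step 2-2) can be verified with a few lines of arithmetic. The sketch assumes that only the three spatial axes are halved at each order while the direction axis (288) is left untouched, which is inferred from the sizes given; the function name is hypothetical.

```python
from math import prod

full_shape = (148, 176, 148, 288)  # length, width, height, direction

def subblock_shape(shape, order, spatial_axes=(0, 1, 2)):
    """Halve each spatial axis once per decomposition order; keep other axes."""
    return tuple(n // (2 ** order) if i in spatial_axes else n
                 for i, n in enumerate(shape))

first = subblock_shape(full_shape, 1)    # first-order sub-block
second = subblock_shape(full_shape, 2)   # second-order sub-block

ratio_first = prod(first) / prod(full_shape)     # fraction of the original size
ratio_second = prod(second) / prod(full_shape)

# One first-order block is kept; each of the other 7 splits into 8 blocks.
kept_first = 1
n_second = (8 - kept_first) * 8
covered = kept_first * ratio_first + n_second * ratio_second

print(first, second, ratio_first, ratio_second, n_second, covered)
```

The ratios 1/8 and 1/64 agree with the quoted reduction from 3.92 GB to 0.49 GB and then to about 61.2 MB, and the 1 + 56 sub-blocks together cover the whole original volume.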

Step 3, network building:

3-1) building 8 independent U-net-based networks, to be trained in parallel in step 4;

3-2) building the U-net: the U-net consists of 6 encoders and decoders. Each encoder comprises a convolutional layer and a pooling layer. The convolutional layer performs two convolutions with a 3 × 3 × 3 × 3 kernel, a zero-padding value of 1 and a stride of 1, and additionally uses a batch normalization layer and a rectified linear unit; the pooling layer uses 2 × 2 × 2 × 2 max pooling. For 1 input of size 37 × 44 × 37 × 288, the convolutional layer outputs 64 feature maps of size 37 × 44 × 37 × 288, and the pooling layer then yields 64 feature maps of size 19 × 22 × 19 × 144. Each decoder comprises a convolutional layer and an up-sampling layer. The convolution parameters are identical to those of the encoder, and the up-sampling layer uses a bilinear interpolation algorithm. The input of the uppermost decoder is 64 feature maps of size 19 × 22 × 19 × 144, from which the up-sampler produces 64 outputs of size 37 × 44 × 37 × 288. Encoders and decoders at the same level are linked by a skip connection that fuses part of the feature maps output by the encoder's convolutional layer into the input of the decoder's convolutional layer: 32 output feature maps of the uppermost encoder are randomly selected and concatenated at the input of the uppermost decoder's convolutional layer, so that its input is 96 feature maps of size 37 × 44 × 37 × 288. After the convolutional layer of the final decoder, a reconstructed second-order DTI sub-block (second-order wavelet packet coefficients) of the original size 37 × 44 × 37 × 288 is obtained. Figs. 8 and 9 show the composition of the uppermost codec and the input and output of each layer when the input is a first-order sub-block (first-order wavelet packet coefficients) and a second-order sub-block (second-order wavelet packet coefficients), respectively.
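The feature-map sizes and channel counts of the uppermost codec can be reproduced with simple arithmetic. The sketch assumes that the pooling rounds odd axes up (ceiling division), which is inferred from 37 becoming 19 in the text, and it uses a seeded random sample as a stand-in for the random selection of 32 encoder feature maps; the names are hypothetical.

```python
import math
import random

def pooled_shape(shape, window=2):
    """Output size of a stride-`window` pooling that rounds odd axes up."""
    return tuple(math.ceil(n / window) for n in shape)

block = (37, 44, 37, 288)            # second-order sub-block
after_pool = pooled_shape(block)     # size after 2 x 2 x 2 x 2 max pooling

encoder_maps = 64                    # feature maps from the top encoder
upsampled_maps = 64                  # feature maps from the up-sampler
skip_maps = 32                       # randomly selected encoder maps

rng = random.Random(0)               # fixed seed only for reproducibility
selected = sorted(rng.sample(range(encoder_maps), skip_maps))

decoder_input_maps = upsampled_maps + skip_maps
print(after_pool, decoder_input_maps, len(selected))
```

Concatenating the 32 selected encoder maps with the 64 up-sampled maps gives the 96 input feature maps stated for the uppermost decoder.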

Step 4, network training:

4-1) this step describes the training method for a single channel; the 8 independent networks are identical and can each be trained independently;

4-2) randomly initializing the network parameters from a standard normal distribution;

4-3) forward-propagating the labeled data through the network and computing two loss functions from the true values and the network predictions, a Dice loss and a cross-entropy loss, which are added together; obtaining the gradient with the back-propagation algorithm, updating the network weights with the adaptive moment estimation (Adam) optimizer, and optimizing the model over multiple iterations until it converges;

4-4) adjusting the hyper-parameters: the learning rate was set to 0.0005 for the first 50 iterations and 0.0001 for the last 30 iterations, the weight decay was 0.01, and the dropout rate was 0.5.
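The hyper-parameter settings above can be captured in a small piecewise-constant schedule. The sketch assumes that iterations are numbered from 0 and that the run comprises 50 + 30 = 80 iterations in total; the function and dictionary names are hypothetical, and whether "iteration" means a single update or a full pass over the data is not specified in the text.

```python
def learning_rate(iteration, first_phase=50, total=80, high=0.0005, low=0.0001):
    """Learning rate for a given 0-based iteration of the two-phase schedule."""
    if not 0 <= iteration < total:
        raise ValueError("iteration outside the assumed 80-iteration run")
    return high if iteration < first_phase else low

# Remaining settings quoted in the text, gathered in one place.
hyper_params = {"weight_decay": 0.01, "dropout_rate": 0.5}

schedule = [learning_rate(i) for i in range(80)]
print(schedule[0], schedule[49], schedule[50], schedule[79], hyper_params)
```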

Step 5, data reconstruction:

5-1) after the neural networks of the frequency-domain channels at the same level are trained, performing the inverse wavelet packet transform on the neural network output (predicted value) of each frequency-domain channel and reconstructing the frequency-domain information of the level above to obtain the reconstructed high-dimensional data.

Step 6, model evaluation:

6-1) for the networks corresponding to the frequency-domain channels at the same level, comparing the difference between the predicted values and the true values by cross validation, and adjusting the neural network structure and parameters of the poorly performing frequency-domain channels in a targeted manner;

6-2) evaluation metrics of the model: the similarity coefficient (Dice coefficient) and the degree of overlap (IoU).

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. An ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis, characterized by comprising the following steps:

S1, preprocessing the ultrahigh-dimensional image data;

S2, expanding the ultrahigh-dimensional image data in the frequency domain, comprising the following steps:

S21, selecting a wavelet basis;

S22, extracting frequency information of the different dimensions of the ultrahigh-dimensional image data using an N-order high-dimensional wavelet packet transform, decomposing the ultrahigh-dimensional image data into wavelet-packet-transformed values, namely wavelet basis coefficients that fit the image spatial information, and placing these values in different frequency-domain sub-bands, wherein the number of decomposition levels is determined according to the memory of the graphics processor and the data scale;

S3, building neural networks, an independent neural network being built for each frequency-domain channel;

S4, training the neural networks, the neural networks corresponding to the frequency-domain channels being trained in parallel;

and S5, reconstructing data, namely inputting the decomposed ultrahigh-dimensional image data into trained neural networks in the same-level frequency domain channels, performing wavelet packet inverse transformation on the output of each channel neural network, and reconstructing upper-level frequency domain information.

2. The method as claimed in claim 1, wherein in S22 the image data, after being decomposed by a set of orthogonal wavelet bases, is divided into a high-frequency component and a low-frequency component by a set of filters; on this basis, the wavelet packet transform further decomposes the high-frequency and low-frequency sub-bands of each level, and the optimal decomposition path is calculated by maximizing the information entropy of the level-by-level decomposed signal.

3. The method for deep learning based on wavelet analysis for ultrahigh-dimensional data reconstruction as claimed in claim 1, wherein in S3 a U-net neural network is built, the U-net neural network comprising a set of mutually corresponding encoders and decoders; between encoders and decoders at the same level, skip connections are used to concatenate multi-scale feature maps, concatenating the image features from before the encoder's down-sampling onto the feature maps of equivalent resolution restored by the decoder.

4. The method as claimed in claim 3, wherein the encoder comprises an encoder convolutional layer and a down-sampling layer, the convolution kernel of the encoder convolutional layer extracts detail features within its receptive field to form a feature map, and the down-sampling layer introduces invariance and down-samples the image features; the decoder comprises a decoder convolutional layer and an up-sampling layer, the feature map obtained by the convolution kernel of the decoder convolutional layer is concatenated through the skip connection, and the pixels are recovered by the up-sampling layer.

5. The method for deep learning based on wavelet analysis and reconstruction of ultrahigh-dimensional data according to claim 1, wherein in S3 a GAN neural network is built, the GAN neural network comprising a generator and a discriminator in cascade, the generator being an up-sampling convolutional network and the discriminator being a down-sampling convolutional network; one end of the discriminator acquires both the target image and the image data output by the generator, which has the same size as the target image, for training the discriminator, and the other end of the discriminator outputs a one-dimensional probability ∈ [0, 1] that is used for training the generator.

6. The method for ultrahigh-dimensional data reconstruction deep learning based on wavelet analysis as recited in claim 1, wherein said S4 comprises the steps of:

S41, initializing the neural network parameters;

S42, propagating the labeled image data forward through the neural network to obtain a neural network prediction, calculating a loss function from the predicted value and the true value, back-propagating the loss to obtain the gradient, updating the network weights by gradient descent, and optimizing the neural network over multiple iterations until it converges;

and S43, adjusting the neural network hyper-parameters to prevent the neural network from being over-fitted.

7. The method of claim 6, wherein in step S4, a U-net neural network is trained, the U-net neural network includes a set of encoder and decoder pairs corresponding to each other, the original image data is input into the encoder, a prediction value is obtained from an output of the corresponding decoder, and the loss function is calculated according to the prediction value and a true value corresponding to the original image.

8. The method according to claim 6, wherein in step S4 a GAN neural network is trained, the GAN neural network comprising a cascaded generator and discriminator; before training, the network weights of the generator and the discriminator are initialized: after the network parameters of the generator are initialized by transfer learning and/or randomization, fake image data are generated from random seeds and mixed with real image data (true values) to train the discriminator and update its weights; then random noise is used as the input of the generator, the network weights of the generator are updated by back-propagating the error between the true values and the predicted values output by the discriminator, the performance of the generator and the discriminator rises alternately over repeated training iterations, and training continues until the error between the true values and the predicted values stabilizes.

9. The method according to claim 1, wherein after S5 model evaluation is performed: for the neural network corresponding to each frequency-domain channel, the difference between the predicted value and the true value is compared by cross validation, and the neural network structure and parameters of the frequency-domain channel are adjusted according to the difference; the evaluation metrics comprise the similarity coefficient, the degree of overlap, the confusion matrix and its derived parameters precision, sensitivity and accuracy, the receiver operating characteristic curve and the area under the curve.

10. The method for ultrahigh-dimensional data reconstruction deep learning based on wavelet analysis as recited in claim 1, wherein in said S1, the image data preprocessing includes: image data labeling, image data enhancement, image data cleaning and image data normalization;

the image data label is used for labeling the collected image data;

the image data enhancement: expanding the image database, wherein the expansion mode comprises translation, rotation and scaling of image data;

and image data cleaning: filling missing values in the image data, and checking an abnormal object;

the image data normalization: a unified picture coding scheme is used.