CN117115463A - Neural network input image sampling method based on wavelet decomposition

Info

Publication number: CN117115463A
Application number: CN202311007200.5A
Authority: CN
Inventors: 贺王鹏; 屈柯帆
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2023-08-10
Filing date: 2023-08-10
Publication date: 2023-11-24

Abstract

The invention discloses a neural network input image sampling method based on wavelet decomposition, implemented according to the following steps: inputting a three-channel color image and normalizing it to obtain a normalized input image; decomposing the normalized input image with the Haar wavelet to obtain four frequency components; repeating the wavelet decomposition on the four frequency components n times to obtain 4×n frequency band sub-images; stacking the 4×n frequency band sub-images along the channel direction to form an input feature map of the neural network; and replacing the original shallow convolutional layers of the neural network with the input feature map and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition. The invention can be widely applied to neural network models for all image tasks. In particular, for large-resolution image tasks and small-target detection tasks, it can substantially reduce the loss of image information.

Description

Neural network input image sampling method based on wavelet decomposition

Technical Field

The invention belongs to the technical field of neural network image processing methods, and particularly relates to a neural network input image sampling method based on wavelet decomposition.

Background

A neural network for an image task must take the resolution of the input image into account. In general, the input image is resized to 224 during preprocessing for image classification and to 640 for tasks such as object detection and image segmentation, while for remote sensing images the input resolution can reach several thousand because finer features need to be extracted. In theory, a higher-resolution input helps the neural network extract more accurate features, but the amount of computation grows many times over.

In practice, the image resolution is usually chosen empirically as a trade-off between model performance and inference speed. Current downsampling methods for deep learning fall mainly into two groups: direct downsampling, which reduces the image resolution through interpolation; and image pyramids, which decompose the original image into several images of different resolutions so that the resolution can be reduced without losing image detail, improving both model performance and inference speed. At present, existing work mainly adopts pooling-like downsampling, that is, splitting adjacent pixels of the image into different channels and trading a larger channel dimension for a smaller spatial dimension.

What these methods have in common is that spatial extent is converted into channels, which significantly reduces the image resolution. However, they have the following problems. First, although the split pixels come from the same spatial neighborhood, local information is lost when the sub-images are split, that is, pixels at adjacent positions lose their connection. Second, the split sub-images are highly similar, so the features extracted from them by convolution are also highly similar, which brings considerable redundancy to the model; the differences gradually vanish as they pass through nonlinear activations, and the model eventually degenerates into the equivalent of direct downsampling.
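
The pixel-splitting scheme criticized above can be illustrated with a minimal NumPy sketch. The function name `space_to_depth`, the 2×2 block size and the synthetic test image are assumptions made for illustration; they do not come from the cited prior art.

```python
import numpy as np


def space_to_depth(image: np.ndarray, block: int = 2) -> np.ndarray:
    """Split an (H, W) image into block*block sub-images of shape (H/block, W/block)."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    # Sub-image (i, j) keeps every block-th pixel, starting at row i and column j.
    return np.stack(
        [image[i::block, j::block] for i in range(block) for j in range(block)]
    )


# Smooth synthetic image: intensity rises by 1 per pixel in each direction.
rows, cols = np.mgrid[0:8, 0:8]
image = (rows + cols).astype(np.float64)

subs = space_to_depth(image)  # four sub-images, stacked along a new first axis
# On smooth content the sub-images differ only by a small constant offset.
max_gap = float(np.abs(subs[3] - subs[0]).max())
```

On this smooth test image the four sub-images are nearly identical, which is the redundancy the paragraph above describes.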

Besides processing the input image directly as above, some methods use transfer learning to force a model with a smaller input resolution to match the performance of a large-resolution model. However, this approach incurs additional training cost, and the amount of computation grows steeply with the input resolution (roughly quadratically in the side length for convolutional layers). In most cases, limited computing power does not allow a higher-resolution model to be trained. In addition, the image detail lost by the small-resolution model cannot be recovered, so the model must infer the missing information from the image context, which makes the task harder. The transfer learning approach is therefore not yet efficient.

In summary, the prior art does not use wavelet decomposition in the input image downsampling module to solve the problem that high input resolution and heavy computation make model inference slow.

Disclosure of Invention

The invention aims to provide a neural network input image sampling method based on wavelet decomposition, which has the characteristics of greatly reducing the resolution of an input image and promoting the stable convergence of a convolution network.

The technical scheme of the invention is that the neural network input image sampling method based on wavelet decomposition is implemented according to the following steps:

step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;

step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;

step 3, repeating the wavelet decomposition on the four frequency components n times until the preset size is reached, obtaining 4×n frequency band sub-images;

step 4, stacking the 4×n frequency band sub-images along the channel direction to form an input feature map of the neural network;

step 5, replacing the original shallow convolutional layers of the neural network with the input feature map, and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition.
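
Steps 1 to 4 can be sketched in NumPy for the single-decomposition case (n = 1). This is a minimal illustration under stated assumptions, not the patented implementation: the names `haar_level` and `wavelet_input`, the division by 255 as the normalization, and the choice to decompose each color channel separately (giving 3 × 4 = 12 output channels) are all assumptions, since the text does not specify them.

```python
import numpy as np


def haar_level(x: np.ndarray) -> np.ndarray:
    """One 2D Haar level over the last two axes: (C, H, W) -> (4*C, H/2, W/2)."""
    if x.shape[-2] % 2 or x.shape[-1] % 2:
        raise ValueError("height and width must be even")
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0       # low-frequency approximation
    col_diff = (a - b + c - d) / 2.0  # change across columns
    row_diff = (a + b - c - d) / 2.0  # change across rows
    diag = (a - b - c + d) / 2.0      # diagonal change
    return np.concatenate([low, col_diff, row_diff, diag], axis=0)


def wavelet_input(image_uint8: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 image -> (12, H/2, W/2) float feature map for n = 1."""
    # Step 1: normalize. Dividing by 255 is an assumption, not taken from the text.
    x = image_uint8.astype(np.float64) / 255.0
    x = np.transpose(x, (2, 0, 1))  # channels first: (3, H, W)
    # Steps 2 to 4: decompose every color channel and stack along the channel axis.
    return haar_level(x)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
features = wavelet_input(image)
```

The resulting array has half the height and width of the input and would be fed to the layers that follow the replaced shallow convolution (step 5), which is not shown here.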

The invention is further characterized in that step 2 is specifically implemented according to the following steps:

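step 2.1, using the Haar wavelet function, defined as

$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$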

For any image X of size h×w, applying the two-dimensional Haar transform yields the decomposed image x = ψ(X); that is, the image is Haar-decomposed along the horizontal and vertical directions respectively;

step 2.2, decomposing along the horizontal and vertical directions respectively to obtain four frequency components; that is, a single wavelet decomposition yields four different sub-images X = {H, C, V, L}, each at half the original resolution.

The four frequency components in step 2 include a low frequency approximation component, a vertical component, a horizontal component, and a diagonal component.

The sub-images include multi-scale features of the horizontal and vertical textures of the input image.

In step 3, the resolution of the 4×n frequency band sub-images is reduced to the normalized input image resolution.

The beneficial effects of the invention are as follows:

1. The invention uses wavelet decomposition to preprocess the image, which reduces the image resolution while preserving the integrity of the information; the amount of model computation is greatly reduced, the limitation of device computing power is eased, and the time cost of the wavelet decomposition itself is almost negligible;

2. The method extracts the frequency-domain information of the image in advance through wavelet decomposition, which offers better interpretability and stability than parameters learned by a neural network and also avoids vanishing gradients;

3. The invention can be widely applied to neural network models for all image tasks. In particular, for large-resolution image tasks and small-target detection tasks, it can substantially reduce the loss of image information.

Drawings

FIG. 1 is a flow chart of a neural network input image sampling method based on wavelet decomposition of the present invention;

FIG. 2 is a schematic representation of Haar wavelet in the sampling method of the present invention;

FIG. 3 is a schematic diagram of the decomposition process in the sampling method of the present invention;

FIG. 4 is a schematic diagram of the module in which wavelet decomposition replaces the input layers of the neural network in the sampling method of the present invention.

Detailed Description

The invention will be described in detail below with reference to the drawings and the detailed description.

The invention provides a neural network input image sampling method based on wavelet decomposition, which, as shown in FIG. 1, is implemented according to the following steps:

As shown in FIG. 2, the Haar wavelet is used as the basis function for the two-dimensional wavelet decomposition in the experiments;

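step 2.1, using the Haar wavelet function, defined as

$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$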

Next, for any image X of size h×w, applying the two-dimensional Haar transform yields the decomposed image x = ψ(X); that is, the image is Haar-decomposed along the horizontal and vertical directions respectively;

The decomposition process is shown in FIG. 3:

step 2.2, decomposing along the horizontal and vertical directions respectively to obtain four frequency components; that is, a single wavelet decomposition yields four different sub-images X = {H, C, V, L}, each at half the original resolution, and the four frequency components comprise a low-frequency approximation component, a vertical component, a horizontal component and a diagonal component.
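
As an illustration of step 2.2, the following NumPy sketch performs one level of 2D Haar decomposition on a tiny image. The function name `haar_dwt2`, the dictionary keys and the 1/2 scaling (the orthonormal convention) are assumptions; the text does not fix a normalization, and the mapping of these outputs onto the labels H, V, C and L is not asserted here.

```python
import numpy as np


def haar_dwt2(image) -> dict:
    """One-level 2D Haar decomposition of an (H, W) array with even H and W."""
    x = np.asarray(image, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top row of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom row of each 2x2 block
    return {
        "low": (a + b + c + d) / 2.0,       # low-frequency approximation
        "col_diff": (a - b + c - d) / 2.0,  # left minus right: vertical edges
        "row_diff": (a + b - c - d) / 2.0,  # top minus bottom: horizontal edges
        "diag": (a - b - c + d) / 2.0,      # diagonal detail
    }


# 4x4 image with a vertical edge between columns 0 and 1.
image = np.tile([0.0, 1.0, 1.0, 1.0], (4, 1))
bands = haar_dwt2(image)
```

Each of the four outputs is 2×2, half the input size in each direction. Only `col_diff` responds to the vertical edge, while `row_diff` and `diag` stay at zero, which shows how the components separate texture by orientation.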

Step 3, repeating the wavelet decomposition on the four frequency components n times to obtain 4×n frequency band sub-images. Each decomposition splits the image matrix into sub-images containing features of different frequency bands: LL, HL, LH and HH. These are one low-frequency band, LL, and three high-frequency bands, HH, HL and LH, which represent diagonal, horizontal and vertical features respectively. Each sub-image has half the resolution of the original but contains multi-scale feature information. The decomposed sub-images can be decomposed again to obtain a smaller resolution, and the decomposition is repeated until the required final size is reached.

The sub-images include multi-scale features of the horizontal and vertical textures of the input image, and the resolution of the 4×n frequency band sub-images is reduced to the normalized input image resolution.
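
The repeated decomposition of step 3 might look like the sketch below. It assumes that every sub-image is decomposed again at each level, so that all outputs share one resolution and can be stacked along the channel axis. Under that assumption a single input channel yields 4^n sub-images after n levels, which matches the 4×n count in the text only for n = 1; how the count is meant for larger n is not resolved here. The names `haar_level` and `decompose_to_size` are hypothetical.

```python
import numpy as np


def haar_level(x: np.ndarray) -> np.ndarray:
    """One 2D Haar level over the last two axes: (C, H, W) -> (4*C, H/2, W/2)."""
    if x.shape[-2] % 2 or x.shape[-1] % 2:
        raise ValueError("height and width must be even")
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    bands = [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]
    return np.concatenate(bands, axis=0) / 2.0


def decompose_to_size(x: np.ndarray, target: int):
    """Apply haar_level until the shorter side is no larger than target."""
    levels = 0
    while min(x.shape[-2:]) > target:
        x = haar_level(x)  # assumption: all sub-images are decomposed again
        levels += 1
    return x, levels


rng = np.random.default_rng(0)
image = rng.random((1, 16, 16))  # one channel, 16x16
stacked, n = decompose_to_size(image, target=4)
```

Here a 16×16 single-channel input reaches the 4×4 target after two levels, and the stacked result keeps the same total energy as the input.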

FIG. 4 shows the block diagram of the combination with the neural network. In the conventional neural network approach, the input picture is resized to a preset size, which inevitably discards much of the image detail and costs model performance; the shallow layers of the network then perform preliminary feature extraction on the input image and, to reduce the computation of the subsequent layers, downsample the feature map by pooling or convolution.

Compared with a conventional convolutional network, the method specifically replaces the shallow input module of the neural network. On the one hand, wavelet decomposition of the input image loses no information, so the neural network can learn richer features. On the other hand, wavelet decomposition can be regarded as an already "trained" feature extractor, which avoids the vanishing-gradient problem caused by deep convolutional networks during training, and the extracted features are also highly interpretable.

The invention does not require complex floating-point convolution operations and therefore greatly improves model inference speed. In addition, the resulting feature sub-images together contain the complete image information (the information entropy of the image is unchanged), and the image is decomposed across multiple scales, orientations and frequencies, which also benefits subsequent feature extraction by the neural network.
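
The statement that the sub-images retain the complete image information can be checked numerically: a Haar decomposition is invertible, so the original image can be rebuilt exactly from the four sub-images. The sketch below is illustrative only; `haar_forward` and `haar_inverse` are hypothetical names, and the 1/2 scaling is an assumed (orthonormal) convention.

```python
import numpy as np


def haar_forward(x: np.ndarray):
    """One-level 2D Haar decomposition of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (
        (a + b + c + d) / 2.0,  # low-frequency approximation
        (a - b + c - d) / 2.0,  # change across columns
        (a + b - c - d) / 2.0,  # change across rows
        (a - b - c + d) / 2.0,  # diagonal change
    )


def haar_inverse(low, col_diff, row_diff, diag) -> np.ndarray:
    """Rebuild the (2H, 2W) image from its four (H, W) Haar sub-images."""
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (low + col_diff + row_diff + diag) / 2.0
    x[0::2, 1::2] = (low - col_diff + row_diff - diag) / 2.0
    x[1::2, 0::2] = (low + col_diff - row_diff - diag) / 2.0
    x[1::2, 1::2] = (low - col_diff - row_diff + diag) / 2.0
    return x


rng = np.random.default_rng(0)
image = rng.random((6, 8))
restored = haar_inverse(*haar_forward(image))
max_error = float(np.abs(restored - image).max())
```

The reconstruction error is at the level of floating-point rounding, in contrast to interpolation-based resizing, which cannot be undone.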

Example 1

The embodiment provides a neural network input image sampling method based on wavelet decomposition, which is implemented according to the following steps:

Example 2

This embodiment performs an experimental comparison on text detection. The best-performing detection model, PAN++, is adopted. In the text detection task, the input image contains a large amount of background interference, which makes detecting the text very difficult; moreover, to keep model inference fast, the full resolution is not used, so a large amount of detail is lost.

The test results of the text detection model are shown in Table 1. The input image has a resolution of 960, is reduced to 480 by a single wavelet decomposition, and is then fed into the neural network for training. The F1 score improves by 1.29 percent over the reference model at the same resolution (480). Compared with the standard resolution of 736, the F1 score is lower, but the inference time falls from 325 ms to 91 ms, a substantial reduction.

TABLE 1

Example 3

This embodiment performs an experimental comparison on the text recognition task using the classic CRNN model. The CRNN model first extracts features from the text image with a classic VGG convolutional module, then converts the features into a sequence that is fed step by step into a bidirectional LSTM. Because of this recurrent input, the inference speed of the model is not high in practical applications: the image width determines the sequence length, so the image size strongly affects model speed.

The invention was therefore tested experimentally, as shown in Table 2, with the experimental strategy following the open-source CRNN project. The baseline model reaches a recognition accuracy of 75.115 on the public dataset. Halving the input resolution shortens the inference time from 4.6 s to 4.19 s, but the accuracy drops markedly to 72.177.

With the invention, the inference time is 4.227 s, slightly longer than that of the half-resolution model because of the additional wavelet decomposition. However, the accuracy is 75.147, higher even than that of the baseline model, which shows that the invention provides more accurate features to the model and reduces information loss.

TABLE 2

Experiment	Accuracy	Time
CRNN	75.115	4.6s
CRNN-img*0.5	72.177	4.19s
CRNN-dwt	75.147	4.227s
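
Expressed as relative changes against the CRNN baseline (simple arithmetic on the values reported in Table 2, not additional measurements), the time savings of CRNN-dwt and CRNN-img*0.5 are

$$\frac{4.6 - 4.227}{4.6} \approx 8.1\%, \qquad \frac{4.6 - 4.19}{4.6} \approx 8.9\%$$

and the corresponding accuracy changes are

$$75.147 - 75.115 = +0.032, \qquad 72.177 - 75.115 = -2.938$$

That is, CRNN-dwt obtains most of the time saving of the half-resolution model while keeping accuracy at the baseline level.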

Claims

1. The neural network input image sampling method based on wavelet decomposition is characterized by comprising the following steps:

step 3, repeating the wavelet decomposition on the four frequency components n times until the preset size is reached, obtaining 4×n frequency band sub-images;

step 4, stacking the 4×n frequency band sub-images along the channel direction to form an input feature map of the neural network;

2. The method for sampling an input image of a neural network based on wavelet decomposition according to claim 1, wherein said step 2 is specifically implemented according to the following steps:

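step 2.1, using the Haar wavelet function, defined as

$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$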

3. The method for sampling an input image of a neural network based on wavelet decomposition according to claim 2, wherein the four frequency components in step 2 include a low frequency approximation component, a vertical component, a horizontal component, and a diagonal component.

4. The wavelet decomposition-based neural network input image sampling method of claim 2, wherein the sub-images comprise multi-scale features of the horizontal and vertical textures of the input image.

5. The method for sampling an input image of a neural network based on wavelet decomposition according to claim 1, wherein the resolution of the 4×n frequency band sub-images in step 3 is reduced to the normalized input image resolution.