CN117115463A - Neural network input image sampling method based on wavelet decomposition - Google Patents
Neural network input image sampling method based on wavelet decomposition
- Publication number
- CN117115463A (application CN202311007200.5A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- image
- wavelet decomposition
- input image
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/446—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a neural network input image sampling method based on wavelet decomposition, implemented in the following steps: inputting a three-channel color image and normalizing it to obtain a normalized input image; decomposing the normalized input image with the Haar wavelet to obtain four frequency components; cyclically performing wavelet decomposition n times on the four frequency components to obtain 4×n frequency band sub-images; stacking the 4×n frequency band sub-images along the channel direction to form the input feature map of the neural network; replacing the shallow convolutional layers of the original neural network with the input feature map, and performing iterative training on the actual task to complete the wavelet-decomposition-based sampling of the neural network input image. The invention can be widely applied to neural network models for all image tasks. It reduces the loss of image information particularly well for large-resolution image tasks and small-target detection tasks.
Description
Technical Field
The invention belongs to the technical field of neural network image processing methods, and particularly relates to a neural network input image sampling method based on wavelet decomposition.
Background
Neural networks for image tasks must take the resolution of the input image into account. In image classification, preprocessing typically normalizes the input to a size of 224; for tasks such as object detection and image segmentation the size is set to 640; and for remote sensing images the input resolution can reach several thousand, because finer features need to be extracted. In theory, a higher input resolution helps the neural network compute more accurate features, but the amount of computation multiplies.
In practice, the image resolution is usually chosen empirically as a trade-off between model performance and inference speed. Current downsampling methods for deep learning fall mainly into two groups: image downsampling, which reduces the resolution through interpolation; and image pyramids, which decompose the original image into several images of different resolutions, so that the resolution can be reduced without losing image detail, improving both model performance and inference speed. Existing work mainly adopts a pooling-like downsampling method, that is, it splits adjacent pixels of the image into different channels, trading a larger channel dimension for smaller spatial dimensions.
What these methods have in common is that they convert spatial size into channels, which significantly reduces the image resolution. However, they have the following problems. First, pixels that were adjacent end up at the same spatial position in different sub-images, so local information is lost when the image is split: neighbouring pixels lose their connection. Second, the split sub-images are highly similar, and so are the features that convolution extracts from them, which introduces a great deal of redundancy into the model; the differences gradually vanish as the nonlinear neurons are activated, and the model ultimately degenerates into the equivalent of direct downsampling.
Besides the direct processing of the input image described above, some methods use transfer learning to force a model with a smaller input resolution to match the results of a large-resolution model. However, this requires additional training cost, and the amount of computation grows rapidly with the input resolution. In most cases, limited computing power rules out training a model at the larger resolution. Moreover, the image detail lost by the small-resolution model cannot be recovered, so the model has to infer the missing information from the image context, which makes the task harder; transfer learning is therefore not currently an efficient approach.
In summary, the prior art does not use wavelet decomposition in the input image downsampling module to address the slow model inference caused by high input resolution and heavy computation.
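The pooling-like splitting discussed in the background can be sketched as follows. This is a minimal NumPy illustration, not code from any cited prior work: the function name `space_to_depth`, the channel-first `(C, H, W)` layout, and the block size of 2 are assumptions made for the example.

```python
import numpy as np

def space_to_depth(image: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange (C, H, W) into (C * block * block, H // block, W // block)."""
    c, h, w = image.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    # Split each spatial axis into (coarse position, offset inside the block).
    x = image.reshape(c, h // block, block, w // block, block)
    # Move the two in-block offsets next to the channel axis.
    x = x.transpose(0, 2, 4, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)

img = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
out = space_to_depth(img)
print(out.shape)  # (12, 2, 2)
```

Each output channel is simply a shifted, subsampled copy of one input channel, which illustrates why the split sub-images are so similar to one another.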
Disclosure of Invention
The invention aims to provide a neural network input image sampling method based on wavelet decomposition that greatly reduces the resolution of the input image and promotes stable convergence of the convolutional network.
The technical scheme of the invention is a neural network input image sampling method based on wavelet decomposition, implemented in the following steps:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band sub-images;
Step 4, stacking the 4×n frequency band sub-images along the channel direction to form the input feature map of the neural network;
and step 5, replacing the shallow convolutional layers of the original neural network with the input feature map, and performing iterative training on the actual task to complete the wavelet-decomposition-based sampling of the neural network input image.
The invention is further characterized in that step 2 is implemented as follows:
Step 2.1, using the Haar wavelet function, defined as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
For any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X); that is, the image is Haar-decomposed along the horizontal and vertical directions;
Step 2.2, decomposing along the horizontal and vertical directions to obtain four frequency components; that is, a single wavelet decomposition produces four different sub-images x = {H, C, V, L}, each with half the original resolution.
The four frequency components in step 2 comprise a low-frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
The sub-images contain multi-scale features of the horizontal and vertical textures of the input image.
In step 3, the resolution of the 4×n frequency band sub-images is reduced to the normalized input image resolution.
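A single decomposition of step 2 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `haar_dwt2`, the orthonormal scaling factor of 1/2, and the band names `ll`, `lh`, `hl`, `hh` are choices made for the example, and naming conventions for the high-frequency bands differ between libraries.

```python
import numpy as np

def haar_dwt2(image: np.ndarray):
    """One level of the 2-D Haar transform on the last two axes.

    Returns four sub-bands, each with half the input height and width.
    """
    h, w = image.shape[-2:]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = image[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[..., 0::2, 1::2]  # top-right
    c = image[..., 1::2, 0::2]  # bottom-left
    d = image[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh

block = np.array([[1.0, 2.0], [3.0, 4.0]])
ll, lh, hl, hh = haar_dwt2(block)
print(ll, lh, hl, hh)  # [[5.]] [[-2.]] [[-1.]] [[0.]]
```

With this scaling the sum of squared values is the same before and after the transform (30 in the 2×2 example), which is one way to see that the four half-resolution sub-bands together carry the content of the original block.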
The beneficial effects of the invention are as follows:
1. The invention uses wavelet decomposition to preprocess the image, reducing the image resolution while preserving the integrity of the information; the computation of the model is greatly reduced, which eases the limits imposed by device computing power, and the time cost of the wavelet decomposition itself is almost negligible;
2. The method extracts the frequency-domain information of the image in advance through wavelet decomposition, which is more interpretable and stable than parameters learned by a neural network, and at the same time avoids vanishing gradients;
3. The invention can be widely applied to neural network models for all image tasks. It reduces the loss of image information particularly well for large-resolution image tasks and small-target detection tasks.
Drawings
FIG. 1 is a flow chart of a neural network input image sampling method based on wavelet decomposition of the present invention;
FIG. 2 is a schematic representation of Haar wavelet in the sampling method of the present invention;
FIG. 3 is a schematic diagram of the decomposition process in the sampling method of the present invention;
FIG. 4 is a schematic diagram of the module in which wavelet decomposition replaces the input layers of the neural network in the sampling method of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention provides a neural network input image sampling method based on wavelet decomposition, which is implemented according to the following steps as shown in figure 1:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
As shown in FIG. 2, the Haar wavelet is used as the basis function for two-dimensional wavelet decomposition in the experiments;
Step 2.1, using the Haar wavelet function, defined as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
Next, for any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X); that is, the image is Haar-decomposed along the horizontal and vertical directions;
FIG. 3 shows the decomposition process:
Step 2.2, decomposing along the horizontal and vertical directions to obtain four frequency components; that is, a single wavelet decomposition produces four different sub-images x = {H, C, V, L}, each with half the original resolution, and the four frequency components comprise a low-frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band sub-images. The image matrix is decomposed into sub-images containing the features of different frequency bands: LL, HL, LH, and HH. LL is the low-frequency band; HH, HL, and LH are the three high-frequency bands, representing diagonal, horizontal, and vertical features, respectively. Each sub-image has half the original resolution but contains multi-scale feature information. The decomposed sub-images can be decomposed again to obtain a smaller resolution, and the cycle is repeated until the required final size is reached.
The sub-images contain multi-scale features of the horizontal and vertical textures of the input image, and the resolution of the 4×n frequency band sub-images is reduced to the normalized input image resolution.
Step 4, stacking the 4×n frequency band sub-images along the channel direction to form the input feature map of the neural network;
and step 5, replacing the shallow convolutional layers of the original neural network with the input feature map, and performing iterative training on the actual task to complete the wavelet-decomposition-based sampling of the neural network input image.
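The cyclic decomposition of step 3 and the channel stacking of step 4 can be sketched as below. The sketch rests on an assumption the text does not spell out: every sub-band from the previous round is decomposed again, so that all sub-images share one resolution and can be stacked. Under that reading each round multiplies the number of sub-images per colour channel by four, giving 4^n per channel (4 for n = 1, as in the single decomposition of Example 2); whether this matches the stated 4×n count for n > 1 is not confirmed by the source. The names `haar_dwt2` and `haar_packet` are hypothetical.

```python
import numpy as np

def haar_dwt2(image):
    """One Haar level on the last two axes; returns four half-size sub-bands."""
    a = image[..., 0::2, 0::2]
    b = image[..., 0::2, 1::2]
    c = image[..., 1::2, 0::2]
    d = image[..., 1::2, 1::2]
    return [(a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
            (a - b + c - d) / 2.0, (a - b - c + d) / 2.0]

def haar_packet(image, levels):
    """Decompose every sub-band `levels` times, then stack along channels.

    Input (C, H, W) becomes (C * 4**levels, H / 2**levels, W / 2**levels).
    H and W are assumed divisible by 2**levels.
    """
    bands = [image]
    for _ in range(levels):
        # Assumption: each existing sub-band is decomposed again.
        bands = [sub for band in bands for sub in haar_dwt2(band)]
    # Step 4: stack all sub-images along the channel axis.
    return np.concatenate(bands, axis=0)

img = np.arange(3 * 8 * 8, dtype=np.float64).reshape(3, 8, 8)
print(haar_packet(img, 1).shape)  # (12, 4, 4)
print(haar_packet(img, 2).shape)  # (48, 2, 2)
```

The number of values is unchanged at every level (3·8·8 = 12·4·4 = 48·2·2); only their arrangement between spatial and channel dimensions differs.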
FIG. 4 shows how the method is combined with a neural network. In the conventional approach, the input picture is normalized to a preset size, which inevitably discards much of the image detail and degrades model performance. The shallow layers of the network perform preliminary feature extraction on the input image and also downsample the feature map, by pooling or convolution, to reduce the computation of the subsequent layers.
Compared with a traditional convolutional network, the method replaces the shallow input module of the neural network. On the one hand, the wavelet decomposition of the input image loses no information, so the neural network can learn richer features. On the other hand, the wavelet decomposition can be regarded as an already "trained" feature extractor, which avoids the vanishing-gradient problem caused by deep convolutional networks during training, and the extracted features are readily interpretable.
The invention requires no complicated floating-point convolution operations, which greatly improves model inference speed. In addition, the resulting feature sub-images together contain the complete image information (the entropy of the image information is unchanged), and the image is decomposed at multiple scales, angles, and frequencies, which also benefits the subsequent feature extraction of the neural network.
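The claim that the decomposition loses no information can be checked numerically: the Haar transform is invertible, so the original image can be rebuilt exactly from the four sub-bands. The following is a minimal sketch assuming the orthonormal 1/2 scaling; `haar_dwt2` and `haar_idwt2` are hypothetical helper names, not functions from the patent.

```python
import numpy as np

def haar_dwt2(image):
    """One Haar level on the last two axes; returns (ll, lh, hl, hh)."""
    a = image[..., 0::2, 0::2]
    b = image[..., 0::2, 1::2]
    c = image[..., 1::2, 0::2]
    d = image[..., 1::2, 1::2]
    return ((a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
            (a - b + c - d) / 2.0, (a - b - c + d) / 2.0)

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution image."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0  # top-left pixels
    out[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2.0  # top-right pixels
    out[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2.0  # bottom-left pixels
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0  # bottom-right pixels
    return out

rng = np.random.default_rng(0)
img = rng.random((3, 8, 8))
restored = haar_idwt2(*haar_dwt2(img))
print(np.allclose(restored, img))  # True
```

By contrast, interpolation-based downsampling to half resolution keeps only a quarter of the values, so no such inverse exists for it.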
Example 1
The embodiment provides a neural network input image sampling method based on wavelet decomposition, which is implemented according to the following steps:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band sub-images;
Step 4, stacking the 4×n frequency band sub-images along the channel direction to form the input feature map of the neural network;
and step 5, replacing the shallow convolutional layers of the original neural network with the input feature map, and performing iterative training on the actual task to complete the wavelet-decomposition-based sampling of the neural network input image.
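Steps 1 to 4 of this embodiment can be sketched end to end for the simplest case, n = 1. Several details are assumptions made for the example and are not given in the text: the image is taken to be already resized to the preset size, normalization is a plain division by 255, the layout is channel-first, and `wavelet_input` and `haar_dwt2` are hypothetical names. Step 5 (training a network that takes the stacked map in place of its shallow layers) is outside the sketch.

```python
import numpy as np

def haar_dwt2(image):
    """One Haar level on the last two axes; returns four half-size sub-bands."""
    a = image[..., 0::2, 0::2]
    b = image[..., 0::2, 1::2]
    c = image[..., 1::2, 0::2]
    d = image[..., 1::2, 1::2]
    return [(a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
            (a - b + c - d) / 2.0, (a - b - c + d) / 2.0]

def wavelet_input(image_u8: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) uint8 image into a (12, H/2, W/2) float32 map."""
    # Step 1 (assumed form): scale to [0, 1] and move channels first.
    x = image_u8.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))
    # Steps 2-3 with n = 1: one Haar decomposition of each colour channel.
    bands = haar_dwt2(x)
    # Step 4: stack the sub-images along the channel axis.
    return np.concatenate(bands, axis=0)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(960, 960, 3), dtype=np.uint8)
features = wavelet_input(image)
print(features.shape)  # (12, 480, 480)
```

The 960 to 480 sizes mirror the setting reported in Example 2; the network that consumes `features` would need a first layer accepting 12 input channels.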
Example 2
This embodiment carries out an experimental comparison on text detection. The best detection model, PAN++, is adopted for the text detection task. In this task the input image contains a large amount of background interference, which makes text detection very difficult, and because a high resolution is not used in order to balance model inference speed, a large amount of detail is lost.
The test results of the text detection model are shown in Table 1. An input with a resolution of 960 is reduced to 480 by a single wavelet decomposition and then fed into the neural network for training. The F1 score improves by 1.29 percent over the reference model at the same resolution (480). In terms of speed, the F1 score is lower than that of the standard 736 resolution, but the inference time falls from 325 ms to 91 ms, a large reduction.
TABLE 1
Example 3
This embodiment carries out an experimental comparison on the text recognition task, using the classic CRNN model. The CRNN model first extracts the feature information of a text image with a classical VGG convolutional module, then converts the features into a sequence that is fed step by step into a bidirectional LSTM. Because of this recurrent input, the model's inference speed in practical applications is low: the image width determines the sequence length, so the image size largely determines the model speed.
The invention was therefore tested experimentally, as shown in Table 2, with the experimental setup following the open-source CRNN project. The benchmark model reaches a recognition accuracy of 75.115 on the public dataset. A smaller resolution, reduced to half, was then used to speed up inference, cutting the time from 4.6 s to 4.19 s, but the accuracy dropped significantly, to 72.177.
With the invention, the inference time is 4.227 s, slightly longer than that of the half-resolution model because of the additional wavelet decomposition. The accuracy of 75.147, however, is even higher than that of the reference model, which indicates that the invention provides more accurate features to the model and reduces information loss.
TABLE 2
Experiment | Accuracy | Time |
---|---|---|
CRNN | 75.115 | 4.6 s |
CRNN-img*0.5 | 72.177 | 4.19 s |
CRNN-dwt | 75.147 | 4.227 s |
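The comparison in Table 2 can be made explicit with a few lines of arithmetic. The figures below are copied from the table; the function name `compare_to_baseline` and the derived quantities (accuracy change and percentage of time saved relative to the CRNN baseline) are introduced here only for illustration.

```python
# (accuracy, inference time in seconds), as reported in Table 2.
table_2 = {
    "CRNN": (75.115, 4.6),
    "CRNN-img*0.5": (72.177, 4.19),
    "CRNN-dwt": (75.147, 4.227),
}

def compare_to_baseline(results, baseline):
    """Return accuracy change and percentage of time saved per experiment."""
    base_acc, base_time = results[baseline]
    summary = {}
    for name, (acc, seconds) in results.items():
        summary[name] = {
            "accuracy_change": acc - base_acc,
            "time_saved_percent": 100.0 * (base_time - seconds) / base_time,
        }
    return summary

for name, row in compare_to_baseline(table_2, "CRNN").items():
    print(f"{name}: accuracy {row['accuracy_change']:+.3f}, "
          f"time saved {row['time_saved_percent']:.1f}%")
```

Both reduced-resolution variants save between 8 and 9 percent of the inference time, but only the half-resolution resize pays for it with a drop of about 2.9 points in accuracy.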
Claims (5)
1. A neural network input image sampling method based on wavelet decomposition, characterized by comprising the following steps:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band sub-images;
Step 4, stacking the 4×n frequency band sub-images along the channel direction to form the input feature map of the neural network;
and step 5, replacing the shallow convolutional layers of the original neural network with the input feature map, and performing iterative training on the actual task to complete the wavelet-decomposition-based sampling of the neural network input image.
2. The neural network input image sampling method based on wavelet decomposition according to claim 1, wherein step 2 is implemented as follows:
Step 2.1, using the Haar wavelet function, defined as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
For any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X); that is, the image is Haar-decomposed along the horizontal and vertical directions;
Step 2.2, decomposing along the horizontal and vertical directions to obtain four frequency components; that is, a single wavelet decomposition produces four different sub-images x = {H, C, V, L}, each with half the original resolution.
3. The neural network input image sampling method based on wavelet decomposition according to claim 2, wherein the four frequency components in step 2 comprise a low-frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
4. The neural network input image sampling method based on wavelet decomposition according to claim 2, wherein the sub-images contain multi-scale features of the horizontal and vertical textures of the input image.
5. The neural network input image sampling method based on wavelet decomposition according to claim 1, wherein the resolution of the 4×n frequency band sub-images in step 3 is reduced to the normalized input image resolution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311007200.5A CN117115463A (en) | 2023-08-10 | 2023-08-10 | Neural network input image sampling method based on wavelet decomposition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311007200.5A CN117115463A (en) | 2023-08-10 | 2023-08-10 | Neural network input image sampling method based on wavelet decomposition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117115463A (en) | 2023-11-24 |
Family
ID=88806779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311007200.5A Pending CN117115463A (en) | 2023-08-10 | 2023-08-10 | Neural network input image sampling method based on wavelet decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117115463A (en) |
- 2023-08-10: CN application CN202311007200.5A, publication CN117115463A, status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||