CN117115463A - Neural network input image sampling method based on wavelet decomposition - Google Patents

Neural network input image sampling method based on wavelet decomposition

Info

Publication number
CN117115463A
CN117115463A CN202311007200.5A CN202311007200A CN117115463A CN 117115463 A CN117115463 A CN 117115463A CN 202311007200 A CN202311007200 A CN 202311007200A CN 117115463 A CN117115463 A CN 117115463A
Authority
CN
China
Prior art keywords
neural network
image
wavelet decomposition
input image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311007200.5A
Other languages
Chinese (zh)
Inventor
贺王鹏
屈柯帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202311007200.5A
Publication of CN117115463A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network input image sampling method based on wavelet decomposition, implemented according to the following steps: inputting a three-channel color image and normalizing it to obtain a normalized input image; decomposing the normalized input image with the Haar wavelet to obtain four frequency components; cyclically performing wavelet decomposition n times on the four frequency components to obtain 4×n frequency band subgraphs; stacking the 4×n frequency band subgraphs along the channel direction to form the input feature map of the neural network; and replacing the shallow convolutional layer of the original neural network with the input feature map and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition. The invention can be widely applied to neural network models for all image tasks. For large-resolution image tasks and small-target detection tasks in particular, it reduces the loss of image information.

Description

Neural network input image sampling method based on wavelet decomposition
Technical Field
The invention belongs to the technical field of neural network image processing methods, and particularly relates to a neural network input image sampling method based on wavelet decomposition.
Background
A neural network for image tasks must take the resolution of the input image into account. In general, for image classification the input image is normalized to a size of 224 during preprocessing; for tasks such as object detection and image segmentation the size is set to 640; and for remote sensing images the input resolution can reach several thousand, because finer features need to be extracted. In theory, a higher-resolution input helps the neural network extract more accurate features, but the amount of computation increases many times over.
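To make the cost of a larger input concrete: for a convolutional layer with a fixed kernel and fixed channel counts, the number of multiply-accumulate operations is proportional to the number of output pixels, so it grows with the square of the side length. The sketch below compares the input sizes mentioned above under that assumption. The helper name `relative_cost` is hypothetical, square inputs are assumed, and 2048 is only an illustrative stand-in for a resolution "in the thousands".

```python
def relative_cost(side: int, base_side: int = 224) -> float:
    """Ratio of pixel counts for square inputs.

    Used here as a rough proxy for per-layer convolution cost;
    real networks also depend on strides, channels and layer types.
    """
    return (side * side) / (base_side * base_side)


# Sizes taken from the text (224, 640) plus one illustrative large size.
for side in (224, 640, 2048):
    print(f"{side}x{side}: about {relative_cost(side):.1f}x the pixels of 224x224")
```

Doubling the side length quadruples the pixel count, which is why halving the spatial resolution at the input is attractive whenever the discarded detail can be preserved some other way.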
In practice, the image resolution is usually chosen empirically, as a trade-off between model performance and inference speed. Current downsampling methods for deep learning fall mainly into two categories: direct downsampling, which reduces the image resolution through interpolation; and the image pyramid, which decomposes the original image into several images of different resolutions, so that the resolution can be reduced without losing image detail, improving both model performance and inference speed. At present, existing work mainly adopts a pooling-like downsampling method, that is, splitting adjacent pixels of the image into different channels, trading a larger channel dimension for smaller spatial dimensions.
What these methods have in common is that they convert the spatial dimensions into channels, thereby significantly reducing the image resolution. However, they have the following problems: because pixels that were adjacent are split into different subgraphs at the same spatial position, local information is lost, that is, pixels at adjacent positions lose their connection; and the split subgraphs are highly similar, so the features extracted from them by convolution are also highly similar, which introduces considerable redundancy into the model. As the nonlinear neurons are activated, the differences gradually vanish, and the model eventually degenerates into the equivalent of direct downsampling.
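The pooling-like rearrangement criticized here is often called space-to-depth. The NumPy sketch below is an illustration, not code from the patent: the function name `space_to_depth` and the synthetic test image are assumptions. It moves each 2×2 neighbourhood into four channels and then measures how correlated the resulting subgraphs are on a smooth image.

```python
import numpy as np


def space_to_depth(img: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) array into (C*block*block, H//block, W//block)."""
    c, h, w = img.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    x = img.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, block, block, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)


# A smooth synthetic image: neighbouring pixels are nearly equal.
yy, xx = np.mgrid[0:64, 0:64]
img = np.sin(xx / 10.0)[None] + np.cos(yy / 12.0)[None]  # shape (1, 64, 64)

sub = space_to_depth(img)                 # four subgraphs of 32x32
corr = np.corrcoef(sub.reshape(4, -1))    # pairwise correlation of subgraphs
print(sub.shape, float(corr.min()))
```

On smooth content the four subgraphs are almost copies of one another shifted by one pixel, which is the redundancy the paragraph above describes.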
Besides processing the input image directly as described above, some methods use transfer learning to force a model with a smaller input resolution to achieve the same effect as one with a large resolution. However, this approach incurs additional training cost, and the amount of computation grows rapidly with the input image resolution. In most cases, limited computing resources do not allow a model to be trained at a larger resolution. In addition, the image detail lost by the small-resolution model cannot be recovered, so the model has to infer the related information from the image context, which increases the task difficulty; the transfer learning approach is therefore not efficient at present.
In summary, the prior art does not use wavelet decomposition as the input image downsampling module, so the problem of slow model inference caused by high input resolution and a large amount of computation remains to be solved.
Disclosure of Invention
The invention aims to provide a neural network input image sampling method based on wavelet decomposition, which has the characteristics of greatly reducing the resolution of an input image and promoting the stable convergence of a convolution network.
The technical scheme of the invention is that the neural network input image sampling method based on wavelet decomposition is implemented according to the following steps:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band subgraphs;
Step 4, stacking the 4×n frequency band subgraphs along the channel direction to form the input feature map of the neural network;
Step 5, replacing the shallow convolutional layer of the original neural network with the input feature map, and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition.
The invention is also characterized in that step 2 is specifically implemented according to the following steps:
Step 2.1, using the Haar wavelet function, defined in its standard form as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
For any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X), that is, the image is Haar-decomposed along the horizontal and vertical directions respectively;
Step 2.2, decomposing along the horizontal and vertical directions respectively to obtain four frequency components; that is, one wavelet decomposition yields four different subgraphs X = {H, C, V, L}, each half the original resolution.
The four frequency components in step 2 include a low-frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
The subgraphs include multi-scale features of the horizontal and vertical textures of the input image.
In step 3, the resolution of the 4×n frequency band subgraphs is reduced to the normalized input image resolution.
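As a worked form of one decomposition step, consider a single non-overlapping 2×2 block of pixels with top row (a, b) and bottom row (c, d). The expressions below assume the orthonormal Haar normalization, which the source does not state. They also assume that L is the low-frequency approximation, H the horizontal component, V the vertical component, and C the diagonal component; the meaning of C in particular is inferred, not given in the text, and the horizontal/vertical labelling convention differs between references.

```latex
\begin{aligned}
L &= \tfrac{1}{2}\,(a + b + c + d) && \text{low-frequency approximation} \\
H &= \tfrac{1}{2}\,(a + b - c - d) && \text{difference between rows (horizontal edges)} \\
V &= \tfrac{1}{2}\,(a - b + c - d) && \text{difference between columns (vertical edges)} \\
C &= \tfrac{1}{2}\,(a - b - c + d) && \text{diagonal difference} \\[4pt]
a &= \tfrac{1}{2}\,(L + H + V + C), \qquad b = \tfrac{1}{2}\,(L + H - V - C) \\
c &= \tfrac{1}{2}\,(L - H + V - C), \qquad d = \tfrac{1}{2}\,(L - H - V + C)
\end{aligned}
```

Each output is computed once per 2×2 block, which is why every subgraph has half the height and half the width of the input. The last two lines show that the four pixel values can be recovered exactly from the four coefficients.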
The beneficial effects of the invention are as follows:
1. The invention uses wavelet decomposition to preprocess the image, which reduces the image resolution while preserving the integrity of the information; the amount of model computation is greatly reduced, the limitation imposed by device computing power is relieved, and the time cost of the wavelet decomposition itself is almost negligible;
2. The method extracts the frequency-domain information of the image in advance through wavelet decomposition, which offers better interpretability and stability than parameters learned by a neural network, and at the same time avoids vanishing gradients;
3. The invention can be widely applied to neural network models for all image tasks. For large-resolution image tasks and small-target detection tasks in particular, it reduces the loss of image information.
Drawings
FIG. 1 is a flow chart of a neural network input image sampling method based on wavelet decomposition of the present invention;
FIG. 2 is a schematic representation of Haar wavelet in the sampling method of the present invention;
FIG. 3 is a schematic diagram of the decomposition process in the sampling method of the present invention;
FIG. 4 is a schematic diagram of replacing the input module of the neural network with wavelet decomposition in the sampling method of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention provides a neural network input image sampling method based on wavelet decomposition, which is implemented according to the following steps as shown in figure 1:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
As shown in FIG. 2, the Haar wavelet is used as the basis function for the two-dimensional wavelet decomposition in the experiments;
Step 2.1, using the Haar wavelet function, defined in its standard form as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
Next, for any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X), that is, the image is Haar-decomposed along the horizontal and vertical directions respectively;
The decomposition process is shown in FIG. 3:
Step 2.2, decomposing along the horizontal and vertical directions respectively to obtain four frequency components; that is, one wavelet decomposition yields four different subgraphs X = {H, C, V, L}, each half the original resolution, and the four frequency components comprise a low-frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
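One level of the two-dimensional Haar decomposition in step 2.2 can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `haar_dwt2`, the orthonormal scaling by 1/2, and the assignment of the row difference to the "horizontal" band are all assumptions.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One level of the 2D orthonormal Haar transform on the last two axes.

    Returns (ll, hl, lh, hh); each band has half the height and width of x.
    """
    h, w = x.shape[-2:]
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    hl = (a + b - c - d) / 2.0  # row difference (assumed "horizontal" band)
    lh = (a - b + c - d) / 2.0  # column difference (assumed "vertical" band)
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, hl, lh, hh


img = np.arange(16, dtype=np.float64).reshape(4, 4)
ll, hl, lh, hh = haar_dwt2(img)
print(ll.shape)  # each band is 2x2 for a 4x4 input
```

Only additions, subtractions and a constant scale are involved, so no learned parameters are needed at this stage.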
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band subgraphs. The image matrix is decomposed into subgraphs containing the features of different frequency bands: LL, HL, LH, and HH. LL is the low-frequency band, and HH, HL, and LH are three high-frequency bands representing diagonal, horizontal, and vertical features, respectively. Each subgraph has half the original resolution but contains multi-scale feature information. The decomposed subgraphs can be decomposed again to obtain a smaller resolution, and the decomposition is repeated until the required final size is reached.
The subgraphs include multi-scale features of the horizontal and vertical textures of the input image, and the resolution of the 4×n frequency band subgraphs is reduced to the normalized input image resolution.
Step 4, stacking the 4×n frequency band subgraphs along the channel direction to form the input feature map of the neural network;
Step 5, replacing the shallow convolutional layer of the original neural network with the input feature map, and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition.
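Steps 3 and 4 can be sketched together as repeated decomposition followed by channel stacking. The sketch makes one interpretive assumption that should be read with care: every sub-band is decomposed again at each level so that all outputs share one resolution and can be stacked. Under that reading a three-channel image yields 3·4^n channels, which matches the source's count of 4 sub-bands per channel when n = 1 but not its "4×n" for larger n. The names `haar_split` and `wavelet_input` are hypothetical.

```python
import numpy as np


def haar_split(x: np.ndarray) -> np.ndarray:
    """One Haar level on the last two axes; the four bands are stacked on axis -3."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    bands = [(a + b + c + d) / 2, (a + b - c - d) / 2,
             (a - b + c - d) / 2, (a - b - c + d) / 2]
    return np.concatenate(bands, axis=-3)


def wavelet_input(img: np.ndarray, n: int = 1) -> np.ndarray:
    """Apply n Haar levels to a (C, H, W) image, re-decomposing every band."""
    _, h, w = img.shape
    assert h % (2 ** n) == 0 and w % (2 ** n) == 0, "H and W must be divisible by 2**n"
    x = img.astype(np.float64)
    for _ in range(n):
        x = haar_split(x)
    return x  # shape (C * 4**n, H / 2**n, W / 2**n)


# A random stand-in for a normalized 960x960 three-channel image.
img = np.random.default_rng(0).random((3, 960, 960))
feat = wavelet_input(img, n=1)
print(feat.shape)  # (12, 480, 480)
```

A network consuming `feat` would need its first remaining layer to accept 12 input channels instead of 3; how the shallow layers are removed or rewired is not specified in the source and is left open here.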
FIG. 4 shows how the method is combined with a neural network. The conventional neural network approach normalizes the input picture to a preset size, which inevitably discards much of the image's detail and costs model performance. The shallow layers of the network perform preliminary feature extraction on the input image on the one hand, and on the other hand downsample the feature map by pooling or convolution to reduce the computation of the subsequent model.
Compared with the traditional convolutional network, the method specifically replaces the shallow input module of the neural network. On the one hand, the input image undergoes wavelet decomposition without any information loss, so the neural network can learn richer features. On the other hand, wavelet decomposition can be regarded as an already "trained" feature extractor, which avoids the vanishing-gradient problem that deep convolutional networks encounter during training, and the extracted features are also highly interpretable.
The invention does not require complicated floating-point convolution operations, which greatly improves model inference speed. In addition, the resulting feature subgraphs together contain the complete image information (the information entropy of the image is unchanged), and the image is decomposed across multiple scales, angles, and frequencies, which also benefits subsequent feature extraction by the neural network.
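The claim that the subgraphs retain the complete image can be checked in a narrower, verifiable form: the Haar transform is invertible, so the original pixels can be rebuilt exactly from the four bands. The sketch below assumes the orthonormal 2×2 Haar filters; `haar_forward` and `haar_inverse` are hypothetical helper names. It demonstrates invertibility only and does not by itself measure entropy.

```python
import numpy as np


def haar_forward(x: np.ndarray):
    """Split a 2D array with even sides into four half-size Haar bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)


def haar_inverse(ll, hl, lh, hh) -> np.ndarray:
    """Rebuild the full-resolution array from the four Haar bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + hl + lh + hh) / 2  # top-left pixels
    x[0::2, 1::2] = (ll + hl - lh - hh) / 2  # top-right pixels
    x[1::2, 0::2] = (ll - hl + lh - hh) / 2  # bottom-left pixels
    x[1::2, 1::2] = (ll - hl - lh + hh) / 2  # bottom-right pixels
    return x


img = np.random.default_rng(1).random((8, 8))
rec = haar_inverse(*haar_forward(img))
print(np.allclose(rec, img))
```

By contrast, interpolation-based downsampling to the same spatial size keeps only a quarter of the values and has no such inverse.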
Example 1
The embodiment provides a neural network input image sampling method based on wavelet decomposition, which is implemented according to the following steps:
Step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
Step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
Step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band subgraphs;
Step 4, stacking the 4×n frequency band subgraphs along the channel direction to form the input feature map of the neural network;
Step 5, replacing the shallow convolutional layer of the original neural network with the input feature map, and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition.
Example 2
In this embodiment, an experimental comparison is carried out on text detection, using the best detection model, PAN++. In the text detection task, the input image contains a large amount of background interference, which makes detecting the text very difficult; and because a large resolution is usually not adopted in order to balance model inference speed, a large amount of detail is lost.
The test results of the text detection model are shown in Table 1. The image is input at a resolution of 960, decomposed once by the wavelet to 480, and then fed into the neural network for training. The F1 score improves by 1.29 percent over the reference model at the same resolution (480). In terms of speed, the F1 score is lower than at the standard 736 resolution, but the inference time drops from 325 ms to 91 ms, a substantial reduction.
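The reported timing can be turned into a speedup factor with a few lines of arithmetic. The sketch uses only the two inference times and the two resolutions quoted above; the variable names are illustrative, and the text does not state which configuration the 325 ms figure belongs to.

```python
baseline_ms = 325.0  # reported inference time before the change
wavelet_ms = 91.0    # reported inference time with wavelet-decomposed input

speedup = baseline_ms / wavelet_ms
saved_fraction = 1.0 - wavelet_ms / baseline_ms

# One Haar level halves each side (960 -> 480), so the network sees 4x fewer positions.
pixel_ratio = (960 / 480) ** 2

print(f"speedup: {speedup:.2f}x, time saved: {saved_fraction:.1%}, "
      f"spatial positions reduced {pixel_ratio:.0f}x")
```

The measured speedup is close to, but slightly below, the fourfold reduction in spatial positions.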
TABLE 1
Example 3
This embodiment carries out an experimental comparison on the text recognition task, using the classic CRNN model. The CRNN model first extracts feature information from the text image with a classic VGG convolutional module, then converts the features into a sequence that is fed recurrently into a bidirectional LSTM. Because of the recurrent input, the model's inference speed in practical applications is low: the image width determines the length of the recurrent sequence, so the image size largely determines the model speed.
To this end, the invention was evaluated experimentally, as shown in Table 2; the experimental setup follows the open-source CRNN project. First, the baseline model reaches a recognition accuracy of 75.115 on the public dataset. Next, a smaller resolution, reduced to half, is used to obtain faster inference: the inference time drops from 4.6 s to 4.19 s, but the accuracy falls significantly, to 72.177.
By comparison, with the invention the inference time is 4.227 s; the time is slightly longer than at half resolution because of the additional wavelet decomposition. However, the accuracy reaches 75.147, which is higher than that of the baseline model, indicating that the invention provides more accurate features to the model and reduces information loss.
TABLE 2
Experiment      Accuracy   Time
CRNN            75.115     4.6 s
CRNN-img*0.5    72.177     4.19 s
CRNN-dwt        75.147     4.227 s
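The differences in Table 2 are easier to compare as changes relative to the baseline row. The sketch below uses only the numbers in the table; the dictionary layout and variable names are illustrative.

```python
# (accuracy, inference time in seconds) copied from Table 2.
results = {
    "CRNN": (75.115, 4.6),
    "CRNN-img*0.5": (72.177, 4.19),
    "CRNN-dwt": (75.147, 4.227),
}

base_acc, base_time = results["CRNN"]

# Accuracy change in absolute points, time change as a fraction of the baseline.
deltas = {
    name: (acc - base_acc, (t - base_time) / base_time)
    for name, (acc, t) in results.items()
}

for name, (d_acc, d_time) in deltas.items():
    print(f"{name:13s} accuracy {d_acc:+.3f}  time {d_time:+.1%}")
```

Read this way, the wavelet variant recovers most of the time saving of the half-resolution model while keeping accuracy at the baseline level; the accuracy margin over the baseline is small.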

Claims (5)

1. A neural network input image sampling method based on wavelet decomposition, characterized by comprising the following steps:
step 1, inputting a three-channel color image and normalizing it to obtain a normalized input image;
step 2, decomposing the normalized input image with the Haar wavelet to obtain four frequency components;
step 3, cyclically performing wavelet decomposition n times on the four frequency components until the preset size is reached, obtaining 4×n frequency band subgraphs;
step 4, stacking the 4×n frequency band subgraphs along the channel direction to form the input feature map of the neural network;
step 5, replacing the shallow convolutional layer of the original neural network with the input feature map, and performing iterative training according to the actual task, thereby completing neural network input image sampling based on wavelet decomposition.
2. The neural network input image sampling method based on wavelet decomposition according to claim 1, wherein step 2 is specifically implemented according to the following steps:
step 2.1, using the Haar wavelet function, defined in its standard form as
$$\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$
For any image X of size h × w, the two-dimensional Haar transform yields the decomposed image x = ψ(X), that is, the image is Haar-decomposed along the horizontal and vertical directions respectively;
step 2.2, decomposing along the horizontal and vertical directions respectively to obtain four frequency components; that is, one wavelet decomposition yields four different subgraphs X = {H, C, V, L}, each half the original resolution.
3. The method for sampling an input image of a neural network based on wavelet decomposition according to claim 2, wherein the four frequency components in step 2 include a low frequency approximation component, a vertical component, a horizontal component, and a diagonal component.
4. The wavelet decomposition-based neural network input image sampling method of claim 2, wherein the subgraph comprises multi-scale features of horizontal and vertical textures of the input image.
5. The neural network input image sampling method based on wavelet decomposition according to claim 1, wherein the resolution of the 4×n frequency band subgraphs in step 3 is reduced to the normalized input image resolution.
CN202311007200.5A 2023-08-10 2023-08-10 Neural network input image sampling method based on wavelet decomposition Pending CN117115463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311007200.5A CN117115463A (en) 2023-08-10 2023-08-10 Neural network input image sampling method based on wavelet decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311007200.5A CN117115463A (en) 2023-08-10 2023-08-10 Neural network input image sampling method based on wavelet decomposition

Publications (1)

Publication Number Publication Date
CN117115463A true CN117115463A (en) 2023-11-24

Family

ID=88806779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311007200.5A Pending CN117115463A (en) 2023-08-10 2023-08-10 Neural network input image sampling method based on wavelet decomposition

Country Status (1)

Country Link
CN (1) CN117115463A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination