CN112862715A - Real-time and controllable scale space filtering method - Google Patents
- Publication number
- CN112862715A (application number CN202110172012.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- filtering
- edge image
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T5/70
- G06F18/24 — Classification techniques
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06T7/13 — Edge detection
- G06T7/136 — Segmentation; edge detection involving thresholding
- G06V10/44 — Local feature extraction by analysis of parts of the pattern
- G06V10/464 — Salient features using a plurality of salient features, e.g. bag-of-words [BoW] representations
- G06T2207/10004 — Still image; photographic image
Abstract
The invention discloses a real-time and controllable scale space filtering method that cyclically outputs multiple filtering results of different scales by means of a recurrent neural network model. The recurrent neural network feeds its own output back in as its next input to obtain a new output, and repeats this process until a termination condition is met. The recurrent neural network model comprises a guide network G and a stripping network P.
Description
Technical Field
The invention relates to the field of image processing based on deep learning, in particular to a real-time and controllable scale space filtering method.
Background
Existing traditional methods such as L0 [1], RGF [2], RTV [3], and muGIF [4] require many iterative optimization operations to obtain reasonable filtering results, making them very slow and time-consuming to execute. Deep learning methods [13][14][15][16] overcome this slowness, because a deep neural network needs only a single forward pass at test time; however, current deep learning methods can only produce a filtering result at one scale, cannot obtain multiple images of different scales at once, and must be retrained with different hyper-parameters for each smoothing scale. PIO [5] can adjust the network's hyper-parameters to obtain filtering results of different scales without retraining the neural network. However, for all current traditional and deep learning methods, obtaining filtering results of different scales requires adjusting model hyper-parameters (such as the weights of different loss functions or the number of network layers), which is not intuitive: a hyper-parameter is merely a number, a scalar, and cannot reflect the spatial and semantic information of the image. In other words, hyper-parameters cannot intuitively control which parts of the image should be smoothed and which should be preserved.
The importance of multi-scale representations of images has long been well validated, giving rise to the idea of scale-space filtering. Much as the human eye perceives different features of an object at different viewing distances, the same object yields different features when imaged at different sizes, that is, at different scales. Maps provide an intuitive example of the scale space: at one scale a map conveys the layout of a province or even a country, while at another it reveals the detail of a single town. According to the human vision mechanism and scale-space theory, people obtain different information at different scales; but when a computer vision system analyzes an unknown scene, it has no way of knowing the scale of objects in the image in advance, so descriptions of the image at multiple scales must be considered simultaneously to find the optimal scale of the object of interest. Images are therefore often organized into a series of image sets at different scales in which features of interest are detected. In short, a larger scale provides more structural information about a scene, while a smaller scale reflects more texture information.
In the existing literature, various image filtering methods attempt to decompose an image into structural and texture portions at a single scale. However, for different images and tasks it is difficult to determine which scales are correct or best. Rather than eliminating this scale ambiguity and seeking a single optimal separation strategy, the image can be separated in an organized, natural, and efficient manner into results of varying scale for the user to choose from. Building a multi-scale representation of an image is thus a very useful capability, as users wish to adjust and select the most satisfactory results in various multimedia, computer vision, and graphics applications. Applications of scale-space filtering in these fields include image restoration, image stylization, stereo matching, optical flow, and semantic flow, among others.
Over the past few decades, the computer vision and multimedia fields have paid increasing attention to organizing images hierarchically or at multiple scales, drawing on research into how the human eye perceives the external world. Taking image segmentation as an example, an image may be spatially partitioned into a set of object instances or superpixels, which can serve as a basis for further processing. In contrast to image segmentation, another hierarchical image organization scheme is explored from the perspective of information extraction. This task is called image separation to distinguish it from image segmentation.
The work "Image smoothing via unsupervised learning" explores how a deep neural network can learn directly from data (without supervision labels) to produce an ideal filtering effect. The advantage of using deep learning is that it not only produces good filtering results but is also faster. However, deep learning usually requires ground-truth filtered images (GT) as supervision during training, and such supervision data are often hard to obtain: manually labeling the training images is time-consuming and laborious. To solve this problem, that work, similar to optimization-based methods, designs the training objective as a loss function, so that the deep neural network can be trained in an unsupervised, label-free setting. Its major contributions can be summarized as:
(1) An unsupervised image smoothing framework is proposed. Unlike previous approaches, it does not require ground-truth labels for supervised training and can achieve the desired results by learning from any sufficiently diverse set of image data.
(2) Filtering results comparable to or even better than those of prior methods are obtained, based on a newly designed image smoothing loss function built on a spatially adaptive Lp flattening criterion and an edge-preserving regularizer.
(3) The proposed method is based on a convolutional neural network, which is far less computationally intensive than most previous methods. For example, processing a 1280 × 720 image on a modern GPU requires only 5 ms.
The specific method of this work takes the original image as the input of a deep convolutional neural network, which outputs the smoothed result of the original image. The work designs a fully convolutional network with dilated (hole) convolutions and skip connections. The network contains 26 convolutional layers in total, all using 3×3 convolution kernels and outputting 64 feature maps (except the last layer, which outputs a 3-channel image); each layer is followed by a batch normalization layer and a ReLU activation function. The third convolutional layer down-samples the feature map to half size using a stride-2 convolution, and the third-to-last convolutional layer restores the feature map to the input size using a deconvolution.
Image smoothing generally requires contextual information about the image, and this work gradually enlarges the receptive field by introducing hole (dilated) convolutions with an exponentially increasing dilation rate. More specifically, every two adjacent residual blocks share the same dilation rate, and the rate doubles every two residual blocks.
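The dilation schedule just described can be sketched in a few lines. The block count and the one-dilated-conv-per-block simplification are illustrative assumptions, not the patent's exact architecture:

```python
def dilation_schedule(num_blocks: int) -> list[int]:
    # Each pair of adjacent residual blocks shares a dilation rate;
    # the rate doubles every two blocks: 1, 1, 2, 2, 4, 4, ...
    return [2 ** (b // 2) for b in range(num_blocks)]

def receptive_field(kernel: int, dilations: list[int]) -> int:
    # A k x k conv with dilation d grows the receptive field by
    # (k - 1) * d; stacking convolutions accumulates the growth.
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

rates = dilation_schedule(8)    # [1, 1, 2, 2, 4, 4, 8, 8]
rf = receptive_field(3, rates)  # 1 + 2*(1+1+2+2+4+4+8+8) = 61
```

Eight such blocks of 3×3 convolutions already cover a 61-pixel-wide context window, which illustrates why exponentially increasing dilation is an efficient way to gather image context.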
The network also adopts residual learning: the last layer of the network outputs a residual image, and the sum of the residual image and the input image is the final filtering result.
The purpose of image smoothing is to reduce unimportant image details while maintaining the original image structure. The overall loss function of this work for image smoothing is as follows:
ε = ε_d + λ_f · ε_f + λ_e · ε_e

where ε_d is the data retention term, ε_f is the flattening regularization term, ε_e is the edge-preservation term, and λ_f and λ_e are weights that balance the different loss terms.
The data retention term minimizes the difference between the input image and the output filtered image to ensure structural similarity. Denoting the input image by I and the output image by T (note that T here has a different meaning from T in the present embodiment, and is valid only in this section on the conventional method), a simple data retention term in RGB color space can be defined as:

ε_d = (1/N) Σ_i ‖I_i − T_i‖²
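A minimal numpy sketch of such a data retention term; the mean-squared form is an assumption consistent with the description:

```python
import numpy as np

def data_term(I: np.ndarray, T: np.ndarray) -> float:
    # eps_d = (1/N) * sum_i ||I_i - T_i||^2, summed over colour
    # channels and averaged over the N pixels, keeping the filtered
    # output T close to the input I.
    n_pixels = I.shape[0] * I.shape[1]
    return float(np.sum((I - T) ** 2) / n_pixels)
```

For a zero image I and a constant 0.5 image T of shape (2, 2, 3), the term evaluates to 0.75, and it is exactly zero when T equals I.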
where i represents the index of the pixel and N is the total number of pixels. During filtering, some important edges may be lost or weakened, because the objectives of pixel-value smoothing and edge preservation conflict to some degree. To address this problem, the work proposes an explicit edge-preservation constraint that retains the important edge pixels. Before introducing this constraint, the concept of a guide image is needed: it refers to the edge response of an image in appearance. One simple form of the edge response is the sum of local gradient magnitudes:

E_i(I) = Σ_c Σ_{j∈N(i)} |I_{i,c} − I_{j,c}|
where N(i) represents the neighborhood of the i-th pixel and c indexes the color channels. For the output filtering result T, the edge response is likewise denoted E(T). The edge-preservation term is defined by minimizing the difference between the edge responses E(I) and E(T). Let B be a binary map, where B_i = 1 denotes an important edge pixel; the edge-preservation term is then defined as:

ε_e = (1/N_B) Σ_{i: B_i = 1} (E_i(I) − E_i(T))²
where N_B is the total number of important edge pixels. The definition of an important edge is subjective and varied, and depends on the application scenario. An ideal way to obtain the binary map B would be manual labeling of user preferences; however, pixel-level manual labeling is quite time-consuming and laborious. This work uses existing edge detection techniques to obtain B; since this is not a major contribution of the work, the process is not described further. Given enough training images and edge maps B, the deep network explicitly learns the information of the important edges by minimizing the edge-preservation term, and reflects the features of those edges in the filtering results.
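The edge response and the edge-preservation term above can be sketched with numpy; the neighbor set (right/bottom differences) and the squared-difference form are assumptions for illustration:

```python
import numpy as np

def edge_response(img: np.ndarray) -> np.ndarray:
    # E_i: absolute differences to the right/bottom neighbours,
    # accumulated over colour channels -- one simple local-gradient
    # form of the edge response.
    dx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    dy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    return (dx + dy).sum(axis=-1)

def edge_term(I: np.ndarray, T: np.ndarray, B: np.ndarray) -> float:
    # eps_e: mean squared edge-response difference over the pixels
    # flagged as important edges (B_i = 1).
    E_I, E_T = edge_response(I), edge_response(T)
    n_b = max(int(B.sum()), 1)  # guard against an empty edge map
    return float(np.sum(B * (E_I - E_T) ** 2) / n_b)
```

When the output equals the input, the edge responses match and the term vanishes; any smoothing that weakens a flagged edge raises it.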
To achieve better quality and greater flexibility, this work proposes a new smoothing/flattening term with a spatially variable Lp norm. To remove unwanted image details, the flattening term ensures the smoothness of the filtering result by penalizing the gradients between adjacent pixels:

ε_f = (1/N) Σ_i Σ_{j∈N_h(i)} w_{i,j} |T_i − T_j|^{p_i}
where N_h(i) denotes the neighboring pixels in an h×h window around the i-th pixel, and w_{i,j} is the weight of each pixel pair. The weight w_{i,j} can be computed from the spatial position relation or from the pixel value relation, respectively:

w_{i,j} = exp(−((x_i − x_j)² + (y_i − y_j)²) / (2σ_s²))   or   w_{i,j} = exp(−Σ_c (I_{i,c} − I_{j,c})² / (2σ_r²))

where σ_r and σ_s are the standard deviations of the Gaussian functions on pixel values and on spatial position differences, c denotes the image channel, and x and y are the pixel coordinates. Determining the p value of the Lp norm is not easy. To decide in the algorithm which regions of the image use which p values, this work uses an edge-guided image to define the p value of each image pixel i and its corresponding weight as:
where p_large = 2 and p_small = 0.8 are the two values of p, and c_1 and c_2 are two non-negative thresholds. Note that the value of p is not determined by the input image but is conditioned on the output filtering result.
The reason for determining p in this way is as follows. When minimizing the loss function, the L_0.8 norm is applied first; because the p_small = 0.8 regularization term produces some over-sharpened artifacts in the output image, the L_2 norm is then applied to suppress them. These pseudo-structures are identified as structures whose edge response on the original image I is low (e.g., E_i(I) < c_1) but significantly enhanced in the output image T (E_i(T) − E_i(I) > c_2). Without the L_2 norm, a strong smoothing effect can still be achieved, but stair-stepping artifacts appear. On the other hand, without the L_0.8 norm, the L_2 norm alone yields a very blurry optimized image in which many important structures are not well preserved. In contrast, the complete regularization term achieves a better visual effect.
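The pair-wise weights and the spatially varying choice of p can be sketched as follows; the σ and threshold defaults are illustrative assumptions, not values from the work:

```python
import numpy as np

def spatial_weight(xi, yi, xj, yj, sigma_s=7.0):
    # Gaussian weight on the spatial distance between pixels i and j.
    return np.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2 * sigma_s ** 2))

def range_weight(Ii, Ij, sigma_r=0.1):
    # Gaussian weight on the colour difference, summed over channels c.
    d2 = np.sum((np.asarray(Ii, float) - np.asarray(Ij, float)) ** 2)
    return np.exp(-d2 / (2 * sigma_r ** 2))

def select_p(E_I, E_T, c1=0.05, c2=0.05, p_small=0.8, p_large=2.0):
    # Use p_large (L2) on suspected over-sharpening artifacts --
    # pixels whose edge response is low in the input (E_i(I) < c1)
    # but strongly enhanced in the output (E_i(T) - E_i(I) > c2) --
    # and p_small (L0.8) everywhere else; conditioned on the output T.
    artifact = (E_I < c1) & (E_T - E_I > c2)
    return np.where(artifact, p_large, p_small)
```

A pixel with input edge response 0.01 that jumps to 0.2 in the output is flagged as an artifact and assigned p = 2, while an unchanged strong edge keeps p = 0.8.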
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a real-time and controllable scale space filtering method. As a deep learning method, it requires only one forward propagation operation at execution time, is fast, and can obtain multiple filtering results of an image at different scales. The multi-scale filtering results are controlled by introducing edge images, which is more intuitive than controlling the degree of smoothing with hyper-parameters alone.
The purpose of the invention is realized by the following technical scheme:
a real-time and controllable scale space filtering method is based on a recurrent neural network model consisting of a guide network G and a stripping network P, and comprises the following steps:
(1) The original image I is used as the input of the guide network G, which cyclically outputs several edge images G_t of the image at different scales; the edge image G_t and the filtering result I_{t−1} are then input together into the stripping network P. Here t denotes the t-th step of the cycle, t = 1, 2, 3, …, T, and I_0 = I denotes the original image.

(2) Under the guidance of the edge image G_t, the stripping network P outputs the next filtering result I_t.

(3) The filtering result I_t is cyclically fed back, together with the edge image G_{t+1}, as input to the stripping network to obtain a new filtering result I_{t+1}. This operation is repeated until the number of cycles reaches the set total T. The filtering result I_t output by the stripping network P maintains the same picture structure as the input edge image G_t; that is, image peeling is performed under the guidance of G_t.
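The three-step cycle above can be sketched schematically. guide_G and strip_P below are hypothetical stand-ins (a thresholded gradient and an edge-gated blur) for the learned networks, used only to show the data flow:

```python
import numpy as np

def guide_G(I, t):
    # Stand-in guide network: a progressively sparser edge map G_t
    # (the threshold grows with t, so later scales keep fewer edges).
    g = np.abs(np.diff(I, axis=1, append=I[:, -1:]))
    return (g > 0.1 * t).astype(float)

def strip_P(I_prev, G_t):
    # Stand-in stripping network: smooth where G_t = 0, keep pixels
    # that G_t marks as structure.
    blur = (I_prev + np.roll(I_prev, 1, axis=1)) / 2.0
    return np.where(G_t > 0, I_prev, blur)

def scale_space_filter(I, T=3):
    results, I_t = [], I              # I_0 = I is the original image
    for t in range(1, T + 1):
        G_t = guide_G(I, t)           # edge image for scale t
        I_t = strip_P(I_t, G_t)       # peel the next component off I_{t-1}
        results.append(I_t)
    return results                    # [I_1, I_2, ..., I_T]
```

One call yields the whole scale-space stack: each I_t is produced from I_{t−1} under G_t, matching the recurrent structure of steps (1)-(3).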
Further, the stripping network P can peel each component from the image hierarchically: the structures/edges contained in the filtering result of each step are a subset of the structures/edges contained in the filtering result of the previous step.

Further, for edge pixels of the edge image G_t, the gradient of the corresponding co-located pixels in the filtering result I_t should remain unchanged; for non-edge pixels of G_t, the more thoroughly the corresponding co-located pixels in I_t are smoothed, the better.

Further, the edge image G_t can be obtained by any existing deep or non-deep edge detection method.

Further, each step of the cycle in the filtering method is implemented by a single forward propagation operation or by iterating two or three steps of operations.

Furthermore, the stripping network P can learn from any existing filtering method with supervision, or can be trained in an unsupervised manner. At each step of the cycle, the core of the stripping network P is to accept the output I_{t−1} of the previous step as input and perform image filtering under the guidance of the edge image G_t, so that I_t is peeled off from I_{t−1}.

Further, the filtering result of the input image is obtained by setting a hyper-parameter, and the guide network G is used to establish the relation between the hyper-parameter and the edge image G_t; different hyper-parameters are set for different scales, each corresponding to one edge image G_t.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The invention formally defines a general image separation problem and, by introducing the concept of scale-space filtering, focuses on a specific member of the image separation family of tasks: hierarchical image peeling. The initial image separation problem is simplified through theoretical analysis, converting the original complex problem into a series of small sub-problems and greatly reducing its complexity.
2. The method is a deep learning method that requires only one forward propagation operation at execution time, is fast, and can obtain multiple filtering results of an image at different scales. The invention introduces edge images to control the multi-scale filtering results, which is more intuitive than controlling the degree of smoothing with hyper-parameters alone.
3. Compared with adjusting numerical hyper-parameters of the model, the invention adopts a more intuitive and flexible way to generate filtering results of different scales, namely using perceptually meaningful edge images in place of hyper-parameters.
4. Many current tasks expect the model to run in real time, so in addition to useful functionality, the efficiency of the model is crucial. The invention designs a lightweight recurrent neural network model, the hierarchical image peeling network, which completes the hierarchical image peeling task efficiently and effectively and flexibly handles both supervised and unsupervised settings. The model size is about 3.5 MB, and on a GTX 2080Ti GPU each pass over a 1080p image runs at over 60 fps, giving the method strong practical value.
5. The method realizes hierarchical organization of an image, ultimately obtains its multi-scale representation, and can extract information of interest from the image at different scales.
Drawings
FIGS. 1a and 1b are schematic diagrams of general image filtering; FIGS. 1c to 1g are schematic diagrams of scale-space filtering.
Fig. 2 is a schematic diagram of a framework structure and a working process of the recurrent neural network model in this embodiment. The numbers below each network block in the figure represent the number of channels output by the corresponding convolution module in the neural network, and the letters K, S and D represent the size of the convolution kernel, the step size of the convolution and the expansion rate of the hole convolution, respectively.
Fig. 3a to 3c show an input image, a guide map and a final result map during application of the method of the present invention, respectively.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Over the past few decades, the computer vision and multimedia fields have paid increasing attention to organizing images hierarchically or at multiple scales, drawing on research into how the human eye perceives the external world. Taking image segmentation as an example, an image may be spatially partitioned into a set of object instances or superpixels, which can serve as a basis for further processing. Unlike image segmentation, this embodiment introduces another level of image organization from the perspective of information extraction, referred to as image separation to distinguish it from image segmentation.
The object of the invention is to hierarchically organize images to obtain a multi-scale representation. Specifically, given an image I, a plurality of components C_i satisfying a hierarchical relationship are gradually peeled off from it, and adding these components at the pixel level recovers the original image I. Unlike image segmentation (which considers the overall space), this embodiment decomposes an image into a series of components from the perspective of scale-space filtering. Image filtering, also called image smoothing, refers to removing image texture while keeping the main structure of the image unchanged. The structure of an image is generally its edge part, reflecting the overall contour and shape of objects in the image; the texture of an image refers to visual patterns distributed within objects, repeating in a regular or irregular manner. Fig. 1a shows an example of image filtering: the edges of the dog, the dog's facial features, the leash, and the square frame can be regarded as the structure of the image, while the densely packed small squares distributed over the dog's body can be regarded as texture details. Whether a pixel in an image belongs to structure or texture is quite subjective; for example, in Figs. 1a and 1b the dog's eyes and mouth are judged to be structure and are therefore preserved rather than smoothed, but they could equally be judged to be texture and smoothed away. Neither choice is inherently better; it depends on how the user wants to smooth the image. In other words, the result of image filtering is not unique, and many different choices are possible.
As shown in Figs. 1c to 1g, scale-space filtering means that a single input image is processed into multiple filtering results of different scales and degrees, describing the image in a multi-scale manner so that interesting information can be obtained at each scale.
The core of the invention is to design a real-time and controllable scale space filtering method. To simplify the description, let P_t denote the sum of the components peeled off in the first t steps, so that I_t and P_t satisfy I = I_t + P_t; here n denotes that a total of n components are separated from the original image I. To perform scale-space filtering efficiently, the present embodiment provides a recurrent neural network model that cyclically outputs multiple filtering results of different scales. The recurrent neural network feeds its own output back in as its next input to obtain a new output, and repeats this process until a termination condition is met. The model can be trained in both supervised and unsupervised manners, and comprises a guide network G and a stripping network P. The specific method is as follows: the original image I is used as the input of the guide network, which cyclically outputs several edge images G_t of the image at different scales; the edge image G_t and the filtering result I_{t−1} of the previous step (t = 1, 2, 3, …, T; the zeroth-step filtering result is the original image, i.e., I_0 = I) are input together into the stripping network, which outputs the next filtering result I_t under the guidance of G_t. The filtering result I_t is then cyclically fed back, together with the edge image G_{t+1}, as input to the stripping network to obtain a new filtering result I_{t+1}; this operation is repeated until the number of cycles reaches the set total T. It is emphasized that the filtering result I_t output by the stripping network maintains the same picture structure as the input edge image G_t; that is, image peeling is performed under the guidance of G_t.
Specifically, definition 1 (image separation): given an image I, separating I into its constituent components C = {C_1, C_2, ..., C_n} is called image decomposition, where n denotes that n components are separated from the image.
The object of the present embodiment can be represented by definition 1; ∇C_i represents the first-order gradient of a decomposed component. The gradient represents the structure and detail information of the image and is obtained by subtracting adjacent pixels of the image, that is, each pixel value of the image gradient is the difference between a pixel at the corresponding position in the original image and its adjacent pixel. It is calculated by the formulas ∇_x I(x, y) = I(x+1, y) − I(x, y) and ∇_y I(x, y) = I(x, y+1) − I(x, y), where x, y are the position indices of the image pixels. In general, the gradient corresponding to the edge portions of the image is large. Due to the complex relationships between the components C_i, it is very difficult to resolve multiple components directly from the image; to make the problem easier to handle, sequential image stripping reduces it to a series of sub-problems that iteratively strip two components from the image.
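The adjacent-pixel difference just described can be written directly in NumPy; the forward-difference convention with zero padding at the border is one common choice:

```python
import numpy as np

def image_gradient(img):
    # first-order forward differences; last row/column padded with zeros
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # I(x+1, y) - I(x, y)
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # I(x, y+1) - I(x, y)
    return gx, gy
```

Large magnitudes of (gx, gy) mark the edge regions where, per the text above, the gradient is expected to be large.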
Theorem 1 (sequential image stripping): suppose that for any t, [C_t, I_t] are both obtained from I_{t-1}, so that the two components are iteratively separated by stripping C_t from I_{t-1}; the result obtained is the same as the result of separating all components directly from the image.
Proof: according to the property of hierarchical stripping, the non-zero elements of ∇I_t should be a subset of the non-zero elements of ∇I_{t-1}. From the structure-retention property, ∇C_t and ∇I_t should have no correlation between them, which can be expressed as ∇C_t ∘ ∇I_t = 0. Now, given I_t = I_{t+1} + C_{t+1} and ∇C_{t+1} ∘ ∇I_{t+1} = 0, we can obtain ∇C_t ∘ ∇I_{t+1} = 0 and ∇C_t ∘ ∇C_{t+1} = 0. By analogy, ∇C_i ∘ ∇C_j = 0 for all i ≠ j and ∇C_i ∘ ∇I_t = 0 for all i ≤ t, which is exactly the condition obtained when all components are separated directly from the image. Theorem 1 is proved.
Hierarchical image stripping is a special member of the image-separation family of tasks: it separates/strips out the components of an image progressively rather than all at once from the original. The above analysis transforms the initial problem into a sequential one, which naturally suggests a solution using recurrent neural networks. Each cycle step can be viewed as performing a controllable, structure-retaining image stripping operation. The goal of each cycle step can generally be written in the form:

[C_t, I_t] ← argmin Φ(C_t) + Ψ(I_t), subject to I_{t-1} = I_t + C_t,
where Φ(·) and Ψ(·) are regularization terms with respect to C_t and I_t, and σ denotes a hyper-parameter that controls the filtering/stripping strength. The formula expresses that the C_t and I_t output at each cycle step should be a solution that minimizes the objective function Φ(C_t) + Ψ(I_t). Many conventional methods, e.g. L0 [1], RGF [2], RTV [3] and muGIF [4], involve operations with very large amounts of computation, such as matrix inversion, and are therefore slow, which limits their real-time application in practical scenarios. With a deep learning approach, a deep neural network can be trained from an input-output perspective to imitate the effect of a conventional method (using the results of the conventional method as supervision data). Once training is completed, the network only needs one forward-propagation operation at execution time, which greatly reduces the required computational overhead. However, using the numerical parameter σ to control the degree of filtering/stripping is not intuitive, and it is difficult to adjust the hyper-parameter σ to obtain the desired result; after all, σ is only a number, a scalar. In contrast, guidance information that is meaningful and intuitive in visual perception is more practical. Among the many visually meaningful cues available, edge images are a good choice because they are very simple and intuitively reflect the semantic features and overall outline of an image. Unfortunately, in general, the edge image used as guidance at each stage of the loop is unknown. To solve this problem, the invention predicts a reasonable edge image in advance for each stage of the cycle using the guide network G, i.e. G_t ← G(I_{t-1}), and then uses the predicted edge image to guide the stripping network P to strip C_t from I_{t-1}.
With the above considerations in mind, the invention proposes a looping strategy that cyclically uses G_t as a guide to strip C_t from I_{t-1}, i.e. [C_t, I_t] ← P(I_{t-1}, G_t), where t denotes the t-th step of the cycle. Notably, G_t can not only be output by the guide network but may also be an edge image created by the user in a customized manner. All image pixel values used in the invention lie in the range [0, 1]; for an image with values in [0, 255], an image in [0, 1] can be obtained by a simple normalization operation.
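The [0, 255] → [0, 1] normalization mentioned above is a single division; a minimal sketch:

```python
import numpy as np

def to_unit_range(img_u8):
    # map an 8-bit image with values in [0, 255] to float values in [0, 1]
    return img_u8.astype(np.float32) / 255.0
```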
As shown in fig. 2, the recurrent neural network model of this embodiment logically comprises two modules: a guide network G and a stripping network P. The stripping network P operates with the edge image G_t as a condition; G_t is either output by the guide network G or provided by the user. Through this logical partitioning, the guide network and the stripping network can be decoupled to a large extent, further simplifying the problem. In addition, the partitioning greatly constrains the solution space of the original problem, making it smaller, which benefits the simplification and training of the model.
First, the stripping network. The stripping network can not only learn the effect of any existing conventional filtering method with supervision (using the results of the conventional method as supervision during training), but can also be trained in an unsupervised manner. At each cycle step, the core task of the stripping network is to accept the output of the previous step I_{t-1} as input and perform image filtering under the guidance of the edge image G_t, thereby stripping C_t from I_{t-1}. Whatever reasonable edge image G_t is used as guidance, the stripping result I_t should strictly comply with it. Since this section mainly introduces the stripping network, assume that the edge image G_t already exists and is known; the following section details how the guide network is designed to obtain G_t. Then, based on the hard constraint I_{t-1} = I_t + C_t, the stripping network may take I_{t-1} as input and output either I_t or C_t: as long as one of the two is output, the other can simply be obtained by subtracting the output from I_{t-1}. Furthermore, to better account for the contextual information of the image, the stripping network needs a larger receptive field. Thus, as shown in fig. 2, dilated (hole) convolutions are introduced to gradually increase the receptive field, with the dilation rate increasing exponentially. Although a deeper network could also enlarge the receptive field, a very deep network has a very large number of parameters; to save the storage overhead of the model, the receptive field is enlarged not by deepening the network but by introducing dilated convolutions.
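The effect of exponentially increasing dilation rates on the receptive field can be checked with a few lines of arithmetic. The 3 × 3 kernels and layer counts below are illustrative assumptions, not the exact architecture of fig. 2:

```python
def receptive_field(num_layers, kernel=3, exponential=True):
    # receptive field of a stack of stride-1 dilated convolutions:
    # each layer adds (kernel - 1) * dilation pixels of context
    rf = 1
    for i in range(num_layers):
        dilation = 2 ** i if exponential else 1
        rf += (kernel - 1) * dilation
    return rf
```

With 5 layers, dilation rates 1, 2, 4, 8, 16 give a 63-pixel receptive field versus 11 for plain convolutions, at the same parameter count — which is exactly the trade-off the text invokes.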
Second, the guide network. For an input image I_{t-1}, the filtering result is, in general, obtained by setting the hyper-parameter σ_t (different σ_t for different scales), which can be represented as Î_t = F(I_{t-1}, σ_t), where F denotes a specific filtering method and Î_t is the target result of image filtering. What the guide network needs to do is to establish the relation between the value σ_t and the corresponding guide map (the edge image in the present invention) that has visual-perceptual meaning. To achieve this, the gradient ∇Î_t can be used as the supervision Ĝ_t for G_t to train the guide network. However, there is some difference between the image gradient and an edge image in the true sense. A reasonable edge image should be semantic and at the same time binary (a binary image takes values 0 or 1). A semantic edge image reflects the targets in the image that can be perceived by the human eye, and a binary edge image avoids ambiguity when judging whether a certain pixel belongs to the edge image. To reduce the difference between the gradient and a true edge image: on the one hand, similar to the stripping network, the guide network also employs dilated convolutions to increase the receptive field and better learn the contextual characteristics of the image; the contextual information learned through dilated convolution markedly improves the network's perception of semantically meaningful objects. On the other hand, a Sigmoid activation function is added to the last layer of the network to force the output as close to binary as possible. More details can be found in the guide-network part of fig. 2. However, the real supervision Î_t is not always available during the training phase. Therefore, when Î_t is absent, its gradient ∇Î_t should be approximated using the gradient of I_0, i.e. ∇I_0.
The approximation Ĝ_t need not be particularly precise, as long as it reflects the overall structure of the image. By repeatedly using the gradient ∇I_0 of I_0, a series of Ĝ_t at different scales can be constructed to train a reasonable guide network.
Why not obtain the edge image G_t directly with a ready-made edge detection method, such as the conventional edge detection operators Roberts, Sobel and Canny, or other deep-learning-based edge detection methods? Theoretically, the existing edge detection methods are feasible, but they suffer from sensitivity to noise, inaccurate edge localization, overly thick detected edges, and failure to satisfy the hierarchical property. In addition, the basic idea of the deep learning methods is to learn multi-scale features of a single input image and then fuse them to obtain the final predicted edge image. Existing deep learning methods need to acquire multi-scale features by means of a pre-trained image classification network such as AlexNet or VGG16; their specific approach is to fuse the features output by different layers of the pre-trained classification network using 1 × 1 convolutions and then obtain the predicted edge image from the fused features, where the features of different layers can be regarded as features at multiple scales. As shown in fig. 2, the framework of this embodiment can overcome these problems of the existing deep learning methods, because the invention reuses the network parameters and cyclically feeds the feature F_{t-1} output at the previous step back in as input to obtain the next feature F_t; F_t has a larger receptive field than F_{t-1}.
How the neural network model of this embodiment is trained is as follows. For different scales, different σ_t need to be set; I_t can be seen as the filtering result obtained by setting a particular σ_t, and each σ_t has a corresponding edge image G_t. T denotes the total number of cycles; in this embodiment, T is provisionally set to 4. It should be noted that the recurrent neural network may be cycled any number of times, not limited to T times. The interval of filtering degree at each step is controlled by σ_t, and σ_t may be adjusted according to particular needs.
The loss functions during network training apply to both supervised and unsupervised training modes and comprise a guiding consistency loss, a stripping reconstruction loss, a stripping retention loss and a stripping consistency loss. For supervised image stripping, Î_t and Ĝ_t exist and are obtainable, where Î_t can be generated by any existing image filtering method and used as supervision in neural network training; for unsupervised image stripping, Î_t is unavailable and Ĝ_t is constructed from the gradient of I_0 as described above.
The specific form of each loss function is described below. The guiding consistency loss ensures that the edge image G_t output by the guide network remains consistent with Ĝ_t. Let 1 denote an image of the same size as I_t whose pixel values are all 1, and let ∘ denote the Hadamard product, i.e. pixel-by-pixel multiplication of pixels at the same position in two images. If Ĝ_t is not available, it can be obtained as
Ĝ_t = 1 − (1 − Ĝ'_t) ∘ (1 − G_gr),

where Ĝ'_t is the gradient-based approximation and G_gr serves to further enhance the important edges in the image. G_gr can be a manually annotated edge image or the binarization of ∇I_0. For simplicity of description, the following uniformly uses Ĝ_t to represent the supervision target. The guiding consistency loss can be expressed as:

L_gc = ‖(G_t − Ĝ_t) ∘ Ĝ_t‖_1 + β_g ‖(G_t − Ĝ_t) ∘ (1 − Ĝ_t)‖_1,
where ‖·‖_1 denotes the 1-norm and β_g is a constant that balances the two terms of the loss function. The guiding consistency loss compares the values of G_t and Ĝ_t to determine, for each position, whether the pixel of G_t belongs to the edge image.
The stripping reconstruction loss requires the output result I_t and Î_t to be as close as possible in color space. Let ‖·‖_2 be the 2-norm; the loss takes the form:

L_sr = ‖I_t − Î_t‖_2.
The purpose of the stripping retention loss is to keep the gradient of the structural part of I_t unchanged. The pixels of I_t identified as belonging to the structure are those at the same positions as pixels of G_t whose values are close to 1; in other words, the structural pixels of I_t correspond to the pixels of G_t with values close to 1 (considered to belong to the edge image). Since the gradient of an image naturally reflects its structural information, the stripping retention loss is defined as the distance between the gradients of I_t and I_{t-1} on the structural region, i.e. the G_t-weighted distance between ∇I_t and ∇I_{t-1}:

L_sm = ‖G_t ∘ (∇I_t − ∇I_{t-1})‖_1.
the loss of peel consistency severely constrains the image peeling process such that the result of peeling is in accordance with GtAnd the consistency is maintained. Peeling consistency loss pair ItEach pixel of (a) is smoothed to a different degree. For ItThe punishment of the pixels belonging to the structure is small, and the punishment of the pixels belonging to the texture is large. The specific form of peel consistency loss is:
Here ε is used to avoid a zero denominator. To stabilize training and increase the convergence rate, the guide network and the stripping network are trained independently of each other. Since the stripping network requires G_t, the guide network is first trained with the guiding consistency loss; the parameters of the trained guide network are then fixed, and the stripping network is trained with the stripping reconstruction loss, the stripping retention loss and the stripping consistency loss.
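The three stripping-network losses can be sketched in NumPy. Note that the display equations of the patent are not fully recoverable from this text, so the exact norms and weightings below are a reconstruction from the surrounding prose, intended as an illustrative approximation only:

```python
import numpy as np

def grad(img):
    # forward differences, zero-padded at the border
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def stripping_losses(I_t, I_prev, I_hat, G_t, eps=1e-6):
    # reconstruction: stay close to the target filtering result in color space
    l_rec = np.sqrt(np.sum((I_t - I_hat) ** 2))
    # retention: keep gradients unchanged where G_t marks structure (values near 1)
    gx, gy = grad(I_t); px, py = grad(I_prev)
    l_ret = np.sum(np.abs(G_t * (gx - px))) + np.sum(np.abs(G_t * (gy - py)))
    # consistency: penalize gradients heavily where G_t is near 0 (texture)
    l_con = np.sum((np.abs(gx) + np.abs(gy)) / (G_t + eps))
    return l_rec, l_ret, l_con
```

The division by G_t + eps realizes the stated behavior: structure pixels (G_t near 1) are barely penalized, while texture pixels (G_t near 0) incur a large smoothing penalty.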
Further, to demonstrate the significant progress of the method of the present invention, some experimental results are described below.
First, to fully exploit the multi-scale character of the method, a new strategy is provided for applying it to the saliency detection task. Image saliency detection means using a computer algorithm to simulate the visual characteristics of human beings so as to extract the salient regions of an image. The salient regions of an image can be regarded as the parts most noticeable to the human eye when looking at an image; in general, the salient, high-contrast and color-varying regions attract the eye. Saliency detection is closely related to the selective nature of the human visual system; its goal is to locate important and salient regions or objects in an image, and it is an important and popular research direction in computer vision. This example first uses the existing saliency detection models CSF [6] and EGNet [7] to perform saliency detection on the original image I and the four filtering results generated by the method of the invention (5 images in total), and then trains a lightweight network (only 91 KB) on the DUTS-TR [8] dataset to predict a better saliency map from these five saliency detections. Following the evaluation practice of the related literature, saliency detection quality is measured by the mean absolute error, computed as MAE(S_o, S_gt) := mean(|S_o − S_gt|), where S_o is the saliency map output by the model and S_gt is the ground-truth saliency map (annotated by humans). The evaluation datasets of this example are the public saliency detection datasets ECSSD [9], PASCAL-S [10], HKU-IS [11], SOD [12] and DUTS-TE [8].
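The MAE measure above is one line of NumPy:

```python
import numpy as np

def mae(S_o, S_gt):
    # mean absolute error between predicted and ground-truth saliency maps
    return float(np.mean(np.abs(S_o - S_gt)))
```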
The results show that the method can effectively improve the performance of the existing saliency detection model, because some features useful for saliency detection may be more prominent in different scales, and the removal of unwanted textures in the image helps to enhance the contrast of a salient region. In addition to saliency detection, the present invention can flexibly improve the performance of many other visual and graphical models.
In addition, this embodiment is compared with other filtering methods. The conventional methods compared include L0 [1], RTV [3], RGF [2], SD [17], muGIF [4], realLS [18] and enBF [19]; the deep learning methods include DEAF [15], FIP [16] and PIO [5]. In the visual results, Ours-S (supervised with results produced by muGIF) and Ours denote the models obtained by supervised and unsupervised training, respectively. To evaluate image quality, the gradient correlation coefficient (GCC) is used to evaluate the degree of irrelevance between the stripped-out image texture and structure. For fairness, the hyper-parameters of the compared methods are also carefully adjusted so that all methods achieve a similar degree of filtering/smoothing.
Table 1: quantitative comparison of GCC and execution speed for each method
Note 1: quantitative comparison using GCC. For a fair comparison, the smoothing/filtering degree of all compared methods is controlled to 0.146 ± 0.01. The best results are in bold. Smaller GCC values indicate better results.
Note 2: run-time contrast when processing 1080p (1627 × 1080) images. The time of the CPU is not marked in the table, and the time of the GPU is usedAnd (4) marking.
As can be seen from the quantitative results in Table 1, the method of the invention ranks first on the GCC index compared with the other methods, which shows that the ∇C_t and ∇I_t obtained with the recurrent neural network framework of the invention satisfy the mutual-orthogonality property well. In addition, whether running on a CPU or a GPU, the recurrent neural network model executes much faster than the conventional methods. Thanks to the advantages of deep learning, the recurrent model and PIO achieve real-time speed when processing 1080p images. In terms of visual effect, L0, RGF and PIO produce very poor results, and PIO additionally suffers a severe color-shift problem when the degree of filtering/smoothing is increased. While RTV and muGIF perform relatively well, neither completely smooths or preserves certain areas of the image; in contrast, the present method achieves visually pleasing results both in smoothing the texture details of the image and in preserving its main structures/edges. It is worth mentioning that, apart from the method of the invention, neither the conventional methods nor the deep learning methods can generate a filtering result that follows a guide image, whether the guide is an edge image whose scale changes step by step or an edge image provided/edited by the user. As shown in figs. 3a to 3c, model flexibility is verified using a manually edited guide map, which combines four edge images of different scales. Compared with the other methods, only the method of the invention successfully outputs a filtering result structurally consistent with the guide image.
The invention also evaluates the edge images output by the guide network. Because the framework has the multi-scale property, the edge image output by the guide network at each step of the loop can be used to construct an edge confidence map. The edge confidence map is also an edge image, except that the value of each pixel is not necessarily close to 0 or 1; many pixel values may lie around 0.5, because each value can be regarded as the probability that the pixel belongs to an edge, and the larger the value, the more likely the pixel belongs to an edge. Specifically, the manually annotated ground-truth edge images of the BSDS500 dataset are used as G_gr during unsupervised learning to train the guide network; at execution time the guide network is iterated 24 times, and the edge images obtained over the 24 iterations are averaged to obtain the edge confidence map. The constructed edge confidence map is quantitatively evaluated with a precision-recall curve, and non-maximum suppression is applied to the confidence map before evaluation.
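Averaging the per-iteration edge maps into a confidence map, as described above, is a simple stack-and-mean:

```python
import numpy as np

def edge_confidence(edge_maps):
    # each map is (near-)binary; the mean acts as a per-pixel vote,
    # so a value of 0.5 means half of the iterations marked the pixel as edge
    return np.mean(np.stack(edge_maps, axis=0), axis=0)
```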
Finally, an ablation experiment on the recurrent neural network model of the invention is presented. Since the guide network has only the guiding consistency loss, no ablation analysis of its loss function is necessary. With I_{t-1} as input, the stripping network has two execution modes: one outputs I_t and the other outputs C_t. The invention prefers to output C_t, i.e. C_t ← P(I_{t-1}, G_t), because C_t contains less information than I_t and has a simpler distribution.
In addition, the technical scheme of the invention admits the following alternatives:
Alternative one: the edge image G_t is not obtained from the guide network output, but from any existing deep or non-deep edge detection method.
Alternative two: each loop step is not a single forward-propagation operation but iterates a further two or three steps, which is equivalent to embedding a small loop inside the large loop. (By contrast, the recurrent model of the invention performs only one forward operation per cycle step, with no small loop embedded in the large loop.)
Alternative three: the guide network and the stripping network interact not by taking the output of the guide network as the input of the stripping network, but by using the guide network to directly output the model parameters of the stripping network.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.
Reference documents:
[1]Li Xu,Cewu Lu,Yi Xu,and Jiaya Jia.Image smoothing via L0 gradient minimization.TOG,30(6):112,2011.
[2]Qi Zhang,Xiaoyong Shen,Li Xu,and Jiaya Jia.Rolling guidance filter.In ECCV,2014.
[3]L.Xu,Q.Yan,Y.Xia,and J.Jia.Structure extraction from texture via relative total variation.TOG.,31(6):139,2012.
[4]X.Guo,Y.Li,J.Ma,and H.Ling.Mutually guided image filtering.TPAMI,42(3):694–707,2020.
[5]Qingnan Fan,Dongdong Chen,Lu Yuan,Gang Hua,Nenghai Yu,and Baoquan Chen.A general decoupled learning framework for parameterized image operators.TPAMI,2019.
[6]Shang-Hua Gao,Yong-Qiang Tan,Ming-Ming Cheng,Chengze Lu,Yunpeng Chen,and Shuicheng Yan.Highly efficient salient object detection with 100k parameters.In ECCV,2020.
[7]Jia-Xing Zhao,Jiang-Jiang Liu,Deng-Ping Fan,Yang Cao,Jufeng Yang,and Ming-Ming Cheng.Egnet:edge guidance network for salient object detection.In ICCV,Oct 2019.
[8]Chuan Yang,Lihe Zhang,Huchuan Lu,Xiang Ruan,and Ming-Hsuan Yang.Saliency detection via graph-based manifold ranking.In CVPR,pages 3166–3173,2013.
[9]Q.Yan,L.Xu,J.Shi,and J.Jia.Hierarchical saliency detection.In CVPR,pages 1155–1162,2013.
[10]Y.Li,X.Hou,C.Koch,J.M.Rehg,and A.L.Yuille.The secrets of salient object segmentation.In CVPR,pages 280–287,2014.
[11]Guanbin Li and Y.Yu.Visual saliency based on multiscale deep features.In CVPR,pages 5455–5463,2015.
[12]David Martin,Charless Fowlkes,Doron Tal,and Jitendra Malik.A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics.In ICCV,2001.
[13]Michal Gharbi,Gaurav Chaurasia,Sylvain Paris,and Frdo Durand.Deep joint demosaicking and denoising.TOG,35(6):1–12,2016.
[14]Sifei Liu,Jinshan Pan,and Ming-Hsuan Yang.Learning recursive filters for low-level vision via a hybrid neural network.In ECCV,2016.
[15]Li Xu,Jimmy S.J.Ren,Qiong Yan,Renjie Liao,and Jiaya Jia.Deep edge-aware filters.In ICML,2015.
[16]Qifeng Chen,Jia Xu,and Vladlen Koltun.Fast image processing with fully-convolutional networks.In ICCV,pages 2516–2525,2017.
[17]Bumsub Ham,Minsu Cho,and Jean Ponce.Robust guided image filtering using nonconvex potentials.TPAMI,40(1):192–207,2017.
[18]Wei Liu,Pingping Zhang,Xiaolin Huang,Jie Yang,Chunhua Shen,and Ian Reid.Real-time image smoothing via iterative least squares.TOG,39(3):28,2020.
[19]Wei Liu,Pingping Zhang,Xiaogang Chen,Chunhua Shen,Xiaolin Huang,and Jie Yang.Embedding bilateral filter in least squares for efficient edge-preserving image smoothing.TCSVT,30(1):23–35,2020.
[20]John Canny.A computational approach to edge detection.TPAMI,8(6):679–698,1986.
[21]David R Martin,Charless C Fowlkes,and Jitendra Malik.Learning to detect natural image boundaries using local brightness,color,and texture cues.TPAMI,26(5):530–549,2004.
[22]Pablo Arbelaez,Michael Maire,Charless Fowlkes,and Jitendra Malik.Contour detection and hierarchical image segmentation.TPAMI,33(5):898–916,2011.
[23]Zhile Ren and Gregory Shakhnarovich.Image segmentation by cascaded region agglomeration.In CVPR,pages 2011–2018,2013.
[24]Piotr Dollár and C Lawrence Zitnick.Structured forests for fast edge detection.In CVPR,pages 1841–1848,2013.
[25]Wei Shen,Xinggang Wang,Yan Wang,Xiang Bai,and Zhijiang Zhang.DeepContour:A deep convolutional feature learned by positive-sharing loss for contour detection.In CVPR,pages 3982–3991,2015.
[26]Saining Xie and Zhuowen Tu.Holistically-nested edge detection.In CVPR,2015.
[27]Yun Liu,Ming-Ming Cheng,Xiaowei Hu,Kai Wang,and Xiang Bai.Richer convolutional features for edge detection.In CVPR,2017.
Claims (7)
1. A real-time and controllable scale-space filtering method, characterized in that, based on a recurrent neural network model composed of a guide network G and a stripping network P, the method comprises the following steps:
(1) using the original image I as the input of the guide network G, cyclically outputting a plurality of edge images G_t of the image at different scales, and then inputting the edge image G_t and the filtering result I_{t-1} together into the stripping network P; wherein t denotes the t-th step of the cycle, t = 1, 2, 3, ..., T, and I_0 = I denotes the original image;
(2) the stripping network P, under the guidance of the edge image G_t, outputs the next filtering result I_t;
(3) the filtering result I_t is cyclically taken together with the edge image G_{t+1} as input to the stripping network to again obtain a new filtering result I_{t+1}; this operation is repeated until the number of cycles reaches the set total number of cycles T; wherein the filtering result I_t output by the stripping network P maintains the same picture structure as the input edge image G_t, i.e. image stripping is performed under the guidance of G_t.
2. A real-time and controllable scale-space filtering method according to claim 1, wherein the stripping network P is capable of stripping each component from the image hierarchically: the structures/edges contained in the filtering result of each step are a subset of the structures/edges contained in the filtering result of the previous step.
3. A real-time and controllable scale-space filtering method according to claim 1, characterized in that, for the edge pixels of the edge image G_t, the gradient of the corresponding co-located pixels in the filtering result I_t should remain unchanged; for the non-edge pixels of G_t, the more thoroughly the corresponding co-located pixels in I_t are smoothed, the better.
4. A real-time and controllable scale-space filtering method according to claim 1, characterized in that the edge image G_t can be obtained by any existing deep or non-deep edge detection method.
5. A real-time and controllable scale space filtering method according to claim 1, wherein the loop of each step in the filtering method is implemented by a single forward propagation operation or by an iterative two-step or three-step operation.
6. A real-time and controllable scale-space filtering method according to claim 1, wherein the stripping network P can learn any existing filtering method with supervision and can also be trained in an unsupervised manner; at each step of the cycle, the core of the stripping network P is to accept the output of the previous step I_{t-1} as input and perform image filtering under the guidance of the edge image G_t, thereby stripping C_t from I_{t-1}.
7. A real-time and controllable scale-space filtering method according to claim 1, wherein the filtering result of the input image is obtained by setting a hyper-parameter, and the guide network G is used to establish the relation between the hyper-parameter and the edge image G_t; different hyper-parameters are set for different scales, and each hyper-parameter corresponds to an edge image G_t.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110172012.2A CN112862715B (en) | 2021-02-08 | 2021-02-08 | Real-time and controllable scale space filtering method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112862715A true CN112862715A (en) | 2021-05-28 |
CN112862715B CN112862715B (en) | 2023-06-30 |
Family
ID=75989229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110172012.2A Active CN112862715B (en) | 2021-02-08 | 2021-02-08 | Real-time and controllable scale space filtering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112862715B (en) |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100074552A1 (en) * | 2008-09-24 | 2010-03-25 | Microsoft Corporation | Removing blur from an image |
CN102521798A (en) * | 2011-11-11 | 2012-06-27 | 浙江捷尚视觉科技有限公司 | Image automatic recovering method for cutting and selecting mask structure based on effective characteristic |
CN105931192A (en) * | 2016-03-21 | 2016-09-07 | 温州大学 | Image texture filtering method based on weighted median filtering |
CN107622481A (en) * | 2017-10-25 | 2018-01-23 | 沈阳东软医疗系统有限公司 | Reduce the method, apparatus and computer equipment of CT picture noises |
CN107633490A (en) * | 2017-09-19 | 2018-01-26 | 北京小米移动软件有限公司 | Image processing method, device and storage medium |
CN107844751A (en) * | 2017-10-19 | 2018-03-27 | 陕西师范大学 | The sorting technique of guiding filtering length Memory Neural Networks high-spectrum remote sensing |
CN108280831A (en) * | 2018-02-02 | 2018-07-13 | 南昌航空大学 | A kind of acquisition methods and system of image sequence light stream |
CN108492308A (en) * | 2018-04-18 | 2018-09-04 | 南昌航空大学 | A kind of determination method and system of variation light stream based on mutual structure guiding filtering |
CN109118451A (en) * | 2018-08-21 | 2019-01-01 | 李青山 | A kind of aviation orthography defogging algorithm returned based on convolution |
CN109272539A (en) * | 2018-09-13 | 2019-01-25 | 云南大学 | The decomposition method of image texture and structure based on guidance figure Total Variation |
CN109450406A (en) * | 2018-11-13 | 2019-03-08 | 中国人民解放军海军航空大学 | A kind of filter construction based on Recognition with Recurrent Neural Network |
CN109978764A (en) * | 2019-03-11 | 2019-07-05 | 厦门美图之家科技有限公司 | A kind of image processing method and calculate equipment |
CN110009580A (en) * | 2019-03-18 | 2019-07-12 | 华东师范大学 | The two-way rain removing method of single picture based on picture block raindrop closeness |
CN110246099A (en) * | 2019-06-10 | 2019-09-17 | 浙江传媒学院 | It is a kind of keep structural edge image remove texture method |
CN110276721A (en) * | 2019-04-28 | 2019-09-24 | 天津大学 | Image super-resolution rebuilding method based on cascade residual error convolutional neural networks |
CN110689021A (en) * | 2019-10-17 | 2020-01-14 | 哈尔滨理工大学 | Real-time target detection method in low-visibility environment based on deep learning |
CN110910317A (en) * | 2019-08-19 | 2020-03-24 | 北京理工大学 | Tongue image enhancement method |
CN110991463A (en) * | 2019-11-04 | 2020-04-10 | 同济大学 | Multi-scale guided filtering feature extraction method under guide of super-pixel map |
CN111275642A (en) * | 2020-01-16 | 2020-06-12 | 西安交通大学 | Low-illumination image enhancement method based on significant foreground content |
CN111462012A (en) * | 2020-04-02 | 2020-07-28 | 武汉大学 | SAR image simulation method for generating countermeasure network based on conditions |
CN111626330A (en) * | 2020-04-23 | 2020-09-04 | 南京邮电大学 | Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation |
CN111639471A (en) * | 2020-06-01 | 2020-09-08 | 浙江大学 | Electromagnetic interference filter design method based on recurrent neural network |
CN112132753A (en) * | 2020-11-06 | 2020-12-25 | 湖南大学 | Infrared image super-resolution method and system for multi-scale structure guide image |
- 2021-02-08 CN CN202110172012.2A patent/CN112862715B/en active Active
Non-Patent Citations (4)
Title |
---|
LIXIANG ZHEN; LU YONG GE; XUZHI MING; YAOJING PING: "RADAR Cross Section Measurement and Imaging Related to Ship Target in the Sea Environment", PROCEDIA COMPUTER SCIENCE, 6 February 2019 (2019-02-06) * |
ZHANG YONGXIN (author), Xinhua Publishing House * |
ZHANG YANYONG; ZHANG SHA; ZHANG YU et al.: "Autonomous Driving Perception and Computing Based on Multimodal Fusion", Journal of Computer Research and Development, 31 December 2020 (2020-12-31) * |
Also Published As
Publication number | Publication date |
---|---|
CN112862715B (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A closed-form solution to photorealistic image stylization | |
Tian et al. | Deep learning on image denoising: An overview | |
Li et al. | Single image dehazing via conditional generative adversarial network | |
Zhu et al. | A fast single image haze removal algorithm using color attenuation prior | |
Zhou et al. | UGIF-Net: An efficient fully guided information flow network for underwater image enhancement | |
CN108875935B (en) | Natural image target material visual characteristic mapping method based on generation countermeasure network | |
Brox et al. | Unsupervised segmentation incorporating colour, texture, and motion | |
Liu et al. | Interactive image segmentation based on level sets of probabilities | |
Atoum et al. | Color-wise attention network for low-light image enhancement | |
Basaran et al. | An efficient framework for visible–infrared cross modality person re-identification | |
WO2015192115A1 (en) | Systems and methods for automated hierarchical image representation and haze removal | |
CN111488865A (en) | Image optimization method and device, computer storage medium and electronic equipment | |
CA3137297C (en) | Adaptive convolutions in neural networks | |
KR102311796B1 (en) | Method and Apparatus for Deblurring of Human Motion using Localized Body Prior | |
CN111681198A (en) | Morphological attribute filtering multimode fusion imaging method, system and medium | |
Feng et al. | URNet: A U-Net based residual network for image dehazing | |
CN113379707A (en) | RGB-D significance detection method based on dynamic filtering decoupling convolution network | |
CN115880720A (en) | Non-labeling scene self-adaptive human body posture and shape estimation method based on confidence degree sharing | |
Qu et al. | UMLE: unsupervised multi-discriminator network for low light enhancement | |
Wang et al. | Adaptive shape prior in graph cut segmentation | |
Yuan et al. | Explore double-opponency and skin color for saliency detection | |
CN116342377A (en) | Self-adaptive generation method and system for camouflage target image in degraded scene | |
Guo et al. | Progressive Domain Translation Defogging network for real-world fog images | |
CN113627342B (en) | Method, system, equipment and storage medium for video depth feature extraction optimization | |
CN112862715B (en) | Real-time and controllable scale space filtering method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||